Cursor 3.3, Claude Managed Agents, Gemma 4 MTP

Cursor 3.3 ships alongside Claude’s new self-improving managed agents and a 40% local inference speedup via multi-token prediction for Gemma 4 on llama.cpp.

Must read

Cursor 3.3 — New Cursor release — check changelog for agent mode, model selection, and parallel workstream changes relevant to your daily driver.
Claude adds Self-Improving Agents (Dreaming, Outcomes, Multiagent) — Dreaming lets agents self-improve from past sessions; Outcomes adds self-correction — directly applicable to your overnight-agent-factory pattern.
Anthropic raises Claude limits via SpaceX/xAI Colossus deal — 220,000+ GPUs secured; higher usage limits now live — your team’s Claude Code and API throughput constraints should ease.
Claude Code v2.1.133: worktree baseRef, sandbox paths — New worktree.baseRef setting changes how --worktree branches; affects your headless agent isolation setup directly.
Multi-Token Prediction for llama.cpp — Gemma 4 40% faster — Gemma 26B hits 138 tok/s on M5 Max with MTP drafting; meaningful for your local Apple Silicon coding workflows.
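
For readers unfamiliar with the drafting idea behind the llama.cpp item above, here is a toy sketch of greedy draft-and-verify decoding. It is not llama.cpp's implementation: `target_next` and `draft_next` are hypothetical stand-in functions, and a separate cheap drafter is assumed here, whereas MTP drafts from the model's own extra prediction heads. The sketch only shows why the output matches plain greedy decoding while several tokens can be accepted per step.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    """Hypothetical 'large' model: a deterministic toy rule, not a real LLM."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[Token]) -> Token:
    """Hypothetical cheap drafter: agrees with the target except when
    the context length is a multiple of 4."""
    guess = target_next(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def speculative_step(ctx: List[Token], k: int,
                     draft: NextFn, target: NextFn) -> List[Token]:
    """Draft k tokens, then keep the longest prefix the target agrees with.

    Always returns at least one token chosen by the target, so progress
    is guaranteed even when the first drafted token is rejected.
    """
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft(ctx + proposed))

    accepted: List[Token] = []
    for tok in proposed:
        # A real engine checks all k positions in one batched forward pass;
        # this toy calls the target once per position for clarity.
        expected = target(ctx + accepted)
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(tok)
    accepted.append(target(ctx + accepted))  # all drafts accepted: bonus token
    return accepted


def generate(prompt: List[Token], n_new: int, k: int,
             draft: NextFn, target: NextFn) -> List[Token]:
    """Generate n_new tokens using repeated speculative steps."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(out, k, draft, target))
    return out[:len(prompt) + n_new]


def greedy(prompt: List[Token], n_new: int, target: NextFn) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out
```

Any speedup in practice depends on how often the drafts are accepted and on the cost of batched verification, neither of which this toy models.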

Tools & Frameworks

TokenSpeed: Speed-of-Light LLM Inference for Agentic Workloads

Compiler-backed inference engine outperforms TensorRT-LLM on coding agent workloads with optimised MLA for Blackwell GPUs.

Why this matters: Relevant if you route agentic workloads through self-hosted infra via LiteLLM.

How AI Agent Memory Works

Deep-dive on memory architectures for agents: what information to carry forward in each loop iteration.

Why this matters: Directly applicable to persistent memory layers in your overnight agents.

ProgramBench: Agent Software Recreation Benchmark

248,000 behavioural tests across 200 tasks challenge agents to recreate executables from docs alone — no source code.

Why this matters: Novel eval methodology for coding agents; useful for benchmarking your own agent pipelines.

Ollama v0.23.2

Removes Claude Desktop integration; /api/show latency improved ~6.7× via caching — faster VS Code and tool integrations.

Why this matters: If you use Ollama locally, the show-cache speedup helps MCP-connected editors.

LangChain-core 0.3.86 — CVE path-traversal fix

Patches CVE-2026-34070 path-traversal vulnerability in loads/dumps; upgrade recommended.

Why this matters: Security patch — check if any internal tooling depends on langchain-core.
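
One way to run that check in a given Python environment is sketched below, using only the standard library. The threshold 0.3.86 is taken from the item above; the helper names (`parse_version`, `needs_upgrade`, `check_installed`) are made up for illustration. The comparison is deliberately naive: it ignores pre-release suffixes and assumes any version at or above 0.3.86 is patched, which may not hold for other release lines the item does not mention.

```python
from importlib import metadata
from typing import Tuple

# Patched version as named in the item above (assumption: applies to the 0.3.x line).
PATCHED: Tuple[int, int, int] = (0, 3, 86)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading 'X.Y.Z' digits of a version string.

    Suffixes such as 'rc1' are ignored, so '0.3.86rc1' parses as (0, 3, 86).
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(installed: str, patched: Tuple[int, int, int] = PATCHED) -> bool:
    """True if the installed version sorts below the patched version."""
    return parse_version(installed) < patched


def check_installed(package: str = "langchain-core") -> str:
    """Report the status of the package in the current environment."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed in this environment"
    status = "upgrade recommended" if needs_upgrade(version) else "at or above 0.3.86"
    return f"{package} {version}: {status}"


if __name__ == "__main__":
    print(check_installed())
```

This only inspects the environment it runs in; transitive dependencies pinned in lockfiles or container images would need to be checked separately.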

Open Models & Local

llama.cpp b9055: MiMo V2.5 support merged

Xiaomi MiMo V2.5 (310B total / 15B active MoE, 1M context, multimodal with MTP) now runs in llama.cpp.

Why this matters: A 15B-active multimodal MoE with 1M context is interesting for local hybrid routing experiments.

ZAYA1-74B-Preview: Scaling Pretraining on AMD

Zyphra releases a 74B model pretrained entirely on AMD hardware, demonstrating non-NVIDIA training viability.

Why this matters: Watch-but-don’t-act; signals AMD ecosystem maturing for open model training.

llm-gemini 0.31 — Gemini 3.1 Flash-Lite GA

Simon Willison’s LLM plugin updated; Gemini 3.1 Flash-Lite exits preview and is now generally available.

Why this matters: Cheap, fast model option for your LiteLLM gateway routing decisions.

Anthropic NLA weights for Gemma 3 27B released

Natural Language Autoencoders translate Gemma 3’s internal representations into readable text; weights on HuggingFace.

Why this matters: Interpretability tooling you can run locally — useful for debugging agent behaviour.

Industry & Trends

DeepSeek raising at $50B valuation from China’s national AI fund

Government-backed fund investing billions; DeepSeek positioned as China’s hedge against US export controls.

Why this matters: DeepSeek models are in your local stack; signals continued investment in open-weight frontier models.

OpenAI Codex with GPT-5.5 reportedly surpassing Claude Code

Every’s team reports Codex now outperforms Claude Code after GPT-5.5 integration and app improvements.

Why this matters: Worth testing Codex against your Claude Code workflows to validate or refute.

Mozilla used Claude Mythos to find hundreds of Firefox vulnerabilities

Claude Mythos preview generated high-quality security bug reports at scale, leading to fixes for hundreds of real Firefox vulnerabilities.

Why this matters: Concrete evidence of frontier models in security auditing — applicable to your RegTech codebase.

Pragmatic Engineer: Did capacity shortages turn Anthropic hostile to devs?

Gergely Orosz covers Anthropic’s compute crunch, Amazon allowing Claude Code/Codex, and the rise of small AI-forward teams.

Why this matters: Directly relevant to your team’s experience with Claude limits and org-design thinking.

Next.js May 2026 security release — 13 advisories

Patches DoS, middleware bypass, SSRF, cache poisoning, and XSS across Next.js; includes upstream React Server Components CVE.

Why this matters: You deploy on Vercel with React — patch immediately.

Auto-curated daily by Claude Opus 4.7 from Apple ML research, Ben’s Bites, Cursor changelog, Don’t Worry About the Vase (Zvi), Every — Chain of Thought (Dan Shipper), GitHub: BerriAI/litellm, GitHub: anthropics/claude-code, GitHub: ggml-org/llama.cpp, GitHub: langchain-ai/langchain, GitHub: langchain-ai/langgraph, GitHub: ollama/ollama, Hugging Face blog, Interconnects (Nathan Lambert), Latent Space, NVIDIA developer blog, OpenAI blog, Simon Willison, TLDR AI, The Algorithmic Bridge (Alberto Romero), The Pragmatic Engineer (Gergely Orosz), Vercel blog, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top, smol.ai news. Source list and editorial profile maintained by Daniel.