Cursor 3.3, Claude Managed Agents, Gemma 4 MTP
Friday, 8 May 2026 - AI News · (last 24h)
Cursor 3.3 ships alongside Claude’s new self-improving managed agents and a 40% local inference speedup via multi-token prediction for Gemma 4 on llama.cpp.
Must read
- Cursor 3.3 — New Cursor release — check changelog for agent mode, model selection, and parallel workstream changes relevant to your daily driver.
- Claude adds Self-Improving Agents (Dreaming, Outcomes, Multiagent) — Dreaming lets agents self-improve from past sessions; Outcomes adds self-correction — directly applicable to your overnight-agent-factory pattern.
- Anthropic raises Claude limits via SpaceX/xAI Colossus deal — 220,000+ GPUs secured; higher usage limits now live — your team’s Claude Code and API throughput constraints should ease.
- Claude Code v2.1.133: worktree baseRef, sandbox paths — New worktree.baseRef setting changes how --worktree branches are created; affects your headless agent isolation setup directly.
- Multi-Token Prediction for llama.cpp — Gemma 4 40% faster — Gemma 4 26B hits 138 tok/s on M5 Max with MTP drafting; meaningful for your local Apple Silicon coding workflows.
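For readers new to MTP drafting: the speedup comes from proposing several tokens cheaply and having the full model verify them, keeping every token it agrees with. Below is a minimal, hypothetical sketch of greedy draft-and-verify in Python. It is not llama.cpp's implementation. The `NextToken` callables stand in for real model forward passes, and verification runs one token at a time for clarity, whereas a real engine scores the whole draft in a single batched pass.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: given a token prefix, return the
# greedy next token. In a real engine this would be a forward pass.
NextToken = Callable[[List[int]], int]


def draft_tokens(draft: NextToken, prefix: List[int], k: int) -> List[int]:
    """Propose k tokens cheaply. In MTP drafting, extra prediction heads
    of the same model fill this role."""
    out: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft(ctx)
        out.append(t)
        ctx.append(t)
    return out


def verify(target: NextToken, prefix: List[int], proposal: List[int]) -> List[int]:
    """Greedy verification: keep the longest prefix of the proposal that the
    target model agrees with, then append the target's own next token."""
    accepted: List[int] = []
    ctx = list(prefix)
    for t in proposal:
        expected = target(ctx)
        if expected != t:
            accepted.append(expected)  # target's correction ends this step
            return accepted
        accepted.append(t)
        ctx.append(t)
    accepted.append(target(ctx))  # bonus token when every draft is accepted
    return accepted
```

Each verify step always yields at least one token (the target's own choice), so output never gets worse than plain decoding; the gain depends on how often the drafts are accepted.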
Tools & Frameworks
TokenSpeed: Speed-of-Light LLM Inference for Agentic Workloads
Compiler-backed inference engine outperforms TensorRT-LLM on coding agent workloads with optimised MLA for Blackwell GPUs.
Why this matters: Relevant if you route agentic workloads through self-hosted infra via LiteLLM.
How AI Agent Memory Works
Deep-dive on memory architectures for agents: what information to carry forward in each loop iteration.
Why this matters: Directly applicable to persistent memory layers in your overnight agents.
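A minimal sketch of the "what to carry forward" question, assuming a simple two-tier design: durable facts that persist across every iteration plus a bounded window of recent raw observations. The `AgentMemory` class and its methods are hypothetical illustrations, not taken from the linked article.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class AgentMemory:
    """Hypothetical two-tier memory: durable facts persist across every
    iteration, while raw observations are kept only in a bounded window."""
    window: int = 3
    facts: List[str] = field(default_factory=list)
    recent: Deque[str] = field(default_factory=deque)

    def observe(self, observation: str) -> None:
        self.recent.append(observation)
        while len(self.recent) > self.window:
            self.recent.popleft()  # oldest raw observation is dropped

    def remember(self, fact: str) -> None:
        if fact not in self.facts:  # avoid duplicate durable facts
            self.facts.append(fact)

    def context(self) -> str:
        """Text carried forward into the next loop iteration's prompt."""
        lines = ["Facts:"] + [f"- {f}" for f in self.facts]
        lines += ["Recent:"] + [f"- {o}" for o in self.recent]
        return "\n".join(lines)
```

The design choice is to bound what grows every turn (observations) while letting the agent explicitly promote anything worth keeping into facts.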
ProgramBench: Agent Software Recreation Benchmark
248,000 behavioural tests across 200 tasks challenge agents to recreate executables from docs alone — no source code.
Why this matters: Novel eval methodology for coding agents; useful for benchmarking your own agent pipelines.
Ollama v0.23.2
Removes Claude Desktop integration; /api/show latency improved ~6.7× via caching — faster VS Code and tool integrations.
Why this matters: If you use Ollama locally, the show-cache speedup helps MCP-connected editors.
LangChain-core 0.3.86 — CVE path-traversal fix
Patches CVE-2026-34070 path-traversal vulnerability in loads/dumps; upgrade recommended.
Why this matters: Security patch — check if any internal tooling depends on langchain-core.
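If you want to audit internal tooling for the same class of bug, the usual guard is to resolve the requested path and confirm it stays inside an allowed base directory. This is a generic Python sketch, not the actual langchain-core patch; `safe_resolve` is a hypothetical helper, and `Path.is_relative_to` needs Python 3.9+.

```python
from pathlib import Path


def safe_resolve(base_dir: str, user_path: str) -> Path:
    """Hypothetical helper: resolve user_path under base_dir, rejecting
    anything that escapes it (e.g. '../x' or an absolute path)."""
    base = Path(base_dir).resolve()
    # Joining an absolute user_path replaces base entirely; resolve() also
    # collapses '..' segments and follows symlinks before the check below.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

Checking after `resolve()` rather than scanning the raw string for `..` also catches symlinks that point outside the base directory.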
Open Models & Local
llama.cpp b9055: MiMo V2.5 support merged
Xiaomi MiMo V2.5 (310B total / 15B active MoE, 1M context, multimodal with MTP) now runs in llama.cpp.
Why this matters: A 15B-active multimodal MoE with 1M context is interesting for local hybrid routing experiments.
ZAYA1-74B-Preview: Scaling Pretraining on AMD
Zyphra releases a 74B model pretrained entirely on AMD hardware, demonstrating non-NVIDIA training viability.
Why this matters: Watch-but-don’t-act; signals AMD ecosystem maturing for open model training.
llm-gemini 0.31 — Gemini 3.1 Flash-Lite GA
Simon Willison’s LLM plugin updated; Gemini 3.1 Flash-Lite exits preview and is now generally available.
Why this matters: Cheap, fast model option for your LiteLLM gateway routing decisions.
Anthropic NLA weights for Gemma 3 27B released
Natural Language Autoencoders translate Gemma 3’s internal representations into readable text; weights on HuggingFace.
Why this matters: Interpretability tooling you can run locally — useful for debugging agent behaviour.
Industry & Trends
DeepSeek raising at $50B valuation from China’s national AI fund
Government-backed fund investing billions; DeepSeek positioned as China’s hedge against US export controls.
Why this matters: DeepSeek models are in your local stack; signals continued investment in open-weight frontier models.
OpenAI Codex with GPT-5.5 reportedly surpassing Claude Code
Every’s team reports Codex now outperforms Claude Code after GPT-5.5 integration and app improvements.
Why this matters: Worth testing Codex against your Claude Code workflows to validate or refute.
Mozilla used Claude Mythos to find hundreds of Firefox vulnerabilities
Claude Mythos preview generated high-quality security bug reports at scale, leading to fixes for hundreds of real Firefox vulnerabilities.
Why this matters: Concrete evidence of frontier models in security auditing — applicable to your RegTech codebase.
Pragmatic Engineer: Did capacity shortages turn Anthropic hostile to devs?
Gergely Orosz covers Anthropic’s compute crunch, Amazon allowing Claude Code/Codex, and the rise of small AI-forward teams.
Why this matters: Directly relevant to your team’s experience with Claude limits and org-design thinking.
Next.js May 2026 security release — 13 advisories
Patches DoS, middleware bypass, SSRF, cache poisoning, and XSS across Next.js; includes an upstream React Server Components CVE.
Why this matters: You deploy on Vercel with React — patch immediately.
Auto-curated daily by Claude Opus 4.7 from Apple ML research, Ben’s Bites, Cursor changelog, Don’t Worry About the Vase (Zvi), Every — Chain of Thought (Dan Shipper), GitHub: BerriAI/litellm, GitHub: anthropics/claude-code, GitHub: ggml-org/llama.cpp, GitHub: langchain-ai/langchain, GitHub: langchain-ai/langgraph, GitHub: ollama/ollama, Hugging Face blog, Interconnects (Nathan Lambert), Latent Space, NVIDIA developer blog, OpenAI blog, Simon Willison, TLDR AI, The Algorithmic Bridge (Alberto Romero), The Pragmatic Engineer (Gergely Orosz), Vercel blog, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top, smol.ai news. Source list and editorial profile maintained by Daniel.