Cursor 3.5, Karpathy Joins Anthropic, Anthropic-xAI $15B/yr
Donnerstag, 21. Mai 2026 - AI News · (letzte 24h)
Cursor ships 3.5, Karpathy joins Anthropic for frontier R&D, and SpaceX’s S-1 reveals Anthropic is paying $15B/year for Colossus compute.
Must read
- Cursor 3.5 — Major version bump for your primary IDE — check for agent-mode, model selection, and parallel workstream changes.
- Karpathy joins Anthropic — Karpathy returning to frontier R&D at your primary model provider signals Anthropic is doubling down on research velocity.
- Anthropic-SpaceX $15B/year compute deal revealed in S-1 — Anthropic securing $1.25B/month of Colossus capacity through 2029 means sustained frontier model training — good for your Claude Code dependency.
- Claude Code: The unreasonable effectiveness of HTML — Directly applicable to your skills/spec workflow — HTML specs outperform Markdown for context ingestion in Claude Code.
- Gemini 3.5 Flash — New agentic-focused model from Google; worth evaluating via LiteLLM as a routing option for cost-sensitive tasks.
Tools & Frameworks
Claude Code v2.1.146
Renames /simplify to /code-review with effort levels; fixes MCP pagination dropping items past page 1; fixes Windows PowerShell regression.
Why this matters: The MCP pagination fix matters if your in-house MCP servers paginate.
Warp Oz: multi-harness agent control plane
Oz orchestrates Claude Code, Codex, and Warp Agent from a single pane with cross-harness memory and cost controls.
Why this matters: Relevant to your overnight-agent-factory pattern — unified dispatch across harnesses.
LangChain Deep Agents: embedded interpreters
Agents can now write code between tool calls in a sandboxed runtime to hold working state and control context.
Why this matters: Pattern for your agentic orchestration — code as glue between tool calls.
Vercel Chat SDK ships AI SDK toolset
One createChatTools() call wires read/write actions into agents with approval gating and preset scopes (reader, messenger, moderator).
Why this matters: Useful if you build customer-facing chat on Vercel — agent tool wiring simplified.
Grok Build 0.1 on Vercel AI Gateway
xAI’s beta agentic coding model now accessible via AI SDK as xai/grok-build-0.1 — reasoning-only, no configurable effort.
Why this matters: Another coding model option for your LiteLLM gateway to evaluate.
Open Models & Local
Gemma 4 MTP support in llama.cpp (WIP)
Work-in-progress PR adds Multi-Token Prediction for Gemma 4 in llama.cpp — compile-from-source only, not stable yet.
Why this matters: MTP on Apple Silicon could meaningfully speed up local Gemma 4 inference for you.
Qwen3.6 35B: 56 tok/s at 128k context, MTP doesn’t help
Benchmarks show Q4_K_XL at 128k context hits 56 tok/s generation; MTP converges to same speed at long contexts — skip the complexity.
Why this matters: Practical data point if you’re evaluating Qwen 3.6 for local coding agent use.
Qwen 3.6 35B GGUF: NTP vs MTP quantization benchmarks
ByteShape releases NTP and MTP quants; finding: largest quant that fits outperforms lower bpw — quality trumps compression.
Why this matters: Quantization guidance for your Apple Silicon local setup.
Cohere Command A+ (open-weight MoE)
Cohere’s first MoE model released as open weights — efficiency-focused, hybrid sliding-window/full attention, large context.
Why this matters: New open MoE option; watch for GGUF quants to test locally.
Kimi K2.6 on Cerebras: ~1,000 tok/s
Trillion-parameter Kimi K2.6 achieves fastest frontier inference ever measured by Artificial Analysis at ~1,000 tokens/second on Cerebras.
Why this matters: Benchmark for what’s possible with dedicated silicon — context for cloud routing decisions.
Industry & Trends
Google I/O 2026: agentic Gemini across products
Google reports 3.2 quadrillion monthly tokens across AI systems; Gemini integration expanding to Search, Android Studio, and enterprise tools.
Why this matters: Scale context for where Google’s agentic push is heading — relevant for model selection.
Railway: the agent-native cloud
Railway has 3M users, 100K signups/week, $200K+ coding agent spend, and is building for a post-PR workflow — own-metal data centres.
Why this matters: Concrete example of infra built for agentic dev — the ‘death of PRs’ framing maps to your leaf-node verification problem.
Ramp engineers use Codex for code review
Ramp uses GPT-5.5 via Codex to get substantive code review feedback in minutes instead of hours.
Why this matters: Real adoption story at an engineering org — before/after on review latency.
OpenAI model disproves 80-year-old geometry conjecture
An OpenAI model solved the unit distance problem, disproving a central conjecture in discrete geometry — milestone for AI-driven mathematics.
Why this matters: Frontier capability signal; not directly actionable but notable for reasoning model trajectory.
Org & Leadership
Addressing vibe coding at the professional level
Senior engineer describes colleague shipping 5K LOC / 50 files via zero-plan one-shot prompting with no tests — asks how to intervene.
Why this matters: Exactly the ‘vibe coding as a management problem’ you write about — real-world case of the discipline gap.
Auto-curated daily by Claude Opus 4.7 from Cursor changelog, GitHub: BerriAI/litellm, GitHub: anthropics/claude-code, GitHub: ggml-org/llama.cpp, GitHub: huggingface/transformers, GitHub: langchain-ai/langchain, LangChain blog, Last Week in AI, Latent Space, NVIDIA developer blog, OpenAI blog, Simon Willison, TLDR AI, The Algorithmic Bridge (Alberto Romero), The Pragmatic Engineer (Gergely Orosz), Vercel blog, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top. Source list and editorial profile maintained by Daniel.