Cursor 3.5, Karpathy Joins Anthropic, Anthropic-xAI $15B/yr

Cursor ships 3.5, Karpathy joins Anthropic for frontier R&D, and SpaceX’s S-1 reveals Anthropic is paying $15B/year for Colossus compute.

Must read

Cursor 3.5 — Major version bump for your primary IDE — check for agent-mode, model selection, and parallel workstream changes.
Karpathy joins Anthropic — Karpathy returning to frontier R&D at your primary model provider signals Anthropic is doubling down on research velocity.
Anthropic-SpaceX $15B/year compute deal revealed in S-1 — Anthropic securing $1.25B/month of Colossus capacity through 2029 means sustained frontier model training — good for your Claude Code dependency.
Claude Code: The unreasonable effectiveness of HTML — Directly applicable to your skills/spec workflow — HTML specs outperform Markdown for context ingestion in Claude Code.
Gemini 3.5 Flash — New agentic-focused model from Google; worth evaluating via LiteLLM as a routing option for cost-sensitive tasks.

Tools & Frameworks

Claude Code v2.1.146

Renames /simplify to /code-review with effort levels; fixes MCP pagination dropping items past page 1; fixes Windows PowerShell regression.

Why this matters: The MCP pagination fix matters if your in-house MCP servers paginate.
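
For checking whether an in-house server was affected, here is a minimal sketch of cursor-following pagination. It assumes each page carries a nextCursor field when more results exist, as in MCP list responses. The fetch_page callable, the generic "items" key, and the in-memory fake server are hypothetical stand-ins, not part of Claude Code or any MCP SDK.

```python
from typing import Callable, Iterator, Optional

# Hypothetical stand-in for an MCP list request: takes an optional cursor and
# returns one page shaped like {"items": [...], "nextCursor": "..."}.
FetchPage = Callable[[Optional[str]], dict]


def iter_all_items(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every item across all pages by following nextCursor."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if cursor is None:  # no cursor means this was the last page
            break


def make_fake_server(items: list, page_size: int) -> FetchPage:
    """In-memory paginated endpoint, used only to exercise the loop above."""
    def fetch_page(cursor: Optional[str]) -> dict:
        start = int(cursor) if cursor is not None else 0
        end = start + page_size
        page = {"items": items[start:end]}
        if end < len(items):
            page["nextCursor"] = str(end)  # opaque to the client
        return page
    return fetch_page
```

A client that stops after the first fetch_page call would see only page_size items, which is the symptom the changelog describes. Comparing the count from iter_all_items against what the server actually exposes is a quick sanity check.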

Warp Oz: multi-harness agent control plane

Oz orchestrates Claude Code, Codex, and Warp Agent from a single pane with cross-harness memory and cost controls.

Why this matters: Relevant to your overnight-agent-factory pattern — unified dispatch across harnesses.

LangChain Deep Agents: embedded interpreters

Agents can now write code between tool calls in a sandboxed runtime to hold working state and control context.

Why this matters: Pattern for your agentic orchestration — code as glue between tool calls.
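
A rough sketch of the "code as glue" idea, not LangChain's implementation: a persistent namespace holds working state between tool calls, so only a small result has to go back into model context. ScratchInterpreter and fetch_orders are hypothetical names, and plain exec() is not a sandbox; a real runtime would isolate execution.

```python
class ScratchInterpreter:
    """Keeps one namespace alive across snippets so state persists between calls.

    NOTE: exec() offers no isolation; this only illustrates the state-holding idea.
    """

    def __init__(self, tools: dict):
        # Expose each tool as a callable name inside the snippet namespace.
        self.namespace = dict(tools)

    def run(self, snippet: str):
        exec(snippet, self.namespace)
        # By convention in this sketch, snippets report back via `result`.
        return self.namespace.get("result")


def fetch_orders():
    """Hypothetical tool returning rows too bulky to paste into model context."""
    return [{"id": i, "total": i * 10} for i in range(1, 101)]


interp = ScratchInterpreter({"fetch_orders": fetch_orders})
interp.run("orders = fetch_orders()")  # raw rows stay inside the interpreter
summary = interp.run(
    "result = sum(o['total'] for o in orders if o['total'] > 900)"
)
```

Here the 100 raw rows never leave the interpreter; only the single number in summary would be returned to the model.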

Vercel Chat SDK ships AI SDK toolset

One createChatTools() call wires read/write actions into agents with approval gating and preset scopes (reader, messenger, moderator).

Why this matters: Useful if you build customer-facing chat on Vercel — agent tool wiring simplified.

Grok Build 0.1 on Vercel AI Gateway

xAI’s beta agentic coding model now accessible via AI SDK as xai/grok-build-0.1 — reasoning-only, no configurable effort.

Why this matters: Another coding model option for your LiteLLM gateway to evaluate.

Open Models & Local

Gemma 4 MTP support in llama.cpp (WIP)

Work-in-progress PR adds Multi-Token Prediction for Gemma 4 in llama.cpp — compile-from-source only, not stable yet.

Why this matters: MTP on Apple Silicon could meaningfully speed up local Gemma 4 inference for you.

Qwen3.6 35B: 56 tok/s at 128k context, MTP doesn’t help

Benchmarks show Q4_K_XL at 128k context hits 56 tok/s generation; MTP converges to the same speed at long contexts, so skip the complexity.

Why this matters: Practical data point if you’re evaluating Qwen3.6 for local coding agent use.

Qwen3.6 35B GGUF: NTP vs MTP quantization benchmarks

ByteShape releases NTP and MTP quants. Finding: the largest quant that fits in memory outperforms lower bits-per-weight (bpw) options, so quality trumps compression.

Why this matters: Quantization guidance for your Apple Silicon local setup.
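
The "largest quant that fits" rule can be sketched as a small selection helper. Everything here is illustrative: the quant names and sizes are placeholders rather than ByteShape's published figures, and overhead_gb is an assumed allowance for KV cache and runtime that would need measuring on the actual machine.

```python
from typing import Dict, Optional


def pick_quant(
    quant_sizes_gb: Dict[str, float],
    memory_budget_gb: float,
    overhead_gb: float = 0.0,
) -> Optional[str]:
    """Return the name of the largest quant whose size plus overhead fits the budget."""
    fitting = {
        name: size
        for name, size in quant_sizes_gb.items()
        if size + overhead_gb <= memory_budget_gb
    }
    if not fitting:
        return None  # nothing fits; a smaller model or more memory is needed
    return max(fitting, key=fitting.get)


# Placeholder sizes in GB, NOT real file sizes for any released quant.
example_quants = {"low-bpw": 14.0, "mid-bpw": 20.0, "high-bpw": 28.0}
choice = pick_quant(example_quants, memory_budget_gb=24.0, overhead_gb=2.0)
```

With these placeholder numbers the helper picks "mid-bpw", since "high-bpw" exceeds the budget once overhead is included.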

Cohere Command A+ (open-weight MoE)

Cohere’s first MoE model released as open weights — efficiency-focused, hybrid sliding-window/full attention, large context.

Why this matters: New open MoE option; watch for GGUF quants to test locally.

Kimi K2.6 on Cerebras: ~1,000 tok/s

Trillion-parameter Kimi K2.6 achieves the fastest frontier-model inference ever measured by Artificial Analysis, at ~1,000 tokens/second on Cerebras.

Why this matters: Benchmark for what’s possible with dedicated silicon — context for cloud routing decisions.

Industry & Trends

Google I/O 2026: agentic Gemini across products

Google reports 3.2 quadrillion monthly tokens across AI systems; Gemini integration expanding to Search, Android Studio, and enterprise tools.

Why this matters: Scale context for where Google’s agentic push is heading — relevant for model selection.

Railway: the agent-native cloud

Railway has 3M users, 100K signups/week, and $200K+ in coding-agent spend, and is building for a post-PR workflow; it also runs its own bare-metal data centres.

Why this matters: Concrete example of infra built for agentic dev — the ‘death of PRs’ framing maps to your leaf-node verification problem.

Ramp engineers use Codex for code review

Ramp uses GPT-5.5 via Codex to get substantive code review feedback in minutes instead of hours.

Why this matters: Real adoption story at an engineering org — before/after on review latency.

OpenAI model disproves 80-year-old geometry conjecture

An OpenAI model solved the unit distance problem, disproving a central conjecture in discrete geometry — milestone for AI-driven mathematics.

Why this matters: Frontier capability signal; not directly actionable but notable for reasoning model trajectory.

Org & Leadership

Addressing vibe coding at the professional level

A senior engineer describes a colleague shipping 5K LOC across 50 files via zero-plan, one-shot prompting with no tests, and asks how to intervene.

Why this matters: Exactly the ‘vibe coding as a management problem’ you write about — real-world case of the discipline gap.

Auto-curated daily by Claude Opus 4.7 from Cursor changelog, GitHub: BerriAI/litellm, GitHub: anthropics/claude-code, GitHub: ggml-org/llama.cpp, GitHub: huggingface/transformers, GitHub: langchain-ai/langchain, LangChain blog, Last Week in AI, Latent Space, NVIDIA developer blog, OpenAI blog, Simon Willison, TLDR AI, The Algorithmic Bridge (Alberto Romero), The Pragmatic Engineer (Gergely Orosz), Vercel blog, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top. Source list and editorial profile maintained by Daniel.