Anthropic-xAI Colossus Deal, GPT-5.5 Instant, Claude Managed Agents

Anthropic dominated the week: a ~$5B/yr compute deal with SpaceX/xAI for Colossus gives it 220,000+ GPUs and immediately doubles Claude Code rate limits for Pro/Max/Team/Enterprise. At Code with Claude it shipped Managed Agents (dreaming, outcomes, multiagent orchestration) and the Claude Security public beta. Meanwhile, OpenAI made GPT-5.5 Instant the new default with a 2x price bump, and Cursor shipped v3.3 after four changelog updates. For your overnight-agent-factory setup, the doubled Claude Code limits and the new worktree.baseRef setting in v2.1.133 are the most immediately actionable changes.

Launches & releases this week

Models

GPT-5.5 Instant — New default ChatGPT/API model with improved factuality and reduced hallucinations; 2x price increase over GPT-5.4. (TLDR AI)
GPT-5.5-Cyber Trusted Access — Specialised GPT-5.5 variant for verified security defenders doing vulnerability research on critical infrastructure. (OpenAI blog)
GPT-Realtime-2 Voice API — GPT-5-class reasoning in realtime voice with tool use, 128K context, and top scores on Big Bench Audio. (OpenAI blog)
SubQ 12M Context Window — Subquadratic ships a 12-million-token context model outperforming GPT-5.5 on retrieval benchmarks; 50M planned. (TLDR AI)

Features & Tools

Claude Managed Agents — Claude adds dreaming (self-improvement from past sessions), outcomes (self-correction), and multiagent orchestration for complex tasks. (TLDR AI)
Gemma 4 MTP Drafters — Multi-token prediction drafters deliver up to 3x inference speedup for Gemma 4 with zero quality loss. (TLDR AI)

Products

Claude Security Public Beta — Opus 4.7-powered vulnerability scanning for Enterprise customers; partners include Microsoft Security and Palo Alto Networks. (TLDR AI)
Vercel deepsec — Open-source security harness using Claude or Codex to find vulnerabilities locally in large codebases. (Vercel blog)

Deals & Partnerships

Anthropic-xAI Colossus Deal — Anthropic gains 220,000+ NVIDIA GPUs via SpaceX/xAI’s Colossus datacenter at ~$5B/yr, doubling Claude Code rate limits immediately. (TLDR AI)
DeepSeek $50B Valuation Round — DeepSeek in talks with China’s National AI Fund to raise billions at ~$50B valuation. (TLDR AI)
Anthropic $900B Valuation Round — Anthropic nearing a ~$50B raise at a $900B valuation, driven by ~$40B ARR. (TLDR AI)

Other Releases

Cursor 3.3 — Four changelog drops this week (May 1, 4, 6, 7), culminating in the Cursor 3.3 release. (Cursor changelog)
Claude Code v2.1.133 — Adds the worktree.baseRef setting (fresh|head) and custom sandbox paths; five releases shipped this week. (GitHub: anthropics/claude-code)
Ollama v0.23 + Gemma 4 MTP — Gemma 4 MTP speculative decoding on Mac gives 2x+ speed on 31B coding; v0.23.0 adds Claude Desktop integration. (GitHub: ollama/ollama)
MiMo V2.5 in llama.cpp — Xiaomi’s 310B MoE (15B active), 1M context, text/image/video/audio with MTP support lands in llama.cpp. (r/LocalLLaMA top)

Stories to follow

Compute as kingmaker

The week’s biggest story isn’t a model — it’s who has GPUs. Anthropic’s Colossus deal, its $200B Google Cloud commitment, and the $900B valuation round all trace to one constraint: compute scarcity at 80x growth. The immediate payoff for developers is doubled Claude Code limits, but the strategic signal is that access to inference capacity is now the primary competitive moat, not model quality alone.

Anthropic-SpaceX/xAI Colossus compute deal — 220,000+ GPUs, ~$5B/yr, Claude Code limits doubled immediately. (TLDR AI)
Anthropic commits $200B to Google Cloud over 5 years — Deepening relationship as Google plans $40B Anthropic investment. (TLDR AI)
Anthropic CEO: 80x annualised growth in Q1 — Anthropic planned for 10x; actual demand was 80x, causing compute constraints. (r/ClaudeAI top)

Local inference gets serious speed

Multi-token prediction landed across the local stack this week. Ollama v0.23.1 ships Gemma 4 MTP on Mac with 2x+ speedup; a community llama.cpp implementation shows 40% gains on M5 Max. Combined with MiMo V2.5 (310B/15B-active, 1M context) arriving in llama.cpp, the gap between local and frontier for coding tasks is narrowing faster than expected — especially relevant for hybrid routing decisions in your LiteLLM gateway.

Ollama v0.23.1: Gemma 4 MTP on Mac — 2x+ speed on Gemma 4 31B coding tasks via speculative decoding. (GitHub: ollama/ollama)
MTP for llama.cpp — Gemma 4 40% faster — 97→138 tok/s on MacBook Pro M5 Max for Gemma 26B. (r/LocalLLaMA top)
MiMo V2.5 lands in llama.cpp — 310B MoE, 15B active, 1M context, multimodal with MTP support. (r/LocalLLaMA top)
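
As a quick sanity check on the llama.cpp figures above, here is a minimal sketch of the arithmetic behind the "40% faster" headline. The 97 and 138 tok/s numbers come from the item above; the helper names and the 2,000-token response size are hypothetical, chosen only for illustration, and the estimate ignores prompt processing time.

```python
def speedup_pct(before_tps: float, after_tps: float) -> float:
    """Percentage throughput gain going from before_tps to after_tps."""
    return (after_tps / before_tps - 1.0) * 100.0


def seconds_for(tokens: int, tps: float) -> float:
    """Seconds to generate `tokens` at a steady `tps` (ignores prompt processing)."""
    return tokens / tps


# Throughput figures quoted in the r/LocalLLaMA item (tokens per second).
BEFORE_TPS, AFTER_TPS = 97.0, 138.0

# Assumption: a 2,000-token response, picked only to make the saving concrete.
RESPONSE_TOKENS = 2000

gain = speedup_pct(BEFORE_TPS, AFTER_TPS)
saved = seconds_for(RESPONSE_TOKENS, BEFORE_TPS) - seconds_for(RESPONSE_TOKENS, AFTER_TPS)

print(f"throughput gain: {gain:.1f}%")
print(f"time saved on {RESPONSE_TOKENS} tokens: {saved:.1f}s")
```

The gain works out to roughly 42%, so the rounded "40%" headline is slightly conservative. For routing decisions, the per-response time saved is probably the more useful number, since it scales with how long your typical completions are.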

Harness > model for agent performance

Cursor’s blog on continually improving its agent harness, the Model-Harness-Fit analysis showing Opus 4.6 jumping from Top 30 to Top 5 by changing only the harness, and JetBrains’ eval showing IDE-native search tools cut agent cost and latency all point the same way: the orchestration layer around the model matters more than the model itself. This validates the skills/spec-file approach — the harness IS the product.

Cursor: Continually improving our agent harness — Vision-driven development, A/B testing, dynamic context adaptation drive gains. (TLDR AI)
Model-Harness-Fit analysis — Cursor jumped Top 30→Top 5 by changing only the harness, not the model. (TLDR AI)
JetBrains: IDE-native search makes agents faster and cheaper — Prebundled tooling reduced latency, cost, and budget overruns across models. (JetBrains AI blog)

What I’m watching

Vibe coding converging with agentic engineering — Simon Willison’s admission that the two are merging in his own practice signals the discipline gap is real and growing.
- Vibe coding and agentic engineering are getting closer than I’d like (Simon Willison)
- What you’re actually writing when you write a SKILL.md (TLDR AI)
AI prior restraint in the US — White House ordering Anthropic to limit Mythos access and considering pre-release vetting could change what models you can deploy.
- The AI Ad-Hoc Prior Restraint Era Begins (Don’t Worry About the Vase (Zvi))
- White House Considers Vetting AI Models Before Release (TLDR AI)
Anthropic Orbit proactive assistant — A briefing/insights layer in Claude Code connected to work tools could change how your team uses Claude daily.
- Anthropic working on Orbit proactive assistant (TLDR AI)

darrylmorley/whatcable

2.1k★ · Swift · apple-silicon hardware-info iokit mac-app macos · macOS menu bar app that tells you, in plain English, what each USB-C cable plugged into your Mac can actually do

V4bel/dirtyfrag

2k★ · C · no description

aattaran/deepclaude

1.6k★ · JavaScript · Use Claude Code’s autonomous agent loop with DeepSeek V4 Pro, OpenRouter, or any Anthropic-compatible backend. Same UX, 17x cheaper.

antirez/ds4

1.4k★ · C · DeepSeek 4 Flash local inference engine for Metal

yaojingang/yao-open-prompts

1.3k★ · Python · ai chinese-prompts geo prompt-engineering prompts · Yao Open Prompts: a Chinese-language AI prompt library covering work, study, content, marketing, and everyday-life scenarios

Read this weekend

Vibe coding and agentic engineering are getting closer than I’d like

Willison articulates the uncomfortable convergence you’ve been writing about — when the most productive agentic workflow starts feeling indistinguishable from vibe coding, the verification problem becomes existential. Directly relevant to your leaf-nodes/22,000-line-PR thinking.

Quote of the week

We planned for a 10-fold increase, but the level of growth has been so extreme that we haven’t been able to meet compute demand.

— Dario Amodei, Anthropic CEO

Auto-curated weekly by Claude Opus 4.7 from Apple ML research, Ben’s Bites, Cursor changelog, Don’t Worry About the Vase (Zvi), Eugene Yan, Every — Chain of Thought (Dan Shipper), Exponential View (Azeem Azhar), GitHub: All-Hands-AI/OpenHands, GitHub: anthropics/claude-code, GitHub: cline/cline, GitHub: ggml-org/llama.cpp, GitHub: huggingface/transformers, GitHub: langchain-ai/langchain, GitHub: langchain-ai/langgraph, GitHub: ollama/ollama, GitHub: sgl-project/sglang, GitHub: vllm-project/vllm, Google DeepMind blog, Hacker News (AI), Hugging Face blog, Import AI (Jack Clark), Interconnects (Nathan Lambert), JetBrains AI blog, LangChain blog, Last Week in AI, Latent Space, Lenny’s Newsletter, NVIDIA developer blog, OpenAI blog, Simon Willison, Sourcegraph blog, TLDR AI, The Algorithmic Bridge (Alberto Romero), The Pragmatic Engineer (Gergely Orosz), Together AI blog, Understanding AI (Timothy B. Lee), Vercel blog, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top, smol.ai news. Source list and editorial profile maintained by Daniel.