AI Weekly Digest

Anthropic-xAI Colossus Deal, GPT-5.5 Instant, Claude Managed Agents

Friday, 8 May 2026 - Weekly AI Briefing · (last 7 days)

Anthropic dominated the week: a $5B/yr compute deal with SpaceX/xAI for Colossus gives it 220,000+ GPUs and immediately doubles Claude Code rate limits for Pro/Max/Team/Enterprise plans. At Code with Claude it shipped Managed Agents (dreaming, outcomes, multiagent orchestration) and the Claude Security public beta. Meanwhile, OpenAI made GPT-5.5 Instant the new default at 2x the price of GPT-5.4, and Cursor shipped v3.3 after four changelog updates. For your overnight-agent-factory setup, the doubled Claude Code limits and the new worktree.baseRef setting in v2.1.133 are the most immediately actionable changes.

Launches & releases this week

Models

  • GPT-5.5 Instant — New default ChatGPT/API model with improved factuality and reduced hallucinations; 2x price increase over GPT-5.4. (TLDR AI)
  • GPT-5.5-Cyber Trusted Access — Specialised GPT-5.5 variant for verified security defenders doing vulnerability research on critical infrastructure. (OpenAI blog)
  • GPT-Realtime-2 Voice API — GPT-5-class reasoning in realtime voice with tool use, 128K context, and top scores on Big Bench Audio. (OpenAI blog)
  • SubQ 12M Context Window — Subquadratic ships a 12-million-token context model outperforming GPT-5.5 on retrieval benchmarks; 50M planned. (TLDR AI)

Features & Tools

  • Claude Managed Agents — Claude adds dreaming (self-improvement from past sessions), outcomes (self-correction), and multiagent orchestration for complex tasks. (TLDR AI)
  • Gemma 4 MTP Drafters — Multi-token prediction drafters deliver up to 3x inference speedup for Gemma 4 with zero quality loss. (TLDR AI)

Products

  • Claude Security Public Beta — Opus 4.7-powered vulnerability scanning for Enterprise customers; partners include Microsoft Security and Palo Alto Networks. (TLDR AI)
  • Vercel deepsec — Open-source security harness using Claude or Codex to find vulnerabilities locally in large codebases. (Vercel blog)

Deals & Partnerships

  • Anthropic-xAI Colossus Deal — Anthropic gains 220,000+ NVIDIA GPUs via SpaceX/xAI’s Colossus datacenter at ~$5B/yr, doubling Claude Code rate limits immediately. (TLDR AI)
  • DeepSeek $50B Valuation Round — DeepSeek in talks with China’s National AI Fund to raise billions at ~$50B valuation. (TLDR AI)
  • Anthropic $900B Valuation Round — Anthropic is nearing a ~$50B raise at a $900B valuation, driven by ~$40B in ARR. (TLDR AI)
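
To put the deal figures above in per-unit terms, here is a back-of-envelope sketch. It uses only the numbers quoted in this digest (~$5B/yr, 220,000+ GPUs, $900B valuation, ~$40B ARR) and assumes the whole annual payment is spread evenly over every GPU running all year. The actual contract terms are not in the digest, so treat the output as a rough ceiling rather than a real price. The function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def cost_per_gpu_hour(annual_cost_usd: float, gpu_count: int) -> float:
    """Annual spend spread evenly over every GPU-hour (assumes 100% utilisation)."""
    return annual_cost_usd / gpu_count / HOURS_PER_YEAR


def revenue_multiple(valuation_usd: float, arr_usd: float) -> float:
    """Valuation expressed as a multiple of annual recurring revenue."""
    return valuation_usd / arr_usd


# Figures as quoted in the digest; "220,000+" is treated as exactly 220,000,
# so the per-GPU figure is an upper bound.
colossus_rate = cost_per_gpu_hour(5e9, 220_000)
anthropic_multiple = revenue_multiple(900e9, 40e9)

print(f"~${colossus_rate:.2f} per GPU-hour")       # roughly $2.59
print(f"~{anthropic_multiple:.1f}x ARR valuation")  # 22.5x
```

Under these assumptions the deal works out to roughly $2.59 per GPU-hour, and the reported round values Anthropic at about 22.5x ARR.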

Other Releases

  • Cursor 3.3 — Four changelog drops this week (May 1, 4, 6, 7) culminating in Cursor 3.3 release. (Cursor changelog)
  • Claude Code v2.1.133 — Adds the worktree.baseRef setting (fresh|head) and custom sandbox paths; five releases shipped this week. (GitHub: anthropics/claude-code)
  • Ollama v0.23 + Gemma 4 MTP — Gemma 4 MTP speculative decoding on Mac gives 2x+ speed on 31B coding; v0.23.0 adds Claude Desktop integration. (GitHub: ollama/ollama)
  • MiMo V2.5 in llama.cpp — Xiaomi’s 310B MoE (15B active), 1M context, text/image/video/audio with MTP support lands in llama.cpp. (r/LocalLLaMA top)

Stories to follow

Compute as kingmaker

The week’s biggest story isn’t a model — it’s who has GPUs. Anthropic’s Colossus deal, its $200B Google Cloud commitment, and the $900B valuation round all trace to one constraint: compute scarcity at 80x growth. The immediate payoff for developers is doubled Claude Code limits, but the strategic signal is that access to inference capacity is now the primary competitive moat, not model quality alone.

Local inference gets serious speed

Multi-token prediction landed across the local stack this week. Ollama v0.23.1 ships Gemma 4 MTP on Mac with 2x+ speedup; a community llama.cpp implementation shows 40% gains on M5 Max. Combined with MiMo V2.5 (310B/15B-active, 1M context) arriving in llama.cpp, the gap between local and frontier for coding tasks is narrowing faster than expected — especially relevant for hybrid routing decisions in your LiteLLM gateway.
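
For readers wondering how a drafter can speed up decoding without changing the output, here is a minimal sketch of greedy speculative decoding. It is a toy illustration of the general technique, not Gemma 4's MTP heads or Ollama's implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for the large model and the drafter, and real engines verify all drafted positions in one batched forward pass (sampling-based variants use a rejection-sampling acceptance rule instead of exact match).

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # maps a context to the next token


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain autoregressive decoding: one model pass per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap drafter proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position. Here that is simulated
        #    with per-position calls; a real engine does it in one batched pass,
        #    which is why it is counted as a single pass.
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if tok != expected:
                break                  # first mismatch: discard the rest
        else:
            # All k drafts matched, so the same pass yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new], passes


def toy_target(ctx: List[int]) -> int:
    """Hypothetical 'large model': a deterministic next-token rule."""
    return (ctx[-1] * 3 + 1) % 11


def toy_draft(ctx: List[int]) -> int:
    """Hypothetical drafter: agrees with the target except after token 7."""
    return 5 if ctx[-1] == 7 else toy_target(ctx)


baseline = greedy_decode(toy_target, [1], 12)
tokens, passes = speculative_decode(toy_target, toy_draft, [1], 12)
print(tokens == baseline, passes)  # same tokens, fewer than 12 target passes
```

Because every emitted token is the target's own greedy choice, the output matches plain decoding exactly; the speedup comes only from needing fewer target passes when the drafter guesses well.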

Harness > model for agent performance

Cursor’s blog on continually improving its agent harness, the Model-Harness-Fit analysis showing Opus 4.6 jumping from Top 30 to Top 5 by changing only the harness, and JetBrains’ eval showing IDE-native search tools cut agent cost and latency all point the same way: the orchestration layer around the model matters more than the model itself. This validates the skills/spec-file approach — the harness IS the product.

What I’m watching

darrylmorley/whatcable

2.1k★ · Swift · apple-silicon hardware-info iokit mac-app macos · macOS menu bar app that tells you, in plain English, what each USB-C cable plugged into your Mac can actually do

V4bel/dirtyfrag

2k★ · C · no description

aattaran/deepclaude

1.6k★ · JavaScript · Use Claude Code’s autonomous agent loop with DeepSeek V4 Pro, OpenRouter, or any Anthropic-compatible backend. Same UX, 17x cheaper.

antirez/ds4

1.4k★ · C · DeepSeek 4 Flash local inference engine for Metal

yaojingang/yao-open-prompts

1.3k★ · Python · ai chinese-prompts geo prompt-engineering prompts · Yao Open Prompts: a Chinese-language AI prompt library covering work, study, content, marketing, and everyday-life scenarios

Read this weekend

Vibe coding and agentic engineering are getting closer than I’d like

Willison articulates the uncomfortable convergence you’ve been writing about — when the most productive agentic workflow starts feeling indistinguishable from vibe coding, the verification problem becomes existential. Directly relevant to your leaf-nodes/22,000-line-PR thinking.

Quote of the week

We planned for a 10-fold increase, but the level of growth has been so extreme that we haven’t been able to meet compute demand.

Dario Amodei, Anthropic CEO


Auto-curated weekly by Claude Opus 4.7 from Apple ML research, Ben’s Bites, Cursor changelog, Don’t Worry About the Vase (Zvi), Eugene Yan, Every — Chain of Thought (Dan Shipper), Exponential View (Azeem Azhar), GitHub: All-Hands-AI/OpenHands, GitHub: anthropics/claude-code, GitHub: cline/cline, GitHub: ggml-org/llama.cpp, GitHub: huggingface/transformers, GitHub: langchain-ai/langchain, GitHub: langchain-ai/langgraph, GitHub: ollama/ollama, GitHub: sgl-project/sglang, GitHub: vllm-project/vllm, Google DeepMind blog, Hacker News (AI), Hugging Face blog, Import AI (Jack Clark), Interconnects (Nathan Lambert), JetBrains AI blog, LangChain blog, Last Week in AI, Latent Space, Lenny’s Newsletter, NVIDIA developer blog, OpenAI blog, Simon Willison, Sourcegraph blog, TLDR AI, The Algorithmic Bridge (Alberto Romero), The Pragmatic Engineer (Gergely Orosz), Together AI blog, Understanding AI (Timothy B. Lee), Vercel blog, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top, smol.ai news. Source list and editorial profile maintained by Daniel.