Anthropic SpaceX/xAI Deal, DeepSeek V4 Launch, OpenAI Symphony Spec

This was Anthropic’s week. The Code w/ Claude event, the SpaceX/xAI Colossus compute deal, the 80x annualised growth admission from Dario, and doubled rate limits for Pro/Max users — it all paints a picture of a company that’s simultaneously capacity-constrained and sprinting ahead of its own infrastructure. The Colossus deal is the headline, but the engineering signal is in the details: Claude Code limits doubling, Anthropic’s natural language autoencoders research (turning internal representations into inspectable text), and Simon Willison’s observation that vibe coding and agentic engineering are converging in his own practice. Meanwhile, OpenAI shipped GPT-5.5 Instant as the new default, open-sourced Symphony (an orchestration spec for Codex), and put models on AWS — a clear multi-cloud breakout from Azure exclusivity.

The other dominant thread is the open-weights ecosystem quietly reaching ‘good enough’ for daily agent work. DeepSeek V4 landed with a million-token context window, Granite 4.1 shipped under Apache 2.0 at 3B/8B/30B, Qwen 3.6 models are trickling out, and LangChain published evals showing open models matching frontier models on core agent tasks. Multi-Token Prediction in llama.cpp is delivering 40% speedups on Gemma 4. The gap isn’t closed, but for structured edits, summarisation, and lightweight agents, the economics are shifting fast.

For CTOs managing AI-augmented teams, the week’s sharpest insight came from the convergence of JetBrains’ Skill Manager/Repository, OpenAI’s Symphony spec, and LangChain’s ‘Evaluating Skills’ framework. The industry is clearly coalescing around skills-as-units-of-agent-work — discoverable, testable, reusable. This is the discipline layer above vibe coding that your progressive-disclosure model has been pointing toward.

What’s the story this week

Anthropic’s capacity crisis and compute land-grab

Anthropic’s 80x annualised growth broke their own infrastructure planning (they’d prepared for 10x). The SpaceX/xAI Colossus deal — reportedly $5B/year for 300MW of capacity — is unprecedented: a safety-focused PBC renting from Musk’s operation, complete with the environmental baggage of methane turbines. The immediate developer payoff is doubled Claude Code rate limits and removed peak-hour throttling. But the deeper signal is that compute access is now the binding constraint on who wins the agent race. Anthropic chose speed over optics, and the community is split — some see pragmatism, others see a betrayal of PBC values. For teams building on Claude Code, the practical upshot is that the reliability problems of recent months should ease materially.

Anthropic-SpaceXai’s 300MW/$5B/yr deal for Colossus I — Latent Space breaks down the economics: Anthropic gets all of Colossus I’s capacity, and ARR growth is running at 8000% annualised. (Latent Space)
Notes on the xAI/Anthropic data center deal — Simon Willison contextualises the deal’s scale and environmental concerns, noting it was the biggest announcement from Code w/ Claude. (Simon Willison)
Did capacity shortages turn Anthropic hostile to devs? — Gergely Orosz reports on how rate-limit frustrations eroded developer goodwill, and whether the Colossus deal fixes the underlying problem. (The Pragmatic Engineer)
Anthropic CEO says 80-fold growth in first quarter — Dario Amodei publicly attributes compute difficulties to demand growing 8x beyond their planning assumptions (80x actual against the 10x they had prepared for). (r/ClaudeAI)

Skills, specs, and the discipline layer for agents

Multiple players independently shipped ‘skills as first-class objects’ this week. JetBrains launched a Skill Manager and Skill Repository — install once, reuse across agents and projects. OpenAI open-sourced Symphony, an orchestration spec that turns issue trackers into always-on agent systems. LangChain published ‘Evaluating Skills’ with concrete patterns for measuring agent competence per-skill. This convergence validates the progressive-disclosure model: vibe coding gets you started, but production agents need discoverable, testable, composable skill units. The implication for engineering leaders is clear — your agent governance story needs a skills registry, not just prompt libraries.

Introducing the Skill Manager and Skill Repository — JetBrains ships skill discovery, trust verification, and cross-project reuse for AI Assistant skills. (JetBrains AI blog)
Symphony: an open-source spec for Codex orchestration — OpenAI’s spec turns issue trackers into persistent agent systems, reducing context switching for engineering teams. (OpenAI blog)
Evaluating Skills — LangChain defines best practices for measuring agent skill performance with LangSmith observability. (LangChain blog)
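To make ‘discoverable, testable, reusable’ concrete, here is a minimal sketch of a skills registry in which each skill carries its own eval cases and gets a per-skill pass rate. Everything in it (`Skill`, `SkillRegistry`, `discover`, `evaluate`, the toy `slugify` skill) is hypothetical: it is not the JetBrains, OpenAI Symphony, or LangChain/LangSmith API, and a real setup would invoke an agent where this calls a plain function.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    """Hypothetical unit of agent work: a name, a discoverable description, and eval cases."""
    name: str
    description: str
    run: Callable[[str], str]
    # Each case pairs an input with a predicate the output must satisfy.
    cases: list[tuple[str, Callable[[str], bool]]] = field(default_factory=list)


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        if skill.name in self._skills:
            raise ValueError(f"duplicate skill: {skill.name}")
        self._skills[skill.name] = skill

    def discover(self, keyword: str) -> list[str]:
        """Return names of skills whose description mentions the keyword."""
        kw = keyword.lower()
        return sorted(n for n, s in self._skills.items() if kw in s.description.lower())

    def evaluate(self, name: str) -> float:
        """Run a skill's eval cases and return its pass rate (0.0 if it has no cases)."""
        skill = self._skills[name]
        if not skill.cases:
            return 0.0
        passed = 0
        for prompt, check in skill.cases:
            try:
                if check(skill.run(prompt)):
                    passed += 1
            except Exception:
                pass  # a crash counts as a failed case
        return passed / len(skill.cases)


# Toy skill standing in for an agent call (hypothetical).
def slugify(text: str) -> str:
    return "-".join(text.lower().split())


registry = SkillRegistry()
registry.register(Skill(
    name="slugify",
    description="Turn an issue title into a branch-name slug",
    run=slugify,
    cases=[
        ("Fix Login Bug", lambda out: out == "fix-login-bug"),
        ("  extra   spaces ", lambda out: out == "extra-spaces"),
        ("Café menu", lambda out: out.isascii()),  # fails: the toy skill keeps the accent
    ],
))

print(registry.discover("branch"))   # ['slugify']
print(registry.evaluate("slugify"))  # 0.6666666666666666
```

The design choice worth copying is that eval cases live next to the skill they test, so a pass rate can be tracked and regressed per skill rather than per prompt library.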

Vibe coding meets agentic engineering — the convergence

Simon Willison articulated what many practitioners are feeling: the line between ‘vibe coding’ (fast, low-verification, disposable) and ‘agentic engineering’ (structured, verified, production-grade) is blurring in practice. His podcast discussion and blog post describe catching himself applying vibe-coding habits to production agent work. Meanwhile, the Zig project’s hard anti-AI contribution policy and Andrew Kelley’s observation about ‘digital smell’ represent the counter-position — that AI-generated code carries detectable patterns that erode maintainability. JetBrains’ data on IDE-catchable errors in AI-generated PRs adds empirical weight. The management question isn’t whether to use agents, but how to build verification into the workflow before the 22,000-line PR problem compounds.

Vibe coding and agentic engineering are getting closer than I’d like — Willison admits the two modes are converging in his own work, raising questions about verification discipline. (Simon Willison)
Stop Sending IDE-Catchable AI Code Errors to Review — PR volume is up, but AI-generated code carries error patterns that weren’t common before, and reviewers are drowning. (JetBrains AI blog)
The Zig project’s rationale for their firm anti-AI contribution policy — Zig bans all LLM-assisted contributions; Andrew Kelley says AI code has a ‘digital smell’ obvious to maintainers. (Simon Willison)
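One cheap way to build verification into the workflow is a pre-review gate that runs before an agent’s PR reaches a human. A minimal sketch, assuming the agent writes Python and using only the standard-library `ast` module to flag syntax errors and unused imports; the function name `ide_catchable_issues` is hypothetical, and real linters and IDE inspections catch far more than this.

```python
import ast


def ide_catchable_issues(source: str, filename: str = "<agent>") -> list[str]:
    """Flag syntax errors and unused imports: a tiny subset of what an IDE would catch."""
    try:
        tree = ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return [f"{filename}:{exc.lineno}: syntax error: {exc.msg}"]

    # Map each name bound by an import to the line it was imported on.
    imported: dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import os.path` binds the top-level name `os`.
                imported[alias.asname or alias.name.split(".")[0]] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported[alias.asname or alias.name] = node.lineno

    # Any bare name read anywhere counts as a use (os.sep contains ast.Name('os')).
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}

    return [
        f"{filename}:{line}: unused import '{name}'"
        for name, line in sorted(imported.items(), key=lambda kv: kv[1])
        if name not in used
    ]


agent_output = """import os
import json
from typing import Any

def load(path):
    with open(path) as f:
        return json.load(f)
"""

for issue in ide_catchable_issues(agent_output):
    print(issue)
# <agent>:1: unused import 'os'
# <agent>:3: unused import 'Any'
```

Wiring something like this (or an actual linter) into the agent loop means the agent fixes these issues itself instead of a reviewer flagging them.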

Open models cross the ‘good enough’ threshold for agent work

DeepSeek V4 arrived with a 1M-token context window and 49B active parameters in a 1.6T MoE, immediately available on Together AI and NVIDIA Blackwell endpoints. LangChain’s evals show open models (GLM-5, MiniMax M2.7) matching frontier models on file operations, tool use, and instruction following. IBM’s Granite 4.1 shipped under Apache 2.0 at sizes that run on Apple Silicon. Multi-Token Prediction landed in llama.cpp, giving Gemma 4 a roughly 40% decode speedup on an M5 Max MacBook Pro. For hybrid routing setups via LiteLLM, the decision boundary is shifting: more tasks can stay local or on cheap open-model endpoints without quality regression. The cost arbitrage is real.

DeepSeek-V4: a million-token context that agents can actually use — 1.6T MoE with 49B active params, hybrid attention, compressed KV — the strongest open-weights reasoning model after Kimi K2.6. (Hugging Face blog)
Open Models have crossed a threshold — LangChain evals show open models matching closed frontier models on core agent tasks at a fraction of the cost and latency. (LangChain blog)
Multi-Token Prediction for llama.cpp — Gemma 4 speedup by 40% — MTP implementation raises decode throughput from 97 to 138 tok/s (about 42%) on Gemma 26B on a MacBook Pro M5 Max. (r/LocalLLaMA)
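The shifting ‘decision boundary’ for hybrid routing can be made explicit: send a task to the cheapest model whose own eval score is within a tolerance of the best. A minimal sketch, assuming you maintain per-task eval scores for each model; the model names, prices, and scores below are illustrative placeholders, not measurements, and `route` is a hypothetical helper, not LiteLLM’s router API (LiteLLM would sit underneath, serving whichever model name this returns).

```python
from dataclasses import dataclass


@dataclass
class ModelOption:
    name: str                 # hypothetical model identifier
    cost_per_mtok: float      # blended $ per million tokens (placeholder)
    scores: dict[str, float]  # your own eval pass rate per task type, 0..1 (placeholder)


def route(task_type: str, options: list[ModelOption], max_regression: float = 0.02) -> str:
    """Pick the cheapest model whose score is within `max_regression` of the best score."""
    scored = [o for o in options if task_type in o.scores]
    if not scored:
        # No eval evidence for this task type: be conservative and use the
        # priciest model, assumed here to be the strongest.
        return max(options, key=lambda o: o.cost_per_mtok).name
    best = max(o.scores[task_type] for o in scored)
    eligible = [o for o in scored if o.scores[task_type] >= best - max_regression]
    return min(eligible, key=lambda o: o.cost_per_mtok).name


# Illustrative numbers only.
local = ModelOption("local-open-model", 0.0,
                    {"structured_edit": 0.91, "summarise": 0.88, "multi_file_refactor": 0.62})
frontier = ModelOption("frontier-model", 10.0,
                       {"structured_edit": 0.92, "summarise": 0.89, "multi_file_refactor": 0.81})

for task in ["structured_edit", "summarise", "multi_file_refactor", "unknown_task"]:
    print(task, "->", route(task, [local, frontier]))
# structured_edit -> local-open-model
# summarise -> local-open-model
# multi_file_refactor -> frontier-model
# unknown_task -> frontier-model
```

As open-model scores on your evals rise, more task types fall inside the tolerance and move to the cheap endpoint without any change to the routing code.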

Security tooling goes agent-native

Mozilla used Claude Mythos Preview to find and fix hundreds of Firefox vulnerabilities — and the bugs were ‘very good’, a step-change from the noise of previous AI security reports. Vercel open-sourced deepsec, a security harness that runs coding agents against your codebase on your own infra, using existing Claude or Codex subscriptions. The UK AISI evaluated GPT-5.5’s cyber capabilities and found it comparable to Mythos. The pattern: security scanning is moving from static analysis to agent-driven exploration, and the tools are arriving that let small teams run these workflows without third-party code access. For a RegTech CTO, this is directly relevant — agent-powered vulnerability discovery on your own repos, overnight, using infrastructure you already have.

Behind the Scenes Hardening Firefox with Claude Mythos Preview — Mozilla found hundreds of real vulnerabilities using Mythos — a qualitative jump from previous AI security bug reports. (Simon Willison)
Introducing deepsec: security harness for finding vulnerabilities — Open-source tool runs coding agents against your codebase locally; uses existing Claude/Codex subscriptions for inference. (Vercel blog)
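The general shape of a harness like this (collect your own source files, fan them out to reviewers in parallel, gather findings locally) can be sketched without vendor specifics. This is a hypothetical illustration, not deepsec’s code or CLI: `review_file` is a placeholder for where a coding agent would be invoked, replaced here by a trivial pattern check so the sketch runs, and the suffix and skip-directory lists are assumptions.

```python
import re
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SOURCE_SUFFIXES = {".py", ".ts", ".go", ".java"}       # assumption: languages to scan
SKIP_DIRS = {".git", "node_modules", "vendor", ".venv"}  # assumption: never scanned


def candidate_files(root: Path) -> list[Path]:
    """Collect source files to scan, skipping vendored and VCS directories."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.suffix in SOURCE_SUFFIXES
        and not SKIP_DIRS.intersection(p.relative_to(root).parts)
    )


def review_file(path: Path) -> list[str]:
    """Hypothetical placeholder for an agent call. A real harness would send the file
    (plus surrounding context) to a coding agent and parse its findings."""
    findings = []
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if re.search(r"\beval\(|shell=True", line):
            findings.append(f"{path.name}:{lineno}: suspicious call: {line.strip()}")
    return findings


def scan(root: Path, workers: int = 4) -> list[str]:
    """Fan files out to reviewers in parallel and flatten all findings."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [f for file_findings in pool.map(review_file, candidate_files(root))
                for f in file_findings]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.py").write_text("import subprocess\nsubprocess.run(cmd, shell=True)\n")
    (root / "node_modules").mkdir()
    (root / "node_modules" / "lib.py").write_text("eval(x)\n")  # skipped: vendored
    findings = scan(root)
    print(findings)
```

Because everything runs on your own machine, the only thing that leaves it is whatever the agent call sends to the model provider you already use.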

What I’m watching

Agent-maintained persistent knowledge layers — Multiple projects are converging on ‘LLM-native knowledge substrates’ — wikis, filesystems, memory stores that agents both read and write. This compounds context across sessions, which is the missing piece for overnight agent factories.
- Wuphf: A Karpathy-style LLM wiki your agents maintain (Markdown + Git) (Hacker News (AI))
- How we built Agent Builder’s memory (LangChain blog)
IDE as agent quality variable — JetBrains published data showing that IDE-native search tools reduce agent latency and cost measurably. As you route between Claude Code (terminal) and Cursor (IDE), the tooling context each provides is becoming a first-order performance variable worth benchmarking.
- We Gave Agents IDE-Native Search Tools. They Got Faster and Cheaper. (JetBrains AI blog)
- The IDE Is Already an AI Quality Variable (JetBrains AI blog)
Structured output correctness (beyond schema validity) — A new benchmark specifically targets the gap between ‘valid JSON’ and ‘correct values’ — hallucinated dates, wrong ordering, plausible-but-wrong fields. This is the silent failure mode in production agent pipelines, especially in RegTech where data accuracy is non-negotiable.
- A new benchmark for testing LLMs for deterministic outputs (Hacker News (AI))
AMD PCIe inference cards for on-prem — AMD announced the Instinct MI350P (CDNA 4, PCIe form factor) alongside a Taiwanese startup shipping 384GB inference cards at ~240W. If pricing is competitive, these change the economics of running DeepSeek V4 or Qwen 3.6 on-prem without rack-scale GPU clusters.
- AMD Intros Instinct MI350P Accelerator: CDNA 4 Comes to PCIe Cards (r/LocalLLaMA)
- Skymizer HTX301 — PCIe inference card with 384GB memory at ~240W (r/LocalLLaMA)
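The gap between ‘valid JSON’ and ‘correct values’ is easy to demonstrate: a record can parse cleanly and still contain an impossible date, a reversed period, or an identifier that appears nowhere in the source. A minimal sketch, assuming a hypothetical filing-period record; the field names and rules are made up for illustration and are not taken from the benchmark.

```python
import json
from datetime import date


def check_record(raw: str, source_text: str) -> list[str]:
    """Return semantic problems in an LLM-extracted record that already parses as JSON."""
    record = json.loads(raw)  # syntactic validity only: it parses
    problems: list[str] = []

    # 1. Dates must be real calendar dates, not just strings shaped like dates.
    parsed: dict[str, date] = {}
    for key in ("period_start", "period_end", "filing_deadline"):
        try:
            parsed[key] = date.fromisoformat(record[key])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{key}: missing or not a valid ISO date ({record.get(key)!r})")

    # 2. Ordering constraints that a JSON schema cannot express.
    if "period_start" in parsed and "period_end" in parsed:
        if parsed["period_start"] > parsed["period_end"]:
            problems.append("period_start is after period_end")
    if "period_end" in parsed and "filing_deadline" in parsed:
        if parsed["filing_deadline"] < parsed["period_end"]:
            problems.append("filing_deadline is before period_end")

    # 3. Grounding: an extracted identifier should appear in the source document.
    entity = record.get("entity_id")
    if not isinstance(entity, str) or entity not in source_text:
        problems.append(f"entity_id {entity!r} not found in source text")

    return problems


source = "Quarterly return for entity GB-12345 covering Q1 2024."
good = ('{"entity_id": "GB-12345", "period_start": "2024-01-01", '
        '"period_end": "2024-03-31", "filing_deadline": "2024-04-30"}')
bad = ('{"entity_id": "GB-54321", "period_start": "2024-03-31", '
       '"period_end": "2024-01-01", "filing_deadline": "2024-02-30"}')

print(check_record(good, source))  # []
for problem in check_record(bad, source):
    print(problem)
# filing_deadline: missing or not a valid ISO date ('2024-02-30')
# period_start is after period_end
# entity_id 'GB-54321' not found in source text
```

Both records pass a schema check; only the semantic layer catches the plausible-but-wrong values in the second one.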

Read this weekend

Vibe coding and agentic engineering are getting closer than I’d like

Willison articulates the exact tension you’ve been writing about — the leaf-nodes problem, verification debt, the blurring line between disposable prototypes and production agent work. He’s honest about catching himself doing it wrong, which makes the piece genuinely useful for calibrating your own team’s discipline boundaries.

Quote of the week

People who come from the world of agentic coding have a certain digital smell that is not obvious to them but is obvious to everyone else.

— Andrew Kelley (Zig creator)

Sources unavailable this week: GitHub: Aider-AI/aider, GitHub: All-Hands-AI/OpenHands, GitHub: BerriAI/litellm, GitHub: anthropics/claude-code, GitHub: cline/cline, GitHub: continuedev/continue, GitHub: crewAIInc/crewAI, GitHub: ggml-org/llama.cpp, GitHub: huggingface/text-generation-inference, GitHub: huggingface/transformers, GitHub: langchain-ai/langchain, GitHub: langchain-ai/langgraph, GitHub: microsoft/autogen, GitHub: ml-explore/mlx, GitHub: ollama/ollama, GitHub: princeton-nlp/SWE-agent, GitHub: sgl-project/sglang, GitHub: simonw/llm, GitHub: vllm-project/vllm

Auto-curated weekly by Claude Opus 4.7 from Apple ML research, Ben’s Bites, Cursor changelog, Don’t Worry About the Vase (Zvi), Eric Jang, Eugene Yan, Every — Chain of Thought (Dan Shipper), Exponential View (Azeem Azhar), Google DeepMind blog, Hacker News (AI), Hugging Face blog, Import AI (Jack Clark), Interconnects (Nathan Lambert), JetBrains AI blog, LangChain blog, Last Week in AI, Latent Space, Lenny’s Newsletter, NVIDIA developer blog, OpenAI blog, Simon Willison, Sourcegraph blog, TLDR AI, The Algorithmic Bridge (Alberto Romero), The Pragmatic Engineer (Gergely Orosz), Together AI blog, Understanding AI (Timothy B. Lee), Vercel blog, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top, smol.ai news. Source list and editorial profile maintained by Daniel.