Anthropic's $965B Valuation Month, Gemini 3.5 Flash Ships, Dynamic Workflows Go Parallel

May belonged to Anthropic: a $65B Series H at a $965B valuation, a first profitable quarter in sight, the Opus 4.8 release, and Dynamic Workflows in Claude Code, which orchestrates hundreds of parallel subagents and landed in research preview. Jarred Sumner used it to rewrite Bun from Zig to Rust in 11 days. For teams building with agents, this is the month the overnight-agent-factory pattern got official infrastructure.

Google countered at I/O with Gemini 3.5 Flash (strong agentic coding, 4× faster than 3.1 Pro) and Agent Executor, an open-source distributed runtime for long-running agents. Cursor hit $3B ARR and shipped both Composer 2.5, an RL-trained coding agent, and Auto Review, while OpenAI pushed Codex into mobile, Windows sandboxing, and enterprise partnerships. The practical upshot: the harness layer, not the model, is now the competitive surface for coding agents.

Launches & releases this month

Models

Claude Opus 4.8 — Incremental Opus upgrade with adjustable effort controls, faster mode at ~2.5× output speed, same API price. (TLDR AI)
Gemini 3.5 Flash — Google’s new agentic-focused model: improved coding, parallel execution loops, 4× faster than 3.1 Pro. (TLDR AI)
GPT-5.5 Instant — New ChatGPT/API default model with reduced hallucinations, improved factuality, and personalisation controls. (OpenAI blog)
Qwen3.7 Max — Alibaba’s proprietary agent-foundation model tops Terminal-Bench 2.0, SWE-Pro, and MCP-Mark benchmarks. (TLDR AI)

Features & Tools

Claude Code Dynamic Workflows — Research preview: Claude Code breaks tasks into subtasks with hundreds of parallel subagents converging on results. (TLDR AI)
Cursor Auto Review — Cursor 3.6 adds automated code review integrated into the agent workflow. (Cursor changelog)
OpenAI Secure MCP Tunnel — Connects private MCP servers to OpenAI products via outbound HTTPS tunnels without public exposure. (TLDR AI)
Codex Mobile + Windows Sandbox — Codex tasks manageable from ChatGPT mobile; Windows sandbox with controlled file access and network limits. (OpenAI blog)
Vercel Sandbox Docker Support — Agents can now build and run Docker containers inside Vercel Sandbox without touching the host system. (Vercel blog)

Products

Grok Build CLI — xAI’s coding agent CLI in beta with plan mode, headless execution, and specialised subagents. (TLDR AI)
LangSmith Engine — Watches production traces, clusters failures into named issues, proposes targeted fixes and eval coverage automatically. (LangChain blog)
Warp Oz Multi-Harness Control — Single pane of glass managing Claude Code, Codex, and Warp agents with cross-harness memory and cost controls. (TLDR AI)

Deals & Partnerships

Anthropic $65B Series H — $65B round at $965B post-money valuation; run-rate revenue crossed $47B; first profitable quarter imminent. (TLDR AI)
Cognition $1B Series D — Devin maker raises $1B at a $26B valuation; at some customers, Devin now produces 80% of commits. (TLDR AI)
Karpathy Joins Anthropic — Andrej Karpathy joined Anthropic for frontier R&D, citing the next few years as especially formative. (TLDR AI)
OpenRouter Series B at $1.3B — AI gateway processing 100T tokens/month raises $113M at a $1.3B valuation; validates multi-model routing as infrastructure. (TLDR AI)
Cursor $3B ARR — Cursor hit $3B annualised revenue with 3,000+ customers paying $100K+; SpaceX acquisition window opens soon. (TLDR AI)

Research

DeepSWE Benchmark — Contamination-free SWE benchmark: 91 repos, 5 languages, and sharper separation between models than SWE-Bench Pro. (TLDR AI)

Other Releases

Cursor Composer 2.5 — RL-trained coding agent built with synthetic data and distributed training; ships in Cursor 3.5. (TLDR AI)
MCP Spec Release Candidate — The largest MCP revision yet: stateless HTTP core, OAuth/OIDC auth, extensions system, formal deprecation policy; ships July 28. (TLDR AI)
LangSmith Sandboxes GA — Kernel-isolated microVMs with snapshots, parallel forks, and auth proxies for running agents safely in production. (LangChain blog)
Deep Agents v0.6 — Code interpreter, harness profiles per model family, delta channels for O(1) checkpointing, and streaming v3. (LangChain blog)
Google Agent Executor — Open-source distributed runtime for durable, long-running agent workflows with session consistency and trajectory branching. (TLDR AI)
Forge: 8B Model → 99% — Open-source guardrails layer takes an 8B local model from 53% to 99% on multi-step agentic tasks without model changes. (Hacker News (AI))

Stories of the month

The harness is the product now

May made explicit what’s been building for months: the model is table stakes, and the harness (sandboxing, memory, orchestration, eval loops) is where differentiation lives. LangChain shipped Sandboxes GA and Engine, which auto-triages production failures. Vercel added Docker-in-Sandbox and Claude Managed Agent support. Warp launched Oz as a multi-harness control plane. Cursor’s Composer 2.5 is RL-trained specifically as a harness-level improvement. For your team routing through LiteLLM, this means the gateway alone isn’t enough: the execution environment and feedback loop around it matter more than which model sits behind it.

LangSmith Sandboxes GA — Kernel-isolated microVMs with snapshots and auth proxies for safe agent execution. (LangChain blog)
LangSmith Engine — Auto-clusters production agent failures and proposes fixes with eval coverage. (LangChain blog)
Docker in Vercel Sandbox — Agents can build and run containers inside sandboxes without host access. (Vercel blog)
Warp Oz multi-harness control plane — Manages Claude Code, Codex, and Warp agents from a single pane with shared memory. (TLDR AI)
The Anatomy of an Agent Harness — LangChain formalises harness components: filesystems, sandboxes, memory, middleware. (LangChain blog)

Parallel agents at production scale

Dynamic Workflows in Claude Code, Conductor on Vercel Sandbox, and Deep Agents v0.6 all shipped infrastructure for running fleets of agents in parallel. The pattern is converging: break a large task into subtasks, spin up an isolated agent per subtask, then converge the results. Jarred Sumner’s 750K-line Bun rewrite in 11 days is the proof point. This directly enables the overnight-agent-factory workflow: dispatch N agents before bed, then review the converged output in the morning. The missing piece is still verification at scale, but the orchestration layer is now real.

Dynamic Workflows in Claude Code — Hundreds of parallel subagents converge on large tasks; used to rewrite Bun in 11 days. (TLDR AI)
Conductor on Vercel Sandbox — Fleet of parallel coding agents in the cloud; used by Notion, Linear, Ramp. (Vercel blog)
Deep Agents v0.6 — Delta channels keep storage flat as parallel long-running sessions scale. (LangChain blog)
The Age of Async Agents (Cognition) — 80% of commits from Devin, spec-to-PR workflows, full VMs, agent memory at scale. (Latent Space)
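The break-down, fan-out, converge loop described in this section can be sketched with plain `asyncio`. This is a minimal illustration under stated assumptions, not how Dynamic Workflows, Conductor, or Deep Agents work internally: `split_task`, `run_subagent`, and `merge_results` are hypothetical placeholders for a planner, an isolated agent launch, and a verification step, and `max_parallel` stands in for whatever rate limits your provider enforces.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubtaskResult:
    subtask: str
    output: str
    ok: bool


def split_task(task: str, n: int) -> list[str]:
    # Hypothetical decomposition; a real orchestrator would ask a planner model.
    return [f"{task} [part {i + 1}/{n}]" for i in range(n)]


async def run_subagent(subtask: str) -> SubtaskResult:
    # Hypothetical stand-in for launching one isolated agent (sandbox, worktree, VM).
    await asyncio.sleep(0.01)
    return SubtaskResult(subtask=subtask, output=f"done: {subtask}", ok=True)


def merge_results(results: list[SubtaskResult]) -> dict:
    # Converge step: keep successes, surface failures for a human or verifier agent.
    return {
        "merged": [r.output for r in results if r.ok],
        "failed": [r.subtask for r in results if not r.ok],
    }


async def fan_out(task: str, n: int, max_parallel: int = 8) -> dict:
    # Bound concurrency so a large fleet does not exceed provider rate limits.
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(subtask: str) -> SubtaskResult:
        async with sem:
            try:
                return await run_subagent(subtask)
            except Exception as exc:  # one failed agent should not sink the batch
                return SubtaskResult(subtask=subtask, output=str(exc), ok=False)

    # gather() returns results in the same order as the subtasks.
    results = await asyncio.gather(*(guarded(s) for s in split_task(task, n)))
    return merge_results(list(results))


if __name__ == "__main__":
    report = asyncio.run(fan_out("migrate module X to Rust", n=20))
    print(len(report["merged"]), "merged,", len(report["failed"]), "failed")
```

The semaphore and the per-subtask `try` are the design choices worth keeping: a fleet of hundreds of agents needs bounded concurrency, and one failed agent should be reported rather than abort the batch. The verification gap the section mentions would live in `merge_results`, which here only partitions by an `ok` flag.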

Coding agent economics hit reality

Anthropic’s near-profitability, Cursor’s $3B ARR, and stories of companies shocked by LLM bills all point to the same thing: coding agents have found product-market fit, and the bills are real. Simon Willison noted companies spending $200+/user/month on coding agents. The Pragmatic Engineer reported top-down and bottom-up efforts to rationalise AI token spend. The 62.5-minute cache rule for Claude’s prompt caching, DeepSeek V4 Pro’s permanent 75% price cut, and OpenRouter’s growth all show inference cost becoming something teams actively optimise. For your LiteLLM gateway, routing decisions now have direct P&L impact.

Anthropic and OpenAI found product-market fit — Companies spending $200+/user/month on coding agents; bills surprising finance teams. (Simon Willison)
Trend of cutting AI spend in eng departments — Top-down and bottom-up efforts to rationalise AI token spend are emerging. (The Pragmatic Engineer (Gergely Orosz))
The 62.5-minute cache rule — The decision point for refreshing Claude’s prompt cache is always 62.5 minutes, regardless of prompt size. (TLDR AI)
DeepSeek V4 Pro 75% permanent price cut — DeepSeek permanently slashes V4 Pro pricing by 75%, pressuring frontier pricing. (TLDR AI)
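The 62.5-minute figure is consistent with a simple break-even calculation, assuming Anthropic’s published multipliers for the default 5-minute prompt cache: a cache write costs 1.25× the base input-token price, and a cache read costs 0.1× and resets the TTL. Keeping the cache warm means paying for a cheap read every 5 minutes; letting it lapse means paying the write premium once on the next request. Both costs scale with prompt length, so size cancels out, and 1.25 / 0.1 × 5 = 62.5. This is our reconstruction (ignoring the small cost of each keep-alive request’s own uncached tokens), not necessarily how the linked post derives it.

```python
# Assumed multipliers relative to the base input-token price, for the
# default 5-minute cache TTL. Check current pricing before relying on them.
WRITE_MULTIPLIER = 1.25  # writing (or rewriting) the prompt into the cache
READ_MULTIPLIER = 0.10   # a cache hit, which also resets the TTL
TTL_MINUTES = 5.0


def breakeven_minutes(write: float = WRITE_MULTIPLIER,
                      read: float = READ_MULTIPLIER,
                      ttl: float = TTL_MINUTES) -> float:
    # One fresh write costs as much as (write / read) reads, and each read
    # buys another `ttl` minutes of cache lifetime. Prompt size never appears.
    return (write / read) * ttl


def keep_warm_cost(gap_minutes: float, prompt_tokens: int, price_per_token: float) -> float:
    # Continuous approximation: about one read per TTL interval across the gap,
    # counting the eventual real request as the final read.
    reads = gap_minutes / TTL_MINUTES
    return reads * READ_MULTIPLIER * prompt_tokens * price_per_token


def rewrite_cost(prompt_tokens: int, price_per_token: float) -> float:
    # Let the cache expire, then pay the write premium on the next real request.
    return WRITE_MULTIPLIER * prompt_tokens * price_per_token


def should_keep_warm(expected_gap_minutes: float) -> bool:
    # Independent of prompt size because both costs scale with prompt_tokens.
    return expected_gap_minutes < breakeven_minutes()
```

In a gateway, `should_keep_warm` would be evaluated per session against the expected idle time before the next request; the size-independence is what makes a single fixed threshold workable.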

Agent containment and security matures

Anthropic published a detailed breakdown of how it sandboxes Claude across products, the most thorough public documentation of agent containment to date. Vercel shipped Sandbox firewall proxying, Docker isolation, and Postgres connectivity. The TanStack npm supply chain attack forced OpenAI to respond with certificate rotation. Microsoft Copilot Cowork was shown to exfiltrate files via prompt injection. The pattern is clear: as agents gain more autonomy, the containment boundary becomes the trust boundary. Teams running headless agents overnight need to treat sandboxing as a first-class engineering concern.

How we contain Claude across products — Detailed documentation of sandbox techniques across Claude.ai, Claude Code, and API. (TLDR AI)
OpenAI response to TanStack supply chain attack — Certificate rotation and security hardening after the npm supply chain compromise. (OpenAI blog)
Microsoft Copilot Cowork exfiltrates files — Prompt injection allowed data exfiltration from Microsoft’s agentic product. (Simon Willison)
Vercel Sandbox firewall proxying — Route outbound agent traffic through controlled proxies with credential brokering. (Vercel blog)
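The containment-as-trust-boundary point is easiest to see at the network edge. Below is a minimal sketch of a default-deny egress allowlist of the kind a sandbox’s outbound proxy might enforce. The hostnames and the `ALLOWED_HOSTS` / `ALLOWED_SUFFIXES` policy are hypothetical, and this is not Vercel’s or Anthropic’s implementation or API.

```python
from urllib.parse import urlsplit

# Hypothetical policy: exact hosts, plus dot-prefixed suffixes for whole domains.
ALLOWED_HOSTS = {"api.anthropic.com", "registry.npmjs.org"}
ALLOWED_SUFFIXES = (".githubusercontent.com",)


def egress_allowed(url: str) -> bool:
    """Default-deny check an outbound proxy could apply to each agent request."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False  # refuse plaintext and unexpected schemes outright
    host = (parts.hostname or "").lower()  # hostname excludes userinfo and port
    if not host:
        return False
    if host in ALLOWED_HOSTS:
        return True
    # The leading dot in each suffix stops "evilgithubusercontent.com" matching.
    return any(host.endswith(sfx) for sfx in ALLOWED_SUFFIXES)
```

A real sandbox proxy would also broker credentials, injecting secrets on the way out so the agent process never holds them; that part is omitted here.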

GitLab Act 2 and agentic org design

GitLab’s Act 2 restructure (flattened management layers, ~60 smaller teams with end-to-end ownership, and the principle ‘if an agent can do it, automate it’) is the clearest blueprint for agentic-era org design. GitLab 19.0 shipped alongside it, with Developer Flow automating the full MR lifecycle. Cognition’s $26B valuation, with Devin producing 80% of commits at some customers, validates the model. General Intelligence runs 5 engineers who each ship 70+ commits/day. The structural question for CTOs: how do you reorganise when agents handle the work that justified your current team shape?

GitLab Act 2 — Flattened 2-3 management layers; ~60 smaller teams; three new operating principles for the agentic era. (GitLab blog)
GitLab 19.0 released — Developer Flow automates the full MR lifecycle (feedback, conflicts, rebasing) via a single agent. (GitLab blog)
General Intelligence: 5 engineers, 70+ commits/day each — 8-person team with 90% of SRE work automated; 4,000+ preview branches running. (Vercel blog)
Cognition Series D at $26B — Devin producing 80% of commits at some customers; raised $1B to expand. (TLDR AI)

What I’m watching into next month

MCP spec overhaul (July 28) — Breaking changes in the release candidate will require updating your in-house MCP servers before the final spec ships.
- MCP 2026-07-28 Release Candidate (TLDR AI)
- OpenAI Secure MCP Tunnel (TLDR AI)
xAI/Cursor acquisition dynamics — SpaceX’s $60B acquisition window opens soon; if the deal closes, Cursor’s roadmap independence changes materially.
- xAI warns staff to limit Cursor contact (TLDR AI)
- Cursor hits $3B ARR (TLDR AI)
Local model viability gap — Forge’s 8B→99% result and DeepSeek V4’s price cuts suggest hybrid local/cloud routing is increasingly viable for your stack.
- Forge: guardrails take 8B to 99% (Hacker News (AI))
- How far behind are open models? (TLDR AI)
Anthropic compute capacity constraints — The SpaceX $45B deal, Microsoft Maia chip talks, and doubled rate limits signal that capacity was the binding constraint; watch for pricing changes.
- Anthropic pays SpaceX $45B over 3 years (TLDR AI)
- Anthropic-Microsoft Maia chip talks (TLDR AI)

antirez/ds4

12.6k★ · C · DeepSeek 4 Flash local inference engine for Metal and CUDA

BigPizzaV3/CodexPlusPlus

9.3k★ · Rust · An enhanced tool for CodexApp, striving to make Codex better to use and more comfortable

FULU-Foundation/OrcaSlicer-bambulab

6.7k★ · C++

nexu-io/html-anything

5.5k★ · HTML · agent-skills, agentic, ai-agents, ai-design, ai-editor · The agentic HTML editor: your local AI agent writes the HTML, you ship it. 75 Skills × 9 Surfaces (magazine, deck, poster, XHS / tweet, prototype, data report, Hyperframes). Sandboxed preview; 1-click export to WeChat / X / Zhihu / HTML / PNG. Zero API key (Claude Code / Cursor / Codex / Gemini / Copilot / OpenCode / Qwen / Aider).

darrylmorley/whatcable

5k★ · Swift · apple-silicon, hardware-info, iokit, mac-app, macos · macOS menu bar app that tells you, in plain English, what each USB-C cable plugged into your Mac can actually do

V4bel/dirtyfrag

4.8k★ · C

vercel-labs/zerolang

4.8k★ · C · The programming language for agents

vercel-labs/zero-native

4k★ · Zig · Build desktop + mobile apps with Zig and web UI

perplexityai/bumblebee

4k★ · Go · golang, package-inventory, supply-chain-security · Read-only developer endpoint scanner for on-disk package, extension, and developer-tool metadata, built to check exposure to known software supply-chain compromises.

simplifaisoul/osiris

3.8k★ · TypeScript · Open Source Global Intelligence Platform - Real-Time OSINT Dashboard - A Palantir Alternative

Read this month

How we contain Claude across products

The most thorough public documentation of agent sandboxing architecture to date. Directly applicable to anyone running headless Claude Code sessions: it covers matching isolation strength to oversight capacity, which is exactly the trust model for overnight agent factories.

Quote of the month

I think this is because OpenAI and Anthropic have both found product-market fit with coding/general-purpose agent products. Companies spending over $200 per month per user helps these businesses cover their costs much better than charging $10 to $20 per month per user.

— Simon Willison

Sources unavailable this month: r/ChatGPTCoding top, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top

Auto-curated monthly by Claude Opus 4.7 from A Smart Bear (Jason Cohen), Apple ML research, Ben’s Bites, Benedict Evans, Cursor changelog, Don’t Worry About the Vase (Zvi), Eugene Yan, Every — Chain of Thought (Dan Shipper), Exponential View (Azeem Azhar), GitLab blog, Google DeepMind blog, Hacker News (AI), Hugging Face blog, Import AI (Jack Clark), Interconnects (Nathan Lambert), JetBrains AI blog, LangChain blog, Last Week in AI, Latent Space, Lenny’s Newsletter, NVIDIA developer blog, Not Boring (Packy McCormick), One Useful Thing (Ethan Mollick), OpenAI blog, SaaStr (Jason Lemkin), Sebastian Raschka, Simon Willison, Sourcegraph blog, TLDR AI, The Algorithmic Bridge (Alberto Romero), The Pragmatic Engineer (Gergely Orosz), Together AI blog, Tomasz Tunguz, Understanding AI (Timothy B. Lee), Vercel blog, smol.ai news, swyx.io. Source list and editorial profile maintained by Daniel.