Monthly AI Digest

Anthropic's $965B Valuation Month, Gemini 3.5 Flash Ships, Dynamic Workflows Go Parallel

Sunday, May 31, 2026 - Monthly AI Briefing · May 2026

May belonged to Anthropic: a $65B Series H at a $965B valuation, a first profitable quarter in sight, Opus 4.8 shipped, and Dynamic Workflows in Claude Code, which orchestrates hundreds of parallel subagents, landed in research preview. Jarred Sumner rewrote Bun from Zig to Rust in 11 days using Dynamic Workflows. For teams building with agents, this is the month the overnight-agent-factory pattern got official infrastructure.

Google countered at I/O with Gemini 3.5 Flash (strong agentic coding, 4× faster than 3.1 Pro) and Agent Executor, an open-source distributed runtime. Cursor hit $3B ARR and shipped Composer 2.5 with RL-trained agents and Auto Review, while OpenAI pushed Codex into mobile, Windows sandboxing, and enterprise partnerships. The practical upshot: the harness layer, not the model, is now the competitive surface for coding agents.

Launches & releases this month

Models

  • Claude Opus 4.8 — Incremental Opus upgrade with adjustable effort controls, faster mode at ~2.5× output speed, same API price. (TLDR AI)
  • Gemini 3.5 Flash — Google’s new agentic-focused model: improved coding, parallel execution loops, 4× faster than 3.1 Pro. (TLDR AI)
  • GPT-5.5 Instant — New ChatGPT/API default model with reduced hallucinations, improved factuality, and personalisation controls. (OpenAI blog)
  • Qwen3.7 Max — Alibaba’s proprietary agent-foundation model tops Terminal-Bench 2.0, SWE-Pro, and MCP-Mark benchmarks. (TLDR AI)

Features & Tools

  • Claude Code Dynamic Workflows — Research preview: Claude Code breaks tasks into subtasks with hundreds of parallel subagents converging on results. (TLDR AI)
  • Cursor Auto Review — Cursor 3.6 adds automated code review integrated into the agent workflow. (Cursor changelog)
  • OpenAI Secure MCP Tunnel — Connects private MCP servers to OpenAI products via outbound HTTPS tunnels without public exposure. (TLDR AI)
  • Codex Mobile + Windows Sandbox — Codex tasks manageable from ChatGPT mobile; Windows sandbox with controlled file access and network limits. (OpenAI blog)
  • Vercel Sandbox Docker Support — Agents can now build and run Docker containers inside Vercel Sandbox without touching the host system. (Vercel blog)

Products

  • Grok Build CLI — xAI’s coding agent CLI in beta with plan mode, headless execution, and specialised subagents. (TLDR AI)
  • LangSmith Engine — Watches production traces, clusters failures into named issues, proposes targeted fixes and eval coverage automatically. (LangChain blog)
  • Warp Oz Multi-Harness Control — Single pane of glass managing Claude Code, Codex, and Warp agents with cross-harness memory and cost controls. (TLDR AI)

Deals & Partnerships

  • Anthropic $65B Series H — $65B round at $965B post-money valuation; run-rate revenue crossed $47B; first profitable quarter imminent. (TLDR AI)
  • Cognition $1B Series D — Devin maker raises $1B at $26B valuation; 80% of commits now from Devin at some customers. (TLDR AI)
  • Karpathy Joins Anthropic — Andrej Karpathy joined Anthropic for frontier R&D, citing the next few years as especially formative. (TLDR AI)
  • OpenRouter $113M Series B — AI gateway processing 100T tokens/month raises $113M at a $1.3B valuation; validates multi-model routing as infrastructure. (TLDR AI)
  • Cursor $3B ARR — Cursor hit $3B annualised revenue with 3,000+ customers paying $100K+; SpaceX acquisition window opens soon. (TLDR AI)

Research

  • DeepSWE Benchmark — Contamination-free SWE benchmark: 91 repos, 5 languages, sharper separation than SWE-Bench Pro. (TLDR AI)

Other Releases

  • Cursor Composer 2.5 — RL-trained coding agent with synthetic data and distributed training; ships in Cursor 3.5. (TLDR AI)
  • MCP Spec Release Candidate — Largest MCP revision: stateless HTTP core, OAuth/OIDC auth, extensions system, formal deprecation policy; ships July 28. (TLDR AI)
  • LangSmith Sandboxes GA — Kernel-isolated microVMs with snapshots, parallel forks, and auth proxies for running agents safely in production. (LangChain blog)
  • Deep Agents v0.6 — Code interpreter, harness profiles per model family, delta channels for O(1) checkpointing, and streaming v3. (LangChain blog)
  • Google Agent Executor — Open-source distributed runtime for durable, long-running agent workflows with session consistency and trajectory branching. (TLDR AI)
  • Forge: 8B Model → 99% — Open-source guardrails layer takes an 8B local model from 53% to 99% on multi-step agentic tasks without model changes. (Hacker News (AI))

Stories of the month

The harness is the product now

May made explicit what’s been building for months: the model is table stakes; the harness — sandboxing, memory, orchestration, eval loops — is where differentiation lives. LangChain shipped Sandboxes GA and Engine (auto-triaging production failures). Vercel added Docker-in-Sandbox and Claude Managed Agent support. Warp launched Oz as a multi-harness control plane. Cursor’s Composer 2.5 is RL-trained specifically as a harness-level improvement. For your team routing through LiteLLM, this means the gateway alone isn’t enough — the execution environment and feedback loop around it matter more than which model sits behind it.

Parallel agents at production scale

Dynamic Workflows in Claude Code, Conductor on Vercel Sandbox, and Deep Agents v0.6 all shipped infrastructure for running fleets of agents in parallel. The pattern is converging: break a large task into subtasks, spin up isolated agents per subtask, converge results. Jarred Sumner’s 750K-line Bun rewrite in 11 days is the proof point. This directly enables the overnight-agent-factory workflow — dispatch N agents before sleep, review converged output in the morning. The missing piece is still verification at scale, but the orchestration layer is now real.
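The split, fan-out, converge pattern described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not how Dynamic Workflows, Conductor, or Deep Agents are implemented. The names `split_task`, `run_agent`, `orchestrate`, and `converge` are hypothetical, and the agent itself is a stand-in that does no real work.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubtaskResult:
    subtask: str
    output: str


def split_task(task: str, parts: int) -> list[str]:
    """Hypothetical planner: a real system would ask a model to decompose the task."""
    return [f"{task} [part {i + 1}/{parts}]" for i in range(parts)]


async def run_agent(subtask: str, limit: asyncio.Semaphore) -> SubtaskResult:
    """Stand-in for one isolated agent; swap the sleep for a real sandboxed run."""
    async with limit:  # cap how many agents run at the same time
        await asyncio.sleep(0)  # placeholder for actual agent work
        return SubtaskResult(subtask, f"done: {subtask}")


async def orchestrate(task: str, parts: int, max_parallel: int = 8) -> list[SubtaskResult]:
    """Fan out one agent per subtask, then gather results in subtask order."""
    limit = asyncio.Semaphore(max_parallel)
    subtasks = split_task(task, parts)
    return await asyncio.gather(*(run_agent(s, limit) for s in subtasks))


def converge(results: list[SubtaskResult]) -> str:
    """Naive convergence step: concatenate outputs for human review."""
    return "\n".join(r.output for r in results)


if __name__ == "__main__":
    results = asyncio.run(orchestrate("migrate module", parts=3, max_parallel=2))
    print(converge(results))
```

The semaphore is the only scheduling control here. Verification, the piece the paragraph above calls missing, would sit between `orchestrate` and `converge`.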

Coding agent economics hit reality

Anthropic’s near-profitability, Cursor’s $3B ARR, and stories of companies shocked by LLM bills all point to the same thing: coding agents found product-market fit, and the bills are real. Simon Willison noted companies spending $200+/user/month on Claude Code. Pragmatic Engineer reported top-down efforts to rationalise AI token spend. The 62.5-minute cache rule for Claude’s pricing, DeepSeek V4 Pro’s permanent 75% price cut, and OpenRouter’s growth all reflect teams actively optimising inference costs. For your LiteLLM gateway, routing decisions now have direct P&L impact.
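To make the P&L point concrete, here is a minimal sketch of the arithmetic behind a cost-aware routing decision. It does not use LiteLLM's API. The `ModelPrice` type, the model names, and every price below are made-up placeholders, not real list prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_per_mtok: float   # USD per million input tokens (placeholder value)
    output_per_mtok: float  # USD per million output tokens (placeholder value)


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the given per-million-token prices."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


def cheapest(prices: list[ModelPrice], input_tokens: int, output_tokens: int) -> ModelPrice:
    """Pick the lowest-cost model for a request of this shape; ignores quality."""
    return min(prices, key=lambda p: request_cost(p, input_tokens, output_tokens))


# Hypothetical catalogue, for illustration only.
CATALOGUE = [
    ModelPrice("model-a", input_per_mtok=10.0, output_per_mtok=30.0),
    ModelPrice("model-b", input_per_mtok=2.0, output_per_mtok=8.0),
]
```

A real router would also weigh quality, latency, and cache hit rates. This sketch only shows how per-request cost falls out of token counts and prices.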

Agent containment and security mature

Anthropic published a detailed breakdown of how they sandbox Claude across products — the most thorough documentation of agent containment to date. Vercel shipped Sandbox firewall proxying, Docker isolation, and Postgres connectivity. The TanStack npm supply chain attack forced OpenAI to respond with certificate rotation. Microsoft Copilot Cowork was caught exfiltrating files via prompt injection. The pattern is clear: as agents gain more autonomy, the containment boundary becomes the trust boundary. Teams running headless agents overnight need to treat sandboxing as a first-class engineering concern.

GitLab Act 2 and agentic org design

GitLab’s Act 2 restructure (flattened management layers, ~60 smaller teams with end-to-end ownership, and the rule ‘if an agent can do it, automate it’) is the clearest blueprint so far for agentic-era org design. GitLab 19.0 shipped alongside it, with Developer Flow automating the full MR lifecycle. Cognition’s $26B valuation, with Devin producing 80% of commits at some customers, validates the model. General Intelligence runs 5 engineers shipping 70+ commits/day each. The structural question for CTOs: how do you reorganise when agents handle the work that justified your current team shape?

What I’m watching into next month

antirez/ds4

12.6k★ · C DeepSeek 4 Flash local inference engine for Metal and CUDA

BigPizzaV3/CodexPlusPlus

9.3k★ · Rust An enhanced tool for CodexApp, striving to make Codex better to use and more comfortable

FULU-Foundation/OrcaSlicer-bambulab

6.7k★ · C++ no description

nexu-io/html-anything

5.5k★ · HTML · agent-skills agentic ai-agents ai-design ai-editor ✨ The agentic HTML editor — your local AI agent writes the HTML, you ship it. 🚀 75 Skills × 9 Surfaces (magazine · deck · poster · XHS / tweet · prototype · data report · Hyperframes) 🛡️ Sandboxed preview · 📤 1-click to WeChat / X / Zhihu / HTML / PNG 🔑 Zero API key — Claude Code / Cursor / Codex / Gemini / Copilot / OpenCode / Qwen / Aider.

darrylmorley/whatcable

5k★ · Swift · apple-silicon hardware-info iokit mac-app macos macOS menu bar app that tells you, in plain English, what each USB-C cable plugged into your Mac can actually do

V4bel/dirtyfrag

4.8k★ · C no description

vercel-labs/zerolang

4.8k★ · C The programming language for agents

vercel-labs/zero-native

4k★ · Zig Build desktop + mobile apps with Zig and web UI

perplexityai/bumblebee

4k★ · Go · golang package-inventory supply-chain-security Read-only developer endpoint scanner for on-disk package, extension, and developer-tool metadata, built to check exposure to known software supply-chain compromises.

simplifaisoul/osiris

3.8k★ · TypeScript Open Source Global Intelligence Platform - Real-Time OSINT Dashboard - A Palantir Alternative

Read this month

How we contain Claude across products

The most thorough public documentation of agent sandboxing architecture to date. Directly applicable to anyone running headless Claude Code sessions — covers isolation strength matched to oversight capacity, which is exactly the trust model for overnight agent factories.

Quote of the month

I think this is because OpenAI and Anthropic have both found product-market fit with coding/general-purpose agent products. Companies spending over $200 per month per user helps these businesses cover their costs much better than charging $10 to $20 per month per user.

Simon Willison


Sources unavailable this month: r/ChatGPTCoding top, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top

Auto-curated monthly by Claude Opus 4.7 from A Smart Bear (Jason Cohen), Apple ML research, Ben’s Bites, Benedict Evans, Cursor changelog, Don’t Worry About the Vase (Zvi), Eugene Yan, Every — Chain of Thought (Dan Shipper), Exponential View (Azeem Azhar), GitLab blog, Google DeepMind blog, Hacker News (AI), Hugging Face blog, Import AI (Jack Clark), Interconnects (Nathan Lambert), JetBrains AI blog, LangChain blog, Last Week in AI, Latent Space, Lenny’s Newsletter, NVIDIA developer blog, Not Boring (Packy McCormick), One Useful Thing (Ethan Mollick), OpenAI blog, SaaStr (Jason Lemkin), Sebastian Raschka, Simon Willison, Sourcegraph blog, TLDR AI, The Algorithmic Bridge (Alberto Romero), The Pragmatic Engineer (Gergely Orosz), Together AI blog, Tomasz Tunguz, Understanding AI (Timothy B. Lee), Vercel blog, smol.ai news, swyx.io. Source list and editorial profile maintained by Daniel.