Anthropic $65B / IPO, Nemotron 3 Ultra, Claude Code Plugins
Friday, 5 June 2026 - Weekly AI Briefing · (last 7 days)
Anthropic dominated the week: a $65B Series H at a $965B post-money valuation, a confidential IPO filing, Opus 4.8 shipping with dynamic workflows and effort controls, and the claim that 80% of its production code is now Claude-authored. Meanwhile, NVIDIA dropped Nemotron 3 Ultra (550B MoE, 55B active, 1M context) as the strongest US open-weights model, Microsoft launched its MAI model family at Build, and Claude Code gained auto-loaded plugins from .claude/skills, which is directly relevant to your skills-framework workflow. The capital intensity of the race is now undeniable: Alphabet announced an $80B stock raise purely for AI compute.
Launches & releases this week
Models
- Claude Opus 4.8 — Incremental Opus update with adjustable effort controls, dynamic workflows in Claude Code, and cheaper fast mode; tripled GPT-5.5’s ARC-AGI-3 score. (TLDR AI)
- Nemotron 3 Ultra — 550B MoE (55B active), 1M context, NVFP4 quant, 300+ tok/s, scoring 48 on Artificial Analysis Intelligence Index. (TLDR AI)
- MAI-Thinking-1 & MAI Family — Microsoft shipped 7 MAI models; Thinking-1 is 1T params (35B active MoE), 256K context, 97% AIME 2025. (TLDR AI)
- MiniMax M3 — Open-weights multimodal agent model with 1M-token context, sparse attention, and desktop computer-use capability. (TLDR AI)
- Grok Build 0.1 — xAI’s coding model in public beta: 100+ tok/s, $1/$2 per million tokens in/out, integrates with Cursor. (TLDR AI)
- Mellum2 12B MoE — JetBrains open-sourced a 12B MoE model (Apache 2.0) optimised for routing, sub-agents, and code workflows. (JetBrains AI blog)
- Qwen3.7-Plus — Multimodal agent model unifying vision/language for GUI+CLI operation in a single agent loop. (TLDR AI)
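The MoE figures above are easier to compare as a ratio of active to total parameters. A minimal sketch of that arithmetic, using only the counts quoted in the list (1T is taken as 1,000B); the function name and the dictionary are illustrative, not from any vendor tooling:

```python
# Total and active parameter counts in billions, as quoted in the list above.
MODELS = {
    "Nemotron 3 Ultra": (550, 55),
    "MAI-Thinking-1": (1000, 35),  # assumes "1T params" means 1,000B
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a mixture-of-experts model's parameters that is active."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active <= total")
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        share = active_fraction(total_b, active_b)
        print(f"{name}: {share:.1%} of parameters active")
```

On these numbers Nemotron 3 Ultra activates 10% of its weights and MAI-Thinking-1 3.5%, which is one rough way to read the efficiency claims. It says nothing about quality or real serving cost.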
Features & Tools
- Claude Code Dynamic Workflows — Claude breaks tasks into parallel subtasks; Jarred Sumner rewrote Bun from Zig to Rust (750K lines) in 11 days. (TLDR AI)
- Claude Code v2.1.157 Plugins — Plugins in .claude/skills auto-load without marketplace; claude plugin init scaffolds new plugins. (GitHub: anthropics/claude-code)
- Cursor 3.7 + Teams Pricing — Cursor 3.7 ships canvas improvements; new Premium seat tier and higher Teams limits for heavy agent users. (Cursor changelog)
- ChatGPT Dreaming V3 — New memory synthesis system for ChatGPT that consolidates preferences across conversations; rolling out to Plus/Pro. (TLDR AI)
Products
- OpenAI Models on AWS — OpenAI frontier models and Codex now GA on AWS Bedrock with native security and billing integration. (TLDR AI)
Deals & Partnerships
- Anthropic $65B Series H — $65B round at $965B post-money valuation; run-rate revenue crossed $47B. (TLDR AI)
- Anthropic Confidential IPO Filing — Anthropic submitted a confidential draft S-1 to the SEC for a proposed IPO. (TLDR AI)
- DeepSeek $7B Fundraise — DeepSeek reportedly raising $7B in its first external funding round. (TLDR AI)
- Alphabet $80B AI Raise — Alphabet selling $80B in stock (incl. $10B from Berkshire) to fund AI compute infrastructure. (TLDR AI)
Stories to follow
Agent cost & verification at scale
Uber capped Claude Code usage after blowing its AI budget in four months. Anthropic itself reports 80% AI-authored code with 8× volume per engineer. Cognition now runs more Devin sessions asynchronously than interactively, making verified-before-merge a hard requirement. The pattern: agent output is exploding, but cost control and verification infrastructure haven’t kept pace — exactly the leaf-node verification problem you’ve written about.
- Uber Caps Usage of AI Tools Like Claude Code to Manage Costs — Uber blew its 2026 AI budget in four months; now rate-limiting Claude Code. (Simon Willison)
- Anthropic says 80% of its production code is now authored by Claude — 8× code volume per engineer; raises verification and quality questions. (TLDR AI)
- Verifying Agentic Development at Scale — Cognition runs 10-20 parallel Devins with end-to-end tests as merge gate. (TLDR AI)
- Slow down to speed up when working with AI agents — Devs generating 2× more code than 6 months ago; quality/debt concerns rising. (The Pragmatic Engineer)
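For a concrete picture of "tests as merge gate", here is a minimal sketch of the idea: a change is mergeable only if every check command exits cleanly. This is an illustration under stated assumptions, not Cognition's system; the function names, the timeout, and the rule that an empty check list blocks the merge are all hypothetical choices.

```python
import subprocess
import sys
from typing import Sequence


def check_passes(command: Sequence[str], timeout_s: int = 1800) -> bool:
    """Run one check command; True only if it exits 0 within the timeout."""
    try:
        result = subprocess.run(command, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hung or unlaunchable check counts as a failure, never a pass.
        return False
    return result.returncode == 0


def may_merge(check_commands: Sequence[Sequence[str]]) -> bool:
    """Gate: every check must pass, and having no checks never permits a merge."""
    return bool(check_commands) and all(check_passes(c) for c in check_commands)


if __name__ == "__main__":
    # Stand-in for a real end-to-end suite: a command that simply exits 0.
    demo_checks = [[sys.executable, "-c", "raise SystemExit(0)"]]
    print("merge allowed:", may_merge(demo_checks))
```

The design choice worth noting is failing closed: timeouts, launch errors, and missing checks all block the merge, which matters more when agents run unattended.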
Open-weights models closing the gap
Nemotron 3 Ultra, MiniMax M3, Mellum2, and Qwen3.7-Plus all shipped in one week — each targeting agentic workloads with long context, tool use, or MoE efficiency. The open-weights tier is now explicitly competing on agent orchestration, not just chat benchmarks. For your LiteLLM gateway and local routing, the viable model menu just expanded significantly.
- Nemotron 3 Ultra — 550B MoE, 55B active, 1M context, 300+ tok/s — strongest US open model. (TLDR AI)
- MiniMax M3 — 1M-token context, sparse attention, frontier coding and desktop computer-use. (TLDR AI)
- Mellum2 12B MoE open-sourced — 12B MoE under Apache 2.0; designed for routing, sub-agents, and low-latency code. (JetBrains AI blog)
- How far behind are open models? — Open models trail frontier by 4-6 months on public benchmarks; gap widening slightly. (TLDR AI)
Claude Code hardening for teams
This week’s Claude Code releases (v2.1.157–165) added auto-loaded plugins from .claude/skills, managed version pinning (requiredMinimumVersion), OTEL metric labels for team/repo slicing, auto mode on Bedrock/Vertex, and security prompts before writing to shell startup files. Taken together, these are enterprise-governance features — exactly what you need for overnight-agent-factory setups where multiple headless agents run unsupervised.
- Claude Code v2.1.157 — Auto-loaded plugins from .claude/skills; claude plugin init scaffolding. (GitHub: anthropics/claude-code)
- Claude Code v2.1.163 — Version pinning via managed settings; /plugin list command; hook return values. (GitHub: anthropics/claude-code)
- Claude Code v2.1.161 — OTEL resource attributes as metric labels; claude agents shows fan-out progress. (GitHub: anthropics/claude-code)
- Claude Code v2.1.158 — Auto mode now available on Bedrock, Vertex, and Foundry for Opus 4.7/4.8. (GitHub: anthropics/claude-code)
What I’m watching
- Anthropic recursive self-improvement — Anthropic publicly describes AI systems designing successors — if real, the iteration speed of frontier models accelerates beyond human-paced release cycles.
  - When AI builds itself (TLDR AI)
  - Claude Oceanus (Mythos successor) in red-team (TLDR AI)
- Local-plus-cloud hybrid routing — Perplexity shipped hybrid local/cloud inference; Tomasz Tunguz reports 78% of his AI work now runs on-device — validates the routing architecture you’re building with LiteLLM.
  - The Data Center Moves to Your Machine (TLDR AI)
  - The Minimill of AI (Tomasz Tunguz)
- Ladybird bans AI-generated PRs — First major open-source project to explicitly reject AI-authored pull requests — signals growing tension between AI-generated code volume and maintainer trust.
  - Ladybird will no longer accept public pull requests (Simon Willison)
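The hybrid routing item boils down to a per-request decision about where to run. A minimal sketch of one possible policy follows; it is not Perplexity's method and does not use any LiteLLM API, and the `Request` fields, the 8,192-token limit, and the "tools go to cloud" rule are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical description of one inference request."""
    prompt_tokens: int
    needs_tools: bool = False


def choose_backend(req: Request, local_context_limit: int = 8192) -> str:
    """Return "local" or "cloud" under a simple, assumed routing policy."""
    # Assumption: the on-device model has a small context window and no tool use.
    if req.needs_tools or req.prompt_tokens > local_context_limit:
        return "cloud"
    return "local"


if __name__ == "__main__":
    for req in (Request(1_200), Request(50_000), Request(300, needs_tools=True)):
        print(req, "->", choose_backend(req))
```

A real router would also weigh latency, privacy, and cost, but keeping the decision in one pure function makes the policy easy to test and to change.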
Top trending GitHub repos this week
pewdiepie-archdaemon/odysseus
54.3k★ · Python Self-hosted AI workspace.
zgwl/chinese-buy-us-stock-guide
3.2k★ US stock guide
Gloridust/WechatOnCloud
2.1k★ · TypeScript 云微WOC, cloud WeChat, connect freely
b-nnett/goose
2.1k★ · Rust Goose Swift proof-of-concept README
asz798838958/aBaiAutoplus
1.5k★ · Python Multi-platform AI account auto-registration and management · one-click ChatGPT Plus activation via protocol-based payment
Read this weekend
Slow down to speed up when working with AI agents
Gergely Orosz directly addresses the quality/debt problem you face when agents generate 2× more code than six months ago. Practical framing of when to throttle agent output — the management-layer complement to your leaf-nodes verification thinking.
Quote of the week
A substantial patch used to imply substantial effort, and that effort was a reasonable proxy for good faith. That assumption no longer holds.
— Andreas Kling, Ladybird
Sources unavailable this week: r/ChatGPTCoding top, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top
Auto-curated weekly by Claude Opus 4.7 from Ben’s Bites, Cursor changelog, Don’t Worry About the Vase (Zvi), Exponential View (Azeem Azhar), GitHub: BerriAI/litellm, GitHub: anthropics/claude-code, GitHub: cline/cline, GitHub: ggml-org/llama.cpp, GitHub: huggingface/transformers, GitHub: langchain-ai/langchain, GitHub: langchain-ai/langgraph, GitHub: ollama/ollama, GitHub: vllm-project/vllm, Hugging Face blog, Import AI (Jack Clark), Interconnects (Nathan Lambert), JetBrains AI blog, LangChain blog, Latent Space, Lenny’s Newsletter, NVIDIA developer blog, Not Boring (Packy McCormick), One Useful Thing (Ethan Mollick), OpenAI blog, SaaStr (Jason Lemkin), Simon Willison, TLDR AI, The Algorithmic Bridge (Alberto Romero), The Pragmatic Engineer (Gergely Orosz), Together AI blog, Tomasz Tunguz, Understanding AI (Timothy B. Lee), Vercel blog, smol.ai news. Source list and editorial profile maintained by Daniel.