Claude Code v2.1.139, Gemini 3.1 Flash-Lite GA, llama.cpp Parallel Drafting

Claude Code v2.1.139 ships agent view, /goal command for autonomous multi-turn sessions, and Remote Control integration.

Must read

Claude Code v2.1.139: Agent View and /goal command — Agent view gives a single dashboard of all running sessions; /goal enables autonomous multi-turn work — directly upgrades your overnight-agent-factory pattern.
Google ships Gemini 3.1 Flash-Lite in GA — Sub-second latency at low cost — a strong candidate for your LiteLLM routing layer on high-volume, latency-sensitive tasks.
LLM memory rewrites degrade agent performance — Directly relevant to persistent-memory layers in your agent stack: continuous memory consolidation can make agents worse than no memory at all.
Google’s SkillOS for Self-Evolving AI Agents — RL-trained skill curation from past experience maps to your agent-skills / progressive-disclosure framework thinking.
Spec-driven development at Notion — Notion’s spec-first workflow where agents code from specs mirrors your skills-framework approach — concrete patterns from a real engineering org.

Tools & Frameworks

Claude Code v2.1.139

Adds claude agents view (all sessions in one list), /goal command for autonomous multi-turn completion conditions, and scroll-speed tuning.

Why this matters: Core tool upgrade — /goal + agent view = better headless dispatch.

Cursor changelog – May 11

New changelog entry shipped; details are sparse, but it references a Microsoft Teams integration.

Why this matters: Worth checking for agent-mode or model-selection changes.

LangGraph 1.2.0 stable

Adds durable error-handler resume across host crashes, set_node_defaults(), and delta channel checkpointing with Postgres/SQLite.

Why this matters: Durable crash recovery matters if you orchestrate long-running agents.
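
For a feel of what durable resume buys you, here is a minimal sketch of step-level checkpointing to SQLite using only the Python standard library. It is not LangGraph's API: the checkpoints table, the run_steps helper, and the step functions are all hypothetical, chosen only to illustrate picking a run back up after a crash.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite store holding one checkpoint row per run."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "run_id TEXT PRIMARY KEY, next_step INTEGER, state TEXT)"
    )
    return conn


def load_checkpoint(conn, run_id):
    """Return (next_step, state); a run with no checkpoint starts at step 0."""
    row = conn.execute(
        "SELECT next_step, state FROM checkpoints WHERE run_id = ?", (run_id,)
    ).fetchone()
    if row is None:
        return 0, {}
    return row[0], json.loads(row[1])


def save_checkpoint(conn, run_id, next_step, state):
    """Persist the state produced so far and the index of the next step."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
        (run_id, next_step, json.dumps(state)),
    )
    conn.commit()


def run_steps(conn, run_id, steps):
    """Run steps in order, checkpointing after each one.

    If a step raises, the last saved checkpoint survives, so calling
    run_steps again with the same run_id resumes at the failed step.
    """
    next_step, state = load_checkpoint(conn, run_id)
    for i in range(next_step, len(steps)):
        state = steps[i](state)
        save_checkpoint(conn, run_id, i + 1, state)
    return state
```

Pointing open_store at a file path instead of ":memory:" is what would let a fresh process resume after a host crash; completed steps are skipped and only the failed step reruns. State must be JSON-serialisable in this sketch.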

Running Codex safely at OpenAI

Documents OpenAI’s sandboxing, approval policies, and controlled environments for their Codex agent.

Why this matters: Useful reference for your own agent sandboxing patterns.

Vercel Sandbox firewall adds request proxying

Outbound sandbox traffic can now route through your own proxy with domain-level matchers and credential brokering.

Why this matters: Relevant if you run agentic code in Vercel sandboxes — adds security controls.
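
The two ideas here, domain-level matching and credential brokering, can be sketched in a few lines of Python. This is an illustration of the pattern only, not Vercel's configuration format or API: the RULES structure, the decide function, and the secret names are assumptions made up for the example.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical rule list: first match wins, anything unmatched is denied.
RULES = [
    {"match": "api.github.com", "action": "allow", "credential": "GITHUB_TOKEN"},
    {"match": "*.npmjs.org", "action": "allow", "credential": None},
]


def decide(url, rules, secrets):
    """Return (action, headers_to_inject) for an outbound request.

    The proxy holds the secrets and adds the Authorization header itself,
    so code running inside the sandbox never sees the raw credential.
    """
    host = (urlsplit(url).hostname or "").lower()
    for rule in rules:
        if fnmatchcase(host, rule["match"]):
            headers = {}
            name = rule["credential"]
            if name is not None:
                headers["Authorization"] = f"Bearer {secrets[name]}"
            return rule["action"], headers
    return "deny", {}
```

The default-deny fallthrough is the design choice that matters: an agent that wanders to an unlisted host gets blocked rather than silently allowed.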

Open Models & Local

llama.cpp b9109: parallel drafting support

Adds parallel speculative drafting with unified spec context, async draft eval, and prompt caching — significant inference speed-up for local models.

Why this matters: Directly improves tok/s on Apple Silicon for your local coding LLMs.
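
If speculative drafting is new to you, the core loop is small enough to sketch. The version below is generic greedy speculative decoding with toy next-token functions, not llama.cpp's implementation; it leaves out the parallel, async, and caching parts of this release, and every name in it is hypothetical.

```python
def greedy_decode(target_next, prompt, max_new):
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(target_next, draft_next, prompt, max_new, k=4):
    """Greedy speculative decoding with a cheap draft model.

    The draft proposes k tokens; the target checks them in order and keeps
    the longest agreeing prefix, then contributes one token of its own.
    A real engine verifies all k positions in one batched forward pass,
    which is where the speed-up comes from; here it is a plain loop.
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. Draft phase: propose k tokens autoregressively.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Verify phase: accept draft tokens until the first disagreement.
        ctx = list(tokens)
        for t in draft:
            expected = target_next(ctx)
            if expected != t:
                ctx.append(expected)  # keep the target's token, drop the rest
                break
            ctx.append(t)
        else:
            ctx.append(target_next(ctx))  # all accepted: one bonus token
        tokens = ctx
    return tokens[: len(prompt) + max_new]


# Toy models over digits 0-9: the draft agrees with the target except after a 4.
def toy_target(ctx):
    return (ctx[-1] + 1) % 10


def toy_draft(ctx):
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

With greedy verification the output is identical to running the target model alone; the draft only changes how many target passes are needed, so a better-matched draft model means more accepted tokens per pass.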

Guide to local LLM inference hardware

Practical walkthrough of hardware choices for running local models, covering Apple Silicon, consumer GPUs, and cost trade-offs.

Why this matters: Reference for your local-plus-cloud hybrid routing decisions.

Allen AI EMO: Emergent Modularity in MoE

MoE model learns modular expert organisation from pretraining; tasks run on 12.5% of experts with near full-model performance.

Why this matters: Points toward more efficient local inference if applied to open MoE models.

Industry & Trends

Anthropic signs $1.8B Akamai deal for compute

Anthropic committed $1.8B over seven years to Akamai, adding to deals with CoreWeave, Amazon, Google, and xAI this month alone.

Why this matters: Signals Anthropic is capacity-constrained — context for Claude rate limits you experience.

GitLab restructures for the agentic era

GitLab is reducing its number of countries by 30% and restructuring around AI-native workflows; Simon Willison analyses what it means for dev tooling.

Why this matters: Signals how platform vendors are reorganising around agentic development.

Shopify’s River: public-channel coding agent

Tobi Lütke describes Shopify’s internal agent that works only in public Slack channels — every conversation is observable by the org.

Why this matters: A concrete ‘context not control’ pattern for managing AI agents in teams.

ChatGPT 5.5 Pro produces PhD-level maths research

Fields Medallist Tim Gowers reports ChatGPT 5.5 Pro solved a novel research problem with no serious human mathematical input in about an hour.

Why this matters: Benchmark of frontier reasoning capability — watch but don’t act.

Mistral’s 20x ARR growth and sovereign positioning

Mistral is approaching $1B ARR by targeting regulated, multinational enterprises that want jurisdiction control and vendor diversification.

Why this matters: Relevant positioning model for RegTech/identity — your customers have similar concerns.

Labs overfitting models to their own harnesses

Big labs are training harness designs into models, improving narrow use cases but reducing generalisation and increasing lock-in.

Why this matters: Validates your multi-model routing via LiteLLM — avoid single-vendor lock-in.
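
As a reminder of what vendor-neutral routing looks like at its simplest, here is a sketch of ordered fallback across providers in plain Python. It does not use LiteLLM's API; the route function and the provider callables are hypothetical stand-ins for whatever client wrappers you already have.

```python
def route(prompt, providers):
    """Try each (name, call) pair in order and return the first success.

    providers is an ordered list of (name, callable) pairs; each callable
    takes a prompt string and returns a completion string or raises.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Keeping the provider list as data rather than code is what makes swapping vendors cheap: reordering or replacing an entry changes the routing without touching call sites.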

Sources unavailable today: r/ChatGPTCoding top, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top

Auto-curated daily by Claude Opus 4.7 from Apple ML research, Cursor changelog, Don’t Worry About the Vase (Zvi), Exponential View (Azeem Azhar), GitHub: anthropics/claude-code, GitHub: ggml-org/llama.cpp, GitHub: langchain-ai/langchain, GitHub: langchain-ai/langgraph, Hugging Face blog, Import AI (Jack Clark), JetBrains AI blog, Latent Space, Lenny’s Newsletter, NVIDIA developer blog, OpenAI blog, Simon Willison, TLDR AI, The Algorithmic Bridge (Alberto Romero), Vercel blog, smol.ai news. Source list and editorial profile maintained by Daniel.