Claude Code v2.1.139, Gemini 3.1 Flash-Lite GA, llama.cpp Parallel Drafting
Tuesday, 12 May 2026 - AI News · (last 24h)
Claude Code v2.1.139 ships agent view, /goal command for autonomous multi-turn sessions, and Remote Control integration.
Must read
- Claude Code v2.1.139: Agent View and /goal command — Agent view gives a single dashboard of all running sessions; /goal enables autonomous multi-turn work — directly upgrades your overnight-agent-factory pattern.
- Google ships Gemini 3.1 Flash-Lite in GA — Sub-second latency at low cost — a strong candidate for your LiteLLM routing layer on high-volume, latency-sensitive tasks.
- LLM memory rewrites degrade agent performance — Directly relevant to persistent-memory layers in your agent stack: continuous memory consolidation can make agents worse than no memory at all.
- Google’s SkillOS for Self-Evolving AI Agents — RL-trained skill curation from past experience maps to your agent-skills / progressive-disclosure framework thinking.
- Spec-driven development at Notion — Notion’s spec-first workflow where agents code from specs mirrors your skills-framework approach — concrete patterns from a real engineering org.
Tools & Frameworks
Claude Code v2.1.139
Adds claude agents view (all sessions in one list), a /goal command that sets completion conditions for autonomous multi-turn sessions, and scroll-speed tuning.
Why this matters: Core tool upgrade — /goal + agent view = better headless dispatch.
Cursor changelog – May 11
New changelog entry shipped; details are sparse, but it includes a reference to a Microsoft Teams integration.
Why this matters: Worth checking for agent-mode or model-selection changes.
LangGraph 1.2.0 stable
Adds durable error-handler resume across host crashes, set_node_defaults(), and delta channel checkpointing with Postgres/SQLite.
Why this matters: Durable crash recovery matters if you orchestrate long-running agents.
Running Codex safely at OpenAI
Documents OpenAI’s sandboxing, approval policies, and controlled environments for their Codex agent.
Why this matters: Useful reference for your own agent sandboxing patterns.
Vercel Sandbox firewall adds request proxying
Outbound sandbox traffic can now route through your own proxy with domain-level matchers and credential brokering.
Why this matters: Relevant if you run agentic code in Vercel sandboxes — adds security controls.
Open Models & Local
llama.cpp b9109: parallel drafting support
Adds parallel speculative drafting with unified spec context, async draft eval, and prompt caching — significant inference speed-up for local models.
Why this matters: Directly improves tok/s on Apple Silicon for your local coding LLMs.
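For readers new to speculative drafting, here is a minimal sketch of the general greedy draft-and-verify idea. It is a toy illustration, not llama.cpp's implementation: the `target` and `draft` callables, the integer tokens, and the parameter `k` are all assumptions made for the example, and it leaves out the parallel and async parts of the release.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def greedy_decode(model: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    """Greedy draft-and-verify; output matches greedy_decode(target, ...)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The cheap draft model proposes up to k tokens.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed position. A real engine would
        #    score all positions in one batched pass; this toy loops instead.
        for i, tok in enumerate(proposal):
            expected = target(seq + proposal[:i])
            if expected != tok:
                # 3. First mismatch: keep the agreed prefix plus the
                #    target's own token, and discard the rest of the draft.
                seq.extend(proposal[:i])
                seq.append(expected)
                break
        else:
            seq.extend(proposal)  # every drafted token was accepted
    return seq


# Hypothetical toy models: the target counts upward mod 10; the draft
# agrees except after any token where token % 3 == 2.
def toy_target(seq: List[Token]) -> Token:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token]) -> Token:
    return 0 if seq[-1] % 3 == 2 else (seq[-1] + 1) % 10


fast = speculative_decode(toy_target, toy_draft, [0], n_new=12)
slow = greedy_decode(toy_target, [0], n_new=12)
```

The speed-up in a real engine comes from the target verifying several drafted tokens per forward pass; the output itself should be unchanged, which is why `fast` and `slow` are compared.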
Guide to local LLM inference hardware
Practical walkthrough of hardware choices for running local models, covering Apple Silicon, consumer GPUs, and cost trade-offs.
Why this matters: Reference for your local-plus-cloud hybrid routing decisions.
Allen AI EMO: Emergent Modularity in MoE
MoE model learns modular expert organisation from pretraining; tasks run on 12.5% of experts with near full-model performance.
Why this matters: Points toward more efficient local inference if applied to open MoE models.
Industry & Trends
Anthropic signs $1.8B Akamai deal for compute
Anthropic committed $1.8B over seven years to Akamai, adding to deals with CoreWeave, Amazon, Google, and xAI this month alone.
Why this matters: Signals Anthropic is capacity-constrained — context for Claude rate limits you experience.
GitLab restructures for the agentic era
GitLab is cutting its country footprint by 30% and restructuring around AI-native workflows. Simon Willison analyses what it means for dev tooling.
Why this matters: Signals how platform vendors are reorganising around agentic development.
Shopify’s River: public-channel coding agent
Tobi Lütke describes Shopify’s internal agent that works only in public Slack channels — every conversation is observable by the org.
Why this matters: A concrete ‘context not control’ pattern for managing AI agents in teams.
ChatGPT 5.5 Pro produces PhD-level maths research
Fields Medallist Tim Gowers reports ChatGPT 5.5 Pro solved a novel research problem with no serious human mathematical input in about an hour.
Why this matters: Benchmark of frontier reasoning capability — watch but don’t act.
Mistral’s 20x ARR growth and sovereign positioning
Mistral is approaching $1B ARR by targeting regulated, multinational enterprises that want jurisdiction control and vendor diversification.
Why this matters: Relevant positioning model for RegTech/identity — your customers have similar concerns.
Labs overfitting models to their own harnesses
Big labs are training harness designs into models, improving narrow use cases but reducing generalisation and increasing lock-in.
Why this matters: Validates your multi-model routing via LiteLLM — avoid single-vendor lock-in.
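As a rough illustration of the routing idea, the sketch below tries an ordered list of models per task type and falls back to the next vendor on failure. It is plain Python, not LiteLLM's API (LiteLLM ships its own router and fallback configuration). The `ROUTES` table, the model names, and the `call` signature are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical route table: task type -> models in order of preference.
# The model names are placeholders, not real identifiers.
ROUTES: Dict[str, List[str]] = {
    "high_volume": ["vendor-a/fast-small", "vendor-b/fast-small"],
    "deep_reasoning": ["vendor-b/large", "vendor-a/large"],
}


def route(task: str, prompt: str, call: Callable[[str, str], str]) -> str:
    """Try each model configured for `task` in order; return the first success."""
    errors: List[str] = []
    for model in ROUTES[task]:
        try:
            return call(model, prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Usage with a stand-in client that simulates an outage at vendor-a.
def fake_call(model: str, prompt: str) -> str:
    if model.startswith("vendor-a/"):
        raise ConnectionError("simulated outage")
    return f"{model}:{prompt}"


answer = route("high_volume", "hi", fake_call)
```

Keeping the vendor order in data rather than code means a harness-specific model can be swapped out by editing one list.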
Sources unavailable today: r/ChatGPTCoding top, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top
Auto-curated daily by Claude Opus 4.7 from Apple ML research, Cursor changelog, Don’t Worry About the Vase (Zvi), Exponential View (Azeem Azhar), GitHub: anthropics/claude-code, GitHub: ggml-org/llama.cpp, GitHub: langchain-ai/langchain, GitHub: langchain-ai/langgraph, Hugging Face blog, Import AI (Jack Clark), JetBrains AI blog, Latent Space, Lenny’s Newsletter, NVIDIA developer blog, OpenAI blog, Simon Willison, TLDR AI, The Algorithmic Bridge (Alberto Romero), Vercel blog, smol.ai news. Source list and editorial profile maintained by Daniel.