Gemini 3.5 Flash, Anthropic Acquires Stainless, Composer 2.5

At I/O, Google shipped Gemini 3.5 Flash straight to GA with improved agentic execution, while Anthropic acquired Stainless and Cursor released Composer 2.5.

Must read

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything — Straight to GA with improved parallel agentic loops and reasoning traces — relevant for your LiteLLM routing decisions.
Anthropic Acquires SDK Startup Stainless — Stainless generates SDKs and production MCP servers from OpenAPI specs, so Anthropic now owns the MCP server toolchain your in-house servers depend on.
Cursor Released Composer 2.5 — RL-trained coding agent with new distributed training — directly affects your daily Cursor workflow quality.
Claude Code v2.1.145 — claude agents --json enables scripting session pickers and status bars — useful for your overnight-agent-factory dispatch setup.
Forge: Guardrails take an 8B model from 53% to 99% on agentic tasks — Domain-agnostic reliability layer for local models; directly applicable to your Apple Silicon local-LLM agentic experiments.
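The Forge item doesn't describe its mechanism, so what follows is only a generic sketch of one common guardrail pattern, not Forge's API. It validates each tool call a local model emits against a schema and feeds the validation error back for a retry. The tool names, the JSON shape {"tool": ..., "args": {...}}, and the generate callback are all hypothetical.

```python
from __future__ import annotations

import json
from typing import Callable, Optional

# Hypothetical tool registry: tool name -> {argument name: expected Python type}.
TOOL_SCHEMAS: dict[str, dict[str, type]] = {
    "read_file": {"path": str},
    "write_file": {"path": str, "content": str},
}


def validate_tool_call(raw: str) -> tuple[Optional[dict], Optional[str]]:
    """Return (call, None) if raw is a well-formed tool call, else (None, reason)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return None, "tool call must be a JSON object"
    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        return None, f"unknown tool {name!r}"
    args = call.get("args")
    if not isinstance(args, dict):
        return None, "'args' must be a JSON object"
    for key, expected in TOOL_SCHEMAS[name].items():
        if not isinstance(args.get(key), expected):
            return None, f"argument {key!r} must be of type {expected.__name__}"
    return call, None


def guarded_call(generate: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Request a tool call, feeding each validation error back until one passes."""
    feedback = ""
    for _ in range(max_attempts):
        call, error = validate_tool_call(generate(prompt + feedback))
        if call is not None:
            return call
        feedback = f"\n\nYour previous tool call was rejected ({error}). Please try again."
    raise RuntimeError(f"no valid tool call after {max_attempts} attempts")
```

With a stub model that first omits the path argument and then supplies it, guarded_call returns the second, valid call; the model never sees a silent failure, only a concrete error to correct.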

Tools & Frameworks

Cursor Changelog May 19

Cursor shipped its May 19 changelog alongside the Composer 2.5 blog post detailing RL-based agent improvements.

Why this matters: Check for incremental fixes beyond the Composer 2.5 headline.

Cline CLI v3.0.9

Concurrent plugin loading and cached tool descriptors speed up startup; the fuzzy @-mention file picker is also restored.

Why this matters: Relevant if you are evaluating Cline as a headless alternative.

Manus Scheduled Tasks 2.0

Tasks now run with persistent context across projects and apps, enabling continuity in automated workflows.

Why this matters: Comparable to your overnight-agent-factory pattern.

LangSmith Engine: An Agent for Improving Agents

LangChain details how they built an agent that iterates on other agents’ prompts and evals inside LangSmith.

Why this matters: Meta-agent eval pattern relevant to your team’s agent orchestration.

Agent Evaluation: A Detailed Guide

Comprehensive guide covering realistic harnesses, long-horizon testing, and outcome-oriented eval for production agents.

Why this matters: Directly applicable to verifying agent output such as your 22,000-line PRs.

Open Models & Local

Qwen3.7 Preview lands on Arena

Qwen3.7 Max Preview ranks 13th overall in Text Arena; Plus Preview ranks 16th in Vision Arena.

Why this matters: Tracks the Qwen family you run locally via Ollama.

Political censorship inside Qwen3.5-9B’s weights

Censorship is implemented as a small circuit layered on top of intact factual knowledge; it can be read out and disabled without fine-tuning.

Why this matters: Actionable if you deploy Qwen locally and need uncensored outputs.
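The item doesn't say exactly how the circuit is disabled. One widely used way to read out and switch off a behaviour without fine-tuning is directional ablation, named here plainly as a swapped-in technique that may differ from the post's method: estimate a direction as the difference of mean activations on prompts that do and don't trigger the behaviour, then project that direction out of the hidden state. This toy sketch works on plain Python vectors and assumes you have already extracted activations; the function names are hypothetical.

```python
import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def difference_of_means(triggering: list[list[float]], neutral: list[list[float]]) -> list[float]:
    """Candidate direction: mean activation on triggering prompts minus mean on neutral prompts."""
    return [a - b for a, b in zip(mean_vector(triggering), mean_vector(neutral))]


def ablate(hidden: list[float], direction: list[float]) -> list[float]:
    """Project out the direction: h - (h . r) r, where r is the unit vector along direction."""
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    coeff = sum(h * ri for h, ri in zip(hidden, r))
    return [h - coeff * ri for h, ri in zip(hidden, r)]
```

In a real model the projection would be applied to the residual stream at the relevant layers (or folded into the weights); which layers and prompt sets to use is outside what the item states.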

llama.cpp b9235: MTP clean-up

Major MTP (multi-token prediction) clean-up: re-enables p-min with MTP drafts and fixes the acceptance logic for n-gram speculative decoding.

Why this matters: MTP speculative decoding boosts local inference speed on Apple Silicon.
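Speculative decoding cheaply drafts several tokens, then has the full model check them and keeps the longest agreeing prefix; with MTP the drafts come from the model's own extra prediction heads rather than a separate draft model. The sketch below shows only the greedy draft-and-verify loop, not llama.cpp's code. Both models are hypothetical callables returning (top token, probability), the target is queried once per position for clarity (real implementations score all drafts in one batched pass), and p_min is modelled as "stop drafting once the drafter's confidence drops below p_min", which is an assumption about its semantics.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Hypothetical model interface: context tokens -> (most likely next token, its probability).
Model = Callable[[Sequence[int]], tuple[int, float]]


def draft_tokens(draft: Model, context: list[int], max_draft: int, p_min: float) -> list[int]:
    """Propose up to max_draft tokens, stopping early when the drafter's confidence < p_min."""
    proposed: list[int] = []
    for _ in range(max_draft):
        token, prob = draft(context + proposed)
        if prob < p_min:
            break
        proposed.append(token)
    return proposed


def verify(target: Model, context: list[int], proposed: list[int]) -> list[int]:
    """Greedy verification: keep the prefix the target agrees with, plus one target token."""
    accepted: list[int] = []
    for token in proposed:
        expected, _ = target(context + accepted)
        if expected != token:
            accepted.append(expected)  # the target's token replaces the first mismatch
            return accepted
        accepted.append(token)
    bonus, _ = target(context + accepted)  # every draft accepted: the target adds one more
    accepted.append(bonus)
    return accepted
```

Each accepted draft token is one fewer sequential step of the full model, which is where the speedup comes from; a lower p_min drafts more aggressively but wastes verification work when drafts are rejected.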

HRM-Text: 1B model trainable for ~$800

1B text-gen model trained on 8 H100s in ~50 hours using 130–600× less compute than standard foundation models.

Why this matters: Watch — interesting architecture efficiency, not yet coding-focused.
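The headline figures can be cross-checked with simple arithmetic: 8 GPUs for about 50 hours is about 400 H100-hours, so a total of about $800 implies roughly $2 per H100-hour. That hourly rate is derived here, not stated in the item.

```python
# Figures from the item; all are approximate ("~").
n_gpus = 8
wall_clock_hours = 50
total_cost_usd = 800

gpu_hours = n_gpus * wall_clock_hours  # total H100-hours consumed
implied_usd_per_gpu_hour = total_cost_usd / gpu_hours  # derived, not stated in the item

print(f"{gpu_hours} H100-hours, about ${implied_usd_per_gpu_hour:.2f} per H100-hour")
```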

KV cache quantization benchmarks: q5 deserves more attention

Thorough PPL/KLD benchmarks on Qwen 3.6 27B at 64k–128k context show q5 outperforms TurboQuant; symmetric q8 wastes VRAM.

Why this matters: Directly useful for your local Qwen quantisation choices.
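To see why the cache type matters so much at 64k to 128k context, a back-of-the-envelope KV cache estimate helps: K and V each store n_layers × n_kv_heads × head_dim values per token. The bits-per-element figures come from llama.cpp's block formats (32 values plus an fp16 scale per block). The model dimensions in the example are hypothetical placeholders, not Qwen 3.6 27B's real config; take the real values from the model's config, and note that hybrid-attention models only keep a full KV cache for their full-attention layers. TurboQuant is not modelled here.

```python
# Effective bits per element for some llama.cpp cache types.
BITS_PER_ELEMENT = {
    "f16": 16.0,
    "q8_0": 8.5,  # 32 x 8-bit values + one fp16 scale = 34 bytes per 32 values
    "q5_0": 5.5,  # 32 x 5-bit values + one fp16 scale = 22 bytes per 32 values
    "q4_0": 4.5,  # 32 x 4-bit values + one fp16 scale = 18 bytes per 32 values
}


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int, n_ctx: int,
                 k_type: str, v_type: str) -> float:
    """Approximate KV cache size in GiB for a given context length and K/V cache types."""
    elems_per_tensor = n_layers * n_kv_heads * head_dim * n_ctx  # values in K (and in V)
    bits = elems_per_tensor * (BITS_PER_ELEMENT[k_type] + BITS_PER_ELEMENT[v_type])
    return bits / 8 / 2**30


# Hypothetical dimensions: 64 layers, 8 KV heads, head_dim 128, 128k context.
for k, v in [("f16", "f16"), ("q8_0", "q8_0"), ("q5_0", "q5_0")]:
    print(k, v, kv_cache_gib(64, 8, 128, 131072, k, v))
```

With these placeholder dimensions, f16 for both K and V needs 32 GiB, q8_0 for both needs 17 GiB, and q5_0 for both needs 11 GiB, which shows how much memory moving from q8 to q5 frees at long context.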

Industry & Trends

Google I/O 2026 Roundup: Flash, Spark, Antigravity 2.0

Covers Gemini 3.5 Flash GA, Spark background agents, Omni video model, and Antigravity 2.0 IDE features from I/O Day 1.

Why this matters: Single-page overview of everything Google shipped today.

Gemini 3.5 Flash on Vercel AI Gateway

Vercel AI Gateway now routes to Gemini 3.5 Flash, with the thinking level defaulting to medium and support for parallel agentic loops.

Why this matters: You deploy on Vercel — one-click access to the new model.

Together AI: Coding agent inference benchmarks

31% more TPS than TensorRT-LLM, 2× better TTFT at saturation, 76% lower cost than Claude Opus 4.6 for coding agents.

Why this matters: Cost/latency data relevant if you route coding tasks away from Anthropic.

AI’s impact on software engineers in 2026 (Part 2)

Gergely Orosz covers tradeoffs of AI tooling adoption at company level, with survey data on what’s changed in two years.

Why this matters: Useful framing for your own team’s adoption story and talks.

xAI launches Skills for Grok

Users can teach Grok persistent functions it remembers across interactions — similar to Claude’s skills/memory pattern.

Why this matters: Watch — validates the skills-framework pattern you’re building on.

Jury dismisses all Musk claims against OpenAI

Musk’s lawsuit against Altman/OpenAI was dismissed after the jury ruled he waited too long to file. Musk plans to appeal.

Why this matters: Removes an existential legal overhang from the OpenAI ecosystem.

Sources unavailable today: Eric Jang

Auto-curated daily by Claude Opus 4.7 from Ben’s Bites, Cursor changelog, Don’t Worry About the Vase (Zvi), GitHub: anthropics/claude-code, GitHub: cline/cline, GitHub: ggml-org/llama.cpp, Hacker News (AI), Hugging Face blog, LangChain blog, Latent Space, Lenny’s Newsletter, NVIDIA developer blog, OpenAI blog, Simon Willison, TLDR AI, The Algorithmic Bridge (Alberto Romero), The Pragmatic Engineer (Gergely Orosz), Together AI blog, Vercel blog, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top. Source list and editorial profile maintained by Daniel.