AI Briefing

MCP Spec Overhaul, Anthropic Mythos 1, DeepSeek V4 Price Cut

Tuesday, 26 May 2026 - AI News · (last 24h)

The next MCP specification release candidate drops with a stateless HTTP core, OAuth alignment, and breaking changes shipping 28 July.

Must read

Tools & Frameworks

Reasonix: DeepSeek-native terminal coding agent

Terminal coding agent engineered for prefix-cache stability; designed to run unattended with low token costs across long sessions.

Why this matters: Potential alternative to Claude Code for DeepSeek-routed headless agent tasks.

Perplexity open-sources Bumblebee security scanner

Read-only scanner that identifies risky packages, extensions, and AI tool configs on developer machines — now open source.

Why this matters: Useful for auditing your team’s MCP server and extension surface area.

Anthropic plans Claude Memory Files

Memory Files distributes notes across multiple structured documents organised by topic, project, or context.

Why this matters: Structured persistent memory aligns with your .md-based agent memory patterns.

Cline v3.85.0: DeepSeek V4, Gemini 3.5 Flash support

Adds DeepSeek V4 Flash/Pro, Gemini 3.5 Flash, GPT-5.5 on SAP AI Core, and fixes Vertex Claude routing.

Why this matters: Model roster update if any of your team uses Cline alongside Cursor.

Open Models & Local

Gemini 3.5 Flash (Low) outperforms (High) on SWE tasks

Gemini 3.5 Flash (Low) generates ~45% fewer tokens than Medium and generally beats High on SWE benchmarks.

Why this matters: Counter-intuitive routing signal: lower thinking budget = better code — relevant for your LiteLLM config.

MiMo-V2.5-coder Q2 quantisation released

The Q2 quant of MiMo-V2.5-coder fits in 128GB RAM; it is reported as a strong alternative to Qwen3.6 and DS4 for coding, with reliable tool calling.

Why this matters: If you have a Mac Studio with 128GB, this is a credible local coding model with agentic tool-call support.

Qwen3.6 35B A3B: current king for local agentic use?

Community consensus: Qwen3.6 35B MoE (IQ4_NL) leads for local agentic tool-calling, while Gemma4 and GLM 4.7 Flash tend to loop or break.

Why this matters: Practical field report for choosing your local model in hybrid routing setups.

NuExtract3: 4B VLM for OCR and structured extraction (Apache-2.0)

4B open-weight model based on Qwen3.5-4B; handles PDFs, forms, tables → Markdown extraction, Apache-2.0 licensed.

Why this matters: Directly useful for identity/RegTech document processing pipelines you could self-host.

llama.cpp b9310: checkpoint fix for agentic sessions

Fixes checkpoint creation so long agentic sessions don’t stall on trivial follow-up prompts — critical for tools like opencode.

Why this matters: If you run local models for agentic coding, this eliminates a painful stall bug.

How the engineer behind Claude Cowork actually uses it

Felix Rieseberg (Anthropic) demos building 3D house walkthroughs from floor plans, auto-tracking promises, and a $20 hardware buddy with Claude.

Why this matters: First-party insight into Anthropic’s own dogfooding of Claude Cowork — patterns you can steal.

METR AI time horizons graph contains severe errors

NYU researcher documents numerous compounding flaws in the widely-cited METR Long Tasks benchmark for coding capabilities.

Why this matters: If you’ve cited METR internally to justify agentic investment, the methodology is now contested.

Hugging Face: AI Agent Terms Worth Getting Right

Defines harness, scaffold, and other agent terminology with precision — useful shared vocabulary for cross-team alignment.

Why this matters: Helpful reference when your team debates agent architecture boundaries.


Auto-curated daily by Claude Opus 4.7 from Exponential View (Azeem Azhar), GitHub: BerriAI/litellm, GitHub: cline/cline, GitHub: ggml-org/llama.cpp, Hugging Face blog, Lenny’s Newsletter, OpenAI blog, Simon Willison, TLDR AI, The Algorithmic Bridge (Alberto Romero), r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top. Source list and editorial profile maintained by Daniel.