Claude Code v2.1.144, Cursor Composer 2.5, Deep Agents v0.6

Claude Code v2.1.144 ships /resume for background sessions, and Anthropic engineers explain their Claude Code workflows on Lenny’s Podcast.

Must read

Claude Code v2.1.144: /resume for background sessions — /resume now surfaces background sessions alongside interactive ones — directly upgrades your overnight-agent-factory workflow with session continuity.
HTML is the new Markdown: How Anthropic engineers build with Claude Code — Anthropic’s own Claude Code engineer describes micro-apps for spec editing and becoming a ‘compute allocator’ — validates your skills-framework thinking.
Cursor Changelog: Composer 2.5 — You run Cursor daily; Composer updates directly affect your team’s agent-mode workflows.
The 62.5-minute rule for Claude’s prompt cache — Concrete decision rule for your LiteLLM gateway: refresh cache before 62.5 min or let it expire — same breakeven across all Claude models.
Deep Agents v0.6: code interpreter, ContextHub, streaming v3 — Ships model-specific profiles for Anthropic/OpenAI/Google with 10–20 point tau2-bench gains — relevant if you are evaluating orchestration alternatives to an in-house stack.
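
The 62.5-minute figure in the prompt-cache item above can be reproduced with a few lines of arithmetic. The sketch below is one plausible reading, not necessarily the linked article's own derivation. It assumes a 5-minute cache TTL, a cache write priced at 1.25× the base input rate, and a cache read (which also refreshes the TTL) priced at 0.1×. Those multipliers and every name in the code are assumptions for illustration; check current pricing before wiring anything like this into a gateway.

```python
# Hypothetical sketch of the prompt-cache breakeven. All constants are
# assumptions expressed as multiples of the base input-token price.
CACHE_WRITE_MULT = 1.25   # assumed cost of writing a prefix to a 5-minute cache
CACHE_READ_MULT = 0.10    # assumed cost of a cache hit, which also refreshes the TTL
TTL_MINUTES = 5.0         # assumed cache lifetime after the last hit


def breakeven_minutes(write_mult: float = CACHE_WRITE_MULT,
                      read_mult: float = CACHE_READ_MULT,
                      ttl_minutes: float = TTL_MINUTES) -> float:
    """Idle time at which keep-alive refreshes cost as much as one re-write.

    Keeping the cache warm needs one refresh per TTL window, so warming it
    for t minutes costs (t / ttl) * read_mult. Letting it expire costs one
    write_mult on the next request. Setting the two equal gives t.
    """
    return (write_mult / read_mult) * ttl_minutes


def should_keep_warm(expected_idle_minutes: float) -> bool:
    """True if refreshing is cheaper than letting the cache expire."""
    return expected_idle_minutes < breakeven_minutes()


if __name__ == "__main__":
    print(f"breakeven: {breakeven_minutes():.1f} minutes")
    for idle in (30, 90):
        choice = "keep warm" if should_keep_warm(idle) else "let expire"
        print(f"idle {idle} min: {choice}")
```

Because both multipliers scale with a model's base input price, the ratio (and so the breakeven) does not depend on which model is used, which is consistent with the "same breakeven across all Claude models" claim.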

Tools & Frameworks

Headroom: context compression for agents

Compresses everything an agent reads before it hits the LLM, producing equivalent answers at a fraction of the tokens.

Why this matters: Reduces cost/latency for your long-context Claude Code sessions.

InsForge: open-source Heroku for coding agents

Apache 2.0 platform (YC P26) lets coding agents deploy, operate, and debug end-to-end — targets the agent-to-production gap.

Why this matters: Potential dispatch infra for your headless overnight agents.

CrewAI 1.14.5: deprecates CrewAgentExecutor

Defaults to AgentExecutor and adds a restore_from_state_id kickoff parameter plus skills-loading events for traces.

Why this matters: State restore aligns with your session-resumption patterns.

GitLab: Why governance matters for AI agents beyond BYOK

Argues model selection is insufficient — governance of agent actions across the SDLC pipeline is the harder problem.

Why this matters: Frames the control layer you need above vibe coding agents.

Open Models & Local

Apple Silicon LLM inference costs more than OpenRouter

Analysis shows OpenRouter is ~⅓ the price at ~2× the speed vs local Apple Silicon for comparable models.

Why this matters: Challenges the economics of your local-LLM setup; routing to cloud may save money.

DeepSeek-V4-Flash makes LLM steering interesting again

Steering vectors — manipulating activations mid-flight — become practical again with DeepSeek-V4-Flash’s architecture.

Why this matters: Watch-but-don’t-act: could matter if you run DeepSeek locally.

Lighthouse Attention: up to 17× faster long-context attention

Selection-based hierarchical attention achieves a 1.4–1.7× pretraining speedup and up to 17× faster passes at large contexts by running FlashAttention on dense sub-sequences.

Why this matters: If adopted in open models, extends viable local context windows on your Mac.

Industry & Trends

OpenAI + Dell: Codex on-premise

Partnership brings Codex to hybrid and on-prem enterprise environments for secure deployment across data and workflows.

Why this matters: Signals Codex moving toward regulated-industry use cases like your RegTech domain.

Codex to gain Computer Use on locked Macs

OpenAI is building capability for Codex to operate macOS apps via Computer Use even when the laptop is locked or asleep.

Why this matters: Directly relevant to overnight-agent patterns — removes the unlocked-session constraint.

Simon Willison: last 6 months of LLMs in 5 minutes

Annotated PyCon US 2026 lightning talk slides covering the key LLM developments from late 2025 through mid-2026.

Why this matters: Good calibration piece — Simon’s framing is practitioner-grade.

AI economics: GPU scaling doesn’t scale compute linearly

Analysis of GPU demand/supply dynamics argues efficiency matters more than raw scale given finite supply — scaling GPUs hits diminishing returns.

Why this matters: Context for cost decisions on your LiteLLM gateway routing.

Gemini app adds ‘Extended’ thinking level

Google is rolling out configurable thinking levels (including Extended) for Gemini 3.1 Pro, plus third-party app integrations.

Why this matters: If you route to Gemini via LiteLLM, extended thinking may improve complex reasoning tasks.

Sources unavailable today: r/ChatGPTCoding top, r/ClaudeAI top, r/LocalLLaMA top, r/MachineLearning top

Auto-curated daily by Claude Opus 4.7 from Cursor changelog, Don’t Worry About the Vase (Zvi), Exponential View (Azeem Azhar), GitHub: anthropics/claude-code, GitHub: crewAIInc/crewAI, GitHub: ggml-org/llama.cpp, GitHub: langchain-ai/langchain, GitLab blog, Hacker News (AI), Hugging Face blog, Import AI (Jack Clark), LangChain blog, Latent Space, Lenny’s Newsletter, OpenAI blog, Simon Willison, TLDR AI, The Algorithmic Bridge (Alberto Romero), Vercel blog, smol.ai news. Source list and editorial profile maintained by Daniel.