Soham Shah
Designing the interface between human intent and machine autonomy.
Building agent systems that compress context, coordinate specialists, and sustain coherence across long-horizon tasks.
Routing the Routers
A technical survey of where LLM routing actually sits in production. Seven research streams (learned routers, cascades, capability matrices, self-routing, cache-aware scheduling, agent dispatch, vendor gateways) mapped onto one stack, with an off-policy estimation playbook for evaluating routers on logged traffic, an article-by-article EU AI Act mapping for the audit surface, and a closing list of experiments worth running.
The Shape of a Working Multi-Agent System
"Should I use a multi-agent system?" is the wrong question. A long, opinionated technical synthesis from ~200 primary sources — eight parts on context, write authority, orchestration, protocols, control plane, and a reference architecture you can build today. Plus six staked bets for May 2026 → May 2027.
Self-Evolving Skills in Multi-Agent Systems
Skills became the npm of agentic AI in eight weeks. A survey of that explosion — and what an optimal architecture looks like.
Context Compaction for Long-Running Agents
Agent sessions degrade as context grows: not at the limit, but continuously. A research-grounded design for keeping context lean and high-signal, synthesizing the latest work from 40+ papers.
Specialized Subagent Architecture
From generic delegation to context-aware multi-agent coordination: how to decompose, delegate, persist, and train specialized agents.