Router Cheatsheet
One-page quick reference for hybrid classification, token savings, routing rules, debugging, and key metrics
Router Architecture: Quick Reference Cheatsheet
🎯 One-Minute Overview
📊 Token Savings: By The Numbers
| Mechanism | Savings | Details |
|---|---|---|
| Hybrid Classification | 60% | 80% pattern match (0 tokens) + 20% LLM (300 tokens) |
| Batch Queuing | 55% | 3-5 messages in 1 call vs separate calls |
| Simple Agent | 40% | No planning overhead |
| Deduplication | 10% | Skip webhook retries (5-10% of requests) |
| Heartbeat Edits | 5% | Edit existing message, not send new |
| TOTAL | ~75% | 7,500 tokens vs 30,000 without router |
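Two of the figures in the table can be checked with simple arithmetic. The sketch below uses only numbers from the table; the function names are illustrative. The source does not say how the per-mechanism percentages combine into the ~75% total, so that step is not modelled here.

```typescript
// Figures come from the table above; function names are illustrative only.

/** Expected classification tokens per query under the hybrid scheme. */
function expectedClassificationTokens(
  llmShare: number,  // fraction of queries that fall through to the LLM
  llmTokens: number, // tokens spent per LLM classification
): number {
  // Pattern-matched queries cost 0 tokens, so only the LLM share contributes.
  return llmShare * llmTokens;
}

/** Fractional savings of `withRouter` relative to `withoutRouter`. */
function savings(withRouter: number, withoutRouter: number): number {
  return (withoutRouter - withRouter) / withoutRouter;
}

// 20% of queries at 300 tokens each: roughly 60 tokens per query on average.
const avgTokens = expectedClassificationTokens(0.2, 300);

// 7,500 tokens with the router vs 30,000 without: 75% saved.
const total = savings(7500, 30000);

console.log({ avgTokens, total });
```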
🚦 Classification Rules
Phase 1: Pattern Match (Zero Tokens)
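Phase 1 can be pictured as an ordered list of regular expressions checked before any LLM call is made. The patterns and targets below are hypothetical placeholders; the real rules live in the classifier source and are not reproduced in this cheatsheet.

```typescript
// Hypothetical rule table: the actual patterns are not shown in this cheatsheet.
type PatternRule = { pattern: RegExp; target: string };

const RULES: PatternRule[] = [
  { pattern: /^\/(start|help)\b/i, target: "SimpleAgent" },
  { pattern: /^(hi|hello|hey)\b/i, target: "SimpleAgent" },
  { pattern: /\bduyet\b/i, target: "DuyetInfoAgent" },
];

/**
 * Returns the target of the first matching rule,
 * or null so the caller can fall through to Phase 2 (LLM classification).
 */
function patternMatch(text: string): string | null {
  const trimmed = text.trim();
  for (const rule of RULES) {
    // No `g` flag on the patterns, so `test` carries no state between calls.
    if (rule.pattern.test(trimmed)) return rule.target;
  }
  return null;
}
```

A `null` result is what keeps this phase at zero tokens: the LLM is only consulted when no rule matches.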
Phase 2: LLM Classification (Only 20% of queries)
Returns JSON with:
- type: simple | complex
- category: code | research | github | duyet | general
- complexity: low | medium | high
- requiresHumanApproval: boolean
- reasoning: string
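The JSON shape above maps naturally onto a TypeScript type. The following is a minimal sketch, assuming the LLM reply arrives as a raw string; `parseClassification` is a hypothetical helper name, not a function confirmed to exist in the codebase.

```typescript
// Field names and allowed values are taken from the list above.
type Classification = {
  type: "simple" | "complex";
  category: "code" | "research" | "github" | "duyet" | "general";
  complexity: "low" | "medium" | "high";
  requiresHumanApproval: boolean;
  reasoning: string;
};

const TYPES: readonly string[] = ["simple", "complex"];
const CATEGORIES: readonly string[] = ["code", "research", "github", "duyet", "general"];
const COMPLEXITIES: readonly string[] = ["low", "medium", "high"];

/** True when `x` is a string and one of the allowed values. */
function oneOf(allowed: readonly string[], x: unknown): boolean {
  return typeof x === "string" && allowed.indexOf(x) !== -1;
}

/** Hypothetical helper: returns null unless `raw` is JSON of the expected shape. */
function parseClassification(raw: string): Classification | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch (_err) {
    return null; // not valid JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, unknown>;
  const ok =
    oneOf(TYPES, v.type) &&
    oneOf(CATEGORIES, v.category) &&
    oneOf(COMPLEXITIES, v.complexity) &&
    typeof v.requiresHumanApproval === "boolean" &&
    typeof v.reasoning === "string";
  return ok ? (value as Classification) : null;
}
```

Validating the reply before routing means a malformed LLM response can be handled explicitly, for example by falling back to a default route, rather than surfacing as a runtime error later.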
Route Determination
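A route can be derived from the classification fields with a short chain of checks. The precedence used below (approval first, then category, then type) is an assumption made for illustration; only the agent names and the classification fields come from this cheatsheet.

```typescript
type Classification = {
  type: "simple" | "complex";
  category: "code" | "research" | "github" | "duyet" | "general";
  complexity: "low" | "medium" | "high";
  requiresHumanApproval: boolean;
  reasoning: string;
};

type AgentName =
  | "SimpleAgent"
  | "OrchestratorAgent"
  | "HITLAgent"
  | "LeadResearcherAgent"
  | "DuyetInfoAgent";

/** Assumed precedence, not the confirmed production rules. */
function determineRoute(c: Classification): AgentName {
  if (c.requiresHumanApproval) return "HITLAgent";      // confirmation flow
  if (c.category === "duyet") return "DuyetInfoAgent";  // MCP info retrieval
  if (c.category === "research" && c.complexity !== "low") {
    return "LeadResearcherAgent";                       // parallel research
  }
  if (c.type === "simple") return "SimpleAgent";        // direct LLM, no planning
  return "OrchestratorAgent";                           // plan + dispatch workers
}
```

Note that every branch returns an agent, never a worker.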
🤖 Agent vs Worker
| Type | Called By | Tokens | Purpose |
|---|---|---|---|
| SimpleAgent | Router | 50-150 | Direct LLM, no planning |
| OrchestratorAgent | Router | 500-2000 | Plan + dispatch workers |
| HITLAgent | Router | 300-1000 | Confirmation flow |
| LeadResearcherAgent | Router | 1000-3000 | Parallel research agents |
| DuyetInfoAgent | Router | 100-300 | MCP info retrieval |
| CodeWorker | Orchestrator | N/A | Stateless code execution |
| ResearchWorker | Orchestrator | N/A | Stateless web search |
| GitHubWorker | Orchestrator | N/A | Stateless GitHub ops |
🔴 CRITICAL: Router ONLY dispatches to Agents. Workers are ONLY called by OrchestratorAgent.
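One way to make this rule hard to violate is to encode it in the type system, so that a router handing work to a worker fails at compile time. The interfaces below are a hypothetical sketch; only the agent and worker names come from the table above.

```typescript
type AgentName =
  | "SimpleAgent"
  | "OrchestratorAgent"
  | "HITLAgent"
  | "LeadResearcherAgent"
  | "DuyetInfoAgent";

type WorkerName = "CodeWorker" | "ResearchWorker" | "GitHubWorker";

// Hypothetical interfaces: the router can only name agents,
// and only the orchestrator can name workers.
interface Router {
  dispatch(agent: AgentName, query: string): string;
}
interface Orchestrator {
  callWorker(worker: WorkerName, task: string): string;
}

const router: Router = {
  dispatch: (agent, query) => `${agent} <- ${query}`,
};
const orchestrator: Orchestrator = {
  callWorker: (worker, task) => `${worker} <- ${task}`,
};

router.dispatch("OrchestratorAgent", "refactor and open a PR"); // allowed
orchestrator.callWorker("GitHubWorker", "open a PR");           // allowed
// router.dispatch("GitHubWorker", "open a PR");
// ^ rejected by the compiler: "GitHubWorker" is not an AgentName
```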
🔄 Batch Processing Architecture
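The dual-batch idea can be sketched as two slots: a pending batch that collects incoming messages and an active batch that is currently being processed. The class below is a simplified, hypothetical model with explicit timestamps; it is not the implementation in `batch-types.ts`. The 500ms default matches the batch window listed under Performance Targets.

```typescript
// Hypothetical model: `pending` collects messages, `active` is being processed.
type Batch = { messages: string[]; startedAt: number };

class DualBatch {
  private pending: Batch | null = null;
  private active: Batch | null = null;
  private readonly windowMs: number;

  constructor(windowMs: number = 500) {
    this.windowMs = windowMs;
  }

  /** Queue a message; new messages never touch the active batch. */
  add(message: string, now: number): void {
    if (this.pending === null) {
      this.pending = { messages: [], startedAt: now };
    }
    this.pending.messages.push(message);
  }

  /**
   * Promote pending to active once the window has elapsed and nothing is running.
   * Returns the combined prompt, or null if no promotion happened.
   */
  tryPromote(now: number): string | null {
    if (this.active !== null || this.pending === null) return null;
    if (now - this.pending.startedAt < this.windowMs) return null;
    this.active = { messages: this.pending.messages, startedAt: now };
    this.pending = null;
    // One combined prompt means one LLM call for the whole batch.
    return this.active.messages.join("\n");
  }

  /** Mark the active batch as finished so the next pending batch can run. */
  complete(): void {
    this.active = null;
  }
}
```

Keeping the two slots separate means messages that arrive while a batch is being processed accumulate in the next batch instead of mutating the one in flight.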
💰 Token Cost Examples
Example 1: Simple Query
Example 2: Semantic Query
Example 3: 3 Rapid Messages (Without Router)
Example 3: 3 Rapid Messages (With Router + Batching)
⚙️ Configuration
📈 Performance Targets
| Metric | Target | Status |
|---|---|---|
| Pattern match latency | <50ms | ✅ |
| LLM classification | 200-500ms | ✅ |
| Webhook to queue | <6ms | ✅ |
| Batch window | 500ms | ✅ |
| Stuck detection | 30s timeout | ✅ |
| Total P95 latency | <2s | ✅ |
| DO success rate | >99.9% | ✅ |
🐛 Debugging
🚨 Common Mistakes
❌ Blocking the webhook
❌ Dispatching workers from router
❌ Combining messages incorrectly
❌ Not recovering from stuck batches
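For the last mistake, a minimal recovery check compares the age of the active batch against the 30s timeout from the Performance Targets table. The types and function names here are hypothetical, and the sketch assumes the caller supplies the current time.

```typescript
const STUCK_TIMEOUT_MS = 30000; // 30s, from the Performance Targets table

// Hypothetical shape of the batch currently being processed.
type ActiveBatch = { id: string; startedAt: number };

/** True when an active batch has been running longer than the timeout. */
function isStuck(
  active: ActiveBatch | null,
  now: number,
  timeoutMs: number = STUCK_TIMEOUT_MS,
): boolean {
  return active !== null && now - active.startedAt > timeoutMs;
}

/** Clears a stuck batch so pending work can proceed; otherwise leaves it as is. */
function recover(active: ActiveBatch | null, now: number): ActiveBatch | null {
  return isStuck(active, now) ? null : active;
}
```

Running a check like this whenever a new message arrives means a crashed batch cannot block the queue indefinitely.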
📋 Monitoring Checklist
- Pattern match latency <50ms
- LLM classification only 15-20% of queries
- Batch size averaging 2-3 messages
- No stuck batches in past 24h
- Deduplication catching webhook retries (5-10% of requests)
- Token usage ~7,500/100 queries (not 30,000)
- Cost per query <$0.01
- Routing accuracy >95%
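Several of the checklist items are numeric thresholds, so they can be evaluated automatically. The sketch below covers a subset of them; the `Metrics` field names are hypothetical and would need to be mapped to whatever the monitoring system actually reports.

```typescript
// Hypothetical metric names; thresholds are taken from the checklist above.
type Metrics = {
  patternMatchLatencyMs: number;
  llmClassificationShare: number; // fraction of queries, 0..1
  stuckBatches24h: number;
  costPerQueryUsd: number;
  routingAccuracy: number;        // fraction, 0..1
};

/** Returns the names of the checklist items that are currently failing. */
function failedChecks(m: Metrics): string[] {
  const checks: Array<[string, boolean]> = [
    ["pattern match latency <50ms", m.patternMatchLatencyMs < 50],
    ["LLM classification <=20% of queries", m.llmClassificationShare <= 0.2],
    ["no stuck batches in past 24h", m.stuckBatches24h === 0],
    ["cost per query <$0.01", m.costPerQueryUsd < 0.01],
    ["routing accuracy >95%", m.routingAccuracy > 0.95],
  ];
  return checks.filter(([, ok]) => !ok).map(([name]) => name);
}
```

An empty result means every covered item passes; anything else names the items to investigate.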
🔗 Key Files
| File | Purpose |
|---|---|
| `packages/cloudflare-agent/src/cloudflare-agent.ts` | Main DO wrapper |
| `packages/cloudflare-agent/src/agents/router-agent.ts` | Hybrid classifier |
| `packages/cloudflare-agent/src/routing/classifier.ts` | Pattern + LLM logic |
| `packages/cloudflare-agent/src/batch-types.ts` | Dual-batch implementation |
| `packages/cloudflare-agent/src/feature-flags.ts` | Configuration |
| `packages/prompts/src/agents/router.ts` | Classification prompt |
| `docs/architecture.md` | Full architecture docs |
| `docs/multiagent-flows.html` | Interactive dashboard |
| `docs/token-optimization-guide.md` | Detailed token guide |
🎓 Learning Path
- Start here -> This cheatsheet (5 min read)
- Interactive view -> `docs/multiagent-flows.html` (10 min explore)
- Deep dive -> `docs/token-optimization-guide.md` (20 min read)
- Implementation -> `docs/architecture.md` (30 min study)
- Code review -> `packages/cloudflare-agent/src/` (60 min exploration)
💡 Quick Stats
- Last Updated: 2025-11-29
- Router Version: 2.0 (Hybrid Classifier + Dual-Batch)
- Status: Production Ready ✅