Token Optimization Guide
Deep dive into hybrid classification, dual-batch queuing, and agent routing achieving 75% token reduction
Token Optimization Guide: duyetbot-agent Hybrid Router
Overview
This guide demonstrates how duyetbot-agent achieves ~75% token reduction through intelligent routing and batch processing, even when handling the same volume of queries.
Key Statistics
Architecture: Four-Tier Token Optimization
Tier 1: Hybrid Classification (Pattern -> LLM)
Problem: Every query classification costs 200-300 LLM tokens.
Solution: Use pattern matching for common queries and fall back to the LLM only when a query needs semantic analysis.
Pattern Matching (Zero Tokens)
Token Savings:
- 80% of queries (greetings, commands, approvals): 0 tokens
- 20% of queries (semantic analysis): 300 tokens
- Average per query: 60 tokens (vs 300 without router)
- Savings: 80% on classification alone
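The pattern-first flow described above can be sketched as follows. This is a minimal illustration, not the project's actual classifier: the names `QueryCategory`, `PATTERNS`, `classify`, and `averageTokensPerQuery` are hypothetical, the regexes are assumed examples, and the LLM fallback is passed in as a synchronous callback for simplicity (a real call would be asynchronous).

```typescript
// Hypothetical names throughout; none of these are confirmed project APIs.
type QueryCategory = "greeting" | "command" | "approval" | "semantic";

interface Classification {
  category: QueryCategory;
  tokensUsed: number;
}

// Assumed example patterns for the zero-token cases listed above.
const PATTERNS: Array<[RegExp, QueryCategory]> = [
  [/^(hi|hello|hey)\b/i, "greeting"],
  [/^\/\w+/, "command"],
  [/^(yes|no|approve|reject)\b/i, "approval"],
];

// Cost of one LLM classification, using the ~300 token figure from this guide.
const LLM_CLASSIFICATION_TOKENS = 300;

function classify(
  query: string,
  llmClassify: (q: string) => QueryCategory,
): Classification {
  const text = query.trim();
  for (const [pattern, category] of PATTERNS) {
    if (pattern.test(text)) {
      return { category, tokensUsed: 0 }; // pattern hit: no LLM call
    }
  }
  // No pattern matched: fall back to the LLM classifier.
  return { category: llmClassify(text), tokensUsed: LLM_CLASSIFICATION_TOKENS };
}

// Expected classification cost per query for a given pattern hit rate.
function averageTokensPerQuery(patternHitRate: number): number {
  return (1 - patternHitRate) * LLM_CLASSIFICATION_TOKENS;
}
```

With a pattern hit rate of 0.8, `averageTokensPerQuery` gives approximately 60 tokens, which matches the average quoted above.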
LLM Classification (When Needed)
Cost per LLM classification: ~300 tokens (only 20% of the time)
Tier 2: Dual-Batch Queuing (Message Combining)
Problem: Rapid messages create multiple LLM calls instead of one.
Solution: Collect messages in a 500ms window and combine them into a single LLM call.
Without Batching (3 Calls × Overhead)
With Dual-Batch Queuing (1 Combined Call)
Processing Flow:
Token Savings Example:
Benefits:
- No per-call overhead: context setup happens once
- Better context: the LLM sees the full conversation at once
- Faster to first token: the user sees the thinking indicator sooner
- Automatic recovery: a stuck batch is detected after 30s without a heartbeat
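One way to picture the dual-batch idea is a queue with a pending batch that collects messages and an active batch that is being processed. The sketch below is an assumption about the shape of such a queue, not the project's implementation: `DualBatchQueue`, `enqueue`, `tick`, `heartbeat`, and `complete` are hypothetical names, and time is passed in explicitly so the behaviour is deterministic.

```typescript
// Hypothetical sketch; class and method names are not taken from the project.
interface Batch {
  messages: string[];
  startedAt: number; // arrival time of the first message (ms)
  lastHeartbeat: number; // last sign of life while processing (ms)
}

const BATCH_WINDOW_MS = 500; // collection window from this guide
const STUCK_TIMEOUT_MS = 30000; // 30s without a heartbeat counts as stuck

class DualBatchQueue {
  private pending: Batch | null = null; // still collecting messages
  private active: Batch | null = null; // currently being processed

  // New messages always go to the pending batch, never the active one.
  enqueue(message: string, now: number): void {
    if (this.pending === null) {
      this.pending = { messages: [], startedAt: now, lastHeartbeat: now };
    }
    this.pending.messages.push(message);
  }

  // Called periodically. Returns the combined prompt when a batch is ready
  // to be sent as one LLM call, otherwise null.
  tick(now: number): string | null {
    if (this.active !== null && now - this.active.lastHeartbeat >= STUCK_TIMEOUT_MS) {
      this.active = null; // stuck batch: discard it so the queue can recover
    }
    const pending = this.pending;
    if (this.active !== null || pending === null) return null;
    if (now - pending.startedAt < BATCH_WINDOW_MS) return null;
    this.active = {
      messages: pending.messages,
      startedAt: pending.startedAt,
      lastHeartbeat: now,
    };
    this.pending = null;
    return pending.messages.join("\n");
  }

  // The processor reports progress so the batch is not treated as stuck.
  heartbeat(now: number): void {
    if (this.active !== null) this.active.lastHeartbeat = now;
  }

  // The processor signals that the active batch finished.
  complete(): void {
    this.active = null;
  }
}
```

In this sketch a stuck batch is simply discarded; a real system would more likely retry it or report the failure to the user.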
Tier 3: Simple Agent Routing (Skip Planning)
Problem: Complex agents both plan and execute, which costs more tokens than a direct answer.
Solution: Route simple queries to SimpleAgent for a direct LLM response.
SimpleAgent: Direct LLM (No Planning)
OrchestratorAgent: Planning + Execution (More Tokens)
Token Cost Comparison:
| Agent Type | Typical Tokens | Use Case |
|---|---|---|
| SimpleAgent | 50-150 | Greetings, FAQs, simple Q&A |
| HITLAgent | 300-1000 | Confirmations, approvals |
| DuyetInfoAgent | 100-300 | Personal info (with MCP) |
| LeadResearcherAgent | 1000-3000 | Multi-agent research |
| OrchestratorAgent | 500-2000 | Complex planning tasks |
Routing Decision Tree:
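A decision tree over the agents in the table above might look like the sketch below. The agent names come from the table; the input fields (`needsApproval`, `aboutDuyet`, `needsResearch`, `complexity`) and the order of the checks are assumptions made for illustration, and the actual router may decide differently.

```typescript
// Agent names are taken from the table; everything else is hypothetical.
type AgentName =
  | "SimpleAgent"
  | "HITLAgent"
  | "DuyetInfoAgent"
  | "LeadResearcherAgent"
  | "OrchestratorAgent";

interface RoutingInput {
  needsApproval: boolean; // confirmations, approvals
  aboutDuyet: boolean; // personal info questions
  needsResearch: boolean; // multi-agent research
  complexity: "simple" | "complex";
}

function routeQuery(input: RoutingInput): AgentName {
  if (input.needsApproval) return "HITLAgent";
  if (input.aboutDuyet) return "DuyetInfoAgent";
  if (input.needsResearch) return "LeadResearcherAgent";
  // Cheapest path: skip planning entirely for simple queries.
  if (input.complexity === "simple") return "SimpleAgent";
  return "OrchestratorAgent";
}
```

Checking the specialised agents first and defaulting to OrchestratorAgent only for complex queries keeps the most expensive planning path as the last resort.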
Tier 4: Deduplication (Skip Redundant Calls)
Problem: Telegram and GitHub retry webhooks if they don't receive an ACK in time.
Solution: Track request IDs and skip requests that have already been processed.
Token Savings:
- Platform retries: ~5-10% of requests
- Each retry avoided: 100-300 tokens
- Daily impact (100 queries): 500-3000 tokens saved
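Request-ID tracking can be sketched with a small in-memory map. This is an illustration under assumptions: `RequestDeduplicator`, `shouldProcess`, and the 60-second TTL are hypothetical, and a production system would probably keep the seen IDs in durable storage, not in process memory.

```typescript
// Hypothetical sketch; the class name, method name, and TTL are assumptions.
class RequestDeduplicator {
  private seen = new Map<string, number>(); // requestId -> first-seen time (ms)
  private ttlMs: number;

  constructor(ttlMs: number = 60000) {
    this.ttlMs = ttlMs;
  }

  // Returns true the first time an ID is seen, false for a retry within the TTL.
  shouldProcess(requestId: string, now: number): boolean {
    // Evict expired entries so the map does not grow without bound.
    this.seen.forEach((seenAt, id) => {
      if (now - seenAt >= this.ttlMs) this.seen.delete(id);
    });
    if (this.seen.has(requestId)) return false;
    this.seen.set(requestId, now);
    return true;
  }
}
```

A webhook handler would call `shouldProcess` before any LLM work and acknowledge a duplicate immediately, so a platform retry costs no tokens.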
Token Savings Summary
Breakdown by Mechanism
Real-World Impact
Scenario: 1000 queries/day
Configuration & Tuning
Feature Flags
Tuning Batch Window
Smaller window (100ms):
- Pro: Faster responses (lower perceived latency)
- Con: Fewer messages collected, less batch efficiency
- Use: Real-time chat apps where speed matters
Larger window (1000ms):
- Pro: More messages collected (better batching)
- Con: Slower first response
- Use: Batch processing, background jobs
Optimal: 500ms
- Balance between speed and efficiency
- Most users will not notice a 500ms delay
- Collects 3-5 rapid messages typically
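The trade-off between window sizes can be explored with a small simulation. The helper below is a hypothetical sketch (`countBatches` is not a project function): it counts how many LLM calls a list of message arrival times would produce if every message arriving within `windowMs` of a batch's first message is combined into that batch.

```typescript
// Hypothetical helper for reasoning about window sizes; not project code.
function countBatches(timestampsMs: number[], windowMs: number): number {
  const sorted = timestampsMs.slice().sort((a, b) => a - b);
  let batches = 0;
  let windowStart = -Infinity; // start time of the current batch
  for (const t of sorted) {
    if (t - windowStart >= windowMs) {
      // This message falls outside the current window: start a new batch.
      batches += 1;
      windowStart = t;
    }
  }
  return batches;
}

// Example arrival times (ms) for a burst of five messages.
const arrivals = [0, 150, 300, 450, 900];
const callsAt100 = countBatches(arrivals, 100);
const callsAt500 = countBatches(arrivals, 500);
const callsAt1000 = countBatches(arrivals, 1000);
```

For this made-up burst, a 100ms window yields 5 calls, a 500ms window yields 2, and a 1000ms window yields 1. Running the same helper over recorded arrival times is one way to check whether 500ms suits a given workload.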
Monitoring Token Usage
Key Metrics to Track
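As a sketch of what tracking could look like, the function below summarises per-query records into three numbers that follow from the mechanisms in this guide: pattern hit rate, average tokens per processed query, and duplicates skipped. The record shape and all names (`QueryRecord`, `TokenMetrics`, `summarize`) are assumptions, not the project's metrics schema.

```typescript
// Hypothetical record and summary shapes; not the project's actual schema.
interface QueryRecord {
  classifiedBy: "pattern" | "llm";
  tokensUsed: number;
  deduplicated: boolean; // true if skipped as a platform retry
}

interface TokenMetrics {
  patternHitRate: number;
  averageTokensPerQuery: number;
  duplicatesSkipped: number;
}

function summarize(records: QueryRecord[]): TokenMetrics {
  // Deduplicated requests never reached the classifier, so exclude them.
  const processed = records.filter((r) => !r.deduplicated);
  const patternHits = processed.filter((r) => r.classifiedBy === "pattern").length;
  const totalTokens = processed.reduce((sum, r) => sum + r.tokensUsed, 0);
  const n = processed.length;
  return {
    patternHitRate: n === 0 ? 0 : patternHits / n,
    averageTokensPerQuery: n === 0 ? 0 : totalTokens / n,
    duplicatesSkipped: records.length - n,
  };
}
```

A falling pattern hit rate or a rising average would suggest that the patterns need updating.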
Dashboard Insights
Best Practices
1. Keep Patterns Updated
2. Monitor Misclassifications
3. Batch Window Tuning
4. Deduplication Effectiveness
Testing Token Savings
Unit Tests
Load Testing
Conclusion
The hybrid router architecture achieves ~75% token reduction through:
- Pattern matching (80% instant, 0 tokens)
- Dual-batch queuing (55% overhead reduction)
- Simple agent routing (40% less planning)
- Request deduplication (5-10% retry skipping)
- Smart heartbeat (message edits, not sends)
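A 75% reduction means using about a quarter of the tokens, which makes duyetbot-agent roughly 4x more efficient than a naive approach that sends every query through the full LLM pipeline.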
Next Steps:
- Enable Claude Prompt Caching (25% additional savings)
- Add semantic caching for frequent queries
- Monitor daily token usage with dashboards
- Fine-tune patterns based on user data
- Consider upgrading the model when cost/performance warrants it
References
- Architecture: docs/architecture.md
- Interactive Dashboard: docs/multiagent-flows.html
- Implementation: packages/cloudflare-agent/src/
- Tests: packages/cloudflare-agent/test/