Centralized Data Monitoring Architecture
Design document for the centralized data monitoring system using SQL views
Centralized Data Monitoring Architecture
Overview
This document outlines the refactored data monitoring architecture that centralizes all data in two source-of-truth tables and uses SQL views for analytics.
Problem Statement
The current schema has data duplication:
| Table | Purpose | Duplication Issue |
|---|---|---|
observability_events | Webhook events + agent chain | ✅ Source of truth |
chat_messages | Simple message storage | ✅ Source of truth |
analytics_messages | Rich message metadata | ❌ Duplicates chat_messages |
analytics_agent_steps | Agent execution traces | ❌ Duplicates observability_events.agents JSON |
analytics_conversations | Conversation metadata | ❌ Denormalized from messages |
analytics_token_aggregates | Pre-computed aggregates | ❌ Duplicates computed data |
Issues with current design:
- Data inconsistency risk between source and analytics tables
- Double storage cost
- Sync logic complexity
- Potential data loss if sync fails
Proposed Solution
Single Source of Truth
Keep only two tables with ALL necessary fields:
Migration Strategy
Phase 1: Enhance Source Tables
Enhance chat_messages (0002) with analytics fields:
Phase 2: Create Views Instead of Tables
Replace analytics_messages with a view:
Replace analytics_agent_steps with a view:
Replace analytics_conversations with a view:
Replace analytics_token_aggregates with views:
Phase 3: Update Storage Classes
The storage classes in packages/analytics/ need to be updated to query from views instead of tables:
Phase 4: Drop Deprecated Tables
After migration is complete and views are working:
Data Flow
Write Path (No Change to Sources)
Read Path (Views Instead of Tables)
Schema Changes Summary
Tables to Keep (Enhanced)
| Table | Changes |
|---|---|
observability_events | No changes (already complete) |
chat_messages | Add: message_id, platform, user_id, visibility, is_archived, cached_tokens, reasoning_tokens, model, metadata |
analytics_cost_config | Keep as-is (configuration, not data) |
Tables to Replace with Views
| Table | Replacement View |
|---|---|
analytics_messages | analytics_messages_view |
analytics_agent_steps | analytics_agent_steps_view |
analytics_conversations | analytics_conversations_view |
analytics_token_aggregates | analytics_user_daily_view, analytics_platform_daily_view, etc. |
Benefits
- Single Source of Truth: All data in
observability_eventsandchat_messages - No Data Loss: Source tables are never deleted
- Always Consistent: Views compute from source data
- Reduced Storage: No duplicate data
- Simpler Sync: No background jobs to sync tables
- Faster Writes: Insert only to source tables
Performance Considerations
View Performance
Views are computed on-read, which may be slower than pre-computed tables. Mitigations:
- Indexes on source tables: Ensure proper indexes exist
- Materialized views: D1 doesn't support materialized views, but we can use scheduled workers to refresh aggregate views periodically
- Caching: Use Cloudflare KV or Workers cache for frequently-accessed aggregates
Recommended Indexes
Migration Plan
- Week 1: Create migration
0007_enhance_chat_messages.sqlwith new columns - Week 1: Backfill data from
analytics_messagestochat_messages - Week 2: Create views in
0008_analytics_views.sql - Week 2: Update storage classes to use views
- Week 3: Test thoroughly
- Week 3: Create
0009_drop_analytics_tables.sql(run after verification)
Rollback Plan
If issues arise:
- Views can be dropped without data loss
- Original tables can be recreated from source data
- Storage classes can be reverted to query original tables
Implementation Status (Updated: 2025-12-08)
Completed Migrations
| Migration | Status | Description |
|---|---|---|
0007_enhance_chat_messages.sql | ✅ Applied | Added columns: message_id, platform, user_id, visibility, etc. |
0008_centralized_analytics_views.sql | ✅ Applied | Created views: analytics_messages_view, analytics_conversations_view, etc. |
0009_drop_analytics_tables.sql | ⏳ Pending | Will drop deprecated tables after storage class updates |
Verified Views
| View | Count | Source |
|---|---|---|
analytics_messages_view | 24 rows | chat_messages + observability_events |
analytics_conversations_view | 3 rows | Aggregated from chat_messages |
analytics_agent_steps_view | 0 rows | Extracted from observability_events.agents JSON |
Remaining Tasks
- Create migration
0007_enhance_chat_messages.sql - Create migration
0008_centralized_analytics_views.sql - Update
packages/analytics/src/storage/to use viewsmessage-storage.ts: Writes tochat_messages, reads fromanalytics_messagesviewagent-step-storage.ts: UpdatesagentsJSON inobservability_events, reads from viewconversation-storage.ts: Writes tochat_messages, reads fromanalytics_conversationsviewaggregate-storage.ts: Computes aggregates on-demand fromchat_messages
- Test dashboard with new views (TypeScript compiles, API routes use storage classes correctly)
- Create migration
0009_drop_analytics_tables.sql - Apply migration 0009 after deployment verification
- Deploy and monitor