🎯 The Architectural Challenge
🔴 The Core Problem
Context: PeopleWorksGPT is evolving from simple grid/pivot output to rich conversational experiences like those of Claude, ChatGPT, and Gemini.
Current Flow:
- Step 1 (Data Fetch): User question → Schema + Hints → SQL Generation → Execute → ExpandedObject
- Step 2 (Rich Output): Data + New Prompt → LLM → Enriched text with charts, explanations, insights
The Decision Point: When the user asks a follow-up question, how do we determine:
- ❓ Execute Step 1 again? (New data query needed)
- ❓ Continue with existing data? (Step 2 only with current context)
💡 The Recommended Solution: Hybrid Intent Classifier
✅ Multi-Layer Routing Architecture
A three-tier decision system combining heuristics, context analysis, and LLM classification:
- Tier 1 - Fast Heuristics: Pattern matching for obvious cases (roughly 70% of scenarios)
- Tier 2 - Context Analysis: Evaluate conversation state and data availability
- Tier 3 - LLM Classifier: Intelligent routing for ambiguous cases
```mermaid
graph TD
    A[User Follow-up Question] --> B{"Tier 1: Fast Heuristics"}
    B -->|Contains Data Keywords| C[NEW_DATA Required]
    B -->|Contains Visualization Keywords| D[CONTINUE with Data]
    B -->|Ambiguous| E{"Tier 2: Context Analysis"}
    E -->|Data Still Valid| F{Covers Current Scope}
    E -->|Data Expired/Invalid| C
    F -->|Yes Within Scope| D
    F -->|No Different Scope| C
    F -->|Uncertain| G{"Tier 3: LLM Classifier"}
    G -->|Analyze Intent| H[Classification Prompt]
    H --> I{Decision}
    I -->|NEW_DATA| C
    I -->|CONTINUE| D
    C --> J["Execute Step 1: SQL Generation"]
    D --> K["Execute Step 2: Rich Output Only"]
    J --> L[Return with Fresh Data]
    K --> M[Return with Existing Data]
    style C fill:#fc8181,stroke:#e53e3e,stroke-width:3px
    style D fill:#9ae6b4,stroke:#38a169,stroke-width:3px
    style G fill:#fef08a,stroke:#d69e2e,stroke-width:3px
    style J fill:#fecaca,stroke:#f56565,stroke-width:2px
    style K fill:#bbf7d0,stroke:#48bb78,stroke-width:2px
```
🚀 Tier 1: Fast Heuristic Patterns (< 10ms)
Pattern Detection Rules
| Pattern Type | Keywords/Indicators | Decision | Confidence | Example Questions |
|---|---|---|---|---|
| NEW_DATA Signals | "show", "get", "fetch" • "who", "what", "when", "where" • "find", "search", "list" • "compare", "analyze" • entity names (tables/columns) | NEW_DATA | 95% | "Show me sales from 2024" • "What are the top 10 customers?" • "Get employees in Marketing" |
| CONTINUE Signals | "explain", "why", "how" • "visualize", "chart", "graph" • "format as", "create a" • "summarize", "interpret" • "this data", "these results" | CONTINUE | 92% | "Explain why this happened" • "Create a pie chart with this" • "Summarize these findings" |
| Temporal Changes | Different time period • "now show", "what about" • "instead of", "rather than" | NEW_DATA | 88% | "Now show 2023 instead" • "What about last quarter?" • "Change date range to May" |
| Drill-Down Requests | "more details on" • "breakdown of", "drill into" • specific record reference | NEW_DATA | 75% | "Show details for customer #123" • "Drill into that department" • "Breakdown by region" |
| Refinement Requests | "also include", "add" • "filter by", "exclude" • "sort by", "order by" | DEPENDS | 60% | "Also include inactive users" • "Sort by revenue DESC" • "Filter out cancelled orders" |
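A minimal sketch of Tier 1 in Python, using keyword sets distilled from the table above. The function name `tier1_classify` and the exact regexes are illustrative; production rules would also match schema entity names.

```python
import re

# Keyword sets distilled from the pattern table (illustrative, not exhaustive).
NEW_DATA_RE = re.compile(
    r"\b(show|get|fetch|find|search|list|compare|analyze"
    r"|who|what|when|where)\b", re.IGNORECASE)
CONTINUE_RE = re.compile(
    r"\b(explain|why|how|visualize|chart|graph|summarize|interpret)\b"
    r"|format as|create a|this data|these results", re.IGNORECASE)

def tier1_classify(question: str) -> tuple[str, float]:
    """Fast heuristic pass: return (decision, confidence).

    'UNCERTAIN' means neither or both keyword sets matched;
    the question then escalates to Tier 2.
    """
    new_hit = bool(NEW_DATA_RE.search(question))
    cont_hit = bool(CONTINUE_RE.search(question))
    if new_hit and not cont_hit:
        return "NEW_DATA", 0.95
    if cont_hit and not new_hit:
        return "CONTINUE", 0.92
    return "UNCERTAIN", 0.0
```

Because both regexes run in microseconds, this tier stays comfortably inside the < 10ms budget.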
🔍 Tier 2: Context Analysis (< 50ms)
Conversational State Evaluation
📊 Data Context State
Maintain a conversation state object tracking:
- Last Query Metadata: Original question, generated SQL, execution timestamp
- Data Scope: Tables queried, date ranges, filters applied
- Result Summary: Row count, columns returned, data freshness
- User Intent History: Last 3-5 questions to understand trajectory
🌳 Context Decision Tree
- Is data expired? (> 15 min) → NEW_DATA
- Does question reference new entities? → NEW_DATA
- Are date/time filters different? → NEW_DATA
- Is it a visualization/formatting request? → CONTINUE
- Is it a clarification/explanation? → CONTINUE
- Still uncertain? → Escalate to Tier 3
🧠 Tier 3: LLM Intent Classifier (< 200ms)
The Classification Prompt
When Tiers 1 & 2 are uncertain, delegate to the LLM with a specialized classification prompt that returns structured output:
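Such a prompt might look like the following sketch; the wording and JSON schema are illustrative, not the production prompt.

```python
# Hypothetical Tier 3 classification prompt template.
CLASSIFICATION_PROMPT = """\
You are a routing classifier for a data-analytics assistant.

Previous question: {previous_question}
Data currently loaded: tables={tables}, filters={filters}, rows={row_count}
Follow-up question: {question}

Decide whether answering the follow-up requires running a NEW SQL query
(NEW_DATA) or can be answered from the data already loaded (CONTINUE).

Respond with JSON only:
{{"decision": "NEW_DATA" | "CONTINUE", "confidence": 0.0-1.0, "reason": "..."}}
"""

def build_prompt(previous_question: str, question: str,
                 tables: list[str], filters: dict, row_count: int) -> str:
    """Fill the template; the result goes to a fast model as a single turn."""
    return CLASSIFICATION_PROMPT.format(
        previous_question=previous_question, question=question,
        tables=tables, filters=filters, row_count=row_count)
```

Requesting JSON-only output keeps parsing trivial and makes a malformed response easy to detect (and to fall back from, defaulting to NEW_DATA).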
```mermaid
sequenceDiagram
    participant User
    participant Router
    participant Tier1 as Tier 1
    participant Tier2 as Tier 2
    participant LLM as Tier 3 LLM
    participant Engine
    User->>Router: Follow-up Question
    Router->>Tier1: Fast Heuristics
    alt Clear Pattern Match
        Tier1-->>Router: NEW_DATA or CONTINUE
        Router->>Engine: Execute Decision
    else Uncertain
        Router->>Tier2: Context Analysis
        alt Context Determines
            Tier2-->>Router: NEW_DATA or CONTINUE
            Router->>Engine: Execute Decision
        else Still Uncertain
            Router->>LLM: Classification Prompt
            LLM-->>Router: JSON Response
            Router->>Engine: Execute Decision
        end
    end
    Engine-->>User: Result
```
⚡ Performance Optimization
- Caching: Cache classification decisions for similar questions (embedding similarity)
- Parallel Processing: Run Tier 2 and Tier 3 concurrently and use whichever returns first
- Model Selection: Use fast model (Claude Haiku, GPT-4o-mini) for classification
- Streaming: Start response generation while classification completes
🛠️ Complete Implementation Strategy
Step 1: Build State Manager
Create a ConversationStateManager that persists:
- Last N queries with metadata
- Current data context (tables, filters, results)
- User interaction history
Step 2: Implement Routing Pipeline
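The pipeline itself reduces to a short cascade. This sketch takes each tier as a callable returning "NEW_DATA", "CONTINUE", or "UNCERTAIN"; the `route` name and signature are illustrative.

```python
from typing import Callable

Classifier = Callable[[str], str]  # question -> decision string

def route(question: str,
          tier1: Classifier,
          tier2: Classifier,
          tier3: Classifier) -> str:
    """Cascade through the tiers; the first definite answer wins."""
    for tier in (tier1, tier2):
        decision = tier(question)
        if decision != "UNCERTAIN":
            return decision
    try:
        return tier3(question)   # LLM classifier for ambiguous cases
    except Exception:
        return "NEW_DATA"        # timeout/failure: the safer default
```

Wrapping Tier 3 in a try/except implements the edge-case rule below: any classifier failure degrades to re-querying rather than answering from possibly wrong data.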
Step 3: Add Telemetry & Learning
Track routing decisions to improve over time:
- Log decision accuracy (did user correct us?)
- Measure latency per tier
- Identify patterns that should move from Tier 3 → Tier 1
- A/B test threshold values
Step 4: Handle Edge Cases
- Ambiguous Refinements: "Add region column" → Try CONTINUE first, fallback to NEW_DATA if data doesn't support it
- User Corrections: "No, I meant..." → Always NEW_DATA
- Multi-Intent Questions: "Show sales AND explain the trend" → NEW_DATA (primary intent)
- Timeout/Failure: Default to NEW_DATA (safer choice)
📊 Strategy Comparison: Alternative Approaches
| Strategy | Accuracy | Latency | Cost | Complexity | Pros | Cons |
|---|---|---|---|---|---|---|
| Always NEW_DATA | 100% | High | High | Low | Simple, no errors | Wasteful, slow, expensive |
| Heuristics Only | 65-75% | < 10ms | Free | Low | Fast, cheap | Miss nuanced cases |
| LLM Only | 90-95% | 150-300ms | $0.002/call | Medium | Accurate, flexible | Slower, costs add up |
| Hybrid (Recommended) | 88-93% | 20-100ms | $0.0003/call avg | Medium | Best balance of speed, accuracy, cost | More code to maintain |
| User Selection | 100% | 0ms | Free | Low | Perfect accuracy, user control | Requires UI elements, friction |
✅ Hybrid Approach Advantages
- Performance: 70% of requests route in < 10ms via heuristics
- Accuracy: 88-93% correct routing with graceful degradation
- Cost-Effective: LLM calls only for 15-20% of difficult cases
- Self-Improving: Telemetry identifies patterns → migrate to heuristics
- User Experience: Fast responses without decision friction
❌ Potential Challenges
- Complexity: Three-tier system requires more engineering
- Maintenance: Heuristic rules need periodic updates
- Edge Cases: 7-12% misclassifications need fallback handling
- State Management: Must persist conversation context reliably
- Testing: Requires comprehensive test coverage across tiers
🎨 User Experience Enhancement Options
Option 1: Invisible Routing (Recommended)
System makes decision transparently. User never sees routing logic.
- ✅ Seamless conversational flow
- ✅ No cognitive load on user
- ❌ Occasional misrouting (handle with "Not what you meant? Try rephrasing")
Option 2: Smart Suggestions
When uncertain, show user quick-action buttons:
- ✅ User has final say
- ✅ Doubles as feedback mechanism
- ❌ Adds UI friction for ambiguous cases
Option 3: Hybrid with Override
Auto-route but provide subtle "Using current data" indicator with option to refresh:
- ✅ Best of both worlds
- ✅ Transparency without friction
- ✅ Easy corrections
🎯 Real-World Routing Scenarios
| Conversation Flow | Routing Decision | Reasoning | Tier Used |
|---|---|---|---|
| Q1: "Show 2024 sales by region" → Q2: "Create a bar chart" | CONTINUE | Visualization request, data still valid | Tier 1 (Heuristic) |
| Q1: "Show top 10 customers" → Q2: "Now show bottom 10" | NEW_DATA | Different data subset required | Tier 1 (Heuristic) |
| Q1: "Analyze Q3 performance" → Q2: "Why did Region A underperform?" | CONTINUE | Explanation within existing scope | Tier 2 (Context) |
| Q1: "Show employee count" → Q2: "Also include their salaries" | NEW_DATA | Additional columns = new SQL needed | Tier 2 (Context) |
| Q1: "Show this month's orders" → Q2: "Compare with last month" | NEW_DATA | New time range = new query | Tier 1 (Heuristic) |
| Q1: "Get product inventory" → Q2: "How is this calculated?" | CONTINUE | Meta-question about data/process | Tier 1 (Heuristic) |
| Q1: "Show sales performance" → Q2: "Break down by product category" | AMBIGUOUS | Could be in current data OR need new query | Tier 3 (LLM) |
🏆 Final Recommendation: Phased Implementation
```mermaid
gantt
    title PeopleWorksGPT Routing Implementation Roadmap
    dateFormat YYYY-MM-DD
    section Phase 1 Foundation
    State Manager Heuristics   :done, p1, 2025-01-01, 2w
    Basic Routing Pipeline     :done, p2, after p1, 1w
    Testing Infrastructure     :active, p3, after p2, 1w
    section Phase 2 Intelligence
    Context Analysis           :p4, after p3, 2w
    LLM Classifier Integration :p5, after p4, 1w
    Performance Optimization   :p6, after p5, 1w
    section Phase 3 Refinement
    Telemetry Analytics        :p7, after p6, 1w
    User Feedback Loop         :p8, after p7, 2w
    AB Testing Framework       :p9, after p8, 1w
    section Phase 4 Scale
    Pattern Learning System    :p10, after p9, 2w
    Cost Optimization          :p11, after p10, 1w
    Production Launch          :milestone, p12, after p11, 0d
```
✅ Recommended Implementation Path
Phase 1: MVP (Week 1-2)
- ✅ Implement Tier 1 heuristics only
- ✅ Default to NEW_DATA for uncertain cases
- ✅ Add basic state management
- ✅ Deploy with telemetry
Phase 2: Intelligence (Week 3-5)
- ✅ Add Tier 2 context analysis
- ✅ Integrate LLM classifier (Tier 3)
- ✅ Optimize heuristic rules based on data
- ✅ A/B test against always-NEW_DATA baseline
Phase 3: Optimization (Week 6-8)
- ✅ Implement caching and parallel processing
- ✅ Add user override options (smart suggestions)
- ✅ Build automated pattern learning
- ✅ Cost and performance optimization
📊 Success Metrics
- Accuracy: ≥ 85% correct routing decisions
- Performance: Average routing latency < 100ms
- Cost: < $0.0005 per conversation on average
- User Satisfaction: < 10% regeneration rate
- Coverage: ≥ 70% routed via Tier 1 (heuristics)