Live Agentic Framework Architecture
Stateful Orchestration for Real-Time Meeting Intelligence
End-to-End Architecture Comparison
Two competing approaches to live meeting intelligence: the Stateless Observer (current) vs. the Stateful Orchestrator (proposed).
The "Live Agentic Framework" (Stateful): The "Moat"
Architecture Flow
ASR → Flink (Meeting Memory) → Orchestrator (LLM) → Agent Registry → Webex Client
The Orchestrator is the single consumer; adding a new agent just requires registering it in the registry. The data flow doesn't change.
Key Architecture Properties
- "Compute-to-Data" Model: The Orchestrator (LLM) sits directly next to the Meeting Memory (Flink). It reasons over the entire context in-memory without making network calls to a database.
- Single Source of Truth: The "Agentic Meeting Memory" handles ASR corrections natively. If the ASR changes "Site-level" to "Org-level," the memory updates instantly before the Orchestrator sees it.
- O(1) Complexity: Adding 50 new agents adds zero additional load to the database or the ASR stream. The Orchestrator filters intents once and dispatches only when necessary.
- Real-Time Latency: < 2 seconds end-to-end, designed for real-time intervention during live meetings.
The Agentic Meeting Memory + Orchestrator operate as a unified in-memory reasoning engine.
Live Agent Registry allows adding new agents with zero infrastructure changes.
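The registry pattern can be sketched in a few lines of Python. This is an illustrative minimal version, not the PoC's actual API: `register_agent`, `dispatch`, and the intent/agent names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical sketch of the Live Agent Registry: the Orchestrator
# filters intents once, then dispatches only to registered handlers.
AGENT_REGISTRY: Dict[str, Callable[[dict], str]] = {}

def register_agent(intent: str):
    """Register a handler for an intent; nothing else in the data flow changes."""
    def decorator(fn: Callable[[dict], str]):
        AGENT_REGISTRY[intent] = fn
        return fn
    return decorator

@register_agent("poll_suggestion")
def poll_agent(context: dict) -> str:
    # Illustrative agent: turns a detected debate into a poll suggestion.
    return f"Suggest poll: {context['topic']}"

def dispatch(detected_intents: List[str], context: dict) -> List[str]:
    # Constant work per detected intent: adding 50 more agents adds
    # no load to the ASR stream or any database.
    return [AGENT_REGISTRY[i](context) for i in detected_intents if i in AGENT_REGISTRY]
```

Registering a new agent is one decorator; the Orchestrator's dispatch loop is untouched.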
The "Observer" Architecture (Stateless): The "Trap"
Architecture Flow
ASR → AI Bridge → DB → Intent Detection → Polling Agent → DB (Read) → DB (Write) → Client
Every new agent requires its own detection logic, its own database read to get context, and its own database write.
Critical Problems
- "Thundering Herd" Problem: Every agent (Polling, Jira, Scheduling) must independently query the "Transcript DB" for context. If you have 10 agents, you have 10x the database load for every sentence spoken.
- Race Conditions: The "Polling Agent" reads from the DB while the "AI Bridge" is writing to it. If an ASR correction happens (e.g., "Wait, don't launch the poll"), the Agent might read the old, wrong text and launch it anyway.
- High Latency: The multi-hop chain (Bridge → API → Agent → DB) introduces 10-15 seconds of lag, making "real-time" intervention impossible.
- Siloed Logic: The "Polling Intent detection" is hardcoded. Adding a "Jira Agent" requires building a whole new detection pipeline, duplicating effort.
Multiple round-trips to Media Backend/DB on every agent decision.
Polling Agent cannot see ASR corrections, so it may act on stale/wrong data.
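The load asymmetry is easy to quantify with a toy simulation (illustrative Python, not PoC code): in the observer model, transcript-DB reads grow multiplicatively with agent count, because each agent fetches its own context for every sentence.

```python
class CountingDB:
    """Toy stand-in for the Transcript DB that counts read traffic."""
    def __init__(self):
        self.reads = 0

    def fetch_context(self, meeting_id: str) -> str:
        self.reads += 1
        return ""  # context payload omitted in this sketch

def stateless_pipeline(db: CountingDB, num_agents: int, num_sentences: int) -> None:
    # Observer model: every agent independently re-reads the DB per sentence.
    for _ in range(num_sentences):
        for _ in range(num_agents):
            db.fetch_context("m1")

db = CountingDB()
stateless_pipeline(db, num_agents=10, num_sentences=100)
print(db.reads)  # 10 agents x 100 sentences = 1000 reads
```

The stateful orchestrator performs zero such reads: context lives in Flink keyed state, so agent count never touches the database.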
Architecture Comparison
This PoC: Live Implementation
This demo implements the stateful architecture end-to-end:
┌──────────────┐     ┌───────────────┐     ┌─────────────────────────────────────┐
│   Browser    │────▶│ Google Cloud  │────▶│            Apache Flink             │
│  Transcript  │     │    Pub/Sub    │     │  ┌───────────────────────────────┐  │
│  Generator   │     │ (Message Bus) │     │  │     StatefulNoteProcessor     │  │
└──────────────┘     └───────────────┘     │  │  • keyed by meeting_id        │  │
                                           │  │  • stores last N chunks       │  │
                                           │  │  • maintains topic/intent     │  │
                                           │  │  • manages MoM state          │  │
                                           │  └───────────────┬───────────────┘  │
                                           │                  │                  │
                                           │          ┌───────▼───────┐          │
                                           │          │   LLM Proxy   │          │
                                           │          │  (GPT-4/etc)  │          │
                                           │          └───────┬───────┘          │
                                           └──────────────────┼──────────────────┘
                                                              │
                                                      ┌───────▼───────┐
                                                      │  Centrifugo   │
                                                      │  (WebSocket)  │
                                                      └───────┬───────┘
                                                              │
                                                      ┌───────▼───────┐
                                                      │    Browser    │
                                                      │   (This UI)   │
                                                      └───────────────┘
Transcript Generator
Browser + Node.js
- Simulates live ASR stream
- Uploads VTT/Webex transcripts
- Configurable playback speed
Pub/Sub
Google Cloud Pub/Sub
- Fully managed message bus
- At-least-once delivery
- Decouples producers from consumers
Flink Orchestrator
Apache Flink (Java)
- Keyed state per meeting_id
- In-memory context window
- Manages MoM, intents, topics
- Calls LLM for analysis
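The keyed-state pattern can be sketched in plain Python. This is a simplified stand-in for the PoC's Java Flink operator: the class name mirrors the diagram, but the fields and window size here are illustrative.

```python
from collections import defaultdict, deque

class StatefulNoteProcessor:
    """Plain-Python analogue of the Flink operator: all context is keyed
    by meeting_id and held in memory, so no DB round-trip per chunk.
    (Sketch only; the PoC implements this as a keyed Flink operator in Java.)"""

    def __init__(self, window_size: int = 5):
        # Keyed state: one bounded context window per meeting_id.
        self.windows = defaultdict(lambda: deque(maxlen=window_size))

    def process(self, meeting_id: str, chunk: str) -> dict:
        window = self.windows[meeting_id]
        window.append(chunk)  # stores the last N transcript chunks
        # In the PoC, this context is handed to the LLM proxy for analysis.
        return {"meeting_id": meeting_id, "context_window": list(window)}
```

Each meeting's state is isolated by key, which is what lets Flink parallelize meetings while keeping every lookup in-memory.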
LLM Proxy
Python Flask + GPT-4
- Configurable model selection
- Prompt management
- Token tracking & cost calc
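Token tracking and cost calculation can be as simple as the sketch below. The per-1K prices and the ~4-characters-per-token heuristic are illustrative placeholders, not real GPT-4 rates; in practice the proxy would read exact counts from the API's usage field.

```python
# Hypothetical pricing table; values are placeholders, not real rates.
PRICE_PER_1K = {"gpt-4": {"prompt": 0.03, "completion": 0.06}}

def estimate_cost(model: str, prompt: str, completion: str) -> float:
    """Rough cost estimate using a ~4 chars/token heuristic."""
    rates = PRICE_PER_1K[model]
    prompt_tokens = len(prompt) // 4
    completion_tokens = len(completion) // 4
    return (prompt_tokens * rates["prompt"]
            + completion_tokens * rates["completion"]) / 1000
```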
Centrifugo
WebSocket Server
- Per-meeting channels
- Real-time push to clients
- Sub-100ms delivery
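Pushing a result to a per-meeting channel is one call to Centrifugo's HTTP server API. The sketch below assumes a v5-style `/api/publish` endpoint with an `X-API-Key` header (older Centrifugo versions use a different endpoint shape), and the `meeting:<id>` channel naming is this sketch's convention, not necessarily the PoC's.

```python
import json
import urllib.request

def build_publish_request(base_url: str, api_key: str,
                          meeting_id: str, payload: dict) -> urllib.request.Request:
    """Build a Centrifugo server-API publish call for a per-meeting channel.
    Assumes the v5-style /api/publish endpoint; adjust for your version."""
    body = json.dumps({"channel": f"meeting:{meeting_id}", "data": payload}).encode()
    headers = {"Content-Type": "application/json", "X-API-Key": api_key}
    return urllib.request.Request(f"{base_url}/api/publish",
                                  data=body, headers=headers, method="POST")
```

Sending the request (`urllib.request.urlopen(req)`) delivers the payload to every browser subscribed to that meeting's channel.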
Observed Latency (This PoC)
LLM inference dominates. With faster models (SLMs), the total could drop to <500ms.
Intent Detection Capabilities
The orchestrator detects intents in real-time via a configurable Intent Registry:
Poll Suggestion
Detects debates/votes and suggests creating a poll
Scheduling
Identifies follow-up meeting requests
Knowledge Fetch
Detects questions about past decisions
Action Items
Captures commitments with owners/deadlines
Decisions
Logs key decisions made during meeting
Open Questions
Tracks unresolved questions for follow-up
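A configurable registry of this kind can be a plain data structure rendered into the LLM prompt. The field names and renderer below are an illustrative sketch, not the PoC's actual schema:

```python
# Hypothetical Intent Registry entries mirroring the six intents above;
# field names are illustrative, not the PoC's actual schema.
INTENT_REGISTRY = [
    {"intent": "poll_suggestion", "description": "Debate or vote detected; suggest creating a poll"},
    {"intent": "scheduling", "description": "Follow-up meeting requested"},
    {"intent": "knowledge_fetch", "description": "Question about a past decision"},
    {"intent": "action_item", "description": "Commitment with an owner/deadline"},
    {"intent": "decision", "description": "Key decision made during the meeting"},
    {"intent": "open_question", "description": "Unresolved question to follow up"},
]

def registry_prompt_section() -> str:
    """Render the registry as a bullet list for the LLM system prompt."""
    return "\n".join(f"- {e['intent']}: {e['description']}" for e in INTENT_REGISTRY)
```

Adding a new detectable intent is then a one-entry edit to the registry rather than a new detection pipeline.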
Reference Documents
Tech Stack
Prompt Editor & Tester
Edit prompts and test them with sample data to see the LLM response
Global LLM Model
Applies to ALL requests: this model is used for live meeting processing and all API calls. Changes are saved to the server.
Prompt Architecture
Using a single prompt with all 5 steps. Best for prompt refinement.
System Prompt
User Prompt Template
{previous_summary}, {previous_topics}, {previousIntents},
{context_window}, {current_chunk}
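Those placeholders are Python-style format fields, so filling the template is a plain `str.format` call. The surrounding label text in this sketch is illustrative (the PoC's actual template wording is not shown here); note the template mixes snake_case and camelCase names (`{previousIntents}`), which is reproduced as-is.

```python
# Illustrative user-prompt template; labels are this sketch's wording,
# but the placeholder names match the ones listed above.
TEMPLATE = (
    "Previous summary: {previous_summary}\n"
    "Previous topics: {previous_topics}\n"
    "Previous intents: {previousIntents}\n"
    "Context window: {context_window}\n"
    "Current chunk: {current_chunk}"
)

def render_user_prompt(**fields) -> str:
    """Fill the template; raises KeyError if any placeholder is missing."""
    return TEMPLATE.format(**fields)
```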
Test Sample Data
Edit sample data matching the prompt placeholders, then click "Test Prompt"
Click "Preview Request" or "Test Prompt" to see the JSON request body
LLM Response
Click "Test Prompt" to see the LLM response