

πŸ—οΈ Live Agentic Framework Architecture

Stateful Orchestration for Real-Time Meeting Intelligence

πŸ”„ End-to-End Architecture Comparison

Two competing approaches to live meeting intelligence: the Stateless Observer (current) versus the Stateful Orchestrator (proposed).

βœ… PROPOSED

The "Live Agentic Framework" (Stateful) β€” The "Moat"

Stateful Live Agentic Framework Architecture


3 Total Hops to Action
0 Hops per New Agent
<2s Latency
O(1) Scaling

πŸ“ Architecture Flow

ASR β†’ Flink (Meeting Memory) β†’ Orchestrator (LLM) β†’ Agent Registry β†’ Webex Client

The Orchestrator is the single consumer; adding a new agent just requires registering it in the registry. The data flow doesn't change.
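This registry pattern can be sketched in a few lines. The sketch below is illustrative, not the PoC's actual code: the class and intent names are assumptions, but it shows why adding an agent is a single `register()` call with no change to the data flow.

```python
# Hypothetical sketch of the Agent Registry idea: the Orchestrator is the
# single consumer; adding an agent is one register() call, not a new pipeline.
from typing import Callable, Dict, List

class AgentRegistry:
    def __init__(self) -> None:
        self._agents: Dict[str, Callable[[dict], None]] = {}

    def register(self, intent: str, handler: Callable[[dict], None]) -> None:
        # O(1): no new consumers, no new DB readers, just a dict entry.
        self._agents[intent] = handler

    def dispatch(self, detected: List[str], context: dict) -> List[str]:
        # The Orchestrator filters intents once and dispatches only when needed.
        fired = []
        for intent in detected:
            if intent in self._agents:
                self._agents[intent](context)
                fired.append(intent)
        return fired

registry = AgentRegistry()
registry.register("poll_suggestion", lambda ctx: print("launch poll"))
registry.register("scheduling", lambda ctx: print("propose meeting"))

# "small_talk" has no registered agent, so only "poll_suggestion" fires.
fired = registry.dispatch(["poll_suggestion", "small_talk"], {"meeting_id": "m1"})
```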

πŸ”‘ Key Architecture Properties

  • 🧠 "Compute-to-Data" Model: The Orchestrator (LLM) sits directly next to the Meeting Memory (Flink). It reasons over the entire context in-memory without making network calls to a database.
  • πŸ“ Single Source of Truth: The "Agentic Meeting Memory" handles ASR corrections natively. If the ASR changes "Site-level" to "Org-level," the memory updates instantly before the Orchestrator sees it.
  • πŸ“ˆ O(1) Complexity: Adding 50 new agents adds zero additional load to the database or the ASR stream. The Orchestrator filters intents once and dispatches only when necessary.
  • ⚑ Real-Time Latency: < 2 seconds end-to-end, designed for real-time intervention during live meetings.
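The "Single Source of Truth" property above hinges on last-write-wins semantics for transcript chunks. A minimal sketch of that "living document" behavior (class and field names are assumptions, not the PoC's schema): an ASR correction overwrites the original chunk in place, so the Orchestrator never sees the stale text.

```python
# Sketch (assumed semantics) of the living-document Meeting Memory: an ASR
# correction takes the same upsert path as a new chunk, so last write wins.
class MeetingMemory:
    def __init__(self, window: int = 50) -> None:
        self.window = window
        self.chunks: dict[int, str] = {}  # chunk_id -> latest text

    def upsert(self, chunk_id: int, text: str) -> None:
        self.chunks[chunk_id] = text
        # Keep only the most recent N chunks in memory.
        for old in sorted(self.chunks)[:-self.window]:
            del self.chunks[old]

    def context(self) -> str:
        # The Orchestrator reasons over this in-memory view; no DB round-trip.
        return " ".join(self.chunks[i] for i in sorted(self.chunks))

mem = MeetingMemory()
mem.upsert(1, "We will apply this at Site-level.")
mem.upsert(1, "We will apply this at Org-level.")  # ASR correction, same id
```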
🧠 "The Brain" (In-Memory State)

The Agentic Meeting Memory + Orchestrator operate as a unified in-memory reasoning engine.

πŸ”Œ Plug-and-Play Expansion

Live Agent Registry allows adding new agents with zero infrastructure changes.

❌ CURRENT

The "Observer" Architecture (Stateless) β€” The "Trap"

Stateless Observer Architecture



7+ Total Hops to Action
3-4 Hops per New Agent
10-15s Latency
O(n) Scaling

πŸ“ Architecture Flow

ASR β†’ AI Bridge β†’ DB β†’ Intent Detection β†’ Polling Agent β†’ DB (Read) β†’ DB (Write) β†’ Client

Every new agent requires its own detection logic, its own database read to get context, and its own database write.

⚠️ Critical Problems

  • 🐘 "Thundering Herd" Problem: Every agent (Polling, Jira, Scheduling) must independently query the "Transcript DB" for context. If you have 10 agents, you have 10x the database load for every sentence spoken.
  • πŸƒ Race Conditions: The "Polling Agent" reads from the DB while the "AI Bridge" is writing to it. If an ASR correction happens (e.g., "Wait, don't launch the poll"), the Agent might read the old, wrong text and launch it anyway.
  • 🐌 High Latency: The multi-hop chain (Bridge β†’ API β†’ Agent β†’ DB) introduces 10-15 seconds of lag, making "real-time" intervention impossible.
  • πŸ”’ Siloed Logic: The "Polling Intent detection" is hardcoded. Adding a "Jira Agent" requires building a whole new detection pipe, duplicating effort.
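A back-of-the-envelope illustration of the scaling problem described above. The per-agent read/write counts are simplifying assumptions, but they show why database load grows linearly with the agent count in the stateless design.

```python
# Rough model of the Thundering Herd cost: each agent issues its own context
# read and result write per spoken sentence, on top of the bridge's write.
def db_ops_per_sentence(agents: int, reads_per_agent: int = 1,
                        writes_per_agent: int = 1) -> int:
    bridge_write = 1  # AI Bridge persists the transcript chunk
    return bridge_write + agents * (reads_per_agent + writes_per_agent)

# Load grows linearly: O(n) in the number of agents.
one_agent = db_ops_per_sentence(1)    # 3 ops per sentence
ten_agents = db_ops_per_sentence(10)  # 21 ops per sentence
```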
⚠️ Latency Bottleneck (Network I/O)

Multiple round-trips to Media Backend/DB on every agent decision.

πŸ‘οΈ Stateless Blindness

Polling Agent cannot see ASR corrections β€” may act on stale/wrong data.

πŸ“Š Architecture Comparison

| Feature | Current Architecture (Stateless) | Proposed Framework (Stateful) |
| --- | --- | --- |
| Data Model | Message-based (Append Only) | Stream-based (Living Document) |
| Context Handling | Fetch from DB (Slow) | In-Memory (Instant) |
| ASR Corrections | Fails (Context Pollution) | Succeeds (Native Updates) |
| Scaling Cost | Linear (More Agents = More DB Load) | Constant (More Agents = Same Load) |
| Latency | 10-15 Seconds | < 2 Seconds |
| Total Hops to Action | 7+ Hops | 3 Hops |
| Hops per New Agent | 3-4 Hops (new detection pipe) | 0 Hops (registry only) |

πŸ”¬ This PoC: Live Implementation

This demo implements the stateful architecture end-to-end:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Browser    │────▢│ Google Cloud  │────▢│        Apache Flink                 β”‚
β”‚ Transcript   β”‚     β”‚   Pub/Sub     β”‚     β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  Generator   β”‚     β”‚  (Message Bus)β”‚     β”‚  β”‚  StatefulNoteProcessor        β”‚  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚  β”‚  β€’ keyed by meeting_id        β”‚  β”‚
                                           β”‚  β”‚  β€’ stores last N chunks       β”‚  β”‚
                                           β”‚  β”‚  β€’ maintains topic/intent     β”‚  β”‚
                                           β”‚  β”‚  β€’ manages MoM state          β”‚  β”‚
                                           β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
                                           β”‚                  β”‚                  β”‚
                                           β”‚          β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”          β”‚
                                           β”‚          β”‚  LLM Proxy    β”‚          β”‚
                                           β”‚          β”‚  (GPT-4/etc)  β”‚          β”‚
                                           β”‚          β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚
                                           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                              β”‚
                                                      β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”
                                                      β”‚  Centrifugo   β”‚
                                                      β”‚  (WebSocket)  β”‚
                                                      β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                                                              β”‚
                                                      β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”
                                                      β”‚    Browser    β”‚
                                                      β”‚    (This UI)  β”‚
                                                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
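The real StatefulNoteProcessor in this PoC is a Flink job in Java; the Python sketch below only mimics its keyed-state shape (names and window size are assumptions). Each `meeting_id` gets an isolated state entry, the way Flink keyed state isolates state per key.

```python
# Python sketch of the StatefulNoteProcessor above; `state` stands in for
# Flink keyed state: one isolated store per meeting_id.
from collections import defaultdict, deque

class StatefulNoteProcessor:
    def __init__(self, window: int = 10) -> None:
        # Keyed state: each meeting gets its own chunk window, MoM, intents.
        self.state = defaultdict(lambda: {"chunks": deque(maxlen=window),
                                          "mom": "", "intents": []})

    def process(self, meeting_id: str, chunk: str) -> dict:
        s = self.state[meeting_id]
        s["chunks"].append(chunk)  # bounded window: stores last N chunks
        # In the PoC, the context window is sent to the LLM proxy at this
        # point; this sketch just returns it.
        return {"meeting_id": meeting_id, "context": list(s["chunks"])}

p = StatefulNoteProcessor(window=2)
p.process("m1", "hello")
p.process("m2", "other meeting")
out = p.process("m1", "world")  # m2's state is untouched
```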
              

πŸ“ Transcript Generator

Browser + Node.js

  • Simulates live ASR stream
  • Uploads VTT/Webex transcripts
  • Configurable playback speed

πŸ“¬ Pub/Sub

Google Cloud Pub/Sub

  • Fully managed message bus
  • At-least-once delivery
  • Decouples producers from consumers

⚑ Flink Orchestrator

Apache Flink (Java)

  • Keyed state per meeting_id
  • In-memory context window
  • Manages MoM, intents, topics
  • Calls LLM for analysis

πŸ€– LLM Proxy

Python Flask + GPT-4

  • Configurable model selection
  • Prompt management
  • Token tracking & cost calc
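The proxy's token tracking and cost calculation can be sketched as below. The per-token prices are placeholder assumptions for illustration, not real GPT-4 rates; the PoC's actual pricing table may differ.

```python
# Sketch of per-request cost accounting (prices are assumed placeholders).
PRICES = {"gpt-4": {"in": 0.03, "out": 0.06}}  # $ per 1K tokens (assumed)

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    p = PRICES[model]
    return round(prompt_tokens / 1000 * p["in"]
                 + completion_tokens / 1000 * p["out"], 6)

cost = request_cost("gpt-4", 1000, 500)  # 0.03 in + 0.03 out
```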

πŸ“‘ Centrifugo

WebSocket Server

  • Per-meeting channels
  • Real-time push to clients
  • Sub-100ms delivery
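Pushing a result to a per-meeting channel would go through Centrifugo's server HTTP API. The sketch below only builds the request shape (v4+ style: `POST /api/publish` with an `X-API-Key` header); verify the endpoint and auth header against your Centrifugo version, and the channel naming is an assumption.

```python
# Hypothetical request builder for Centrifugo's publish API; hostname,
# channel prefix, and API-key header are assumptions to verify.
import json

def build_publish_request(meeting_id: str, payload: dict,
                          api_key: str = "<key>") -> dict:
    return {
        "url": "http://centrifugo:8000/api/publish",
        "headers": {"X-API-Key": api_key,
                    "Content-Type": "application/json"},
        # One channel per meeting, so clients only receive their own events.
        "body": json.dumps({"channel": f"meeting:{meeting_id}",
                            "data": payload}),
    }

req = build_publish_request("m1", {"intent": "poll_suggestion"})
```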

⏱️ Observed Latency (This PoC)

Ingest (Browser β†’ Pub/Sub) ~50-100ms
Queue (Pub/Sub β†’ Flink) ~100-200ms
LLM Inference (GPT-4) ~500-2000ms
Push (Centrifugo β†’ Browser) ~50-100ms
Total E2E ~700-2500ms

πŸ’‘ LLM inference dominates. With faster models (SLM), total could drop to <500ms.
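Summing the stage budgets above sanity-checks the quoted end-to-end range (the upper bound is rounded up in the table) and makes the "LLM dominates" point concrete:

```python
# Stage latency budgets from the table above, in milliseconds.
stages = {"ingest": (50, 100), "queue": (100, 200),
          "llm_inference": (500, 2000), "push": (50, 100)}

low = sum(a for a, _ in stages.values())   # 700 ms
high = sum(b for _, b in stages.values())  # 2400 ms, quoted as ~2500

# LLM inference accounts for the large majority of the worst case.
llm_share = stages["llm_inference"][1] / high
```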

🎯 Intent Detection Capabilities

The orchestrator detects intents in real-time via a configurable Intent Registry:

πŸ“Š

Poll Suggestion

Detects debates/votes and suggests creating a poll

πŸ“…

Scheduling

Identifies follow-up meeting requests

πŸ”

Knowledge Fetch

Detects questions about past decisions

βœ…

Action Items

Captures commitments with owners/deadlines

βš–οΈ

Decisions

Logs key decisions made during meeting

❓

Open Questions

Tracks unresolved questions for follow-up
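One plausible shape for the configurable Intent Registry driving the six detectors above (the ids, descriptions, and rendering are assumptions, not the PoC's actual config): each entry pairs an intent id with the description the LLM matches against, so adding a detector is a config change, not code.

```python
# Hypothetical config-driven intent registry; rendered into the system
# prompt so detection logic is never hardcoded per agent.
INTENT_REGISTRY = [
    {"id": "poll_suggestion", "desc": "Debate or vote detected; suggest a poll"},
    {"id": "scheduling",      "desc": "Follow-up meeting requested"},
    {"id": "knowledge_fetch", "desc": "Question about a past decision"},
    {"id": "action_item",     "desc": "Commitment with an owner or deadline"},
    {"id": "decision",        "desc": "Key decision made during the meeting"},
    {"id": "open_question",   "desc": "Unresolved question to follow up"},
]

def registry_prompt() -> str:
    # One bullet per intent, injected into the LLM's instructions.
    return "\n".join(f"- {i['id']}: {i['desc']}" for i in INTENT_REGISTRY)
```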

πŸ“š Reference Documents

πŸ› οΈ Tech Stack

Kubernetes (GKE) Β· Apache Flink Β· Google Pub/Sub Β· Centrifugo Β· OpenAI GPT-4 Β· Java 17 Β· TypeScript Β· Python/Flask Β· Helm Charts

✏️ Prompt Editor & Tester

Edit prompts and test them with sample data to see LLM response

πŸ€– Global LLM Model

🌐 Applies to ALL requests β€” This model is used for live meeting processing and all API calls. Changes are saved to the server.

⚑ Prompt Architecture

Uses a single prompt covering all five steps, which is best suited for prompt refinement.

πŸ“ System Prompt

πŸ“ User Prompt Template

Available Placeholders: {previous_summary}, {previous_topics}, {previousIntents}, {context_window}, {current_chunk}
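Filling the template is plain substitution over those placeholder names. A minimal sketch (the template wording is illustrative, not the PoC's actual prompt): note the registry mixes snake_case and camelCase (`{previousIntents}`), so the keys must be used verbatim.

```python
# Illustrative user-prompt template using the placeholders listed above;
# str.format keys must match the placeholder names exactly.
TEMPLATE = ("Summary so far: {previous_summary}\n"
            "Topics: {previous_topics}\n"
            "Intents: {previousIntents}\n"
            "Context: {context_window}\n"
            "New chunk: {current_chunk}")

filled = TEMPLATE.format(
    previous_summary="Kickoff recap",
    previous_topics="roadmap, hiring",
    previousIntents="[]",
    context_window="...last N chunks...",
    current_chunk="Should we vote on the launch date?",
)
```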

πŸ§ͺ Test Sample Data

Edit sample data matching prompt placeholders, then click "Test Prompt"

Click "Preview Request" or "Test Prompt" to see the JSON request body
πŸ§ͺ One-time test only

πŸ“€ LLM Response
