Live, Topic-Threaded Conversations

Muskan Sahetai
Developer
Ramblytics turns any two-speaker conversation into structured, actionable notes in real time. It transcribes speech, diarizes who's talking (Speaker A/B), segments the dialogue into topics, and builds "threads" with short summaries, key phrases, and auto-detected action items.
Participants can interact with each thread during or after the call—confirm decisions, assign tasks, answer open questions—and export everything to Markdown/Notion/CSV.
Ideal for meetings, interviews, lectures, and sales calls where "who said what about which topic" matters—without sending audio to the cloud.
Balancing transcription accuracy with latency is critical. Whisper models need to process audio chunks fast enough for a "live" feel (~2s of lag at most), which means optimizing model size, GPU utilization, and chunking strategy without sacrificing quality. We're exploring quantized models and streaming inference to hit our latency targets.
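A minimal sketch of the chunked-inference idea, assuming the faster-whisper library with an int8-quantized model (the chunk length, model size, and decoding settings here are placeholders, not our final configuration):

```python
import numpy as np
from faster_whisper import WhisperModel

SAMPLE_RATE = 16_000   # faster-whisper expects 16 kHz mono float32 audio
CHUNK_SECONDS = 2      # ~2 s chunks keep perceived lag near our target

# int8 quantization trades a little accuracy for much lower CPU latency.
model = WhisperModel("small.en", device="cpu", compute_type="int8")

def transcribe_stream(audio: np.ndarray):
    """Yield text for each fixed-length chunk of a mono float32 buffer."""
    step = SAMPLE_RATE * CHUNK_SECONDS
    for start in range(0, len(audio), step):
        chunk = audio[start:start + step]
        # beam_size=1 (greedy decoding) is another latency knob we can turn.
        segments, _info = model.transcribe(chunk, language="en", beam_size=1)
        yield " ".join(seg.text.strip() for seg in segments)
```

Fixed-length chunking is the simplest baseline; the streaming-inference experiments mentioned above would replace this loop with overlap-aware buffering.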
Distinguishing between two speakers in real-world conditions (background noise, overlapping speech, varying distances from the mic) is non-trivial. PyAnnote's pretrained models work well in ideal conditions, but we're building custom fine-tuning pipelines and voice embedding strategies to handle edge cases like similar-sounding voices or cross-talk.
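For reference, a two-speaker run through pyannote's pretrained pipeline looks roughly like the sketch below (assuming pyannote.audio 3.x and a Hugging Face access token for the gated model; the file name and token are placeholders):

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # placeholder; a real gated-model token is required
)

# Constraining num_speakers=2 helps in our two-party setting.
diarization = pipeline("call.wav", num_speakers=2)

for turn, _track, speaker in diarization.itertracks(yield_label=True):
    # speaker is a label like "SPEAKER_00"; we map labels to A/B downstream.
    print(f"{turn.start:6.1f}s - {turn.end:6.1f}s  {speaker}")
```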
Conversations don't have clear chapter breaks. Deciding when to split dialogue into a new "thread" requires semantic understanding of context shifts. We're experimenting with sentence embeddings, sliding-window topic coherence scores, and lightweight LLMs to detect topic boundaries without sending full transcripts to the cloud.
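A rough sketch of the sliding-window coherence idea, assuming the sentence-transformers library (the model name, window size, and threshold are illustrative values we're still tuning):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def topic_boundaries(sentences: list[str], window: int = 3,
                     threshold: float = 0.4) -> list[int]:
    """Return indices where a new thread likely starts."""
    emb = model.encode(sentences, normalize_embeddings=True)
    boundaries = []
    for i in range(window, len(sentences) - window + 1):
        left = emb[i - window:i].mean(axis=0)
        right = emb[i:i + window].mean(axis=0)
        # Cosine similarity between the windows before and after position i;
        # a dip below the threshold suggests the topic has shifted.
        sim = float(np.dot(left, right) /
                    (np.linalg.norm(left) * np.linalg.norm(right)))
        if sim < threshold:
            boundaries.append(i)
    return boundaries
```

Because the embedding model runs locally, boundary detection never ships transcript text off-device.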
Running everything on-device means no cloud APIs, which is great for privacy but challenging on resource-constrained machines. We need efficient model serving, smart caching, and graceful degradation on lower-end hardware. We're evaluating WebAssembly and ONNX Runtime for browser-based processing.
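A small sketch of the graceful-degradation idea with ONNX Runtime: prefer a GPU execution provider when one is available, otherwise fall back to CPU (the model path is a hypothetical placeholder):

```python
import onnxruntime as ort

# Try providers in order of preference; keep only the ones this machine has.
preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
available = ort.get_available_providers()
providers = [p for p in preferred if p in available] or ["CPUExecutionProvider"]

session = ort.InferenceSession("models/segmenter.onnx", providers=providers)

def run(inputs: dict):
    """inputs maps input names to numpy arrays; returns all model outputs."""
    return session.run(None, inputs)
```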
Automatically detecting tasks, decisions, and questions from natural dialogue requires understanding intent and context. We're combining NER (named entity recognition) pipelines with rule-based patterns to catch phrases like "let's follow up on..." or "can you send me..." and attribute them to the right speaker.
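A toy version of the rule-based side, with hypothetical patterns (not our production rule set) and simple speaker attribution:

```python
import re

# Illustrative trigger phrases for each item type.
PATTERNS = {
    "action":   re.compile(r"\b(let's follow up on|can you send me|i'll take care of)\b", re.I),
    "decision": re.compile(r"\b(we decided to|let's go with|we agreed)\b", re.I),
    "question": re.compile(r"\b(what about|do we know|should we)\b", re.I),
}

def detect_items(utterances: list[tuple[str, str]]) -> list[dict]:
    """utterances: (speaker, text) pairs; returns tagged items with attribution."""
    items = []
    for speaker, text in utterances:
        for kind, pattern in PATTERNS.items():
            if pattern.search(text):
                items.append({"speaker": speaker, "kind": kind, "text": text})
    return items

print(detect_items([("A", "Can you send me the deck?"),
                    ("B", "We agreed on Friday.")]))
```

Rules like these catch the obvious phrasings cheaply; the NER pipeline picks up the entities (people, dates, artifacts) that the matched items refer to.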