AI Engineering · Production

AI Speech-to-Text Journaling API

Turning spoken words into structured, speaker-attributed intelligence

A RESTful API that turns raw audio into speaker-attributed transcripts with a multi-model pipeline: Whisper for transcription, PyAnnote for diarization, and Gemma-3 for insight extraction.

Faster-Whisper · PyAnnote 3.1 · Gemma-3 · FastAPI · Celery · Redis · pydub · Docker

Problem Statement

Manual note-taking loses context and accuracy, offers no speaker attribution, and turns long recordings into hours of work. Inconsistent formats then hinder downstream search and analytics.

Headline Outcomes

Cost per hour of audio: ~$0.10, versus ~$90 for a typist (99% saved)

Time-to-transcript (60 min of audio): 8–12 min, versus 3–4 hours manually (20× faster)

Speaker attribution accuracy: >90% (PyAnnote 3.1)

The Solution

An asynchronous, queue-backed API that accepts audio uploads and returns structured, speaker-attributed transcripts. It is format-agnostic, with three processing modes behind a single endpoint.

Faster-Whisper (large model, int8) handles transcription; int8 quantization roughly halves CPU memory use versus fp32.

PyAnnote 3.1 assigns per-speaker labels with millisecond timestamps.

Gemma-3-27B extracts structured insights on-device, removing cloud API costs and producing RAG-ready output.

pydub splits long audio into overlapping 3-min chunks to prevent boundary-cut errors; Celery scales workers.
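The overlapping-chunk idea can be sketched as a small window calculator. This is a minimal illustration, not the project's actual code: the function name `chunk_windows` and the 5-second overlap are assumptions, since the source states only the 3-minute chunk length.

```python
from typing import List, Tuple


def chunk_windows(duration_ms: int, chunk_ms: int = 180_000,
                  overlap_ms: int = 5_000) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) windows that cover the whole recording.

    chunk_ms defaults to 3 minutes, as described above.
    overlap_ms is an assumed value; the real overlap is not stated.
    """
    if chunk_ms <= 0 or not 0 <= overlap_ms < chunk_ms:
        raise ValueError("need chunk_ms > 0 and 0 <= overlap_ms < chunk_ms")

    windows: List[Tuple[int, int]] = []
    step = chunk_ms - overlap_ms  # each new chunk starts before the previous one ends
    start = 0
    while start < duration_ms:
        end = min(start + chunk_ms, duration_ms)
        windows.append((start, end))
        if end == duration_ms:  # the last window reached the end of the audio
            break
        start += step
    return windows


if __name__ == "__main__":
    # A 400-second recording yields three windows that overlap by 5 seconds.
    for window in chunk_windows(400_000):
        print(window)
```

With pydub, each window could then be cut by slicing an `AudioSegment` in milliseconds (for example `audio[start:end]`, with `len(audio)` giving the duration in milliseconds), and each slice handed to a worker as a separate task.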

System Architecture

How the data flows

Audio Upload: 6+ formats

Format Normalize: pydub conversion

Chunk & Queue: Celery + Redis

Whisper + PyAnnote: transcribe + diarize

Structured Output: JSON + Markdown
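One common way to combine the transcription and diarization stages into speaker-attributed output is to give each transcript segment the speaker whose turn overlaps it the most. The sketch below assumes that approach; the source does not confirm it is the rule used here. The tuple layouts, the output field names, and the `UNKNOWN` fallback label are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

Segment = Tuple[float, float, str]  # (start_s, end_s, text), e.g. from transcription
Turn = Tuple[float, float, str]     # (start_s, end_s, speaker), e.g. from diarization


def attribute_speakers(segments: List[Segment],
                       turns: List[Turn]) -> List[Dict[str, Union[float, str]]]:
    """Label each transcript segment with the speaker that overlaps it the most."""
    result = []
    for seg_start, seg_end, text in segments:
        best_speaker = "UNKNOWN"  # assumed fallback when no turn overlaps
        best_overlap = 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the intersection of the two intervals (negative if disjoint).
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_overlap = overlap
                best_speaker = speaker
        result.append({
            "start": seg_start,
            "end": seg_end,
            "speaker": best_speaker,
            "text": text,
        })
    return result


if __name__ == "__main__":
    import json

    segments = [(0.0, 2.0, "Hello"), (2.0, 5.0, "Hi there")]
    turns = [(0.0, 2.2, "SPEAKER_00"), (2.2, 6.0, "SPEAKER_01")]
    print(json.dumps(attribute_speakers(segments, turns), indent=2))
```

The resulting list of dictionaries serializes directly to JSON, which matches the structured-output step at the end of the flow.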

Result 01

Auto-captures meeting minutes, call records and interview logs at scale.

Result 02

No vendor lock-in: fully open-source stack, cloud-deployable with no infra changes.

Result 03

Plug-and-play JSON API feeds LLM summarisation and RAG pipelines downstream.
