✻ Apache-2.0 · Runs offline · Trained on Modal

fluentWhisper

Speak messy. Read clean.

On the DisfluencySpeech test set, vanilla Whisper scores 9.4% WER. This adapter brings it down to 3.4%.

That is a 6-percentage-point absolute drop in word error rate, with a 95% bootstrap interval of [+5.0, +7.0] points. The interval sits well clear of zero, so the gain holds up.
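For readers who want to check an interval like this on their own transcripts, here is a minimal sketch of a paired bootstrap over utterances. It is an illustration under assumptions, not the evaluation script behind the numbers above: the function names, the 1000 resamples, and the whitespace tokenization are all hypothetical choices.

```python
import random


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hyp
                            curr[j - 1] + 1,      # extra word in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def bootstrap_wer_gain(refs, base_hyps, adapted_hyps, n_resamples=1000, seed=0):
    """95% interval for (base WER - adapted WER), in percentage points.

    Utterances are resampled with replacement, and both systems are scored
    on the same resample, so the comparison stays paired.
    Assumes every reference contains at least one word.
    """
    rng = random.Random(seed)
    stats = []
    for ref, base, adapted in zip(refs, base_hyps, adapted_hyps):
        words = ref.split()
        stats.append((edit_distance(words, base.split()),
                      edit_distance(words, adapted.split()),
                      len(words)))
    n = len(stats)
    gains = []
    for _ in range(n_resamples):
        sample = [stats[rng.randrange(n)] for _ in range(n)]
        total_words = sum(s[2] for s in sample)
        base_wer = 100.0 * sum(s[0] for s in sample) / total_words
        adapted_wer = 100.0 * sum(s[1] for s in sample) / total_words
        gains.append(base_wer - adapted_wer)
    gains.sort()
    low = gains[int(0.025 * (n_resamples - 1))]
    high = gains[int(0.975 * (n_resamples - 1))]
    return low, high
```

A positive interval means the adapter's WER is lower than the base model's on the resampled sets, which matches the sign convention of [+5.0, +7.0].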

As far as we know, it is the only open Apache-2.0 model that removes fillers, discourse markers, repetitions, and self-repairs in one shot.

It cleans your speech in a single pass. No second model, no LLM rewrite, no cloud round trip.

Then it shows you exactly what it removed, struck through inline so you can trust the edit.
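One way to produce that kind of inline view is a word-level diff. The sketch below is an assumption about how such a display could be built, not the app's actual code: it presumes a verbatim transcript is available next to the cleaned one, and it marks removals with `~~` as a stand-in for strikethrough. The function name `mark_removed` is hypothetical.

```python
import difflib


def mark_removed(verbatim, cleaned):
    """Return the verbatim text with words absent from `cleaned` wrapped in ~~."""
    v_words, c_words = verbatim.split(), cleaned.split()
    matcher = difflib.SequenceMatcher(a=v_words, b=c_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(v_words[i1:i2])
        else:
            if i2 > i1:
                # Words present only in the verbatim transcript: strike them.
                out.append("~~" + " ".join(v_words[i1:i2]) + "~~")
            # Words present only in the cleaned transcript are kept as-is.
            out.extend(c_words[j1:j2])
    return " ".join(out)


print(mark_removed("um I want to go", "I want to go"))
# ~~um~~ I want to go
```

The comparison is case-sensitive and splits on whitespace, so a real display would likely normalize casing and punctuation first.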

Under the hood it is a small LoRA adapter on whisper-large-v3-turbo: rank 16, a few megabytes, loaded on top of the base.

Trained on Modal from synthetic speech we built ourselves: LibriSpeech text, disfluencies injected, voiced with Kokoro across 54 voices.
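To make the text side of that pipeline concrete, here is a toy sketch of disfluency injection. It is not the generator used for training: the filler list, the probabilities, and the function name are hypothetical, and it covers only fillers and repetitions, not discourse markers or self-repairs.

```python
import random

FILLERS = ["um", "uh"]  # hypothetical filler inventory


def inject_disfluencies(text, p_filler=0.1, p_repeat=0.1, seed=0):
    """Return a disfluent copy of `text`; the original stays the clean target."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if rng.random() < p_filler:
            out.append(rng.choice(FILLERS))  # filler before the word
        if rng.random() < p_repeat:
            out.append(word)                 # say the word twice
        out.append(word)
    return " ".join(out)
```

In a setup like the one described, the disfluent string would be sent to the speech synthesizer, and the untouched source sentence would serve as the transcript the model learns to produce.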

Runs offline on your own laptop. Apache-2.0, weights on Hugging Face, reproducible end to end.
