AI startup ElevenLabs, best known for its text-to-speech technology, has launched Scribe, a stand-alone speech-to-text model.
This puts the company in direct competition with OpenAI’s Whisper, AssemblyAI, and Speechmatics.
Fresh off a $180 million funding round that valued it at $3.3 billion, ElevenLabs is aiming to improve speech recognition across multiple languages.
Scribe supports 99+ languages. Of these, 25 fall in the top accuracy tier (less than 5% word error rate), including English (97% accuracy), French, German, and Japanese.
The remaining languages are grouped by word error rate into high (5–10%), good (10–25%), and moderate (25–50%) accuracy tiers.
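For readers unfamiliar with the metric, here is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model's output, divided by the number of reference words. The function name and the lowercase-and-split normalization are illustrative assumptions, not ElevenLabs' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: simple lowercase + whitespace tokenization. Real benchmarks
    # usually apply heavier text normalization before scoring.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

One wrong word in a nine-word sentence already gives a WER of about 11%, well outside the sub-5% top tier. If "97%" for English is read loosely as 1 minus WER, it corresponds to roughly 3 errors per 100 reference words.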
According to ElevenLabs, Scribe outperforms models such as Google Gemini 2.0 Flash and OpenAI’s Whisper Large V3 on benchmark tests across multiple languages.
CEO Mati Staniszewski says the company wants to improve speech recognition, especially for languages that still struggle with accuracy.
Key features include:
Speaker identification (knows who is talking)
Word-level timestamps (for precise subtitles)
Auto-tagging (marks sounds like audience laughter)
Direct video transcription (adds subtitles or captions)
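To show why word-level timestamps matter for subtitles, here is a rough sketch that groups timed words into SRT caption cues. The `Word` record and its `text`, `start`, and `end` fields are hypothetical stand-ins, not Scribe's actual response format, and the fixed words-per-cue rule is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape of one transcribed word; times are in seconds.
    text: str
    start: float
    end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered cues of at most max_words each."""
    cues = []
    for index, offset in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[offset:offset + max_words]
        text = " ".join(word.text for word in chunk)
        # Each cue spans from its first word's start to its last word's end.
        span = f"{srt_timestamp(chunk[0].start)} --> {srt_timestamp(chunk[-1].end)}"
        cues.append(f"{index}\n{span}\n{text}\n")
    return "\n".join(cues)
```

A production captioner would also break cues on pauses, punctuation, and speaker changes. The point here is only that per-word timing lets each caption start and end exactly with the speech it shows.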
Right now, Scribe only works with pre-recorded audio. A real-time version is in the works.
Pricing is $0.40 per hour of input audio, which is competitive, though some rivals are cheaper.
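As a quick back-of-the-envelope check on that rate, the sketch below estimates cost from audio length. It assumes billing is strictly linear in duration with no minimum charge, rounding increment, or volume discount, none of which the announcement details. The function name is illustrative.

```python
from decimal import Decimal

# Quoted rate; treated here as the only pricing input (an assumption).
PRICE_PER_HOUR_USD = Decimal("0.40")


def transcription_cost(audio_seconds: int) -> Decimal:
    """Estimated cost in USD, rounded to the nearest cent."""
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (hours * PRICE_PER_HOUR_USD).quantize(Decimal("0.01"))
```

At this rate a 90-minute recording comes to about $0.60, and a 1,000-hour archive to about $400.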
99+ languages? My Duolingo owl is sweating.