Deepgram, a Y Combinator graduate building tailored speech recognition models, today announced it has raised $12 million in Series A financing. CEO and cofounder Scott Stephenson says the proceeds will bolster the development of Deepgram's platform, which helps enterprises process meeting, call, and presentation recordings. If all goes according to plan, and Deepgram's scale eventually matches that of the competition, it could save organizations valuable time by spotlighting key results.
Deepgram leverages a backend speech stack that eschews hand-engineered pipelines in favor of heuristic, statistics-based, and fully end-to-end AI processing, with hybrid models trained on PCs equipped with powerful graphics processing units. Each custom model is trained from the ground up and can ingest audio from sources ranging from phone calls and podcasts to recorded meetings and videos. Deepgram processes the speech and stores it in what's called a "deep representation index," which groups sounds by phonetics rather than by words. Customers can search for words by the way they sound; even if a query is misspelled, Deepgram can still find matches.
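Deepgram has not published the internals of its deep representation index, but the general idea of phonetic, spelling-tolerant search is well illustrated by a classic algorithm like Soundex, which maps similar-sounding words to the same short code. The sketch below is purely illustrative and is not Deepgram's method:

```python
def soundex(word: str) -> str:
    """Encode a word as a four-character Soundex code.

    Words that sound alike (e.g. common misspellings) tend to
    collapse to the same code, enabling phonetic lookup.
    """
    word = word.upper()
    groups = {"BFPV": "1", "CGJKSQXZ": "2", "DT": "3",
              "L": "4", "MN": "5", "R": "6"}

    def digit(c: str) -> str:
        for letters, d in groups.items():
            if c in letters:
                return d
        return ""  # vowels, H, W, Y carry no digit

    encoded = word[0]
    prev = digit(word[0])
    for c in word[1:]:
        d = digit(c)
        if d and d != prev:
            encoded += d
        if c not in "HW":  # H and W do not separate repeated digits
            prev = d       # vowels (empty digit) reset the run
    return (encoded + "000")[:4]  # pad or truncate to 4 characters


# A misspelled query still matches the intended word:
print(soundex("Stephenson") == soundex("Stevensen"))  # True
print(soundex("Deepgram") == soundex("Deapgramm"))    # True
```

A phonetic index built this way stores each indexed term under its code, so a lookup only needs to encode the query and fetch the matching bucket, which is what makes misspelling-tolerant search cheap at query time.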