Introducing Nova-2, our next-gen model for superhuman speech-to-text. TL;DR Nova-2 delivers: 💥 Next-level accuracy: 18% more accurate than Nova-1 & over 36% more accurate than OpenAI Whisper large 💥 Up to 40x faster 💥 Same low cost: 3-7x cheaper 🧵👇
2,184,459 views • 2 years ago • via X (Twitter)

Building on Nova's groundbreaking training, which spanned 100+ domains and 47 billion tokens, Nova-2 continues to be the deepest-trained ASR model in the world.

Nova-2 was trained in a 2-stage curriculum starting from the largest, most diverse dataset in Deepgram’s history: nearly 6M resources and an extensive library of high-quality human transcriptions. The result? 👇

A new state-of-the-art model capable of superhuman transcription performance that consistently outperforms any other STT model on the market today across a wide range of speech application domains. On to the benchmark results…

In our benchmarking, Nova-2 has a median WER (word error rate) of 8.4% across the files tested, a 16.8% relative reduction in error rate compared to the closest provider. Nova-2 surpassed all tested competitors by an average of 30% and outperformed OpenAI Whisper large by 36%.
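
For readers unfamiliar with the metric, here is a minimal sketch of how WER and a relative WER reduction are typically computed. It is not Deepgram's benchmarking code: the function names are hypothetical, and real benchmarks usually normalize text (casing, punctuation, numerals) before scoring, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative transcripts, not benchmark data.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(relative_wer_reduction(0.10, 0.084))
```

Under this definition, a hypothetical provider at 10.0% WER compared with a model at 8.4% WER would correspond to a 16% relative reduction. Those two inputs are illustrative only.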

Modern speech apps are increasingly used to automate real-time interactions with end users for use cases like agent assist and live captioning. But there are limited options for true real-time STT and several providers like OpenAI lack native streaming models...

...However, in our real-time accuracy benchmarking, Nova-2 handily outperforms the field with an average relative reduction in WER of 28.6% across all domains.

Regarding speed, our benchmarks show that Nova-2 outpaces all other STT models, with a median inference time of 29.8 seconds per hour of diarized audio. That is 5-40x faster than comparable vendors offering diarization.
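
To put the speed figure in perspective, the arithmetic below converts the median of 29.8 seconds per hour of audio into a speed-up over real time, and shows what a 5x or 40x gap would imply for another vendor's processing time. The implied vendor times are derived purely from the quoted ratios, not from separately reported measurements, and the function names are illustrative.

```python
AUDIO_SECONDS_PER_HOUR = 3600
NOVA2_SECONDS_PER_AUDIO_HOUR = 29.8  # median figure quoted in the thread


def realtime_factor(processing_seconds: float,
                    audio_seconds: float = AUDIO_SECONDS_PER_HOUR) -> float:
    """How many times faster than real-time playback the processing runs."""
    return audio_seconds / processing_seconds


def implied_vendor_seconds(speedup: float,
                           nova2_seconds: float = NOVA2_SECONDS_PER_AUDIO_HOUR) -> float:
    """Processing time a vendor would need if Nova-2 is `speedup` times faster."""
    return speedup * nova2_seconds


if __name__ == "__main__":
    factor = realtime_factor(NOVA2_SECONDS_PER_AUDIO_HOUR)
    print(f"Nova-2: about {factor:.1f}x faster than real time")
    for speedup in (5, 40):
        seconds = implied_vendor_seconds(speedup)
        print(f"{speedup}x slower vendor: about {seconds / 60:.1f} minutes per audio hour")
```

In other words, 29.8 seconds per hour of audio works out to roughly 120x faster than real time.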

In terms of cost, Nova-2 keeps the same starting price as Nova: just $0.0043 per minute of pre-recorded audio, 3-5x more affordable than any other full-functionality provider on the market (based on currently listed pricing).
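
As a quick worked example, the sketch below scales the quoted starting price of $0.0043 per minute to larger volumes. It assumes simple linear per-minute billing; rounding rules, minimums, and volume tiers are not described in this thread, and the function name is hypothetical.

```python
PRICE_PER_MINUTE_USD = 0.0043  # starting price for pre-recorded audio, as quoted


def transcription_cost_usd(audio_minutes: float,
                           price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Estimated cost under an assumed linear per-minute rate."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes * price_per_minute


if __name__ == "__main__":
    for hours in (1, 100, 1000):
        cost = transcription_cost_usd(hours * 60)
        print(f"{hours} hour(s) of audio: ${cost:,.2f}")
```

At this rate, one hour of pre-recorded audio comes to about $0.26, and 1,000 hours to about $258.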

Since launching Nova-1 this year, we have also released new features, including improved speaker diarization, smart formatting, filler-word support, and our first domain-specific language model for summarization.

You can dive deeper into our approach to model development and the benchmarks in the full announcement. Plus, get started with Nova-2 by requesting early access. Link to announcement:

