Kyutai Speech-To-Text is now open-source! It’s streaming, supports batched inference, and runs blazingly fast: perfect for interactive applications. Check out the details here:

66,264 views • 1 year ago • via X (Twitter)

9 comments

kyutai • 1 year ago

Today we are releasing two models. The first one is a 2.6B English-only model that beats Whisper Large v3 on benchmarks even though it’s a streaming model that doesn’t process all the audio at once. It can process 400 sequences in parallel on a single H100.

kyutai • 1 year ago

The other model is a lightweight English/French 1B model optimized for real-time voice chat apps like It comes with a semantic voice activity detector that predicts if you’re done talking or just pausing mid-sentence. The open-source releases of Kyutai Text-To-Speech and will follow soon!

clem 🤗 • 1 year ago

Magnificent!

Alex Volkov (Thursd/AI) • 1 year ago

This is great!! We'll cover it on @thursdai_pod in an hour

@gerry • 1 year ago

That is really good. Well done :)

Dan Western • 1 year ago

Interesting... Great conversation with this AI. Wondering about potential opportunities to embed this functionality into apps...

karai • 1 year ago

It needs mooore languages

ratwell • 1 year ago

@dankvr finally

Simon Icard • 1 year ago

👏
