Kyutai Speech-To-Text is now open-source! It’s streaming, supports batched inference, and runs blazingly fast: perfect for interactive applications. Check out the details here:

kyutai

26,286 subscribers

66,264 views • 1 year ago • via X (Twitter)

Science and Technology • News and Politics • Education

9 comments

kyutai • 1 year ago

Today we are releasing two models. The first one is a 2.6B English-only model that beats Whisper Large v3 on benchmarks even though it’s a streaming model that doesn’t process all the audio at once. It can process 400 sequences in parallel on a single H100.

kyutai • 1 year ago

The other model is a lightweight English/French 1B model optimized for real-time voice chat apps like It comes with a semantic voice activity detector that predicts if you’re done talking or just pausing mid-sentence. The open-source releases of Kyutai Text-To-Speech and will follow soon!

clem 🤗 • 1 year ago

Magnifique !

Alex Volkov (Thursd/AI) • 1 year ago

This is great!! We'll cover it on @thursdai_pod in an hour

@gerry • 1 year ago

That is really good. Well done :)

Dan Western • 1 year ago

Interesting... Great conversation with this ai. Wondering about potential opportunities to embed this functionality into apps...

karai • 1 year ago

It needs mooore languages

ratwell • 1 year ago

@dankvr finally

Simon Icard • 1 year ago

👏

Related videos

Kyutai TTS and Unmute are now open source! The text-to-speech is natural, customizable, and fast: it can serve 32 users with a 350ms latency on a single L40S. Try it out and get started on the project page:

kyutai

171,391 views • 11 months ago

Releasing kyutai pocket TTS inference in webgpu today — it’s open-source, here’s a demo from my phone It took a while and a bunch of code to implement streaming inference from scratch!

Eric Zhang

22,085 views • 5 months ago

Hacked today: A realtime voice-to-voice assistant for my Mac that runs in the background and helps me be productive. Stack - RealtimeTTS package (for python) - Groq for fast inference - Mac native text to speech Quite happy with this :)

Paras Chopra

19,316 views • 1 year ago

Grok's Text to Speech API is now available in LiveKit Inference. Natural, expressive voices with low-latency streaming. Multilingual in 20+ languages. Telephony and production-ready out of the box. One API key. No extra setup. →

LiveKit

158,923 views • 3 months ago

Kyutai released their Streaming Text to Speech model, ~2B param model, ultra low latency (220ms), CC-BY-4.0 license 🔥 Trained on 2.5 Million Hours of audio, it can serve up to 32 users w/ less than 350ms latency on a SINGLE L40 🤯 Incredible release by kyutai folks, go check out their hugging face page now!

Vaibhav (VB) Srivastav

93,512 views • 11 months ago

We’re excited to introduce Pocket TTS: a 100M-parameter text-to-speech model with high-quality voice cloning that runs on your laptop—no GPU required. Open-source, lightweight, and incredibly fast. 🧵👇

kyutai

236,570 views • 5 months ago

Our 2023 #SnowFellowship round is now open for applications! Find application details on our website and speak with your research office about submitting your EOI. #SnowMedical #EMCR #AustralianResearch Check out our latest Snow Fellows here:

Snow Medical

10,810 views • 3 years ago

📣 Copilot Autofix is now available for free for open source! Fix vulnerabilities as fast as they're found⚡ Check it out!

GitHub

58,143 views • 1 year ago

🚨 New model alert! Dialog by vibx — a leading text-to-speech model — now runs on GroqCloud™. That means natural-sounding speech with ultra-low latency, making real-time voice applications smoother and more responsive. Learn more & build fast — links in the comments!

Groq Inc

47,183 views • 1 year ago

🎙️Do you know you now have all the building blocks for full speech-to-speech? - Voxtral Realtime: High-quality, real-time speech-to-text. - Mistral Small 4: Fast, efficient, general-purpose agentic model. - Voxtral TTS: Realistic customizable text-to-speech with streaming output.

Mistral AI for Developers

27,647 views • 2 months ago

What will you build with Vision Agents? Out-of-the-box support for: - Turn detection - Speech-to-text + text-to-speech - Voice activity detection - MCP & function-calling support Open-source. Video-first. Ready to build.

Stream

226,723 views • 5 months ago

🛡️IT’S OFFICIAL: Zcash 🛡️ x Maya Protocol You asked. You waited. And now… It’s finally here! 🚀Maya now supports private cross-chain swaps with Zcash! Get your privacy on. Check out our blog and dive into the details:

Maya Protocol

31,742 views • 1 year ago

Wohoo!! 🎉 You can now generate 3D structures of proteins, RNA, DNA, and small molecules. 🚀 Check out BOLTZ! — the first open-source and commercially available model to achieve AlphaFold3-level accuracy in biomolecular structure prediction. 🧪 Try it yourself on Gradio ! Explore and generate different proteins and viruses: For faster inference, check it out here (on L4): 👉 Let's make biology more open on Hugging Face ! If there’s anything to fear, it’s tech being closed :) !

Jade Choghari

55,770 views • 1 year ago

Meet CosyVoice 3 — An open-source multilingual speech synthesis model delivering high-fidelity voices, natural prosody, and accurate pronunciation for lifelike speech. Ready to bring the voices to real‑world applications?

Alibaba Cloud

3,149,715 views • 5 months ago

🔊Introducing Voxtral TTS: our new frontier open-weight model for natural, expressive, and ultra-fast text-to-speech 🎭Realistic, emotionally expressive speech. 🌍Supports 9 languages and accurately captures diverse dialects. ⚡Very low latency for time-to-first-audio. 🔄Easily adaptable to new voices

Mistral AI

937,612 views • 3 months ago

Open-source project Soundstorm (AI generated speech from Google Research) is going to give Elevenlabs a run for its money: The text-to-speech project specializes in dialogue between multiple parties, and is available on GitHub:

AI Breakfast

326,474 views • 3 years ago

Meet CosyVoice 3 🔊 An open-source multilingual speech synthesis model delivering high-fidelity voices, natural prosody, and accurate pronunciation for lifelike speech. Ready to bring the voices to real‑world applications?

Alibaba Group

126,533 views • 5 months ago