
kyutai
@kyutai_labs • 26,215 subscribers

Speech-native models like Moshi sound great and answer fast, but they aren’t as smart as text LLMs. In our new paper, MoshiRAG, we show how Moshi can ask a text LLM or a knowledge base for advice. The tricky part is doing this in real time without adding latency. 🧵
kyutai • 52,215 views • 1 month ago

Meet Hibiki, our simultaneous speech-to-speech translation model, currently supporting 🇫🇷➡️🇬🇧. Hibiki produces spoken and text translations of the input speech in real time, while preserving the speaker’s voice and optimally adapting its pace to the semantic content of the source speech. In both objective and human evaluations, Hibiki outperforms previous systems in quality, naturalness, and speaker similarity, and approaches the performance of human interpreters. 🧵
kyutai • 167,337 views • 1 year ago

Meet MoshiVis🎙️🖼️, the first open-source real-time speech model that can talk about images! It sees, understands, and talks about images, naturally and out loud. Voice interaction with a compact model endowed with visual understanding opens up new applications, from audio description for the visually impaired to visual access to information. Try it out 👉 Blog post 👉
kyutai • 47,907 views • 1 year ago

With Invincible Voice, we help people living with ALS communicate more easily. Meeting Olivier Goy, an entrepreneur who lives with ALS and fights relentlessly to help all patients, made it obvious that our cutting-edge voice AI should help. We turned our Unmute voice wrapper into a new system that 1/ transcribes the interlocutor’s speech in real time, 2/ suggests several relevant responses via a personalised language model, and 3/ speaks the patient's chosen response in their own voice (using 10 s of pre-disease speech recordings). True to our philosophy, we are open-sourcing Invincible Voice so that developers can refine the prototype, port it from French to other languages, adapt it to other conditions (aphasia, neurodegenerative diseases), and turn it into a deployable product. Its modularity also allows it to leverage technologies developed by Gradium, which supports Invincible Voice by granting it free access to its multilingual speech models.
kyutai • 11,028 views • 4 months ago

Have you enjoyed talking to 🟢Moshi? Have you dreamt of building your own speech-to-speech chat experience🧑🔬🤖? It's now possible with the moshi-finetune codebase! Plug in your own dataset and change Moshi's voice, tone, and personality 💚🔌💿. Here's an example after finetuning with only 20 hours of the public DailyTalk dataset. 🧵
kyutai • 19,937 views • 1 year ago