Meet MoshiVis🎙️🖼️, the first open-source real-time speech model that can talk about images! It sees, understands, and talks about images, naturally and out loud. Voice interaction with a compact model endowed with visual understanding opens up new applications, from audio description for the visually impaired to visual access...

47,924 views • 1 year ago • via X (Twitter)

11 comments

kyutai • 1 year ago

🧠 How it works: MoshiVis builds on Moshi, our speech-to-speech LLM, now enhanced with vision. 206M lightweight parameters on top of a frozen Moshi give it the ability to discuss images while remaining real-time on consumer-grade hardware. For example, on a Mac mini M4, MoshiVis adds only ~7 ms per step to the ~45 ms per step of the base model, staying well below the 80 ms threshold of our audio codec. That means fluid, live, multimodal conversations with Moshi on your own device!
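The real-time claim in the reply reduces to simple per-step arithmetic. The following is an illustrative check that uses only the approximate figures quoted there (~45 ms base step, ~7 ms added by MoshiVis, 80 ms codec threshold). It assumes the model takes one generation step per 80 ms codec frame. The constant names and the helper `latency_headroom_ms` are hypothetical and are not part of any Kyutai code.

```python
# Approximate figures quoted in the reply (measured on a Mac mini M4).
BASE_STEP_MS = 45.0    # ~45 ms per step for the base Moshi model
VISION_EXTRA_MS = 7.0  # ~7 ms per step added by the MoshiVis vision parameters
FRAME_MS = 80.0        # 80 ms codec threshold; assumption: one step per codec frame


def latency_headroom_ms(base_ms: float, extra_ms: float, frame_ms: float) -> float:
    """Hypothetical helper: slack left per step (negative means not real-time)."""
    return frame_ms - (base_ms + extra_ms)


step_ms = BASE_STEP_MS + VISION_EXTRA_MS
headroom = latency_headroom_ms(BASE_STEP_MS, VISION_EXTRA_MS, FRAME_MS)
steps_per_second = 1000.0 / FRAME_MS  # steps the model must sustain each second

print(f"per-step time: {step_ms:.0f} ms")               # 52 ms
print(f"headroom: {headroom:.0f} ms")                   # 28 ms
print(f"real-time: {headroom >= 0}")                    # True
print(f"required rate: {steps_per_second} steps/s")     # 12.5 steps/s
print(f"frame budget used: {step_ms / FRAME_MS:.0%}")   # 65%
```

Under these assumptions, the vision parameters use about 65% of each frame's time budget, which leaves roughly 28 ms of slack per step for other work on the device.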

kyutai • 1 year ago

🧰 Fully open-source: We’re releasing a detailed preprint, along with model weights and a first-of-its-kind benchmark dataset for spoken visual question answering:
📄 Preprint
🧠 Speech Benchmarks
🧾 Model weights
🧪 Inference code in PyTorch, MLX, and Rust

kyutai • 1 year ago

If you want to work on cutting-edge research, join our non-profit AI lab in Paris 🇫🇷. Thanks to Iliad Group, CMA-CGM Group, Schmidt Sciences, and the open-source community.

AssemblyAI • 1 year ago

Announcing: Our most advanced speech-to-text model goes beyond accuracy to capture the real-world complexity of human conversation and deliver reliable, source-of-truth audio data. Explore Universal-2 updates 👇

ZOHEB • 1 year ago

More lightweight! Sweet

Utopic e/λ • 1 year ago

share the repo with us ❤

Danish Khan • 1 year ago

Tried it out, very good performance for everyday tasks!

Frédéric H. (E/ACC) • 1 year ago

Ridiculous. Your AI still can't speak any language other than English, and it doesn't even work decently. What a shame.

haareblond • 1 year ago

wow this is very cool!

Mogomra (e/acc) • 1 year ago

Holey moley, it's the AI from "Start-up" in the real world!!

X_Learning969 • 1 year ago

Running it locally isn't working. Despite enabling the mic and trying the web UI, it doesn't speak. I wonder if it can even hear me.
