Meet MoshiVis🎙️🖼️, the first open-source real-time speech model that can talk about images! It sees, understands, and talks about images, naturally and out loud. Voice interaction with a compact model endowed with visual understanding opens up new applications, from audio description for the visually impaired to visual access...

47,924 views • 1 year ago • via X (Twitter)

11 comments

kyutai • 1 year ago

🧠 How it works: MoshiVis builds on Moshi, our speech-to-speech LLM, now enhanced with vision. 206M lightweight parameters on top of a frozen Moshi give it the power to discuss images while remaining real-time on consumer-grade hardware. For example, on a Mac mini M4, MoshiVis adds only ~7 ms per step to the ~45 ms per step of the base model, staying well below the 80 ms threshold for our audio codec. That means fluid, live, multimodal conversations with Moshi on your own device!
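
The latency figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the MoshiVis release: the three numbers are the approximate values given in the post, and the function and constant names are hypothetical, chosen only for illustration.

```python
# Approximate per-step figures quoted in the post (Mac mini M4).
BASE_STEP_MS = 45.0      # base Moshi model, per step
VISION_EXTRA_MS = 7.0    # added by MoshiVis, per step
CODEC_BUDGET_MS = 80.0   # threshold quoted for the audio codec


def total_step_ms(base_ms: float, extra_ms: float) -> float:
    """Total time spent on one step: base model plus vision overhead."""
    return base_ms + extra_ms


def is_real_time(step_ms: float, budget_ms: float) -> bool:
    """A step keeps up with the audio stream if it finishes within the budget."""
    return step_ms < budget_ms


total = total_step_ms(BASE_STEP_MS, VISION_EXTRA_MS)
headroom = CODEC_BUDGET_MS - total
budget_used = total / CODEC_BUDGET_MS

print(f"total per step: {total:.0f} ms")
print(f"headroom:       {headroom:.0f} ms")
print(f"budget used:    {budget_used:.0%}")
print(f"real-time:      {is_real_time(total, CODEC_BUDGET_MS)}")
```

With the quoted figures, each step takes about 52 ms, which leaves roughly 28 ms of headroom and uses about 65% of the 80 ms budget.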

kyutai • 1 year ago

🧰 Fully open-source: We’re releasing a detailed preprint, along with model weights and a first-of-its-kind benchmark dataset for spoken visual question answering:
📄 Preprint
🧠 Speech Benchmarks
🧾 Model weights
🧪 Inference code in PyTorch, MLX, and Rust

kyutai • 1 year ago

If you want to work on cutting-edge research, join our non-profit AI lab in Paris 🇫🇷. Thanks to Iliad Group, CMA-CGM Group, Schmidt Sciences, and the open-source community.

AssemblyAI • 1 year ago

Announcing: Our most advanced speech-to-text model goes beyond accuracy to capture the real-world complexity of human conversation and deliver reliable, source-of-truth audio data. Explore Universal-2 updates 👇

ZOHEB • 1 year ago

More lightweight! Sweet

Utopic e/λ • 1 year ago

share the repo with us ❤

Danish Khan • 1 year ago

Tried it out, very good performance for everyday tasks!

Frédéric H. (E/ACC) • 1 year ago

Ridiculous. Your AI still can't speak any language other than English and doesn't even work decently. What a shame.

haareblond • 1 year ago

wow this is very cool!

Mogomra (e/acc) • 1 year ago

Holey moley, it's the AI from "Start-up" in the real world!!

X_Learning969 • 1 year ago

Local not working. Despite enabling the mic and trying out the web UI, it does not speak. Wonder if it can even hear me.
