End to End Speech models are on fire - LLAMA-OMNI 8B - Apache licensed! 🔥
> Speech Encoder - Whisper Large v3
> LLM backbone - Llama 3.1 8B Instruct
> Speech Decoder - HuBERT (UnitY)
> Simultaneously generate Speech + Text
> Less than 250 ms latency
> ...
47,921 views • 1 year ago • via X (Twitter)
10 Comments

Vaibhav (VB) Srivastav • 1 year ago
Model checkpoint:

Vaibhav (VB) Srivastav • 1 year ago
GitHub repo:

Qingkai Fang • 1 year ago
Thanks for sharing our work!

Vaibhav (VB) Srivastav • 1 year ago
🔥

Tommy D. Rossi • 1 year ago
I wouldn't call this end-to-end. Let's keep that term for single multimodal models that do everything by themselves.

ThisAndThat • 1 year ago
Less than 250 ms latency on what?

Vaibhav (VB) Srivastav • 1 year ago
Time to first audio chunk, according to their GitHub.

Waifuology • 1 year ago
License looks good, but the voice quality isn't really there yet.

Hiro • 1 year ago
Do you know which languages are supported?

Trying my best :-) • 1 year ago
Can it detect emotion?


