MARS5 TTS: Open Source Text to Speech with insane prosodic control! 🔥
> Voice cloning with less than 5 seconds of audio
> Two stage Auto-Regressive (750M) + Non-Auto Regressive (450M) model architecture
> Used BPE tokenizer to enable control over punctuation, pauses, stops etc.
> AR model predicts...
162,180 views • 2 years ago • via X (Twitter)
10 Comments

Vaibhav (VB) Srivastav • 2 years ago
Check out the model here:

Vaibhav (VB) Srivastav • 2 years ago
GitHub for more deets:

Carlos DP • 2 years ago
Wow, these outputs are incredible. Like, is this the new SOTA? The samples sound better than the 11labs ones, at least, but idk what params were used

Vaibhav (VB) Srivastav • 2 years ago
750M + 450M -> pretty lightweight overall, in the GitHub README they promise more updates coming soon :D

Furkan Gözükara • 2 years ago
5 seconds to clone is always a lie but i can't say for sure without testing i asked them for gradio demo app to be shared

marko. • 2 years ago
Released under GNU AGPL 3.0, a very curious choice for a model but I'll take it 🎉

Marouane Belkouri • 2 years ago
Fine-tuning code?

adivina_soy3 • 2 years ago
@huggingface Impressive. Do you think it would be possible to combine it with Hallo?

Thomas Hill • 2 years ago
Nice share 🔥

STEVE blowJOBS • 2 years ago
This is racist ask me why
Similar Videos
Meta just released MusicGen, a simple and controllable model for music generation. MusicGen is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, it can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. Try out the Gradio demo: Models on Hugging Face: GitHub:
AK
627,429 views • 3 years ago
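
The codebook delay described in the MusicGen post can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Meta's implementation: the function names, the `PAD` placeholder, and the one-step shift per codebook are choices made for this sketch, and the token values are made up.

```python
PAD = -1  # assumption: placeholder for positions that hold no real token


def apply_delay_pattern(codes, pad=PAD):
    """Shift codebook k right by k steps (assumed delay of one step per codebook).

    codes: list of K token streams, each with T frames.
    Returns K streams of length T + K - 1, padded where no token exists.
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    total_steps = num_frames + num_codebooks - 1
    delayed = [[pad] * total_steps for _ in range(num_codebooks)]
    for k, stream in enumerate(codes):
        for t, token in enumerate(stream):
            delayed[k][t + k] = token
    return delayed


def undo_delay_pattern(delayed):
    """Invert apply_delay_pattern to recover the time-aligned frames."""
    num_codebooks = len(delayed)
    num_frames = len(delayed[0]) - (num_codebooks - 1)
    return [delayed[k][k:k + num_frames] for k in range(num_codebooks)]


def steps_delayed(seconds, frame_rate=50, num_codebooks=4):
    """Decoding steps when all codebooks are predicted in parallel with a delay."""
    return seconds * frame_rate + num_codebooks - 1


def steps_flattened(seconds, frame_rate=50, num_codebooks=4):
    """Hypothetical baseline for comparison: one codebook token per step."""
    return seconds * frame_rate * num_codebooks


if __name__ == "__main__":
    # Made-up tokens: 4 codebooks, 3 frames each.
    codes = [[k * 10 + t for t in range(3)] for k in range(4)]
    for row in apply_delay_pattern(codes):
        print(row)
    print(steps_delayed(1), steps_flattened(1))
```

Under these assumptions, one second of audio (50 frames, 4 codebooks) takes 50 + 3 = 53 steps with the delay, versus 200 steps if a hypothetical decoder emitted one codebook token per step.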
