LETS GOO! Parler TTS 🔥
A fully open-source, Apache 2.0 licensed text-to-speech model focused on providing maximum controllability. Through voice prompts, you can control the pitch, speed, gender, noise levels, emotion characteristics and more!
> Trained on 10K hours of permissive data.
> Offers control over the generations.
>...

156,386 views • 2 years ago • via X (Twitter)

9 Comments

Vaibhav (VB) Srivastav • 2 years ago

Try it out in the space directly (& share your generations below)!

Vaibhav (VB) Srivastav • 2 years ago

Check out our inference plus training code base here:

Vaibhav (VB) Srivastav • 2 years ago

You should also be able to use it in a Colab with less than 10 lines of code:

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Use the GPU when one is available, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

# prompt is the text to speak; description is the voice prompt that steers how it sounds
prompt = "Hey, how are you doing today?"
description = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform and write it out as a WAV file
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
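Since the voice is steered entirely by the description string, one way to experiment is to build that string from a few named attributes and swap them out between runs. The following is only a sketch: `VoiceSpec` and `build_description` are hypothetical helpers, not part of parler_tts, and the attribute wording is taken from the example description above. Whether the model responds to other phrasings is an assumption to verify by listening to the output.

```python
from dataclasses import dataclass


@dataclass
class VoiceSpec:
    """Hypothetical container for attributes a description can mention."""
    gender: str = "female"
    pitch: str = "slightly low-pitched"
    delivery: str = "quite expressively"
    environment: str = "very confined sounding"
    audio_quality: str = "clear"


def build_description(spec: VoiceSpec) -> str:
    """Compose a description sentence in the same shape as the example above."""
    # Pick a possessive pronoun that matches the requested gender.
    pronoun = {"female": "her", "male": "his"}.get(spec.gender, "their")
    return (
        f"A {spec.gender} speaker with a {spec.pitch} voice delivers "
        f"{pronoun} words {spec.delivery}, in a {spec.environment} "
        f"environment with {spec.audio_quality} audio quality."
    )


# The defaults reproduce the description used in the snippet above.
default_description = build_description(VoiceSpec())

# Changing a couple of attributes yields a different voice prompt.
male_description = build_description(VoiceSpec(gender="male", pitch="high-pitched"))

print(default_description)
print(male_description)
```

Either string can then be passed as `description` to the tokenizer call in the snippet above.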

Dennis Lysenko • 2 years ago

@reach_vb this is awesome -- can we run this on Replicate?

Vaibhav (VB) Srivastav • 2 years ago

Not yet, but you can try it out and use it here:

Javier de la Rosa @versae@mastodon.social • 2 years ago

This is really cool! I've been looking at Parler and Data-Speech and would love to give it a try for low-resource languages. What's the minimum amount of hours needed for this to adapt to another language? And does the audio need to be separated by speaker?

Vaibhav (VB) Srivastav • 2 years ago

We will release fine-tuning support soon. I think for the most part the quality of the dataset matters way more than the quantity. You’d need to have enough diversity to ensure a balance in the voice prompts. Once that is in you should be able to train in any language. That said, we haven’t tried this yet, so this is all a hypothesis at this point.

bitBrain • 2 years ago

@ClementDelangue but can it laugh? nono sorry, I mean Holy shit! nice!

bane • 2 years ago

@huggingface Not bad
