LETS GOO! Parler TTS 🔥

A fully open-source, Apache 2.0 licensed Text-to-speech model focused on providing maximum controllability. Through voice prompts, you can control the pitch, speed, gender, noise levels, emotion characteristics and more!

> Trained on 10K hours of permissive data.
> Offers control over the generations.
>...

156,386 views • 2 years ago • via X (Twitter)

9 Comments

Vaibhav (VB) Srivastav • 2 years ago

Try it out in the space directly (& share your generations below)!

Vaibhav (VB) Srivastav • 2 years ago

Check out our inference plus training code base here:

Vaibhav (VB) Srivastav • 2 years ago

You should also be able to use it in a Colab with less than 10 lines of code:

    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer
    import soundfile as sf

    device = "cuda:0" if torch.cuda.is_available() else "cpu"

    model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1").to(device)
    tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

    prompt = "Hey, how are you doing today?"
    description = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality."

    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio_arr = generation.cpu().numpy().squeeze()
    sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
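Since the voice is steered entirely by the description string, one way to experiment is to build that string from a few attributes and swap them one at a time. The helper below is only a sketch: build_description and its parameter names are hypothetical and not part of parler_tts, and only the default wording comes from the snippet above. Which other phrasings the model actually responds to is an assumption you would need to check by listening.

```python
def build_description(
    gender="female",
    pitch="slightly low-pitched",
    delivery="quite expressively",
    environment="very confined sounding",
    quality="clear",
):
    """Compose a voice description from a few attributes.

    Hypothetical helper for illustration; the defaults reproduce the
    description used in the snippet above. Other attribute values are
    untested guesses at wording, not a documented vocabulary.
    """
    # Pick a possessive pronoun that agrees with the speaker's gender.
    pronoun = {"female": "her", "male": "his"}.get(gender, "their")
    return (
        f"A {gender} speaker with a {pitch} voice delivers {pronoun} words "
        f"{delivery}, in a {environment} environment with {quality} audio quality."
    )


# Vary one attribute at a time to hear what each one changes.
variants = [
    build_description(),
    build_description(gender="male"),
    build_description(pitch="high-pitched"),
]
for text in variants:
    print(text)
```

Each resulting string can be passed to the tokenizer in place of description in the snippet above, keeping prompt fixed so that only the voice changes between runs.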

Dennis Lysenko • 2 years ago

@reach_vb this is awesome -- can we run this on Replicate?

Vaibhav (VB) Srivastav • 2 years ago

Not yet, but you can try it out and use it here:

Javier de la Rosa @versae@mastodon.social • 2 years ago

This is really cool! I've been looking at Parler and Data-Speech and would love to give it a try for low-resource languages. What's the minimum amount of hours needed for this to adapt to another language? And does the audio need to be separated by speaker?

Vaibhav (VB) Srivastav • 2 years ago

We will release fine-tuning support soon. I think, for the most part, the quality of the dataset matters way more than the quantity. You’d need enough diversity to ensure a balance in the voice prompts. Once that is in place, you should be able to train in any language. That said, we haven’t tried this yet, so this is all a hypothesis at this point.

bitBrain • 2 years ago

@ClementDelangue but can it laugh? nono sorry, I mean Holy shit! nice!

bane • 2 years ago

@huggingface Not bad
