LETS GOO! Parler TTS 🔥

A fully open-source, Apache 2.0 licensed text-to-speech model focused on providing maximum controllability. Through voice prompts, you can control the pitch, speed, gender, noise levels, emotion characteristics and more!

> Trained on 10K hours of permissive data.
> Offers control over the generations.
> ...
156,386 views • 2 years ago • via X (Twitter)
9 comments

Try it out in the space directly (& share your generations below)!

Check out our inference plus training code base here:

You should also be able to use it in a Colab with less than 10 lines of code:

    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer
    import soundfile as sf

    # Use the GPU when one is available
    device = "cuda:0" if torch.cuda.is_available() else "cpu"

    model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1").to(device)
    tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

    # prompt is the text to speak; description controls how the voice sounds
    prompt = "Hey, how are you doing today?"
    description = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality."

    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    # Generate the audio and write it out as a WAV file
    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio_arr = generation.cpu().numpy().squeeze()
    sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
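Since the control comes from the `description` string in the snippet above, here is a rough sketch of a way to vary it without retyping the whole sentence each time. `build_description` and its parameter names are hypothetical, not part of `parler_tts`; the sentence template just mirrors the example description, and whether the model responds well to other wordings is an assumption.

```python
def build_description(gender="female",
                      pitch="slightly low-pitched",
                      delivery="quite expressively",
                      environment="very confined sounding",
                      quality="clear"):
    """Assemble a description sentence from a few voice attributes.

    Hypothetical helper: the template copies the example description from
    the post. It is not an official Parler-TTS API.
    """
    # Pick a possessive pronoun that matches the requested speaker gender
    pronoun = {"female": "her", "male": "his"}.get(gender, "their")
    return (
        f"A {gender} speaker with a {pitch} voice delivers {pronoun} words "
        f"{delivery}, in a {environment} environment with {quality} audio quality."
    )


# The defaults reproduce the description used in the snippet above
description = build_description()
print(description)

# Changing one attribute at a time keeps the rest of the sentence fixed
print(build_description(gender="male", pitch="high-pitched"))
```

The resulting string can be passed to the tokenizer in place of the hard-coded `description`.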

@reach_vb this is awesome -- can we run this on Replicate?

Not yet, but you can try it out and use it here:

This is really cool! I've been looking at Parler and Data-Speech and would love to give it a try for low-resource languages. What's the minimum amount of hours needed for this to adapt to another language? And does the audio need to be separated by speaker?

We will release fine-tuning support soon. I think for the most part the quality of the dataset matters way more than the quantity. You’d need to have enough diversity to ensure a balance in the voice prompts. Once that is in you should be able to train in any language. That said, we haven’t tried this yet, so this is all a hypothesis at this point.

@ClementDelangue but can it laugh? nono sorry, I mean Holy shit! nice!

@huggingface Not bad
