LETS GOO! Parler TTS 🔥

A fully open-source, Apache 2.0 licensed text-to-speech model focused on providing maximum controllability. Through voice prompts, you can control the pitch, speed, gender, noise levels, emotion characteristics and more!

> Trained on 10K hours of permissive data.
> Offers control over the generations.
>...

156,386 views • 2 years ago • via X (Twitter)

9 comments

Vaibhav (VB) Srivastav • 2 years ago

Try it out in the space directly (& share your generations below)!

Vaibhav (VB) Srivastav • 2 years ago

Check out our inference plus training code base here:

Vaibhav (VB) Srivastav • 2 years ago

You should also be able to use it in a Colab with less than 10 lines of code:

import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

prompt = "Hey, how are you doing today?"
description = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
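In that snippet the description string is what steers the voice, so it can help to build it from separate attributes rather than edit one long sentence by hand. Here is a minimal sketch of that idea. The VoiceDescription class and its field names are hypothetical and not part of parler_tts; the default values are simply the phrases from the description above. Which other phrasings the model actually responds to is an assumption you would need to check by listening.

```python
from dataclasses import dataclass


@dataclass
class VoiceDescription:
    """Hypothetical helper (not part of parler_tts) for composing a description."""
    gender: str = "female"
    pitch: str = "slightly low-pitched"
    delivery: str = "quite expressively"
    environment: str = "very confined sounding"
    quality: str = "clear"

    def to_prompt(self) -> str:
        # Pick a possessive pronoun that matches the gender field.
        pronoun = {"female": "her", "male": "his"}.get(self.gender, "their")
        # Mirror the sentence structure of the description in the snippet above.
        return (
            f"A {self.gender} speaker with a {self.pitch} voice "
            f"delivers {pronoun} words {self.delivery}, in a {self.environment} "
            f"environment with {self.quality} audio quality."
        )


# The defaults reproduce the original description; override fields to vary the voice.
description = VoiceDescription().to_prompt()
male_description = VoiceDescription(gender="male", pitch="high-pitched").to_prompt()
```

The resulting string can be passed to the tokenizer in place of the hard-coded description.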

Dennis Lysenko • 2 years ago

@reach_vb this is awesome -- can we run this on Replicate?

Vaibhav (VB) Srivastav • 2 years ago

Not yet, but you can try it out and use it here:

Javier de la Rosa @versae@mastodon.social • 2 years ago

This is really cool! I've been looking at Parler and Data-Speech and would love to give it a try for low-resource languages. What's the minimum amount of hours needed for this to adapt to another language? And does the audio need to be separated by speaker?

Vaibhav (VB) Srivastav • 2 years ago

We will release fine-tuning support soon. I think for the most part the quality of the dataset matters way more than the quantity. You’d need to have enough diversity to ensure a balance in the voice prompts. Once that is in you should be able to train in any language. That said, we haven’t tried this yet, so this is all a hypothesis at this point.

bitBrain • 2 years ago

@ClementDelangue but can it laugh? nono sorry, I mean Holy shit! nice!

bane • 2 years ago

@huggingface Not bad
