LET'S GOO! Parler TTS 🔥

A fully open-source, Apache 2.0 licensed text-to-speech model focused on providing maximum controllability. Through voice prompts, you can control the pitch, speed, gender, noise levels, emotion characteristics and more!

> Trained on 10K hours of permissive data.
> Offers control over the generations.
> ...

156,386 views • 2 years ago • via X (Twitter)
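The "voice prompts" are plain-text descriptions that the model receives alongside the text to speak (see the code snippet later in the thread). Below is a minimal sketch of assembling such a description from discrete choices. `build_description` and its word lists are hypothetical. They only imitate the style of the example description in this thread, and whether the model responds to a given phrasing depends on its training data.

```python
# Hypothetical word lists: illustrative phrasings in the style of the
# example description in this thread, not a vocabulary the model requires.
PITCH = {
    "low": "a low-pitched",
    "slightly_low": "a slightly low-pitched",
    "moderate": "a moderate-pitched",
    "high": "a high-pitched",
}
SPEED = {"slow": "slowly", "moderate": "at a moderate pace", "fast": "quickly"}
NOISE = {
    "clean": "with clear audio quality",
    "noisy": "with noticeable background noise",
}
PRONOUN = {"female": "her", "male": "his"}


def build_description(gender: str, pitch: str, speed: str, noise: str,
                      delivery: str = "expressively") -> str:
    """Compose one free-form speaker description from discrete choices.

    Raises KeyError for any label missing from the word lists above.
    """
    return (
        f"A {gender} speaker with {PITCH[pitch]} voice delivers "
        f"{PRONOUN[gender]} words {delivery} and {SPEED[speed]}, "
        f"{NOISE[noise]}."
    )


print(build_description("female", "slightly_low", "moderate", "clean"))
# A female speaker with a slightly low-pitched voice delivers her words
# expressively and at a moderate pace, with clear audio quality.
```

Keeping the choices as dictionary keys means a typo in a label fails loudly instead of silently producing an odd prompt.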

Comments: 9

Vaibhav (VB) Srivastav • 2 years ago

Try it out directly in the Space (& share your generations below)!

Vaibhav (VB) Srivastav • 2 years ago

Check out our inference and training codebase here:

Vaibhav (VB) Srivastav • 2 years ago

You should also be able to use it in a Colab with fewer than 10 lines of code:

```python
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

# The text to speak, and a natural-language description of the voice.
prompt = "Hey, how are you doing today?"
description = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality."

# The description conditions the voice; the prompt is the text that gets spoken.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()

# Save the waveform at the model's native sampling rate.
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```
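The last step saves the waveform with `soundfile`. If that package is unavailable, a minimal sketch using only the Python standard library can write the same audio. It assumes `audio_arr` is a 1-D array of mono float samples in [-1.0, 1.0]; that range is an assumption worth checking against the actual output. `write_wav_pcm16` is a hypothetical helper, not part of Parler TTS.

```python
import array
import sys
import wave


def write_wav_pcm16(path: str, samples, sampling_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Assumes a 1-D sequence of floats, e.g. the squeezed `audio_arr` above.
    """
    # Clip to [-1.0, 1.0], then scale to the signed 16-bit integer range.
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples)
    )
    # WAV stores samples little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sampling_rate)
        wav.writeframes(pcm.tobytes())


# With the variables from the snippet above:
# write_wav_pcm16("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```

Clipping before scaling keeps occasional out-of-range samples from wrapping around to the opposite sign, which would be audible as clicks.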

Dennis Lysenko • 2 years ago

@reach_vb this is awesome -- can we run this on Replicate?

Vaibhav (VB) Srivastav • 2 years ago

Not yet, but you can try it out and use it here:

Javier de la Rosa @versae@mastodon.social • 2 years ago

This is really cool! I've been looking at Parler and Data-Speech and would love to give it a try for low-resource languages. What's the minimum amount of hours needed for this to adapt to another language? And does the audio need to be separated by speaker?

Vaibhav (VB) Srivastav • 2 years ago

We will release fine-tuning support soon. I think for the most part the quality of the dataset matters way more than the quantity. You’d need to have enough diversity to ensure a balance in the voice prompts. Once that is in you should be able to train in any language. That said, we haven’t tried this yet, so this is all a hypothesis at this point.
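Checking that balance before fine-tuning is straightforward if each example keeps the categorical labels its description was written from. Below is a minimal sketch under that assumption. The `gender` field, `attribute_shares`, `underrepresented` and the 10% default cut-off are all hypothetical. As the reply says, how much diversity is enough remains untested.

```python
from collections import Counter


def attribute_shares(rows, attribute):
    """Fraction of examples carrying each label of one attribute."""
    counts = Counter(row[attribute] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def underrepresented(rows, attribute, min_share=0.1):
    """Labels whose share falls below min_share (an arbitrary cut-off)."""
    shares = attribute_shares(rows, attribute)
    return sorted(label for label, share in shares.items() if share < min_share)


# Toy metadata: nine female-voiced examples and one male-voiced example.
rows = [{"gender": "female"}] * 9 + [{"gender": "male"}]
print(attribute_shares(rows, "gender"))                 # {'female': 0.9, 'male': 0.1}
print(underrepresented(rows, "gender", min_share=0.2))  # ['male']
```

The same check can be run per attribute (pitch, speaking rate, noise level) to see which kinds of voice prompt the data would barely cover.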

bitBrain • 2 years ago

@ClementDelangue but can it laugh? nono sorry, I mean Holy shit! nice!

bane • 2 years ago

@huggingface Not bad
