Presenting MetaVoice-1B, a 1.2B-parameter base model for TTS (text-to-speech).
* Emotional speech in English
* Voice cloning with fine-tuning
* Zero-shot cloning for American & British voices
* Support for long-form synthesis
111,799 views • 2 years ago • via X (Twitter)
10 comments

We’re releasing MetaVoice-1B under the Apache 2.0 license; it can be used without restrictions. Model on HF:

Thanks also to @honualx, @jadecopet, @RobinSanroman, @adiyossLC, @FelixKreuk, @osanseviero, @reach_vb, @librivox, DeepFilterNet, and all the other open-source contributors who made this possible. Also, a big shoutout to @togethercompute for their 24x7 help with our cluster.

You can also try it out on @replicate here:

This sounds great! Does it support streaming? What's the real-time factor on a 3090 or 4090?

The future of TTS is looking incredibly dynamic! Open Source emotional depth and voice cloning capabilities seem like game-changers. Curious about the quality of long-form content synthesis.

Tried the demo. I think XTTS does better zero-shot for English voices, and is much lighter.

What about the paper?

This is very cool. We’ve been using Azure’s text-to-speech for some of our work, and it’s reassuring to see there’s some optionality in the space. If anyone has any other suggestions, please comment.

Are more languages planned?

Will it be possible to add other languages in the future? Or maybe with fine-tuning?
