Presenting MetaVoice-1B, a 1.2B parameter base model for TTS (text-to-speech).

* Emotional speech in English
* Voice cloning with fine-tuning
* Zero-shot cloning for American & British voices
* Support for long-form synthesis

111,799 views • 2 years ago • via X (Twitter)

Comments: 10

MetaVoice • 2 years ago

We’re releasing MetaVoice-1B under the Apache 2.0 license; it can be used without restrictions. Model on HF:

MetaVoice • 2 years ago

Thanks also to @honualx, @jadecopet, @RobinSanroman, @adiyossLC, @FelixKreuk, @osanseviero, @reach_vb, @librivox, DeepFilterNet, and all the other open-source contributors who made this possible. Also, a big shoutout to @togethercompute for their 24x7 help with our cluster.

Luis C • 2 years ago

You can also try it out on @replicate here:

James Darpinian • 2 years ago

This sounds great! Does it support streaming? What's the real time factor on a 3090 or 4090?

Kolin Koehl • 2 years ago

The future of TTS is looking incredibly dynamic! Open Source emotional depth and voice cloning capabilities seem like game-changers. Curious about the quality of long-form content synthesis.

🩷Otome-chan🩷 • 2 years ago

Tried the demo. I think XTTS does better zero-shot for English voices, and is much lighter.

Abraham Owodunni • 2 years ago

What about the paper?

mmolony • 2 years ago

This is very cool. We’ve been using Azure’s text-to-speech for some of our work; it’s reassuring to see there’s some optionality in the space. If anyone has any other suggestions, please comment.

Andre.W • 2 years ago

Are more languages planned?

haareblond • 2 years ago

Will it be possible to add other languages in the future? Or maybe with fine-tuning?
