
Presenting MetaVoice-1B, a 1.2B parameter base model for TTS (text-to-speech).
* Emotional speech in English
* Voice cloning with fine-tuning
* Zero-shot cloning for American & British voices
* Support for long-form synthesis

111,799 views • 2 years ago • via X (Twitter)

10 Comments

MetaVoice • 2 years ago

We’re releasing MetaVoice-1B under the Apache 2.0 license; it can be used without restrictions. Model on HF:

MetaVoice • 2 years ago

Thanks also to @honualx, @jadecopet, @RobinSanroman, @adiyossLC, @FelixKreuk, @osanseviero, @reach_vb, @librivox, DeepFilterNet, and all the other open-source contributors who made this possible. Also, a big shoutout to @togethercompute for their 24x7 help with our cluster.

Luis C • 2 years ago

You can also try it out on @replicate here:

James Darpinian • 2 years ago

This sounds great! Does it support streaming? What's the real time factor on a 3090 or 4090?

Kolin Koehl • 2 years ago

The future of TTS is looking incredibly dynamic! Open Source emotional depth and voice cloning capabilities seem like game-changers. Curious about the quality of long-form content synthesis.

🩷Otome-chan🩷 • 2 years ago

Tried the demo. I think XTTS does better zero-shot for English voices, and is much lighter.

Abraham Owodunni • 2 years ago

What about the paper?

mmolony • 2 years ago

This is very cool. We’ve been using Azure’s text-to-speech for some of our work, and it’s reassuring to see there’s some optionality in the space. If anyone has any other suggestions, please comment.

Andre.W • 2 years ago

Are more languages planned?

haareblond • 2 years ago

Will it be possible to add other languages in the future? Or maybe with fine-tuning?

Similar Videos

VoxCPM 2 just dropped by OpenBMB.

A 2B-parameter open-source TTS (text-to-speech) model built for production-grade multilingual voice work. Apache-2.0 licensed; can run on only 8GB VRAM.

• Eliminates the "robotic" feel of traditional TTS, delivering prosody and emotional depth suitable for high-stakes professional environments like filmmaking, gaming, animation, and audiobooks.
• 30-language multilingual: no language tag needed, just type in a supported language and generate directly.
• Voice design: create a brand-new voice from a text description alone, with no reference audio required. Describe the desired voice characteristics (gender, age, tone, emotion, pace …) in Control Instruction, and VoxCPM2 will craft a unique voice from that description.
• Controllable cloning: clone from a short clip, then steer delivery style without losing the speaker’s core voice.
• Ultimate cloning: use reference audio + transcript for continuation-style cloning that keeps the tiny vocal details.
• 48kHz output: takes 16kHz reference audio and produces studio-quality speech without an external upsampler.
• Real-time ready: around 0.3 RTF on RTX 4090, even lower with Nano-VLLM.
• Commercial use: Apache-2.0 licensed.

Developer-Friendly Infrastructure:
- Native Torch Inference: direct support for PyTorch-based workflows.
- Training Flexibility: supports both full-parameter and LoRA fine-tuning for specific domain adaptation.
- Production Readiness: compatible with voxcpm-nanovllm for large-scale, high-concurrency deployment.

Rohan Paul

13,541 views • 2 months ago