Presenting MetaVoice-1B, a 1.2B parameter base model for TTS (text-to-speech).

* Emotional speech in English
* Voice cloning with fine-tuning
* Zero-shot cloning for American & British voices
* Support for long-form synthesis

111,799 views • 2 years ago • via X (Twitter)

10 comments

MetaVoice • 2 years ago

We’re releasing MetaVoice-1B under the Apache 2.0 license, so it can be used without restrictions. Model on HF:

MetaVoice • 2 years ago

Thanks also to @honualx, @jadecopet, @RobinSanroman, @adiyossLC, @FelixKreuk, @osanseviero, @reach_vb, @librivox, DeepFilterNet, and all the other open-source contributors who made this possible. Also, a big shoutout to @togethercompute for their 24x7 help with our cluster.

Luis C • 2 years ago

You can also try it out on @replicate here:

James Darpinian • 2 years ago

This sounds great! Does it support streaming? What's the real-time factor on a 3090 or 4090?

Kolin Koehl • 2 years ago

The future of TTS is looking incredibly dynamic! Open Source emotional depth and voice cloning capabilities seem like game-changers. Curious about the quality of long-form content synthesis.

🩷Otome-chan🩷 • 2 years ago

Tried the demo. I think XTTS does better zero-shot for English voices, and is much lighter.

Abraham Owodunni • 2 years ago

What about the paper?

mmolony • 2 years ago

This is very cool. We’ve been using Azure’s text-to-speech for some of our work, and it’s reassuring to see there’s some optionality in the space. If anyone has any other suggestions, please comment.

Andre.W • 2 years ago

Are more languages planned?

haareblond • 2 years ago

Will it be possible to add other languages in the future? Or maybe with fine-tuning?
