Smol TTS keeps getting better! Introducing OuteTTS v0.2 - 500M parameters, multilingual with voice cloning! 🔥
> Multilingual - English, Chinese, Korean & Japanese
> Cross platform inference w/ llama.cpp
> Zero-shot voice cloning
> Trained on 5 Billion audio tokens
> Qwen 2.5 0.5B LLM backbone
> Trained...

Vaibhav (VB) Srivastav

37,574 subscribers

44,566 views • 1 year ago • via X (Twitter)

Science & Technology, Education

Comments: 11

Vaibhav (VB) Srivastav • 1 year ago

Check out the model weights and inference code base here:

Vaibhav (VB) Srivastav • 1 year ago

llama.cpp compatible GGUFs:

Vaibhav (VB) Srivastav • 1 year ago

OuteTTS GitHub:

Haorui He • 1 year ago

Big Congrats!!! Another SOTA TTS model trained on Emilia after F5-TTS & MaskGCT! Try out:

Tommy Falkowski • 1 year ago

Just tested it out and the quality is very good. More importantly, the fact that you can generate speaker profiles is awesome! Will test it out some more and add it to my growing list of supported tts engines in my app 🤣

Umesh • 1 year ago

This is improving so fast that I don't want to speak myself anymore. Just use this and get done 🤖

Fronesis • 1 year ago

Thank you for your work and for sharing insights! 🙌 Advancements like OuteTTS v0.2 showcase the rapid evolution of AI and its potential to empower global communities. 🚀 The future of #AI is bright, and collaborative innovation is key to unlocking its full potential!

Digital Doctor • 1 year ago

Are you saying you can voice CLONE on a R-Pi? Is that what you're saying????

斎藤ただし, Tadashi Saito • 1 year ago

The font of Japanese characters is wrong, it's for (maybe) Chinese. I hope you'll pay attention and respect to each of them when you are working for multilingual/multicultural things. (like your TTS engine itself does. Brilliant quality✨️)

Ahmed Mansour • 1 year ago

I tried to run it on HF. Average inference time for 200 chars is >1 hour running on CPU. Why is this model so heavy?

Related videos

Wow! New Speech to Speech model - Fish Agent v0.1 3B by Fish Audio 🔥 > Trained on 700K hours of multilingual audio > Continue-pretrained version of Qwen-2.5-3B-Instruct for 200B audio & text tokens > Zero-shot voice cloning > Text + audio input/ Audio output > Ultra-fast inference w/ 200ms TTFA > Models on the Hub & Finetuning code on its way! 🚀 What an amazing time to be alive 🤗

Vaibhav (VB) Srivastav

66,945 views • 1 year ago

Fish: Multilingual Voice Cloning Fish is a voice cloning AI model that supports multiple languages out of the box: - English - Japanese - Korean - Chinese - French - German - Arabic - Spanish I wrote a 1-click launcher for Fish gradio app, so you can run it locally. Enjoy.

cocktail peanut

35,815 views • 1 year ago

Pretty fucking wild - Chatterbox TTS by Resemble AI - Zero shot voice cloning, Apache 2.0 licensed 🤯 > Outperforms ElevenLabs > Trained on 500K hours of audio > Llama 500M arch > Emotionally aware speech > Zero shot voice cloning > Live on Hugging Face 🤗

Vaibhav (VB) Srivastav

71,652 views • 1 year ago

HOLY FUCK! Zyphra just dropped Zonos - Apache 2.0 licensed, Multilingual, Text to Speech model with INSTANT voice cloning! 🔥 > Zero-shot TTS with Voice Cloning: Input text and a 10-30 second speaker sample to generate high-quality text-to-speech output > Audio Prefix Inputs: Enhance speaker matching by adding an audio prefix to the text, enabling behaviors like whispering that are hard to achieve with voice cloning alone > Multilingual Support: Supports English, Japanese, Chinese, French, and German > Audio Quality & Emotion Control: Fine-tune speaking rate, pitch, frequency, audio quality, and emotions (e.g., happiness, anger, sadness, fear) > Fast Performance: Runs at ~2x real-time speed on an RTX 4090 > Available on the Hugging Face Hub 🤗

Vaibhav (VB) Srivastav

298,858 views • 1 year ago

Pretty WILD - SoTA open source TTS model that beats ElevenLabs/ Sesame - Dia 1.6B - Apache 2.0 licensed! 🔥 > Ultra realistic voice synthesis > Capable of producing non-verbal sounds - coughing, laughing 💥 > Zero shot Voice Cloning > Real-time TTS synthesis > Can run on your MacBook > Trending #2 on Hugging Face. Weights on the Hub and code on GitHub! 🤯

Vaibhav (VB) Srivastav

39,165 views • 1 year ago

Awesome! OmniVoice-TTS in ComfyUI. - zero-shot multilingual TTS; - 600+ languages; - voice cloning; - voice design; - multi-speaker dialogues; - supports SageAttention and non-verbal expression tags.

Wildminder

35,799 views • 2 months ago

train YOLOv9 on your dataset tutorial - run inference with a pre-trained COCO model - fine-tune model on custom dataset - evaluate the trained model - run inference with a fine-tuned model blogpost:

SkalskiP

111,792 views • 2 years ago

The new open-source Text to Speech model: Fish Speech 1.4 is brilliant! Trained on a massive 700K hours of multilingual speech data in 8 languages - Instant voice cloning 🗣️ - Ultra-low latency ⚡ - Compact model (~1GB weights) 🏋️‍♂️

Rohan Paul

228,836 views • 1 year ago

NEW: Higgs Audio V2 from BosonAI, an open, unified TTS model w/ voice cloning, beats GPT 4o mini tts and ElevenLabs v2 🔥 > Trained on 10M hours (speech, music, events) > Built on top of Llama 3.2 3B > Works real-time and on edge > Beats GPT-4o-mini-tts, ElevenLabs v2 in prosody & emotion > Multi-speaker dialog > Zero-shot voice cloning 🤩 > Available on Hugging Face. Kudos to folks at Boson AI for releasing such a brilliant work and all the details around the model! 🤗

Vaibhav (VB) Srivastav

79,585 views • 10 months ago

Hot! We have a new strong voice model. MOSS-TTS - a production-ready flagship 8B TTS; - high-fidelity zero-shot voice cloning, stable long-form gen; - multilingual; - lossless reconstruction; fine-grained pronunciation control; - token-level duration control, - voice creator, sound effects. Outstanding quality.

Wildminder

12,121 views • 4 months ago

Introducing Fish Speech 1.5 🎉 - Making state-of-the-art TTS accessible to everyone! Highlights: - #2 ranked on TTS-Arena (as "Anonymous Sparkle") - 1M hours of multilingual training data - 13 languages supported, including English, Chinese, Japanese & more - <150ms latency with high-quality instant voice cloning - Pretrained model now open source - Cost-effective self-hosting or cloud options Let's check out the details 🧵⬇️

Fish Audio

101,606 views • 1 year ago

🎙️ Chatterbox is now available on Comfy Cloud! Zero-shot voice cloning, 23 languages, and expressive TTS — all ready to use with no setup. Clone any voice from just 5 seconds of audio. No training required.

ComfyUI

19,942 views • 4 months ago

Marvis-TTS-v0.2 is here 🚀 A local first TTS model capable of realtime performance even on older iPhones that Lucas Newman and I built. What’s new: ✨ Blazing fast — 100M (tiny) & 250M parameter models 🌍 Multilingual — English, French, German 🎭 Enhanced voice cloning — More natural & expressive ⚡ Long-form generation — Up to 90 seconds (4x improvement) Get started today: > pip install -U mlx-audio

Prince Canuma

118,352 views • 7 months ago

Voice cloning is now available on LiveKit Inference. We’re launching with Inworld AI and Cartesia. Clone a voice once and use it across multiple TTS providers, with automatic fallback to the same voice if a provider fails mid-call. Free to create and available on all paid plans today.

LiveKit

10,891 views • 1 month ago

Excited to introduce Fish Speech 1.4 - now open-source and more powerful than ever! 🎉 Our mission is to make cutting-edge voice tech accessible to everyone. What's new: - Trained on 700k hours of multilingual data (up from 200k) - Now supports 8 languages: English, Chinese, German, Japanese, French, Spanish, Korean, and Arabic - Fully open-source, empowering developers and researchers worldwide Key features: - Lightning-fast TTS with ultra-low latency - Instant voice cloning - Self-host or use our cloud service - Simple, flat-rate pricing Try it out: - Playground: - GitHub: - HuggingFace Model: - Demo: - Product Hunt: We can't wait to see what you'll create with Fish Audio. Happy voice building! 🎧🐠

Fish Audio

149,685 views • 1 year ago

🎉 Congrats to Fish Audio on launching Fish Audio S2, a frontier TTS model with fine-grained prosody & emotion control via natural-language inline tags. SGLang Day-0 support is now live! 🏆 Best WER on Seed-TTS Eval; 81.88% win rate on EmergentTTS-Eval 🎙️ Voice cloning with 86.4% prefix-cache hit rate via RadixAttention ⚡️ RTF 0.34, 63.3 tok/s on single H200 (single batch) 🌍 Trained on 10M+ hours of audio across ~100 languages, GRPO-aligned 🔧 Dual-AR (Slow + Fast AR) is LLM-isomorphic: continuous batching, paged KV cache & CUDA graphs inherited natively 🗣️ Native multi-speaker: turn-taking, interruptions & cross-speaker emotion in a single pass 👉Cookbook: 👉Blog: 🎬 Curious how to run with SGLang? Check out this voice cloning demo from Chayenne Zhao with Fishaudio-S2-Pro:

LMSYS Org

39,939 views • 3 months ago

Run ZONOS Locally ZONOS, the new SOTA Open Source Voice Cloning TTS, is here. I've managed to write a 1-click launcher for Zonos that works on Mac, Windows, and Linux (ALL platforms!) Here's me cloning Peter Griffin's voice on my Mac.

cocktail peanut

69,658 views • 1 year ago

StepAudio 2.5 TTS is live now! Control emotion, pacing, pauses, and delivery with plain natural language. No tags, no preset combos. Just describe what you want the voice to do. Zero-shot voice cloning with full timbre + emotion control. Available via Pay-as-you-go API or Step Plan.

StepFun

17,024 views • 1 month ago