Smol TTS keeps getting better! Introducing OuteTTS v0.2 - 500M parameters, multilingual with voice cloning! 🔥
> Multilingual - English, Chinese, Korean & Japanese
> Cross platform inference w/ llama.cpp
> Zero-shot voice cloning
> Trained on 5 Billion audio tokens
> Qwen 2.5 0.5B LLM backbone
> Trained...

Vaibhav (VB) Srivastav

52,141 subscribers

44,654 views • 1 year ago • via X (Twitter)

Science & Technology, Education

11 Comments

Vaibhav (VB) Srivastav • 1 year ago

Check out the model weights and inference code base here:

Vaibhav (VB) Srivastav • 1 year ago

llama.cpp compatible GGUFs:

Vaibhav (VB) Srivastav • 1 year ago

OuteTTS GitHub:

Haorui He • 1 year ago

Big Congrats!!! Another SOTA TTS model trained on Emilia after F5-TTS & MaskGCT! Try out:

Tommy Falkowski • 1 year ago

Just tested it out and the quality is very good. More importantly, the fact that you can generate speaker profiles is awesome! Will test it out some more and add it to my growing list of supported tts engines in my app 🤣

Umesh • 1 year ago

This is improving so fast that I don't want to speak myself anymore. Just use this and get done 🤖

Fronesis • 1 year ago

Thank you for your work and for sharing insights! 🙌 Advancements like OuteTTS v0.2 showcase the rapid evolution of AI and its potential to empower global communities. 🚀 The future of #AI is bright, and collaborative innovation is key to unlocking its full potential!

Digital Doctor • 1 year ago

Are you saying you can voice CLONE on a R-Pi? Is that what you're saying????

斎藤ただし, Tadashi Saito • 1 year ago

The font of Japanese characters is wrong, it's for (maybe) Chinese. I hope you'll pay attention and respect to each of them when you are working for multilingual/multicultural things. (like your TTS engine itself does. Brilliant quality✨️)

Ahmed Mansour • 1 year ago

I tried to run it on HF. average inference time for 200 chars is >1 hour running on CPU. Why is this model so heavy?

Related Videos

Wow! New Speech to Speech model - Fish Agent v0.1 3B by Fish Audio 🔥 > Trained on 700K hours of multilingual audio > Continue-pretrained version of Qwen-2.5-3B-Instruct for 200B audio & text tokens > Zero-shot voice cloning > Text + audio input/ Audio output > Ultra-fast inference w/ 200ms TTFA > Models on the Hub & Finetuning code on its way! 🚀 What an amazing time to be alive 🤗

Vaibhav (VB) Srivastav

66,963 views • 1 year ago

Fish: Multilingual Voice Cloning Fish is a voice cloning AI model that supports multiple languages out of the box: - English - Japanese - Korean - Chinese - French - German - Arabic - Spanish I wrote a 1-click launcher for Fish gradio app, so you can run it locally. Enjoy.

cocktail peanut

35,815 views • 1 year ago

Pretty fucking wild - Chatterbox TTS by Resemble AI - Zero shot voice cloning, Apache 2.0 licensed 🤯 > Outperforms ElevenLabs > Trained on 500K hours of audio > Llama 500M arch > Emotionally aware speech > Zero shot voice cloning > Live on Hugging Face 🤗

Vaibhav (VB) Srivastav

71,652 views • 1 year ago

HOLY FUCK! Zyphra just dropped Zonos - Apache 2.0 licensed, Multilingual, Text to Speech model with INSTANT voice cloning! 🔥 > Zero-shot TTS with Voice Cloning: Input text and a 10-30 second speaker sample to generate high-quality text-to-speech output > Audio Prefix Inputs: Enhance speaker matching by adding an audio prefix to the text, enabling behaviors like whispering that are hard to achieve with voice cloning alone > Multilingual Support: Supports English, Japanese, Chinese, French, and German > Audio Quality & Emotion Control: Fine-tune speaking rate, pitch, frequency, audio quality, and emotions (e.g., happiness, anger, sadness, fear) > Fast Performance: Runs at ~2x real-time speed on an RTX 4090 > Available on the Hugging Face Hub 🤗

Vaibhav (VB) Srivastav

298,858 views • 1 year ago

Pretty WILD - SoTA open source TTS model that beats ElevenLabs/ Sesame - Dia 1.6B - Apache 2.0 licensed! 🔥 > Ultra realistic voice synthesis > Capable of producing non-verbal sounds - coughing, laughing 💥 > Zero shot Voice Cloning > Real-time TTS synthesis > Can run on your MacBook > Trending #2 on Hugging Face Weights on the Hub and code on GitHub! 🤯

Vaibhav (VB) Srivastav

39,165 views • 1 year ago

Awesome! OmniVoice-TTS in ComfyUI. - zero-shot multilingual TTS; - 600+ languages; - voice cloning; - voice design; - multi-speaker dialogues; - supports SageAttention and non-verbal expression tags.

Wildminder

35,799 views • 2 months ago

train YOLOv9 on your dataset tutorial - run inference with a pre-trained COCO model - fine-tune model on custom dataset - evaluate the trained model - run inference with a fine-tuned model blogpost: ↓ read more

SkalskiP

111,792 views • 2 years ago

The new open-source Text to Speech model: Fish Speech 1.4 is brilliant! Trained on a massive 700K hours of multilingual speech data in 8 languages - Instant voice cloning 🗣️ - Ultra-low latency ⚡ - Compact model (~1GB weights) 🏋️‍♂️

Rohan Paul

228,836 views • 1 year ago

NEW: Higgs Audio V2 from BosonAI open, unified TTS model w/ voice cloning, beats GPT 4o mini tts and ElevenLabs v2 🔥 > Trained on 10M hours (speech, music, events) > Built on top of Llama 3.2 3B > Works real-time and on edge > Beats GPT-4o-mini-tts, ElevenLabs v2 in prosody & emotion Multi-speaker dialog > Zero-shot voice cloning 🤩 > Available on Hugging Face Kudos to folks at Boson AI for releasing such a brilliant work and all the details around the model! 🤗

Vaibhav (VB) Srivastav

79,585 views • 11 months ago

Hot! We have a new strong voice model. MOSS-TTS - a production-ready flagship 8B TTS; - high-fidelity zero-shot voice cloning, stable long-form gen; - multilingual; - lossless reconstruction; fine-grained pronunciation control; - token-level duration control, - voice creator, sound effects. Outstanding quality.

Wildminder

12,121 views • 4 months ago

Introducing Fish Speech 1.5 🎉 - Making state-of-the-art TTS accessible to everyone! Highlights: - #2 ranked on TTS-Arena (as "Anonymous Sparkle") - 1M hours of multilingual training data - 13 languages supported, including English, Chinese, Japanese & more - <150ms latency with high-quality instant voice cloning - Pretrained model now open source - Cost-effective self-hosting or cloud options Let's check out the details 🧵⬇️

Fish Audio

101,606 views • 1 year ago

Marvis-TTS-v0.2 is here 🚀 A local first TTS model capable of realtime performance even on older iPhones that Lucas Newman and I built. What’s new: ✨ Blazing fast — 100M (tiny) & 250M parameter models 🌍 Multilingual — English, French, German 🎭 Enhanced voice cloning — More natural & expressive ⚡ Long-form generation — Up to 90 seconds (4x improvement) Get started today: > pip install -U mlx-audio

Prince Canuma

118,352 views • 7 months ago

🎙️ Chatterbox is now available on Comfy Cloud! Zero-shot voice cloning, 23 languages, and expressive TTS — all ready to use with no setup. Clone any voice from just 5 seconds of audio. No training required.

ComfyUI

21,722 views • 5 months ago

Presenting MetaVoice-1B, a 1.2B parameter base model for TTS (text-to-speech). * Emotional speech in English * Voice cloning with fine-tuning * Zero-shot cloning for American & British voices * Support for long-form synthesis

MetaVoice

111,799 views • 2 years ago

Voice cloning is now available on LiveKit Inference. We’re launching with Inworld AI and Cartesia. Clone a voice once and use it across multiple TTS providers, with automatic fallback to the same voice if a provider fails mid-call. Free to create and available on all paid plans today.

LiveKit

10,891 views • 1 month ago

Excited to introduce Fish Speech 1.4 - now open-source and more powerful than ever! 🎉 Our mission is to make cutting-edge voice tech accessible to everyone. What's new: - Trained on 700k hours of multilingual data (up from 200k) - Now supports 8 languages: English, Chinese, German, Japanese, French, Spanish, Korean, and Arabic - Fully open-source, empowering developers and researchers worldwide Key features: - Lightning-fast TTS with ultra-low latency - Instant voice cloning - Self-host or use our cloud service - Simple, flat-rate pricing Try it out: - Playground: - GitHub: - HuggingFace Model: - Demo: - Product Hunt: We can't wait to see what you'll create with Fish Audio. Happy voice building! 🎧🐠

Fish Audio

149,878 views • 1 year ago

🎉 Congrats to Fish Audio on launching Fish Audio S2, a frontier TTS model with fine-grained prosody & emotion control via natural-language inline tags. SGLang Day-0 support is now live! 🏆 Best WER on Seed-TTS Eval; 81.88% win rate on EmergentTTS-Eval 🎙️ Voice cloning with 86.4% prefix-cache hit rate via RadixAttention ⚡️ RTF 0.34, 63.3 tok/s on single H200 (single batch) 🌍 Trained on 10M+ hours of audio across ~100 languages, GRPO-aligned 🔧 Dual-AR (Slow + Fast AR) is LLM-isomorphic: continuous batching, paged KV cache & CUDA graphs inherited natively 🗣️ Native multi-speaker: turn-taking, interruptions & cross-speaker emotion in a single pass 👉Cookbook: 👉Blog: 🎬 Curious how to run with SGLang? Check out this voice cloning demo from Chayenne Zhao with Fishaudio-S2-Pro:

LMSYS Org

39,939 views • 3 months ago

StepAudio 2.5 TTS is live now! Control emotion, pacing, pauses, and delivery with plain natural language. No tags, no preset combos. Just describe what you want the voice to do. Zero-shot voice cloning with full timbre + emotion control. Available via Pay-as-you-go API or Step Plan.

StepFun

17,024 views • 2 months ago