Introducing Dialog 1.0 - Ultra-emotional AI Text-To-Speech model
- Outperforms ElevenLabs on expressiveness and quality 3 to 1
- <1% error rate
- Supports 30+ languages
- Best-in-class voice cloning
- Low latency: 303 ms TTFA (Time to First Audio)
Experience it for yourself on
Read more below ⬇️

PlayAI

15,108 subscribers

196,152 views • 1 year ago • via X (Twitter)

Science and Technology • Art

Comments: 11

PlayAI (formerly PlayHT) • 1 year ago

• 2/5 It beats ElevenLabs in human preference testing by 3:1. Here are the results from 100 independent evaluators across 60 samples (we cloned the same voice and used the same prompts).

PlayAI (formerly PlayHT) • 1 year ago

• 3/5 When users expressed a preference, they chose expressiveness and pacing more than any other factor

PlayAI (formerly PlayHT) • 1 year ago

• 4/5 And it’s fast, really fast. PlayDialog has lower latency than most other models on the market today, enabling a wide range of use cases like:
📚 Narrations and audiobooks
🎙️ AI-generated podcasts
🎵 AI announcers and DJs
💬 AI customer support agents
🗣️ Voice agents

PlayAI (formerly PlayHT) • 1 year ago

• 5/5 Read full details on our blog. You can use these voices in our AI Voiceover Studio and through our Text-to-Speech API. What are you waiting for? We’re excited to see what you build!

AssemblyAI • 1 year ago

Our speech-to-text models are the most accurate on the market with top rankings across industry benchmarks. - The highest accuracy rates—up to 95% - Up to 30% fewer hallucinations than other leaders - Low latency—63 minutes converts in 35 seconds Try via API for free today 👇

mrfakename • 1 year ago

Congrats! Any plan to open source?

PlayAI (formerly PlayHT) • 1 year ago

👀

Whole Mars Catalog • 1 year ago

remarkably lifelike

Hammad • 1 year ago

Thanks everyone! We’re excited for this. Do share your experiences and feedback! ❤️

Guillermo Rauch • 1 year ago

Very impressive. Great job

WOLF • 1 year ago

Excited to play around with this!

Related videos

🔊Introducing Voxtral TTS: our new frontier open-weight model for natural, expressive, and ultra-fast text-to-speech 🎭Realistic, emotionally expressive speech. 🌍Supports 9 languages and accurately captures diverse dialects. ⚡Very low latency for time-to-first-audio. 🔄Easily adaptable to new voices

Mistral AI

935,674 views • 2 months ago

The new open-source Text to Speech model: Fish Speech 1.4 is brilliant! Trained on a massive 700K hours of multilingual speech data in 8 languages - Instant voice cloning 🗣️ - Ultra-low latency ⚡ - Compact model (~1GB weights) 🏋️‍♂️

Rohan Paul

228,836 views • 1 year ago

Introducing Atlas 1. Willow's new frontier speech-to-text model. It outperforms ElevenLabs, Deepgram, OpenAI, and more by a wide margin. Built on the first scalable, human-powered transcription infrastructure ever built for real-time dictation.

Willow

834,137 views • 2 months ago

Oh my…FlashLabs releases Chroma 1.0, first open-source real-time speech-to-speech model with personalized voice cloning. Native speech-to-speech with <150ms latency & voice cloning from seconds of audio. Finally, an open alternative to OpenAI Realtime.

Alvaro Cintas

46,151 views • 5 months ago

🚨 New model alert! Dialog by vibx — a leading text-to-speech model — now runs on GroqCloud™. That means natural-sounding speech with ultra-low latency, making real-time voice applications smoother and more responsive. Learn more & build fast — links in the comments!

Groq Inc

47,183 views • 1 year ago

Real-time AI conversations are here! PlayHT, one of the best text-to-speech models I’ve used, now has a latency of less than 300ms. Checkout how fast it outputs audio 🤯 Also, my experience cloning my own voice and links to try it for free are below.

Alvaro Cintas

335,344 views • 2 years ago

Introducing Eleven Multilingual v2: our new AI speech model supporting 28 languages! v2 comes with enhanced conversational capability, higher output quality & the ability to better preserve unique voice characteristics across languages. Read more:

ElevenLabs

293,329 views • 2 years ago

HOLY FUCK! Zyphra just dropped Zonos - Apache 2.0 licensed, Multilingual, Text to Speech model with INSTANT voice cloning! 🔥
> Zero-shot TTS with Voice Cloning: Input text and a 10-30 second speaker sample to generate high-quality text-to-speech output
> Audio Prefix Inputs: Enhance speaker matching by adding an audio prefix to the text, enabling behaviors like whispering that are hard to achieve with voice cloning alone
> Multilingual Support: Supports English, Japanese, Chinese, French, and German
> Audio Quality & Emotion Control: Fine-tune speaking rate, pitch, frequency, audio quality, and emotions (e.g., happiness, anger, sadness, fear)
> Fast Performance: Runs at ~2x real-time speed on an RTX 4090
> Available on the Hugging Face Hub 🤗

Vaibhav (VB) Srivastav

298,858 views • 1 year ago

Wow! New Speech to Speech model - Fish Agent v0.1 3B by Fish Audio 🔥
> Trained on 700K hours of multilingual audio
> Continue-pretrained version of Qwen-2.5-3B-Instruct for 200B audio & text tokens
> Zero-shot voice cloning
> Text + audio input / Audio output
> Ultra-fast inference w/ 200ms TTFA
> Models on the Hub & Finetuning code on its way! 🚀
What an amazing time to be alive 🤗

Vaibhav (VB) Srivastav

66,963 views • 1 year ago

Instant Voice Cloning 3 seconds of audio is all you need. Text-to-voice AI generated voices with <1 second latency. 🧵 A thread

Linus ✦ Ekenstam

576,437 views • 2 years ago

Introducing SeamlessM4T, the first all-in-one, multilingual multimodal translation model. This single model can perform tasks across speech-to-text, speech-to-speech, text-to-text translation & speech recognition for up to 100 languages depending on the task. Details ⬇️

AI at Meta

592,704 views • 2 years ago

Did xAI just mass-murder the entire voice AI industry? 🤯 Grok just launched two voice APIs. Speech-to-Text and Text-to-Speech. Built on the same stack powering Tesla cars and Starlink support. And priced at 10x cheaper than ElevenLabs. Speech-to-Text: $0.10/hr batch. $0.20/hr streaming. Text-to-Speech: $4.20 per million characters. 25+ languages. Real-time streaming. Speaker diarization. Already outperforming ElevenLabs, Deepgram, and AssemblyAI on word error rate. TTS ships with expressive tags like [laugh], [sigh], , . Voices that don't sound like robots reading a script. ElevenLabs spent years building a voice AI company. xAI built voice AI for cars and satellites.

Vaibhav Sisinty

24,521,874 views • 2 months ago

Introducing our new Turbo 2.5 model - Hindi, French, Spanish, Mandarin and 27 other languages just got 3x faster. This unlocks high-quality low-latency conversational AI for nearly 80% of the world. For the first time, we support Vietnamese, Hungarian and Norwegian text to speech. And English is now 25% faster compared to Turbo v2.

ElevenLabs

50,875 views • 1 year ago

MAI‑Transcribe‑1 makes speech‑to‑text clearer, faster, and more reliable even in noisy audio. Ranked #1 on the industry-standard FLEURS word error rate benchmark. Now in public preview. Learn more:

Microsoft AI

99,507 views • 2 months ago

NEW: Higgs Audio V2 from BosonAI open, unified TTS model w/ voice cloning, beats GPT 4o mini tts and ElevenLabs v2 🔥
> Trained on 10M hours (speech, music, events)
> Built on top of Llama 3.2 3B
> Works real-time and on edge
> Beats GPT-4o-mini-tts, ElevenLabs v2 in prosody & emotion
> Multi-speaker dialog
> Zero-shot voice cloning 🤩
> Available on Hugging Face
Kudos to folks at Boson AI for releasing such a brilliant work and all the details around the model! 🤗

Vaibhav (VB) Srivastav

79,585 views • 11 months ago

Introducing Scribe v2 Realtime – the most accurate real-time Speech to Text model. Built for voice agents, meeting notetakers, and live applications, it transcribes in 150ms across 90+ languages, including English, French, German, Italian, Spanish, Portuguese, Hindi, and Japanese. Available today by API and through ElevenLabs Agents.

ElevenLabs

317,246 views • 7 months ago

The first generative AI for voice creation is here! We call it Voice Design - check out the quick demo below Try it out yourself: Read more on our blog:

ElevenLabs

84,658 views • 3 years ago

LTX 2.3 audio as standalone speech model. Emotional TTS with Scenema Audio. - Zero-shot expressive voice cloning, speech gen - 8-step distilled with Gemma 3 12B text encoding - stage directions via tags - runs at 1.5x real-time on RTX 4090 - fits in 16GB VRAM - 13 languages, 48kHz stereo output it also gens matching environment sounds

Wildminder

16,309 views • 1 month ago