Whether you need expressive storytelling, clean narration, or multilingual characters, the Qwen3‑TTS family makes it feel effortless 🎧 With peak performance and powerful control capabilities, the model can meet global application demands and adapt its voice to instructions and text semantics, while being significantly more robust to noise in the input text.

Alibaba Group

267,905 subscribers

120,908 views • 4 months ago • via X (Twitter)

Education · Science and Technology

Related videos

Introducing Qwen3-TTS! 🗣️ Our new text-to-speech model is designed to be multi-timbre, multi-lingual, and multi-dialect for natural, expressive audio. It delivers strong performance in English & Chinese, and we're excited for you to hear it for yourself!

Tongyi Lab

1,014,717 views • 9 months ago

OpenAI enters the Expressive TTS Arena 🥊🤖🥊 Now hosted on Hugging Face, this arena is a new way to evaluate voice AI systems with natural language instructions + richer text. Compare Hume's TTS against ElevenLabs and OpenAI and see if you agree with the leaderboard results!

Hume AI

75,343 views • 1 year ago

HOLY FUCK! Zyphra just dropped Zonos, an Apache 2.0-licensed, multilingual text-to-speech model with INSTANT voice cloning! 🔥
> Zero-shot TTS with Voice Cloning: Input text and a 10-30 second speaker sample to generate high-quality text-to-speech output
> Audio Prefix Inputs: Enhance speaker matching by adding an audio prefix to the text, enabling behaviors like whispering that are hard to achieve with voice cloning alone
> Multilingual Support: Supports English, Japanese, Chinese, French, and German
> Audio Quality & Emotion Control: Fine-tune speaking rate, pitch, frequency, audio quality, and emotions (e.g., happiness, anger, sadness, fear)
> Fast Performance: Runs at ~2x real-time speed on an RTX 4090
> Available on the Hugging Face Hub 🤗

Vaibhav (VB) Srivastav

298,858 views • 1 year ago

Introducing Marvis-TTS 🔥🚀 A new local-first TTS model Lucas Newman and I built for efficiency, accessibility, and real-time performance right on consumer devices like Apple Silicon, iPhones, iPads, and more. Traditional TTS models often demand full text inputs or sacrifice real-time capabilities; Marvis flips the script. It streams audio chunks as text is processed, creating a truly conversational experience. No more awkward pauses or unnatural breaks: Marvis handles the entire text context intelligently to deliver coherent, expressive speech. Get started today:

pip install -U mlx-audio

Prince Canuma

81,299 views • 10 months ago

Most don't know (1) how easy it is to invert embedding vectors back into sentences, and (2) that this is a perfect task for text diffusion models. Here's a 78M-parameter model and live demo that recovers 80% of tokens from Qwen3-Embedding and EmbeddingGemma vectors. It works even on multilingual input.

Jina AI

12,813 views • 4 months ago

Text-to-speech is moving way too fast. Just a few days ago, I tweeted about PersonaPlex-7B, NVIDIA's new open-source TTS. And today, Qwen just open-sourced Qwen3-TTS 🤯 It’s a revolutionary text-to-speech model built for control: not just generating speech, but shaping how it sounds directly from language. You can guide the pace, the tone, and the expressiveness straight from text, without touching audio graphs or hand-tuning parameters. That’s the real shift! What makes Qwen3-TTS stand out is how practical it already is:
→ voice cloning from just a few seconds of audio
→ voice creation without any reference sample
→ support for 10 languages out of the box
→ end-to-end latency down to ~97ms
→ works in both streaming and non-streaming setups
The models come in two sizes (0.6B and 1.7B), so you can trade off quality and hardware cost depending on your setup. You can work with curated voices, designed voices, or cloned ones, and it integrates cleanly with vLLM for production use. It also ships as a simple Python package you can pip install. If you’re building real-time voice systems, this removes a lot of friction! 100% free and open source. I put the repo in the 🧵↓

Charly Wargnier

59,144 views • 5 months ago

Audio tools are ready on BudgetPixel AI. Choose from 300+ preset tones, convert voice to text, or clone a voice for custom audio. From voiceovers and narration to ads, characters, and social videos, create the sound your content needs faster.

BudgetPixel AI

12,803 views • 2 days ago

Did xAI just mass-murder the entire voice AI industry? 🤯 Grok just launched two voice APIs: Speech-to-Text and Text-to-Speech. Built on the same stack powering Tesla cars and Starlink support. And priced 10x cheaper than ElevenLabs. Speech-to-Text: $0.10/hr batch, $0.20/hr streaming. Text-to-Speech: $4.20 per million characters. 25+ languages. Real-time streaming. Speaker diarization. Already outperforming ElevenLabs, Deepgram, and AssemblyAI on word error rate. TTS ships with expressive tags like [laugh] and [sigh]. Voices that don't sound like robots reading a script. ElevenLabs spent years building a voice AI company. xAI built voice AI for cars and satellites.

Vaibhav Sisinty

24,525,197 views • 2 months ago
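
To make the per-unit prices quoted in the xAI post easier to compare, here is a minimal cost-estimate sketch. It assumes the rates exactly as stated in the post ($0.10/hr batch STT, $0.20/hr streaming STT, $4.20 per million TTS characters); they are not checked against xAI's own pricing, and the function names are hypothetical helpers, not part of any xAI API.

```python
# Rates as quoted in the post (USD); assumed, not verified against xAI's pricing page.
STT_BATCH_USD_PER_HOUR = 0.10
STT_STREAMING_USD_PER_HOUR = 0.20
TTS_USD_PER_MILLION_CHARS = 4.20


def stt_cost(hours: float, streaming: bool = False) -> float:
    """Estimated speech-to-text cost in USD for `hours` of audio (hypothetical helper)."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    rate = STT_STREAMING_USD_PER_HOUR if streaming else STT_BATCH_USD_PER_HOUR
    return hours * rate


def tts_cost(num_chars: int) -> float:
    """Estimated text-to-speech cost in USD for `num_chars` characters (hypothetical helper)."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1_000_000 * TTS_USD_PER_MILLION_CHARS


if __name__ == "__main__":
    # 1,000 hours of transcription and a 250,000-character script.
    print(f"STT batch, 1000 h:     ${stt_cost(1000):.2f}")
    print(f"STT streaming, 1000 h: ${stt_cost(1000, streaming=True):.2f}")
    print(f"TTS, 250k chars:       ${tts_cost(250_000):.2f}")
```

Under these assumed rates, the script prints $100.00, $200.00 and $1.05. Real bills may differ once minimums, rounding, or tiered pricing apply.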

Over the past few days, I’ve been actively testing Qwen3-Coder-Next, and I’ve built quite a few fun and creative mini-games with it. Its frontend capabilities are genuinely impressive. Because it’s based on the Qwen3-Next model, the speed is outstanding. In fact, we managed to build an Angry Birds–style game in just 45 seconds, which really pushed the upper bound of what I thought was possible in terms of efficiency. It’s also been extremely useful in my day-to-day work. I used it to read through the Qwen3-TTS GitHub repository, quickly write a Gradio app, and deploy it on ModelScope. The overall performance and accuracy were excellent—especially considering that Qwen3-TTS is not a trivial application at all. Overall, this is a model I genuinely enjoy using. I warmly welcome any feedback, and bad cases are especially appreciated—they’re incredibly helpful for us as we continue to optimize and improve the model.

Chen Cheng

18,396 views • 5 months ago

We’re excited to announce the release and open-sourcing of HunyuanImage 3.0, the largest and most powerful open-source text-to-image model to date, with over 80 billion total parameters, of which 13 billion are activated per token during inference. Its results are fully comparable to the industry’s flagship closed-source models. 🚀🚀🚀 HunyuanImage 3.0 originates from our internally developed native multimodal large language model, with fine-tuning and post-training focused on text-to-image generation. This unique foundation gives the model a powerful set of capabilities: ✅ Reason with world knowledge ✅ Understand complex, thousand-word prompts ✅ Generate precise text within images. Unlike traditional DiT-based image generation models, HunyuanImage 3.0’s MoE architecture uses a Transfusion-based approach that deeply couples diffusion and LLM training in a single, powerful system. Built on Hunyuan-A13B, HunyuanImage 3.0 was trained on a massive dataset: 5 billion image-text pairs, video frames, interleaved image-text data, and 6 trillion tokens of text corpora. This hybrid training across multimodal generation, understanding, and LLM capabilities allows the model to seamlessly integrate multiple tasks. Whether you're an illustrator, designer, or creator, it is built to cut your workflow from hours to minutes. HunyuanImage 3.0 can generate intricate text, detailed comics, expressive emojis, and lively, engaging illustrations for educational content. The current release focuses solely on text-to-image generation; future updates will include image-to-image, image editing, multi-turn interaction, and more. 👉🏻Try it now: 🔗GitHub: 🤗Hugging Face:

Tencent Hy

412,616 views • 9 months ago

📹The Hailuo I2V-01-Director Model is now available to everyone! Experience cinematic storytelling with precise camera control—powered by both our text and image models. Try it here:

Hailuo AI (MiniMax)

319,991 views • 1 year ago

Say hello to Coda. Our latest flagship TTS model that blends speed and performance with the most expressive voices on the market. Don't just take our word for it, listen to the voices and tell us what you think:

Rime

531,947 views • 1 month ago

Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks.

OpenAI

22,806,891 views • 2 years ago

Meet Hibiki, our simultaneous speech-to-speech translation model, currently supporting 🇫🇷➡️🇬🇧. Hibiki produces spoken and text translations of the input speech in real-time, while preserving the speaker’s voice and optimally adapting its pace based on the semantic content of the source speech. Based on objective and human evaluations, Hibiki outperforms previous systems for quality, naturalness and speaker similarity and approaches human interpreters. 🧵

kyutai

167,364 views • 1 year ago

Kling O1 is here! You can transform anything into everything, and everything into anything, with Elements! I've partnered with Kling AI to showcase O1 and its capabilities. This whole film was made with text-to-video, and almost all the characters are based on photos of me:

Alex Patrascu

22,487 views • 7 months ago

Typing just became… Typeless. Meet Typeless 1.0.2 for Mac, a tool that transforms your voice into clear, accurate writing across your Mac. Speak naturally and let Typeless handle the typing, corrections, and structure for you. With Typeless, you can dictate, translate, or ask for quick edits in any language or accent. It converts your speech into polished text up to 10× faster than traditional typing, while automatically fixing mistakes along the way.

---

1️⃣ Dictation
Typeless acts as a powerful voice keyboard that works across all applications on your Mac. When you speak, it understands your intent, organizes your ideas, and converts your natural speech into well-structured writing. Whether you're drafting emails, notes, documents, or messages, Typeless helps you turn spoken thoughts into clean text instantly.
Controls
- Press Fn to start or stop dictation
- Hold Fn for quick, short dictation

---

2️⃣ Translation
Typeless makes writing in other languages effortless. You can speak in your native language and have Typeless translate your words instantly into the language you want. This allows you to communicate, write, and respond in foreign languages smoothly and naturally.
Controls
- Press Fn + Space to start translation
- Press Fn to stop translation

---

3️⃣ Ask Anything Typeless
Typeless also lets you interact with your text using voice commands. You can select any text and simply say how you want it changed. Typeless can edit, rewrite, answer questions, or perform quick actions based on your request, making editing and improving text much faster.
Controls
- Press Fn + Space to start Ask Anything
- Press Fn to stop Ask Anything

---

With Typeless, your voice becomes the fastest and easiest way to write, edit, and communicate on your Mac. Your voice is now your keyboard. Get Typeless → Available now on Mac, Windows, iOS, and Android. #Typeless

Kuria Chronicles

43,623 views • 3 months ago

good music makes all the difference, and Mureka just dropped their new V7 model for AI-generated songs (way more natural and on-point), plus TTS to convert any text into rich, emotive speech! examples + everything you need to know in this thread 👇

TechHalla

180,180 views • 11 months ago

We've officially released and open-sourced HunyuanImage 2.1, our latest text-to-image model. The new model delivers on our commitment to balancing performance and quality. With native 2K image generation, HunyuanImage 2.1 is an advanced open-source text-to-image model. 🎨 ✨ New in 2.1: 🔹Advanced Semantics: Supports ultra-long and complex prompts of up to 1000 tokens, and precisely controls the generation of multiple subjects in a single image. 🔹Precise Chinese and English Text Rendering with seamless image–text integration: The model naturally integrates text into images, making it suitable for a wide range of applications, such as product covers, illustrations, and poster design, across many fields. 🔹Rich Styles and High Aesthetic Quality: Capable of generating images in various styles, including photorealistic portraits, comics, and vinyl figures, it delivers outstanding visual appeal and artistic quality. 🔹High-Quality Generation: Efficiently produces ultra-high-definition (2K) images in the same time other models take to generate a 1K image. HunyuanImage 2.1 uses two text encoders: a multimodal large language model (MLLM) to improve image–text alignment, and a multilingual character-aware encoder to improve text rendering. The model is a single- and double-stream diffusion transformer with 17B parameters. We've also open-sourced the weights of the accelerated version with meanflow, which reduces inference steps from 100 to just 8, and PromptEnhancer, the first industrial-grade rewriting model that enhances your prompts for more nuanced and expressive image generation. Now, creators can turn complex ideas, like posters with slogans or multi-panel comics, into visuals faster than ever. We’re just getting started. Stay tuned for our native multimodal image generation model coming soon. 🌐Website: 🔗Github: 🤗Hugging Face: ✨Hugging Face Demo:

Tencent Hy

89,257 views • 9 months ago

TurboEdit: Instant text-based image editing. We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder-based iterative inversion technique. The inversion network is conditioned on the input image and the reconstructed image from the previous step, allowing it to correct the next reconstruction towards the input image. We demonstrate that disentangled controls can be easily achieved in the few-step diffusion model by conditioning on an (automatically generated) detailed text prompt. To manipulate the inverted image, we freeze the noise maps and modify one attribute in the text prompt (either manually or via instruction-based editing driven by an LLM), resulting in a new image similar to the input image with only one attribute changed. The method can further control the editing strength and accept instructive text prompts. Our approach enables realistic text-guided image edits in real time, requiring only 8 function evaluations (NFEs) for inversion (a one-time cost) and 4 NFEs per edit. Our method is not only fast but also significantly outperforms state-of-the-art multi-step diffusion editing techniques.

AK

16,062 views • 1 year ago
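
The TurboEdit abstract separates a one-time inversion cost (8 NFEs) from a per-edit cost (4 NFEs), so editing one image k times costs 8 + 4k NFEs in total. The sketch below only works through that arithmetic. The function names are hypothetical, and the "re-inverting" baseline is an assumed comparison case (paying inversion again before every edit), not something the abstract reports.

```python
def total_nfes(num_edits: int, inversion_nfes: int = 8, nfes_per_edit: int = 4) -> int:
    """NFEs to invert one image once, then apply `num_edits` edits (8 + 4k by default)."""
    if num_edits < 0:
        raise ValueError("num_edits must be non-negative")
    return inversion_nfes + nfes_per_edit * num_edits


def total_nfes_reinverting(num_edits: int, inversion_nfes: int = 8, nfes_per_edit: int = 4) -> int:
    """Hypothetical baseline: repeat the inversion before every edit ((8 + 4) * k by default)."""
    if num_edits < 0:
        raise ValueError("num_edits must be non-negative")
    return (inversion_nfes + nfes_per_edit) * num_edits


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(f"{k:>2} edits: {total_nfes(k):>3} NFEs (inverted once), "
              f"{total_nfes_reinverting(k):>3} NFEs (re-inverted each time)")
```

With these numbers, 10 edits cost 48 NFEs when the inversion is reused, versus 120 if it were repeated, which is why the abstract calls inversion a one-time cost.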

We're launching Veena TTS 🪕 on June 20: our flagship text-to-speech model for Indian languages 🇮🇳 Natural, expressive, and actually sounds like us. We’re launching two models:

Veena Lite
> Open-source and lightweight
> 4 unique, natural-sounding voices
> The first open-source TTS model designed for real creative use cases

Veena Max
> Available via our web app
> 15 expressive, high-quality voices
> Perfect for storytelling, dubbing, voiceovers, and content creation

Dheemanth Reddy

80,336 views • 1 year ago