new extremely fast text-to-audio model

Dreaming Tulpa 🥓👑

51,267 subscribers

68,825 views • 1 year ago • via X (Twitter)

Science & Technology

10 comments

Dreaming Tulpa 🥓👑 • 1 year ago

this is TangoFlux, a new text-to-audio model that can generate 30 seconds of 44.1kHz audio in just 3.7 seconds on a single A40 GPU project page: code: demo:

Bytescribe • 1 year ago

Introducing Vehrbal, the AI that converts audio into SOAP notes! Say goodbye to wasted time and hello to effortless note-taking. Experience the power of fast, simple, and efficient with Vehrbal today.

Nim Eshed 𝕏🦋 • 1 year ago

Do you know if there is a good tool to duplicate a voice?

Dreaming Tulpa 🥓👑 • 1 year ago

not sure how it has evolved, but check out tortoise and t5-tts

michielh.eth • 1 year ago

I'm checking it now, need more ebooks in audio format.

Dreaming Tulpa 🥓👑 • 1 year ago

don’t think this is gonna hold up for it but report back!

Russ Shimon • 1 year ago

Cool. Going to check it out.

Dreaming Tulpa 🥓👑 • 1 year ago

enjoy

Allar Haltsonen • 1 year ago

☄️☄️

aivrar • 1 year ago

Doesn't seem to make great music, still promising though for general sound effects.

Related videos

NEW: Kokoro 82M - APACHE 2.0 licensed, Text to Speech model, trained on < 100 hours of audio 🔥

Vaibhav (VB) Srivastav

330,034 views • 1 year ago

Realtime AI Conversations are here! Introducing PlayHT 2.0 Turbo ⚡️ Our new blazing fast Conversational AI Text-to-Speech model with <300ms latency! ✅ Input text streaming from LLMs ✅ Output audio streaming ✅ Clone any voice & accent Try here -

PlayAI

392,536 views • 2 years ago

Wow! New Speech to Speech model - Fish Agent v0.1 3B by Fish Audio 🔥 > Trained on 700K hours of multilingual audio > Continue-pretrained version of Qwen-2.5-3B-Instruct for 200B audio & text tokens > Zero-shot voice cloning > Text + audio input/ Audio output > Ultra-fast inference w/ 200ms TTFA > Models on the Hub & Finetuning code on its way! 🚀 What an amazing time to be alive 🤗

Vaibhav (VB) Srivastav

66,945 views • 1 year ago

Gemini 3.1 Flash TTS is our most controllable text-to-speech model yet. With new Audio Tags, you can easily direct vocal style, delivery, and pace through text commands. 🧵

Google DeepMind

467,720 views • 1 month ago

🔊Introducing Voxtral TTS: our new frontier open-weight model for natural, expressive, and ultra-fast text-to-speech 🎭Realistic, emotionally expressive speech. 🌍Supports 9 languages and accurately captures diverse dialects. ⚡Very low latency for time-to-first-audio. 🔄Easily adaptable to new voices

Mistral AI

893,008 views • 2 months ago

I've got access to the new Open model! It's not DeepSeek V4, but it's something powerful and fast! Here Space Invaders 0-shot, sprites and audio included, clearly prompt was extremely detailed on my side 🔥 🚀 Prompt below.

Ivan Fioravanti ᯅ

11,441 views • 2 months ago

Stability AI just dropped Stable Audio Open Small on Hugging Face Fast Text-to-Audio Generation with Adversarial Post-Training

AK

55,100 views • 1 year ago

Introducing GPT-4o, our new model which can reason across text, audio, and video in real time. It's extremely versatile, fun to play with, and is a step towards a much more natural form of human-computer interaction (and even human-computer-computer interaction):

Greg Brockman

4,358,747 views • 2 years ago

Bark Text-to-Audio Model Full Text Input: "Why was six afraid of seven?" Ignore Bark's "I'm done with this input" token and tell Bark to just keep generating more audio anyway.

Jonathan Fly 👾

461,796 views • 3 years ago

Designing an Encoder for Fast Personalization of Text-to-Image Models TL;DR: use an encoder to personalize a text-to-image model to new concepts with a single image and 5-15 tuning steps abs: project page:

AK

165,151 views • 3 years ago

Underrated strategy on how to monetize new channels extremely fast 👇🏼

wannercashcow

26,001 views • 2 months ago

VideoComposer brings ControlNet guidance to text and video-to-video. The model makes it possible to combine multiple modalities like text, sketch, style, and even motion to drive video generation. The results look extremely good.

Dreaming Tulpa 🥓👑

18,021 views • 3 years ago

Starting today you can try our new foundation research model for audio generation. The demo includes Zero shot TTS, Text to sound effects, Infilling and more! Try Audiobox ➡️

AI at Meta

515,611 views • 2 years ago

Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks.

OpenAI

22,800,758 views • 2 years ago

Can you accurately transcribe fast speech? Tested ' new Speech-to-Text model (Scribe) with Eminem's "Rap God" (4.28 words/sec!) & it nailed it. Great quality and supports 99+ languages.

Addy Osmani

108,720 views • 1 year ago

New text and image to video generation AI model Open-Sora-Plan-v1.3.0

AK

51,823 views • 1 year ago

🔉 Introducing SAM Audio, the first unified model that isolates any sound from complex audio mixtures using text, visual, or span prompts. We’re sharing SAM Audio with the community, along with a perception encoder model, benchmarks and research papers, to empower others to explore new forms of expression and build applications that were previously out of reach. 🔗 Learn more:

AI at Meta

1,247,722 views • 5 months ago

OK, this is insane.. Alibaba just dropped 4 new AI models at Apsara 2025, and they’re wild: → a 1 trillion parameter LLM → a vision model that codes from images → an omni-model for text/audio/video → and a new Wan 2.5 preview for video + audio gen more details below:👇

Hamza Khalid

26,707 views • 8 months ago

Experimenting with OpenAI's new Text to Speech model 💬 Punctuation is powerful here 🤯

Miguel | AP

202,867 views • 2 years ago

This is a big day. Meta is open-sourcing AudioCraft. You can now generate incredible music and sounds with a single prompt. It includes the most performant Generative AI Model (audio) on the market, the "Llama" of Audio. The research framework contains the weights and code of these models: ▸ MusicGen: controllable text-to-music model. ▸ AudioGen: text-to-sound model. ▸ EnCodec: high fidelity neural audio codec. ▸ Multi Band Diffusion: An EnCodec compatible decoder using diffusion. This is going to tremendously speed up audio research 👏

Lior Alexander

231,551 views • 2 years ago