
We just solved text-to-speech AI. This model can simulate perfect emotion and screaming, and show genuine alarm. It clearly beats ElevenLabs and Sesame, it’s only 1.6B params, it streams in real time on 1 GPU, and it was made by a 1.5-person team in Korea!! It's called Dia by Nari Labs.

Deedy

249,597 subscribers

710,866 views • 1 year ago • via X (Twitter)

News & Politics Science & Technology Education
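To put the "only 1.6B params, streams on 1 GPU" claim in the description in perspective, here is a back-of-the-envelope estimate of weight memory alone. It assumes every parameter is stored at a fixed width (for example 2 bytes in 16-bit precision). It ignores activations, any audio codec, and runtime overhead, and it does not reflect whatever precision Dia actually ships with.

```python
# Back-of-the-envelope weight-memory estimate (weights only).
# Assumption: every parameter is stored in a fixed number of bytes.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    params = 1.6e9  # "only 1.6B params", per the description
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>10}: {weight_memory_gb(params, nbytes):.1f} GB")
```

Under these assumptions the weights come to about 3.2 GB at 16-bit precision. Real memory use during streaming inference will be higher.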


11 Comments

Deedy • 1 year ago

Source:

Deedy • 1 year ago

The future is about to look really weird. Audio may have just crossed the uncanny valley (as parts of text and images have) into most-humans-won't-know-this-is-AI territory.

MightyBot • 1 year ago

🧠 Unified Search. Smarter Meetings. Effortless CRM. MightyBot is your AI agent platform for seamless workflows—record meetings, automate CRM updates, and find answers across apps in seconds. 🌟 Focus on what matters. We'll handle the grind.

Yuchen Jin • 1 year ago

what is the 0.5 person in the 1.5 person team? 😂

Deedy • 1 year ago

Part-time research engineer!

Mudit Juneja • 1 year ago

Who is "we" here? Are you tied to this project?

Deedy • 1 year ago

We = humanity

Cr33d • 1 year ago

1.5 people?! Did the 0.5 person just handle the screaming?

Rithik Chopra • 1 year ago

Damn, that’s crazy!!!

Albert Sebastian • 1 year ago

What's your take on Hume AI?

Cr33d • 1 year ago

Perfect emotion? Finally, my toaster can apologize for burning my toast! 😂

Related Videos

How did a tiny, scrappy team build one of the most powerful AI voice models? In a deep dive with Sesame CTO Ankit Kumar and a16z's Anjney Midha, we explore how Sesame is pushing the boundaries of AI conversation, why it open-sourced its speech generation model, and the power of small teams to outdo much larger AI labs on product focus. A part of their secret: a relentless focus on real-time, natural conversations over raw intelligence, and a deep commitment to voice, personality, and user experience. By opening up its speech generation model, Sesame is paving the way for even more breakthroughs in AI-native conversation 👇

a16z

29,930 views • 1 year ago

[#ONEPIECE] SHAMROCK VS GUNKO ANIMATION (45s of fight) ‼️ this video is a fan fiction and was made using SEEDANCE 2.0 on CAPCUT using Ai Labs Studio on MOBILE❤️ANYONE CAN NOW CREATE THEIR OWN ANIME 🚨 show your love in the comments / by liking #ONEPIECE1184 #OPSPOILERS #ANIME

Fotachu - AR GUY

17,593 views • 2 months ago

Wild. Kimi K2 Thinking just released and it's insane. It's an AI model that can run by itself for hours on end and make HUNDREDS of tool calls. It's the first model that I think can replace humans. In this video I show why it's so special and how to use it to build your first app.

Alex Finn

70,013 views • 8 months ago

JUST IN: Google releases Gemini 1.5, a powerful MoE model. It's a huge breakthrough. The model has the longest context window ever seen: 1 million tokens. It can process 1 hour of video, 11 hours of audio, 30,000 lines of code, or 700,000 words in a single prompt. When tested on text, code, image, audio and video evaluations, 1.5 Pro outperforms 1.0 Pro on 87% of the benchmarks used for developing LLMs. You can sign up in AI Studio to try it out.

Lior Alexander

83,432 views • 2 years ago
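The Gemini 1.5 post quotes several capacities for a single 1M-token window. A quick way to see what they imply is to divide the window by each figure. This assumes each quoted capacity fills the full window on its own; the post does not say how the figures were derived, so the resulting rates are rough implied averages, not tokenizer specifications.

```python
# Implied per-unit token rates, assuming each quoted capacity alone fills
# the full 1M-token context window (an assumption; not an official spec).
CONTEXT_TOKENS = 1_000_000

QUOTED_CAPACITY = {
    "word": 700_000,
    "line of code": 30_000,
    "second of audio": 11 * 3600,  # 11 hours
    "second of video": 1 * 3600,   # 1 hour
}


def implied_tokens_per_unit(context_tokens: int, units: int) -> float:
    """Tokens consumed per unit if `units` items exactly fill the window."""
    return context_tokens / units


if __name__ == "__main__":
    for unit, count in QUOTED_CAPACITY.items():
        rate = implied_tokens_per_unit(CONTEXT_TOKENS, count)
        print(f"~{rate:.1f} tokens per {unit}")
```

Read this way, 700,000 words in 1M tokens works out to roughly 1.4 tokens per word.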

Mustafa Suleyman, CEO of Microsoft AI, takes a swipe at Anthropic’s internal ‘AI consciousness’ talk, saying it’s baseless and could make shutdown decisions harder. His argument about labs flirting with AI personhood: if you convince yourself the model can suffer, you’ll start “protecting” it, and you could end up unwilling to shut systems down even when they’re clearly risky. "There is a growing belief in some labs, particularly inside Anthropic, that these models are conscious. If it’s a conscious being that’s aware of itself and can suffer, then it deserves moral protection. This is a serious area of academic research, not just outside, but inside some labs. I think it’s very concerning and totally without merit or basis. If we go down that path, it becomes a slippery slope where we won’t be prepared to turn these systems off." --- From 'Financial Times' YT channel (link in comment)

Rohan Paul

19,259 views • 5 months ago

This is the craziest model you'll see this week! This model powers Figma. It's also behind Wayfair and a bunch of major e-commerce retailers. This model is a PIXEL-PERFECT image editing model. Literally, "pixel perfect". The model is capable of modifying one part of an image while keeping everything else untouched. It's a model designed, developed, and deployed by the research team at Jasper. This beats OpenAI models, Nano Banana, and every single general image model I've seen so far. It's up to 100x less expensive to run, and up to 10X faster. I recorded a quick video to show you how impressive this is. There's a ton of research behind this model. I'm adding two links below so you can read about how this model works, specifically about "Latent Bridge Matching" and "Flash Diffusion".

Santiago

25,861 views • 9 months ago

🚨 THIS IS LITERALLY HUGE 🚨 The team that solved "app sprawl" just dropped their answer to "AI sprawl" Today I'm THRILLED to help bring Brain MAX by ClickUp to the Product Hunt 😸 community 🚀 This native desktop app gives you productivity superpowers by unifying ALL AI models into one place What makes Brain MAX different: ✨ Every AI model → One app ✨ Deep app integrations (not surface level) ✨ AI completes your actual tasks ✨ Talk-to-text that writes like YOU ✨ Daily agenda with prepped meeting context ✨ No waitlist (available NOW) This replaces 5+ apps you're juggling right now and the innovation labs team have been cooking around the clock on this 🍳 Zeb Evans Chris Cunningham and the ClickUp team are looking forward to hearing from you WOULD LOVE for you to check it out and show some love and share your comments on PH (link below) 👇

KP

32,478 views • 1 year ago

This is the most realistic voice cloning I’ve ever heard. It captures voice, emotion, speaking style, and even your rhythm. Powered by EVI 3, Hume’s next-gen speech-to-speech model that: ➔ Streams your speech → replies in real-time ➔ Understands tone and context ➔ Adapts emotion on cue (“whisper fearfully,” “sound excited”) Try it now: 🔗 (live demo, no prompt edits) 🔗 (customize voices, integrate with LLMs) You can even integrate it with top AI models like Groq, Anthropic, and DeepSeek 👇 Talk to the voice I cloned here:

Ali Sufian

15,063 views • 1 year ago

The German Federal Innovation Agency is running a €125m competition to fund 3 new Frontier AI labs in Europe! "Next Frontier AI" is a pan-European initiative led by SPRIND, the Federal Agency for Breakthrough Innovation, to create three globally relevant Frontier AI labs. It starts with a €125m challenge where up to 10 teams receive funding over 24 months to build frontier-lab capabilities, including team, infrastructure, prototypes, evaluation discipline, and early pilots. The best-performing teams will then receive structured support toward raising ~€1 billion each. Europe has some of the top AI companies in the world, but as a continent we are massively dependent on frontier labs from other countries. This initiative seeks to change that and I am HERE FOR IT.

Seb Johnson

24,124 views • 3 months ago

Today we’ve raised a $52M Seed and are announcing the public launch of S2.1 Pro. >It can clone a voice from 5 seconds of audio >2x faster than Cartesia & 1/6th the cost of ElevenLabs >most expressive model with word-level control over emotion, intonation, pacing, etc. Frontier AI companies including HeyGen, LiveKit, Retell, Sanas, and OpenArt all run our model in production. If you're a business and we can't cut your voice AI costs by 50%, we'll give you 1 year of Fish Audio for free. Book a demo: To celebrate our first birthday, we'll give you 1 month of S2.1 Pro for free. Like, retweet, and comment “Fish” to get it.

Fish Audio

6,092,394 views • 5 days ago

✨ It's not even 2026 yet, but you can start using AI world models today! This is World Labs running inside my app 🏡 Interior AI, letting you redesign your home with AI and then walk through it! It's really cool and of course not perfect, especially since for now it's based on just one image, but multi-image support will allow for more precise understanding of the room. As always, this is just the first version, and it'll get better from here. World Labs was founded by Fei-Fei Li, a Computer Science professor at Stanford University, along with Justin Johnson, Christoph Lassner, and Ben Mildenhall, "each a world-renowned technologist in computer vision and graphics". The tech essentially comes down to img2splat; look up what a Gaussian splat is if you don't know. It's in beta with a waitlist, but I'm trying to get this live for everyone as soon as possible, as it seems like such an essential feature for interior design. P.S. I also tried to worldify photos of people (for Photo AI), but World Labs currently seems to block them for safety reasons and returns a black square; maybe in the future.

@levelsio

428,893 views • 10 months ago

Taking talking shop to a whole new level. We just shipped Glean’s real-time voice capability, powered by OpenAI’s newest speech model GPT-Realtime-2. Grounded in the context across your org, it feels like a real AI coworker and can keep up with how work gets finished. In internal evals, GPT-Realtime-2 delivered a 42.9% relative increase in helpfulness over its previous version. Give it a try. It speaks for itself. OpenAIDevs

Glean

16,588 views • 2 months ago
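The Glean post reports "a 42.9% relative increase in helpfulness" without giving the underlying scores. The sketch below shows how a relative increase is computed and why it does not reveal the baseline. The two score pairs are invented for illustration and are not Glean's or OpenAI's eval numbers.

```python
def relative_increase(old: float, new: float) -> float:
    """Relative change (new - old) / old as a fraction (0.429 means +42.9%)."""
    if old == 0:
        raise ValueError("relative increase is undefined for a zero baseline")
    return (new - old) / old


if __name__ == "__main__":
    # Hypothetical helpfulness scores on a 10-point scale (invented numbers).
    for old, new in [(3.5, 5.0), (5.6, 8.0)]:
        pct = 100 * relative_increase(old, new)
        print(f"{old} -> {new}: +{pct:.1f}% relative, +{new - old:.1f} points absolute")
```

Both hypothetical pairs give +42.9%, even though their absolute gains differ (1.5 vs. 2.4 points). A relative figure on its own does not tell you where the baseline was.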

Introducing an open-source, real-time demo for Comfy Deploy, streaming in any params (text, numbers, images). Workflow by Julien Blanchon 🇺🇦 🫶. Running on serverless GPUs on Modal. Drawing UI: tldraw. Inspired by Krea and fal's real-time demos! Huge respect! Demo on GitHub. For context: tbh this is just a technical demo, truly meant to showcase how powerful ComfyUI is when you get to expose the params. I'm also very grateful to even get the chance to meet the teams behind Krea and fal in SF; these are very creative people and I have huge respect for them. Everything you see here is running on a workflow made by Julien when I asked for help in the Discord, for which I am really, really grateful. I'm looking toward a future where real-time generative AI becomes more accessible, and where, by contributing to open source, a lot more people can build creative AI apps and let their imaginations go wild. benny

BennyKok

165,435 views • 2 years ago

We raised $4.8M to build Lamina Labs (YC P26) and make AI explain with video, not just text. Introducing Simi, the fastest way to make an explainer video. Give it a single prompt or document, and it can generate a one-minute narrated explainer in as little as 20 seconds, in 80+ languages. It works like ChatGPT, except the answer is a video. The round was led by Foundation Capital, with participation from Link Ventures and Y Combinator. This is our story. Thread below on what we are building.

Sudip

1,059,568 views • 3 days ago

1/3 Today we’re starting to roll out a new AI Mode capability to perform complex analysis with data visualization in Labs. We showed this on stage at I/O, and starting today you’ll be able to try this out first for questions about stocks and mutual funds. You can ask to compare any stocks, show price history over a given period, and much more. More details below 🧵

Robby Stein

33,218 views • 1 year ago

I made this short drama with SkyReels in no time. This isn’t just another AI video tool; it's the world’s first text-to-film agent. It writes your script, builds the storyboard, generates consistent characters, voices, music, lip sync… and even edits the entire film on its own. And I'm going to show you exactly how you can do it yourself. Step-by-step tutorial below:

Amira Zairi

20,010 views • 1 year ago

I was playing around with the newest TTS (text-to-speech) model 'kokoro', which has just 82M parameters! By adding only 1 line of code you can create a custom voice that mixes any two (out of ten) voices in any ratio. Example: mixing one male and one female voice in a 60-40 ratio.

maharshi

88,729 views • 1 year ago
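The 60-40 voice mix in the kokoro post can be sketched as a weighted average of two voice embeddings. This assumes each voice is a fixed-length vector of floats and that mixing is plain linear interpolation. The function name and the tiny 4-dimensional vectors are hypothetical stand-ins, not Kokoro's actual voice files or API.

```python
from typing import List

Vector = List[float]


def mix_voices(voice_a: Vector, voice_b: Vector, weight_a: float) -> Vector:
    """Blend two equal-length embeddings: weight_a * a + (1 - weight_a) * b."""
    if len(voice_a) != len(voice_b):
        raise ValueError("voice embeddings must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b for a, b in zip(voice_a, voice_b)]


if __name__ == "__main__":
    # Hypothetical 4-dimensional embeddings standing in for real voice data.
    male_voice = [1.0, 0.0, 0.5, -0.5]
    female_voice = [0.0, 1.0, -0.5, 0.5]
    blended = mix_voices(male_voice, female_voice, weight_a=0.6)  # 60-40 mix
    print([round(x, 2) for x in blended])
```

Keeping the two weights summing to 1 keeps the blend on the line between the two source voices. Sliding `weight_a` from 0 to 1 moves the result smoothly from one voice to the other.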

Text to Sound Effects is here. Our newest AI Audio model generates sound effects, short instrumental tracks, soundscapes, and a wide variety of character voices, all from a text prompt. Available now for all users. Everyone from content creators, video game developers, to film and television studios, uses sound effects to create rich and immersive content. Now, in addition to AI voiceovers, you can generate all of the sounds you need with just a prompt. Everything you hear in this video was generated by ElevenLabs sound and voice models. In this thread, we shared some additional clips that help show off the range of this new model.

ElevenLabs

450,763 views • 2 years ago

When it comes to building CCNA labs, you don't have to pick just one. There is massive value in using real hardware, Packet Tracer, and GNS3/CML together!

- Real Hardware: Gives you that crucial experience of cabling, connecting devices, and using PuTTY to SSH into devices. You don't get that feeling just by clicking an icon!
- Virtualization (GNS3/CML): Setting up a virtual machine and connecting it to your real network is an entire education on its own!
- Packet Tracer: Perfect for running through pre-made, step-by-step instructor labs, which mirror exactly what you will see on the exam simulations.

You can master them all, and it will surely help you clear your CCNA with ease. Join the Summer of CCNA right now to experience the real labs: 🚀

NetworkChuck

13,943 views • 16 days ago