I’ve been exploring Gemini 2.0’s new native audio output capability, which is available for early testers. I’m a developer at Google Creative Lab, and I wanted to share one of my favorite experiments so far, called ✨ VoiceCursor (🔊 sound on for video). Unlike traditional TTS, native audio lets you...

Trudy Painter

3,307 subscribers

67,567 views • 1 year ago • via X (Twitter)

Education Science & Technology

10 Comments

Trudy Painter • 1 year ago

Gemini 2.0 native audio output is available in AI Studio for early testers. The prompt in this screencap is: Say this in an upbeat, happy tone: “You can steer a voice and … put emphasis on different words!” 🔗

Trudy Painter • 1 year ago

✨Voice Cursor follows a similar prompting strategy. After you highlight a phrase, the Voice Cursor will ask the API for audio for the phrase in your selected voice and tone. (and you can edit the prompt sent to the Gemini API in the bottom box)
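
The prompting strategy described here can be sketched in a few lines. This is only an illustration of how a highlighted phrase and a tone preset might be combined into the "Say this in ... tone" prompt quoted above; the preset names, the `TONE_PRESETS` table, and the `build_tone_prompt` helper are hypothetical and are not taken from the Voice Cursor code. The call to the Gemini API itself is left out.

```python
# Hypothetical tone presets: the keys and wording are illustrative only.
# The first entry mirrors the prompt quoted in the thread.
TONE_PRESETS = {
    "happy": "an upbeat, happy tone",
    "calm": "a calm, soothing tone",
    "dramatic": "a slow, dramatic tone",
}


def build_tone_prompt(phrase: str, tone: str = "happy") -> str:
    """Wrap a highlighted phrase in a 'Say this in ... tone' instruction."""
    description = TONE_PRESETS[tone]  # raises KeyError for an unknown preset
    # Collapse line breaks and repeated spaces from a multi-line selection.
    cleaned = " ".join(phrase.split())
    return f'Say this in {description}: "{cleaned}"'


if __name__ == "__main__":
    print(build_tone_prompt("You can steer a voice!"))
```

Keeping the presets in a plain dictionary would make it easy to add or reword a tone without touching the code that sends the prompt.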

Trudy Painter • 1 year ago

And for me, when the ✨Voice Cursor sits inside a familiar text editor, the highlight interaction feels fluid and comfortable. I’m excited about how native audio might enable new kinds of tools for how we write...

Trudy Painter • 1 year ago

You can get the code to see how it works at [link]. Native audio output is available to early testers now, with a wider rollout expected next year. This voice cursor was built on top of [link]. Such a good repo. Also, it’s super simple to change the tone prompt presets and how you make calls to the Gemini 2.0 API (see screenshot below).

jpa • 1 year ago

so cool, trudy!

Codetard • 1 year ago

:)

Tom Bielecki • 1 year ago

@codexeditor audio as another annotation layer

Data & Analytics • 1 year ago

@JeffDean @JeffDean, that native audio output sounds dope! Real game-changer for developers. How’s it stacking up against other tools you’ve tried?

steve ike • 1 year ago

This is really cool. Thanks for sharing, look forward to checking out the code and learning from your work.

𝑫𝒂𝒏𝒊𝒆𝒍 𝑺𝒄𝒐𝒕𝒕 𝑴𝒂𝒕𝒕𝒉𝒆𝒘𝒔 🇦🇺 • 1 year ago

Oh, the quality is remarkable! Thanks for sharing.

Related Videos

🚨 Breaking: Google is rolling out a native image feature in AI Studio! You can now use Gemini 2.0 Flash Experimental for native image output. This feature is insanely powerful; you can do so much with it. Google is crushing it, no hype, just delivering. Mad respect for the Google DeepMind and Gemini team! I had early access to this for a long time but I wasn’t supposed to share it :) It's amazing, you guys will love it.

AshutoshShrivastava

70,802 views • 1 year ago

Gemini 2.0 Flash Experimental has the ability to produce native audio in a variety of styles and languages - all from scratch. 🗣️ Here’s how this is different to traditional text-to-speech systems ↓

Google DeepMind

125,769 views • 1 year ago

If you try nothing else today, give the demo at [link] a go: it lets you stream video and audio directly to Gemini 2.0 Flash and get audio back, so you can have a real-time audio conversation with the model about what you can see. Feels like science fiction!

Simon Willison

208,950 views • 1 year ago

Google Gemini’s new Canvas feature is super cool and useful! You can now write code and visualize the output too. I just built this Stripe dashboard in less than 60 seconds. Here’s how you can use it:
- It works only with Gemini 2.0 Flash.
- Select Canvas while giving your prompt.
- Available for Free tier too.
Share below if you’ve built anything!

AshutoshShrivastava

45,597 views • 1 year ago

Today we announced Gemini 2.0, our most capable AI model yet. With new advances in multimodality — like native image and audio output — and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant.

Google

137,190 views • 1 year ago

every AI lab picked a different war today Anthropic: lets mistakenly vibe code a $100/mo minimum for Claude Code OpenAI: a new image model for creatives SpaceX: maybe acquiring Cursor, $60B Google: open sourced DESIGN .md this all happened just today. it is only Tuesday 🤣

klöss

135,972 views • 1 month ago

Teknium 🪽 hey 👋 i work on Gemini at Google i added a few unique Gemini TTS features to Hermes to make the experience much more fun: expressive audio tags and “directors notes”. this lets the hermes agent sound incredibly lifelike in its tts responses check out my PR

Barron Roth

74,359 views • 11 days ago

Having fun playing with new native audio capabilities in Gemini 1.5 Pro! ♊ Here’s a demo using audio from the #GoogleIO keynote with examples you can try: transcription, word-level timecodes, and searching audio by drawing. (🔊Video has sound)

Alexander Chen

77,091 views • 2 years ago

My new song #PALAVA is dropping this midnight!!! This is such a different type of record for me and I’m in love with it and can’t wait for you to hear it too. See you at midnight!

Johnny Drille

108,918 views • 2 years ago

Kling AI has impressed me so much with the new Kling 2.0 that I've made a special compilation video to showcase my work so far. 2.0 has the best prompt adherence of any model I've experienced. This is the future! 1.6 was already good, but this is the model I've been waiting for. Everything below is a one-try output. Kling AI

WuxIA Rocks

14,680 views • 1 year ago

🚨 It is here! Kling 2.6 is launching exclusively on fal day 0! 🎬 Native audio generation for text-to-video and image-to-video 🎵 Cinematic storytelling with expressive audio performances ✨ High-intensity VFX with detailed sound design

fal

81,051 views • 6 months ago

Gemini 2.0 Flash now has native image generation 🤯 So this is actually pretty wild, I made a short kids story with images from a simple prompt. Sure the story is simple, but this is so powerful. 🧵 A thread

Linus ✦ Ekenstam

39,478 views • 1 year ago

Audio Transcription with Google Gemini 1.5 Flash In this video, Gemini-Flash was able to transcribe 13 minutes of audio in 50-60 seconds. I have tested this with multiple audio files, and the transcription accuracy is close to 99%. If the audio is clear, you get 100% correct transcription. Even if the audio is really bad with a lot of noise, you still get around 95-96% accuracy. The code for this is available on GitHub. If you're interested, you can download and run it. You just need a Google API key.

AshutoshShrivastava

122,990 views • 1 year ago

Made this to show you my journey so far. I hope you enjoy it ♥️ It commemorates the great change that is moving onto a 2.0 new model ✨ I am really excited for the future ~!

ChonkyLotus : Your cozy Giantess DragonCat 💗

28,591 views • 1 year ago

✨ Added Gemini 2's new experimental AI edit functionality to Photo AI! It's the state-of-the-art model for editing photos with prompts, I think. It's just one button in Photo AI called [ 📝 AI Edit ]: you type a prompt in a JS prompt() window, and just a few seconds later your edit is done. Here I took an AI photo of myself, added 1000s of puppies to it with [ 📝 AI Edit ], and then turned it into a video with [ 🎞️ Make video ]. Gemini 2 isn't perfect btw: the quality of faces reduces a bit with every edit. A short-term fix would be pressing Remix and then getting the resemblance back.

@levelsio

653,195 views • 1 year ago

Ok, so the new Midjourney Style Tuner is here and it's a MASSIVE update. There is so much to unpack, no way I can do it all in one post, but here is an overview. You start by typing in /tune followed by your prompt. MJ will generate a range of sample images (between 16-128) showing different visual styles based on your prompt. Then you choose your favorite images from the sample to generate a unique code you can use to customize the look of future jobs. (link below) So for example, I just tried it with the prompt: 💬 1990s, a photo of a woman dressed in fuzzy velour, shimmering midnight hues reflecting sadness by the lake, a contemporary nightmare on cross-processed film --ar 4:5 --c 4 I selected 10 images, which generated the code zlPhgLCHDXol64kPlwSUzu97. I can then add it to my prompt using --style {code} But it makes more sense if you try it yourself, so try playing around with my Style Tuner for that prompt and generate your own code from your selections: Some quick notes: -You can control the level of style applied by adding the --s parameter to your prompt. -You can combine multiple styles together by separating with a hyphen, i.e --style code1-code2 I will be obsessing over this all day tomorrow and sharing everything I learn along the way. This is an incredible update.

Nick St. Pierre

377,928 views • 2 years ago

Introducing Gemini 3.1 Flash TTS 🗣️, our latest text to speech model with scene direction, speaker level specificity, audio tags, more natural + expressive voices, and support for 70 different languages. Available via our new audio playground in AI Studio and in the Gemini API!

Logan Kilpatrick

800,265 views • 2 months ago

Video to Audio is now here in #DreamMachine. To generate sound for your video generations, just select the new "Audio" button. Create with a single click or describe with prompts for more customized direction. Audio is available now in beta for free to all users.

Luma

2,260,160 views • 1 year ago

Meet Veo 3.1 👋 With Veo 3.1, you can generate videos with richer audio, better narrative control, and enhanced realism. This new update also comes with a suite of additional features built for creative control. Including the ability to: — Extend your videos to make them longer (with audio) — Set the first and last frame of your video to control your output (with audio) — Upload multiple ingredient images to craft your scene (with audio) — Add or remove objects directly in your video These features are available in Flow by Google, the Gemini API, and Google Cloud Vertex AI. Veo 3.1 is also available in the Google Gemini. We’re so excited to see what you create!

Google AI

71,238 views • 8 months ago