I’ve been exploring Gemini 2.0’s new native audio output capability, which is available for early testers. I’m a developer at Google Creative Lab, and I wanted to share one of my favorite experiments so far, called ✨ VoiceCursor (🔊 sound on for video). Unlike traditional TTS, native audio lets you...
67,567 views • 1 year ago • via X (Twitter)
10 Comments

Gemini 2.0 native audio output is available in AI Studio for early testers. The prompt in this screencap is: Say this in an upbeat, happy tone: “You can steer a voice and … put emphasis on different words!”

✨Voice Cursor follows a similar prompting strategy. After you highlight a phrase, the Voice Cursor asks the API for audio of that phrase in your selected voice and tone. (You can also edit the prompt sent to the Gemini API in the bottom box.)
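To make the prompting strategy concrete, here is a minimal sketch of how a highlighted phrase and a tone preset could be combined into a prompt shaped like the one quoted above. It is an illustration only, not the Voice Cursor code: the `TONE_PRESETS` table, its preset names, and `build_voice_prompt` are all hypothetical, and the actual Gemini API call and voice selection are deliberately left out.

```python
# Hypothetical tone presets. Only the "upbeat" wording comes from the
# prompt quoted in the thread; the other entries are made-up examples.
TONE_PRESETS = {
    "upbeat": "an upbeat, happy tone",
    "calm": "a calm, soothing tone",
    "dramatic": "a dramatic, suspenseful tone",
}


def build_voice_prompt(phrase: str, tone: str = "upbeat") -> str:
    """Return a text prompt asking the model to speak `phrase` in a preset tone.

    This only builds the prompt string. Sending it to the API and choosing
    a voice are outside the scope of this sketch.
    """
    # A text selection can span several lines; collapse all runs of
    # whitespace so the prompt stays on one line.
    phrase = " ".join(phrase.split())
    if not phrase:
        raise ValueError("highlighted phrase is empty")
    if tone not in TONE_PRESETS:
        raise ValueError(f"unknown tone preset: {tone!r}")
    return f'Say this in {TONE_PRESETS[tone]}: "{phrase}"'


if __name__ == "__main__":
    print(build_voice_prompt("You can steer a voice!", tone="upbeat"))
```

Keeping the presets in a plain dictionary means that adding or rewording a tone is a one-line change, and the resulting string can still be edited by hand before it is sent.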

And for me, when the ✨Voice Cursor sits inside a familiar text editor, the highlight interaction feels fluid and comfortable. I’m excited about how native audio might enable new kinds of tools for how we write...

You can get the code to see how it works at [link]. Native audio output is available to early testers now, with a wider rollout expected next year. This voice cursor was built on top of [link]. Such a good repo. Also, it’s super simple to change the tone prompt presets and how you make calls to the Gemini 2.0 API (see screenshot below).

so cool, trudy!

:)

@codexeditor audio as another annotation layer

@JeffDean, that native audio output sounds dope! Real game-changer for developers. How’s it stacking up against other tools you’ve tried?

This is really cool. Thanks for sharing, look forward to checking out the code and learning from your work.

Oh, the quality is remarkable! Thanks for sharing.


