Loading video...

Video Failed to Load

There was a problem loading this video. This could be due to a temporary network issue or the video might be unavailable.

I built a simple voice assistant in 70 lines of Python code. It uses: • LiveKit - The voice agent • AssemblyAI - To turn your voice into text • OpenAI - The brain of the agent, and to turn text into audio There's something really cool about this:

Santiago

451,925 subscribers

34,632 views • 1 year ago •via X (Twitter)

Science & Technology Education

Anya Rossi• Live Now

Private livecam show

0 Comments

No comments available

Comments from the original post will appear here

Related Videos

Add a face to your voice agent. LiveAvatar by HeyGen is now supported in LiveKit Agents. Add a realtime human avatar to your agent without rebuilding the conversation loop. Your LiveKit agent still owns the room, turn-taking, model orchestration, and voice pipeline. LiveAvatar renders the synchronized face and video stream. Useful for product demos, onboarding, tutoring, and support agents that need a visual layer.

LiveKit

10,761 views • 2 months ago

Here is the fastest way to turn text into a voice. Less than 100ms to start streaming. This is an absolute must for anyone who wants to build a voice agent that doesn't sound robotic (latency kills the perception of reality). Here is how easy this is:

Here is the fastest way to turn text into a voice. Less than 100ms to start streaming. This is an absolute must for anyone who wants to build a voice agent that doesn't sound robotic (latency kills the perception of reality). Here is how easy this is:

Santiago

38,347 views • 3 months ago

We shipped Agent Console, a realtime debugging surface for voice agents. Talk to your agent and see the entire pipeline live, from audio and latency to tool calls, transcripts, and participant state. Available now in the LiveKit Cloud dashboard.

We shipped Agent Console, a realtime debugging surface for voice agents. Talk to your agent and see the entire pipeline live, from audio and latency to tool calls, transcripts, and participant state. Available now in the LiveKit Cloud dashboard.

LiveKit

11,915 views • 3 months ago

Learn to build conversational AI voice agents in "Building AI Voice Agents for Production", created in collaboration with LiveKit and RealAvatar, and taught by dsa (Co-founder & CEO of LiveKit), Shayne (Developer Advocate, LiveKit), and Nedelina Teneva (Head of AI at RealAvatar, an AI Fund portfolio company). Voice agents combine speech and reasoning capabilities to enable real-time conversations. They're already being used to support customer service, to improve accessibility in healthcare, for entertainment applications, and for talk therapy. In this course, you’ll learn to build voice agents that listen, reason, and respond naturally. You’ll follow the architecture used to create the "AI Andrew" Avatar, a collaborative project between and RealAvatar that responds to users in what sounds like my voice. You’ll build a voice agent from scratch and deploy it to the cloud, enabling support for many simultaneous users. What you’ll learn: - Understand the fundamentals of voice agents, including key components like speech-to-text (STT), text-to-speech (TTS), and LLMs, and how latency is introduced at each layer. - Explore voice agent architectures and the trade-offs between modular pipelines and speech-to-speech APIs. - Explore how platforms like LiveKit mitigate latency issues with optimized networking infrastructure and low-latency communication protocols. - Learn how to connect client devices to voice agents using WebRTC—and why it outperforms HTTP and WebSocket for low-latency audio streaming. - Incorporate voice activity detection (VAD), end-of-turn detection, and context management to detect turns, handle interruptions, and manage conversational flow. - Understand the trade-offs between latency, quality, and cost in an example in which you build a voice agent and change its voice. - Equip your agent with metrics to measure latency at each stage of the voice pipeline and learn the key levers you can pull to make your agent faster and more responsive. The voice agents built in this course also incorporate voice technology from , a supporting contributor to the project. By the end of this course, you'll have learned the components of an AI voice agent pipeline, combined them into a system with low-latency communication, and deployed them on cloud infrastructure so it scales to many users. I’m looking forward to seeing what voice agents you build from this course! Please sign up here:

Andrew Ng

87,484 views • 1 year ago

I built a service desk agent using ElevenLabs’ new Conversational AI Agents feature. Watch the video to see how responsive it is! Previously, I used ElevenLabs for cloning my voice and used my generated voice for narrations in some of my youtube videos. This feature takes elevenlabs' voice AI to a whole new level! It simplifies systems that used to require separate TTS (text-to-speech) and STT (speech-to-text) processes for both sides of the conversation. Now, it’s much simpler! Create your own agent here What will you create with this?

I built a service desk agent using ElevenLabs’ new Conversational AI Agents feature. Watch the video to see how responsive it is! Previously, I used ElevenLabs for cloning my voice and used my generated voice for narrations in some of my youtube videos. This feature takes elevenlabs' voice AI to a whole new level! It simplifies systems that used to require separate TTS (text-to-speech) and STT (speech-to-text) processes for both sides of the conversation. Now, it’s much simpler! Create your own agent here What will you create with this?

Melvin Vivas

27,722 views • 1 year ago

I really enjoyed doing these voice impressions, and honestly, I'd love to one day turn the dream of becoming a voice actor into reality!

I really enjoyed doing these voice impressions, and honestly, I'd love to one day turn the dream of becoming a voice actor into reality!

Ndukauba

99,205 views • 8 months ago

We shipped the tutorial for Agents UI. In 5 minutes you'll have a fully wired voice agent frontend with audio visualizers, media controls, and session management built directly into your codebase. Watch it, build it, own it. shadcn inside™.

We shipped the tutorial for Agents UI. In 5 minutes you'll have a fully wired voice agent frontend with audio visualizers, media controls, and session management built directly into your codebase. Watch it, build it, own it. shadcn inside™.

LiveKit

20,232 views • 4 months ago

Preview of a Gaia 🌱 device! Turn-by-turn voice conversation with your own knowledge assistant (a Gaia node). 💸 $15 worth of hardware (pre-tariff!) 🌱 Voice-to-text, LLM and text-to-voice AI models on Gaia nodes 📖 Open source all-the-way — hardware, firmware, server, AI models 🦀 Rust all-the-way from firmware to AI server Notice how I switched to another language (Chinese) in the middle of the conversation and it seamlessly switched too. Very smart. PS. we are still optimizing performance before the software release.

Preview of a Gaia 🌱 device! Turn-by-turn voice conversation with your own knowledge assistant (a Gaia node). 💸 $15 worth of hardware (pre-tariff!) 🌱 Voice-to-text, LLM and text-to-voice AI models on Gaia nodes 📖 Open source all-the-way — hardware, firmware, server, AI models 🦀 Rust all-the-way from firmware to AI server Notice how I switched to another language (Chinese) in the middle of the conversation and it seamlessly switched too. Very smart. PS. we are still optimizing performance before the software release.

Michael Yuan

14,983 views • 1 year ago

What if you could talk to your Telegram bot and it actually talked back? Learn how built a voice-enabled Telegram bot with the Gemini Interactions API in ~400 lines of Python. Send a voice note in any language, Gemini understands the audio and replies with text and a spoken voice message. Uses: - Gemini 3.1 Flash Lite for reasoning, 3.1 Flash TTS for speech - Interactions API handles multi-turn memory server-side - Native audio input, no transcription step needed - Deploys to Cloud Run with scale-to-zero Awesome work by Thor 雷神 ⚡️. 🤗

What if you could talk to your Telegram bot and it actually talked back? Learn how built a voice-enabled Telegram bot with the Gemini Interactions API in ~400 lines of Python. Send a voice note in any language, Gemini understands the audio and replies with text and a spoken voice message. Uses: - Gemini 3.1 Flash Lite for reasoning, 3.1 Flash TTS for speech - Interactions API handles multi-turn memory server-side - Native audio input, no transcription step needed - Deploys to Cloud Run with scale-to-zero Awesome work by Thor 雷神 ⚡️. 🤗

Philipp Schmid

11,156 views • 3 months ago

Introducing Speech Engine. Developers can now turn their existing chat agent into a full voice agent with one prompt. Speech Engine combines our leading speech, transcription, and voice orchestration models into a single pipeline - all custom built to work best together.

Introducing Speech Engine. Developers can now turn their existing chat agent into a full voice agent with one prompt. Speech Engine combines our leading speech, transcription, and voice orchestration models into a single pipeline - all custom built to work best together.

ElevenLabs

134,449 views • 2 months ago

New course: Add voice to your AI agents and applications, built with Vocal Bridge (disclosure: an AI Fund portfolio company) and taught by its CEO Ashwyn Sharma. Voice applications historically required making a hard tradeoff: using fast voice-to-voice models that sacrifice reliability, or accurate speech-to-text pipelines that add latency. This course teaches you how to build voice agents that are both reliable and fast. You'll build three types of voice-enabled applications: a voice-interactive game where voice commands and mouse clicks work together over a single channel, an agent that gains a voice in about 10 lines of code without touching its prompts or tools, and an agent that places outbound phone calls using a make_phone_call function. Skills you'll gain: - Add a voice layer to an existing agent without rewriting your prompts, RAG pipeline, or tools - Give an agent the ability to place outbound calls and stream transcripts back live - Set up voice evaluation to score calls, catch regressions, and improve quality before deployment Join and add voice to your agents without overhauling your architecture:

New course: Add voice to your AI agents and applications, built with Vocal Bridge (disclosure: an AI Fund portfolio company) and taught by its CEO Ashwyn Sharma. Voice applications historically required making a hard tradeoff: using fast voice-to-voice models that sacrifice reliability, or accurate speech-to-text pipelines that add latency. This course teaches you how to build voice agents that are both reliable and fast. You'll build three types of voice-enabled applications: a voice-interactive game where voice commands and mouse clicks work together over a single channel, an agent that gains a voice in about 10 lines of code without touching its prompts or tools, and an agent that places outbound phone calls using a make_phone_call function. Skills you'll gain: - Add a voice layer to an existing agent without rewriting your prompts, RAG pipeline, or tools - Give an agent the ability to place outbound calls and stream transcripts back live - Set up voice evaluation to score calls, catch regressions, and improve quality before deployment Join and add voice to your agents without overhauling your architecture:

Andrew Ng

84,692 views • 1 month ago

You can turn an existing LangGraph agent into a fully functional voice agent with Pipecat AI. This 17-minute walkthrough shows you exactly how to do it.

You can turn an existing LangGraph agent into a fully functional voice agent with Pipecat AI. This 17-minute walkthrough shows you exactly how to do it.

LangChain

18,349 views • 1 month ago

Today I built this voice agent while on a call with it Introducing /this-needs-a-call - it's a slash command to start a phone call - grill or be grilled by the voice agent in realtime - your coding agent listens in on the call via MCP and works while you talk for hours

Today I built this voice agent while on a call with it Introducing /this-needs-a-call - it's a slash command to start a phone call - grill or be grilled by the voice agent in realtime - your coding agent listens in on the call via MCP and works while you talk for hours

jacob paris ▲

32,463 views • 26 days ago

Introducing --agent flag in CodeRabbit CLI 🎉 The new --agent flag turns CodeRabbit into a tool your AI agent can use, providing structured JSON output instead of terminal text. Your agent writes code, CodeRabbit reviews it, reads the JSON, and fixes what's flagged.

Introducing --agent flag in CodeRabbit CLI 🎉 The new --agent flag turns CodeRabbit into a tool your AI agent can use, providing structured JSON output instead of terminal text. Your agent writes code, CodeRabbit reviews it, reads the JSON, and fixes what's flagged.

CodeRabbit

22,404 views • 3 months ago

"Free, accurate voice transcription in the browser using Whisper" ⚡️ I'm happy to share a modern voice-to-text app (Say) that uses in-browser AI (Whisper and T5) to offer voice transcription, text summaries and note management. Everything is done locally in a privacy-friendly way. Built with React and Transformers.js by Xenova

"Free, accurate voice transcription in the browser using Whisper" ⚡️ I'm happy to share a modern voice-to-text app (Say) that uses in-browser AI (Whisper and T5) to offer voice transcription, text summaries and note management. Everything is done locally in a privacy-friendly way. Built with React and Transformers.js by Xenova

Addy Osmani

36,822 views • 1 year ago

BREAKING: ChatGPT GPT-4o was just announce by OpenAI. It improves on vision, audio and text. The ease of use is incredibly enhanced. It makes interaction with the GPT much more natural, especially with voice. GPT-4o reasons across voice, text and vision. GPT-4 wil be available to everyone.

BREAKING: ChatGPT GPT-4o was just announce by OpenAI. It improves on vision, audio and text. The ease of use is incredibly enhanced. It makes interaction with the GPT much more natural, especially with voice. GPT-4o reasons across voice, text and vision. GPT-4 wil be available to everyone.

Ed Krassenstein

21,605 views • 2 years ago

Hacked today: A realtime voice-to-voice assistant for my Mac that runs in the background and helps me be productive. Stack - RealtimeTTS package (for python) - Groq for fast inference - Mac native text to speech Quite happy with this :)

Hacked today: A realtime voice-to-voice assistant for my Mac that runs in the background and helps me be productive. Stack - RealtimeTTS package (for python) - Groq for fast inference - Mac native text to speech Quite happy with this :)

Paras Chopra

19,316 views • 2 years ago

Introducing Line by Cartesia: the modern voice agent development platform. Line was built to be code-first, because best-in-class products are built in code. ▶️ Watch us build an advanced voice agent with background reasoning in just minutes.

Introducing Line by Cartesia: the modern voice agent development platform. Line was built to be code-first, because best-in-class products are built in code. ▶️ Watch us build an advanced voice agent with background reasoning in just minutes.

Cartesia

158,953 views • 11 months ago

Maven Voice is our enterprise-grade AI voice agent built for the chaotic, noisy and unpredictable situations we encounter everyday. Thank you to OpenAI, Phonic, , and for helping make this possible.

Maven Voice is our enterprise-grade AI voice agent built for the chaotic, noisy and unpredictable situations we encounter everyday. Thank you to OpenAI, Phonic, , and for helping make this possible.

Sami Shalabi

16,315 views • 11 months ago