LiveKit

@livekit • 10,049 subscribers

Open source framework and cloud platform for building voice, video, and physical AI agents. https://t.co/OWLvFH82oN

Videos

We learn to speak before we learn to read. Voice is the most natural interface we have. We just raised $100M to make building voice AI as easy as a web app.

231,482 views • 6 months ago

Introducing Agents UI, an open-source shadcn component library for building polished React frontends for your voice agents. Audio visualizers. Media controls. Session management tools. Chat transcripts. All wired to LiveKit Agents. Install via the shadcn CLI and own the code.

182,887 views • 4 months ago

Grok's Text to Speech API is now available in LiveKit Inference. Natural, expressive voices with low-latency streaming. Multilingual in 20+ languages. Telephony and production-ready out of the box. One API key. No extra setup. →

159,093 views • 4 months ago

We built a live multilingual, multi-person video call with Gemini 3.5 Live Translate on LiveKit. Everyone picks their language, speaks naturally, and hears each other in real time in their language of choice. Watch the demo and check out the open source repo:

21,575 views • 1 month ago

How can a voice agent tell when you’re actually interrupting it? VAD is too sensitive—laughs, “mm-hmm,” or a sneeze shouldn’t stop the agent. We trained an audio model for adaptive interruption handling so agents can distinguish real interruptions from noise.

43,832 views • 4 months ago

Gemini 3.1 Flash Live just dropped and it's available with LiveKit today. This is the first Gemini 3 native audio model on the Live API. Better instruction following, improved tool calling, reduced speaker drift, and support for 70+ languages. Audio in, audio out. No text conversion in between.

40,277 views • 4 months ago

Voice agents do not sound robotic because they are slow. They sound robotic because the model writes like an essay and then reads it out loud. We just shared a post on making STT to LLM to TTS sound human. Make the model more human by including ums, sos, real pauses, and even laughter tags. Tiny rhythm changes can make a huge difference.

46,010 views • 4 months ago

Today we’re launching our first homegrown AI model: an open source turn detection model for building voice agents. Instead of relying solely on voice activity detection (VAD), which only considers when a user is speaking, our model also considers what has been and is being said in the context of a conversation and predicts when a user has finished expressing their thoughts before the agent responds. Conversations with AI voice agents using this new model flow much more naturally, without constant interruptions from the AI. Check it out (more videos, details, and code in the thread):

126,860 views • 1 year ago

You can now deploy AI voice agents to LiveKit Cloud. We handle:
• Stateful load balancing
• Capacity management
• Draining and instant rollbacks
• Operational observability

68,005 views • 11 months ago

We shipped LiveKit Turn Detector v1. Instead of reading transcripts, it listens to speech directly, combining semantic and acoustic cues into one end-of-turn prediction. The result: high accuracy, low latency—the best model we tested across 14 languages. Available on LiveKit Cloud.

11,606 views • 1 month ago

Introducing LiveKit Inference, a new cloud service that gives you access to the most popular voice AI models with just your LiveKit API key. We manage rate limits for you, report on usage, and consolidate billing. All LiveKit Cloud plans now include free monthly inference credits. A single string update allows you to call models from: AssemblyAI, Deepgram, Google DeepMind, Inworld AI, OpenAI, Rime

37,173 views • 9 months ago

We shipped the tutorial for Agents UI. In 5 minutes you'll have a fully wired voice agent frontend with audio visualizers, media controls, and session management built directly into your codebase. Watch it, build it, own it. shadcn inside™.

20,232 views • 4 months ago

Voice cloning is now available on LiveKit Inference. We’re launching with Inworld AI and Cartesia. Clone a voice once and use it across multiple TTS providers, with automatic fallback to the same voice if a provider fails mid-call. Free to create and available on all paid plans today.

11,218 views • 2 months ago

Add a face to your voice agent. LiveAvatar by HeyGen is now supported in LiveKit Agents. Add a realtime human avatar to your agent without rebuilding the conversation loop. Your LiveKit agent still owns the room, turn-taking, model orchestration, and voice pipeline. LiveAvatar renders the synchronized face and video stream. Useful for product demos, onboarding, tutoring, and support agents that need a visual layer.

10,761 views • 2 months ago

We shipped Agent Console, a realtime debugging surface for voice agents. Talk to your agent and see the entire pipeline live, from audio and latency to tool calls, transcripts, and participant state. Available now in the LiveKit Cloud dashboard.

11,915 views • 3 months ago

xAI STT is live. You can now run a complete cascaded voice agent pipeline on xAI (STT + Grok + TTS) through LiveKit Inference with one API key, giving you more control, full visibility, and easy component swaps.

10,544 views • 3 months ago

We launched livekit-wakeword, an open-source library that lets you train a custom wake word model from scratch with a single command. It handles synthetic data generation, augmentation, training, and ONNX export all in one shot.

10,104 views • 3 months ago
