As promised, here's the write-up on implementing and optimizing conversational agents:
An open-source repo with a generic WebSocket server for low-latency conversational agents:
And another demo:

Sean Moriarity

3,519 subscribers

15,924 views • 2 years ago • via X (Twitter)

10 Comments

Sean Moriarity • 2 years ago

With a LiveView implementation using client-side VAD, I consistently got 1300-1500 ms time to first spoken word, with about 100 ms ping between my Mac and my GPU machine. Running locally on the GPU machine, I got 900-1000 ms time to first spoken word.
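The comment reports latency as "time to first spoken word" without spelling out how it was measured. As a hypothetical sketch, assuming the metric is the gap between the VAD flagging the end of the user's turn and the first chunk of the agent's reply starting to play, a batch of such measurements could be summarized like this in Python (the function name and the millisecond-timestamp input format are illustrative, not taken from the repo):

```python
import statistics
from typing import Dict, List, Tuple


def time_to_first_word_ms(events: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize latencies from (speech_end_ms, first_audio_ms) pairs.

    Hypothetical input format: both timestamps are in milliseconds on the
    same clock. speech_end_ms is when the VAD flagged the end of the user's
    turn; first_audio_ms is when the first chunk of the reply started playing.
    """
    if not events:
        raise ValueError("need at least one measurement")
    # Latency per turn, sorted so min/max are the first/last elements.
    latencies = sorted(first - end for end, first in events)
    return {
        "min_ms": latencies[0],
        "median_ms": statistics.median(latencies),
        "max_ms": latencies[-1],
    }
```

For example, three turns with gaps of 1300, 1500, and 1400 ms summarize to a 1400 ms median. Both timestamps need to come from one clock; with the client on a Mac and the models on a separate GPU machine, that means stamping both events on the same side.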

Sean Moriarity • 2 years ago

I haven't fully benchmarked the WebSocket-based server, but it should be similar. It also has a much better VAD implementation, so it's not as broken as my last one.
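The comment doesn't describe how the repo's VAD works. For readers unfamiliar with the idea, here is a minimal energy-threshold VAD sketch in Python; it is an illustrative stand-in, not the repo's implementation, and the threshold and hangover values are hypothetical. The "hangover" (requiring several consecutive silent frames before declaring the turn over) keeps a brief pause between words from triggering a premature reply:

```python
import math
from typing import Iterable, List, Tuple


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_utterances(frames: Iterable[List[float]],
                      threshold: float = 0.02,
                      hangover_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (first_speech_frame, last_speech_frame) pairs per utterance.

    A frame counts as speech if its RMS exceeds `threshold` (hypothetical
    value). An utterance ends only after `hangover_frames` consecutive
    silent frames, so short pauses inside a sentence don't split it.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # index of the first speech frame in the current utterance
    silent_run = 0    # consecutive silent frames since the last speech frame
    last_index = -1
    for i, frame in enumerate(frames):
        last_index = i
        if frame_rms(frame) > threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover_frames:
                # End of turn: close the segment at the last speech frame.
                segments.append((start, i - silent_run))
                start = None
                silent_run = 0
    if start is not None:
        # Stream ended mid-utterance; close it at the last speech frame.
        segments.append((start, last_index - silent_run))
    return segments
```

The hangover is a direct latency trade-off: every extra silent frame the VAD waits for adds its duration to the time to first spoken word, so the value balances cutting users off against responding slowly. Raw energy thresholds are also sensitive to background noise, which is why model-based VADs are a common, more robust alternative.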

Sean Moriarity • 2 years ago

Eventually, I'll get a version running on a Fly GPU with both a local Bumblebee LLM and a speech-to-text pipeline.

Michał Śledź • 2 years ago

Really impressive! 👏 It would be awesome to try running this over Elixir WebRTC instead of a WebSocket. We did a demo where we send video from a webcam over WebRTC to a Phoenix app, feed it into Nx, and perform image recognition. Here is the blog post:

Sean Moriarity • 2 years ago

Thanks for the suggestion! I’ll look into this!

Mohammed Zeeshan • 2 years ago

This is mind-blowing stuff.

Bill Tihen • 2 years ago

Wow, very cool!

Colm Byrne • 2 years ago

From what you've created, it seems like Retell doesn't have much of a ring fence if this can be hacked together in a couple of days. Thoughts?

Sean Moriarity • 2 years ago

Good question, and apologies in advance for the long reply. I think they probably have the most complete and reliable product in the space that I’ve seen, in my limited exposure to it. There are a lot of tiny details that go into making conversations realistic, and if they keep iterating on those, they can put some distance between themselves and everybody else.
This idea of hacking together three models has been “in the air” for a while, and it’s not difficult to get your own working version up and running quickly if you can accept a 70-80% solution. Building your own is especially compelling if you need it, because their prices are kind of high, and I think you can save in the long term if you invest in it. Also, there are going to be a million open-source versions of this exact thing popping up now that they’ve launched and set the standard.
I think if their target market is developers (which I believe it is), then they’re in a tough spot, because you can build something comparable (not better!) that’s cheaper. To me, it would be much more attractive if they went after direct applications of conversational agents in market research, surveying, etc., and capitalized quickly on having the best offering early. My feeling, though, is that they would actually prefer their users to be the ones building integrations for specific niches on top of their platform, so they can focus on improving the conversational experience. In that case, I would be really nervous about a big AI research lab releasing a foundation model that’s either end-to-end or fuses parts of the pipeline more efficiently than they can.
I got the sense that their plan is to eventually train their own models, in which case they can capitalize on this head start, the exposure, and the data from the early launch, and maybe establish a much bigger lead than they have now. I’m not sure how much funding they have, but this would require a decent amount. Sorry for the long answer, and take everything with a grain of salt, because I have never run a startup before hahahaha

Holden Oullette • 2 years ago

I know I’m late to this, but if you’re trying to eke out every last bit of performance: the alpha version of Jason v1.5 introduces an optional dependency containing a Rust NIF for Jason.encode, which speeds up encoding by about 1.5x for most inputs.