
Sudarshan Kamath
@kamath_sutra • 21,694 subscribers
Building https://t.co/CwCpxzKtgM | Trying to do non-trivial things, maybe solve Death | 🌍 citizen

She deserves her moment. So do your customers. Let humans be humans. Try Smallest AI Voice Agents!
Sudarshan Kamath • 1,712,344 views • 16 days ago

OpenAI's S2S preview is polished but it still thinks in steps. Speech → text → model → text → speech. That's not how humans converse. Introducing Hydra. A native speech-to-speech model that doesn't wait for turn-taking, doesn't flatten emotion into text, and doesn't break when you interrupt it mid-sentence. Hydra reasons asynchronously, speaks and listens simultaneously, and preserves emotion because it never leaves the audio domain. It's still in beta, but the shift is obvious. If you want early access, the link is in the comments. Here's a preview of what that looks like -
Sudarshan Kamath • 328,580 views • 3 months ago

Fundraise Announcement! We are proud to announce that we at smallest.ai have raised 8M USD in an oversubscribed seed round led by SIERRA Ventures with participation from 3one4 Capital, Better, Upsparks Capital, Schema Ventures, DeVC, Tiny Supercomputer Investment Company, peercheque, shyamal, Mission Street Capital, Boris Wertz, raveen, and many more angels to build the future of Enterprise Voice AI! We are transforming voice AI from the ground up - pushing agents to pass the Turing test in the coming years. We are also excited to build a research-first organization - diving into the depths of problems from first principles and doing things no one in the world is even thinking about. It was our dream to build a scientific team that pushes the boundaries of the frontier. We are privileged to have the support of some of the best early backers a founder could ever ask for. Thank you, everyone, for supporting us - this is just the beginning!
Sudarshan Kamath • 679,595 views • 7 months ago

Introducing Lightning V3 - it beats every model we tested against. ElevenLabs, Cartesia, OpenAI. Lightning sets a new SOTA with V3 in conversational text-to-speech.
→ Highest MOS score for conversational TTS at 3.9
→ ~76% win rate vs gpt-4o-mini-tts on naturalness
→ 15 languages with mid-sentence code-switching
→ Built from scratch for voice agents, not read-aloud
Every TTS model sounds clean in a demo. You type a sentence and you get beautiful audio. Voice agents don't work that way. They stream. They're generating audio in real-time chunks with half the context missing. That's where everything breaks. A great reading voice and a great conversational voice are fundamentally different things. A conversational voice has to sound like it's thinking - with the pauses, the rhythm shifts, the reactions. It has to handle the way real people actually talk, including switching languages mid-sentence. That's what V3 does. V3.1 also ships voice cloning. 5 to 15 seconds of audio, no fine-tuning, production-grade clone across 15 languages. Blog link in the comments.
Sudarshan Kamath • 71,172 views • 2 months ago

Keeps impressing - honestly, I can't believe how human it sounds!
Sudarshan Kamath • 16,594 views • 2 months ago

Announcing... Voice x Memory! We're unpacking what makes agents listen, respond, and remember, or sometimes forget, and what that means for building better voice systems. We will move from a world of large LLMs remembering a lot of information to one of smaller LMs with finite real-time intelligence and infinite memory. Here's a short teaser as we walk down the streets of SF having a chat with @shortkingceo on how we see the future.
Sudarshan Kamath • 23,452 views • 3 months ago