AI was trained on the open internet, but the data that matters most lives in the real world. Introducing early access to Numo, an app built to collect the next generation of AI training data. Starting with voice data collection in Bengali, Hindi, Tamil, and Telugu. Details ↴
1,237,650 görüntüleme
Voice AI has an evaluation problem. Models look strong on public benchmarks, then collapse on real-world audio. Introducing a recipe-driven evaluation framework for low-resource languages, real-world audio, and production failure modes. Details ↓
29,279 görüntüleme
Most teams collecting voice data optimize for volume over quality, partly because they’re measuring quality wrong. To help evaluate quality we created the Poseidon Score. When applied, single-speaker audio scored well while multi-speaker conversations scored worse. Why? ↓
28,349 görüntüleme
Poseidon needs voice data and reliable ground truth in low-resource languages to benchmark against. To ensure LLM transcript accuracy, we worked with linguists to audit Bengali outputs. For a language spoken by 280M people, the gaps we found point to a deeper issue: data ↓
22,705 görüntüleme
Conversational voice data exposes a subtle failure mode in many ASR pipelines. The metric says the data is bad, but human reviewers say the audio is clear. Often the issue isn’t transcription quality itself, but that the evaluation stack wasn’t built for turn-taking, silence, and multi-speaker structure. Learn how to solve this in our latest blog: