At Standard Intelligence we’ve been researching scalable cross-modality learning. We’re excited to share some early results in the form of 𝗵𝗲𝗿𝘁𝘇-𝗱𝗲𝘃, an open-source, first-of-its-kind base model for full-duplex conversational audio. 1/
178,771 views • 1 year ago • via X (Twitter)
Comments: 10

Hertz-dev is an 8.5B parameter transformer trained on 20 million unique hours of high-quality audio data. We’ve released checkpoints and code for both mono and full-duplex generation on our website under the Apache license.

Hertz-dev is a base model, without fine-tuning, RLHF, or instruction-following behavior. It can be fine-tuned by researchers for almost 𝘢𝘯𝘺 audio modeling task, from live translation to classification.

Base models excel at faithfully modeling their training set, and accurate maps come from contact with reality. From the world’s largest dataset of high-quality real-world conversational audio, hertz-dev learned human-like speech patterns such as pauses and emotional inflections.

Hertz-dev has an 80ms theoretical average latency and benchmarks at 120ms real-world latency on a single RTX 4090, 1.5-2x lower than the previous state of the art. Low latency is necessary for natural audio, and we're proud to move the field in this direction.

We’re currently training a scaled, 70B parameter version of Hertz, and we’ll be expanding to more modalities in the future. We’re excited to see what the research community builds on top of this model.

This is impressive! Seems like the training dataset is mostly podcast? And FYI, I believe there’s also a fully-duplex vision/audio model out there, would be interested in learning more about the implementation!

cool project! would love to see our base model used in projects like this one

i love small business sunday

small. business. sunday.

so happy we got this out. base models are very important research artifacts to have publicly available, and i'm glad to help ensure that they exist further into the timeline:)

