It's finally possible: real-time in-browser speech recognition with OpenAI Whisper! 🤯 The model runs fully on-device using Transformers.js and ONNX Runtime Web, and supports multilingual transcription across 100 different languages! 🔥 Check out the demo (+ source code)! 👇

261,247 views • 2 years ago • via X (Twitter)

10 comments

Xenova · 2 years ago

Source code: Demo:

Xenova · 2 years ago

UPDATE: I added WebGPU support to Whisper Web! 😍

Mike Young · 2 years ago

ok browsers have been able to do this forever tho

Jason Mayes · 2 years ago

Very nice!!! Is this a different incarnation of the Whisper Web Turbo that came out before? I thought that was also faster than real-time, no? Or is it the streaming ability that is new here, vs. recording and then sending to the model?

Xenova · 2 years ago

This uses Transformers.js (+ ONNX Runtime Web) vs. @fleetwood___'s Ratchet library. His version would certainly be able to run in real-time too though... and is still on his TODO list I'm sure! 😉

Mike Nolivos · 2 years ago

Does this work on mobile?

NOBODY · 2 years ago

@cocktailpeanut

Nodus Labs · 2 years ago

Nice! How much extra memory does it take?

Tom Bielecki · 2 years ago

👏👏👏 Is it possible to add an initial prompt/prefix for custom terms/hints?

Xenova · 2 years ago

Definitely possible - it would just require updating the initial tokens passed to the decoder. Do you have an example (in python?) for this I can take a look at? Feel free to open a feature request on GitHub so I can track this easier.
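
For reference, here is a rough sketch of what "updating the initial tokens passed to the decoder" could look like. It is not code from Whisper Web or Transformers.js. It assumes the prompt layout used by OpenAI's reference Python implementation of Whisper, where prompt tokens are placed after `<|startofprev|>` and before the usual start-of-transcript sequence, and are truncated to the last `n_text_ctx // 2 - 1` tokens. The function name `build_initial_tokens` and the use of plain strings in place of real token ids are illustrative assumptions.

```python
from typing import List, Optional

# Whisper special tokens, written as strings here for readability.
# A real implementation would use the tokenizer's integer ids instead.
SOT_PREV = "<|startofprev|>"
SOT = "<|startoftranscript|>"
NO_TIMESTAMPS = "<|notimestamps|>"


def build_initial_tokens(
    prompt_tokens: Optional[List[str]] = None,
    language: str = "en",
    task: str = "transcribe",
    timestamps: bool = False,
    n_text_ctx: int = 448,  # assumed decoder context length
) -> List[str]:
    """Hypothetical helper: compose the tokens the decoder starts from."""
    # Standard start sequence: start-of-transcript, language, task.
    tokens = [SOT, f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append(NO_TIMESTAMPS)

    if prompt_tokens:
        # Keep only the most recent prompt tokens so that the prompt
        # takes up at most half of the decoder context.
        max_prompt = n_text_ctx // 2 - 1
        tokens = [SOT_PREV] + prompt_tokens[-max_prompt:] + tokens

    return tokens


if __name__ == "__main__":
    # Placeholder prompt tokens; a real tokenizer would split differently.
    print(build_initial_tokens([" Transformers", ".js"]))
```

With a prompt, the custom terms sit in the "previous text" slot, so the decoder treats them as context to condition on rather than as part of the new transcript.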
