It's finally possible: real-time in-browser speech recognition with OpenAI Whisper! 🤯 The model runs fully on-device using Transformers.js and ONNX Runtime Web, and supports multilingual transcription across 100 different languages! 🔥 Check out the demo (+ source code)! 👇
261,247 views • 2 years ago • via X (Twitter)
10 Comments

Source code: Demo:

UPDATE: I added WebGPU support to Whisper Web! 😍

ok browsers have been able to do this forever tho

Very nice!!! Is this a different incarnation of the Whisper Web Turbo that came out before? I thought that was also faster than real-time, no? Or is the streaming ability what's new here, vs. recording first and then sending to the model?

This uses Transformers.js (+ ONNX Runtime Web) vs. @fleetwood___'s Ratchet library. His version would certainly be able to run in real-time too though... and is still on his TODO list I'm sure! 😉

Does this work on mobile?

@cocktailpeanut

Nice! How much extra memory does it take?

👏👏👏 Is it possible to add an initial prompt/prefix for custom terms/hints?

Definitely possible - it would just require updating the initial tokens passed to the decoder. Do you have an example (in Python?) for this that I can take a look at? Feel free to open a feature request on GitHub so I can track this more easily.
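
For anyone curious what "updating the initial tokens passed to the decoder" could look like, here is a rough Python sketch. It is not code from Whisper Web or Transformers.js. It assumes the layout used by the reference Python Whisper implementation, where the prompt tokens are placed after a `<|startofprev|>` token and before the usual `<|startoftranscript|>` sequence, and where the prompt is truncated to roughly half the decoder context (223 tokens is assumed here). The helper names `build_decoder_prefix` and `toy_tokenize` are hypothetical, and the whitespace tokenizer is only a stand-in for Whisper's real BPE tokenizer.

```python
# Special tokens are shown as strings for readability; a real tokenizer
# would map each of them to an integer ID.
SOT_PREV = "<|startofprev|>"
SOT = "<|startoftranscript|>"


def build_decoder_prefix(prompt_tokens, language="en", task="transcribe",
                         timestamps=False, max_prompt_tokens=223):
    """Return the initial token sequence handed to the decoder.

    Assumption: the prompt is conditioned on by placing it before the
    start-of-transcript sequence, introduced by <|startofprev|>.
    """
    start = [SOT, f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        start.append("<|notimestamps|>")
    if not prompt_tokens:
        # No prompt: the decoder starts from the plain sequence.
        return start
    # Keep only the last max_prompt_tokens tokens so the prompt does not
    # crowd out the transcript in the decoder context.
    kept = list(prompt_tokens)[-max_prompt_tokens:]
    return [SOT_PREV] + kept + start


def toy_tokenize(text):
    # Hypothetical stand-in: whitespace split instead of real BPE.
    return text.split()


prefix = build_decoder_prefix(toy_tokenize("Transformers.js ONNX WebGPU"))
print(prefix)
```

In a real integration the custom terms would be encoded with the model's own tokenizer, and the resulting IDs would replace the default start tokens at the beginning of generation.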

