It's finally possible: real-time in-browser speech recognition with OpenAI Whisper! 🤯 The model runs fully on-device using Transformers.js and ONNX Runtime Web, and supports multilingual transcription across 100 different languages! 🔥 Check out the demo (+ source code)! 👇

Xenova

18,701 subscribers

262,755 views • 2 years ago • via X (Twitter)

Science & Technology Gaming Education

10 Comments

Xenova • 2 years ago

Source code: Demo:

Xenova • 2 years ago

UPDATE: I added WebGPU support to Whisper Web! 😍

Mike Young • 2 years ago

ok browsers have been able to do this forever tho

Jason Mayes • 2 years ago

Very nice!!! Is this a different incarnation from the Whisper Web Turbo that came out before? I thought that was also faster than real-time, no? Or is it the streaming ability that is new here, vs. recording and then sending to the model?

Xenova • 2 years ago

This uses Transformers.js (+ ONNX Runtime Web) vs. @fleetwood___'s Ratchet library. His version would certainly be able to run in real-time too though... and is still on his TODO list I'm sure! 😉

Mike Nolivos • 2 years ago

Does this work on mobile?

NOBODY • 2 years ago

@cocktailpeanut

Nodus Labs • 2 years ago

Nice! How much extra memory does it take?

Tom Bielecki • 2 years ago

👏👏👏 Is it possible to add an initial prompt/prefix for custom terms/hints?

Xenova • 2 years ago

Definitely possible - it would just require updating the initial tokens passed to the decoder. Do you have an example (in Python?) for this that I can take a look at? Feel free to open a feature request on GitHub so I can track this more easily.
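
For readers wondering what "updating the initial tokens passed to the decoder" could look like, here is a minimal sketch, not the Whisper Web or Transformers.js implementation. It assumes the prompt layout used by OpenAI's reference Python code, where the prompt tokens follow a "start of previous" token and come before the usual start-of-transcript sequence. The `SpecialTokens` interface, the `buildInitialTokens` helper, and the 223-token cap are illustrative assumptions; real token IDs must be read from the model's tokenizer.

```typescript
// Hypothetical container for the special-token IDs of a Whisper tokenizer.
// The IDs are model-specific and must come from the tokenizer itself.
interface SpecialTokens {
  startOfPrev: number;        // marks the beginning of the prompt/prefix text
  startOfTranscript: number;  // marks the beginning of the actual transcript
  language: number;           // e.g. the token for English
  task: number;               // transcribe or translate
  noTimestamps: number;       // disables timestamp prediction
}

// Builds the initial decoder tokens, optionally prefixed with a prompt.
// promptTokens: the already-tokenized hint text (custom terms, names, etc.).
// maxPromptTokens: assumed cap on prompt length; only the tail is kept.
function buildInitialTokens(
  special: SpecialTokens,
  promptTokens: number[],
  maxPromptTokens: number = 223,
): number[] {
  const startSequence = [
    special.startOfTranscript,
    special.language,
    special.task,
    special.noTimestamps,
  ];

  // Without a prompt, the decoder starts from the usual sequence.
  if (promptTokens.length === 0 || maxPromptTokens <= 0) {
    return startSequence;
  }

  // Keep only the most recent prompt tokens so the prompt fits the context.
  const tail = promptTokens.slice(-maxPromptTokens);
  return [special.startOfPrev, ...tail, ...startSequence];
}
```

With an empty prompt the function returns only the four-token start sequence, so existing behavior is unchanged. With a prompt, the hint tokens are placed ahead of it, which is where the decoder would treat them as preceding context.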

Related Videos

Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser! 🚀 Faster and more accurate than Whisper 🔒 Privacy-focused (no data leaves your device) ⚡️ WebGPU accelerated (w/ WASM fallback) 🔥 Powered by ONNX Runtime Web and Transformers.js

Xenova

27,878 views • 1 year ago

Introducing Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running 100% locally in your browser thanks to 🤗 Transformers.js! This unlocks a world of possibilities for in-browser video editing! 🤯 What will you build? 😍 Demo (+ source code) 👇

Xenova

105,958 views • 1 year ago

Absolutely wild what's possible with Hugging Face's Transformers.js in the browser! 🤯 🎙️Realtime in-browser speech-to-text with OpenAI Whisper! 📡 Broadcast to subscribed clients with Supabase Realtime. 🌏 Translate to 200 languages with AI at Meta's NLLB-200! Source code 👇

Thor 雷神 ⚡️

52,386 views • 1 year ago

Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! 🔥 High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js.🤗 Try it out yourself! Demo + source code below 👇

Xenova

19,666 views • 1 year ago

Introducing Voxtral WebGPU: Real-time speech transcription entirely in your browser. This demo runs Voxtral-Mini-4B, a powerful streaming ASR model from Mistral AI, locally on WebGPU. The model supports 13 languages and is capable of <500 ms latency. Fully private. Zero cost.

Xenova

94,356 views • 3 months ago

Florence-2, the new vision foundation model by Microsoft, can now run 100% locally in your browser on WebGPU, thanks to Transformers.js! 🤗🤯 It supports tasks like image captioning, optical character recognition, object detection, and many more! 😍 WOW! Demo (+ source code) 👇

Xenova

88,747 views • 2 years ago

Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯 🗣️ Transcribe videos, meeting notes, songs and more 🔐 Runs on-device, meaning no data is sent to a server 🌎 Multilingual (8 languages) 🤗 Completely free (forever) & open source

Xenova

19,969 views • 11 months ago

Behold... GPT-OSS (20B) running 100% locally in your browser on WebGPU. This shouldn't be possible — but with Transformers.js v4 and ONNX Runtime Web, it is! A new class of AI apps is emerging. Zero-install, infinite distribution. Simply visit a website and run models locally.

Xenova

311,512 views • 4 months ago

OmniParser, the new screen parsing tool from Microsoft (and #1 trending model on Hugging Face), can now run 100% locally in your browser with Transformers.js! 🤯 Who's going to be the first to turn this into a browser extension? 👀 Endless possibilities! Demo & code below! 👇

Xenova

64,553 views • 1 year ago

Okay, this is actually insane... You can now run LFM2.5-1.2B-Thinking (a 1.2B parameter LLM from @LiquidAI) at over 200 tokens per second directly in your browser on WebGPU! 🤯 Zero install. Fully private. Blazingly fast. Powered by Transformers.js and ONNX Runtime Web

Xenova

103,001 views • 3 months ago

"Free, accurate voice transcription in the browser using Whisper" ⚡️ I'm happy to share a modern voice-to-text app (Say) that uses in-browser AI (Whisper and T5) to offer voice transcription, text summaries and note management. Everything is done locally in a privacy-friendly way. Built with React and Transformers.js by Xenova

Addy Osmani

33,616 views • 1 year ago

Introducing SeamlessM4T, the first all-in-one, multilingual multimodal translation model. This single model can perform tasks across speech-to-text, speech-to-speech, text-to-text translation & speech recognition for up to 100 languages depending on the task. Details ⬇️

AI at Meta

592,704 views • 2 years ago

DINOv3 is revolutionary: a new state-of-the-art vision backbone trained to produce rich, dense image features. I loved their demo video so much that I decided to re-create their visualization tool. Everything runs 100% in-browser with 🤗 Transformers.js! Demo + source code 👇

Xenova

74,790 views • 10 months ago

NEW: Google releases EmbeddingGemma, a state-of-the-art multilingual embedding model perfect for on-device use cases! At only 308M params, the model can run 100% locally in your browser! 🤯 Explore your documents in an interactive 3D universe with our demo: "The Semantic Galaxy"

Xenova

23,338 views • 9 months ago

Love is complicated. Understanding customers over the phone shouldn’t be. Meet Babel, our new real-time multilingual transcription engine, built to understand conversations across languages—because nothing should get lost in translation. Watch the demo, and happy (early) Valentine’s Day.

Bland

17,103 views • 1 year ago

RF-DETR, the state-of-the-art model series for real-time object detection, can now run 100% locally in your browser on WebGPU with 🤗 Transformers.js v4! The models are Apache-2.0 licensed, making them a perfect fit for both personal and commercial applications. Try the demo 👇

Xenova

77,063 views • 4 months ago

say hello to my little friend 🤖 3D model control (move, rotate, scale, animate) using hand gestures and voice commands, running in real-time in the browser! created with Three.js, Rosebud AI, mediapipe computer vision, and web speech API credit to Quaternius for the robot model 🔗 demo below

AA

30,775 views • 1 year ago

Is this the future of AI browser agents? 👀 WebGPU-accelerated reasoning LLMs are now supported in Transformers.js! 🤯 Here's MiniThinky-v2 (1B) running 100% locally in the browser at ~60 tps (no API calls)! I can't wait to see what you build with it! Demo + source code in 🧵👇

Xenova

81,346 views • 1 year ago

Introducing Phi-3 WebGPU, a private and powerful AI chatbot that runs locally in your browser, powered by 🤗 Transformers.js and onnxruntime-web! 🔒 On-device inference: no data sent to a server ⚡️ WebGPU-accelerated (> 20 t/s) 📥 Model downloaded once and cached Try it out! 👇

Xenova

117,665 views • 2 years ago