Xenova

@xenovacom • 20,221 subscribers

Bringing the power of machine learning to the web. Currently working on Transformers.js (@huggingface 🤗)

Shorts

Opus 4.7 just wrote a custom WebGPU kernel that runs Qwen3.5 up to 13x faster using a fused LinearAttention op! 🤯 Agentic kernel optimization is the future. Now live in 🤗 Transformers.js v4.2.0! P.S. I've updated all our previous demos to use this new version. Enjoy!

78,136 views

There has been a huge debate recently about the best approach for image background removal. Here's my attempt:
- In-browser inference w/ 🤗 Transformers.js
- WebGPU accelerated (fast!)
- Costs $0 (no image hosting or server processing)
- No data leaves your device (privacy!)

419,020 views

Introducing 🤗 Transformers.js v4: state-of-the-art machine learning for the web!
🚀 New WebGPU backend (browser, Node.js, Bun, Deno)
⚡️ Huge performance improvements
🤯 Support for over 200 architectures
🛠️ Complete codebase refactor
Learn more about our biggest release yet! 👇

72,710 views

Inspired by Andrej Karpathy's microgpt, I built microgpt.js: a JavaScript port that runs entirely in your browser! It's an exact numerical implementation, so the randomness and outputs match bit-for-bit! Try it out yourself and train your own GPT by simply opening a webpage! 👇

48,392 views

Run OpenAI's new Whisper Turbo model 100% locally in your browser with Transformers.js! ⚡️ Transcribe 2 minutes of audio in ~12 seconds! 🤯 Demo + source code 👇

137,002 views

Meta's Segment Anything Model (SAM) can now run in your browser w/ WebGPU (+ fp16), meaning up to 8x faster image encoding (10s → 1.25s)! 🤯⚡️ Video is not sped up! Everything runs 100% locally thanks to 🤗 Transformers.js and onnxruntime-web! 🔗 Demo:

120,352 views

I'm excited to announce that Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️ Install it from NPM with: `npm i @huggingface/transformers` or via CDN (example below) 👇

87,354 views

IBM just released Granite 4.0 1B Speech, a compact and efficient speech-language model, designed for multilingual speech recognition and bidirectional speech translation. New #1 on the OpenASR leaderboard! It can even run in your browser on WebGPU, thanks to 🤗 Transformers.js

21,506 views

WOW! 🤯 Language models are becoming smaller and more capable than ever! Here's SmolLM2 running 100% locally in-browser w/ WebGPU on a 6-year-old GPU. Look at that speed! ⚡️😍 Powered by 🤗 Transformers.js and ONNX Runtime Web! How many tokens/second do you get? Let me know! 👇

12,557 views

Videos

Bonsai 27B just changed the local LLM game forever. 1-bit quantization shrinks it from 54GB to just 3.8GB (-93%), while retaining 90% of its intelligence. That's insane. With custom WebGPU kernels written by Fable 5 and GPT 5.6 Sol, the model now runs locally in your browser!

202,769 views • 4 days ago

I gave Fable 5 one job: write custom WebGPU kernels for Gemma 4 inference. It climbed to 84 tok/s, then hit a wall, insisting further optimization was impossible. Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s. The next day, access to Fable 5 was suspended globally.

1,164,466 views • 1 month ago

Before Fable 5 was shut down, it pushed Gemma 4 to 255 tok/s on WebGPU. Some didn't believe it was real. Today we're releasing the demo and kernels it wrote for you to see yourself. Run it locally in your browser. Agentic kernel optimization is the future of on-device inference

481,489 views • 1 month ago

While we eagerly await Fable 5's return, our agentic WebGPU kernel optimization framework kept running. Opus 4.8 picked up where Fable left off, pushing Liquid AI's new LFM2.5 230M to an unbelievable 1,400 tok/s... running locally in your browser. Don't blink or you'll miss it.

173,590 views • 23 days ago

WTF?! This changes image generation forever! 🤯 PrismML just released Binary and Ternary Bonsai Image 4B! That's right, 1-bit diffusion models are here. Only ~3GB in size (FLUX.2 Klein 4B is 16GB). The most shocking part? It can run 100% locally in your browser. Try it now! 👇

181,784 views • 1 month ago

NEW: OpenAI releases Privacy Filter, their first open model of 2026! 🤗 Apache-2.0! It's a bidirectional token-classification adaptation of GPT-OSS, trained to mask personally identifiable information (PII) in text. At only 1.5B params, it can even run locally in your browser!

220,000 views • 2 months ago

Behold... GPT-OSS (20B) running 100% locally in your browser on WebGPU. This shouldn't be possible — but with Transformers.js v4 and ONNX Runtime Web, it is! A new class of AI apps is emerging. Zero-install, infinite distribution. Simply visit a website and run models locally.

311,512 views • 5 months ago

The era of 1-bit LLMs is here — now with WebGPU acceleration! 🤯 It's incredible to think that a quantized 1.7B model (just 290MB in size) can run at ~100 tokens per second entirely in your browser. Try the demo yourself 👇

105,316 views • 3 months ago

NEW: Mistral AI releases Mistral 3, a family of multimodal models, including three state-of-the-art dense models (3B, 8B, and 14B) and Mistral Large 3 (675B, 41B active). All Apache 2.0! 🤗 Surprisingly, the 3B is small enough to run 100% locally in your browser on WebGPU! 🤯

225,128 views • 7 months ago

Chrome's new `window.ai` feature is going to change the web forever! 🤯 It allows you to run Gemini Nano, a powerful 3.25B parameter LLM, 100% locally in your browser! We've also added experimental support to 🤗 Transformers.js, making it super easy to use! 😍 Check it out! 👇

581,694 views • 2 years ago

NEW: Alibaba just released Qwen 3.5 Small — a family of powerful multimodal models available in a range of sizes (0.8B, 2B, 4B, and 9B parameters). Perfect for on-device applications! They can even run 100% locally in your browser on WebGPU, powered by Transformers.js! 🤯

102,592 views • 4 months ago

Introducing Voxtral WebGPU: Real-time speech transcription entirely in your browser. This demo runs Voxtral-Mini-4B, a powerful streaming ASR model from Mistral AI, locally on WebGPU. The model supports 13 languages and is capable of <500 ms latency. Fully private. Zero cost.

94,356 views • 4 months ago

Okay, this is actually insane... You can now run LFM2.5-1.2B-Thinking (a 1.2B parameter LLM from @LiquidAI) at over 200 tokens per second directly in your browser on WebGPU! 🤯 Zero install. Fully private. Blazingly fast. Powered by Transformers.js and ONNX Runtime Web

103,001 views • 4 months ago

I think Reachy is the one who needs chess lessons… 😅 Robotics meets WebAI: Gemma 4 running fully offline on WebGPU with Transformers.js, controlling Reachy Mini over WebSerial. No internet, just a browser and a USB-C cable. What should Reachy play next?

42,407 views • 2 months ago

RF-DETR, the state-of-the-art model series for real-time object detection, can now run 100% locally in your browser on WebGPU with 🤗 Transformers.js v4! The models are Apache-2.0 licensed, making them a perfect fit for both personal and commercial applications. Try the demo 👇

77,101 views • 5 months ago

It's finally possible: real-time in-browser speech recognition with OpenAI Whisper! 🤯 The model runs fully on-device using Transformers.js and ONNX Runtime Web, and supports multilingual transcription across 100 different languages! 🔥 Check out the demo (+ source code)! 👇

262,776 views • 2 years ago

Not enough people are talking about NVIDIA's new Nemotron-3-Nano (4B) model! 🤯 Hybrid Mamba + Attention architecture, designed as a unified model for reasoning and non-reasoning tasks. So small and efficient, it can run 100% locally in your web browser at 75 tokens per second.

50,063 views • 4 months ago

BOOM! 💥 Today I added WebGPU support for Andrej Karpathy's nanochat models, meaning they can run 100% locally in your browser (no server)! The d32 version runs at over 50 tps on my M4 Max 🚀 Pretty wild that you can now deploy AI applications using just a single index.html file 😅

95,454 views • 9 months ago

Real-time video captioning in your browser with @LiquidAI's LFM2-VL model on WebGPU. Sending every frame to a server was never going to be the answer. Imagine the bandwidth, latency and cost. Local inference. No server costs. Infinitely scalable. This is the way.

48,751 views • 4 months ago

NEW: Google releases Gemma 4, their most capable open models yet! 🤯 Apache-2.0, multimodal (text, image, and audio input), and multilingual (140 languages)! They can even run 100% locally in your browser on WebGPU. Watch it describe the Artemis II launch! 🚀 Try the demo! 👇

38,427 views • 3 months ago