
Xenova
@xenovacom • 17,207 subscribers
Bringing the power of machine learning to the web. Currently working on Transformers.js (@huggingface 🤗)

WTF?! This changes image generation forever! 🤯 PrismML just released Binary and Ternary Bonsai Image 4B! That's right, 1-bit diffusion models are here. Only ~3GB in size (FLUX.2 Klein 4B is 16GB). The most shocking part? It can run 100% locally in your browser. Try it now! 👇
Xenova • 176,220 views • 8 days ago

NEW: OpenAI releases Privacy Filter, their first open model of 2026! 🤗 Apache-2.0! It's a bidirectional token-classification adaptation of GPT-OSS, trained to mask personally identifiable information (PII) in text. At only 1.5B params, it can even run locally in your browser!
Xenova • 219,173 views • 1 month ago

Behold... GPT-OSS (20B) running 100% locally in your browser on WebGPU. This shouldn't be possible — but with Transformers.js v4 and ONNX Runtime Web, it is! A new class of AI apps is emerging. Zero-install, infinite distribution. Simply visit a website and run models locally.
Xenova • 311,285 views • 3 months ago

NEW: Mistral AI releases Mistral 3, a family of multimodal models, including three state-of-the-art dense models (3B, 8B, and 14B) and Mistral Large 3 (675B, 41B active). All Apache 2.0! 🤗 Surprisingly, the 3B is small enough to run 100% locally in your browser on WebGPU! 🤯
Xenova • 225,018 views • 6 months ago

Introducing Voxtral WebGPU: Real-time speech transcription entirely in your browser. This demo runs Voxtral-Mini-4B, a powerful streaming ASR model from Mistral AI, locally on WebGPU. The model supports 13 languages and is capable of <500 ms latency. Fully private. Zero cost.
Xenova • 93,558 views • 2 months ago

NEW: Alibaba just released Qwen 3.5 Small — a family of powerful multimodal models available in a range of sizes (0.8B, 2B, 4B, and 9B parameters). Perfect for on-device applications! They can even run 100% locally in your browser on WebGPU, powered by Transformers.js! 🤯
Xenova • 102,359 views • 3 months ago

Okay, this is actually insane... You can now run LFM2.5-1.2B-Thinking (a 1.2B parameter LLM from @LiquidAI) at over 200 tokens per second directly in your browser on WebGPU! 🤯 Zero install. Fully private. Blazingly fast. Powered by Transformers.js and ONNX Runtime Web.
Xenova • 103,001 views • 3 months ago

Chrome's new `window.ai` feature is going to change the web forever! 🤯 It allows you to run Gemini Nano, a powerful 3.25B parameter LLM, 100% locally in your browser! We've also added experimental support to 🤗 Transformers.js, making it super easy to use! 😍 Check it out! 👇
Xenova • 580,736 views • 1 year ago

RF-DETR, the state-of-the-art model series for real-time object detection, can now run 100% locally in your browser on WebGPU with 🤗 Transformers.js v4! The models are Apache-2.0 licensed, making them a perfect fit for both personal and commercial applications. Try the demo 👇
Xenova • 76,917 views • 3 months ago

Not enough people are talking about NVIDIA's new Nemotron-3-Nano (4B) model! 🤯 Hybrid Mamba + Attention architecture, designed as a unified model for reasoning and non-reasoning tasks. So small and efficient, it can run 100% locally in your web browser at 75 tokens per second.
Xenova • 50,063 views • 2 months ago

NEW: Google releases Gemma 4, their most capable open models yet! 🤯 Apache-2.0, multimodal (text, image, and audio input), and multilingual (140 languages)! They can even run 100% locally in your browser on WebGPU. Watch it describe the Artemis II launch! 🚀 Try the demo! 👇
Xenova • 38,427 views • 2 months ago

It's finally possible: real-time in-browser speech recognition with OpenAI Whisper! 🤯 The model runs fully on-device using Transformers.js and ONNX Runtime Web, and supports multilingual transcription across 100 different languages! 🔥 Check out the demo (+ source code)! 👇
Xenova • 261,247 views • 2 years ago

BOOM! 💥 Today I added WebGPU support for Andrej Karpathy's nanochat models, meaning they can run 100% locally in your browser (no server)! The d32 version runs at over 50 tps on my M4 Max 🚀 Pretty wild that you can now deploy AI applications using just a single index.html file 😅
Xenova • 95,096 views • 7 months ago

IBM just released Granite 4.0, their latest series of small language models! These models excel at agentic workflows (tool calling), document analysis, RAG, and more. 🚀 The "Micro" (3.4B) model can even run 100% locally in your browser on WebGPU, powered by 🤗 Transformers.js!
Xenova • 82,762 views • 8 months ago

NEW: Google releases FunctionGemma, a lightweight (270M), open foundation model built for creating specialized function calling models! 🤯 To test it out, I built a small game: use natural language to solve fun physics simulation puzzles, running 100% locally in your browser! 🕹️
Xenova • 58,113 views • 5 months ago

NEW: LiquidAI just released LFM2.5-350M, a tiny model that brings agentic AI and tool-calling capabilities to resource-constrained environments. 🤯 It can even run locally in your browser via WebGPU, serving as a powerful companion while you browse the web. Try the demo! 👇
Xenova • 25,735 views • 2 months ago