Wait, that's JUST a 3B multimodal understanding and generation model, AND it's Apache 2.0 licensed 🔥

Vaibhav (VB) Srivastav

31,993 subscribers

43,777 views • 11 months ago • via X (Twitter)

Science & Technology • News & Politics • Art

13 Comments

Vaibhav (VB) Srivastav • 11 months ago

Try it out directly on their ZeroGPU demo here:

Igor Tarasenko • 11 months ago

how is it even legal?

Mario • 11 months ago

This looks extremely good

Pannous • 11 months ago

is it good for anything other than style transfer?

Dodo • 11 months ago

bro, we're in the future

Himanshu Kumar • 11 months ago

Open source could democratize access to powerful AI like this.

S..... • 11 months ago

tf, this runs on zero 😱😱😱😱😱

Rithesh • 11 months ago

I tried out a simple task and the results were horrible.

Tsukuyomi • 11 months ago

ah, just a casual 3B multimodal model, huh? sounds like the future is here, just waiting to take over. let's hope it doesn't start plotting against us.

Ivan Fioravanti ᯅ • 11 months ago

DeepSeek-R1-0528-5bit on MLX pushing the M3 Ultra 512GB to its limits! 501 GB of memory in use, visible in mactop in the video! Context: 4K tokens; Prompt: 190.29 t/s; Gen: 11.37 t/s; Peak Mem: 487.48 GB! THIS IS APPLE MLX!

Unsloth AI • 11 months ago

You can now fine-tune Gemma 3n for free with our notebook! Unsloth makes Google Gemma training 1.5x faster with 50% less VRAM and 5x longer context lengths - with no accuracy loss. Guide: GitHub: Colab:

Andrej Karpathy • 11 months ago

Love this project: nanoGPT -> recursive self-improvement benchmark. Good old nanoGPT keeps on giving and surprising :)

- First I wrote it as a small little repo to teach people the basics of training GPTs.
- Then it became a target and baseline for my port to a direct C/CUDA re-implementation in llm.c.
- Then that was modded (by @kellerjordan0 et al.) into a (small-scale) LLM research harness. People iteratively optimized the training so that e.g. reproducing GPT-2 (124M) performance takes not 45 min (original) but now only 3 min!
- Now the idea is to use this process of optimizing the code as a benchmark for LLM coding agents. If humans can speed up LLM training from 45 to 3 minutes, how well do LLM agents do under different kinds of settings (e.g. with or without hints)? (Spoiler: in this paper, as a baseline and right now, not that well, even with strong hints.)

The idea of recursive self-improvement has of course been around for a long time. My usual rant on it is that it's not going to be this thing that didn't exist and then suddenly exists. Recursive self-improvement began a long time ago and is under way today in a smooth, incremental way. First, even basic software tools (e.g. coding IDEs) fall into the category because they speed up programmers in building the N+1 version. Any of our existing software infrastructure that speeds up development (Google Search, git, ...) qualifies. And then if you insist on treating AI as special and distinct, most programmers now already routinely use LLM code completion or code diffs in their own programming workflows, collaborating in increasingly large chunks of functionality and experimentation. This amount of collaboration will continue to grow.

It's also worth pointing out that nanoGPT is a super simple, tiny educational codebase (~750 lines of code) that covers only the pretraining stage of building LLMs. Production-grade codebases are *significantly* (100-1000X?) bigger and more complex. But for the current level of AI capability, it is imo an excellent, interesting, tractable benchmark that I look forward to following.

steven • 11 months ago

Gemma 3n just dropped, and now it's easy to fine-tune it on text, audio and vision! 🔥 We just released full recipes to get you started!

Similar Videos

Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

AK

39,318 views • 8 months ago

Alibaba just open-sourced Qwen 3.6-35B A3B, a 35B MoE model (only ~3B active): multimodal, agentic coding, Apache 2.0. This isn't just efficient; it's built for real workflows. Open models are catching up fast.

Mervin Praison

13,063 views • 2 months ago

Introducing Omni, one unified model that supports any-to-any multimodal modeling, including multimodal understanding, image/video generation and editing, world modeling and 3D reconstruction. All in one model that adopts a standard mixture-of-experts architecture with only 3B activated parameters.

Ceyuan Yang

31,091 views • 1 month ago

NEW: Kokoro 82M - an Apache 2.0 licensed text-to-speech model, trained on < 100 hours of audio 🔥

Vaibhav (VB) Srivastav

330,034 views • 1 year ago

🎵 An open-source music generation model (ACE-Step) was just released by StepFun AI + ACE Studio!
📐 3.5B parameter open-weight model
🔓 Apache-2.0 license
🎙️ Supports lyric generation
🚀 Can generate 4-minute songs in just 20 s on an A100

mrfakename

19,666 views • 1 year ago

Command A+ from Cohere is out now :) it's our best model yet and it's open source, Apache 2.0

Nick Frosst

203,424 views • 1 month ago

Today we're releasing ZONOS2, our next-generation real-time TTS model with high-fidelity voice cloning. ZONOS2 is the most expressive open-source TTS model, released under Apache 2.0 and available on Zyphra Cloud on AMD. 🧵

Zyphra

329,828 views • 6 days ago

1/4 🚀 We are launching Qwen-Image-2.0, a next-generation foundational image generation model. The key highlights of Qwen-Image-2.0 include:
- Professional Typography Rendering: supports 1k-token instructions for direct generation of professional infographics, including PPTs, posters, comics, and more.
- Stronger Semantic Adherence: native 2K resolution support for finely detailed realistic scenes, including people, nature, and architecture.
- Improved Text Rendering: integrated understanding and generation capabilities, unifying image generation and editing in a single mode.
- Lighter Model Architecture: smaller model size with faster inference speed.

Tongyi Lab

164,097 views • 4 months ago

We're releasing Letta Code, a memory-first coding agent:
- open source (Apache 2.0)
- model-agnostic
- portable agent learning and memory

Letta

368,466 views • 6 months ago

Pretty Insane - SoTA text-to-speech model capable of English AND Hindi - 3B Llama backbone - Apache 2.0 licensed 🔥
> Sub 80 ms latency
> Supports both English and Hindi, including code-mix
> Runs in a free Google Colab too 🤯
Best part: they're actively working on other languages like Tamil, Telugu, Bengali, etc.
> Available on the Hugging Face Hub, powered by Transformers 💥

Vaibhav (VB) Srivastav

33,749 views • 11 months ago

Introducing LingBot-World: an open-source world simulator pushing the boundaries of video generation. 🚀
🌍 High-Fidelity: realistic, scientific, & stylized.
🧠 Long-Term Memory: minute-level consistency.
⚡ Real-Time: <1s latency at 16 FPS.
📜 Apache 2.0 licensed.
Model: Github:

ModelScope

28,809 views • 4 months ago

NEW: Mistral AI releases Mistral 3, a family of multimodal models, including three state-of-the-art dense models (3B, 8B, and 14B) and Mistral Large 3 (675B, 41B active). All Apache 2.0! 🤗 Surprisingly, the 3B is small enough to run 100% locally in your browser on WebGPU! 🤯

Xenova

225,080 views • 6 months ago

Introducing 0GM-1.0-35B-A3B. Our first proprietary AI model. Mixture of Experts (MoE), 35B parameters, 3B active per token. Trained on our own decentralized GPU network. Open source under Apache 2.0.

0G Labs (Home of Infinite AI)

48,602 views • 1 month ago

BOOM! IBM just released an updated SmolDocling - tiny 258M param SoTA VLM - Apache 2.0 licensed! 🔥 Capable of doing OCR, Visual QA, Translation and much more - try it out now!

Vaibhav (VB) Srivastav

52,820 views • 9 months ago

Introducing Indic-Parler TTS - Trained on 10K hours of data, 938M params, supports 20 Indic languages, emotional synthesis, apache 2.0 licensed! 🔥 A collaboration w/ AI4Bharat & Hugging Face - w/ fully customisable speech and voice personas! Try it out directly below or use the model weights as you want! 🇮🇳/acc

Vaibhav (VB) Srivastav

42,165 views • 1 year ago

WOW! DeepMind *just* dropped Magenta Real-time - Apache 2.0 licensed 🔥
> 800M-param transformer, trained on ~190K hours of instrumental stock music
> adapts MusicLM for real-time generation via 2s audio chunks (conditioned on the prior 10s of context)
> 48 kHz stereo
> MusicCoCa: new joint music-text embedding model, blending the MuLan and CoCa approaches
> 1.25s generation time for 2s of audio on free-tier Colab TPUs
> style embeddings (from text/audio prompts) allow real-time morphing of genres/instruments
> on-device inference, personal fine-tuning - coming soon!
> model weights on Hugging Face 🤗

Vaibhav (VB) Srivastav

90,219 views • 1 year ago

🚀 UniWorld: a unified model that skips VAEs and uses semantic features from SigLIP! Using just 1% of BAGEL's data, it outperforms BAGEL on image editing and excels in both understanding and generation. 🌟 The data, model, training and evaluation scripts are now open-source!

Bin Lin

22,327 views • 1 year ago

We're diligently improving Grok, building a specialized coding model, improving multimodal capabilities, and developing a strong model for video generation and understanding. Grok on Web: Grok on iOS: Grok on Android: Grok API console:

xAI

176,186 views • 11 months ago

We are thrilled to announce a major upgrade to our open-source 3D generation model, introducing two groundbreaking new versions: 3D 2.0 MV (Multi-View Generation) and 3D 2.0 Mini! 3D 2.0 MV: 3D 2.0 Mini:

Tencent HY

134,410 views • 1 year ago

Today we're taking a big step on the path toward AGI and releasing Gemini 3, our most intelligent model yet. With Gemini 3, you can bring any idea to life. It is state-of-the-art in reasoning, the best model in the world for multimodal understanding, and our best agentic and vibe coding model.

Google AI

492,732 views • 7 months ago