Wait, that's JUST a 3B multimodal understanding and generation model AND Apache 2.0 licensed 🔥

43,777 views • 11 months ago • via X (Twitter)

13 Comments

Vaibhav (VB) Srivastav • 11 months ago

Try it out directly on their ZeroGPU demo here:

Igor Tarasenko • 11 months ago

how is it even legal?

Mario • 11 months ago

This looks extremely good

Pannous • 11 months ago

is it good for anything other than style transfer?

Dodo • 11 months ago

bro, we're in the future

Himanshu Kumar • 11 months ago

Open source could democratize access to powerful AI like this.

S..... • 11 months ago

tf, this runs on zero 😱😱😱😱😱

Rithesh • 11 months ago

I tried out a simple task and the results were horrible.

Tsukuyomi • 11 months ago

ah, just a casual 3B multimodal model, huh? sounds like the future is here, just waiting to take over. let's hope it doesn't start plotting against us.

Ivan Fioravanti ᯅ • 11 months ago

DeepSeek-R1-0528-5bit on MLX pushing an M3 Ultra 512GB to its limits! 501GB of used memory visible in mactop in the video! Context: 4K tokens, Prompt: 190.29 t/s, Gen: 11.37 t/s, Peak Mem: 487.48 GB! THIS IS APPLE MLX!
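The throughput figures in the post above convert directly into wall-clock estimates: time equals tokens divided by tokens per second. A minimal sketch, assuming "4K tokens" means a 4096-token prompt and using a hypothetical 500-token reply; `estimate_latency` is an illustrative helper, not part of MLX or mactop.

```python
# Back-of-the-envelope latency estimate from the throughput figures quoted above.
# Assumptions (not from the post): "4K tokens" is taken as a 4096-token prompt,
# and the 500-token reply length is a hypothetical value for illustration.

PROMPT_TPS = 190.29  # prompt processing speed reported in the post (tokens/s)
GEN_TPS = 11.37      # generation speed reported in the post (tokens/s)


def estimate_latency(prompt_tokens: int, output_tokens: int,
                     prompt_tps: float = PROMPT_TPS,
                     gen_tps: float = GEN_TPS) -> dict:
    """Return estimated seconds spent on prefill, on decoding, and in total."""
    prefill_s = prompt_tokens / prompt_tps  # time to process the prompt
    decode_s = output_tokens / gen_tps      # time to generate the reply
    return {"prefill_s": prefill_s,
            "decode_s": decode_s,
            "total_s": prefill_s + decode_s}


if __name__ == "__main__":
    est = estimate_latency(prompt_tokens=4096, output_tokens=500)
    for name, seconds in est.items():
        print(f"{name}: {seconds:.1f}")
```

Under these assumptions the prompt takes about 21.5 s to process and the reply about 44 s to generate. Real runs may differ, since throughput is not constant across context lengths.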

Unsloth AI • 11 months ago

You can now fine-tune Gemma 3n for free with our notebook! Unsloth makes Google Gemma training 1.5x faster with 50% less VRAM and 5x longer context lengths - with no accuracy loss. Guide: GitHub: Colab:

Andrej Karpathy • 11 months ago

Love this project: nanoGPT -> recursive self-improvement benchmark. Good old nanoGPT keeps on giving and surprising :)
- First I wrote it as a small repo to teach people the basics of training GPTs.
- Then it became a target and baseline for my port to a direct C/CUDA re-implementation in llm.c.
- Then that was modded (by @kellerjordan0 et al.) into a (small-scale) LLM research harness. People iteratively optimized the training so that, e.g., reproducing GPT-2 (124M) performance now takes only 3 min instead of the original 45 min!
- Now the idea is to use this process of optimizing the code as a benchmark for LLM coding agents. If humans can speed up LLM training from 45 to 3 minutes, how well do LLM agents do under different kinds of settings (e.g. with or without hints)? (Spoiler: in this paper, as a baseline and right now, not that well, even with strong hints.)
The idea of recursive self-improvement has of course been around for a long time. My usual rant on it is that it's not going to be this thing that didn't exist and then suddenly exists. Recursive self-improvement began a long time ago and is underway today in a smooth, incremental way. First, even basic software tools (e.g. coding IDEs) fall into the category because they speed up programmers in building version N+1. Any of our existing software infrastructure that speeds up development (Google search, git, ...) qualifies. And if you insist on AI as a special and distinct category, most programmers already routinely use LLM code completion or code diffs in their own workflows, collaborating on increasingly larger chunks of functionality and experimentation. This amount of collaboration will continue to grow.
It's also worth pointing out that nanoGPT is a super simple, tiny educational codebase (~750 lines of code) that covers only the pretraining stage of building LLMs. Production-grade codebases are *significantly* (100-1000X?) bigger and more complex. But for the current level of AI capability, it is imo an excellent, interesting, tractable benchmark that I look forward to following.
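The two ratios in the comment above can be spelled out as a quick worked calculation. It only reuses the figures quoted there (45 min vs. 3 min, ~750 lines, and the hedged 100-1000X size factor), so the production-grade range is exactly as rough as that factor.

```latex
\begin{aligned}
\text{speedup} &= \frac{45\ \text{min}}{3\ \text{min}} = 15\times \\
\text{production-grade size} &\approx 750\ \text{lines} \times (100 \text{ to } 1000) \approx 75{,}000 \text{ to } 750{,}000\ \text{lines}
\end{aligned}
```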

steven • 11 months ago

Gemma 3n just dropped — and now it’s easy to fine-tune it on text, audio and vision! 🔥 We just released full recipes to get you started!
