
Mariusz Kurman
@mkurman88 • 3,846 subscribers
from bedside to byte_side, MD to AI, 🇵🇱
Videos

This is unbelievable. One of my greatest runs ever. The model has seen fewer than 40B tokens. ~190M parameters, trained from scratch, no self-attention per se, with conv1d, conv2d, and chunk-token attention. It already reasons about user intent instead of just blabbing random things related to the query. Not perfect, I know, but still!
Mariusz Kurman • 12,406 views • 19 days ago

This is it: a single-person project; trained from scratch on TPUs (Google TRC) on the one and only SYNTH dataset by Pleias; Neuroblast-v3 architecture running on my local vLLM instance.
Just wow. (I'm amazed by how good it looks; the speed is incredible, here slightly slowed by high-resolution recording.)
Todos:
> needs agentic fine-tuning in the future
> needs some fine-grained RL
Mariusz Kurman • 32,474 views • 6 months ago