Aksel's banner
Aksel's profile picture

Aksel

@akseljoonas3,067 subscribers

building AI agents @huggingface 🤗

Shorts

Introducing ml-intern, the agent that just automated the post-training team Hugging Face It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem. It can pull off crazy things: We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper. Found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the score 10% → 32% on GPQA in under 10h. Claude Code's best: 22.99%. In healthcare settings it inspected available datasets, concluded they were too low quality, and wrote a script to generate 1100 synthetic data points from scratch for emergencies, hedging, multilingual etc. Then upsampled 50x for training. Beat Codex on HealthBench by 60%. For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on watched rewards claim and then collapse, and ran ablations until it succeeded. All fully backed by papers, autonomously. How it works? ml-intern makes full use of the HF ecosystem: - finds papers on arxiv and reads them fully, walks citation graphs, pulls datasets referenced in methodology sections and on - browses the Hub, reads recent docs, inspects datasets and reformats them before training so it doesn't waste GPU hours on bad data - launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, retrains ml-intern deeply embodies how researchers work and think. It knows how data should look like and what good models feel like. Releasing it today as a CLI and a web app you can use from your phone/desktop. CLI: Web + mobile: And the best part? We also provisioned 1k$ GPU resources and Anthropic credits for the quickest among you to use.

Introducing ml-intern, the agent that just automated the post-training team Hugging Face It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem. It can pull off crazy things: We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper. Found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the score 10% → 32% on GPQA in under 10h. Claude Code's best: 22.99%. In healthcare settings it inspected available datasets, concluded they were too low quality, and wrote a script to generate 1100 synthetic data points from scratch for emergencies, hedging, multilingual etc. Then upsampled 50x for training. Beat Codex on HealthBench by 60%. For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on watched rewards claim and then collapse, and ran ablations until it succeeded. All fully backed by papers, autonomously. How it works? ml-intern makes full use of the HF ecosystem: - finds papers on arxiv and reads them fully, walks citation graphs, pulls datasets referenced in methodology sections and on - browses the Hub, reads recent docs, inspects datasets and reformats them before training so it doesn't waste GPU hours on bad data - launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, retrains ml-intern deeply embodies how researchers work and think. It knows how data should look like and what good models feel like. Releasing it today as a CLI and a web app you can use from your phone/desktop. CLI: Web + mobile: And the best part? We also provisioned 1k$ GPU resources and Anthropic credits for the quickest among you to use.

1,260,755 次观看

3 weeks since ml-intern launched and we just hit 1M messages exchanged. that's 3.3 agent-years of ML research in 21 days. 2 months worth of research every day. 17,383 training jobs total. talk about AI acceleration. here's some of what people built: Carlos Miguel Patiño replicated the full DeepSeek v4 architecture and pre+post trained a 100M MoE from scratch. → it landed a third place submission on Keller Jordan optimizer competition. autoresearch on SOTA territory. Lewis Tunstall Got the intern to convert Alec Radford's cool new talkie-lm 1930 model to work with transformers. tokenizer, chat template, model conversion etc all one-shotted by ml-intern. someone created entire PhD dissertation chapter on context-aware agentic cyber defense drafted with 16 research subagents. and someone used it to crack an Paul Jankura kernel optimization take-home. (we don't know how to feel about this one 👀 ) just getting started →

3 weeks since ml-intern launched and we just hit 1M messages exchanged. that's 3.3 agent-years of ML research in 21 days. 2 months worth of research every day. 17,383 training jobs total. talk about AI acceleration. here's some of what people built: Carlos Miguel Patiño replicated the full DeepSeek v4 architecture and pre+post trained a 100M MoE from scratch. → it landed a third place submission on Keller Jordan optimizer competition. autoresearch on SOTA territory. Lewis Tunstall Got the intern to convert Alec Radford's cool new talkie-lm 1930 model to work with transformers. tokenizer, chat template, model conversion etc all one-shotted by ml-intern. someone created entire PhD dissertation chapter on context-aware agentic cyber defense drafted with 16 research subagents. and someone used it to crack an Paul Jankura kernel optimization take-home. (we don't know how to feel about this one 👀 ) just getting started →

35,091 次观看

Videos

没有更多内容可加载