Leandro von Werra

@lvwerra • 12,167 subscribers

Head of research @huggingface

Shorts

We released physics-intern: a simple harness for science problems! It takes models like Gemini 3.1 Pro from 17.7 -> 31.4, beating GPT 5.5 Pro. The physics-intern harness can wrap any model and, via a dedicated subagent, boost the performance of vanilla reasoning models. While I think more and more of these harness capability gains will be absorbed into the models (just as prompting tricks disappeared over time), there is a lot to be gained right now by building good scaffolds for those models and integrating tools well. Interestingly, the one exception we found is GPT 5.5 Pro, which actually didn't benefit from the physics-intern harness! Read more about it here: PS: I think the Harness[Model] notation is kind of nice.

97,181 views

Videos

We launched an agent collaboration with a simple task: make Gemma 4 faster. Over 100 agents from all over the world joined, exchanged 1000+ messages, and submitted 450 results. After a week of collaboration, throughput went from 100 tok/s to over 500 tok/s.

Leandro von Werra

229,031 views • 1 month ago

We are releasing Carbon: a crazy fast DNA model. Carbon is 275x faster than the next best model. So fast you can process the whole human genome on a single GPU in <2 days. Here are the tricks we used: When modelling DNA sequences, a lot of the performance comes down to tokenizing the sequences in a smart way. BPE tokenizers struggle because there is no whitespace, and character-level tokenizers (a character is called a base in DNA) waste a lot of compute on too many tokens. Carbon is built with a unique tokenizer: we split sequences into chunks of 6 bases, but during both training and inference we can work at single-base resolution. That's similar to having word tokens but resolving them at the character level. All possible thanks to the DNA tokens' unique structure. The architecture combined with the tokenizer makes the model 275x faster than the previous SoTA (Evo2) at this size. We built an interactive demo so you can explore how the model can generate DNA sequences, investigate the structure of genes, predict the effect of mutations, generate and fold proteins, and even reconstruct parts of the tree of life.

Leandro von Werra

404,474 views • 2 months ago
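
The 6-base chunking described in the Carbon post can be illustrated with a plain fixed-width k-mer encoding. This is a minimal sketch and not Carbon's actual tokenizer: the base-4 id scheme and the function names (`kmer_to_id`, `id_to_kmer`, `tokenize`, `detokenize`) are assumptions made for illustration. It only shows why a 6-base token can still carry per-base information: every id in [0, 4096) decomposes uniquely back into its six bases.

```python
from typing import List

BASES = "ACGT"  # assumed alphabet; ambiguity codes such as N are ignored here
K = 6           # chunk length stated in the post


def kmer_to_id(kmer: str) -> int:
    """Encode a K-base chunk as a base-4 integer in [0, 4**K)."""
    token_id = 0
    for base in kmer:
        token_id = token_id * 4 + BASES.index(base)
    return token_id


def id_to_kmer(token_id: int) -> str:
    """Recover the K individual bases from a token id (single-base resolution)."""
    bases = []
    for _ in range(K):
        token_id, digit = divmod(token_id, 4)
        bases.append(BASES[digit])
    return "".join(reversed(bases))


def tokenize(seq: str) -> List[int]:
    """Split a sequence into non-overlapping K-base chunks and encode each one."""
    if len(seq) % K != 0:
        raise ValueError(f"sequence length must be a multiple of {K}")
    return [kmer_to_id(seq[i:i + K]) for i in range(0, len(seq), K)]


def detokenize(token_ids: List[int]) -> str:
    """Invert tokenize() by expanding every id back into its bases."""
    return "".join(id_to_kmer(t) for t in token_ids)
```

With this scheme a 12-base sequence becomes 2 tokens instead of 12, and `detokenize(tokenize(seq)) == seq` holds for any sequence whose length is a multiple of 6.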

Fable is the new leader on CADGenBench! Still a long way to go:

Leandro von Werra

36,711 views • 1 month ago

Seeing a group of agents collaborate is really mesmerising, like watching a colony of ants! These agents have been collaborating for over a week now. Their goal is simple: gather and organize all information on RL + LLMs and build the ultimate resource. The status so far:
- over 20 agents have joined to work on this
- processed almost 300 sources (papers, blogs, etc.)
- exchanged 350 messages to coordinate
- wrote 48 articles on the wiki with 90k words/2k refs
- started working on code implementations
Curious how far we can push this. Can they find gaps in the existing research? Can they identify and resolve contradictory results? Add your agent: Watch them work:

Leandro von Werra

13,423 views • 14 days ago

Jupyter Agents - LLMs running data analysis directly in a notebook! The agent can load data, execute code, plot results, and follow your guidance and ideas! A very natural way to collaborate with an LLM over data, and it's just scratching the surface of what will soon be possible!

Leandro von Werra

200,291 views • 1 year ago

Excited to release: AgentUI
> a fresh chat interface - natively multi-agent
> agents coordinate via reports and figures
> plug+play any open/closed model as sub-agent
> agents specialise in code, web search, multimodal...
Try it here:

Leandro von Werra

41,065 views • 4 months ago

Excited to release: Jupyter Agent 2. The agent can load data, execute code, and plot results inside Jupyter faster than you can scroll!
🤖 Powered by Qwen3-Coder
⚡️ Running on Cerebras
⚙️ Executed in E2B
↕️ Upload your files
All videos are in real time!

Leandro von Werra

66,234 views • 11 months ago

Open Computer Agent - LLMs completing tasks using a VM. It's a playground to test how well current LLM agents use a computer to solve everyday tasks. And this is just the start - very soon models will be 10x faster and 10x better at it! ❤️ built with e2b x qwen2.5-vl x smolagent

Leandro von Werra

17,126 views • 1 year ago

Or watch how the model solves the Lotka-Volterra equations, plots the results, and refines them. Try it out:

Leandro von Werra

13,013 views • 1 year ago
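
For readers unfamiliar with the task in the post above: the Lotka-Volterra predator-prey system is dx/dt = αx - βxy, dy/dt = δxy - γy. The following is a rough sketch of how one might integrate it numerically, not what the agent in the demo produced. The parameter values, the initial populations, and the choice of a hand-written fourth-order Runge-Kutta step are all assumptions picked for illustration.

```python
import math


def lotka_volterra(state, alpha, beta, delta, gamma):
    """Right-hand side of the predator-prey system for state = (prey, predators)."""
    x, y = state
    return (alpha * x - beta * x * y, delta * x * y - gamma * y)


def rk4_step(f, state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(x0, y0, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5,
             dt=0.01, steps=2000):
    """Return the list of (prey, predator) states; all defaults are illustrative."""
    def f(state):
        return lotka_volterra(state, alpha, beta, delta, gamma)

    trajectory = [(x0, y0)]
    for _ in range(steps):
        trajectory.append(rk4_step(f, trajectory[-1], dt))
    return trajectory


def invariant(x, y, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """Quantity conserved by the exact solution; useful as a sanity check."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)
```

Calling `simulate(10.0, 5.0)` yields a trajectory that can be passed to any plotting library. Comparing `invariant` at the first and last state gives a quick check that the integration has not drifted.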
