Roberta Raileanu's banner

Roberta Raileanu

@robertarail • 10,859 subscribers

Open-Ended Team Lead and Senior Staff Research Scientist @GoogleDeepMind. Honorary Lecturer @UCL. ex @Meta | @NYU | @Princeton.

Videos

Anya Rossi

sweetdream.ai

SweetDream.ai•Sponsored•Livecam

Watch Anya Live

Anya is streaming live right now! Join her private show and enjoy exclusive content.

Exclusive private shows

1.2k viewers online

Private Show

Join now for exclusive access

Free preview available • Premium content

Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬 We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. The key contributions of our work are: 🕹️ Enables the exploration of different training algorithms for AI Research Agents such as RL 🛠️ Provides a flexible evaluation framework that can accommodate different artifacts such as models, algorithms, or predictions 🤖 Allows researchers to evaluate any model without the need to develop a custom agentic harness 🎯 Introduces 13 diverse open-ended AI Research tasks for evaluating AI Research Agents on a wide range of domains such as computer vision, natural language processing, reinforcement learning, game theory, and logical reasoning. 📈 Proposes a new evaluation metric for AI Research Agents MLGym makes it easy to: 1) Add new tasks 2) Evaluate new models 3) Integrate new agents Check out a video of the MLGym Agent to see how it performs the full pipeline of idea generation💡, implementation 👩‍💻, experimentation 👩‍🔬, and iteration 🔄 to improve on ML tasks. Huge thanks to the exceptionally talented Deepak Nathani who led this work and to all the other amazing collaborators who made this possible 🙏🫶🚀

Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬 We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. The key contributions of our work are: 🕹️ Enables the exploration of different training algorithms for AI Research Agents such as RL 🛠️ Provides a flexible evaluation framework that can accommodate different artifacts such as models, algorithms, or predictions 🤖 Allows researchers to evaluate any model without the need to develop a custom agentic harness 🎯 Introduces 13 diverse open-ended AI Research tasks for evaluating AI Research Agents on a wide range of domains such as computer vision, natural language processing, reinforcement learning, game theory, and logical reasoning. 📈 Proposes a new evaluation metric for AI Research Agents MLGym makes it easy to: 1) Add new tasks 2) Evaluate new models 3) Integrate new agents Check out a video of the MLGym Agent to see how it performs the full pipeline of idea generation💡, implementation 👩‍💻, experimentation 👩‍🔬, and iteration 🔄 to improve on ML tasks. Huge thanks to the exceptionally talented Deepak Nathani who led this work and to all the other amazing collaborators who made this possible 🙏🫶🚀

Roberta Raileanu

104,935 просмотров • 1 год назад

Больше нет контента для загрузки