
Prithviraj (Raj) Ammanabrolu

@rajammanabrolu • 8,268 subscribers

Reinforcement Learning and Language. Assistant Prof @UCSanDiego. Research Scientist @Nvidia.

Shorts

Introducing TALES: Text Adventure Learning Environment Suite. A benchmark of a few hundred text environments, from science experiments and embodied cooking to solving murder mysteries. We test over 30 of the best LLM agents and pinpoint their failure modes and how to improve them 👨‍💻 pip install tale-suite


16,463 views

Videos

Ever wished we had fewer X-training hyphenates? Pre-, mid-, post-, etc. Why not just Training? Trying to bridge the divides (and get all our friends onto one team again), we introduce *Introspective X Training*, an offline-RL-inspired method that scales effectively across every LLM training stage by annotating your data with a language critique generated by a thinking reward model! Up to 2.8x FLOP efficiency + 5-10 point score gains (especially on math and code) at any stage, from scratch to 24T tokens, on 8B (active)-parameter models!! We burned a lot of compute on ablations so you wouldn't have to. Moral of the story: ‼️don't throw out any data via filtering, just feedback-condition it‼️ You can spend FLOPs up front on inference to *classify* data quality, then train so that tokens aren't all treated equally, based on that feedback, starting early in training itself. Right now, tokens are only really separated out this way much later, during mid/post-training. This improves overall compute efficiency and gives us benchmark performance not possible with baseline methods alone! Paper here: Thanks to Brandon Cui and Ximing Lu for leading this, with Syeda Nahida Akter, David Acuna, Hyunwoo Kim, Jaehun Jung, Yuxiao Qu, Shrimai, and Yejin Choi.

Prithviraj (Raj) Ammanabrolu

27,471 views • 2 months ago
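The contrast the post draws, filtering data versus feedback-conditioning it, can be illustrated with a minimal Python sketch. This is not the paper's implementation: `toy_critic` is a hypothetical heuristic standing in for the thinking reward model, and the `[feedback score=...]` prefix format, the names `filter_data` and `feedback_condition`, and the 0.6 threshold are all assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical critic type: maps a document to (score in [0, 1], short critique).
# In the post this role is played by a thinking reward model; here it is a toy.
Critic = Callable[[str], Tuple[float, str]]


def toy_critic(text: str) -> Tuple[float, str]:
    """Toy heuristic standing in for a reward model (illustration only)."""
    if "def " in text and "return" in text:
        return 0.9, "Complete function with a return value."
    if len(text.split()) < 4:
        return 0.2, "Too short to be informative."
    return 0.5, "Readable but unremarkable."


def filter_data(docs: List[str], critic: Critic, threshold: float) -> List[str]:
    """Baseline: discard every document whose score is below `threshold`."""
    return [d for d in docs if critic(d)[0] >= threshold]


def feedback_condition(docs: List[str], critic: Critic) -> List[Tuple[str, str]]:
    """Keep every document, paired with a feedback prefix built from its critique.

    Returns (prefix, document) pairs so that a trainer could, for example,
    exclude the prefix tokens from the loss and train only on the document.
    """
    pairs = []
    for d in docs:
        score, critique = critic(d)
        prefix = f"[feedback score={score:.1f}] {critique}\n"
        pairs.append((prefix, d))
    return pairs


if __name__ == "__main__":
    corpus = [
        "def add(a, b):\n    return a + b",
        "lol ok",
        "Water boils at 100 degrees Celsius at sea level.",
    ]
    # Filtering keeps only the documents scoring at least 0.6.
    print(len(filter_data(corpus, toy_critic, threshold=0.6)))
    # Feedback conditioning keeps all of them, each with its critique attached.
    for prefix, doc in feedback_condition(corpus, toy_critic):
        print(prefix + doc)
```

On this toy corpus, filtering keeps one of the three documents, while feedback conditioning keeps all three with their critiques attached. One plausible design, assumed here rather than taken from the paper, is to mask the prefix tokens out of the loss during training and then prompt with a high-score prefix at generation time to steer the model toward higher-quality output.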

The future of embodied AI revolves around *collaborative* multi-agent scenarios that need natural language communication, task delegation, resource sharing, and more ⛏️ Here are MINDcraft and MineCollab, a simulator and a benchmark purpose-built to enable research in this area!

Prithviraj (Raj) Ammanabrolu

34,004 views • 1 year ago