
Prithviraj (Raj) Ammanabrolu
@rajammanabrolu • 8,223 subscribers
Reinforcement Learning and Language. Assistant Prof @UCSanDiego. Research Scientist @Nvidia.

Ever wished we had fewer X-training hyphenates? Pre, mid, post, etc. Why not just Training? Trying to bridge the divides (and get all our friends onto one team again), we introduce *Introspective X Training*, an offline-RL-inspired method that scales effectively across any LLM stage by annotating your data with a thinking-reward-generated language critique!
Up to 2.8x FLOP efficiency + 5-10 point score gains (esp. with math and code) at any stage, from scratch to 24T tokens, on 8B (active) sized models!! We burned much compute ablating so you wouldn't have to.
Moral of the story is ‼️ don't throw out any data via filtering, just feedback-condition it ‼️
You can spend FLOPs up front on inference to *classify* data quality, and then train so that tokens aren't all treated equally based on the feedback, starting early in training itself. Right now they're really only separated out much later, during mid/post training. This improves overall compute efficiency and gives us benchmark perf not possible with just baseline methods!
Paper here:
Thanks to Brandon Cui and Ximing Lu for leading this w/ Syeda Nahida Akter, David Acuna, Hyunwoo Kim, Jaehun Jung, Yuxiao Qu, Shrimai, Yejin Choi
Prithviraj (Raj) Ammanabrolu • 25,607 views • 19 days ago
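
To make the "don't filter, feedback-condition" idea in the post above concrete, here is a rough sketch, not the paper's method. Everything in it is an assumption for illustration: the `Example` type, the precomputed `quality` score (standing in for whatever the up-front inference pass produces), the 0.5 threshold, and the `[feedback: ...]` tag format. The post does not say what the actual critique looks like.

```python
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    quality: float  # hypothetical score in [0, 1] from an up-front judging pass


def filter_examples(examples, threshold=0.5):
    """Baseline: drop every example scored below the threshold."""
    return [ex.text for ex in examples if ex.quality >= threshold]


def feedback_condition(examples, threshold=0.5):
    """Keep every example and prepend a short feedback tag instead of dropping it.

    The tag format is invented for this sketch; a real critique would
    presumably be richer natural-language feedback.
    """
    conditioned = []
    for ex in examples:
        label = "high quality" if ex.quality >= threshold else "low quality"
        conditioned.append(f"[feedback: {label}]\n{ex.text}")
    return conditioned


corpus = [Example("2 + 2 = 4", 0.9), Example("2 + 2 = 5", 0.1)]
filtered = filter_examples(corpus)        # the low-scored example is lost
conditioned = feedback_condition(corpus)  # both examples survive, each tagged
```

Filtering shrinks the training set, while conditioning keeps every token and lets the model see which feedback goes with which text.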

The future of embodied AI revolves around *collaborative* multi-agent scenarios that need natural language communication, task delegation, resource sharing, and more ⛏️ Here are MINDcraft and MineCollab, a simulator and benchmark purpose-built to enable research in this area!
Prithviraj (Raj) Ammanabrolu • 33,885 views • 1 year ago