🚨Current scalable RL algos train a policy w/o value...

Aviral Kumar
37,301 views • 1 year ago
Does off-policy value-based RL scale? In LLMs, larger scale...

Oleg Rybkin
23,979 views • 1 year ago
🤔 How to fine-tune an Imitation Learning policy (e.g.,...

Tongzhou Mu 🤖🦾🦿
16,959 views • 1 year ago
D4RL is a great benchmark, but is saturated. Introducing...

Seohong Park
36,410 views • 1 year ago
Repair based on the current value.

Gisa w'I Rwanda🇷🇼❤️🇷🇼
107,536 views • 20 days ago
New work: The Value Axis 🎯 How do LLMs...

Nick Jiang
25,071 views • 17 days ago
This figure from HIL-SERL is one of the clearest...

Dominique Paul
24,433 views • 4 months ago
RL is back! But is it always the best...

Sebastian Risi
11,130 views • 11 months ago
Frontier research just crossed a new threshold. Mind Lab...

Chidanand Tripathi
89,813 views • 6 months ago
🚨 New: Integrating Harbor (Harbor Framework) for end-to-end Computer-Use...

Marco Mascorro
19,448 views • 3 months ago
RL X-mas came early. 🎄 For too long, building...

Weights & Biases
112,643 views • 8 months ago
Over the past months, Cohort I of our RL...

Prime Intellect
59,409 views • 2 months ago
Zombie robot RL policy

Simon Kalouche
164,401 views • 1 year ago
🔥 Nebius AI R&D is hiring AI Research Interns...

Ibragim
33,427 views • 2 months ago
Some personal news: I recently joined Cursor. Cursor is...

Sasha Rush
336,077 views • 1 year ago
Introducing RL Environment Creator Skill Now any one can...

Adithya S K
46,556 views • 1 month ago
Twitter AU in which Boss and Noeul attempt to...

MayflowerPrincess
20,606 views • 2 years ago
New research from Databricks: LLMs Can Learn to Reason...

Databricks AI Research
12,539 views • 4 months ago
he is ready to throw hands for rl 😭

ara ྀི
38,189 views • 9 months ago
US-based K-Scale Labs launched pre-orders for its open-source humanoid,...

Brett Adcock
10,994 views • 1 year ago
How to return a value based on a criteria...

Excel Dictionary
153,781 views • 3 years ago