🚨Current scalable RL algos train a policy w/o value...

Aviral Kumar
37,286 views • 1 year ago
Does off-policy value-based RL scale? In LLMs, larger scale...

Oleg Rybkin
23,968 views • 1 year ago
🤔 How to fine-tune an Imitation Learning policy (e.g.,...

Tongzhou Mu 🤖🦾🦿
16,959 views • 1 year ago
D4RL is a great benchmark, but is saturated. Introducing...

Seohong Park
36,410 views • 1 year ago
Repair based on the current value.

Gisa w'I Rwanda🇷🇼❤️🇷🇼
107,536 views • 14 days ago
New work: The Value Axis 🎯 How do LLMs...

Nick Jiang
25,071 views • 11 days ago
This figure from HIL-SERL is one of the clearest...

Dominique Paul
24,433 views • 4 months ago
RL is back! But is it always the best...

Sebastian Risi
11,130 views • 11 months ago
Frontier research just crossed a new threshold. Mind Lab...

Chidanand Tripathi
89,813 views • 6 months ago
🚨 New: Integrating Harbor (Harbor Framework) for end-to-end Computer-Use...

Marco Mascorro
19,448 views • 3 months ago
RL X-mas came early. 🎄 For too long, building...

Weights & Biases
112,643 views • 8 months ago
Over the past months, Cohort I of our RL...

Prime Intellect
59,409 views • 2 months ago
Zombie robot RL policy

Simon Kalouche
164,401 views • 1 year ago
🔥 Nebius AI R&D is hiring AI Research Interns...

Ibragim
33,427 views • 2 months ago
Introducing RL Environment Creator Skill. Now anyone can...

Adithya S K
46,556 views • 1 month ago
Some personal news: I recently joined Cursor. Cursor is...

Sasha Rush
336,077 views • 1 year ago
Twitter AU in which Boss and Noeul attempt to...

MayflowerPrincess
20,606 views • 2 years ago
New research from Databricks: LLMs Can Learn to Reason...

Databricks AI Research
12,539 views • 4 months ago
he is ready to throw hands for rl 😭

ara ྀི
38,189 views • 9 months ago
US-based K-Scale Labs launched pre-orders for its open-source humanoid,...

Brett Adcock
10,994 views • 11 months ago
How to return a value based on a criterion...

Excel Dictionary
153,781 views • 3 years ago