🚨Current scalable RL algos train a policy w/o value...

Aviral Kumar

37,286 views • 1 year ago

Does off-policy value-based RL scale? In LLMs, larger scale...

Oleg Rybkin

23,968 views • 1 year ago

🤔 How to fine-tune an Imitation Learning policy (e.g.,...

Tongzhou Mu 🤖🦾🦿

16,959 views • 1 year ago

D4RL is a great benchmark, but is saturated. Introducing...

Seohong Park

36,410 views • 1 year ago

Repair based on the current value.

Gisa w'I Rwanda🇷🇼❤️🇷🇼

107,536 views • 14 days ago

New work: The Value Axis 🎯 How do LLMs...

Nick Jiang

25,071 views • 11 days ago

This figure from HIL-SERL is one of the clearest...

Dominique Paul

24,433 views • 4 months ago

RL is back! But is it always the best...

Sebastian Risi

11,130 views • 11 months ago

Frontier research just crossed a new threshold. Mind Lab...

Chidanand Tripathi

89,813 views • 6 months ago

🚨 New: Integrating Harbor (Harbor Framework) for end-to-end Computer-Use...

Marco Mascorro

19,448 views • 3 months ago

RL X-mas came early. 🎄 For too long, building...

Weights & Biases

112,643 views • 8 months ago

Over the past months, Cohort I of our RL...

Prime Intellect

59,409 views • 2 months ago

Zombie robot RL policy

Simon Kalouche

164,401 views • 1 year ago

🔥 Nebius AI R&D is hiring AI Research Interns...

Ibragim

33,427 views • 2 months ago

Introducing RL Environment Creator Skill Now any one can...

Adithya S K

46,556 views • 1 month ago

Some personal news: I recently joined Cursor. Cursor is...

Sasha Rush

336,077 views • 1 year ago

Twitter AU in which Boss and Noeul attempt to...

MayflowerPrincess

20,606 views • 2 years ago

New research from Databricks: LLMs Can Learn to Reason...

Databricks AI Research

12,539 views • 4 months ago

he is ready to throw hands for rl 😭

ara ྀི

38,189 views • 9 months ago

US-based K-Scale Labs launched pre-orders for its open-source humanoid,...

Brett Adcock

10,994 views • 11 months ago

How to return a value based on a criteria...

Excel Dictionary

153,781 views • 3 years ago