🚨Current scalable RL algos train a policy w/o value...

Aviral Kumar

37,286 views • 1 year ago

Does off-policy value-based RL scale? In LLMs, larger scale...

Oleg Rybkin

23,968 views • 1 year ago

🤔 How to fine-tune an Imitation Learning policy (e.g.,...

Tongzhou Mu 🤖🦾🦿

16,923 views • 1 year ago

D4RL is a great benchmark, but is saturated. Introducing...

Seohong Park

36,410 views • 1 year ago

This figure from HIL-SERL is one of the clearest...

Dominique Paul

24,399 views • 3 months ago

RL is back! But is it always the best...

Sebastian Risi

11,130 views • 10 months ago

Frontier research just crossed a new threshold. Mind Lab...

Chidanand Tripathi

89,813 views • 6 months ago

🚨 New: Integrating Harbor (Harbor Framework) for end-to-end Computer-Use...

Marco Mascorro

19,448 views • 3 months ago

RL X-mas came early. 🎄 For too long, building...

Weights & Biases

112,643 views • 8 months ago

Over the past months, Cohort I of our RL...

Prime Intellect

59,206 views • 1 month ago

Zombie robot RL policy

Simon Kalouche

164,319 views • 11 months ago

🔥 Nebius AI R&D is hiring AI Research Interns...

Ibragim

33,337 views • 1 month ago

Twitter AU in which Boss and Noeul attempt to...

MayflowerPrincess

20,561 views • 2 years ago

Introducing RL Environment Creator Skill Now any one can...

Adithya S K

46,445 views • 1 month ago

Some personal news: I recently joined Cursor. Cursor is...

Sasha Rush

335,873 views • 1 year ago

New research from Databricks: LLMs Can Learn to Reason...

Databricks AI Research

12,439 views • 3 months ago

he is ready to throw hands for rl 😭

ara ྀི

38,189 views • 9 months ago

US-based K-Scale Labs launched pre-orders for its open-source humanoid,...

Brett Adcock

10,994 views • 11 months ago

How to return a value based on a criteria...

Excel Dictionary

153,781 views • 3 years ago

This is REK Real Steel in RL

CIX 🦾

11,965 views • 3 months ago

anyway… 2 be rl with yall

Lissy

14,184 views • 2 months ago