RL is painfully slow 😭 — bottlenecked by super-long...

Infini-AI-Lab

77,156 views • 22 days ago

Reinforcement learning should be able to improve upon behaviors...

Vivek Myers

79,514 views • 1 year ago

Introducing RL Environment Creator Skill Now any one can...

Adithya S K

46,556 views • 1 month ago

Frontier research just crossed a new threshold. Mind Lab...

Chidanand Tripathi

89,813 views • 6 months ago

🚨Current scalable RL algos train a policy w/o value...

Aviral Kumar

37,301 views • 1 year ago

New research from Databricks: LLMs Can Learn to Reason...

Databricks AI Research

12,539 views • 4 months ago

RL X-mas came early. 🎄 For too long, building...

Weights & Biases

112,643 views • 8 months ago

Does LLM RL post-training need to be on-policy?

Kianté Brantley

113,605 views • 4 months ago

RL is back! But is it always the best...

Sebastian Risi

11,130 views • 11 months ago

What if you kept asking an LLM to "make...

Minqi Jiang

41,066 views • 9 months ago

fell for a g*rl in the holy month….

lalo

161,678 views • 4 months ago

This figure from HIL-SERL is one of the clearest...

Dominique Paul

24,433 views • 4 months ago

Excited to share CrystalReasoner, a reasoning model for crystal...

Sherry Yang

10,740 views • 1 month ago

I've gotten a mujoco sim RL training loop for...

kache

29,728 views • 22 days ago

Thanks AK! Finally, robot can do continuous, agile, autonomous,...

Guanya Shi

32,155 views • 1 year ago

PPO has long dominated robot locomotion training in simulation....

Robotic Systems Lab

41,757 views • 22 days ago

GR-RL Going Dexterous and Precise for Long-Horizon Robotic Manipulation

AK

17,630 views • 7 months ago

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos...

MrNeRF

24,729 views • 11 months ago

1/ While most RL methods use shallow MLPs (~2–5...

Kevin Wang

154,777 views • 1 year ago

lando in the new RL ad!

ray

18,331 views • 5 months ago

Y’all slept on this g!rl 🤦 Na one wh!te...

Minister For Ashawo

735,463 views • 9 months ago