Does LLM RL post-training need to be on-policy?

Kianté Brantley

113,605 views • 4 months ago

LLM training on RTX 5090

ℏεsam

144,051 views • 1 year ago

Zombie robot RL policy

Simon Kalouche

164,401 views • 1 year ago

Does off-policy value-based RL scale? In LLMs, larger scale...

Oleg Rybkin

23,979 views • 1 year ago

Rare serious post on here but it need to...

Alexandra Pembroke

10,899 views • 1 month ago

Tutorial Time: Run any open-source LLM locally. Now we...

Linus ✦ Ekenstam

915,787 views • 2 years ago

Reinforcement learning should be able to improve upon behaviors...

Vivek Myers

79,523 views • 1 year ago

New project! Flow Policy Gradients for Robot Control tldr;...

Brent Yi

91,596 views • 4 months ago

How thick does ice need to be before you...

AlphaFox

43,053 views • 5 months ago

Does your bed board need to be replaced 😏

sytoys-us1

49,674 views • 9 months ago

Young man does not say "Thank you Sir" will...

newbeginning

12,898 views • 1 year ago

How does high-fidelity tactile simulation help robots nail the...

Binghao Huang

46,982 views • 8 months ago

How it feels to be an LLM

Beff (e/acc)

36,669 views • 1 year ago

i need to post more on here 😩

MS. F!NEE $HITT

124,925 views • 1 year ago

i need to post on here more oops

haley ⋆✧.*

14,443 views • 1 year ago

🚨Current scalable RL algos train a policy w/o value...

Aviral Kumar

37,301 views • 1 year ago

RL is painfully slow 😭 — bottlenecked by super-long...

Infini-AI-Lab

77,156 views • 24 days ago

🤔 How to fine-tune an Imitation Learning policy (e.g.,...

Tongzhou Mu 🤖🦾🦿

16,959 views • 1 year ago

I love this guy. He does not need to...

Dex

35,821 views • 1 year ago

Introducing RL Environment Creator Skill Now any one can...

Adithya S K

46,556 views • 1 month ago