Does off-policy value-based RL scale? In LLMs, larger scale...

Oleg Rybkin
23,968 views • 1 year ago
🚨Current scalable RL algos train a policy w/o value...

Aviral Kumar
37,284 views • 1 year ago
Introducing CQN: Coarse-to-fine Q-Network, a value-based RL algorithm for...

Younggyo Seo
16,413 views • 1 year ago
New research from Databricks: LLMs Can Learn to Reason...

Databricks AI Research
12,439 views • 3 months ago
US-based K-Scale Labs launched pre-orders for its open-source humanoid,...

Brett Adcock
10,994 views • 11 months ago
1/ While most RL methods use shallow MLPs (~2–5...

Kevin Wang
154,516 views • 1 year ago
Your value doesn't decrease based on someone's inability to...

Persephanii Aka Thick Yonce
380,674 views • 2 years ago
Why am I working on RL for LLMs, when...

Shane Gu
12,440 views • 6 months ago
Frontier research just crossed a new threshold. Mind Lab...

Chidanand Tripathi
89,813 views • 6 months ago
Thanks AK! Finally, robot can do continuous, agile, autonomous,...

Guanya Shi
32,142 views • 1 year ago
Does LLM RL post-training need to be on-policy?

Kianté Brantley
113,263 views • 3 months ago
RL is back! But is it always the best...

Sebastian Risi
11,130 views • 10 months ago
X_Acc_Flags - for ios The value of ‘Location (accurate)’...

CrazyMind
19,392 views • 6 months ago
"I think just based on vehicle autonomy, we can...

DogeDesigner
37,893 views • 11 months ago
Crypto can’t scale without solving value transfer. Not just...

Kima Network
25,026 views • 11 months ago
Polygon 2.0 is a concrete vision to build the...

Polygon | POL
72,469 views • 3 years ago
This figure from HIL-SERL is one of the clearest...

Dominique Paul
24,399 views • 3 months ago
Some personal news: I recently joined Cursor. Cursor is...

Sasha Rush
335,873 views • 1 year ago
Pay for what you create, not for seats. FLORA's...

FLORA ©
22,998 views • 4 months ago
Video diffusion models are just overqualified depth estimators! Deterministic...

Wildminder
49,235 views • 2 months ago
How does high-fidelity tactile simulation help robots nail the...

Binghao Huang
46,949 views • 7 months ago