Does off-policy value-based RL scale? In LLMs, larger scale...

Oleg Rybkin

23,968 views • 1 year ago

🚨Current scalable RL algos train a policy w/o value...

Aviral Kumar

37,284 views • 1 year ago

Introducing CQN: Coarse-to-fine Q-Network, a value-based RL algorithm for...

Younggyo Seo

16,413 views • 1 year ago

New research from Databricks: LLMs Can Learn to Reason...

Databricks AI Research

12,439 views • 3 months ago

US-based K-Scale Labs launched pre-orders for its open-source humanoid,...

Brett Adcock

10,994 views • 11 months ago

1/ While most RL methods use shallow MLPs (~2–5...

Kevin Wang

154,516 views • 1 year ago

Your value doesn't decrease based on someone's inability to...

Persephanii Aka Thick Yonce

380,674 views • 2 years ago

Why am I working on RL for LLMs, when...

Shane Gu

12,440 views • 6 months ago

Frontier research just crossed a new threshold. Mind Lab...

Chidanand Tripathi

89,813 views • 6 months ago

Thanks AK! Finally, robot can do continuous, agile, autonomous,...

Guanya Shi

32,142 views • 1 year ago

Does LLM RL post-training need to be on-policy?

Kianté Brantley

113,263 views • 3 months ago

RL is back! But is it always the best...

Sebastian Risi

11,130 views • 10 months ago

X_Acc_Flags - for ios The value of ‘Location (accurate)’...

CrazyMind

19,392 views • 6 months ago

"I think just based on vehicle autonomy, we can...

DogeDesigner

37,893 views • 11 months ago

Crypto can’t scale without solving value transfer. Not just...

Kima Network

25,026 views • 11 months ago

Polygon 2.0 is a concrete vision to build the...

Polygon | POL

72,469 views • 3 years ago

This figure from HIL-SERL is one of the clearest...

Dominique Paul

24,399 views • 3 months ago

Some personal news: I recently joined Cursor. Cursor is...

Sasha Rush

335,873 views • 1 year ago

Pay for what you create, not for seats. FLORA's...

FLORA ©

22,998 views • 4 months ago

Video diffusion models are just overqualified depth estimators! Deterministic...

Wildminder

49,235 views • 2 months ago

How does high-fidelity tactile simulation help robots nail the...

Binghao Huang

46,949 views • 7 months ago