FP4 Explore, BF16 Train Diffusion Reinforcement Learning via Efficient Rollout Scaling paper:

AK

499,672 subscribers

12,761 views • 1 month ago • via X (Twitter)

Related videos

Frontier models that use reasoning to "think" during inference are generating 5X more AI tokens per year. ✨ "Inference is now a thinking process. And in order to teach AI how to think, reinforcement learning and very significant computation was introduced into post-training," said NVIDIA CEO Jensen Huang in a recent keynote at #CES2026. Reinforcement learning is increasing computation demands across all AI scaling laws: pre-training, post-training, and test-time scaling. Learn more about reinforcement learning and AI scaling ➡️

NVIDIA Data Center

13,159 views • 4 months ago

🤔Want a principled way to RL your diffusion model? Check Data-regularized Reinforcement Learning (DDRL)! Post-train NVIDIA #Cosmos World Foundation models with a million GPU hours! 🤯 Novel formulation ➡️ Theoretically integrates SFT into RL ➡️ Robust to Reward Hacking 🛑 Details: #DDRL #Diffusion #RL #NVIDIA #Cosmos

Haotian Ye

77,429 views • 6 months ago

Reinforcement learning

Massimo

3,407,828 views • 2 years ago

ManipTrans is out on Hugging Face Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning

AK

22,136 views • 1 year ago

Redwood AI | Mobility Reinforcement Learning

1X

1,317,618 views • 1 year ago

Our CoRL 2024 paper shows Reinforcement Learning can allow robots to learn skills via real-world practice, without any demonstrations or simulation engineering. Rewards are provided using language/vision models, and mobility of robots enables autonomous exploration. 1/N

Russell Mendonca

38,454 views • 1 year ago

MatAnyone 2 is out on Hugging Face Scaling Video Matting via a Learned Quality Evaluator paper: app:

AK

34,753 views • 2 months ago

An interesting example of reinforcement learning.

Historic Vids

1,563,912 views • 3 years ago

Watch Gemini 2.5 Pro implement a landmark Google DeepMind research paper. 🕹️ It codes the reinforcement learning algorithm, visualizes the training live and even debugs errors. ↓

Google DeepMind

369,872 views • 1 year ago

An Interesting example of Reinforcement learning!! 🐓

Aether

7,058,785 views • 2 years ago

Nvidia presents Fast-dLLM v2 Efficient Block-Diffusion LLM

AK

18,753 views • 8 months ago

This is a single uncut video, showing a robot learning several tasks instantly, after just one demonstration each ... This is possible because we've now been able to achieve in-context learning for everyday robotics tasks, and I'm very excited to announce our latest paper: 🎆 Instant Policy: In-Context Imitation Learning via Graph Diffusion 🎆 (1/6) 🧵👇

Edward Johns

74,663 views • 1 year ago

NeurIPS 2025 Paper: LLMs are Reinforcement Learners 🤯! Surprisingly, we show that LLMs can solve RL tasks without any external component! We introduce Prompted Policy Search (ProPS), an RL method based only on LLMs and in-context learning. [Paper]

Heni Ben Amor

51,172 views • 6 months ago

Getting Cyn ready for reinforcement learning >:3

DT

14,256 views • 7 months ago

notes on my reinforcement learning for rocket league side quest so far: i’m in awe that — we live in a time where i can train a reinforcement learning model that is learning to play rocket league and at the same time, test community made rocket league models. where, both processes can run in parallel & locally on my hardware. in the game window — i’m playing against a reinforcement learning ppo model that’s running locally. and in the vscode window — i’m training my own reinforcement learning model (just a dummy example script for now). also, the rocket league botting ecosystem has incredible tooling where you can test out different reinforcement learning ideas against other algorithms + it has great documentation for linux and windows. it’s honestly a beautiful hidden gem of a rabbit hole :)

naklecha

46,302 views • 1 year ago

Baby steps towards physically based animation with reinforcement learning...

Dennis Gustafsson

32,132 views • 1 year ago

(1/5) FP4 hardware is here, but 4-bit attention still kills model quality, blocking true end-to-end FP4 serving. To fix that, we propose Attn-QAT, the first systematic study of quantization-aware training for attention. The result: FP4 attention quality is comparable to BF16 attention with 1.1x–1.5x higher throughput than SageAttention3 on an RTX 5090 and 1.39x speedup over FlashAttention-4 on a B200. Blog: Code: Checkpoints:

Hao AI Lab

37,175 views • 1 month ago

RL-100 Performant Robotic Manipulation with Real-World Reinforcement Learning

AK

15,364 views • 7 months ago

Daniel Kokotajlo says companies are making AI more agentic by scaling RL (reinforcement learning) and connecting agents to various tools. By early 2027, these agents will be fully autonomous, superhuman coders able to replace human programmers.

Haider.

11,559 views • 1 year ago

Making paper airplanes with efficient flight

Today I Learned

241,218 views • 1 year ago