FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling. Paper:

AK

503,687 subscribers

12,778 views • 3 months ago • via X (Twitter)

Related videos

Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion Paper: (not yet) Project: Code:

MrNeRF

19,907 views • 2 years ago

Frontier models that use reasoning to "think" during inference are generating 5X more AI tokens per year. ✨ "Inference is now a thinking process. And in order to teach AI how to think, reinforcement learning and very significant computation was introduced into post-training," said NVIDIA CEO Jensen Huang in a recent keynote at #CES2026. Reinforcement learning is increasing computation demands across all AI scaling laws: pre-training, post-training, and test-time scaling. Learn more about reinforcement learning and AI scaling ➡️

NVIDIA AI Infrastructure

13,329 views • 6 months ago

🤔Want a principled way to RL your diffusion model? Check Data-regularized Reinforcement Learning (DDRL)! Post-train NVIDIA #Cosmos World Foundation models with a million GPU hours! 🤯 Novel formulation ➡️ Theoretically integrates SFT into RL ➡️ Robust to Reward Hacking 🛑 Details: #DDRL #Diffusion #RL #NVIDIA #Cosmos

Haotian Ye

77,749 views • 7 months ago

Train your avatars to interact with 3D scenes. We use adversarial imitation learning and reinforcement learning to train physically-simulated characters that perform scene interaction tasks in a natural and life-like manner. Today at #SIGGRAPH2023.

Michael Black

46,237 views • 2 years ago

Our CoRL 2024 paper shows Reinforcement Learning can allow robots to learn skills via real-world practice, without any demonstrations or simulation engineering. Rewards are provided using language/vision models, and mobility of robots enables autonomous exploration. 1/N

Russell Mendonca

38,454 views • 1 year ago

MatAnyone 2 is out on Hugging Face Scaling Video Matting via a Learned Quality Evaluator paper: app:

AK

34,753 views • 4 months ago

Watch Gemini 2.5 Pro implement a landmark Google DeepMind research paper. 🕹️ It codes the reinforcement learning algorithm, visualizes the training live and even debugs errors. ↓

Google DeepMind

369,902 views • 1 year ago

This is a single uncut video, showing a robot learning several tasks instantly, after just one demonstration each ... This is possible because we've now been able to achieve in-context learning for everyday robotics tasks, and I'm very excited to announce our latest paper: 🎆 Instant Policy: In-Context Imitation Learning via Graph Diffusion 🎆 (1/6) 🧵👇

Edward Johns

74,683 views • 1 year ago

NeurIPS 2025 Paper: LLMs are Reinforcement Learners 🤯! Surprisingly, we show that LLMs can solve RL tasks without any external component! We introduce Prompted Policy Search (ProPS), an RL method based only on LLMs and in-context learning. [Paper]

Heni Ben Amor

51,248 views • 7 months ago

notes on my reinforcement learning for rocket league side quest so far: i’m in awe that — we live in a time where i can train a reinforcement learning model that is learning to play rocket league and at the same time, test community made rocket league models. where, both processes can run in parallel & locally on my hardware. in the game window — i’m playing against a reinforcement learning ppo model that’s running locally. and in the vscode window — i’m training my own reinforcement learning model (just a dummy example script for now). also, the rocket league botting ecosystem has incredible tooling where you can test out different reinforcement learning ideas against other algorithms + it has great documentation for linux and windows. it’s honestly a beautiful hidden gem of a rabbit hole :)

naklecha

46,302 views • 1 year ago

Daniel Kokotajlo says companies are making AI more agentic by scaling RL (reinforcement learning) and connecting agents to various tools. by early 2027, these agents will be fully autonomous, superhuman coders able to replace human programmers.

Haider.

11,572 views • 1 year ago

We open-sourced QeRL — Quantization-enhanced Reinforcement Learning ! 🧠 4-bit quantized RL training 💪 Train a 32B LLM on a single H100 GPU ⚙️ 1.7× faster overall training 🎯 Accuracy on par with bfloat16-level accuracy 🔥 Supports NVFP4 quantization format Moreover, we show that quantization helps exploration in RL training. Paper: Code: #NVIDIA #AIResearch #ReinforcementLearning #Quantization #LLM #EfficientAI

Yukang Chen

69,747 views • 9 months ago

Haven't been to a conference in a while, really excited to be at #NeurIPS2024! I'll be helping present 4 of our group's recent papers: 1. Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL 2. Distributional Successor Features Enable Zero-Shot Policy Optimization 3. Learning to Cooperate with Humans using Generative Agents 4. Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning Find more details on each paper and where to find us in this thread (1/6)

Abhishek Gupta

10,803 views • 1 year ago

How do AI agents move from simple actions to true reasoning? In Episode 6 of Making a Mind, Dr. Danielle Perszyk and Amazon AI researcher Meiqi Sun explore how reinforcement learning helps AI handle more complex, multistep challenges.

Amazon News

25,876 views • 4 months ago

Scaling laws in deep RL? Turns out that batch size, learning rate, and UTD (update-to-data) ratio for getting the most efficient and scalable deep RL have predictable relationships. Check out the analysis in new work by Oleg Rybkin & collaborators:

Sergey Levine

43,464 views • 1 year ago

New short course on Reinforcement Learning from Human Feedback, built in collaboration with @GoogleCloud! In this course, you’ll explore this key technique used to align LLMs with human values and make them more helpful, honest, and safe. Join now:

DeepLearning.AI

19,192 views • 2 years ago

Our project on Reinforcement Learning for versatile exoskeleton controllers is published in Nature. I hope there is more AI assistive technology to augment humans in the physical world. The project is led by NCSU Hao Su's group and the paper is

Bolei Zhou

16,795 views • 2 years ago

🚀 Introducing Emu3.5 — a large-scale multimodal world model that natively predicts the next vision-language state. 🔥 Trained on over 10T interleaved vision-language tokens and enhanced with reinforcement learning, Emu3.5 achieves powerful multimodal reasoning and generation. ⚡ Powered by our new Discrete Diffusion Adaptation (DiDA) for 20× faster inference. 🔥 Emu3.5 outperforms Nano Banana across image generation, editing, interleaved tasks and more. 🌍 Explore Emu3.5: Github: #Emu3 #MultimodalAI #WorldModel #NextTokenPrediction

BAAI

51,880 views • 8 months ago

Robot policies must be both reliable and highly capable to be useful; the best way to achieve this level of performance is with reinforcement learning. However, for reinforcement learning you are usually stuck between two difficult options: reinforcement learning in the real world is often risky and expensive, while reinforcement learning in a traditional simulator takes a lot of engineering work and has a persistent sim-to-real gap. What if instead you could train your robot purely in a world model? RISE by Jiazhi Yang et al. uses a compositional world model to predict the future and evaluate progress. This allows for a self-improving pipeline, which learns a world model from real data and then learns how the robot should perform different tasks. This pipeline results in a data-driven way to improve policy performance from real data but without real-world reinforcement learning. Watch Episode #86 of RoboPapers, with Chris Paxton and Jiafei Duan, to learn more!

RoboPapers

38,334 views • 1 month ago