If you have a policy that uses diffusion/flow (e.g. a diffusion VLA), you can run RL where the actor chooses the noise, which the policy then denoises to produce an action. This approach, which we call diffusion steering (DSRL), makes for a remarkably efficient RL method! 🧵👇
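The pipeline described in the post can be sketched in a few lines. This is only a toy illustration, not the authors' implementation: `denoise` is a made-up stand-in for a frozen pre-trained flow policy, and `noise_actor` with its `weights` matrix is a hypothetical linear actor introduced for this example.

```python
import numpy as np

def denoise(state, noise, steps=10):
    """Toy stand-in for a frozen flow policy (assumption): Euler-integrates a
    made-up velocity field that pulls the sample toward tanh(state)."""
    x = np.array(noise, dtype=float)
    for _ in range(steps):
        x = x + (np.tanh(state) - x) / steps
    return x

def noise_actor(state, weights):
    """Hypothetical linear actor: outputs a noise vector, not an action."""
    return weights @ state

state = np.array([0.5, -0.2])
weights = np.zeros((2, 2))  # untrained actor proposes zero noise
prior_noise = np.random.default_rng(0).standard_normal(2)

# Ordinary sampling: the noise comes from the prior.
action_from_prior = denoise(state, prior_noise)
# Steered sampling: the noise comes from the actor; the policy itself is untouched.
action_from_actor = denoise(state, noise_actor(state, weights))
```

The frozen policy is a deterministic function of the state and the noise, so choosing the noise is enough to choose the action without changing the policy's weights.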

Sergey Levine

130,062 subscribers

152,824 views • 1 year ago • via X (Twitter)

Science & Technology, Education

9 comments

Sergey Levine • 1 year ago

DSRL trains an actor and Q-function, treating the diffusion noise as the action space. Because samples from the noise prior map to reasonable actions for the policy, DSRL essentially explores "inside" the set of reasonable pre-trained behaviors, making it extremely efficient.
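A minimal sketch of learning in the noise space, under heavy simplifying assumptions that are not from the post: a single fixed state and a one-step task (so the Q-function reduces to a regression of reward on noise, with no bootstrapping), quadratic Q features, and a made-up target action. `frozen_policy`, `q_features`, and `run_dsrl_toy` are hypothetical names for this illustration.

```python
import numpy as np

def frozen_policy(state, noise, steps=10):
    # Toy stand-in for a frozen pre-trained flow policy (assumption):
    # Euler steps that pull the noise sample toward tanh(state).
    x = np.array(noise, dtype=float)
    for _ in range(steps):
        x = x + (np.tanh(state) - x) / steps
    return x

def q_features(w):
    # Quadratic features of the noise vector: [1, w, w^2].
    return np.concatenate(([1.0], w, w ** 2))

def run_dsrl_toy(trials=50, warmup=10, lr=1.0, sigma=0.3, seed=0):
    rng = np.random.default_rng(seed)
    state = np.array([0.5, -0.2])
    # Assumed task optimum, slightly away from the base policy's behavior.
    target = np.tanh(state) + np.array([0.3, -0.3])

    def reward(w):
        return -np.sum((frozen_policy(state, w) - target) ** 2)

    mu = np.zeros(2)  # actor: mean of the noise it proposes
    initial = reward(mu)
    buffer_w, buffer_r = [], []
    for t in range(trials):
        # Warm up with samples from the noise prior, then explore around the actor.
        if t < warmup:
            w = rng.standard_normal(2)
        else:
            w = mu + sigma * rng.standard_normal(2)
        buffer_w.append(w)
        buffer_r.append(reward(w))
        if t >= warmup - 1:
            # Critic: least-squares fit of Q(w) on all stored (noise, reward) pairs.
            phi = np.array([q_features(x) for x in buffer_w])
            theta, *_ = np.linalg.lstsq(phi, np.array(buffer_r), rcond=None)
            # Actor: gradient ascent on the fitted Q with respect to the noise.
            grad = theta[1:3] + 2.0 * theta[3:5] * mu
            mu = mu + lr * grad
    return initial, reward(mu), mu
```

Calling `run_dsrl_toy()` returns the reward before steering, the reward after steering, and the learned noise mean. Every trial reuses the stored pairs, which is the off-policy flavor of the idea, and the frozen policy is only ever queried, never updated.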

Sergey Levine • 1 year ago

DSRL learns essentially in real time, with good results in as little as 50 trials (it's so efficient that a person can literally sit in front of the robot and push a button to assign sparse rewards).

Sergey Levine • 1 year ago

This was a really fun collaboration led by @ajwagenmaker. Project website with paper: To find out more, check out his thread here:

ahad • 1 year ago

Would a supervised learning version of this work, where the noise distribution is a parameter that is also optimized along with the policy weights?

ahad • 1 year ago

how long would it take to get that first sparse reward with this method?

Himanshu Kumar • 1 year ago

Controlling the noise instead of the action itself is a surprisingly effective approach.

Andres Franco • 1 year ago

This is pretty amazing, and the visualization made everything so easy to understand😅

Ran Cheng • 1 year ago

Will making the initial noise distribution a learnable parameter reduce randomness and thus make the model more prone to overfitting?

Joanne Mercado • 1 year ago

😅🥹

Related videos

So you’ve trained your favorite diffusion/flow based policy, but it’s just not good enough 0-shot. Worry not, in our new work DSRL - we show how to *steer* pre-trained diffusion policies with off-policy RL, improving behavior efficiently enough for direct training in the real world! DSRL retains nice exploration from the base policy, but allows for quick improvement beyond this base policy with RL. The method is frustratingly simple, and super easy to throw on top of your favorite pretrained policy (VLA/diffusion policy, etc). Let’s think about how it works, 🧵 (1/10)

Abhishek Gupta

19,035 views • 1 year ago

Flow reversal steering allows "steering" diffusion-based VLAs with high-level actions, for example from VLM reasoning. This also lets us run RL in the diffusion noise space with exploration guided by high-level reasoning: think through a task, then practice it! 👇

Sergey Levine

73,360 views • 20 days ago

👇Introducing DPPO, Diffusion Policy Policy Optimization. DPPO optimizes a pre-trained Diffusion Policy using policy gradient from RL, showing surprising improvements over a variety of baselines across benchmarks and sim2real transfer.

Allen Ren

78,227 views • 1 year ago

AccVideo just dropped on Hugging Face. Accelerating Video Diffusion Model with Synthetic Dataset: presents an efficient distillation method to accelerate video diffusion models with a synthetic dataset. The method is 8.5x faster than HunyuanVideo.

AK

20,633 views • 1 year ago

Google presents VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis. We propose VLOGGER, a method for audio-driven human video generation from a single input image of a person, which builds on the success of recent generative diffusion models. Our method consists of

AK

66,375 views • 2 years ago

🎉 Diffusion-style annealing + sampling-based MPC can surpass RL, and seamlessly adapt to task parameters, all training-free! We open-sourced DIAL-MPC, the first training-free method for whole-body torque control using full-order dynamics 🧵

Haoru Xue

172,483 views • 1 year ago

We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.

Physical Intelligence

431,436 views • 3 months ago

🤔Want a principled way to RL your diffusion model? Check Data-regularized Reinforcement Learning (DDRL)! Post-train NVIDIA #Cosmos World Foundation models with a million GPU hours! 🤯 Novel formulation ➡️ Theoretically integrates SFT into RL ➡️ Robust to Reward Hacking 🛑 Details: #DDRL #Diffusion #RL #NVIDIA #Cosmos

Haotian Ye

77,626 views • 6 months ago

We release Cosmos Policy 💫: a state-of-the-art robot policy built on a video diffusion model backbone. - policy + world model + value function — in 1 model - no architectural changes to the base video model - SOTA in LIBERO (98.5%), RoboCasa (67.1%), & ALOHA tasks (93.6%) 🧵👇

Moo Jin Kim

149,014 views • 5 months ago

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models paper page: github: Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse images. However, these models still encounter difficulties when generating images from prompts that demand spatial or common sense reasoning. We propose to equip diffusion models with enhanced reasoning capabilities by using off-the-shelf pretrained large language models (LLMs) in a novel two-stage generation process. First, we adapt an LLM to be a text-guided layout generator through in-context learning. When provided with an image prompt, an LLM outputs a scene layout in the form of bounding boxes along with corresponding individual descriptions. Second, we steer a diffusion model with a novel controller to generate images conditioned on the layout. Both stages utilize frozen pretrained models without any LLM or diffusion model parameter optimization. We validate the superiority of our design by demonstrating its ability to outperform the base diffusion model in accurately generating images according to prompts that necessitate both language and spatial reasoning. Additionally, our method naturally allows dialog-based scene specification and is able to handle prompts in a language that is not well-supported by the underlying diffusion model.

AK

83,657 views • 2 years ago

I'm thrilled to announce the launch of ⚡️Flash Diffusion from Jasper! Earlier this year, with our acquisition of Clipdrop, we launched the Jasper AI Research Lab in Paris. Today, we are excited to release our first piece of groundbreaking research: the open-source distillation method, "Flash Diffusion". Flash Diffusion accelerates inference by 500%, reduces computing costs, and produces higher-quality image outputs. Dive into the details and discover how Flash Diffusion is set to revolutionize the field of AI and image synthesis. Read all about it here: Try a demo on Hugging Face:

Timothy Young

10,091 views • 2 years ago

Dreamix: Video Diffusion Models are General Video Editors abs: project page: present diffusion-based method that is able to perform text-based motion and appearance editing of general videos

AK

398,160 views • 3 years ago

V3D Video Diffusion Models are Effective 3D Generators Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated the generation speed, but usually produce less-detailed objects due to limited model capacity or 3D data. Motivated by recent advancements in video diffusion models, we introduce V3D, which leverages the world simulation capacity of pre-trained video diffusion models to facilitate 3D generation. To fully unleash the potential of video diffusion to perceive the 3D world, we further introduce geometrical consistency prior and extend the video diffusion model to a multi-view consistent 3D generator. Benefiting from this, the state-of-the-art video diffusion model could be fine-tuned to generate 360degree orbit frames surrounding an object given a single image. With our tailored reconstruction pipelines, we can generate high-quality meshes or 3D Gaussians within 3 minutes. Furthermore, our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views. Extensive experiments demonstrate the superior performance of the proposed approach, especially in terms of generation quality and multi-view consistency

AK

31,997 views • 2 years ago

NeurIPS 2025 Paper: LLMs are Reinforcement Learners 🤯! Surprisingly, we show that LLMs can solve RL tasks without any external component! We introduce Prompted Policy Search (ProPS), an RL method based only on LLMs and in-context learning. [Paper]

Heni Ben Amor

51,248 views • 7 months ago

Diffusion has shown great promise for generating robot **actions**, can it act as a **world model** to generate the future conditioned on actions? In our work led by han qi Haocheng Yin and in collaboration with Yilun Du, we show a **controllable** action-conditioned video diffusion model can produce photorealistic and (near) physics-accurate future predictions. This ability strengthens the policy via: - ranking different action proposals and selecting the best, or - **visual** trajectory optimization by optimizing the action proposals using gradient ascent. Learn more about Generative Predictive Control (GPC) at:

Heng Yang

38,390 views • 1 year ago

🇺🇸DAVID SACKS: TRUMP REMOVED BIDEN’S DIFFUSION RULE TO UNLEASH AMERICAN TECH “ That's been the opposite of the approach in Washington. The Trump administration just announced that we'd be rescinding what's known as the Biden Diffusion Rule, which was a rule that came out in January. The, the Biden diffusion rule came out in, in January, and it literally restricted the, the diffusion, which or proliferation of American technology all over the world.” Source: David Sacks

Mario Nawfal

87,440 views • 1 year ago

Presenting DemoDiffusion: An extremely simple approach enabling a pre-trained 'generalist' diffusion policy to follow a human-demonstration for a novel task during inference One-shot human imitation *without* requiring any paired human-robot data or online RL 🙂 1/n

Homanga Bharadhwaj

32,830 views • 1 year ago

Tactile Diffusion generates synthetic tactile images from sim data, capturing the complex illumination of the gel deformation. This research from UW & Meta AI is the first method using diffusion to close the sim2real gap for vision-based tactile sensing. Read the paper ⬇️

AI at Meta

100,156 views • 3 years ago

Nvidia presents Edify 3D! The method can generate high-quality 3D assets from text descriptions. It uses a diffusion model to create detailed quad-mesh topologies and high-resolution textures in under 2 minutes.

Dreaming Tulpa 🥓👑

39,517 views • 1 year ago