Abhishek Gupta

@abhishekunique7 • 11,579 subscribers

Assistant Professor at University of Washington. I like robots, and reinforcement learning. Previously: post-doc at MIT, PhD at Berkeley

Shorts

Here’s a pretty weird and surprising result - retrieval-augmented generation works unreasonably well for robot learning – but only when parameterized using difference vectors! We introduce Difference-Aware Retrieval Policies for Imitation Learning (DARP), a simple, semi-parametric RAG architecture for imitation learning that achieves gains of up to 200% over standard behavior cloning. No additional assumptions beyond BC, just a little architecture switch! The theory backing it up is pretty cool too and it works on real robots! :) Play with our website to understand better: 🧵(1/7)

21,078 views
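
The DARP post above does not spell out the architecture, so the following is only a minimal sketch of one plausible reading of "parameterized using difference vectors": each retrieved demonstration is presented to the policy as an offset from the query state rather than as an absolute state. The function name `retrieve_with_differences`, the Euclidean metric, and the sign convention (query minus neighbor) are all assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


def retrieve_with_differences(
    query_state: State,
    demos: Sequence[Tuple[State, Action]],
    k: int,
) -> List[Tuple[State, Action]]:
    """Return the k nearest demos as (difference_vector, action) pairs.

    Assumption: the difference vector is query_state - neighbor_state and
    nearness is plain Euclidean distance in state space.
    """
    ranked = sorted(demos, key=lambda d: math.dist(query_state, d[0]))
    return [
        (tuple(q - s for q, s in zip(query_state, state)), action)
        for state, action in ranked[:k]
    ]


# Toy demonstration set: (state, action) pairs in a 2-D state space.
demos = [
    ((0.0, 0.0), (-1.0, 0.0)),
    ((1.0, 1.0), (0.0, 1.0)),
    ((4.0, 4.0), (1.0, 0.0)),
]
neighbors = retrieve_with_differences((1.0, 0.5), demos, k=2)
```

A downstream policy network (not shown, and hypothetical here) would consume these `(difference, action)` pairs instead of raw neighbor states.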

So I hear that behavior cloning is all the rage now. What if we could do better, but with the same data? :) In CCIL, we show that imitation via BC is improved by synthesizing corrective labels to account for compounding error, without interactive oracles. Lets you do 👇! 🧵(1/9)

53,905 views

Constructing interactive simulated worlds has been a challenging problem, requiring considerable manual effort for asset creation and articulation, and composing assets to form full scenes. In our new work - DRAWER, we made the process of creating scenes in simulation as simple as taking a video of the scene and out comes a high-quality, fully interactive environment in simulation. No human simulation designer involved! A 🧵(1/7)

12,072 views

World modeling and imitation learning have largely been considered two disparate worlds. In our recent work, Unified World Models, just accepted to #RSS2025, Chuning Zhu provides a dead-simple unifying solution: just train a joint diffusion model over actions and future states, but with *decoupled* diffusion time steps across these modalities. Manipulating these decoupled time steps then allows for marginalization or conditioning on actions or states; a single model can serve as a policy, forward dynamics model, video prediction model, or inverse dynamics model by simply setting diffusion timesteps carefully. The resulting model can leverage video datasets along with robot training data much more effectively, and shows improved robustness, generalization, and flexibility. This is exciting because it is frustratingly simple, scalable, and shows strong improvement on real-world robotics problems. Please refer to Chuning Zhu's excellent thread for more details! More details/code can be found on our website and in the paper -

11,430 views
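
To make the "decoupled diffusion time steps" idea in the Unified World Models post concrete, here is a small sketch of how per-modality timesteps could select which role the joint model plays. It assumes the common convention that a fully noised input (timestep `T_MAX`) marginalizes a modality out, a clean input (timestep 0) conditions on it, and the modality being generated follows the current reverse-diffusion step. `T_MAX`, `MODES`, and `timesteps` are hypothetical names, not taken from the paper's code.

```python
from typing import Dict, Tuple

T_MAX = 1000  # hypothetical number of diffusion steps

# Role of each modality in each inference mode (assumed mapping).
MODES: Dict[str, Dict[str, str]] = {
    "policy": {"action": "denoise", "future_state": "marginalize"},
    "forward_dynamics": {"action": "condition", "future_state": "denoise"},
    "video_prediction": {"action": "marginalize", "future_state": "denoise"},
    "inverse_dynamics": {"action": "denoise", "future_state": "condition"},
}


def timesteps(mode: str, step: int) -> Tuple[int, int]:
    """Return (t_action, t_future_state) for the current reverse step."""
    def pick(role: str) -> int:
        if role == "denoise":
            return step   # modality being sampled follows the schedule
        if role == "condition":
            return 0      # clean input: treated as observed
        return T_MAX      # pure noise: carries no information

    roles = MODES[mode]
    return pick(roles["action"]), pick(roles["future_state"])


policy_t = timesteps("policy", 500)
dynamics_t = timesteps("forward_dynamics", 500)
```

Under these assumptions, switching between the four behaviors named in the post requires no change to the network, only to the two timestep inputs.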

Haven't been to a conference in a while, really excited to be at #NeurIPS2024! I'll be helping present 4 of our group's recent papers: 1. Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL 2. Distributional Successor Features Enable Zero-Shot Policy Optimization 3. Learning to Cooperate with Humans using Generative Agents 4. Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning Find more details on each paper and where to find us in this thread (1/6)

10,803 views

Videos

Policies trained on real robot data via imitation can be surprisingly capable. But for domains like dexterous manipulation, they are often not quite good enough: they move slowly, miss grasps, make unreliable contact, and fail under small perturbations. Can we improve them without any additional data collection on the real robot? In SCORE, we show that we can improve real-world diffusion/flow policies cheaply by using simulation to simply learn how to steer them on deployment. This leads to large gains in real-world success and speed across a variety of tasks, without requiring additional real-world experience: 🧵 (1/10)

33,689 views • 18 days ago

Excited to share the project that has surprised me the most in the last year! Large-scale RL in simulation, with no demos and no reward engineering, can solve dynamic, dexterous and contact-rich tasks. The learned behaviors are reactive, forceful and use the environment for recovery in ways that are extremely challenging to bake in or teleoperate! You can play with the policies yourself to see: And, the learned behavior transfers to real world robots from RGB camera inputs! So what’s the trick - using simulator resets carefully! Let’s unpack (1/10)

82,768 views • 3 months ago

Punchline: distill world models from simulation to enable fast, stable real-world robot adaptation. Simulation is nearly always wrong. But in Simulation Distillation, we ask a simple question: How do we perform simulation pretraining such that real-world adaptation becomes trivially easy? Let's take a closer look (1/n)

32,721 views • 2 months ago

Punchline: World models == VQA (about the future)! Planning with world models can be powerful for robotics/control. But most world models are video generators trained to predict everything, including irrelevant pixels and distractions. We ask - what if a world model only predicted the semantic information necessary for decision-making? Introducing Semantic World Models (SWM). Given an observation and an action sequence, SWMs cast modeling as answering textual questions about the future outcome resulting from the actions. Recasting world modeling as a VQA problem lets us directly leverage the pretrained knowledge and machinery of VLMs for generalizable modeling. We had a lot of fun thinking about how this work helps connect these two seemingly very different fields of study - VLMs and world models! 🧵(1/6) Paper: Fun demo:

61,208 views • 8 months ago

Combinatorial complexity is often the bane of imitation learning - including VLA models! Jesse Zhang and Marius Memmel proposed a way around this, using VLMs to perform problem reduction for imitation. The insight is simple - 1) A high-level VLM takes a complex scene/task and reduces it to the minimal representation (via masking and path prediction) that is needed to act in the world. 2) A low-level policy then takes this reduced representation and generates actions to be executed in the world. The high-level policy absorbs all the combinatorial complexity of the problem, leaving the low-level policy to focus on dexterity and geometric reasoning. Super simple, works really well across policy classes and problem settings! - 41.4× sim2real improvement (3DDA) and 2–3.5× boosts for π₀ and ACT in the real world. Paper: Website: Demo: Fun collaboration led by Jesse Zhang and Marius Memmel with lots of collaborators! Let us know what you think 😀

22,354 views • 9 months ago

Imitation learning is great, but needs us to have (near) optimal data. We throw away most other data (failures, evaluation data, suboptimal data, undirected play data), even though this data can be really useful and way cheaper! In our new work - RISE, we show a simple way to *use all of this non-optimal data to robustify imitation learning* with minimal requirements beyond BC. Key idea: use non-expert data to learn how to *recover* back to expert data with a minimal-frills offline RL method that works under sparse data coverage. Allows usage of *all* available data, not just expert data - never throw your data away! Paper: Website: A 🧵(1/10)

20,612 views • 8 months ago

So you’ve trained your favorite diffusion/flow based policy, but it’s just not good enough 0-shot. Worry not, in our new work DSRL - we show how to *steer* pre-trained diffusion policies with off-policy RL, improving behavior efficiently enough for direct training in the real world! DSRL retains nice exploration from the base policy, but allows for quick improvement beyond this base policy with RL. The method is frustratingly simple, and super easy to throw on top of your favorite pretrained policy (VLA/diffusion policy, etc). Let’s think about how it works, 🧵 (1/10)

19,084 views • 1 year ago

So we did a bunch of projects with real world reinforcement learning - but it was often too inefficient to be practical to train tabula rasa. This suggests we need better priors, but acquiring these from on-robot data can often be expensive as well. In our recent work, we show that despite being fundamentally inaccurate, simulation can provide a cheap way to guide real-world RL finetuning to be super efficient! We propose Simulation-Guided Fine-Tuning (SGFT) - a simple paradigm for sim2real finetuning that uses simulation to provide reward shaping that accelerates real world RL finetuning *beyond* just providing an initialization. TLDR: Use value functions from sim to shape rewards for real-world RL, see large sample efficiency improvements 🧵(1/6)

13,637 views • 1 year ago
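
The SGFT TLDR ("use value functions from sim to shape rewards") can be illustrated with standard potential-based reward shaping, where the shaped reward is r + γ·V(s') − V(s). Whether SGFT uses exactly this form is an assumption here; `shaped_reward` and `toy_sim_value` are hypothetical names, and the value function is a toy stand-in for one learned in simulation.

```python
from typing import Callable


def shaped_reward(
    reward: float,
    state: int,
    next_state: int,
    sim_value: Callable[[int], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Potential-based shaping: r + gamma * V_sim(s') - V_sim(s).

    The potential of a terminal next state is taken to be zero.
    """
    next_value = 0.0 if done else sim_value(next_state)
    return reward + gamma * next_value - sim_value(state)


def toy_sim_value(state: int) -> float:
    """Stand-in for a simulation-trained value function on a 1-D chain
    whose goal is at position 5 (higher value closer to the goal)."""
    return -float(abs(5 - state))


# With a sparse real-world reward of 0, shaping still separates good
# moves from bad ones.
toward_goal = shaped_reward(0.0, 2, 3, toy_sim_value, gamma=1.0)
away_from_goal = shaped_reward(0.0, 2, 1, toy_sim_value, gamma=1.0)
```

In this toy chain the step toward the goal receives a positive shaped reward and the step away receives a negative one, which is the kind of dense guidance that can speed up real-world finetuning even when the simulator's value estimates are imperfect.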

Learned visuomotor policies are notoriously fragile: they break with changes in conditions like lighting, clutter, or object variations, among other things. In Yunchu @ CoRL2025's latest work, we asked whether we could get these policies to be robust and generalizable with a clever choice of visual representation! The argument we made was - we want a choice of visual representation that specifically adapts to be sufficient, yet minimal for the task at hand. We thought about it from the perspective of flexible, keypoint-based representations. The key question becomes - how do we choose a sufficient, task-specific, yet minimal set of keypoints as a representation for policy learning? Yunchu proposes a neat way of automatically selecting task-relevant keypoints using a standard supervised learning objective, and using this for robust policy learning. This is largely under the same assumptions as behavior cloning, but with huge gains on robustness. Let’s understand how, 🧵 (1/8)

11,355 views • 1 year ago

Constructing interactive simulated worlds has been a challenging problem, requiring considerable manual effort for asset creation and articulation, and composing assets to form full scenes. In our new work - DRAWER, we made the process of creating scenes in simulation as simple as taking a video of the scene and out comes a high-quality, fully interactive environment in simulation. No human simulation designer involved! A 🧵(1/7)

12,072 views • 1 year ago

So I heard we need more data for robot learning :) Purely real world teleop is expensive and slow, making large scale data collection challenging. I’ve been excited about getting more data into robot learning, going beyond just real-world teleop data. To this end, we’ve been scaling up data generation with RL in realistic simulations generated on the fly from crowdsourced videos. Enables realistic data collection, much more cheaply than purely real world teleop. Importantly, data collection becomes even *cheaper* with more environments, allowing training with over 100x more data. Transfers to real robots for generalizable manipulation. A 🧵 (1/N)

13,350 views • 1 year ago

Over the last few months, we’ve been thinking about how to learn from “off-domain” data - data from non-robot sources like video or simulation. These data sources are not quite good enough to learn policies (even monolithic VLA models) directly, but they still contain lots of information that can be useful for generalizable robot control. How can we develop robot learning models that are able to make use of this type of data for generalizable control? In new work, that we call HAMSTER, we show that VLMs can be useful for enabling robotic learning from off-domain data, but specifically when used through hierarchical VLA architectures. We show that this class of models can learn generalizable robot policies for the real world from large-scale, off-domain data. A 🧵 (1/10)

11,994 views • 1 year ago

In my experience, robot 'generalists' are often jacks of all trades but masters of none. In training across multiple tasks and environments, robot policies fail to generalize robustly and effectively to each particular test setting. What if, at test time, we non-parametrically *retrieved* “relevant” data from the training set and used it to significantly improve the performance of few-shot imitation learning, making it robust to various test time scenes? Notably, we are *not* collecting lots of new data, just training more on sub-components of the same training data! Now, we’re certainly not the first to suggest retrieval, but in our new work - STRAP, we show how retrieving relevant *sub-trajectories* from offline datasets can significantly increase data reuse across tasks, when paired with an appropriate metric space. A 🧵 (1/7)

12,045 views • 1 year ago
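
As a rough illustration of the sub-trajectory retrieval idea in the STRAP post, the sketch below slides a window over every offline trajectory and keeps the segments closest to a short query segment. The mean Euclidean distance over aligned steps is a deliberately simple stand-in for the "appropriate metric space" the post mentions, which is not specified there; `windows`, `segment_distance`, and `retrieve_subtrajectories` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Trajectory = Sequence[Point]


def windows(traj: Trajectory, length: int) -> List[Trajectory]:
    """All contiguous segments of the given length (empty if too short)."""
    return [traj[i:i + length] for i in range(len(traj) - length + 1)]


def segment_distance(a: Trajectory, b: Trajectory) -> float:
    """Mean per-step Euclidean distance between equal-length segments."""
    return sum(math.dist(x, y) for x, y in zip(a, b)) / len(a)


def retrieve_subtrajectories(
    query: Trajectory, dataset: Sequence[Trajectory], k: int
) -> List[Tuple[float, int, int]]:
    """Return the k best matches as (distance, trajectory_index, start)."""
    candidates = []
    for traj_id, traj in enumerate(dataset):
        for start, segment in enumerate(windows(traj, len(query))):
            candidates.append(
                (segment_distance(query, segment), traj_id, start)
            )
    return sorted(candidates)[:k]


# Toy offline dataset of two trajectories in a 2-D feature space.
dataset = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    [(5.0, 5.0), (5.0, 6.0), (5.0, 7.0)],
]
query = [(1.0, 0.1), (2.0, 0.1)]
best = retrieve_subtrajectories(query, dataset, k=1)
```

Matching at the segment level is what lets a trajectory collected for a different task contribute its relevant middle portion, rather than being accepted or rejected as a whole.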

Haven't been to a conference in a while, really excited to be at #NeurIPS2024! I'll be helping present 4 of our group's recent papers: 1. Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL 2. Distributional Successor Features Enable Zero-Shot Policy Optimization 3. Learning to Cooperate with Humans using Generative Agents 4. Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning Find more details on each paper and where to find us in this thread (1/6)

10,803 views • 1 year ago