Dominique Paul

@DominiqueCAPaul • 7,129 subscribers

Founder of Dream Machines, building ML models for robotic arms. Previous: Stats at @ETH & @NeurIPSconf. I post about RL and VLA post-training.

Shorts

Today I incorporated my startup, and where else but in Germany 🇩🇪 I did it all without leaving my house and got a German notary appointment in under 24 hours, thanks to the electronic ID. Long Europe 🇪🇺

114,850 views

We’re gonna be doing DAgger-style data collection soon. But the leader arms can’t carry their own weight and would need to follow the follower arms at all times. That feels messy, so Aurel Arnold implemented a teleop setup with the Quest in the last 1.5 days. The first configuration is working now, but it doesn’t feel 100% natural yet. Hard to describe why.

19,940 views

Time to get used to teleoperation on an upgraded hardware stack! Thank you Jannik Grothusen for building such an insane arm! 👊

44,583 views

This figure from HIL-SERL is one of the clearest visualisations of how RL learns differently from imitation learning.
The difference comes down to this: imitation learning treats each (state, action) pair as independent. A correction at timestep 20 teaches nothing about timestep 19 or 21. RL propagates reward backward through time. One successful insertion updates the value estimate of every state along the trajectory. So RL builds a full map of "which states lead to success"; imitation learning just memorizes individual snapshots.
Setup: a robot inserting a RAM stick into a motherboard slot. Each dot is an end-effector position (Y = lateral, Z = height). Starting position is randomized. Left to right = training progressing.
Top row (RL): the policy builds a funnel. Broad at the top, narrowing into the target. It systematically fills in the state space, learning which paths lead to success from many different starting positions.
Bottom row (imitation learning / HG-DAgger, same human data): sparse, diffuse, no funnel. The policy only learns near states the human demonstrated.
Both have access to the same data, including human corrections, but a completely different structure emerges.

24,433 views
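
The contrast drawn in the HIL-SERL post can be illustrated with a toy example. This is only a sketch under loud assumptions: a tabular TD(0) update on a short chain of states stands in for the critic, and a lookup table stands in for the imitation policy. HIL-SERL itself uses neural networks, which do generalize somewhat between nearby states, and all names here (`td_sweeps`, `bc_update`, the action label) are made up for illustration. Note that with one-step backups the reward reaches earlier states only after the trajectory is replayed several times, as a replay buffer would do.

```python
# Toy illustration, not the HIL-SERL implementation.
# Assumption: states are integers along one successful trajectory and the
# reward of 1.0 is given only on the final transition (the insertion).

def td_sweeps(trajectory, reward_at_end=1.0, gamma=0.9, sweeps=50, lr=0.5):
    """Replay one successful trajectory and return the learned state values."""
    values = {s: 0.0 for s in trajectory}
    last_transition = len(trajectory) - 2
    for _ in range(sweeps):
        for i in range(len(trajectory) - 1):
            s, s_next = trajectory[i], trajectory[i + 1]
            if i == last_transition:
                # Terminal transition: reward, no bootstrapping.
                target = reward_at_end
            else:
                # Bootstrapped target pulls value backward from the next state.
                target = gamma * values[s_next]
            values[s] += lr * (target - values[s])
    return values


def bc_update(policy, state, action):
    """Tabular behaviour cloning: a correction changes only that state's entry."""
    updated = dict(policy)
    updated[state] = action
    return updated


# One success assigns value to every state on the path to the goal.
values = td_sweeps(list(range(6)))

# A single correction at timestep 20 leaves timesteps 19 and 21 untouched.
policy = bc_update({}, state=20, action="push_left")
```

After the sweeps, `values` rises monotonically from the start state toward the goal, which is the one-dimensional analogue of the funnel in the figure, while `policy` contains exactly one entry.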

Videos

Europe just overtook the US in the number of git pushes.

687,377 views • 5 months ago

Lighting can make a huge difference in robotics. Today, I found a quirk in my model exemplifying this.
> I collected 10h of training data.
> 3h in, I noticed that having the left arm follow the right arm for the final movement could be good for the final insertion subtask.
> I changed behaviour.
> But the first 3h didn't include much nighttime data (=dark room).
🎯 Result: The same policy reliably exhibits different behaviour on exactly this subtask depending on the lighting. So many microlearnings to be made in robotics.

97,812 views • 1 month ago
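
One way to catch this kind of confound before training is to cross-tabulate episode metadata and look for combinations that never occur. The sketch below assumes each episode carries a `lighting` and a `behaviour` tag; those field names, the helper functions, and the four example records are all made up for illustration and are not the real dataset.

```python
from collections import Counter


def cross_tab(episodes):
    """Count episodes per (lighting, behaviour) pair."""
    return Counter((ep["lighting"], ep["behaviour"]) for ep in episodes)


def missing_cells(episodes):
    """Return (lighting, behaviour) combinations that never occur in the data."""
    counts = cross_tab(episodes)
    lightings = {ep["lighting"] for ep in episodes}
    behaviours = {ep["behaviour"] for ep in episodes}
    return sorted(
        (light, behaviour)
        for light in lightings
        for behaviour in behaviours
        if counts[(light, behaviour)] == 0
    )


# Made-up metadata: the old behaviour was only recorded in daylight.
episodes = [
    {"lighting": "day", "behaviour": "old"},
    {"lighting": "day", "behaviour": "old"},
    {"lighting": "day", "behaviour": "new"},
    {"lighting": "night", "behaviour": "new"},
]
gaps = missing_cells(episodes)
```

An empty cell such as night plus old behaviour means the policy can learn to use lighting as a cue for which behaviour to show, which matches the quirk described in the post.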

A few months ago I realised that when you’re building alone, you miss out on the random photos other people take of you. Like the ones that pop up on your phone or somewhere else later and remind you of a specific phase of your life. So I decided to start taking a few myself, even during the bland stretches. Today, while reading the RL book, I thought of that and recorded a tiny time-lapse. I later noticed I should have done so during an easier section of the book, because I apparently spent 11 minutes on two pages and didn’t flip a page until the end 😂

90,205 views • 7 months ago

Thoughts from a teleop session today:
1/ Teleop is painful. UMI-style grippers are making more and more sense to me: shorter per-episode execution, more data, and more intuitive for factory workers, who will eventually be the ones using my product. Wondering if PI resists this because researchers aren't the ones collecting the data?
2/ The take-away of the HF shirt folding post stuck with me: data quality matters most of all. I'm 30 minutes into this task and still making mistakes with teleop. What’s the perspective for non-roboticists? Maybe a VR headset is better. Want to try that next.
3/ I’m noticeably better at teleop even when I’m just 15cm closer.
4/ Double-close gestures for re-record (left) and early episode end (right) are a game changer. Credit Andreas Köpf.
5/ Want to gamify my own collection more: thinking of a daily target dashboard.
6/ I’d like to rate each episode with a 1-5 data quality score. Don't wanna throw bad data away, but still want to be able to filter for top quality. Maybe possible with foot pedals?

37,908 views • 3 months ago
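
The per-episode rating idea in point 6/ could look roughly like the sketch below. It is a minimal illustration only: the `Episode` record, the `rate` and `select` helpers, and the example scores are hypothetical, and how the score is entered (foot pedal, gesture, keyboard) is left open. The point is that every episode is kept and filtering happens at training time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    episode_id: int
    quality: int  # 1 (worst) to 5 (best), assigned by the operator


def rate(episode_id, quality):
    """Create a rated episode record, rejecting scores outside 1-5."""
    if not 1 <= quality <= 5:
        raise ValueError(f"quality must be between 1 and 5, got {quality}")
    return Episode(episode_id, quality)


def select(episodes, min_quality=1):
    """Keep all data on disk; pick a subset by quality only when training."""
    return [ep for ep in episodes if ep.quality >= min_quality]


# Hypothetical session: nothing is thrown away at recording time.
session = [rate(0, 5), rate(1, 2), rate(2, 4), rate(3, 3)]
top_quality = select(session, min_quality=4)
everything = select(session)
```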

We built a golden robot retriever!
> SO-100 attached to a Unitree Go2
> Trained an ACT policy to grab objects that (mostly) also works for OOD objects and backgrounds
> System is controlled by an agentic loop (OpenAI) evaluating surroundings & triggering retrieval with the arm
> The quadruped is teleoperated
Thank you, Valentin Hartmann, Wen, and Brian, for the amazing time together! Thank you Arnie Ramesh Simon Sure Pascal Bertrand Lucas Beyer (bl16) OpenAI mimic Elvis Nava Loki Robotics Antonio Arbues for an amazing hackathon!

115,376 views • 1 year ago

Collecting DAgger-style data for an ACT policy trained on 50 episodes, just as a minimal experiment. Problem: The parts that are hard for the policy are also hard for the human teleoperator.

11,223 views • 1 month ago

So using relative joint angles with Pi0.5 isn’t going that well yet. I think it’s some bug in the inference loop. Need to think of a debugging visualization tomorrow.

13,328 views • 2 months ago
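
A simple sanity check for a relative-action inference loop is a round trip: convert an absolute action chunk to offsets and back, and compare. The sketch below assumes that "relative" means offsets from the joint state observed when the chunk was predicted; the post does not say which convention is used, and the helper names and joint values are invented for illustration. Under that assumption, one plausible bug is adding the offsets to a stale or later joint state, which shows up as a constant offset in the reconstructed targets.

```python
def to_relative(chunk_abs, ref_joints):
    """Express each absolute joint target as an offset from the reference state."""
    return [[a - r for a, r in zip(step, ref_joints)] for step in chunk_abs]


def to_absolute(chunk_rel, ref_joints):
    """Turn offsets back into absolute joint targets using a reference state."""
    return [[d + r for d, r in zip(step, ref_joints)] for step in chunk_rel]


def max_abs_error(chunk_a, chunk_b):
    """Largest per-joint difference between two action chunks."""
    return max(
        abs(a - b)
        for step_a, step_b in zip(chunk_a, chunk_b)
        for a, b in zip(step_a, step_b)
    )


# Hypothetical 3-joint example (radians).
ref_at_inference = [0.10, -0.50, 1.20]
chunk_abs = [[0.12, -0.48, 1.18], [0.15, -0.45, 1.15]]
chunk_rel = to_relative(chunk_abs, ref_at_inference)

# Correct reference: the round trip reproduces the original targets.
round_trip_error = max_abs_error(to_absolute(chunk_rel, ref_at_inference), chunk_abs)

# Wrong reference: the joint state after the arm has already moved.
stale_ref = [0.15, -0.45, 1.15]
stale_error = max_abs_error(to_absolute(chunk_rel, stale_ref), chunk_abs)
```

Plotting the reconstructed targets against the commanded ones per joint would turn this check into the kind of debugging visualization the post mentions.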

Today and tomorrow are data collection power days: 465 today with a goal of 1100 new samples total. The Hugging Face recording script previously took 30-40s to process each 15s recording, but I changed it to batch process and now collection is way faster (and more fun)!

39,273 views • 1 year ago
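
A rough back-of-the-envelope estimate shows why deferring the processing matters. The numbers 465, 1100, 15s and 30-40s come from the post; everything else is an assumption: that processing previously blocked the operator after every recording, that batch processing now runs entirely outside collection time, and that reset time between episodes is ignored. The function name is made up for this sketch.

```python
def collection_time_s(n_episodes, record_s, process_s, batch=False):
    """Operator time at the robot; with batch=True, processing is deferred."""
    per_episode = record_s if batch else record_s + process_s
    return n_episodes * per_episode


remaining = 1100 - 465  # samples still to collect after today

# Assumed old workflow: wait 30-40s of processing after each 15s recording.
blocking_low_s = collection_time_s(remaining, record_s=15, process_s=30)
blocking_high_s = collection_time_s(remaining, record_s=15, process_s=40)

# Assumed new workflow: only the recording itself costs operator time.
batched_s = collection_time_s(remaining, record_s=15, process_s=40, batch=True)

blocking_hours = (blocking_low_s / 3600, blocking_high_s / 3600)
batched_hours = batched_s / 3600
```

Under these assumptions the remaining samples take roughly 8 to 10 hours of operator time with per-episode processing, versus under 3 hours when processing is batched.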

Another reason I’m interested in UMI-style grippers is that tasks that’d take a human 10s take 5x as long with teleoperation. The lack of haptic and force feedback makes it hard to do movements that rely on them, like passing an object from one hand to the other, as shown here. Adding to that: teleoperation means you’re standing a meter away from the objects you’re operating on, and here, for example, the box is obstructing the view of the PCBs. Thank you for the great teleoperation, Felix Neumann 🤝

15,434 views • 6 months ago

Today I fixed the follower arms to a profile bar and started experimenting with new leader–follower setups. This speaker desk I found on Amazon is the first of three. I like the laptop’s position in the bottom tray: it doesn’t block the view and is still accessible for quick code changes. What I still haven’t figured out is the ideal horizontal distance between the leader and follower arms to make teleop feel more natural.

10,595 views • 7 months ago

First successes in training the robot policy for chess movements! 🧵

13,458 views • 1 year ago