
Dominique Paul

@DominiqueCAPaul · 6,502 subscribers

Founder of Dream Machines, building ML models for robotic arms. Previous: Stats at @ETH & @NeurIPSconf. I post about RL and VLA post-training.

Shorts

Today I incorporated my startup, and where else but in Germany 🇩🇪 All without leaving my house, and with a German notary appointment in under 24 hours, thanks to the electronic ID. Long Europe 🇪🇺

114,740 views

Time to get used to teleoperation on an upgraded hardware stack! Thank you Jannik Grothusen for building such an insane arm! 👊

44,583 views

This figure from HIL-SERL is one of the clearest visualisations of how RL learns differently from imitation learning.

The difference comes down to this: imitation learning treats each (state, action) pair as independent. A correction at timestep 20 teaches nothing about timestep 19 or 21. RL propagates reward backward through time: one successful insertion updates the value estimate of every state along the trajectory. So RL builds a full map of "which states lead to success", while imitation learning just memorizes individual snapshots.

Setup: a robot inserting a RAM stick into a motherboard slot. Each dot is an end-effector position (Y = lateral, Z = height). The starting position is randomized. Left to right = training progressing.

Top row (RL): the policy builds a funnel, broad at the top and narrowing into the target. It systematically fills in the state space, learning which paths lead to success from many different starting positions.

Bottom row (imitation learning / HG-DAgger, same human data): sparse, diffuse, no funnel. The policy only learns near states the human demonstrated.

Both have access to the same data, including human corrections, but a completely different structure emerges.
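To make the contrast concrete, here is a minimal Python sketch on a toy 25-step insertion trajectory with a single sparse success reward at the final step. The trajectory length, the discount factor `gamma`, and the helper names `discounted_returns` and `bc_signal` are hypothetical, not taken from HIL-SERL. The RL side uses Monte Carlo discounted returns as a simplified stand-in for the bootstrapped temporal-difference updates an off-policy RL method would typically use; the imitation side only marks which timesteps a human correction touches.

```python
import numpy as np

T = 25        # hypothetical trajectory length (timesteps 0..24)
gamma = 0.95  # hypothetical discount factor

# Sparse task reward: 1.0 only when the insertion succeeds at the last step.
rewards = np.zeros(T)
rewards[-1] = 1.0

def discounted_returns(rewards, gamma):
    """Propagate reward backward through time: G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def bc_signal(T, corrected_steps):
    """Imitation learning: each (state, action) pair is its own supervised
    example, so a correction changes the target at that timestep only."""
    signal = np.zeros(T)
    for t in corrected_steps:
        signal[t] = 1.0
    return signal

returns = discounted_returns(rewards, gamma)
bc = bc_signal(T, corrected_steps=[20])

print(np.flatnonzero(returns).size)  # 25: every state gets a value target
print(np.flatnonzero(bc))            # [20]: only the corrected step changes
```

One success at the end gives every earlier state a nonzero value target (decaying as `gamma**(T-1-t)`), which is what lets the RL policy fill in the funnel; the correction at timestep 20 leaves timesteps 19 and 21 untouched.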

24,399 views

Videos

Europe just overtook the US in the number of git pushes.

Dominique Paul

687,038 views • 3 months ago