David Bar

@observie • 5,516 subscribers

robots building journey log

Shorts

System identification (sysid) is the process of finding the physical parameters that make a simulation match reality. If you're training an RL locomotion policy in simulation, the accuracy of your motor model directly affects how well the policy transfers to the real robot. A recent git commit by Kevin Zakka added a sysid toolbox to MuJoCo that automates this process: you provide recorded motor data and a MuJoCo model, and it optimizes the model parameters to minimize the difference between simulated and real trajectories. For my RobStride Dynamics RS02 QDD (quasi-direct-drive) motors (17 Nm peak, 7.75:1 gear), I built a Rust tool that sends multi-sine torque excitation at 1 kHz and records position/velocity feedback. I then feed this data into MuJoCo's sysid optimizer.

47,934 views
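
To make the multi-sine excitation mentioned above more concrete, here is a minimal sketch in Python (not the Rust tool from the video, whose internals are not shown). It sums a few sine waves with Schroeder phases, samples them at 1 kHz, and clamps the result to a torque limit. The function name `multisine`, the frequency list, the 0.4 Nm per-tone amplitude, and the 2 Nm safety limit are all assumptions chosen for illustration, not values from the original setup.

```python
import math


def multisine(freqs_hz, amplitude_nm, sample_rate_hz, duration_s, peak_limit_nm):
    """Return a list of torque samples (Nm) for a multi-sine excitation.

    All parameter values are chosen by the caller; nothing here is specific
    to any particular motor or driver.
    """
    n = len(freqs_hz)
    # Schroeder phases: phi_k = -pi * k * (k - 1) / n. Spreading the phases
    # keeps the tones from peaking at the same instant, which lowers the
    # crest factor compared with all-zero phases.
    phases = [-math.pi * k * (k - 1) / n for k in range(1, n + 1)]

    num_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for i in range(num_samples):
        t = i / sample_rate_hz
        torque = sum(
            amplitude_nm * math.sin(2.0 * math.pi * f * t + p)
            for f, p in zip(freqs_hz, phases)
        )
        # Clamp as a last line of defence so a bad parameter choice cannot
        # command more torque than the caller allows.
        samples.append(max(-peak_limit_nm, min(peak_limit_nm, torque)))
    return samples


# Hypothetical example: four tones, 2 seconds at 1 kHz, 2 Nm safety limit.
signal = multisine(
    freqs_hz=[0.5, 1.0, 2.0, 4.0],
    amplitude_nm=0.4,
    sample_rate_hz=1000.0,
    duration_s=2.0,
    peak_limit_nm=2.0,
)
print(len(signal), "torque samples generated")
```

In a real recording loop, each sample would be sent to the motor once per control tick while position and velocity feedback are logged alongside it; that part depends on the CAN driver and is left out here.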

Gaps identified yesterday, fixed overnight, new model sent today for manufacturing. All I really wanted was to write the code, but I must have open and robust hardware first.

136,592 views

Imagine this is yours. Open API, open hardware, open specs; your code. Build, extend, improve. It's called Hoper.

81,929 views

Hello World

38,029 views

This demo runs on PETG and Rust :) Motion software directly controls the motors via CAN sockets. A Jetson Orin AGX yawns while running the show, hidden beneath the wires, waiting for the interesting parts of the project to begin.

55,751 views

Good morning everyone

14,705 views

Double helical gear in action (aka herringbone gear)

24,457 views

Videos

Most RL locomotion examples let the actor (the policy network that runs on the real robot) observe two ground truths that are not directly measured by hardware:
- linear velocity of the robot
- projected gravity (i.e. the robot's orientation relative to gravity)

The former can be inferred with a state estimator, such as a small neural network trained to predict velocity, while the latter can be computed with a Madgwick AHRS or a Kalman filter.

Alternatively, it makes sense to let the actor network learn to extract whatever internal representation it needs directly from raw sensor data, instead of relying on hand-designed estimators. I removed base_lin_vel, similarly to Asimov's approach, as well as projected_gravity. Instead, I added the accelerometer data (which most RL examples do not seem to provide). I continue to give those ground-truth variables to the critic as privileged information that the actor can't see, which is known as an asymmetric actor-critic architecture.

Advantages:
1. Should reduce the sim2real gap, as there are fewer external components whose results may differ between the simulation and the hardware
2. The actor can learn the intermediate representation that works best for the task, not necessarily the one we decided to infer for it
3. Fewer hand-tuned parameters

At least in simulation, this seems to work great. It might be luck, trivial, or still plain wrong, but after 1500 iterations the simulation reached the best run yet in terms of reward, linear/angular velocity tracking, action std and more.

David Bar

11,685 views • 2 months ago
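
The asymmetric actor-critic split described above can be sketched roughly as follows. This is an illustrative Python sketch, not the training code from the video: the `state` dictionary, its key names (other than `base_lin_vel` and `projected_gravity`, which come from the description), and the choice of which raw signals the actor receives are all assumptions. The actor sees only quantities a real robot can measure, while the critic additionally receives the simulator's ground truth.

```python
def projected_gravity(quat_wxyz):
    """Unit gravity direction expressed in the body frame.

    Assumes quat_wxyz = (w, x, y, z) is a unit quaternion rotating body-frame
    vectors into the world frame, with world gravity along -Z.
    """
    w, x, y, z = quat_wxyz
    # Third row of the rotation matrix, negated: R^T applied to (0, 0, -1).
    return (
        -2.0 * (x * z - w * y),
        -2.0 * (y * z + w * x),
        -(1.0 - 2.0 * (x * x + y * y)),
    )


def build_observations(state):
    """Return (actor_obs, critic_obs) as flat lists of floats.

    `state` is a hypothetical dict of simulator readouts; the key names are
    placeholders for whatever the training framework actually provides.
    """
    # Actor: only signals available from real sensors (IMU, encoders).
    actor_obs = (
        list(state["gyro"])
        + list(state["accelerometer"])
        + list(state["joint_pos"])
        + list(state["joint_vel"])
        + list(state["last_action"])
        + list(state["velocity_command"])
    )
    # Critic: everything the actor sees plus privileged ground truth that
    # exists only in simulation.
    critic_obs = (
        actor_obs
        + list(state["base_lin_vel"])
        + list(projected_gravity(state["base_quat_wxyz"]))
    )
    return actor_obs, critic_obs


# Hypothetical single-joint example with the robot standing upright.
example_state = {
    "gyro": (0.0, 0.0, 0.0),
    "accelerometer": (0.0, 0.0, 9.81),
    "joint_pos": (0.1,),
    "joint_vel": (0.0,),
    "last_action": (0.0,),
    "velocity_command": (0.5, 0.0, 0.0),
    "base_lin_vel": (0.4, 0.0, 0.0),
    "base_quat_wxyz": (1.0, 0.0, 0.0, 0.0),
}
actor_obs, critic_obs = build_observations(example_state)
print(len(actor_obs), len(critic_obs))
```

Only the actor network is deployed to the robot, so the privileged critic inputs never need to be estimated on hardware.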