
David Bar
@observie • 5,516 subscribers
robots building journey log
Shorts
Videos

Most RL locomotion examples let the actor (the policy network that runs on the real robot) observe two ground truths that are not directly measured by hardware: - linear velocity of the robot - projected gravity (i.e. orientation of the robot) The former can be inferred using a state estimator built using a small neural network trained to predict velocity, while the latter can be computed using Madgwick AHRS / Kalman filter. Alternatively, it kind of makes sense to let the actor network learn to extract whatever internal representation it needs directly from raw sensor data, instead of using hand-designed estimators. I removed base_lin_vel, similarly to Asimov's approach, as well as projected_gravity. Instead, I added the accelerometer data (which most RL examples do not seem to provide). I continue to give those ground truth variables to the critic as privileged info the actor can't see, which is known as an asymmetric actor-critic architecture. Advantages: 1. Should minimize the sim2real gap, as there are less external components whose results may differ between the sim and the hw 2. The actor can learn the interim representation that works better for the task, not necessarily those that we decided to infer for it 3. Less hand-tuned parameters At least in simulation this seems to work great. It might be luck, trivial or still plain wrong, but after 1500 iterations, the simulation reached the best run yet in terms of reward, lin/ang tracking, action std and more.
David Bar11,685 Aufrufe • vor 2 Monaten
Keine weiteren Inhalte verfügbar