This is a vertically integrated, end-to-end deep neural network performing real-time forward-pass inference, controlling individual actuators' torque output for bipedal gait generation in adverse, GPS-denied envs. OK, it's standard PPO RL trained in mjlab, strapped to a tractor.

34,118 views • 5 months ago • via X (Twitter)

Similar Videos

In a world of PPO-everything for reinforcement learning, I've been tinkering with SAC for training a quadruped gait. This gait is trained purely on CPU (on one of the Dell GB10s) with a single environment. Any particular training run is obviously slower than PPO on an RTX Pro 6000 with 8092 envs, if you already know the exact hyperparameters and reward function for your PPO algo... but, if we're honest with ourselves, we know we usually spend days tuning our PPO algo and fighting it to do what we want.

In contrast, SAC has been a breath of fresh air: it is very amenable to changing the reward function to tune behavior. So far, my first attempts at tuning have consistently just worked immediately, rather than producing 15 different variations of reward hacking only to find that previously tuned behaviors got lost in the process. There is also FastSAC, which I've not yet tried, but which can potentially speed things up and bring scale back into the equation.

My main pain point in getting SAC to work for gait was actually getting it to learn to step. It seems that SAC is not as good as PPO at significant exploration on its own. I ended up starting with a sinusoidal gait (basically just a rule to make the legs swing) as training wheels, then blended it out over the course of training as phase 1, and only after that began working on smoothing things out.

I think if we look at end-to-end dev time rather than any particular run that finally managed to work, SAC may actually be the "faster" algorithm to train. Quadruped gaits are inherently easier than bipedal ones, and maybe there are areas where SAC falls short, but I'll definitely be spending more time with SAC.

Harrison Kinsley

26,726 views • 3 months ago
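
The sinusoidal "training wheels" idea from the SAC post can be illustrated with a rough sketch. The post does not say how the blending was implemented, so everything here is an assumption: the trot-style phase offsets, the linear schedule, blending in action space, and all function and parameter names (`sinusoidal_reference`, `blend_weight`, `blended_action`, `blend_out_steps`) are hypothetical.

```python
import numpy as np

def sinusoidal_reference(t, freq_hz=1.5, amplitude=0.4):
    """Hypothetical open-loop swing rule: one joint target per leg at time t (seconds)."""
    # Assumed trot phasing: diagonal leg pairs swing together, half a cycle apart.
    phase_offsets = np.array([0.0, np.pi, np.pi, 0.0])
    return amplitude * np.sin(2.0 * np.pi * freq_hz * t + phase_offsets)

def blend_weight(step, blend_out_steps):
    """Assumed linear schedule: 1.0 (all reference) at step 0, 0.0 once blend_out_steps is reached."""
    return float(np.clip(1.0 - step / blend_out_steps, 0.0, 1.0))

def blended_action(policy_action, t, step, blend_out_steps=200_000):
    """Mix the scripted gait with the learned policy's action; the policy takes over as training proceeds."""
    w = blend_weight(step, blend_out_steps)
    return w * sinusoidal_reference(t) + (1.0 - w) * np.asarray(policy_action, dtype=float)
```

Early in training the environment would mostly execute the scripted swing, so the replay buffer fills with stepping behavior; once `step` passes `blend_out_steps` the policy's action is applied unchanged. Other schedules (exponential, reward-gated) would fit the same description equally well.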