Harrison Kinsley

@Sentdex • 104,520 subscribers

gpus and tractors. Director of AI and Engineering @ https://t.co/H4St8dd1ip Neural networks from Scratch book: https://t.co/hyMkWyUP7R https://t.co/8WGZRkUGsn

Shorts

This is incredible

2,846,098 views

testing robot policies has never been so much fun

57,986 views

on a scale of impressive to theatrics, where do you find the CEO of engine getting kicked by his robot to land Brett Adcock?

92,560 views

Playing on the new Jetson Thor, trying to think in terms of having gobs of memory but low memory bandwidth. Moondream2 VLM is ~2 FPS per full loop of everything. But we have 128GB of memory. So run 15 VLM servers (~76GB) & get 30 FPS w/ ~100ms latency for the feed. Very comfy.

35,687 views
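The arithmetic behind the Jetson Thor post above can be sketched in a few lines. This is only a back-of-the-envelope illustration, not the author's setup: it assumes throughput scales linearly with the number of servers and that frames are handed out round-robin, neither of which the post states. The function names are hypothetical.

```python
def aggregate_fps(num_servers: int, fps_per_server: float) -> float:
    """Idealized total throughput, assuming servers do not slow each other down."""
    return num_servers * fps_per_server


def memory_per_server_gb(total_gb: float, num_servers: int) -> float:
    """Average memory footprint of one server instance."""
    return total_gb / num_servers


def round_robin_assign(frame_ids: list[int], num_servers: int) -> dict[int, list[int]]:
    """Assign each incoming frame to a server index in round-robin order (assumed scheme)."""
    assignment: dict[int, list[int]] = {s: [] for s in range(num_servers)}
    for i, frame_id in enumerate(frame_ids):
        assignment[i % num_servers].append(frame_id)
    return assignment


if __name__ == "__main__":
    # Figures from the post: 15 servers, ~2 FPS each, ~76GB in total.
    print(aggregate_fps(15, 2.0))
    print(round(memory_per_server_gb(76.0, 15), 2))
    print(round_robin_assign(list(range(30)), 15)[0])
```

With the post's figures, each server instance averages roughly 5GB, which leaves a good share of the 128GB free for everything else in the loop.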

Videos

I need a picker upper pupper in my twin toddler life.

Harrison Kinsley

13,751 views • 25 days ago

just cooked up a new sprinter policy, do we attempt sim2real?

Harrison Kinsley

57,912 views • 7 months ago

In a world of PPO-everything for reinforcement learning, I've been tinkering with SAC for training a quadruped gait. This gait is trained purely on CPU (training on one of the Dell GB10s) on a single environment.

Training any particular run is obviously slower than PPO on an RTX Pro 6000 with 8092 envs, if you already know the exact hyperparams/reward function for your PPO algo... but, if we're honest with ourselves, we know we usually spend days tuning our PPO algo and fighting it to do what we want. In contrast, SAC has been a breath of fresh air, very amenable to changing the reward function to tune behavior. So far, my first attempts to tune things have consistently just worked immediately, rather than going through 15 different variations of reward hacking only to find that previously tuned behaviors got lost in the process. There is also FastSAC, which I've not yet tried, but which could potentially speed things up and bring scale back into the equation.

My main pain point in getting SAC to work for gait was actually getting it to learn to step. It seems as though SAC is not as good as PPO at significant exploration on its own. I ended up starting with a sinusoidal gait (basically just a rule to make legs swing) as training wheels, then blended it out through training as phase 1, then began working on smoothing things out after this.

I think if we look at end-to-end dev time rather than any particular run that finally managed to work, SAC may actually be the "faster" algorithm to train. Quadruped gaits are inherently easier than bipedal, and maybe there are areas where SAC falls short, but I'll definitely be spending more time with SAC.

Harrison Kinsley

26,711 views • 3 months ago
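The "training wheels" idea in the SAC post above, a sinusoidal gait that is blended out during training, might look roughly like the sketch below. The post does not say how the blend was done, so everything here is an assumption for illustration: the linear anneal schedule, one joint target per leg, the trot-style phase offsets, and all function and parameter names.

```python
import math

# Assumed trot-like phase offsets: diagonal leg pairs swing together.
LEG_PHASES = [0.0, math.pi, math.pi, 0.0]


def sinusoidal_reference(t: float, freq_hz: float = 1.5, amplitude: float = 0.3) -> list[float]:
    """Rule-based swing target for each leg at time t (seconds)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * t + p) for p in LEG_PHASES]


def blend_weight(step: int, anneal_steps: int) -> float:
    """Weight on the scripted gait: 1.0 at step 0, linearly down to 0.0 (assumed schedule)."""
    return max(0.0, 1.0 - step / anneal_steps)


def blended_action(policy_action: list[float], t: float, step: int, anneal_steps: int) -> list[float]:
    """Mix the scripted gait with the policy output; the policy takes over as training proceeds."""
    w = blend_weight(step, anneal_steps)
    ref = sinusoidal_reference(t)
    return [w * r + (1.0 - w) * a for r, a in zip(ref, policy_action)]


if __name__ == "__main__":
    policy_out = [0.1, -0.1, 0.05, 0.0]  # placeholder policy output
    for step in (0, 50_000, 100_000):
        print(step, blended_action(policy_out, t=0.1, step=step, anneal_steps=100_000))
```

Early in training the legs mostly follow the scripted swing, which gives the agent stepping experience it might not discover through its own exploration; once the weight reaches zero, the action is purely the policy's.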
