Video yükleniyor...

Video Yüklenemedi

Ana Sayfaya Dön

we made distributed inference verifiable with <1% overhead. verification is critical for any distributed system. in a trustless network, actors may swap your 70B model for a cheaper 8B one to cut costs. until now, maintaining inference integrity meant either doubling your cost (redundancy) or exploding your latency (zkp)....

28,432 görüntüleme • 5 ay önce •via X (Twitter)

0 Yorum

Yorum bulunmuyor

Orijinal gönderinin yorumları burada görünecek

Benzer Videolar

In just one week, Binh Pham and I trained a full-body Unitree G1. Here's a recap: 1. Secured a Unitree G1 humanoid through a LinkedIn post 2. Deployed TWIST2 full-body teleoperation pipelines 3. Adapted TWIST2 for Zed stereo camera & collected full-body teleoperation samples (carried by Binh Pham ) 4. Adapted & fine-tuned NVIDIA Gr00T N1.5 VLA on the TWIST2 public datasets, which I fine-tuned on an 8xNVIDIA H100 Cluster. We picked Gr00T N1.5 as it was trained with Unitree G1 embodiment data. 5. Adapted the TWIST2 codebase to stream in the actions from Gr00T via ZMQ using a co-located NVIDIA H100 for ~200ms inference latency 6. Tested the model in sim, then deployed to the real-world Unitree G1. We streamed a training sample observation to the VLA (as we didn't want to break robot in case real observations were OOD) We were the first team in the world to deploy the full TWIST2 data collection pipeline to the unitree g1 :) Much more work ahead though, which I'll work on as a side-project over the next months: 1. Exploring the various types of 'world models': video backbones, dynamics models, v-jepa-2 models. I believe these will generalize better & train much more data-efficiently than VLM backbones 2. Speeding up inference - I believe low-latency robotics inference will be a big challenge. There are many works in video diffusion which I'd like to test (e.g. SageAttention, SparseAttention, Drifting Models). Perhaps also writing custom CUDA kernels. 3. Economics of inference scaling :) What will be the compute demands as we scale inference up to millions of humanoids? Will it run on edge or on distributed 'co-located' inference clusters? These are questions I'd like to answer. Adapted TWIST2 codebase: Adapted Gr00T-N1.5 codebase: The ETH Robotics Club are doing a cool GTC Golden ticket competition with NVIDIA , so this is my submission :) The DGX Spark compute will get me a long way with initial prototyping & especially working on inference optimization for next-gen Blackwell GPUs #NVIDIAGTC #GOLDENTICKET #ETHRC

Arnie Ramesh

14,815 görüntüleme • 3 ay önce