Can vision-language-action (VLA) models generalize to diverse OOD tasks and align with customized objectives? We introduce GRAPE, a plug-and-play algorithm to generalize robot policies via preference alignment. GRAPE offers three benefits to boost the generalizability of VLAs: 1. GRAPE aligns VLAs on a trajectory level and endows...
19,988 views • 1 year ago • via X (Twitter)
7 Comments

[2/N] Detailed Method
1. Trajectory-wise Preference Optimization: GRAPE scales up step-wise VLAs and trains with a trajectory-wise objective, aligning policies globally by learning from both successes and failures.
2. Customized Preference Synthesis: GRAPE breaks down complex tasks into stages, guided by spatiotemporal constraints from VL models. It can flexibly align to arbitrary objectives, such as safety, efficiency, or task success.
3. Iterative Online Alignment: GRAPE refines the alignment process through iterative cycles of 1) online sample collection, 2) synthetic preference ranking, and 3) trajectory-wise preference optimization.
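To make the trajectory-wise objective in step 1 concrete, here is a minimal sketch. It assumes a DPO-style loss applied to whole trajectories, which the thread does not spell out, so treat it as an illustration rather than the authors' implementation. The function names and the `beta` value are hypothetical.

```python
import math


def trajectory_log_prob(step_log_probs):
    """Sum per-step action log-probabilities into one trajectory log-probability."""
    return sum(step_log_probs)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def trajectory_preference_loss(policy_chosen, policy_rejected,
                               ref_chosen, ref_rejected, beta=0.1):
    """Assumed DPO-style loss over a (chosen, rejected) trajectory pair.

    Each argument is a list of per-step log-probabilities, either under the
    policy being trained or under a frozen reference policy.
    """
    chosen_margin = trajectory_log_prob(policy_chosen) - trajectory_log_prob(ref_chosen)
    rejected_margin = trajectory_log_prob(policy_rejected) - trajectory_log_prob(ref_rejected)
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) is the same as softplus(-logits)
    return softplus(-logits)


# Policy identical to the reference: no preference signal has been learned yet.
neutral = trajectory_preference_loss([-0.5, -0.5], [-0.5, -0.5],
                                     [-0.5, -0.5], [-0.5, -0.5])

# Policy raises the chosen trajectory and lowers the rejected one.
improved = trajectory_preference_loss([-0.1, -0.1], [-1.0, -1.0],
                                      [-0.5, -0.5], [-0.5, -0.5])
```

Because the log-probabilities are summed over every step before the comparison, the loss rewards or penalizes the trajectory as a whole instead of individual actions.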

[3/N] Empirical Takeaway 1: Stronger generalizability on a wide array of OOD tasks.
1. Real-world OOD tasks: GRAPE crushes OpenVLA-SFT in generalization:
- Visual (new visual environments): +20.7%
- Subject (unseen objects): +27.5%
- Action (unseen actions): +10.0%
- Semantic (unseen prompts): +5.0%
- Language grounding (objects in unseen spatial positions): +26.7%
2. Simulation OOD tasks: In Simpler-Env, GRAPE shines:
- Subject (unseen objects): +8.0%
- Physical (unseen object sizes/shapes): +12.3%
- Semantic (unseen prompts): +19.0%

[4/N] Empirical Takeaway 2: Versatility in aligning with customized objectives.
GRAPE excels at aligning robot policies with diverse natural language goals:
- Task completion
- Safety
- Cost-efficiency
Results:
- Safer policies: -44.31% collisions
- Efficient policies: -11.15% rollout length
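A toy illustration of how different objectives can change which rollouts are preferred: the sketch below scores rollouts with a weighted sum of success, collisions, and length, then picks the best and worst as a preference pair. The `Rollout` fields, the weights, and the scoring formula are all assumptions made for this example. The thread describes stage-wise constraints from VL models, which this sketch does not implement.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    name: str
    success: bool
    collisions: int
    length: int  # number of steps in the rollout


def score(r, w_success=1.0, w_collision=0.5, w_length=0.01):
    """Hypothetical scalar score: reward success, penalize collisions and long rollouts."""
    return w_success * float(r.success) - w_collision * r.collisions - w_length * r.length


def best_and_worst(rollouts, **weights):
    """Rank rollouts by score and return the (chosen, rejected) preference pair."""
    ranked = sorted(rollouts, key=lambda r: score(r, **weights), reverse=True)
    return ranked[0], ranked[-1]


rollouts = [
    Rollout("a", success=True, collisions=0, length=40),
    Rollout("b", success=True, collisions=2, length=30),
    Rollout("c", success=False, collisions=0, length=50),
]

# Default weights: the failed rollout "c" ranks last.
chosen_default, rejected_default = best_and_worst(rollouts)

# Safety-heavy weights: the colliding rollout "b" now ranks last.
chosen_safe, rejected_safe = best_and_worst(rollouts, w_collision=5.0)
```

Changing only the weights changes which rollout is rejected, so the same set of samples can yield different preference pairs for different objectives.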

[5/N] Nice work, @ZijianZhangNLP, Kyle Zheng, and nice collab. w/ @ZRChen_AISafety, @jang_yoel, @Yi_Li_UW, @chaoqi_w, @dingmyu, @fox_dieter17849

Cool work! Thanks to Prof. Yao and our nice collab!

What are the required training resources?

GRAPE seems like a promising leap for VLA models in robotics! The trajectory-level preference alignment and reward modeling are particularly intriguing for enabling safer, more efficient, and task-diverse applications. How scalable is GRAPE to real-world multi-agent environments?


