How do you teach a robot to handle complex, multi-step tasks without training it for each one? [Github ⬇️] The team behind ReKep shows that robots can perform bimanual, in-the-wild tasks by reasoning over keypoint constraints that are generated on the fly using vision and language models. No task-specific data, no...
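As a rough illustration of what a keypoint constraint could look like, the sketch below writes one as a plain Python function that maps 3D keypoint positions to a scalar cost, with a cost at or below zero meaning the constraint is satisfied. The function names, keypoint indices, target height, and tolerance are all hypothetical and are not taken from the ReKep code.

```python
import math

def gripper_above_object(keypoints, gripper_idx=0, object_idx=1,
                         height=0.10, tol=0.02):
    """Hypothetical constraint: the gripper keypoint should sit `height`
    metres directly above the object keypoint, within `tol` metres.

    `keypoints` is a list of (x, y, z) positions in metres.
    Returns a cost; cost <= 0 means the constraint holds.
    """
    gripper = keypoints[gripper_idx]
    obj = keypoints[object_idx]
    target = (obj[0], obj[1], obj[2] + height)
    return math.dist(gripper, target) - tol

def satisfied(constraint, keypoints):
    """A constraint is treated as satisfied when its cost is non-positive."""
    return constraint(keypoints) <= 0.0

# Example: gripper exactly 10 cm above the object, then displaced sideways.
aligned = [(0.0, 0.0, 0.10), (0.0, 0.0, 0.0)]
offset = [(0.5, 0.0, 0.10), (0.0, 0.0, 0.0)]
print(satisfied(gripper_above_object, aligned))
print(satisfied(gripper_above_object, offset))
```

In a setup like this, a vision-language model would be the component that writes such functions for a given instruction, and an optimizer would search for robot motions that drive the costs to zero.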

25,348 views • 1 year ago • via X (Twitter)

2 Comments

VentureMind AI • 1 year ago

INSANE 🔥

MightyBot • 1 year ago

🧠 Unified Search. Smarter Meetings. Effortless CRM. MightyBot is your AI agent platform for seamless workflows—record meetings, automate CRM updates, and find answers across apps in seconds. 🌟 Focus on what matters. We'll handle the grind.

Related Videos

A team tested Pi0, Pi0 Fast, Gr00t, and ACT on real robot arms in manufacturing tasks. (🔖 Bookmark this for later!)

The task was precise: place thin rectangular frames from a messy stack into a holder. The team fine-tuned each model on 100 real trajectories and compared training time, inference speed, motion quality, and success rates.

⬇️ Here’s a breakdown of what they found:

Pi0 (Original)
✅ Strongest overall performance in precise pick-and-place
✅ High success rate even in edge cases
✅ Longest training time (~11 hours, ~$30 per run)
✅ Inference time of 80 ms causes short pauses between actions
Despite the delays, it handles complex scenarios well. Solid for high-precision tasks, but slow to train.

Gr00t
✅ Trains fast (~2 hours, ~$5 per run)
✅ Performs almost as well as Pi0 on large-object tasks
✅ Struggles with fine precision; random movement in some trials
✅ More training didn’t fix jitter or random offsets
Best suited for tasks where exact precision isn’t critical. Not ready for manufacturing-grade accuracy without more tuning.

Pi0 Fast
✅ Promised faster training, but results were underwhelming
✅ Training at 6 hours still showed low success rates
✅ Inference was slower than expected
✅ Not reliable for generalizing to even slightly new tasks
Currently too unstable for real-world deployment. Doesn’t live up to the “Fast” name yet.

ACT (Baseline)
✅ 200MB model: lightweight, but limited
✅ Struggles with stacked objects or ambiguous scenes
✅ Success rates around 70% in best-case setups
✅ Can’t match newer models on precision or generalization
Still a solid baseline, but clearly a generation behind in robustness.

🚨 Extra Notes
All newer models share a common issue:
• Inference takes longer than a frame (80 ms vs 33 ms), so robots “pause” between chunks.
• This results in jittery movements, but it is not a dealbreaker unless tasks are time-sensitive.

Language-conditioned tasks also fell short: after training on two labeled tasks, the model couldn’t generalize to a third unseen combination using only text prompts.

✅ The good news? These models adapt well to new robot arms with quick fine-tuning.
❌ The bad news? There’s still no plug-and-play solution for improving performance after deployment.

Reinforcement learning or DAgger-style data collection during real-world operation may be the next big step, something many teams in robotics are actively working on.
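The frame-time arithmetic behind the “pause” note can be sketched in a few lines. This assumes a 30 Hz control loop (which matches the quoted 33 ms frame), inference that blocks the robot until the next action chunk is ready, and a chunk length of 10 actions; the chunk length and the function name are illustrative assumptions, not figures from the post.

```python
def pause_stats(inference_ms, control_hz, chunk_len):
    """Estimate how blocking inference interrupts chunked execution.

    Returns (frame_ms, frames_lost_per_inference, fraction_of_time_moving).
    Assumes the robot stands still while the model computes the next chunk.
    """
    frame_ms = 1000.0 / control_hz             # duration of one control frame
    frames_lost = inference_ms / frame_ms      # frames spent waiting per inference call
    exec_ms = chunk_len * frame_ms             # time spent executing one chunk
    moving = exec_ms / (exec_ms + inference_ms)
    return frame_ms, frames_lost, moving

frame_ms, frames_lost, moving = pause_stats(inference_ms=80.0, control_hz=30.0, chunk_len=10)
print(f"frame: {frame_ms:.1f} ms, frames lost per call: {frames_lost:.1f}, "
      f"time moving: {moving:.0%}")
```

Under these assumptions each inference call costs a little over two control frames, which is consistent with visible but short pauses; longer chunks or asynchronous inference would reduce the share of time spent waiting.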

Ilir Aliu - eu/acc

21,703 views • 1 year ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO!

Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.

Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code - without needing pre-existing training examples like in traditional supervised fine-tuning.

Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost effective.

In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks.

In detail, you’ll:
- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence.
- Launch reinforcement fine-tuning jobs using Predibase’s hosted training services.

By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback.

Please sign up here:
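To make the four loss components named above concrete, here is a minimal pure-Python sketch, not the course's or Predibase's implementation. It pairs a toy Wordle-style reward function with group-relative advantages and a per-token clipped loss with a KL penalty. The function names, the reward design, and the hyperparameters `clip_eps` and `beta` are illustrative assumptions; a real training job would work on tensors of token log-probabilities.

```python
import math
from statistics import mean, pstdev

def wordle_reward(guess, secret):
    """Toy reward (hypothetical design): fraction of letters in the right position.
    Guesses of the wrong length get zero reward."""
    if len(guess) != len(secret):
        return 0.0
    return sum(g == s for g, s in zip(guess, secret)) / len(secret)

def group_advantages(rewards, eps=1e-8):
    """Advantages: normalise each reward against the other responses
    sampled for the same prompt (the 'group')."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def token_loss(logp_new, logp_old, logp_ref, advantage, clip_eps=0.2, beta=0.04):
    """Per-token loss built from the four components:
    probability ratio, advantage, clipping, and a KL penalty."""
    ratio = math.exp(logp_new - logp_old)                   # new vs. old policy
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    surrogate = min(ratio * advantage, clipped * advantage)
    diff = logp_ref - logp_new                              # reference vs. new policy
    kl = math.exp(diff) - diff - 1.0                        # non-negative KL estimate
    return -(surrogate - beta * kl)                         # minimise the negative objective

# Score a group of sampled guesses for one secret word, then turn rewards into advantages.
guesses = ["crane", "slate", "crate", "brick"]
rewards = [wordle_reward(g, "crate") for g in guesses]
print(group_advantages(rewards))
```

Because advantages are computed relative to the group, a response is only rewarded for being better than its siblings, which is why the approach does not need a separate learned value model.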

Andrew Ng

86,381 views • 1 year ago