🏗️ Policy Adaptation from Foundation Model Feedback #CVPR2023 Instead of using a foundation model as a pre-trained encoder (generator), we use it as a teacher (discriminator) that tells our policy where it went wrong and helps it adapt to new environments and tasks.

Xiaolong Wang

21,966 subscribers

24,412 views • 3 years ago • via X (Twitter)

Science & Technology, Education #CVPR2023
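
The post describes the foundation model as a judge rather than an encoder. One plausible way to picture that, sketched here as an assumption and not as the paper's implementation, is hindsight relabeling: the policy attempts an instruction, a CLIP-style model compares the final frame with every candidate instruction, and the rollout is re-paired with whichever instruction it actually achieved, yielding correct (instruction, actions) data for fine-tuning. The toy embeddings, the dictionary layout, and the names `judge` and `relabel_rollouts` are hypothetical stand-ins for real image and text encoder outputs.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def judge(final_frame_emb: Vector, instruction_embs: Dict[str, Vector]) -> str:
    """Teacher step (hypothetical): name the instruction whose text embedding
    best matches the image embedding of the rollout's final frame."""
    return max(instruction_embs, key=lambda k: cosine(final_frame_emb, instruction_embs[k]))


def relabel_rollouts(rollouts: List[dict], instruction_embs: Dict[str, Vector]) -> List[dict]:
    """Hindsight relabeling: keep every rollout, but pair its actions with the
    instruction the teacher says was actually achieved."""
    out = []
    for r in rollouts:
        achieved = judge(r["final_frame_emb"], instruction_embs)
        out.append({
            "actions": r["actions"],
            "instruction": achieved,
            "policy_was_right": achieved == r["instruction"],
        })
    return out


# Toy 3-d embeddings standing in for real image/text encoder outputs.
instruction_embs = {
    "pick up the red block": [1.0, 0.0, 0.0],
    "push the blue cup": [0.0, 1.0, 0.0],
}
rollouts = [
    {"instruction": "pick up the red block", "actions": [0, 1], "final_frame_emb": [0.1, 0.9, 0.0]},
    {"instruction": "push the blue cup", "actions": [2, 3], "final_frame_emb": [0.0, 0.8, 0.2]},
]
for row in relabel_rollouts(rollouts, instruction_embs):
    print(row["instruction"], row["policy_was_right"])
# push the blue cup False
# push the blue cup True
```

The first rollout was a failure for its commanded instruction, yet it still becomes a valid training pair once relabeled; that is the sense in which the teacher "tells where the policy did wrong" without any human labeling.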

Comments: 5

Xiaolong Wang • 3 years ago

Work was led by Yuying @tttoaster_ when she was interning in our lab at UCSD, collaborating with @anna_macalus on working with the robots. Please check arxiv: Full video introduction:

Ted Xiao • 3 years ago

Nice work! Using VLMs unlocks automated feedback, which is often quite expensive for humans to produce. For another style of this same method (language feedback from CLIP) but in the offline setting, check out our work DIAL:

Xiaolong Wang • 3 years ago

Thanks for sharing. Yes, definitely quite relevant! Very interesting idea on applying to offline setting. I think the shared core idea is we are all using VLM to provide some sort of reward signals.
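
The shared core idea named here, a VLM providing some sort of reward signal, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DIAL's or PAFF's method: the similarity between an observation embedding and a goal-text embedding is thresholded into a binary success reward, and an offline episode gets a sparse label on its final frame only. The toy embeddings, the 0.8 threshold, and the function names are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vlm_success_reward(obs_emb: Vector, goal_emb: Vector, threshold: float = 0.8) -> float:
    """Binary reward: 1.0 if the (hypothetical) VLM similarity clears the
    threshold, else 0.0."""
    return 1.0 if cosine(obs_emb, goal_emb) >= threshold else 0.0


def label_episode(frame_embs: List[Vector], goal_emb: Vector, threshold: float = 0.8) -> List[float]:
    """Sparse labeling for an offline episode: score only the last frame and
    give every earlier step zero reward."""
    rewards = [0.0] * len(frame_embs)
    if frame_embs:
        rewards[-1] = vlm_success_reward(frame_embs[-1], goal_emb, threshold)
    return rewards


goal = [1.0, 0.0]  # toy embedding of a goal such as "open the drawer"
success = [[0.0, 1.0], [0.5, 0.5], [0.95, 0.05]]
failure = [[0.0, 1.0], [0.2, 0.9]]
print(label_episode(success, goal))  # [0.0, 0.0, 1.0]
print(label_episode(failure, goal))  # [0.0, 0.0]
```

A binary threshold keeps the signal robust to the VLM's poorly calibrated raw scores; a dense variant could use the similarity itself as the reward, at the cost of inheriting that calibration noise.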

Rishabh Agarwal • 3 years ago

Nice, looks very relevant to Reincarnating RL:

Xiaolong Wang • 3 years ago

Thank you for making the connection. Yes, I think we have a bit of flavor on progressively learning. The focus has been on VLM for supervision but extending more on the progressive learning direction can actually be quite an interesting ... hmm

Related videos

Introducing Empathic Voice Interface 2 (EVI 2), our new voice-to-voice foundation model. EVI 2 merges language and voice into a single model trained specifically for emotional intelligence. You can try it and start building today.

Introducing Empathic Voice Interface 2 (EVI 2), our new voice-to-voice foundation model. EVI 2 merges language and voice into a single model trained specifically for emotional intelligence. You can try it and start building today.

Hume AI

165,640 views • 1 year ago

Can we build a generalist robotic policy that doesn’t just memorize training data and regurgitate it during test time, but instead remembers past actions as memory and conditions its decisions on them?🤖💡 Introducing SAM2Act—a multi-view robotic transformer-based policy that integrates a visual foundation model with a memory architecture for robotic manipulation. Project page: 🧵👇

Can we build a generalist robotic policy that doesn’t just memorize training data and regurgitate it during test time, but instead remembers past actions as memory and conditions its decisions on them?🤖💡 Introducing SAM2Act—a multi-view robotic transformer-based policy that integrates a visual foundation model with a memory architecture for robotic manipulation. Project page: 🧵👇

Jiafei Duan

87,573 views • 1 year ago

Cosmos Policy turns a pretrained video diffusion model into a robot controller. Instead of redesigning the architecture, it injects robot state, actions, and values directly as latent frames inside the video model

Cosmos Policy turns a pretrained video diffusion model into a robot controller. Instead of redesigning the architecture, it injects robot state, actions, and values directly as latent frames inside the video model

Robots Digest 🤖

22,933 views • 6 months ago

We trained a foundation model on 18 million heart ultrasound videos to predict structure instead of pixels. Introducing EchoJEPA, the first foundation-scale JEPA for medical video. Paper: Code: 🧵 1/n

We trained a foundation model on 18 million heart ultrasound videos to predict structure instead of pixels. Introducing EchoJEPA, the first foundation-scale JEPA for medical video. Paper: Code: 🧵 1/n

Alif Munim (d/acc)

590,652 views • 5 months ago

Today we're introducing TRIBE v2 (Trimodal Brain Encoder), a foundation model trained to predict how the human brain responds to almost any sight or sound. Building on our Algonauts 2025 award-winning architecture, TRIBE v2 draws on 500+ hours of fMRI recordings from 700+ people to create a digital twin of neural activity and enable zero-shot predictions for new subjects, languages, and tasks. Try the demo and learn more here:

Today we're introducing TRIBE v2 (Trimodal Brain Encoder), a foundation model trained to predict how the human brain responds to almost any sight or sound. Building on our Algonauts 2025 award-winning architecture, TRIBE v2 draws on 500+ hours of fMRI recordings from 700+ people to create a digital twin of neural activity and enable zero-shot predictions for new subjects, languages, and tasks. Try the demo and learn more here:

AI at Meta

6,942,506 views • 4 months ago

While uniform stairs seem to be a solved problem, AGIBOT's X2 shows off how its AGILE foundation model adapts to uneven obstacles. Watch it handle a mix of 15 cm, 25 cm, and 20 cm steps.

While uniform stairs seems to be a solved problem, AGIBOT's X2 shows off how its AGILE foundation model adapts to uneven obstacles. Watch it handle a mix of 15cm, 25cm, and 20cm steps.

Humanoids daily

20,693 views • 2 months ago

John Major, "One aspect of reducing immigration would be to make it more palatable for migrants to remain in their own countries" "That is why the policy of cutting aid and investment to poor countries is a very poor sighted policy" "It will accelerate the demand from the countries to migrate as their countries get poorer" "It's a policy that acts against our own domestic interest" "It is morally wrong, politically wrong, and heartless"

John Major, "One aspect of reducing immigration would be to make it more palatable for migrants to remain in their own countries" "That is why the policy of cutting aid and investment to poor countries is a very poor sighted policy" "It will accelerate the demand from the countries to migrate as their countries get poorer" "It's a policy that acts against our own domestic interest" "It is morally wrong, politically wrong, and heartless"

Farrukh

40,331 views • 8 months ago

World Model meets robot policy! Robbyant's LingBot-VA: unifies video world modeling and robotic policy learning. - A single model generates both future video and the actions to make it real. - Long-term memory enables long-horizon tasks. - Claims significant outperformance over π₀.₅ in real-world tasks. - It's open-source

World Model meets robot policy! Robbyant's LingBot-VA: unifies video world modeling and robotic policy learning. - A single model generates both future video and the actions to make it real. - Long-term memory enables long-horizon tasks. - Claims significant outperformance over π₀.₅ in real-world tasks. - It's open-source

The Humanoid Hub

17,721 views • 5 months ago

Yay, finally! Introducing Vision Banana🍌 from Google DeepMind, our unified model that outperforms SoTA specialist models on various vision tasks! By treating 2D/3D vision tasks as image generation, we unlock a new foundation for CV. Project page: (1/5)

Yay, finally! Introducing Vision Banana🍌 from Google DeepMind, our unified model that outperforms SoTA specialist models on various vision tasks! By treating 2D/3D vision tasks as image generation, we unlock a new foundation for CV. Project page: (1/5)

Songyou Peng

287,404 views • 3 months ago

New Course: Post-training of LLMs. Learn to post-train and customize an LLM in this short course, taught by Banghua Zhu, Assistant Professor at the University of Washington and co-founder of @NexusflowX. Training an LLM to follow instructions or answer questions has two key stages: pre-training and post-training. In pre-training, it learns to predict the next word or token from large amounts of unlabeled text. In post-training, it learns useful behaviors such as following instructions, tool use, and reasoning. Post-training transforms a general-purpose token predictor, trained on trillions of unlabeled text tokens, into an assistant that follows instructions and performs specific tasks. Because post-training is much cheaper than pre-training, far more teams can practically incorporate it into their workflows. In this course, you’ll learn how to use three common post-training methods effectively: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL). With SFT, you train the model on pairs of inputs and ideal output responses. With DPO, you provide both a preferred (chosen) and a less preferred (rejected) response and train the model to favor the preferred output. With RL, the model generates an output, receives a reward score based on human or automated feedback, and is updated to improve performance. You’ll learn the basic concepts, common use cases, and principles for curating high-quality data for effective training. Through hands-on labs, you’ll download a pre-trained model from Hugging Face and post-train it using SFT, DPO, and RL to see how each technique shapes model behavior. In detail, you’ll:
- Understand what post-training is, when to use it, and how it differs from pre-training.
- Build an SFT pipeline to turn a base model into an instruct model.
- Explore how DPO reshapes behavior by minimizing a contrastive loss that penalizes poor responses and reinforces preferred ones.
- Implement a DPO pipeline to change the identity of a chat assistant.
- Learn online RL methods such as Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO), and how to design reward functions.
- Train a model with GRPO to improve its math capabilities using a verifiable reward.
Post-training is one of the most rapidly developing areas of LLM training. Whether you’re building a high-accuracy, context-specific assistant, fine-tuning a model's tone, or improving task-specific accuracy, this course will give you experience with the most important techniques shaping how LLMs are post-trained today. Please sign up here:

Andrew Ng

125,146 views • 1 year ago
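
The course description names DPO's contrastive loss and GRPO with a verifiable reward without showing either. Below is a minimal, framework-free sketch, not the course's lab code. For one preference pair, the standard DPO objective is -log σ(β[(log πθ(y_w|x) - log π_ref(y_w|x)) - (log πθ(y_l|x) - log π_ref(y_l|x))]), where y_w is the chosen and y_l the rejected response; GRPO-style advantages standardize each sampled completion's reward against the other completions for the same prompt. The answer checker (last token must equal the reference), the default β, and the use of population standard deviation are simplifying assumptions; real implementations compute log-probs from a model and differ in normalization details.

```python
import math
from typing import List


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given summed sequence log-probs under
    the trainable policy (pol_*) and the frozen reference model (ref_*)."""
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def verifiable_reward(completion: str, reference: str) -> float:
    """Hypothetical checker: 1.0 if the completion's last whitespace-separated
    token equals the reference answer, else 0.0."""
    tokens = completion.strip().split()
    return 1.0 if tokens and tokens[-1] == reference else 0.0


def grpo_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Group-relative advantages: (r - mean) / std within one prompt's group.
    If every reward is equal, std is 0 and all advantages are 0."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# DPO: the policy already favors the chosen answer more than the reference does,
# so the loss falls below log(2), its value at zero margin.
print(dpo_loss(-0.5, -2.0, -1.0, -1.0, beta=1.0))

# GRPO: four sampled completions for one math prompt whose answer is "42".
completions = ["so the answer is 42", "the answer is 41", "42", ""]
rewards = [verifiable_reward(c, "42") for c in completions]  # [1.0, 0.0, 1.0, 0.0]
advs = grpo_advantages(rewards)  # approximately [1.0, -1.0, 1.0, -1.0]
print(rewards, advs)
```

Correct completions get positive advantages and incorrect ones negative, so the policy-gradient update pushes probability toward verified answers without training a separate value model.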

CosmicMan A Text-to-Image Foundation Model for Humans We present CosmicMan, a text-to-image foundation model specialized for generating high-fidelity human images. Unlike current general-purpose foundation models that are stuck in the dilemma of inferior quality and

CosmicMan A Text-to-Image Foundation Model for Humans We present CosmicMan, a text-to-image foundation model specialized for generating high-fidelity human images. Unlike current general-purpose foundation models that are stuck in the dilemma of inferior quality and

AK

46,778 views • 2 years ago

LingBot-VLA 2.0 is an impressive new embodied model. It is open source and trained across diverse robot configurations, from single-arm robots to humanoid platforms. It packs 60K hours of curated robot and human data into one generalist policy and improves robot performance on difficult long-horizon tasks. Great release by Robbyant.

elvis

10,464 views • 14 days ago

Before we ship a new model, these teams try to break it. They build with it, push it to its limits, and tell us where it falls short. What they find makes the final model better.

Claude

592,276 views • 2 months ago

Eden is building toward a future where physical services are as scalable as software and delivered autonomously, end-to-end, by intelligent agents. Our foundation model Theta is the first step toward that future.

Stamatis Floratos

342,723 views • 2 months ago

🧐Applying world models to improve real-world policy on challenging manipulation tasks used to be considered out of reach. 😌After sustained effort, we’re now seeing encouraging progress. 🚀Thrilled to introduce RISE: Self-Improving Robot Policy with Compositional World Model RISE is, to our knowledge, the first work to use a world model as an effective learning environment for challenging real-world manipulation, enabling policy improvement on tasks that demand high dynamics, dexterity, and precision. Incredible teamwork with Kunyang Lin Hongyang Li Yue Xiangyu Hao Zhao Chonghao Sima

Jiazhi Yang ✈️ RSS2026

76,440 views • 5 months ago

1/ NitroGen: NVIDIA's new image-to-action model! NitroGen, a vision-action foundation model for generalist gaming agents that is trained on 40,000 hours of gameplay videos across more than 1,000 games. Gaming is a significant factor in AI training. Google DeepMind trained AI early on Starcraft 2, and OpenAI on Dota 2. This new product from NVIDIA is therefore extremely important. Why it matters and how it works:

Chubby♨️

55,280 views • 7 months ago

Humans learn and improve from failures. Similarly, foundation models adapt based on human feedback. Can we leverage this failure understanding to enhance robotics systems that use foundation models? Introducing AHA—a vision-language model for detecting and reasoning over failures in robotic manipulation. Project page: 🧵Thread👇 Aha!

Jiafei Duan

48,777 views • 1 year ago

Apple built a large foundation model and fine-tuned it on multiple tasks. But they are doing something very clever: They load a single model in memory and use different adapters to specialize the model on the fly. I recorded a video to show you how to write the code to do the same thing Apple is doing. I explain everything step by step. Here is what I'll show you in the video: 1. We'll load two datasets 2. Then load a large model 3. Then, we'll fine-tune the model on both datasets I'll use LoRA to fine-tune the model. This process creates two small adapters, each specializing in solving one of the datasets. The base model's original parameters will remain unchanged. From here: 4. We'll generate a list of tasks 5. We'll load the correct adapter to solve each task The large model I'm using needs 346 MB of memory, but I only need to load it once. Each adapter is only 2.7 MB. I only need to load the base model once and pair it with any of the fine-tuned adapters. Minimum memory footprint and I can solve multiple tasks. Hope this helps!

Santiago

84,747 views • 1 year ago
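
The post describes keeping one base model in memory and swapping small LoRA adapters per task. A minimal pure-Python sketch of that idea follows; it is not Apple's code and not any library's API, and the class `AdapterHost`, its methods, and the 2x2 toy weights are hypothetical. Each adapter stores a low-rank pair (A, B) and the layer computes y = W x + (alpha / r) B (A x), so the frozen W is shared and only A and B differ between tasks. The 346 MB and 2.7 MB figures are the ones quoted in the post.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class AdapterHost:
    """One frozen base weight W, loaded once and shared by all tasks.
    Each task adds a low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, base: Matrix) -> None:
        self.base = base
        self.adapters: Dict[str, Tuple[Matrix, Matrix, float]] = {}

    def add_adapter(self, name: str, a: Matrix, b: Matrix, alpha: float) -> None:
        # A is r x d_in and B is d_out x r; only these are trained per task.
        self.adapters[name] = (a, b, alpha)

    def forward(self, x: Vector, adapter: Optional[str] = None) -> Vector:
        y = matvec(self.base, x)
        if adapter is None:
            return y  # plain base model, no task specialization
        a, b, alpha = self.adapters[adapter]
        scale = alpha / len(a)  # len(a) is the rank r
        delta = matvec(b, matvec(a, x))
        return [yi + scale * di for yi, di in zip(y, delta)]


def footprint_mb(base_mb: float, adapter_mb: float, n_tasks: int) -> Tuple[float, float]:
    """Return (shared base + one adapter per task, one full model copy per task)."""
    return base_mb + n_tasks * adapter_mb, n_tasks * base_mb


host = AdapterHost([[1.0, 0.0], [0.0, 1.0]])
host.add_adapter("task_a", a=[[1.0, 1.0]], b=[[1.0], [0.0]], alpha=1.0)
host.add_adapter("task_b", a=[[1.0, -1.0]], b=[[0.0], [2.0]], alpha=2.0)
x = [2.0, 3.0]
print(host.forward(x))            # [2.0, 3.0]
print(host.forward(x, "task_a"))  # [7.0, 3.0]
print(host.forward(x, "task_b"))  # [2.0, -1.0]

shared, copies = footprint_mb(346, 2.7, 2)
print(f"{shared:.1f} MB shared vs {copies:.0f} MB as separate copies")
# 351.4 MB shared vs 692 MB as separate copies
```

Switching tasks is just a dictionary lookup, which is why serving many specializations costs little more memory than the base model alone; libraries such as Hugging Face PEFT handle the real loading and switching of adapters.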