Placing objects sounds simple… until robots have to do it. This method makes it simple, fast & reliable. [Github ⬇️]

Robotic object placement is tough, especially with stacking, hanging, or insertion. AnyPlace is a new two-stage method that uses only synthetic data and a vision-language model to teach robots where and how to place objects, even in the real world.

Why this works
✅ Finds the right spot with help from vision-language models
✅ Handles stacking, insertion, and hanging with no real-world training
✅ Trained on synthetic data using Blender and IsaacSim
✅ Works in the real world without fine-tuning

It shows that smart use of simulation and language models can make robotic placement tasks easier, faster, and more reliable.

Github: Paper:

Thank you for sharing, Animesh Garg!

Ilir Aliu - eu/acc

5,531 subscribers

22,843 views • 1 year ago •via X (Twitter)

Science & Technology

Related Videos

How can robots reliably place objects in diverse real-world tasks? 🤖🔍 Placement is tough—objects vary in shape and placement modes (such as stacking, hanging, and insertion), making it a challenging problem. We introduce AnyPlace, a two-stage method trained purely on synthetic data to predict diverse placement poses of unseen objects for real-world tasks. Read on for more👇

Animesh Garg

24,662 views • 1 year ago

Robots struggle with strict action rules…memory and symbols help them learn fast. [Project + Full video link ⬇️] Robots struggle when tasks require specific steps in a fixed order. What if memory helped them think symbolically and learn faster? Solving tasks like unlocking a door then opening it is hard for deep RL. But by learning constraint relationships and storing them in memory, robots can solve these tasks much faster; with fewer trials and less training. Why it works ✅ Learns symbolic rules about action constraints ✅ Uses memory to transfer what it learned across tasks ✅ Handles real-world exploration with just 30 minutes of data ✅ Needs 10x fewer episodes than deep RL approaches This memory-based method shows a promising path forward for robots learning structured, real-world tasks. Full video: Paper: Thank you, Mrinal Verghese for sharing this amazing work! 🙏

Ilir Aliu - eu/acc

10,241 views • 1 year ago

Robotics keeps hitting the same wall. Single task RL works, but... it does not scale to hundreds of tasks or new embodiments. This new paper looks like a real step toward fixing that. The team introduces MMBench, a benchmark with 200 tasks across many domains and robots, and Newt, a language conditioned world model trained online across all 200 tasks at once. The simple idea behind Newt: The model learns from demos to get the right priors It trains across many tasks through online interaction It uses language to ground the goal It adapts fast when a new task shows up What stood out to me: ✅ One model trained on 200 tasks at the same time ✅ Language conditioned control for both states and RGB ✅ Better data efficiency than strong baselines ✅ Strong open loop control ✅ Fast adaptation to new tasks and embodiments ✅ Full release of 200 checkpoints, 4000 demos, code, and benchmark This is a good push toward general control instead of one model per task. If you want the full paper: Project page: —- Weekly robotics and AI insights. Subscribe free:

Ilir Aliu

70,090 views • 6 months ago

You can't 3D reconstruct glass from images... ...WRONG! Thanks to video diffusion, now just about anything is possible! Introducing... Diffusion Knows Transparency (DKT). Transparent and reflective objects usually break robot vision and photogrammetry pipelines because they don't follow the "solid object" rules standard cameras expect. DKT is a new AI model that repurposes the "internal physics engine" found in video generation models to solve this problem. Researchers took a massive video diffusion model (WAN) and fine-tuned it using a custom-built synthetic dataset to turn it into a high-precision depth sensor. To train the AI, they built the first massive synthetic video library of transparent objects: 1.32 million frames of perfectly labeled glass and metal objects in motion. Without ever seeing a "real" labeled video of glass during training, the model (DKT) outperformed all previous specialized systems on real-world benchmarks (ClearPose, DREDS). They created a "lightweight" 1.3B parameter version that runs fast enough (0.17s per frame) to be used on actual robot hardware. Two reasons I find this project important: 1. It further proves that synthetic data will be essential for training the next generation of vision models. 2. In real-world robotic tests, using DKT's depth maps nearly doubled the success rate of robot arms trying to pick up objects on tricky reflective or translucent surfaces. At-home robots will need to interact with these types of objects on a daily basis. Check out the project page here: Code is LIVE! #Computervision #Robotics #AI

Jonathan Stephens

17,712 views • 5 months ago

🤖 NVIDIA’s Gr00t N1.5 is now available in LeRobot! This is the result of a great collaboration between the Hugging Face LeRobot team and NVIDIA Robotics ! Gr00t N1.5 highlights: 🦾 Cross-embodiment foundation model for robots 🧠 Multimodal inputs: vision, language, and proprioception 🪛Tested on the Libero benchmark and real-world hardware tasks 🌍Trained on real robot, synthetic, and internet-scale video data ⚙️ Flow matching action transformer for action prediction

LeRobot

115,194 views • 7 months ago

Tactile interaction in the wild can unlock fine-grained manipulation! 🌿🤖✋ We built a portable handheld tactile gripper that enables large-scale visuo-tactile data collection in real-world settings. By pretraining on this data, we bridge vision and touch—allowing robots to: ✅ Perform robust in-hand reorientation ✅ Control contact and force with precision 🔗 Project page: (1/6)

Binghao Huang

78,160 views • 10 months ago

#NVIDIAIsaac Sim 5.0 and Isaac Lab 2.2 are now available in early developer preview on Github. 🎉 These releases give #Robotics developers early access to cutting-edge tools to simulate, train, and validate robots in a physics-based simulation environment. What’s new? ✅Open-source ✅Extensions for synthetic data generation ✅Robot models Read the tech blog to learn more ➡️ #GTCParis #VivaTech

NVIDIA Robotics

13,613 views • 11 months ago

Can robots learn without training❓ [𝗜𝘁'𝘀 𝗼𝗽𝗲𝗻 𝘀𝗼𝘂𝗿𝗰𝗲𝗱 ⬇ ] Teaching robots to do complex tasks WITHOUT spending hours training them. Sounds cool, right? That's exactly what DIAL-MPC does! The first training-free method for whole-body torque control using full-order dynamics: ✅ Instantly checks if a robot's moves are right or wrong ✅ Adapts quickly to new tasks without needing extra training ✅ Could work hand-in-hand with other robot learning methods Robots are getting smarter AND faster without the need for long training sessions. Website: Paper: Code: Saw this first Haoru Xue ✈️ CVPR 🙏

Ilir Aliu

71,502 views • 1 year ago

Imagine robots finally handling cables and wires! 🔌 A new simulation now allows robots to practice handling flexible objects like cables, wires, and hoses in a virtual environment. Handling wires is especially difficult for robots because they are not rigid and behave unpredictably. This simulation captures realistic physics such as bending, tension, and complex movements, helping robots learn more effectively. 📚 Training with these simulated assets helps robots perform better when moving from virtual training to real-world tasks. Applications include electronic device assembly, automotive wire harness installation, smart home setup, and industrial wiring. This approach makes robot training more complete and prepares them for the challenges they will face in real environments.

Lukas Ziegler

29,916 views • 1 year ago

MolmoAct2 is landing in LeRobot! Ai2's open Action Reasoning Model combines a Molmo2-ER vision-language backbone with a flow-matching continuous action expert to predict robot action chunks from images, language instructions, and proprioceptive state. An open robot foundation model built for real-world control, with strong out-of-the-box performance and easy fine-tuning in LeRobot. Pick-and-place inference running on NVIDIA DGX Spark! Blog: Paper: Thanks to Ai2 Jiafei Duan Haoquan Fang

LeRobot

16,946 views • 6 days ago

OpenAI's Deep Research is getting a run for its money. Deep Lake was just released, and it's a different take on an AI system that can do deep research on your own data. You can use Deep Lake to build AI search with reasoning on your private and public data. (Look at the attached videos to get an idea of how it works.) If you want to research proprietary and sensitive data, Deep Research won't help you because it's limited to public data. Deep Lake, however, will allow you to use your private data. On top of that, Deep Lake supports multi-modal retrieval from the ground up. It uses vision language models for data ingestion and retrieval so that you can connect any data (PDFs, images, videos, structured data, etc.) You can even use mixed-data queries! Deep Lake can search your data from S3, Dropbox, and GCP. It learns from your queries over time, making the results as relevant to your work as possible!

Santiago

171,340 views • 1 year ago

🚀Thrilled to share what we’ve been building at TRI over the past several months: our first Large Behavior Models (LBMs) are here! I’m proud to have been a core contributor to the multi-task policy learning and post-training efforts. At TRI, we’ve been researching how LBMs can help robots learn faster, better, and more efficiently. The key takeaways: ✅ We built an evaluation pipeline to benchmark LBM performance with real 𝐬𝐭𝐚𝐭𝐢𝐬𝐭𝐢𝐜𝐚𝐥 𝐜𝐨𝐧𝐟𝐢𝐝𝐞𝐧𝐜𝐞 ✅ Pre-training on hundreds of tasks makes models more robust—plus, we can teach new, complex tasks with 80% 𝐥𝐞𝐬𝐬 𝐝𝐚𝐭𝐚 ✅ The bigger and more diverse the pre-training, the better the results Check out our overview video, webpage and paper for more details: ✨ 🌎 📄 We hope this work helps move the field of robotics forward!

Zubair Irshad

20,256 views • 10 months ago

Spatial AI ( is building large real-world datasets to teach robots how to navigate the world and complete tasks. Their first open source dataset, SEA (Spatial Everyday Activities), is the largest curated egocentric dataset of people carrying out real tasks, with 10,000 hours of data.

Y Combinator

20,944 views • 6 months ago

You can now delegate tasks to GitHub Copilot coding agent from any page on GitHub 🤖 Open the new Agents panel in one click, write a simple prompt, then hit Enter. GitHub Copilot works in the background, and opens a PR for your review. No interruptions to your workflow required. ✅

GitHub

112,727 views • 9 months ago

𝗘𝘃𝗲𝗿𝘆𝗼𝗻𝗲’𝘀 𝘁𝗮𝗹𝗸𝗶𝗻𝗴 𝗮𝗯𝗼𝘂𝘁 “𝗣𝗵𝘆𝘀𝗶𝗰𝗮𝗹 𝗔𝗜" - the idea that we can simulate real-world environments so well that robots trained in simulation will work perfectly in reality. 𝗧𝗵𝗲 𝗽𝗿𝗼𝗺𝗶𝘀𝗲: Train in virtual worlds → deploy anywhere. 𝗧𝗵𝗲 𝗿𝗲𝗮𝗹𝗶𝘁𝘆: I’ve seen too many teams fall into this trap. After working with manipulation teams at Berkeley, Imperial, and Dyson, here’s the pattern: • 𝗪𝗲𝗲𝗸 𝟭: “Our policy works perfectly in simulation!” • 𝗪𝗲𝗲𝗸 𝟰: “Why doesn’t this work on real objects?” • 𝗠𝗼𝗻𝘁𝗵 𝟮: “We basically need to retrain from scratch with real data.” 𝗧𝗵𝗲 𝗴𝗮𝗽 𝘀𝗶𝗺𝘂𝗹𝗮𝘁𝗶𝗼𝗻𝘀 𝗰𝗮𝗻’𝘁 𝗯𝗿𝗶𝗱𝗴𝗲: Unlike blind locomotion policies that can get away with sim-to-real transfer because they rely mainly on proprioception and contact forces, 𝘃𝗶𝘀𝗶𝗼𝗻-𝗴𝘂𝗶𝗱𝗲𝗱 𝗺𝗮𝗻𝗶𝗽𝘂𝗹𝗮𝘁𝗶𝗼𝗻 𝗶𝘀 𝗲𝘅𝘁𝗿𝗲𝗺𝗲𝗹𝘆 𝘀𝗲𝗻𝘀𝗶𝘁𝗶𝘃𝗲 𝘁𝗼 𝘃𝗶𝘀𝘂𝗮𝗹 𝗱𝗼𝗺𝗮𝗶𝗻 𝗴𝗮𝗽𝘀. • Real friction vs simulated surface textures • Manufacturing tolerances vs perfect CAD models • Dynamic lighting vs controlled virtual environments • Sensor noise vs instantaneous virtual readings 𝗛𝗲𝗿𝗲'𝘀 𝘄𝗵𝗮𝘁 𝗽𝗲𝗼𝗽𝗹𝗲 𝗱𝗼𝗻'𝘁 𝘁𝗮𝗹𝗸 𝗮𝗯𝗼𝘂𝘁: Building these detailed simulated environments takes forever. If it takes 7 days to build a simulated kitchen in simulation, wouldn't it be better to just collect real-world data in a real kitchen instead? 𝗗𝗼𝗻'𝘁 𝗴𝗲𝘁 𝗺𝗲 𝘄𝗿𝗼𝗻𝗴 - simulation is incredible for debugging, safety testing, and exploring edge cases. But it's not a magic solution to real-world deployment. 𝗪𝗵𝗮𝘁 𝗮𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝘄𝗼𝗿𝗸𝘀: Use simulation strategically while making real-world data collection as efficient and flexible as possible. This is why Neuracore focuses on streamlined real-world data infrastructure. Because no amount of virtual training can replace understanding how your robot actually behaves in actual environments. 𝗧𝗵𝗲 𝗽𝗵𝘆𝘀𝗶𝗰𝘀 𝗼𝗳 𝘆𝗼𝘂𝗿 𝗱𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗲𝗻𝘃𝗶𝗿𝗼𝗻𝗺𝗲𝗻𝘁 𝗰𝗮𝗻'𝘁 𝗯𝗲 𝘀𝗶𝗺𝘂𝗹𝗮𝘁𝗲𝗱 𝗮𝘄𝗮𝘆. What’s been your experience with sim-to-real transfer?

Stephen James

25,300 views • 8 months ago

Video diffusion models have strong implicit representations of 3D shape, material, and lighting, but controlling them with language is cumbersome, and control is critical for artists and animators. GenLit connects these implicit representations with a continuous 5D control signal describing the direction and intensity of a point light source. This enables single-image near-field relighting using a video diffusion model. We use a ControlNet-like approach and show that, with a small amount of synthetic data, GenLit generalizes to complex real-world images. Given a single image and the 5D lighting signal, GenLit creates a video of a moving light source that is inside the scene. It moves around and behind scene objects, producing effects such as shading, cast shadows, specularities, and interreflections with a realism that is hard to obtain with traditional inverse rendering methods. GenLit shows that it is possible to get continuous control over implicit physical processes within a video model. I think this is just the beginning and promises to make such models much more practical for creators. Shrisha Bharadwaj will present today at SIGGRAPH Asia Room: S423/S424, Level 4 @ 13:50 on 15 of Dec.

Michael Black

22,004 views • 5 months ago

🛠️ What if a robot could invent its own tools. And teach itself how to use them? That’s exactly what VLMgineer does: a new framework that lets Vision Language Models (VLMs) design physical tools and the actions to use them, entirely on their own. No templates. No human demonstrations. Just raw, AI-driven creativity. Why it matters ✅ Co-designs tools and actions together using VLMs, ensuring tight coupling between form and function ✅ Uses VLM-guided evolution (not random search) to refine designs intelligently ✅ Outperforms human-designed tools by +64.7% in task success across 12 RoboToolBench challenges ✅ Produces better-than-everyday tools for real manipulation tasks—measured in success rate and elegance It builds on the emerging trend of large-model-guided evolutionary design (like Eureka and AlphaEvolve) and brings it into physical robotics. It opens the door to general-purpose, automated hardware design, no strong priors needed. Code & paper: —- Weekly robotics and AI insights. Subscribe free:

Ilir Aliu

13,984 views • 5 months ago

We have been working on blending and this is a simple 1 2 3 Flip It activity for cvc words, real and nonsense words. The cards are numbered to help with the placement. Students worked with a partner.

Melanie Brethour

66,447 views • 3 years ago

Why add sensors and complex systems when physics can do the job? This production line sorts products using only weight and controlled bursts of air. ✅ No cameras or vision models ✅ No expensive integration ✅ Just reliable, repeatable separation at scale It’s a reminder that not every automation breakthrough is about AI or advanced robotics... sometimes simple mechanical principles are the most efficient solution. —- Weekly robotics and AI insights. Subscribe free:

Ilir Aliu

532,332 views • 5 months ago

Predicting the next word "only" is sufficient for language models to learn a large body of knowledge that enables them to code, answer questions, understand many topics, chat, and so on. This is clear to many researchers now, and there are nice tutorials on why this works by Ilya Sutskever resorting to compression ( ) and by Geoffrey Hinton ( ). However, the emergence of types of understanding is not unique to language models. In by Misha Denil and Brandon Amos the authors trained models to predict the next few time steps of over a hundred robot hand sensors (Touch, Gyro, Accelerometer, Joint Info, Actuator Info, etc.). They then found out that they could regress the shape of the thing the hand was touching from the activations of the neural networks using probes. That is, the model developed an internal representation of shapes even though it was simply used to predict "only" the next few senses. Awareness follows from simple predictions and interaction with the world.

Nando de Freitas

130,179 views • 2 years ago
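
For readers unfamiliar with the probing idea mentioned in the post above (regressing an object's shape from a network's activations), here is a minimal sketch of a linear probe. It is not the authors' code: the data is synthetic, the "shape parameters" and the 32-dimensional "activations" are hypothetical stand-ins, and ridge regression is just one common choice of probe.

```python
import numpy as np

def add_bias(x):
    """Append a constant column so the probe can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def fit_linear_probe(activations, targets, l2=1e-3):
    """Ridge-regression probe: find W such that add_bias(activations) @ W ≈ targets."""
    X = add_bias(activations)
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)

def probe_r2(W, activations, targets):
    """Coefficient of determination of the probe on held-out data."""
    pred = add_bias(activations) @ W
    ss_res = np.sum((targets - pred) ** 2)
    ss_tot = np.sum((targets - targets.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)

# Hypothetical ground truth: two shape parameters per sample (assumption).
shape_params = rng.normal(size=(500, 2))

# Synthetic stand-in for hidden activations: a noisy linear mix of the shape
# parameters. A real experiment would read these from the trained model.
mixing = rng.normal(size=(2, 32))
activations = shape_params @ mixing + 0.05 * rng.normal(size=(500, 32))

# Fit on the first 400 samples, evaluate on the remaining 100.
W = fit_linear_probe(activations[:400], shape_params[:400])
r2 = probe_r2(W, activations[400:], shape_params[400:])
print(f"held-out R^2 of the probe: {r2:.3f}")
```

A high held-out R^2 suggests the probed quantity is linearly decodable from the activations. A low one does not prove the information is absent, only that this particular probe cannot read it out.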