Placing objects sounds simple… until robots have to do it. This method makes it simple, fast & reliable. [Github ⬇️] Robotic object placement is tough, especially with stacking, hanging, or insertion. AnyPlace is a new two-stage method that uses only synthetic data and a vision-language model to teach robots...

22,843 views • 1 year ago • via X (Twitter)

Related Videos

You can't 3D reconstruct glass from images... WRONG! Thanks to video diffusion, now just about anything is possible! Introducing... Diffusion Knows Transparency (DKT).

Transparent and reflective objects usually break robot vision and photogrammetry pipelines because they don't follow the "solid object" rules standard cameras expect. DKT is a new AI model that repurposes the "internal physics engine" found in video generation models to solve this problem.

Researchers took a massive video diffusion model (WAN) and fine-tuned it on a custom-built synthetic dataset to turn it into a high-precision depth sensor. To train the AI, they built the first massive synthetic video library of transparent objects: 1.32 million frames of perfectly labeled glass and metal objects in motion. Without ever seeing a "real" labeled video of glass during training, the model (DKT) outperformed all previous specialized systems on real-world benchmarks (ClearPose, DREDS). They also created a "lightweight" 1.3B-parameter version that runs fast enough (0.17s per frame) to be used on actual robot hardware.

Two reasons I find this project important:
1. It further proves that synthetic data will be essential for training the next generation of vision models.
2. In real-world robotic tests, using DKT's depth maps nearly doubled the success rate of robot arms trying to pick up objects on tricky reflective or translucent surfaces. At-home robots will need to interact with these types of objects on a daily basis.

Check out the project page here: Code is LIVE! #Computervision #Robotics #AI

Jonathan Stephens

17,712 views • 5 months ago

𝗘𝘃𝗲𝗿𝘆𝗼𝗻𝗲’𝘀 𝘁𝗮𝗹𝗸𝗶𝗻𝗴 𝗮𝗯𝗼𝘂𝘁 “𝗣𝗵𝘆𝘀𝗶𝗰𝗮𝗹 𝗔𝗜” - the idea that we can simulate real-world environments so well that robots trained in simulation will work perfectly in reality.

𝗧𝗵𝗲 𝗽𝗿𝗼𝗺𝗶𝘀𝗲: Train in virtual worlds → deploy anywhere.

𝗧𝗵𝗲 𝗿𝗲𝗮𝗹𝗶𝘁𝘆: I’ve seen too many teams fall into this trap. After working with manipulation teams at Berkeley, Imperial, and Dyson, here’s the pattern:
• 𝗪𝗲𝗲𝗸 𝟭: “Our policy works perfectly in simulation!”
• 𝗪𝗲𝗲𝗸 𝟰: “Why doesn’t this work on real objects?”
• 𝗠𝗼𝗻𝘁𝗵 𝟮: “We basically need to retrain from scratch with real data.”

𝗧𝗵𝗲 𝗴𝗮𝗽 𝘀𝗶𝗺𝘂𝗹𝗮𝘁𝗶𝗼𝗻𝘀 𝗰𝗮𝗻’𝘁 𝗯𝗿𝗶𝗱𝗴𝗲: Unlike blind locomotion policies that can get away with sim-to-real transfer because they rely mainly on proprioception and contact forces, 𝘃𝗶𝘀𝗶𝗼𝗻-𝗴𝘂𝗶𝗱𝗲𝗱 𝗺𝗮𝗻𝗶𝗽𝘂𝗹𝗮𝘁𝗶𝗼𝗻 𝗶𝘀 𝗲𝘅𝘁𝗿𝗲𝗺𝗲𝗹𝘆 𝘀𝗲𝗻𝘀𝗶𝘁𝗶𝘃𝗲 𝘁𝗼 𝘃𝗶𝘀𝘂𝗮𝗹 𝗱𝗼𝗺𝗮𝗶𝗻 𝗴𝗮𝗽𝘀.
• Real friction vs simulated surface textures
• Manufacturing tolerances vs perfect CAD models
• Dynamic lighting vs controlled virtual environments
• Sensor noise vs instantaneous virtual readings

𝗛𝗲𝗿𝗲'𝘀 𝘄𝗵𝗮𝘁 𝗽𝗲𝗼𝗽𝗹𝗲 𝗱𝗼𝗻'𝘁 𝘁𝗮𝗹𝗸 𝗮𝗯𝗼𝘂𝘁: Building these detailed simulated environments takes forever. If it takes 7 days to build a kitchen in simulation, wouldn't it be better to just collect real-world data in a real kitchen instead?

𝗗𝗼𝗻'𝘁 𝗴𝗲𝘁 𝗺𝗲 𝘄𝗿𝗼𝗻𝗴 - simulation is incredible for debugging, safety testing, and exploring edge cases. But it's not a magic solution to real-world deployment.

𝗪𝗵𝗮𝘁 𝗮𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝘄𝗼𝗿𝗸𝘀: Use simulation strategically while making real-world data collection as efficient and flexible as possible. This is why Neuracore focuses on streamlined real-world data infrastructure. Because no amount of virtual training can replace understanding how your robot actually behaves in actual environments.

𝗧𝗵𝗲 𝗽𝗵𝘆𝘀𝗶𝗰𝘀 𝗼𝗳 𝘆𝗼𝘂𝗿 𝗱𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗲𝗻𝘃𝗶𝗿𝗼𝗻𝗺𝗲𝗻𝘁 𝗰𝗮𝗻'𝘁 𝗯𝗲 𝘀𝗶𝗺𝘂𝗹𝗮𝘁𝗲𝗱 𝗮𝘄𝗮𝘆.

What’s been your experience with sim-to-real transfer?

Stephen James

25,300 views • 8 months ago