Jonathan Stephens

@jonstephens85 • 12,548 subscribers

Synthetic Data | World Models | Radiance Fields | Computer Vision | GenAI | Chief Evangelist at @LightwheelAI

Shorts

More tests last night with NVIDIA AI Developer's GEN3C. This was all generated from a single photo using local compute. I love how it accurately simulated the light fade around the building corner. Next up, dynamic scenes. #3D #GenAI #Computervision

84,840 views

What if you could turn a single 360° photo into a production-ready Isaac Sim environment in minutes? That's exactly what we did here. Using World Labs' Marble and an Insta360 X5 capture (rotating on top), we generated a complete navigable 3D environment and populated it with Lightwheel Sim Ready assets (bottom view). The result? A fully interactive scene in Isaac Sim, ready for sim2real testing: navigation, manipulation, or any robotics task you need to validate. What used to take weeks of manual 3D modeling and asset placement now takes minutes. Capture once in the real world, simulate everywhere in your training pipeline. This is the future of robotics development with world models. NVIDIA Robotics NVIDIA Omniverse #Sim2Real #Robotics #Simulation

46,338 views

More beautiful triangle splatting. Wait a couple seconds for the camera to swing around in the direction of the source images.

64,396 views

Triangle splatting. This is no joke!!! Metrics at 30k iterations: SSIM 0.9477584958076477, PSNR 32.29475784301758, LPIPS 0.06017586588859558

34,437 views

You can't 3D reconstruct glass from images... ...WRONG! Thanks to video diffusion, now just about anything is possible! Introducing... Diffusion Knows Transparency (DKT). Transparent and reflective objects usually break robot vision and photogrammetry pipelines because they don't follow the "solid object" rules standard cameras expect. DKT is a new AI model that repurposes the "internal physics engine" found in video generation models to solve this problem. Researchers took a massive video diffusion model (WAN) and fine-tuned it on a custom-built synthetic dataset to turn it into a high-precision depth sensor. To train the AI, they built the first massive synthetic video library of transparent objects: 1.32 million frames of perfectly labeled glass and metal objects in motion. Without ever seeing a "real" labeled video of glass during training, the model (DKT) outperformed all previous specialized systems on real-world benchmarks (ClearPose, DREDS). They also created a "lightweight" 1.3B-parameter version that runs fast enough (0.17 s per frame) to be used on actual robot hardware. Two reasons I find this project important: 1. It further proves that synthetic data will be essential for training the next generation of vision models. 2. In real-world robotic tests, using DKT's depth maps nearly doubled the success rate of robot arms trying to pick up objects on tricky reflective or translucent surfaces. At-home robots will need to interact with these types of objects on a daily basis. Check out the project page here: Code is LIVE! #Computervision #Robotics #AI

17,712 views

I finally got NVIDIA AI Developer's 3D Gaussian Unscented Transforms up and running natively on Windows via gsplat. Just in time for me to make a tutorial so people can enter NVIDIA's 3DGUT sweepstakes!!! Huge thanks to Ruilong Li for responding to my GitHub issues and solving my roadblocks this evening. #3D #Computervision

16,901 views

SpatialLM. Fast, not real-time. So much to explore here. I see the potential. More coming soon. #AI #AEC

14,981 views

Check out the difference in detail between the NVIDIA Cosmos 7B and 14B models! Watch the whole video to see each video separately. Twice the parameters make a huge difference! What do you think? #GenerativeAI #AI #ArtificialIntelligence #Computervision

14,745 views

3DGS needs to fix this if it is going to be used for virtual tours: Gaussian splats struggle with ceilings. NeRFs ALWAYS deliver better quality in this respect. I captured this over the weekend to compare. I have ideas on how to fix it. Has anyone else had success with ceilings?

12,340 views
