Bilawal Sidhu

@bilawalsidhu · 106,292 subscribers

Spatial intelligence. World models. Visual effects. Creator w/ 1.6M+ audience. Tech Curator @ TED. A16z Scout. Ex-Google PM (AR/VR & 3D Maps) https://t.co/fysPkbPoQ2

Shorts

peak internet: ai generated cctv footage of police arresting ppl for wearing huge boots. video credit: u/Qemmish

25,871,407 views

Nano Banana Pro is a really good cartographer. Used it to turn low res satellite imagery into a detailed hand drawn map and vector HD map. Pretty wild how well it segments everything and even recovers paths/roads hidden under tree cover. Looks way more detailed than the current google basemap which is pretty sparse in countries like India. Included both in video for comparison.

553,619 views

OpenAI just dropped their Sora research paper. As expected, the video-to-video results are flipping spectacular 🪄 A few other gems:

1,873,609 views

Damn it worked! Genie 3 world --> inpaint UI --> 4x topaz AI upscale --> train 3d gaussian splat. You can step inside a painting of Socrates from 1787. Better than any image-to-3d model I've seen. I think Google has stumbled upon the killer app for VR -- the literal holodeck.

656,729 views

Generative AI is super cool… BUT I’m continually blown away with the work happening in ‘procedural’ 3D modelling. Especially given these plugins are for a free (!) 3D tool like Blender. We needed a fancy Houdini license to do this just a few years ago 🤯

1,365,691 views

One of the wildest emergent capabilities of Genie 3 is that maps actually work. As I walk around the forest, the GPS display updates its heading in real time. Remember. There is no game engine here. This is an AI hallucinating a working navigational instrument purely from next frame prediction. 🤯

246,852 views

I love 3d maps like this because they’re inherently an abstraction of reality. Unlike 3d scans, the goal isn’t to create a 1:1 mirror world. The goal is to create a stylized distillation down to its visual essence, so you can recognize it effortlessly.

249,351 views

Lmao. What niche even is this — grassroots dirt track racing meets google maps nerds? Veo 3 videos are seriously ridiculous and fun. Turn audio on for max enjoyment.

451,956 views

The lines between code & content are blurring. I made this 3d city block animation in claude 3.7. Then I used runway gen-3's video-to-video to style it like a lego city at night. At the rate things are going, this'll be a shader running in real-time.

487,372 views

AI stitching together multiple video feeds into one omniscient traffic god. This is what happens when cameras start talking to each other -- mapping the trajectory of every vehicle and pedestrian seamlessly across cameras. Spatial intelligence is coming to a city near you.

328,202 views

Generative AI is cool and all, but procedural 3D modelling continues to hit the spot

500,696 views

5 min video from an insta360 camera turned into a big ass 3d gaussian splat using the new niantic scaniverse app. it’s kinda wild how well we can model the complexity of reality, and run in realtime in a browser at 100fps

62,321 views

Nano is a depth-aware atmospheric haze plugin that uses ML depth estimation to add physically accurate fog and light scattering to your footage. Works *best* on log footage with visible light sources - it analyzes scene highlights then creates airlight (atmospheric scatter) and halation (light bloom) that respond to actual depth in the scene. Pretty clever approach to getting that cinematic haze look without having to pump a fog machine on set. Makes the OG Trapcode Shine look extremely dated (basically 2D light streaks masked by luminance values), and yet is way more controllable than the current crop of generative AI video-to-video tools.

275,047 views

Generative AI is really cool but sometimes you want to drift your car through an intersection in a super specific way. Draw a spline then get 3D onion skinning so you can adjust the curves with a clear spatial reference. iCars plugin for Blender:

183,667 views

Heads up! Mosaic dropped a pretty wild dataset of 1.26 million 360° images of Prague 🤯 If you're a researcher, creator or developer into 3D/AI/Geo, I think you're gonna wanna play with this. Here's the scoop on this 15 TERAPIXEL dataset & the crazy things you can do with it 🧵

295,756 views

Google just took a big step towards building ChatGPT for Earth. AlphaEarth Foundations does something clever -- instead of drowning in petabytes of Earth observation data, it creates compact summaries of every 10x10m square on Earth by fusing optical, radar, LiDAR, and climate data. The kicker is it can see through clouds in Ecuador and reveal hidden agricultural patterns in Canada. MapBiomas and Global Ecosystems Atlas already using it for conservation work.

165,835 views

I can no longer walk around a city without seeing a machine-readable 3D model in my head. This is a geometric and semantic 3D model of San Francisco. These maps connect the world of bits and atoms, enabling visual positioning, 3D navigation and geospatial intelligence.

260,175 views

one 2d photo --> 3d gaussian splat. quick test with Echo-2 by SpAItial -- these 3d scene generation models are getting better! already at a sufficient quality to serve as a virtual set / backdrop in your 3d tool of choice

23,511 views

You can capture reality with remarkable fidelity - here’s a 1:1 3d gaussian splat perfectly aligned with reality. Once Meta layers in a scalable version of their codec avatar tech - this stuff is about to get trippy. Literally transcending time & space.

106,388 views

If your developer ain’t locking in like this, your startup is ngmi

147,573 views

Videos

People are undoubtedly a little alarmed at having unwittingly helped build a 3D map of the world for Niantic by contributing 30 billion crowdsourced images. I interviewed Niantic's CTO Brian McClendon about exactly this in a TED interview last year -- he's also the guy who co-created Google Earth.

But let's put it in perspective. Pokestop data isn't what you think it is. It's not a surveillance panopticon of your neighborhood. These are static captures of parks, statues, murals, landmarks -- the places people congregate. Brian described it as "building the map from the bottom up, from the locations where people spend time." Think of these 20 million waypoints as basically the inverse of what Google mapped with Street View. Google mapped the drivable streets. Niantic mapped where people actually hang out. Cool data, genuinely useful for visual positioning -- but very different from what the headlines imply.

And lest we forget that Niantic is just one of many companies quietly building their own map of the world right now -- and they're all capturing different facets of reality:

> 🚶 person-level: Axon body cams on hundreds of thousands of officers. Meta Ray-Ban glasses capturing first-person POV at scale -- overseas operators reviewing images every time someone says "Hey Meta."
> 🚗 vehicle-level: Tesla dashcams on every car in the fleet, massive onboard compute extracting and distilling data to the cloud. Waymo with cm-accurate 3D maps of every city they operate in. Fleet telematics cameras on delivery vehicles globally.
> 🏠 street & home-level: Flock Safety deploying CCTV across neighborhoods and cities. Amazon with Ring cameras on every doorstep and mailroom (recently got dragged over that Super Bowl commercial about fusing all these cams together to find your dog) plus dashcams on every Prime delivery van. Roomba mapping your floor plan every time it vacuums -- Amazon wanted that data badly enough to try acquiring iRobot for $1.7B before regulators shut it down.
> 🥽 headset-level: Apple Vision Pro and Meta Quest build a 3D model of whatever room you're in every time you put them on. Between Ring, Roomba, and your headset, your entire home is being spatially understood by at least three different companies.
> 📍 platform-level: Google with Street View cars, aerial planes, satellite imagery, and live location from every Android phone in your pocket. Apple doing the same with mapping cars AND every LiDAR iPhone is quietly a 3D scanner. And yeah, despite the "Apple is too privacy-conscious" narrative, they're collecting location data too.
> 🏃 trajectory-level: Strava mapped every running and cycling trail on Earth -- and accidentally exposed secret military bases in Afghanistan and Syria because soldiers logged their jogs. When you aggregate enough individual trajectories, patterns emerge that were never supposed to be visible.
> 🛰️ space-level: Planet Labs imaging the entire Earth's landmass every single day from orbit. Vantor capturing it in higher detail. Iceye doing it in 3D using SAR. If something changes anywhere on the planet -- a building goes up, a forest burns down, a military convoy moves -- before-and-after imagery within 24 hours.

Fused together -- we have everything from body cam to dashcam to doorbell to phone to satellite -- every layer of physical reality is being mapped by somebody right now. Different sensors, different angles, different purposes. Same pattern.

The interesting part is how they incentivize it. Google spends billions. Mapillary tried altruism. Hivemapper grinds with crypto. Pokémon GO cracked something none of them could: a game mechanic that subsidizes the scanning behavior. You're not building a map. You're catching pokemon. The map is just a side effect.

3D scanning is still a niche hobby for reality capture nerds like me. The moment somebody gamifies dense 3D capture at scale -- not posed photos but actual geometry -- that's when this blows wide open.

Niantic sold the games for $3.5B but kept the spatial platform, with a data-sharing agreement in place. One team makes the game great, the other builds the spatial infrastructure underneath. Incentives finally aligned.

Gaming is becoming a way for humans to contribute real-world trajectories that help physical AI learn about the real world. Google does it with live traffic. Tesla does it with autopilot. The mechanic is different but the pattern is identical -- and most people are already part of at least one -- if not a majority -- of these datasets whether they realize it or not.

Bilawal Sidhu

203,106 views • 2 months ago

RIP truth - it's been nice knowing ya

Bilawal Sidhu

1,402,621 views • 1 year ago

Wow. Recreating the Shawshank Redemption prison in 3D from a single video, in real time (!)

Just read the MASt3R-SLAM paper and it's pretty neat. These folks basically built a real-time dense SLAM system on top of MASt3R, which is a transformer-based neural network that can do 3d reconstruction and localization from uncalibrated image pairs.

The cool part is they don't need a fixed camera model -- it just works with arbitrary cameras -- think different focal lengths, sensor sizes, even handling zooming in video (FMV drone video anyone?!). If you've done photogrammetry or played with NeRFs you know that is a HUGE deal.

They've solved some tricky problems like efficient point matching and tracking, plus they've figured out how to fuse point clouds and handle loop closures in real-time. Their system runs at about 15 FPS on a 4090 and produces both camera poses and dense geometry. When they know the camera calibration, they get SOTA results across several benchmarks, but even without calibration, they still perform well.

What's interesting is the approach -- most recent SLAM work has built on DROID-SLAM's architecture, but these folks went a different direction by leveraging a strong 3D reconstruction prior. Seems to give them more coherent geometry, which makes sense since that's what MASt3R was designed for.

For anyone who cares about monocular SLAM and 3D reconstruction, this feels like a significant step toward plug-and-play dense SLAM without calibration headaches -- perfect for drones, robots, AR/VR -- the works!

Bilawal Sidhu

703,654 views • 1 year ago
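
For readers curious what "camera poses plus dense geometry" in the MASt3R-SLAM post buys you, here is a rough sketch of fusing per-frame point clouds once each frame has a pose. This is not the paper's pipeline: voxel averaging is a deliberately simple stand-in for their fusion step, and the names `to_world`, `fuse`, and `voxel` are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # row-major 3x3 rotation matrix
Frame = Tuple[List[Vec3], Mat3, Vec3]  # (points in camera frame, R, t)


def to_world(points: List[Vec3], rotation: Mat3, translation: Vec3) -> List[Vec3]:
    """Apply a camera-to-world pose to each point: p_world = R @ p_cam + t."""
    out: List[Vec3] = []
    for x, y, z in points:
        out.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return out


def fuse(frames: List[Frame], voxel: float = 0.05) -> List[Vec3]:
    """Merge posed clouds into one, averaging points that land in the same voxel.

    Assumption: simple voxel-grid averaging, not the method used in the paper.
    """
    # Each cell accumulates [sum_x, sum_y, sum_z, count].
    cells: Dict[Tuple[int, ...], List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for points, rotation, translation in frames:
        for p in to_world(points, rotation, translation):
            key = tuple(math.floor(c / voxel) for c in p)
            acc = cells[key]
            acc[0] += p[0]
            acc[1] += p[1]
            acc[2] += p[2]
            acc[3] += 1
    return [(a[0] / a[3], a[1] / a[3], a[2] / a[3]) for a in cells.values()]
```

Two frames that observe the same surface from different poses end up as one averaged point per voxel, which is the basic reason good poses give you coherent geometry instead of doubled-up walls.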