This AI paper just solved Google Earth's biggest problem. Satellites look down. Humans look across. That perspective gap is why 3D maps are limited to cities you can blanket with aerial flyovers. Skyfall-GS bridges the gap by synthesizing the views we never captured - rebuilding missing facades and street-level...

135,003 views • 6 months ago • via X (Twitter)

Related Videos

So these researchers figured out you can basically hallucinate 3D cities into existence using just satellite photos & a diffusion model.

The problem's pretty straightforward: satellites only see rooftops. Building facades? Invisible. Street-level detail? Doesn't exist. But people want flyable 3D environments, which means you need all that occluded geometry.

When I worked on Google Maps photogrammetry, we could only use satellite-based 3D for isolated stuff like the pyramids - anything city-scale required airplane flyovers. Which is fine until you hit aerial-denied regions where you literally can't fly. Huge chunks of the world are just unavailable.

Their trick is honestly kind of beautiful. They train Gaussian splats on satellite views, but as the camera descends toward ground level, the renders turn to absolute garbage - artifacts everywhere. Instead of fighting this, they treat those nightmare renders as the input to a diffusion model. Basically: "hey FLUX, fix this mess."

Then here's where it gets clever: they generate multiple diffusion samples per view instead of committing to one. Any single denoising path is probably wrong in 3D space, but if you generate a couple and let the GS optimization find consensus across them, you get actual geometric consistency.

They do this in episodes, curriculum style - start high, gradually descend (hence the name Skyfall-GS!). With each iteration the ground-level views get less fucked. By the end you've got real-time flyable cities that look surprisingly real, and the geometry still matches the satellite input.

No 3D training data. No street-level photos. Just satellites + diffusion doing what it does best - filling in the blanks. It's like neural scene completion but actually practical, and it unlocks basically the entire world.
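A minimal Python sketch of the two ideas in the post, under loud assumptions: the linear elevation schedule, the function names `descent_schedule` and `fit_to_samples`, and reducing the whole splat optimization to fitting a single scalar are hypothetical illustrations, not Skyfall-GS's actual code. It shows (1) a curriculum that starts high and descends over episodes, and (2) why fitting one set of parameters to several diffusion samples with a squared-error loss lands on their consensus: the minimizer of the mean squared error to K targets is their mean, so disagreements between individual samples tend to average out.

```python
"""Toy sketch of a Skyfall-GS-style curriculum (hypothetical, not the paper's code)."""


def descent_schedule(start_elev_deg: float, end_elev_deg: float, episodes: int) -> list[float]:
    """Camera elevation in degrees for each episode, descending linearly.

    Linear spacing is an assumption; the post only says "start high,
    gradually descend".
    """
    if episodes < 1:
        raise ValueError("episodes must be >= 1")
    if episodes == 1:
        return [end_elev_deg]
    step = (start_elev_deg - end_elev_deg) / (episodes - 1)
    return [start_elev_deg - i * step for i in range(episodes)]


def fit_to_samples(samples: list[float], init: float = 0.0,
                   lr: float = 0.1, iters: int = 500) -> float:
    """Fit one scalar parameter to K targets by gradient descent on their MSE.

    Stand-in for the splat optimization: the MSE minimizer is the sample
    mean, i.e. the consensus of the K diffusion samples.
    """
    if not samples:
        raise ValueError("need at least one sample")
    x = init
    k = len(samples)
    for _ in range(iters):
        # d/dx of (1/K) * sum (x - s)^2  =  (2/K) * sum (x - s)
        grad = 2.0 * sum(x - s for s in samples) / k
        x -= lr * grad
    return x


if __name__ == "__main__":
    for elev in descent_schedule(60.0, 10.0, 6):
        print(f"episode at {elev:.0f} deg elevation")
    # three hypothetical, mutually inconsistent "diffusion samples" for one pixel
    print(round(fit_to_samples([0.8, 0.6, 0.7]), 6))  # 0.7
```

Here one scalar stands in for one pixel's colour; in a real pipeline the optimized parameters would be the Gaussians themselves, and the K targets would be separately refined renders of the same camera view.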

Bilawal Sidhu

241,774 views • 7 months ago

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes geometric consistency at the expense of texture fidelity. We further propose Bootstrapped Score Distillation to specifically boost the texture. We train a personalized diffusion model with DreamBooth on augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and the 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which in turn offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state of the art in 3D content generation.
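For reference, the score distillation sampling (SDS) gradient the abstract builds on is usually written as below, in the form introduced by DreamFusion. This is the generic formulation, not DreamCraft3D's exact loss: θ are the 3D scene parameters, g is a differentiable renderer producing the image x = g(θ), t is a diffusion timestep, ε is Gaussian noise, x_t is the noised render, y is the conditioning, w(t) is a timestep weighting, and ε̂_φ is the diffusion model's noise prediction. The view-dependent conditioning and extra training strategies DreamCraft3D adds are not captured here.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, x = g(\theta)\big)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right],
\qquad x_t = \alpha_t x + \sigma_t \epsilon
```

In the Bootstrapped Score Distillation stage the abstract describes, ε̂_φ would be the DreamBooth-personalized model, itself re-trained on renderings of the current scene, which is what makes the optimization alternate between the diffusion prior and the 3D representation.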

AK

161,400 views • 2 years ago