
Georgia Gkioxari
@georgiagkioxari • 12,129 subscribers
Assistant professor in Computing + Mathematical Sciences @Caltech 🏛️ ∙ Computer vision enthusiast 🤖 ∙ SAM 3D, PyTorch3D ∙ Ex: FAIR, Google 👩🏻‍💻 ∙ From 🇬🇷

3D editing is hard: you need to ground an image + instruction and generate a faithful 3D shape in one forward pass -- no test-time optimization. So, we steer pretrained image-to-3D representations to do text-guided 3D edits; no massive 3D edit-pair dataset needed. Key trap: the “no-edit” solution is a nasty local minimum. We fix it with preference optimization, pushing the model to actually edit. Steer3D is the second work that adapts alignment ideas from LLMs to the 3D modality. SAM 3D also used DPO to improve its 3D generations.
Georgia Gkioxari • 115,844 views • 5 months ago
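
For readers unfamiliar with DPO, here is a minimal sketch of the standard pairwise DPO loss on scalar log-probabilities. It is not Steer3D's or SAM 3D's actual code: the function names, the `beta` value, and the idea of treating a real edit as the "chosen" sample and the unchanged "no-edit" output as the "rejected" one are assumptions made for illustration only.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value, not taken from the post
) -> float:
    """Standard DPO loss for one preference pair.

    Assumption for this sketch: "chosen" is an output that actually applies
    the edit, "rejected" is the no-edit output.
    """
    # How much more the policy likes each sample than the frozen reference does.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-logits)


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy now prefers the edited output over the no-edit one: loss drops.
    print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss only falls when the policy raises the likelihood of the chosen sample relative to the rejected one, which is how a preference objective can push a model away from a "do nothing" solution.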

💫 It's fascinating that a single feed-forward pass through an LLM can replace a complex rendering pipeline, like Blender! Just feed it 3D shapes, xyz positions, and poses as tokens, and it spits out the image token-by-token. The dual, aka scene reconstruction, is also possible! 👇
Georgia Gkioxari • 44,553 views • 1 year ago
