Felix Heide

@_FelixHeide_ • 1,966 subscribers

Princeton Computational Imaging Lab: https://t.co/n8gRRpdvr4 Head of AI at Torc Robotics: https://t.co/7RonQDi1MJ

Shorts

Are we done with object detection? What about tiny objects beyond 200 meters? 🔎 Telescope 🔭 addresses long-range perception by explicitly tackling extreme scale imbalance ⚖️ in images. It hinges on a learnable hyperbolic foveation transform from a low-resolution image, magnifying distant regions 🔍 while compressing nearby ones - effectively normalizing object scales with minimal computational overhead. Objects are detected in the transformed (Riemannian) space using a novel bounding box parameterization and are then mapped back to the original image. Project:

188,500 views
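To make the foveation idea in the Telescope post above concrete, here is a minimal sketch in Python. It is not the project's learnable transform: the tanh-based warp, the strength parameter `k`, and the names `foveate`, `unfoveate`, and `box_to_original` are all assumptions chosen for illustration. It only shows the general pattern of magnifying a region of interest, compressing the rest, and mapping detections back through the inverse warp.

```python
import math

def foveate(x: float, k: float = 2.0) -> float:
    """Warp a normalized coordinate x in [-1, 1] (hypothetical convention).

    x = 0 is the assumed fovea center, e.g. a distant region of the road.
    The slope is > 1 near 0 (magnification) and < 1 near +/-1 (compression).
    """
    return math.tanh(k * x) / math.tanh(k)

def unfoveate(y: float, k: float = 2.0) -> float:
    """Inverse warp: map a warped coordinate back to the original image."""
    return math.atanh(y * math.tanh(k)) / k

def box_to_original(box, k: float = 2.0):
    """Map an axis-aligned box (x0, y0, x1, y1) from warped to original space.

    The warp is applied separably per axis and is monotonic, so the box
    stays axis-aligned after the mapping.
    """
    return tuple(unfoveate(c, k) for c in box)

if __name__ == "__main__":
    # A small object near the fovea center occupies more of the warped image.
    original_width = 0.05 - (-0.05)
    warped_width = foveate(0.05) - foveate(-0.05)
    print(f"magnification near center: {warped_width / original_width:.2f}x")
```

A detector would run on the warped image, and each predicted box would then go through `box_to_original`. In this sketch `k` is a fixed constant; the post describes the transform as learnable.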

Chop the gradients ✂️! We found that truncating decoder gradients in latent video diffusion to a fixed window allows us to finetune on videos with pixel-wise perceptual losses without running out of memory. Pixel losses have been essential for image generation and reconstruction, but until now, they haven't scaled to long-duration, high-resolution video diffusion due to recursive activation accumulation in causal decoders, leading to OOM during training 💥📉. Project: Video diffusion models can do a lot more 🚀 when you can backprop the decoder! Post-process neural rendered scenes, super-resolve videos, harmonize lighting in controlled synthetic driving scenes, and inpaint videos — all in a single step ⚡ with a quick finetune from a standard diffusion model.

28,323 views
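The gradient truncation described in the post above can be illustrated with a toy example. This is a sketch under strong assumptions, not the project's implementation: the causal decoder is replaced by a hypothetical scalar recurrence `h_t = a * h_{t-1} + z_t`, the loss is the sum of all `h_t`, and the function name `decoder_grads` is made up. It shows how limiting backpropagation to a fixed window changes which earlier latents receive gradient from each frame's loss.

```python
from typing import List, Optional

def decoder_grads(num_frames: int, a: float,
                  window: Optional[int] = None) -> List[float]:
    """Gradient of loss = sum_t h_t w.r.t. each latent z_s in a toy decoder.

    Toy causal decoder (an assumption): h_t = a * h_{t-1} + z_t.
    window=None: the loss on frame t backpropagates to every earlier frame.
    window=W:    the loss on frame t backpropagates through the last W frames only.
    """
    grads = [0.0] * num_frames
    for t in range(num_frames):                 # loss term on frame t
        first = 0 if window is None else max(0, t - window + 1)
        for s in range(first, t + 1):           # latents that still get gradient
            grads[s] += a ** (t - s)            # d h_t / d z_s
    return grads

if __name__ == "__main__":
    print(decoder_grads(4, 0.5))             # full backpropagation
    print(decoder_grads(4, 0.5, window=2))   # truncated to a 2-frame window
```

With full backpropagation, the number of contributing steps per frame grows with the video length. With a fixed window it is bounded by `W`, which is the property the post credits for avoiding out-of-memory failures.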

Videos

ScenarioControl 🚗🛣️ - Scenario Generation from a single Dashcam Image 📸 or Text Prompt 💬!! Excited to introduce a new vision-language control mechanism for learned driving scenario generation. Given a single dashcam image or a scene prompt, we generate a full scene layout 🧩 and temporally consistent rollouts, including map 🗺️, agents 🚗, and ego video 🛣️. ScenarioControl enables direct, fine-grained control over layout and traffic while preserving realism. It operates in a vectorized latent space with a new cross-global control mechanism that fuses vision-language inputs with scene structure. It interfaces seamlessly with generative video models! Project: Super fun project by Lili Gao, Yanbo Xu, William Koch, Samuele Ruffino, Luke Rowe, Behdad Chalaki, Dmitriy Rivkin, Julian Ost, Roger Girgis, Mario Bijelic.

22,284 views • 3 months ago

Splines instead of Gaussians 😉 Introducing Neural Spline Fields, which can see through occlusions! We learn to represent a stack of misaligned captures as a multi-layer image sandwich. Then you can extract your favorite layer to remove occlusions, reflections, or even your own shadows from the scene! Paper and Code: Fun work by Ilya Chugunov, David Shustin, Ruyu Yan, and Chenyang Lei.

137,306 views • 2 years ago

WorldFlow3D: Unbounded 3D World Generation 🌍 by Flow Through Hierarchical Distributions, without VAEs! We reformulate 3D generation as flowing through sequentially finer 3D distributions, cutting training time by more than half ⏱️ compared to existing approaches! Vectorized map layouts provide full scene controllability 🗺️, and a novel flow-field alignment process enables causally coherent, spatially unbounded generation 🌍. This generative method generalizes across both real and synthetic data distributions! Project: Project led by Amogh Joshi and Julian Ost. It will be super fun to build on this! 🔥

19,517 views • 3 months ago

Ultra-thin flat cameras are possible with nanophotonic optics! Excited to share recent work that shrinks the entire optical stack down to a 700-nanometer-thick layer of optics on the sensor cover glass. We will present this paper later this week at #SIGGRAPHAsia2023! Paper: Super fun work with Praneeth Chakravarthula and Jipeng Sun from Princeton University, and our friends from UW, Arka Majumdar and Johannes E. Fröch.

99,697 views • 2 years ago

Excited to share our #NeurIPS2025 work on learning motion hierarchies! We introduce a general hierarchical graph learning method that learns structured, interpretable motion directly from data, with no prior structure or assumptions needed! Project and Paper: Amazing work led by William Koch, Cheng Zheng, and Baiang Li! See us in San Diego for #NeurIPS2025!

25,314 views • 7 months ago

Large-scale 3D Scene Generation (all scenes are real-time rendered)!! Physically-grounded generative data without hallucinations is the missing link for robot learning and testing at scale. We introduce a method that directly generates large-scale 3D driving scenes with accurate geometry, allowing for causal view synthesis and generation with object permanence and explicit 3D geometry. This also allows for extreme trajectory extrapolation without failure! We also show that we can build fully data-driven simulators for end-to-end learning with this approach. Project: with the amazing team of Julian Ost, Amogh Joshi, Andrea Ramazzina, Maximilian Bömer, Mario Bijelic.

27,779 views • 10 months ago

3D Object Tracking without Training Data? In our Nature Machine Intelligence paper, we recast 3D tracking as an inverse neural rendering task: we fit a scene graph to an image so that it best explains that image. The method generalizes to completely unseen datasets and is explainable. Project and Code: Fun collaboration between Princeton Computer Science and Torc Robotics, with Julian Ost and Tanushree Banerjee leading this project.

27,858 views • 11 months ago

Starting the new year without human labeling 🎉!! Multimodal LiDAR-camera data is a gold mine of dense 3D geometry hiding in plain sight. For supervised pretraining and validation at scale at Torc Robotics, we rely on fully automated pseudo-labeling pipelines. Geometric priors from temporally accumulated LiDAR maps and an iterative update rule enforce joint geometric-semantic consistency while detecting moving objects via inconsistencies. We achieve 3D semantic labels and 3D bounding boxes with human-like quality at the 200m+ range required for highway driving. Paper: Exciting work at Torc Robotics with Filippo Ghilotti, Samuel Brucker, Nahku Saidy, Matteo Matteucci, Mario Bijelic.

18,210 views • 6 months ago

Evaluating Neural Networks at the Speed of Light (with Light!). See live optical inference in the video below. Excited to share recent academic work on optical neural networks as a collection of computing elements embedded in the camera lens! These elements perform computation optically even before an image is captured, using the photons in the scene instead of GPU computation after the capture. We were able to achieve ImageNet classification more than two orders of magnitude faster than conventional neural networks on today's GPUs at almost no power consumption! To do this, we developed an array of metalenses that perform this computation on light from the scene. Project: Paper: Amazing collaboration with Kaixuan Wei, Xiao Li, Johannes Froech, Praneeth Chakravarthula, James Whitehead, Ethan Tseng, Arka Majumdar.

29,895 views • 1 year ago

Code for Neural Spline Fields is out! We use burst image fusion to see through occlusions and remove reflections. We just released the code here: Comes with a fun set of notebooks and tutorials!

19,117 views • 2 years ago

Implicit Neural Light Spheres lets you turn panoramic captures into dynamic wide FOV renders (with real-time rendering!). Instead of generating panoramas with image stitching, we use neural light spheres to jointly estimate the camera path and a high-resolution scene reconstruction to produce novel wide field-of-view projections of the environment. Code, data, and info: Amazing work with Ilya Chugunov, Amogh Joshi, Kiran Murthy, François Bleibel

10,301 views • 1 year ago