Gordon Wetzstein

@GordonWetzstein • 5,295 subscribers

Professor at Stanford University & Co-founder at Rhoda AI

Shorts

High-resolution image and video generation is hitting a wall because attention in DiTs scales quadratically with token count. But does every pixel need to be in full resolution? Introducing Foveated Diffusion: a new approach for efficient diffusion-based generation that allocates compute where it matters most. 1/7🧵

164,096 views
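
To make the quadratic-scaling claim in the Foveated Diffusion post above concrete, here is a minimal Python sketch. It is illustrative arithmetic only, not the authors' method: the function names and the 16-pixel patch size per token are assumptions, and real DiTs usually tokenize a compressed latent rather than raw pixels.

```python
def token_count(height: int, width: int, patch: int = 16) -> int:
    """Tokens produced by splitting an image into patch x patch tiles.

    The patch size of 16 is an assumed, illustrative value.
    """
    return (height // patch) * (width // patch)


def attention_pairs(height: int, width: int, patch: int = 16) -> int:
    """Query-key pairs scored by full self-attention: N^2 for N tokens."""
    n = token_count(height, width, patch)
    return n * n


if __name__ == "__main__":
    # Print side length, token count, and attention pairs per resolution.
    for side in (512, 1024, 2048):
        print(side, token_count(side, side), attention_pairs(side, side))
```

Doubling the side length quadruples the token count and multiplies the number of attention pairs by 16, which is the kind of growth that motivates spending full resolution only where it matters.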

The era of ultra-high-resolution imaging has arrived. Modern image sensors exceeding 200 MP resolution are common in smartphones, with over 400 MP sensors under development. However, the large number of pixels poses significant challenges for acquisition and processing, especially on edge devices. Which pixels should be acquired, and when, for bandwidth-efficient imaging and perception? We introduce Policy-based Foveated Imaging and Perception, an on-device, real-time, predictive, and task-aware framework that dynamically allocates sensor resolution to prioritize important regions under specific perception objectives. This paper will be presented at #SIGGRAPH2026! [1/6]

18,310 views

📢Introducing Generated Reality📢 A world model for XR that turns your tracked hand and head poses into an interactive, generative video experience. Take world models to the next level by interacting with the world using your own body! 🔗 1/4

20,045 views

The context size of video world models is only a few frames. Like a human with severe memory loss! We design a long-term memory for world models based on explicit 3D representations inspired by the human mind. This enables long-term consistency. 1/3

35,075 views

🚀 Just published in Nature Photonics: synthetic aperture waveguide holography—a new path toward ultra-thin, high-quality 3D mixed reality displays. 📄 #Photonics #Holography #MR 1/5

25,501 views

Most video models:
🤯 forget the past
🐌 slow down over time
🔁 rely on bidirectional (not causal) attention
Our state-space model (SSM) video world models:
🧠 remember across hundreds of frames
⚡️ generate at constant speed
⏩ are fully causal, enabling real-time rollout
1/3

20,015 views

Videos

How do we generate videos on the scale of minutes, without drifting or forgetting about the historical context? We introduce Mixture of Contexts. Every minute-long video below is the direct output of our model in a single pass, with no post-processing, stitching, or editing. 1/4

Gordon Wetzstein

158,208 views • 10 months ago

Video world models today have a very limited context length. Mode Seeking meets Mean Seeking (MMM) unlocks long-context, persistent video world models through a unified representation. 1/8 🧵

Gordon Wetzstein

43,597 views • 4 months ago

We built a real-time multiplayer game generated entirely by a neural network—and now you can actually play it. In collaboration with Modal, we just launched the live demo for MultiGen, our diffusion-based multiplayer game engine. Grab some friends and try it here 👇

Gordon Wetzstein

26,537 views • 4 months ago

This is a step forward for long-context video tuning, which has already unlocked impressive multi-shot video generation pipelines from . Generating minutes- or even hour-long videos without drifting or forgetting is not "coming soon" -- it is already here. 3/4

Gordon Wetzstein

27,473 views • 10 months ago

Gaussian Shell Maps are a new neural scene representation that connects fields and 3D Gaussians. This representation unlocks the full potential of 3D Gaussian splatting for generative AI applications, such as 3D avatar generation. 1/2

Gordon Wetzstein

52,480 views • 2 years ago

Video generation of humans with control over body pose and facial expressions is crucial for many applications. Toward this goal, we introduce a new interspatial attention (ISA) mechanism as a scalable building block for DiT-based video generation models. #SIGGRAPH2025

Gordon Wetzstein

28,407 views • 1 year ago

Introducing FLARE (#CVPR2025). FLARE is a feed-forward model that simultaneously estimates high-quality camera poses, 3D geometry, and appearance from sparse uncalibrated images. 1/4

Gordon Wetzstein

29,518 views • 1 year ago

Excited to share our new #SIGGRAPH2025 paper! In this work, we show how to combine Gaussian splatting and computer-generated holography using Gaussian Wave Splatting. This enables photorealistic 3D holograms for emerging holographic VR/AR displays. 1/4

Gordon Wetzstein

21,220 views • 1 year ago

Current 3D generative models are slow and low quality. We present GRM, a large-scale model that reconstructs 3D Gaussians in 0.1s and generates high-quality 3D assets from text or single images in a few seconds. Demo: 1/4

Gordon Wetzstein

19,210 views • 2 years ago
