Shubham Tulsiani's banner

Shubham Tulsiani

@shubhtuls • 3,513 subscribers

Assistant Professor in the Robotics Institute, Carnegie Mellon University I want to build perception systems that can understand the physical world

Shorts

[1/6] What representation comes to mind when you think of a ‘camera’? Perhaps an extrinsic + intrinsic matrix? In our ICLR (oral) paper, we instead infer a distributed representation where each pixel is associated with a ray, and show SoTA results for few-view pose estimation.

[1/6] What representation comes to mind when you think of a ‘camera’? Perhaps an extrinsic + intrinsic matrix? In our ICLR (oral) paper, we instead infer a distributed representation where each pixel is associated with a ray, and show SoTA results for few-view pose estimation.

141,815 Aufrufe

[1/6] Our #CVPR2025 paper “DiffusionSfM” extends our RayDiffusion framework — inferring both geometry and cameras via diffusing pixelwise ray origins and endpoints.

[1/6] Our #CVPR2025 paper “DiffusionSfM” extends our RayDiffusion framework — inferring both geometry and cameras via diffusing pixelwise ray origins and endpoints.

41,433 Aufrufe

Videos

Anya Rossi

sweetdream.ai

SweetDream.ai•Sponsored•Livecam

Watch Anya Live

Anya is streaming live right now! Join her private show and enjoy exclusive content.

Exclusive private shows

1.2k viewers online

Private Show

Join now for exclusive access

Free preview available • Premium content

[1/N] Rotary Position Embeddings (RoPE) are ubiquitous across transformers that process tokens from 1D, 2D, or 3D grids e.g. language, images, or videos. Our RayRoPE formulation extends these to multi-view transformers. Paper and code:

[1/N] Rotary Position Embeddings (RoPE) are ubiquitous across transformers that process tokens from 1D, 2D, or 3D grids e.g. language, images, or videos. Our RayRoPE formulation extends these to multi-view transformers. Paper and code:

Shubham Tulsiani

72,553 Aufrufe • vor 5 Monaten

[1/N] We present a plug-and-play mechanism to controllably steer inference of any diffusion/flow model towards a sharper or flatter sampling distribution, resulting in improvements across domains e.g. text-to-image (10% FID reduction), protein generation (improved designability).

[1/N] We present a plug-and-play mechanism to controllably steer inference of any diffusion/flow model towards a sharper or flatter sampling distribution, resulting in improvements across domains e.g. text-to-image (10% FID reduction), protein generation (improved designability).

Shubham Tulsiani

82,999 Aufrufe • vor 9 Monaten

[1/N] Current visual geometry prediction models primarily rely on labeled 3D data. Our CVPR26 paper, Flow3r, allows additionally leveraging unlabeled videos (using flow supervision) for scalable visual geometry learning, enabling accurate multi-view 3D reconstruction in-the-wild.

[1/N] Current visual geometry prediction models primarily rely on labeled 3D data. Our CVPR26 paper, Flow3r, allows additionally leveraging unlabeled videos (using flow supervision) for scalable visual geometry learning, enabling accurate multi-view 3D reconstruction in-the-wild.

Shubham Tulsiani

16,465 Aufrufe • vor 4 Monaten

Keine weiteren Inhalte verfügbar