There's a problem with 3D human pose & shape (HPS) estimation methods. You either get good 3D accuracy or good alignment with the image, but not both. Why? The current top methods use the wrong camera model. TokenHMR at #CVPR2024 analyzes the issue and presents a solution. (1/8)

80,462 views • 2 years ago • via X (Twitter)

11 comments

Michael Black • 2 years ago

Current HPS methods use a simplified camera model that differs from the true camera. With the wrong camera, you have to distort the body pose or shape so that projected 3D features match the image. Estimating the true camera, however, is a challenging and unsolved problem (2/8)
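A minimal sketch of the camera mismatch described above, using a plain pinhole projection in NumPy. All numbers (the three joints, both focal lengths, the image center) are hypothetical and are not taken from HMR2.0 or TokenHMR. The same 3D joints are observed with a "true" camera and then reprojected with an assumed, much longer focal length after pushing the body back so that its overall scale matches.

```python
import numpy as np

def project(points_3d, focal, center):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixels."""
    xy = points_3d[:, :2] / points_3d[:, 2:3]
    return focal * xy + center

# Hypothetical joints (metres, camera coordinates) and camera parameters.
joints = np.array([[0.0, -0.8, 3.0],
                   [0.3,  0.0, 2.6],
                   [-0.3, 0.9, 3.4]])
center = np.array([128.0, 128.0])
true_focal = 300.0      # assumed "true" camera
assumed_focal = 5000.0  # assumed simplified camera

observed_2d = project(joints, true_focal, center)

# With the wrong focal length, move the body back so its mean depth
# scales by the focal ratio; this matches overall image size only.
mean_z = joints[:, 2].mean()
shift = np.array([0.0, 0.0, mean_z * (assumed_focal / true_focal - 1.0)])
reprojected_2d = project(joints + shift, assumed_focal, center)

err_true = np.linalg.norm(project(joints, true_focal, center) - observed_2d, axis=1).mean()
err_wrong = np.linalg.norm(reprojected_2d - observed_2d, axis=1).mean()
print(f"2D error, true camera:    {err_true:.2f} px")
print(f"2D error, assumed camera: {err_wrong:.2f} px")
```

In this toy setup the correct 3D joints no longer line up with the image under the assumed camera, so a method that minimizes 2D error with that camera has to move the joints away from their true 3D positions.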

Michael Black • 2 years ago

Using BEDLAM, a synthetic dataset with perfect ground-truth (GT), we quantitatively evaluate the problem. With the HMR2.0 camera, we evaluate the 2D projection error of 3D bodies computed by HMR2.0 and of GT bodies. With the wrong camera, HMR2.0 gets lower 2D error than GT. (3/8)

Michael Black • 2 years ago

On the flip side, low 2D reprojection error results in worse 3D accuracy. For a given 2D image alignment error, there are effectively an infinite number of 3D poses that can produce this, and they can be really bad. Training a method with a 2D loss and wrong camera is bad. (4/8)
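The ambiguity can be illustrated with a small NumPy sketch. The joints, camera values, and per-joint scale factors below are hypothetical. Sliding each joint along its own camera ray leaves its 2D projection unchanged while the 3D configuration changes substantially.

```python
import numpy as np

def project(points_3d, focal, center):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixels."""
    xy = points_3d[:, :2] / points_3d[:, 2:3]
    return focal * xy + center

# Hypothetical joints (metres, camera coordinates) and camera parameters.
joints = np.array([[0.0, -0.8, 3.0],
                   [0.3,  0.0, 2.6],
                   [-0.3, 0.9, 3.4]])
focal, center = 300.0, np.array([128.0, 128.0])

# Scale every joint along its own ray through the camera origin.
ray_scales = np.array([[1.0], [1.5], [0.6]])
alt_joints = joints * ray_scales

same_2d = np.allclose(project(joints, focal, center),
                      project(alt_joints, focal, center))
err_3d = np.linalg.norm(alt_joints - joints, axis=1).mean()
print("identical 2D projection:", same_2d)
print(f"mean 3D joint error: {err_3d:.2f} m")
```

A real body model restricts which of these configurations are reachable, but the sketch shows why zero 2D error alone says little about 3D accuracy.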

Michael Black • 2 years ago

3D pseudo-GT that's estimated from 2D with the wrong camera has the same issue. To address this, we introduce two solutions. First, with 2D data, the loss should not try to fit it too well. Our new TALS loss penalizes large 2D errors while down-weighting small ones. (5/8)
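The thread does not give the exact form of TALS, so the following is only a sketch of the stated idea: keep the full penalty on large 2D errors and down-weight small ones. The function name, `threshold`, and `small_weight` are hypothetical, and the published loss may be defined differently.

```python
import numpy as np

def thresholded_2d_loss(pred_2d, target_2d, threshold=0.05, small_weight=0.1):
    """Mean per-joint 2D error, with errors at or below `threshold` scaled down.

    Illustrative only: `threshold` and `small_weight` are assumed values.
    """
    err = np.linalg.norm(pred_2d - target_2d, axis=-1)
    weights = np.where(err > threshold, 1.0, small_weight)
    return float(np.mean(weights * err))

# Two joints in normalized image coordinates: one nearly aligned, one far off.
target = np.zeros((2, 2))
pred = np.array([[0.003, 0.004],   # error 0.005, below the threshold
                 [0.3,   0.4]])    # error 0.5, above the threshold

loss = thresholded_2d_loss(pred, target)
plain = float(np.mean(np.linalg.norm(pred - target, axis=-1)))
print(f"thresholded loss: {loss:.5f}, plain mean error: {plain:.5f}")
```

Under this weighting, the optimizer gets little reward for pushing an already small 2D error toward zero, which is where fitting a wrong camera would start to distort the 3D pose.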

Michael Black • 2 years ago

With TALS, common pose priors have too much influence. Thus we use a VQ-VAE, trained on AMASS & MOYO, to convert continuous poses to a discrete token representation. This pre-trained tokenizer provides a vocabulary of valid poses. Pose regression becomes classification. (6/8)
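The quantization step at the core of a VQ-VAE can be sketched in a few lines of NumPy. This is not the TokenHMR tokenizer: the codebook here is a tiny hand-made array rather than one learned from AMASS and MOYO, and the encoder and decoder networks are omitted. It only shows how a continuous latent maps to a token id and how predicting a pose becomes picking a class.

```python
import numpy as np

# Hypothetical codebook: 8 tokens in a 4-D latent space.
# A real tokenizer would learn these entries from pose data.
codebook = np.concatenate([np.eye(4), -np.eye(4)], axis=0)

def quantize(latents, codebook):
    """Map each (D,) latent to the index of its nearest codebook entry."""
    dists = np.linalg.norm(latents[:, None, :] - codebook[None, :, :], axis=-1)
    return np.argmin(dists, axis=1)

def decode(token_ids, codebook):
    """Look up the codebook entry for each token id."""
    return codebook[token_ids]

# Tokenize two continuous latents that lie near entries 2 and 5.
latents = codebook[[2, 5]] + 0.05
tokens = quantize(latents, codebook)

# Classification view: a network outputs logits over the vocabulary,
# and the predicted latent is the codebook entry of the winning class.
logits = np.full((2, len(codebook)), -1.0)
logits[0, 2] = 3.0
logits[1, 5] = 3.0
predicted = decode(np.argmax(logits, axis=1), codebook)
print("tokens:", tokens.tolist())
```

Because every output is a codebook entry, predictions stay inside the learned vocabulary even when the 2D supervision is only loosely fitted.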

Michael Black • 2 years ago

TokenHMR estimates 3D HPS using a discrete tokenized pose representation. Our TALS loss mitigates some of the bias caused by simplified camera models and biased pseudo-GT. This enables training on 2D data for robustness without losing 3D accuracy. (7/8)

Michael Black • 2 years ago

Kudos to the authors: @saidwivedi, @yusun14567741, @PriyankaP1201, @YaoFeng1995 and @Michael_J_Black from @MPI_IS, @meshcapade and @ETH_en. arXiv: Code and models are available at (8/8)

Tope Ibrahim • 2 years ago

Dear Prof. Black, I will be attending the CVPR conference in Seattle between 18-21 of June as a first-timer. Over the years, you have remained one of the researchers I often draw inspiration from, and I will be very honoured to meet you in person.

Michael Black • 2 years ago

I look forward to meeting you! Come find me at one of our posters.

Mathieu Tuli • 2 years ago

Great work, excited to come chat at CVPR. We'll be there presenting FlowFace as well (face tracking from 2D video); would love to have you come by.

Michael Black • 2 years ago

Thanks for sharing this! I like the UV-flow idea. It combines two of my favorite things: 3D shape estimation and optical flow :) Fun fact: my very first paper on human faces used optical flow for expression recognition.
