There's a problem with 3D human pose & shape (HPS) estimation methods. You either get good 3D accuracy or good alignment with the image, but not both. Why? The current top methods use the wrong camera model. TokenHMR at #CVPR2024 analyzes the issue and presents a solution. (1/8)

80,462 views • 2 years ago • via X (Twitter)

11 comments

Michael Black • 2 years ago

Current HPS methods use a simplified camera model that differs from the true camera. With the wrong camera, you have to distort the body pose or shape so that projected 3D features match the image. Estimating the true camera, however, is a challenging and unsolved problem (2/8)
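A minimal sketch of the mismatch, assuming a pinhole camera: the same 3D joints are projected once with a "true" focal length and once with a fixed, much larger one. The joint coordinates and both focal lengths are hypothetical values chosen for illustration, not numbers from the paper.

```python
import numpy as np

def project(points_3d, focal, center):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    return focal * points_3d[:, :2] / points_3d[:, 2:3] + center

# Hypothetical joints (head, pelvis, foot) in metres, camera frame.
joints = np.array([[0.0, -0.8, 3.0],
                   [0.0,  0.0, 3.2],
                   [0.1,  0.9, 2.8]])
center = np.array([128.0, 128.0])

true_focal = 300.0      # assumed true focal length (pixels)
assumed_focal = 5000.0  # assumed fixed focal length of the simplified camera

observed = project(joints, true_focal, center)

# Under the simplified camera, push the body back so the pelvis keeps
# the same image scale, then reproject the *unchanged* 3D joints.
root_depth = joints[1, 2]
shift = np.array([0.0, 0.0, root_depth * assumed_focal / true_focal - root_depth])
reprojected = project(joints + shift, assumed_focal, center)

error_px = np.linalg.norm(reprojected - observed, axis=1)
print(error_px)  # per-joint 2D error of the correct 3D body under the wrong camera
```

The pelvis lines up, but the head and foot do not: the correct 3D body no longer matches the image, so a method forced to match the 2D evidence has to bend the pose or shape instead.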

Michael Black • 2 years ago

Using BEDLAM, a synthetic dataset with perfect ground-truth (GT), we quantitatively evaluate the problem. With the HMR2.0 camera, we evaluate the 2D projection error of 3D bodies computed by HMR2.0 and GT bodies. With the wrong camera, HMR2.0 gets lower 2D error than GT. (3/8)

Michael Black • 2 years ago

On the flip side, low 2D reprojection error results in worse 3D accuracy. For a given 2D image alignment error, there are effectively an infinite number of 3D poses that can produce this, and they can be really bad. Training a method with a 2D loss and wrong camera is bad. (4/8)
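The ambiguity can be illustrated with a small sketch, assuming a pinhole camera with a hypothetical focal length: sliding each joint along its own camera ray leaves the 2D projection unchanged while the 3D error grows. The joints and scale factors are made up for illustration.

```python
import numpy as np

def project(points_3d, focal=1000.0):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    return focal * points_3d[:, :2] / points_3d[:, 2:3]

# Hypothetical ground-truth joints in metres, camera frame.
gt = np.array([[0.0, -0.8, 3.0],
               [0.0,  0.0, 3.0],
               [0.2,  0.9, 3.0]])

# Scale each joint independently along the ray through the camera centre.
scales = np.array([[1.0], [1.2], [0.8]])
alt = gt * scales

err_2d = np.linalg.norm(project(alt) - project(gt), axis=1)
err_3d = np.linalg.norm(alt - gt, axis=1)
print(err_2d)  # numerically zero for every joint
print(err_3d)  # tens of centimetres for the moved joints
```

A 2D loss cannot tell `gt` and `alt` apart, which is why a perfect 2D fit says little about 3D quality.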

Michael Black • 2 years ago

3D pseudo-GT that's estimated from 2D with the wrong camera has the same issue. To address this, we introduce two solutions. First, with 2D data, the loss should not try to fit it too well. Our new TALS loss penalizes large 2D errors while down-weighting small ones. (5/8)
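One way to read "penalize large 2D errors while down-weighting small ones" is a thresholded weight on the per-keypoint error. The sketch below is an illustration of that idea only, not the paper's TALS formulation; the function name, `threshold`, and `low_weight` are hypothetical.

```python
import numpy as np

def threshold_scaled_loss(pred_2d, target_2d, threshold=0.05, low_weight=0.1):
    """Mean keypoint error where errors below `threshold` are scaled down.

    Hypothetical illustration: keypoints that already fit "well enough"
    contribute little, so the model is not pushed to fit 2D data exactly.
    """
    err = np.linalg.norm(pred_2d - target_2d, axis=-1)
    weights = np.where(err < threshold, low_weight, 1.0)
    return float(np.mean(weights * err))

target = np.zeros((1, 2))
small = threshold_scaled_loss(target + np.array([0.01, 0.0]), target)
large = threshold_scaled_loss(target + np.array([0.3, 0.4]), target)
print(small, large)
```

With these values, the small error is scaled by `low_weight` while the large one is passed through unchanged.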

Michael Black • 2 years ago

With TALS, common pose priors have too much influence. Thus we use a VQ-VAE, trained on AMASS & MOYO, to convert continuous poses to a discrete token representation. This pre-trained tokenizer provides a vocabulary of valid poses. Pose regression becomes classification. (6/8)
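The quantization step at the heart of a VQ-VAE can be sketched as a nearest-neighbour lookup in a codebook. This shows only that lookup, with a random codebook standing in for a trained one; the codebook size, latent dimension, and function name are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical codebook: 512 tokens, 32-dimensional embeddings.
# In a trained tokenizer these would be learned from pose data.
codebook = rng.normal(size=(512, 32))

def quantize(latents, codebook):
    """Map each (N, D) latent to the index and embedding of its nearest code."""
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = dists.argmin(axis=1)
    return tokens, codebook[tokens]

# Latents that sit close to three known codebook entries.
latents = codebook[[3, 100, 511]] + 0.01 * rng.normal(size=(3, 32))
tokens, quantized = quantize(latents, codebook)
print(tokens)
```

Because every pose is now a sequence of token indices, a regressor can predict a distribution over codebook entries (classification) instead of continuous pose parameters.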

Michael Black • 2 years ago

TokenHMR estimates 3D HPS using a discrete tokenized pose representation. Our TALS loss mitigates some of the bias caused by simplified camera models and biased pseudo-GT. This enables training on 2D data for robustness without losing 3D accuracy. (7/8)

Michael Black • 2 years ago

Kudos to the authors: @saidwivedi, @yusun14567741, @PriyankaP1201, @YaoFeng1995 and @Michael_J_Black from @MPI_IS, @meshcapade and @ETH_en. arXiv: Code and models are available at (8/8)

Tope Ibrahim • 2 years ago

Dear Prof. Black, I will be attending the CVPR conference in Seattle from 18 to 21 June as a first-timer. Over the years, you have remained one of the researchers I often draw inspiration from, and I would be very honoured to meet you in person.

Michael Black • 2 years ago

I look forward to meeting you! Come find me at one of our posters.

Mathieu Tuli • 2 years ago

Great work, excited to come chat at CVPR. We’ll be there presenting FlowFace as well (face tracking from 2D video); would love to have you come by.

Michael Black • 2 years ago

Thanks for sharing this! I like the UV-flow idea. It combines two of my favorite things: 3D shape estimation and optical flow :) Fun fact: my very first paper on human faces used optical flow for expression recognition.
