There's a problem with 3D human pose & shape (HPS) estimation methods. You either get good 3D accuracy or good alignment with the image, but not both. Why? The current top methods use the wrong camera model. TokenHMR at #CVPR2024 analyzes the issue and presents a solution. (1/8)
Current HPS methods use a simplified camera model that differs from the true camera. With the wrong camera, you have to distort the body pose or shape so that projected 3D features match the image. Estimating the true camera, however, is a challenging and unsolved problem (2/8)
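
To make the camera mismatch concrete, here is a minimal NumPy sketch. It is not code from TokenHMR or HMR2.0: the focal lengths, the toy three-point "body", and the helper names `project` and `reprojection_gap` are all assumptions chosen for illustration. It projects a body with a "true" pinhole camera, then re-projects the same body with a wrong focal length after pushing it back in depth so the overall scale matches.

```python
import numpy as np

def project(points, focal, center):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    points = np.asarray(points, dtype=float)
    return focal * points[:, :2] / points[:, 2:3] + center

def reprojection_gap(points, true_focal, assumed_focal, center):
    """Per-point pixel gap when the same body is viewed with a wrong focal length.

    The body is shifted along z so that its scale at mean depth matches the
    true image (a simple illustrative choice, not an optimal fit).
    """
    points = np.asarray(points, dtype=float)
    mean_z = points[:, 2].mean()
    shift = assumed_focal * mean_z / true_focal - mean_z
    moved = points + np.array([0.0, 0.0, shift])
    true_px = project(points, true_focal, center)
    wrong_px = project(moved, assumed_focal, center)
    return np.linalg.norm(true_px - wrong_px, axis=1)

# Hypothetical values: a 256 px crop, a short true focal length, and a much
# longer assumed one.
center = np.array([128.0, 128.0])
body = np.array([[0.0, -0.8, 3.0], [0.3, 0.0, 3.2], [0.0, 0.8, 2.8]])
gap = reprojection_gap(body, true_focal=300.0, assumed_focal=5000.0, center=center)
print(gap)  # non-zero wherever a point's depth differs from the mean depth
```

With the wrong focal length the unchanged body no longer lines up with the image, so a method trained to minimize 2D error has to change pose or shape to close the gap.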

Using BEDLAM, a synthetic dataset with perfect ground-truth (GT), we quantitatively evaluate the problem. With the HMR2.0 camera, we evaluate the 2D projection error of 3D bodies computed by HMR2.0 and of GT bodies. With the wrong camera, HMR2.0 gets lower 2D error than GT. (3/8)

On the flip side, low 2D reprojection error results in worse 3D accuracy. For a given 2D image alignment error, effectively infinitely many 3D poses can produce it, and some of them are really bad. Training a method with a 2D loss and the wrong camera is bad. (4/8)
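
The ambiguity can be shown with a small NumPy sketch. This is a toy illustration, not the paper's evaluation: the joints, the focal length, and the helper names are assumptions. Sliding each joint along its own viewing ray leaves the 2D projection untouched while the 3D error grows. Real bodies have bone-length and joint-limit constraints that rule out many such configurations, but the 2D loss alone does not.

```python
import numpy as np

def project(points, focal, center):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    points = np.asarray(points, dtype=float)
    return focal * points[:, :2] / points[:, 2:3] + center

def slide_along_rays(points, scales):
    """Scale each point by its own factor, i.e. move it along its viewing ray."""
    return np.asarray(points, dtype=float) * np.asarray(scales, dtype=float)[:, None]

# Hypothetical joints and camera, chosen only for illustration.
joints = np.array([[0.0, -0.8, 3.0], [0.3, 0.0, 3.2], [0.0, 0.8, 2.8]])
center = np.array([128.0, 128.0])
focal = 300.0

distorted = slide_along_rays(joints, [1.0, 1.3, 0.7])
err_2d = np.linalg.norm(
    project(joints, focal, center) - project(distorted, focal, center), axis=1
)
err_3d = np.linalg.norm(joints - distorted, axis=1)
print(err_2d)  # numerically zero: the image cannot tell the two poses apart
print(err_3d)  # clearly non-zero for the joints that were moved
```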

3D pseudo-GT that's estimated from 2D with the wrong camera has the same issue. To address this, we introduce two solutions. First, with 2D data, the loss should not try to fit it too well. Our new TALS loss penalizes large 2D errors while down-weighting small ones. (5/8)
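
One possible reading of "penalize large 2D errors while down-weighting small ones" is a thresholded weighting of the per-joint error. The sketch below is only that reading, not the TALS definition from the paper: the function name `thresholded_2d_loss`, the threshold `tau`, and the down-weighting factor `alpha` are hypothetical.

```python
import numpy as np

def thresholded_2d_loss(pred_2d, target_2d, tau=0.05, alpha=0.1):
    """Mean per-joint 2D error, with errors at or below `tau` scaled by `alpha`.

    `tau` and `alpha` are illustrative values, not taken from the paper.
    """
    pred_2d = np.asarray(pred_2d, dtype=float)
    target_2d = np.asarray(target_2d, dtype=float)
    err = np.linalg.norm(pred_2d - target_2d, axis=-1)
    weights = np.where(err > tau, 1.0, alpha)
    return float(np.mean(weights * err))

pred = np.zeros((2, 2))
small = thresholded_2d_loss(pred, np.array([[0.01, 0.0], [0.01, 0.0]]))
large = thresholded_2d_loss(pred, np.array([[0.5, 0.0], [0.5, 0.0]]))
print(small, large)
```

Under this weighting, a prediction that is already close to the 2D annotation receives little further pull toward it, so the network is not pushed to distort the pose just to remove the last few pixels of error.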

With TALS, common pose priors have too much influence. Thus we use a VQ-VAE, trained on AMASS & MOYO, to convert continuous poses to a discrete token representation. This pre-trained tokenizer provides a vocabulary of valid poses. Pose regression becomes classification. (6/8)
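
The tokenization step can be sketched as a nearest-neighbour lookup into a codebook, which is the standard quantization step of a VQ-VAE. The code below is a toy illustration under stated assumptions: the codebook is hand-written rather than learned, there is no encoder or decoder network, and the names `quantize` and `tokens_from_logits` are hypothetical.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each (D,) latent to the index of its nearest codebook entry.

    Returns the token indices and the corresponding codebook vectors.
    """
    latents = np.asarray(latents, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    sq_dist = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = sq_dist.argmin(axis=1)
    return tokens, codebook[tokens]

def tokens_from_logits(logits):
    """Classification view: pick the most likely codebook entry per token slot."""
    return np.asarray(logits).argmax(axis=-1)

# Toy codebook with 4 entries of dimension 4 (a trained one would be learned).
codebook = np.eye(4)
latents = np.array([[0.1, 0.0, 0.9, 0.0], [0.8, 0.1, 0.0, 0.1]])
tokens, quantized = quantize(latents, codebook)
print(tokens)

# A regressor that outputs logits over the codebook instead of raw pose values.
logits = np.array([[0.1, 0.2, 3.0, 0.0], [2.5, 0.1, 0.3, 0.2]])
predicted_tokens = tokens_from_logits(logits)
print(predicted_tokens)
```

Because every output is a codebook entry, a prediction is always assembled from the learned vocabulary, which is the sense in which regression turns into classification.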

TokenHMR estimates 3D HPS using a discrete tokenized pose representation. Our TALS loss mitigates some of the bias caused by simplified camera models and biased pseudo-GT. This enables training on 2D data for robustness without losing 3D accuracy. (7/8)

Kudos to the authors: @saidwivedi, @yusun14567741, @PriyankaP1201, @YaoFeng1995 and @Michael_J_Black from @MPI_IS, @meshcapade and @ETH_en. arXiv: Code and models are available at (8/8)

Dear Prof. Black, I will be attending the CVPR conference in Seattle from 18 to 21 June as a first-timer. Over the years, you have remained one of the researchers I often draw inspiration from, and I would be very honoured to meet you in person.

I look forward to meeting you! Come find me at one of our posters.

Great work, excited to come chat at CVPR. We’ll be there presenting FlowFace as well (face tracking from 2D video); would love to have you come by.

Thanks for sharing this! I like the UV-flow idea. It combines two of my favorite things: 3D shape estimation and optical flow :) Fun fact: my very first paper on human faces used optical flow for expression recognition.
