We present TeSMo, a text-controlled, scene-aware motion generation method based on denoising diffusion models. It’s an exciting collaboration with Justus Thies, Michael Black, Jason Peng, and Davis Rempe. (1/7)
45,724 views • 2 years ago • via X (Twitter)
9 comments

Given a 3D scene and a target interaction object, our goal is to generate a plausible human-scene interaction, where a user-specified text prompt can control the motion style. (2/7)

Since there are few datasets with motion, text, and 3D scenes together, we propose to fine-tune a powerful pre-trained text-to-motion model with a new scene-aware branch. (3/7)

First, we pre-train a scene-agnostic text-to-motion diffusion model on a large motion-capture dataset, prioritizing learning to reach goal locations. Then, we augment it with a scene-aware component, fine-tuning with augmented data containing detailed scene information. (4/7)
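
To make the two-stage idea concrete, here is a toy sketch in plain Python. It is not the TeSMo code: the class names, the scalar "gate", the frozen base, and the zero initialization are all assumptions chosen for illustration. It only shows how a scene-aware branch could be added on top of a pre-trained model so that the combined model starts out identical to the base and learns scene corrections during fine-tuning.

```python
class BaseDenoiser:
    """Hypothetical stand-in for a pre-trained, scene-agnostic denoiser."""

    def __init__(self, weight=0.9):
        self.weight = weight  # assumed frozen during fine-tuning

    def __call__(self, noisy_motion, goal):
        # Toy prediction: pull every value toward the goal location.
        return [self.weight * x + (1 - self.weight) * goal for x in noisy_motion]


class SceneBranch:
    """Hypothetical scene-aware branch with a single trainable gate."""

    def __init__(self):
        self.gate = 0.0  # zero-initialized (assumption): a no-op at first

    def __call__(self, scene_feature):
        return [self.gate * s for s in scene_feature]


def scene_aware_denoise(base, branch, noisy_motion, goal, scene_feature):
    # Base prediction plus a scene-dependent correction.
    base_out = base(noisy_motion, goal)
    correction = branch(scene_feature)
    return [b + c for b, c in zip(base_out, correction)]


def fine_tune_step(base, branch, noisy_motion, goal, scene_feature, target, lr=0.1):
    # One gradient step on the mean squared error, updating only the branch.
    pred = scene_aware_denoise(base, branch, noisy_motion, goal, scene_feature)
    grad = sum(2 * (p - t) * s for p, t, s in zip(pred, target, scene_feature)) / len(pred)
    branch.gate -= lr * grad
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


base, branch = BaseDenoiser(), SceneBranch()
noisy, goal, scene = [0.0, 1.0, 2.0], 1.0, [1.0, -1.0, 0.5]
before = scene_aware_denoise(base, branch, noisy, goal, scene)
# Toy target: the base output shifted by half of the scene feature.
target = [b + 0.5 * s for b, s in zip(base(noisy, goal), scene)]
losses = [fine_tune_step(base, branch, noisy, goal, scene, target) for _ in range(50)]
```

Because the gate starts at zero, the first prediction equals the base model's output, so fine-tuning cannot immediately damage what was learned in pre-training.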

We create the Loco-3D-Front dataset to learn navigation, by integrating locomotion sequences from HumanML3D into diverse 3D environments from 3D-FRONT. For interactions, we annotate text descriptions for sub-sequences from the SAMP dataset. (5/7)
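
As a rough illustration of what placing a locomotion sequence into a scene can involve, the sketch below rotates and translates a 2D root path at random until every point lands on free floor. This is an assumed procedure, not the actual Loco-3D-Front pipeline: the occupancy grid, the function names, and the rejection-sampling loop are all hypothetical.

```python
import math
import random


def transform(path, angle, dx, dy):
    # Rigidly rotate the path by `angle` and translate it by (dx, dy).
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in path]


def is_walkable(path, grid, cell_size=1.0):
    # grid[row][col] is True where the floor is blocked (furniture, walls).
    # Only the sampled root points are checked; body width is ignored.
    rows, cols = len(grid), len(grid[0])
    for x, y in path:
        col, row = math.floor(x / cell_size), math.floor(y / cell_size)
        if not (0 <= row < rows and 0 <= col < cols):
            return False
        if grid[row][col]:
            return False
    return True


def place_path(path, grid, tries=1000, seed=0, cell_size=1.0):
    # Rejection sampling: try random poses until one is collision-free.
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    for _ in range(tries):
        angle = rng.uniform(0, 2 * math.pi)
        dx = rng.uniform(0, cols * cell_size)
        dy = rng.uniform(0, rows * cell_size)
        candidate = transform(path, angle, dx, dy)
        if is_walkable(candidate, grid, cell_size):
            return candidate
    return None  # no valid placement found


# A 6x6 room with a 2x2 obstacle in the middle.
room = [[2 <= r <= 3 and 2 <= c <= 3 for c in range(6)] for r in range(6)]
walk = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
placed = place_path(walk, room)
```

A rigid transform keeps the motion itself unchanged, so the original text description of the sequence still applies after placement.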

Our method successfully generates realistic motions that navigate around obstacles while being conditioned on various text prompts. (6/7)

For interactions, diverse text descriptions help disambiguate between actions like sitting or standing up, and even allow for stylizing sitting motions, such as crossing arms. (7/7)

@JustusThies @Michael_J_Black @xbpeng4 @davrempe nice work, congrats!

@JustusThies @Michael_J_Black @xbpeng4 @davrempe Looks amazing - congrats!

@JustusThies @Michael_J_Black @xbpeng4 @davrempe Congrats on graduating, Yi! 🎓
