📢📢📢 Excited to share our new work *Autonomous Character-Scene Interaction Synthesis from Text Instruction* (SIGGRAPH Asia 2024). It presents a unified model for flexible scene-conditioned motion generation given text, scene, and trajectory conditions. The generated interactions are smooth and look very impressive! 📰 Paper: Project: Code and data will be released soon.

11,340 views • 1 year ago • via X (Twitter)

7 Comments

Siyuan Huang • 1 year ago

Some details and design choices for our work: (1/5) We tackle the exciting challenge of generating scene-aware interaction motions for virtual characters based on text instructions and target locations within a 3D environment. Some beautiful results show how the generated motion interacts with the 3D scene as instructed by the input text.

Siyuan Huang • 1 year ago

(2/5) Our motion generation method handles both locomotion and interaction motions. We leverage an auto-regressive conditional diffusion model that takes language guidance, the goal location for the current segment, and the scene voxels as input.
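
Editor's sketch (not the authors' code): the segment-by-segment generation described above can be illustrated roughly as follows. Everything here is an assumption made for illustration: `denoise_segment` is a hypothetical stand-in for the learned network that simply pulls the sample toward a straight line to the goal, the "motion" is reduced to a 3D root position per frame, and `text_emb` and `voxels` are passed through but unused by the stand-in.

```python
import numpy as np

def denoise_segment(noisy, prev_frames, text_emb, goal, voxels, step):
    """Hypothetical stand-in for the learned denoiser (NOT the paper's network).

    A real model would use text_emb and voxels; this toy ignores them and only
    pulls the sample toward a straight line from the last frame to the goal.
    """
    n = noisy.shape[0]
    start = prev_frames[-1]
    target = start + (goal - start) * np.linspace(1.0 / n, 1.0, n)[:, None]
    # step counts down to 0; at step 0 the sample lands on the target.
    return noisy + (target - noisy) / (step + 1)

def generate(text_emb, goals, voxels_fn, start, seg_len=8, n_steps=10, seed=0):
    """Auto-regressive rollout: one diffusion-style sampling loop per segment."""
    rng = np.random.default_rng(seed)
    frames = [np.asarray(start, dtype=float)]
    for goal in goals:
        goal = np.asarray(goal, dtype=float)
        history = np.stack(frames[-2:])      # condition on previously generated frames
        voxels = voxels_fn(frames[-1])       # scene context around the character
        x = rng.standard_normal((seg_len, history.shape[1]))  # start from noise
        for step in reversed(range(n_steps)):
            x = denoise_segment(x, history, text_emb, goal, voxels, step)
        frames.extend(x)                     # the new segment becomes history
    return np.stack(frames)

motion = generate(
    text_emb=np.zeros(4),                    # placeholder text embedding
    goals=[[1.0, 0.0, 0.0], [1.0, 0.0, 2.0]],
    voxels_fn=lambda pos: np.zeros((8, 8, 8), dtype=bool),  # placeholder scene
    start=[0.0, 0.0, 0.0],
)
```

The point of the sketch is only the control flow: each segment is sampled from noise under the current conditions, then appended to the history that conditions the next segment.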

Siyuan Huang • 1 year ago

(3/5) The character's scene awareness comes from a local occupancy grid. Each voxel in the grid indicates whether the corresponding location is occupied by a scene object. This representation improves the model's understanding of 3D space and interaction.
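
Editor's sketch (not the authors' code): a local occupancy grid of this kind can be built from scene points roughly as below. The grid resolution, the voxel size, the axis-aligned layout centred on the character, and the function name `local_occupancy_grid` are all assumptions for illustration; the thread does not specify them.

```python
import numpy as np

def local_occupancy_grid(scene_points, center, grid_size=(8, 8, 8), voxel_size=0.25):
    """Boolean grid around `center`; True where a scene point falls in the voxel.

    Assumed layout: axis-aligned cube of grid_size voxels centred on the character.
    """
    grid_size = np.asarray(grid_size)
    # World-space corner of voxel (0, 0, 0).
    origin = np.asarray(center, dtype=float) - grid_size * voxel_size / 2.0
    # Integer voxel index of every scene point.
    idx = np.floor((np.asarray(scene_points, dtype=float) - origin) / voxel_size).astype(int)
    # Keep only points that land inside the local grid.
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    grid = np.zeros(tuple(grid_size), dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid

points = np.array([[0.1, 0.1, 0.1],    # close to the character: inside the grid
                   [5.0, 5.0, 5.0]])   # far away: outside the 2 m cube, ignored
grid = local_occupancy_grid(points, center=[0.0, 0.0, 0.0])
```

Because the grid is recomputed around the character's current position, it stays small regardless of how large the full scene is.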

Siyuan Huang • 1 year ago

(4/5) Given the same trajectory and scene, our model generates characters that actively avoid penetrating the scene and exhibit natural cues of scene awareness.
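
Editor's sketch (not the authors' evaluation code): one simple way to quantify "avoids penetrating the scene" is the fraction of body joints that fall inside occupied voxels. The function name `penetration_rate` and the grid parameters are hypothetical, and the thread does not say which metric the paper actually reports.

```python
import numpy as np

def penetration_rate(joints, grid, origin, voxel_size):
    """Fraction of joint positions that lie inside an occupied voxel.

    joints: array of shape (..., 3); grid: boolean occupancy grid whose voxel
    (0, 0, 0) has its corner at `origin`. Joints outside the grid count as free.
    """
    joints = np.asarray(joints, dtype=float).reshape(-1, 3)
    idx = np.floor((joints - np.asarray(origin, dtype=float)) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.asarray(grid.shape)), axis=1)
    hits = np.zeros(len(joints), dtype=bool)
    hits[inside] = grid[tuple(idx[inside].T)]
    return float(hits.mean())

scene = np.zeros((4, 4, 4), dtype=bool)
scene[1, 1, 1] = True                        # a single occupied voxel
joints = [[0.6, 0.6, 0.6],                   # inside the occupied voxel
          [1.6, 1.6, 1.6],                   # free space
          [10.0, 10.0, 10.0],                # outside the grid, counted as free
          [0.75, 0.9, 0.51]]                 # inside the occupied voxel
rate = penetration_rate(joints, scene, origin=[0.0, 0.0, 0.0], voxel_size=0.5)
```

A lower rate for the same trajectory and scene would indicate motion that penetrates scene geometry less often.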

Siyuan Huang • 1 year ago

(5/5) Our model is supercharged by LINGO, a comprehensive motion-captured dataset. We employ a synthetic vision approach, where scene objects are projected into the virtual view displayed in a VR headset worn by the motion actor.

one واحد • 1 year ago

nice work 👏

Sentients • 1 year ago

🔥🔥🔥🔥
