📢📢📢 Excited to share our new work Autonomous Character-Scene Interaction Synthesis from Text Instruction (SIGGRAPH Asia 2024). It presents a unified model for flexible scene-conditioned motion generation given text, scene, and trajectory conditions. The results with smooth interaction look very impressive! 📰Paper: Project: Code and data will be released soon.

Siyuan Huang

3,179 subscribers

11,340 views • 1 year ago • via X (Twitter)

Art · Science & Technology

7 comments

Siyuan Huang · 1 year ago

Some details and designs for our work: (1/5) We tackle the exciting challenge of generating scene-aware interaction motions for virtual characters based on text instructions and target locations within a 3D environment. There are some beautiful results showing how the generated motion interacts with the 3D scenes instructed by input text.

Siyuan Huang · 1 year ago

(2/5) Our motion generation method handles both locomotion and interaction motions. We leverage an auto-regressive conditional diffusion model that takes language guidance, the goal location for the current segment, and the scene voxel as input.
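
A rough Python sketch of the control flow described in (2/5): motion is produced segment by segment, and each segment is refined from noise over several steps while being conditioned on the text, the current goal, and the scene voxels. This is not the released model. `toy_denoiser`, `sample_segment`, `generate_motion`, and `voxels_fn` are hypothetical names, poses are reduced to 3D points, and the learned network is replaced by a straight-line stand-in that ignores the text and voxels.

```python
import random


def toy_denoiser(prev_pose, goal, frames):
    """Stand-in for the learned network (assumption): predicts a straight
    line from the previous pose to the segment goal."""
    return [
        tuple(p + (g - p) * (i + 1) / frames for p, g in zip(prev_pose, goal))
        for i in range(frames)
    ]


def sample_segment(prev_pose, goal, text, voxels, frames=4, num_steps=10, rng=None):
    """Refine a noisy segment step by step toward the denoiser's prediction."""
    rng = rng or random.Random(0)
    # Start from Gaussian noise, one 3D point per frame.
    x = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(frames)]
    for step in range(num_steps, 0, -1):
        # A real model would also consume x, step, text, and voxels here.
        x0_hat = toy_denoiser(prev_pose, goal, frames)
        w = 1.0 / step  # blend weight; reaches 1.0 on the final step
        x = [
            tuple(a + w * (b - a) for a, b in zip(xi, ti))
            for xi, ti in zip(x, x0_hat)
        ]
    return x


def generate_motion(start_pose, segments, voxels_fn, frames=4):
    """Auto-regressive loop: each segment starts where the previous one ended."""
    motion, prev = [], start_pose
    for text, goal in segments:
        seg = sample_segment(prev, goal, text, voxels_fn(prev), frames=frames)
        motion.extend(seg)
        prev = seg[-1]  # the last frame conditions the next segment
    return motion
```

Calling `generate_motion((0.0, 0.0, 0.0), [("walk to the chair", (1.0, 0.0, 0.0)), ("sit down", (1.0, 0.0, -0.5))], lambda pose: None)` returns eight frames, with each segment ending at its goal.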

Siyuan Huang · 1 year ago

(3/5) The character's scene awareness comes from a local occupancy grid. Each voxel in the grid indicates whether the corresponding location is occupied by a scene object. Such a representation enhances the understanding of 3D space and interaction.
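
A minimal sketch of what a local occupancy grid like the one in (3/5) could look like, assuming scene objects are approximated by axis-aligned boxes and the grid is centered on the character. The function name, grid dimensions, and voxel size are illustrative choices, not values from the paper.

```python
def local_occupancy_grid(center, boxes, dims=(4, 4, 4), voxel_size=0.5):
    """Return grid[i][j][k] with 1 where a voxel center lies inside a box.

    center: (x, y, z) position of the character (grid is centered here).
    boxes:  list of (min_corner, max_corner) axis-aligned boxes (assumption).
    """
    nx, ny, nz = dims
    # Lower corner of the grid so that `center` sits in the middle.
    origin = tuple(c - n * voxel_size / 2 for c, n in zip(center, dims))

    def occupied(p):
        return any(
            all(lo[a] <= p[a] <= hi[a] for a in range(3)) for lo, hi in boxes
        )

    grid = []
    for i in range(nx):
        plane = []
        for j in range(ny):
            row = []
            for k in range(nz):
                # World-space center of voxel (i, j, k).
                p = tuple(
                    origin[a] + (idx + 0.5) * voxel_size
                    for a, idx in enumerate((i, j, k))
                )
                row.append(1 if occupied(p) else 0)
            plane.append(row)
        grid.append(plane)
    return grid
```

With the character at the origin and a single unit box spanning `(0, 0, 0)` to `(1, 1, 1)`, the eight voxels in the positive octant are marked occupied and the rest stay empty.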

Siyuan Huang · 1 year ago

(4/5) Given the same trajectory and scene, our model generates characters that actively avoid penetrating the scene and exhibit natural cues of scene awareness.

Siyuan Huang · 1 year ago

(5/5) Our model is supercharged by LINGO, a comprehensive motion-captured dataset. We employ a synthetic vision approach, where scene objects are projected into the virtual view displayed in a VR headset worn by the motion actor.

one واحد · 1 year ago

nice work 👏

Sentients · 1 year ago

🔥🔥🔥🔥

Related videos

(1/4) Excited to share our #ICCV2023 paper Text2Room! We generate scene-scale textured 3D meshes from a given text prompt leveraging 2D text-to-image models such as StableDiffusion. Project: Code: Video:

Matthias Niessner

74,895 views • 2 years ago

I'm excited to share our new work Align3R that estimates camera poses and consistent depth maps from a monocular video of a dynamic scene. Project page: Code: Paper:

Yuan Liu

56,547 views • 1 year ago

📢Animating the Uncaptured 📢 We animate 3D humanoid meshes using video diffusion priors given a text prompt. 🎥 🌍 Realistic motion generation for 3D characters - without motion capture! 🚀 Great work by Marc Benedí Angela Dai

Matthias Niessner

11,704 views • 1 year ago

SceneScape: Text-Driven Consistent Scene Generation abs: project page: text-driven perpetual view generation -- synthesizing long videos of arbitrary scenes solely from an input text describing the scene and camera poses

AK

73,258 views • 3 years ago

Excited to share our recent work, UniSH, which unifies dynamic 3D scene reconstruction and SMPL estimation within a single framework. (Left-top is input video). Code has been released! Project page: Paper:

Yuan Liu

19,242 views • 5 months ago

📢📢📢 Excited to share our ICLR25 work, ArtGS, which can reconstruct articulated objects and scenes from human-scene interactions. 💻🖱️🧱🪑📖 Understanding physics and articulation is extremely important. With ArtGS, we can build a digital and simulatable replica of the real 3D world with minimal human effort!

Siyuan Huang

12,819 views • 1 year ago

📢📢📢 TC4D: Trajectory-Conditioned Text-to-4D Generation Homepage: Led by Sherwin Bahmani, with collaborators Xian Liu Yifan Wang Ivan Skorokhodov Victor Rong Ziwei Liu Xihui Liu Jeong Joon Park Sergey Tulyakov Gordon Wetzstein Andrea Tagliasacchi @CVPR David Lindell

Andrea Tagliasacchi @CVPR

16,976 views • 2 years ago

📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a camera-conditioned video diffusion model can be transformed into a generative reconstruction model that directly outputs a high-quality 3D Gaussian Splatting representation through self-distillation, without requiring real-world training data. Check out our results in the video (wait for dynamic scenes in the second half!) : Project Page: Code and Models: Paper:

Sherwin Bahmani

66,523 views • 9 months ago

📢GaussianGPT: autoregressive 3D Gaussian scene generation. We introduce a GPT-style model that directly generates 3D Gaussian scenes, token by token, in a series of small, discrete decision steps. Generation, completion, and large-scale outpainting in a single pipeline. Unlike diffusion-based approaches, GaussianGPT explicitly models the scene distribution at every step, allowing for quite flexible scene synthesis. 🌐 ▶️ Great work by Nicolas von Lützow, Barbara Roessle, Katharina Schmid

Matthias Niessner

151,792 views • 3 months ago

I'm excited to share our new work, VistaDream, which generates a 3D Gaussian field from a single-view image. The code has already been released. Project page: (with interactive demos) Code: Paper:

Yuan Liu

79,592 views • 1 year ago

📢 Excited to share our latest work at the Conference on Robot Learning: Motion Priors Reimagined: Adapting Flat-Terrain Skills for Complex Quadruped Mobility! #CoRL2025 🐶Now ANYmal learns to walk (hop) from real dog motions! 📄Paper: 🌐Project Website:

Robotic Systems Lab

10,212 views • 9 months ago

🤖Simulation-Ready Human–Scene Interaction Reconstruction🤖 #HSImul3R reconstructs sim-ready 3D human-scene interactions from casual videos with *physics-in-the-loop* that can be directly deployed to humanoids. - Project: - Code:

Ziwei Liu

15,544 views • 3 months ago

Excited to share our #CHI2026 paper “Texterial: A Text-as-Material Interaction Paradigm for LLM-Mediated Writing” (done during internship at Microsoft Research) We imagine interacting with LLMs by treating text as a material like plants/clay. 📃 🧵[1/n]

Jocelyn Shen

19,636 views • 4 months ago

📢ScanEdit: Hierarchically-Guided Functional 3D Scan Editing Edit complex, real-world 3D scans with text -- Mohamed El Amine Boudjoghra combines LLM reasoning with geometric optimization to produce physically plausible, instruction-aligned scene edits Check it out:

Angela Dai

12,985 views • 1 year ago

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields paper page: Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.

AK

62,768 views • 3 years ago

What is missing to bring real-time motion research into AAA games and real-world robotics? We present MotionBricks, a step toward bridging this gap with two key components: - a single generative latent motion backbone covering 350,000+ motion skills, running at 15,000 FPS with 2 ms latency and substantially improved quality and reliability. - a unified smart primitive interface for locomotion, object / scene interaction, with fine-grained control over generated behaviors. Webpage: Code: Paper: (ACM TOG / SIGGRAPH 2026)

tingwu.wang

152,688 views • 2 months ago

Introducing ShapeR, a method for robust conditional 3D shape generation from casually captured sequences. ShapeR leverages a rectified flow transformer conditioned on per-object multimodal data to turn casual image sequences into full metric scene reconstructions. Project Page: Paper: Links to code and huggingface below ⬇️

Yawar Siddiqui

70,626 views • 5 months ago

(1/2) 📢📢📢NeRSemble #SIGGRAPH'23! We reconstruct dynamic radiance fields for high-quality novel view synthesis of human heads. Key is a deformation field and an ensemble of multi-resolution hash encodings to model coarse & fine-scale deformations.

Matthias Niessner

69,652 views • 3 years ago

📢 Text Follow Path is in Early Access! Align text along a shape, dynamically adjust placement, and even bind it to data.

Rive

82,743 views • 1 year ago

VideoComposer brings ControlNet guidance to text and video-to-video. The model makes it possible to combine multiple modalities like text, sketch, style and even motion to drive video generation. The results look extremely good.

Dreaming Tulpa 🥓👑

18,045 views • 3 years ago