Загрузка видео...

Не удалось загрузить видео

Возникла проблема при загрузке этого видео. Это может быть связано с временными проблемами сети или видео может быть недоступно.

На главную

Example-based Motion Synthesis via Generative Motion Matching paper page: present GenMM, a generative model that "mines" as many diverse motions as possible from a single or few example sequences. In stark contrast to existing data-driven methods, which typically require long offline training time, are prone to visual artifacts, and... tend to fail on large and complex skeletons, GenMM inherits the training-free nature and the superior quality of the well-known Motion Matching method. GenMM can synthesize a high-quality motion within a fraction of a second, even with highly complex and large skeletal structures. At the heart of our generative framework lies the generative motion matching module, which utilizes the bidirectional visual similarity as a generative cost function to motion matching, and operates in a multi-stage framework to progressively refine a random guess using exemplar motion matches. In addition to diverse motion generation, we show the versatility of our generative framework by extending it to a number of scenarios that are not possible with motion matching alone, including motion completion, key frame-guided generation, infinite looping, and motion reassembly.show more

AK

502,779 subscribers

93,694 просмотров • 3 лет назад •via X (Twitter)

Наука и технологии Образование Искусство

Anya Rossi• Live Now

Private livecam show

Комментарии: 8

Фото профиля Mark J. G

Mark J. G3 лет назад

Where is the blender plugin/code?

Фото профиля Jesse Wood

Jesse Wood3 лет назад

GIFs are about to get a whole lot cooler 😎 "When the starting and ending frames are specified to be the same our method generates an infinite loop"

Фото профиля Michael Watson

Michael Watson3 лет назад

That is spectacular

Фото профиля Bobcat

Bobcat3 лет назад

This is brilliant

Фото профиля Brayan Meza Castillo 👨‍💻

Brayan Meza Castillo 👨‍💻3 лет назад

@ezdubs_bot English spanish

Фото профиля EzDubs

EzDubs3 лет назад

@_akhaliq @BrayanMezaC_Dev Done! Here is your Spanish dub: 📺🔴 Dub 𝙔𝙤𝙪𝙏𝙪𝙗𝙚 videos: 💬🟢 Dub 𝙒𝙝𝙖𝙩𝙨𝘼𝙥𝙥 videos and voice memos:

Фото профиля EzDubs

EzDubs3 лет назад

@BrayanMezaC_Dev Done! Here is your Spanish dub: 📺🔴 Dub 𝙔𝙤𝙪𝙏𝙪𝙗𝙚 videos: 💬🟢 Dub 𝙒𝙝𝙖𝙩𝙨𝘼𝙥𝙥 videos and voice memos:

Фото профиля Thread Reader App

Thread Reader App3 лет назад

@_akhaliq @MejoraConIA Sorry we only unroll consecutive tweets from the same author, but if you want to grab the whole convo try @pdfmakerapp! 🤖

Похожие видео

MotionGPT: Human Motion as a Foreign Language paper page: Though the advancement of pre-trained large language models unfolds, the exploration of building a unified model for language and other multi-modal data, such as motion, remains challenging and untouched so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ the discrete vector quantization for human motion and transfer 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performances on multiple motion tasks including text-driven motion generation, motion captioning, motion prediction, and motion in-between.

MotionGPT: Human Motion as a Foreign Language paper page: Though the advancement of pre-trained large language models unfolds, the exploration of building a unified model for language and other multi-modal data, such as motion, remains challenging and untouched so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ the discrete vector quantization for human motion and transfer 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performances on multiple motion tasks including text-driven motion generation, motion captioning, motion prediction, and motion in-between.

AK

125,311 просмотров • 3 лет назад

Genesis's generative framework supports data generation beyond robotics, such as character motion: 7/n

Genesis's generative framework supports data generation beyond robotics, such as character motion: 7/n

Zhou Xian

51,784 просмотров • 1 год назад

Loopy Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency paper page: With the introduction of diffusion-based video generation techniques, audio-conditioned human video generation has recently achieved significant breakthroughs in both the naturalness of motion and the synthesis of portrait details. Due to the limited control of audio signals in driving human motion, existing methods often add auxiliary spatial signals to stabilize movements, which may compromise the naturalness and freedom of motion. In this paper, we propose an end-to-end audio-only conditioned video diffusion model named Loopy. Specifically, we designed an inter- and intra-clip temporal module and an audio-to-latents module, enabling the model to leverage long-term motion information from the data to learn natural motion patterns and improving audio-portrait movement correlation. This method removes the need for manually specified spatial motion templates used in existing methods to constrain motion during inference. Extensive experiments show that Loopy outperforms recent audio-driven portrait diffusion models, delivering more lifelike and high-quality results across various scenarios.

AK

128,745 просмотров • 1 год назад

We present Neural Riemannian Motion Fields (NRMF) as a promising paradigm for generative modeling of articulated motion. Unlike various diffusion based models, NRMF is useful both in generation and in deployment as a prior across tasks: #CVPR #AI #CVML👇

We present Neural Riemannian Motion Fields (NRMF) as a promising paradigm for generative modeling of articulated motion. Unlike various diffusion based models, NRMF is useful both in generation and in deployment as a prior across tasks: #CVPR #AI #CVML👇

Tolga Birdal

16,624 просмотров • 8 месяцев назад

✨We are excited to open-source Tencent HY-Motion 1.0, a billion-parameter text-to-motion model built on the Diffusion Transformer (DiT) architecture and flow matching. Tencent HY-Motion 1.0 empowers developers and individual creators alike by transforming natural language into high-fidelity, fluid, and diverse 3D character animations, delivering exceptional instruction-following capabilities across a broad range of categories. The generated 3D animation assets can be seamlessly integrated into typical 3D animation pipelines.🎮🎥 Highlights: 🔹Billion-Scale DiT: Successfully scaled flow-matching DiT to 1B+ parameters, setting a new ceiling for instruction-following capability and generated motion quality. 🔹Full-Stage Training Strategy: The industry’s first motion generation model featuring a complete Pre-training → SFT → RL loop to optimize physical plausibility and semantic accuracy. 🔹Comprehensive Category Coverage: Features 200+ motion categories across 6 major classes—the most comprehensive in the industry, curated via a meticulous data pipeline. 🌐Project Page: 🔗Github: 🤗Hugging Face: 📄Technical report:

✨We are excited to open-source Tencent HY-Motion 1.0, a billion-parameter text-to-motion model built on the Diffusion Transformer (DiT) architecture and flow matching. Tencent HY-Motion 1.0 empowers developers and individual creators alike by transforming natural language into high-fidelity, fluid, and diverse 3D character animations, delivering exceptional instruction-following capabilities across a broad range of categories. The generated 3D animation assets can be seamlessly integrated into typical 3D animation pipelines.🎮🎥 Highlights: 🔹Billion-Scale DiT: Successfully scaled flow-matching DiT to 1B+ parameters, setting a new ceiling for instruction-following capability and generated motion quality. 🔹Full-Stage Training Strategy: The industry’s first motion generation model featuring a complete Pre-training → SFT → RL loop to optimize physical plausibility and semantic accuracy. 🔹Comprehensive Category Coverage: Features 200+ motion categories across 6 major classes—the most comprehensive in the industry, curated via a meticulous data pipeline. 🌐Project Page: 🔗Github: 🤗Hugging Face: 📄Technical report:

Tencent Hy

327,734 просмотров • 5 месяцев назад

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking paper page: introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch in our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks.

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking paper page: introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch in our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks.

AK

122,525 просмотров • 2 лет назад

Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation paper page: Recent advances in generative modeling have led to promising progress on synthesizing 3D human motion from text, with methods that can generate character animations from short prompts and specified durations. However, using a single text prompt as input lacks the fine-grained control needed by animators, such as composing multiple actions and defining precise durations for parts of the motion. To address this, we introduce the new problem of timeline control for text-driven motion synthesis, which provides an intuitive, yet fine-grained, input interface for users. Instead of a single prompt, users can specify a multi-track timeline of multiple prompts organized in temporal intervals that may overlap. This enables specifying the exact timings of each action and composing multiple actions in sequence or at overlapping intervals. To generate composite animations from a multi-track timeline, we propose a new test-time denoising method. This method can be integrated with any pre-trained motion diffusion model to synthesize realistic motions that accurately reflect the timeline. At every step of denoising, our method processes each timeline interval (text prompt) individually, subsequently aggregating the predictions with consideration for the specific body parts engaged in each action. Experimental comparisons and ablations validate that our method produces realistic motions that respect the semantics and timing of given text prompts.

Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation paper page: Recent advances in generative modeling have led to promising progress on synthesizing 3D human motion from text, with methods that can generate character animations from short prompts and specified durations. However, using a single text prompt as input lacks the fine-grained control needed by animators, such as composing multiple actions and defining precise durations for parts of the motion. To address this, we introduce the new problem of timeline control for text-driven motion synthesis, which provides an intuitive, yet fine-grained, input interface for users. Instead of a single prompt, users can specify a multi-track timeline of multiple prompts organized in temporal intervals that may overlap. This enables specifying the exact timings of each action and composing multiple actions in sequence or at overlapping intervals. To generate composite animations from a multi-track timeline, we propose a new test-time denoising method. This method can be integrated with any pre-trained motion diffusion model to synthesize realistic motions that accurately reflect the timeline. At every step of denoising, our method processes each timeline interval (text prompt) individually, subsequently aggregating the predictions with consideration for the specific body parts engaged in each action. Experimental comparisons and ablations validate that our method produces realistic motions that respect the semantics and timing of given text prompts.

AK

126,548 просмотров • 2 лет назад

The Type Motion demo on Google AI Studio transforms simple text phrases into cinematic motion text using a two-step generative process.

The Type Motion demo on Google AI Studio transforms simple text phrases into cinematic motion text using a two-step generative process.

Google AI Developers

67,211 просмотров • 4 месяцев назад

Who's pulling the strings? From Dreams to reality. Varying levels of AI hallucination, from versions that mimic the motion of my puppet quite closely, to versions where the generative process tries to make the motion more believable.

Who's pulling the strings? From Dreams to reality. Varying levels of AI hallucination, from versions that mimic the motion of my puppet quite closely, to versions where the generative process tries to make the motion more believable.

Martin Nebelong

139,381 просмотров • 1 год назад

ocean lines in motion #shader #WIP #glsl #waves #genart #generative #gold (+some extra motion towards the end)

ocean lines in motion #shader #WIP #glsl #waves #genart #generative #gold (+some extra motion towards the end)

Florian Berger

254,195 просмотров • 3 лет назад

Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes Contributions: • We propose STORM, the first feed-forward, self-supervised method for fast and accurate reconstruction of dynamic 3D scenes from sparse, multi-timestep, posed camera images. • Our bottom-up framework aggregates and transforms per-frame 3D Gaussian Splats into a cohesive scene representation, enabling self-supervised motion estimation. Furthermore, we introduce motion tokens that capture common motion primitives and regularize motion predictions, facilitating dynamic motion group segmentation without explicit motion or correspondence supervision. • We present several enhancements for in-the-wild scenarios, including sky modeling, camera exposure inconsistency handling, large novel-view extrapolation, and fine-grained human motions reconstruction, making STORM well-suited for real-world applications.

Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes Contributions: • We propose STORM, the first feed-forward, self-supervised method for fast and accurate reconstruction of dynamic 3D scenes from sparse, multi-timestep, posed camera images. • Our bottom-up framework aggregates and transforms per-frame 3D Gaussian Splats into a cohesive scene representation, enabling self-supervised motion estimation. Furthermore, we introduce motion tokens that capture common motion primitives and regularize motion predictions, facilitating dynamic motion group segmentation without explicit motion or correspondence supervision. • We present several enhancements for in-the-wild scenarios, including sky modeling, camera exposure inconsistency handling, large novel-view extrapolation, and fine-grained human motions reconstruction, making STORM well-suited for real-world applications.

MrNeRF

53,292 просмотров • 1 год назад

🔥 Introducing MVLift: Generate realistic 3D motion without any 3D training data - just using 2D poses from monocular videos! Applicable to human motion, human-object interaction & animal motion. Joint work w/ Jiajun Wu & Karen 💡 How? We reformulate 3D motion estimation as generating consistent multi-view 2D pose sequences. Our framework uses 2D motion diffusion to progressively establish multi-view consistency, requiring only single-view 2D pose sequences for training. Project: Video with demonstration: Paper:

🔥 Introducing MVLift: Generate realistic 3D motion without any 3D training data - just using 2D poses from monocular videos! Applicable to human motion, human-object interaction & animal motion. Joint work w/ Jiajun Wu & Karen 💡 How? We reformulate 3D motion estimation as generating consistent multi-view 2D pose sequences. Our framework uses 2D motion diffusion to progressively establish multi-view consistency, requiring only single-view 2D pose sequences for training. Project: Video with demonstration: Paper:

Jiaman Li

15,788 просмотров • 1 год назад

Motion planning in complex tasks is hard and still done via slow, explicit, traditional planners. We present a generalist Neural Motion Planner -- a single neural network that plans complex dynamic motions quickly and accurately at test time. Building upon our lab's sim2real efforts, the key idea is to create many complex scenes in simulation and then distill classical motion planner trajectories into a single reactive neural network policy. More details in the thread below! 👇 Open-sourced:

Motion planning in complex tasks is hard and still done via slow, explicit, traditional planners. We present a generalist Neural Motion Planner -- a single neural network that plans complex dynamic motions quickly and accurately at test time. Building upon our lab's sim2real efforts, the key idea is to create many complex scenes in simulation and then distill classical motion planner trajectories into a single reactive neural network policy. More details in the thread below! 👇 Open-sourced:

Deepak Pathak

28,642 просмотров • 1 год назад

A symphony of complex motion. The motion of a complex exponential equation, visualized. [🎞️ thebrainmaze]

A symphony of complex motion. The motion of a complex exponential equation, visualized. [🎞️ thebrainmaze]

Massimo

17,785 просмотров • 10 месяцев назад

Google presents Still-Moving Customized Video Generation without Customized Video Data Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its infancy, primarily due to the lack of customized video data. In this work, we introduce Still-Moving, a novel generic framework for customizing a text-to-video (T2V) model, without requiring any customized video data. The framework applies to the prominent T2V design where the video model is built over a text-to-image (T2I) model (e.g., via inflation). We assume access to a customized version of the T2I model, trained only on still image data (e.g., using DreamBooth or StyleDrop). Naively plugging in the weights of the customized T2I model into the T2V model often leads to significant artifacts or insufficient adherence to the customization data. To overcome this issue, we train lightweight Spatial Adapters that adjust the features produced by the injected T2I layers. Importantly, our adapters are trained on "frozen videos" (i.e., repeated images), constructed from image samples generated by the customized T2I model. This training is facilitated by a novel Motion Adapter module, which allows us to train on such static videos while preserving the motion prior of the video model. At test time, we remove the Motion Adapter modules and leave in only the trained Spatial Adapters. This restores the motion prior of the T2V model while adhering to the spatial prior of the customized T2I model. We demonstrate the effectiveness of our approach on diverse tasks including personalized, stylized, and conditional generation. In all evaluated scenarios, our method seamlessly integrates the spatial prior of the customized T2I model with a motion prior supplied by the T2V model.

Google presents Still-Moving Customized Video Generation without Customized Video Data Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its infancy, primarily due to the lack of customized video data. In this work, we introduce Still-Moving, a novel generic framework for customizing a text-to-video (T2V) model, without requiring any customized video data. The framework applies to the prominent T2V design where the video model is built over a text-to-image (T2I) model (e.g., via inflation). We assume access to a customized version of the T2I model, trained only on still image data (e.g., using DreamBooth or StyleDrop). Naively plugging in the weights of the customized T2I model into the T2V model often leads to significant artifacts or insufficient adherence to the customization data. To overcome this issue, we train lightweight Spatial Adapters that adjust the features produced by the injected T2I layers. Importantly, our adapters are trained on "frozen videos" (i.e., repeated images), constructed from image samples generated by the customized T2I model. This training is facilitated by a novel Motion Adapter module, which allows us to train on such static videos while preserving the motion prior of the video model. At test time, we remove the Motion Adapter modules and leave in only the trained Spatial Adapters. This restores the motion prior of the T2V model while adhering to the spatial prior of the customized T2I model. We demonstrate the effectiveness of our approach on diverse tasks including personalized, stylized, and conditional generation. In all evaluated scenarios, our method seamlessly integrates the spatial prior of the customized T2I model with a motion prior supplied by the T2V model.

AK

40,467 просмотров • 1 год назад

Physics-based Motion Retargeting from Sparse Inputs paper page: Avatars are important to create interactive and immersive experiences in virtual worlds. One challenge in animating these characters to mimic a user's motion is that commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose. Another challenge is that an avatar might have a different skeleton structure than a human and the mapping between them is unclear. In this work we address both of these challenges. We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies. Our method uses reinforcement learning to train a policy to control characters in a physics simulator. We only require human motion capture data for training, without relying on artist-generated animations for each avatar. This allows us to use large motion capture datasets to train general policies that can track unseen users from real and sparse data in real-time. We demonstrate the feasibility of our approach on three characters with different skeleton structure: a dinosaur, a mouse-like creature and a human. We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available. We discuss and ablate the important components in our framework, specifically the kinematic retargeting step, the imitation, contact and action reward as well as our asymmetric actor-critic observations. We further explore the robustness of our method in a variety of settings including unbalancing, dancing and sports motions.

AK

106,519 просмотров • 2 лет назад

Seamless Human Motion Composition with Blended Positional Encodings Conditional human motion generation is an important topic with many applications in virtual reality, gaming, and robotics. While prior works have focused on generating motion guided by text, music, or scenes, these typically result in isolated motions confined to short durations. Instead, we address the generation of long, continuous sequences guided by a series of varying textual descriptions. In this context, we introduce FlowMDM, the first diffusion-based model that generates seamless Human Motion Compositions (HMC) without any postprocessing or redundant denoising steps. For this, we introduce the Blended Positional Encodings, a technique that leverages both absolute and relative positional encodings in the denoising chain. More specifically, global motion coherence is recovered at the absolute stage, whereas smooth and realistic transitions are built at the relative stage. As a result, we achieve state-of-the-art results in terms of accuracy, realism, and smoothness on the Babel and HumanML3D datasets. FlowMDM excels when trained with only a single description per motion sequence thanks to its Pose-Centric Cross-ATtention, which makes it robust against varying text descriptions at inference time. Finally, to address the limitations of existing HMC metrics, we propose two new metrics: the Peak Jerk and the Area Under the Jerk, to detect abrupt transitions.

Seamless Human Motion Composition with Blended Positional Encodings Conditional human motion generation is an important topic with many applications in virtual reality, gaming, and robotics. While prior works have focused on generating motion guided by text, music, or scenes, these typically result in isolated motions confined to short durations. Instead, we address the generation of long, continuous sequences guided by a series of varying textual descriptions. In this context, we introduce FlowMDM, the first diffusion-based model that generates seamless Human Motion Compositions (HMC) without any postprocessing or redundant denoising steps. For this, we introduce the Blended Positional Encodings, a technique that leverages both absolute and relative positional encodings in the denoising chain. More specifically, global motion coherence is recovered at the absolute stage, whereas smooth and realistic transitions are built at the relative stage. As a result, we achieve state-of-the-art results in terms of accuracy, realism, and smoothness on the Babel and HumanML3D datasets. FlowMDM excels when trained with only a single description per motion sequence thanks to its Pose-Centric Cross-ATtention, which makes it robust against varying text descriptions at inference time. Finally, to address the limitations of existing HMC metrics, we propose two new metrics: the Peak Jerk and the Area Under the Jerk, to detect abrupt transitions.

AK

30,779 просмотров • 2 лет назад

Video2Humanoid. Still trouble with bad retargeted humanoid motions? Humanoids now are Easy to track any video using our new Neural Motion Retargeting (NMR) Method. Optimization-based motion retargeting methods like IK and GMR are solving a non-convex problem frame by frame, which makes them sensitive to initialization, hard to tune, and prone to poor local minima. The result is familiar: joint discontinuities, self-collisions, and unstable foot contact. Our solution is simple but effective : instead of optimizing each frame independently, learn the mapping between human motion and robot motion as a distribution via networks. 📊Results on Unitree G1 - 0 joint jumps - 54% fewer self-collision frames - 61% fewer joint limit violations - Faster convergence for downstream control policy training NMR turns motion retargeting from a fragile optimization problem into a learned, scalable pipeline for more stable humanoid motion. please visit for more visualization results. opensource:

Video2Humanoid. Still trouble with bad retargeted humanoid motions? Humanoids now are Easy to track any video using our new Neural Motion Retargeting (NMR) Method. Optimization-based motion retargeting methods like IK and GMR are solving a non-convex problem frame by frame, which makes them sensitive to initialization, hard to tune, and prone to poor local minima. The result is familiar: joint discontinuities, self-collisions, and unstable foot contact. Our solution is simple but effective : instead of optimizing each frame independently, learn the mapping between human motion and robot motion as a distribution via networks. 📊Results on Unitree G1 - 0 joint jumps - 54% fewer self-collision frames - 61% fewer joint limit violations - Faster convergence for downstream control policy training NMR turns motion retargeting from a fragile optimization problem into a learned, scalable pipeline for more stable humanoid motion. please visit for more visualization results. opensource:

Xiao-Xiao Long

24,048 просмотров • 2 месяцев назад

What is missing to bring real-time motion research into AAA games and real-world robotics? We present MotionBricks, a step toward bridging this gap with two key components: - a single generative latent motion backbone covering 350,000+ motion skills, running at 15,000 FPS with 2 ms latency and substantially improved quality and reliability. - a unified smart primitive interface for locomotion, object / scene interaction, with fine-grained control over generated behaviors. Webpage: Code: Paper: (ACM TOG / SIGGRAPH 2026)

What is missing to bring real-time motion research into AAA games and real-world robotics? We present MotionBricks, a step toward bridging this gap with two key components: - a single generative latent motion backbone covering 350,000+ motion skills, running at 15,000 FPS with 2 ms latency and substantially improved quality and reliability. - a unified smart primitive interface for locomotion, object / scene interaction, with fine-grained control over generated behaviors. Webpage: Code: Paper: (ACM TOG / SIGGRAPH 2026)

tingwu.wang

151,490 просмотров • 1 месяц назад

Implementing motion imitation methods involves lots of nuisances. Not many codebases get all the details right. So, we're excited to release MimicKit! A framework with high quality implementations of our methods: DeepMimic, AMP, ASE, ADD, and more to come!

Implementing motion imitation methods involves lots of nuisances. Not many codebases get all the details right. So, we're excited to release MimicKit! A framework with high quality implementations of our methods: DeepMimic, AMP, ASE, ADD, and more to come!

Jason Peng

168,891 просмотров • 8 месяцев назад