Code and data are now online for CameraHMR, our state-of-the-art parametric 3D human pose and shape (HPS) estimation method that will appear at #3DV2025. There are 4 key contributions that make it so accurate and robust: 1. To get accurate 3D shape and pose as well as good alignment...

21,647 views • 1 year ago • via X (Twitter)

9 comments

Jerin Philip • 1 year ago

Nice work, currently looking into using it. When is the release of CamSMPLify + HumanFoV + DenseKP expected? What are the hardware requirements to train this end-to-end? 4D-Humans mentions 8 A100 GPUs for 7 days; does this require something similar?

Michael Black • 1 year ago

All code will be out before 3DV takes place.

Duke 'Burrito Haver' Zero • 1 year ago

could you embed a sentiment layer keyed to facial expressions / body language?

Michael Black • 1 year ago

This method is just about pixels to parameters but, yes, relating these parameters to expressions and body language is something we are interested in. E.g., have a look at our work on generating moving people from audio:

lotsoflittleprojects • 1 year ago

Will this model get folded into Meshcapade?

Michael Black • 1 year ago

The next Meshcapade release will have many goodies that go beyond CameraHMR. Coming soon!

Matt Jaynes • 1 year ago

@camenduru Damn! Impressive!

JohnYue122333 • 1 year ago

MPI, awesome!
