Code and data are now online for CameraHMR, our state-of-the-art parametric 3D human pose and shape (HPS) estimation method that will appear at #3DV2025. Four key contributions make it accurate and robust:

1. To get accurate 3D shape and pose, as well as good alignment to image features, you need to know the focal length of the camera. To solve this, we train HumanFOV to estimate the field of view.
2. We introduce CameraHMR, which integrates HumanFOV into HMR2.0 to exploit the estimated focal length.
3. To get accurate pseudo ground truth (pGT) training data, we compute the focal length for images in the 4DHumans dataset and modify SMPLify to take it into account.
4. SMPLify, however, uses only sparse 2D keypoints, which do not capture body shape. So we train a dense surface keypoint detector, DenseKP, on BEDLAM and run it on 4DHumans, which improves the estimated body shape. The resulting fitting method is CamSMPLify.

We iterate: train CameraHMR, then run CamSMPLify on the training set, initialized with CameraHMR's predictions. This yields much improved pGT for 4DHumans and a SOTA single-image HMR method.
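Contributions 1 and 3 both rest on the pinhole camera model: a field of view (FoV) plus the image width fixes the focal length in pixels, and the focal length decides where 3D body points land in the image. The Python sketch below illustrates this under stated assumptions (horizontal FoV, square pixels, principal point at the image centre, points already in the camera frame). The function names are hypothetical and are not taken from the CameraHMR release. It also shows a confidence-weighted 2D reprojection term of the kind SMPLify-style fitting minimises; this is a generic sketch, not the actual CamSMPLify objective, which per the description above uses the estimated focal length and dense surface keypoints in addition to sparse joints.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def focal_from_fov(fov_deg: float, image_width: int) -> float:
    """Focal length in pixels for a pinhole camera with the given horizontal FoV."""
    return (image_width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def project(points: Sequence[Point3], focal: float,
            width: int, height: int) -> List[Point2]:
    """Perspective-project camera-frame 3D points to pixel coordinates.

    Assumes square pixels and a principal point at the image centre.
    """
    cx, cy = width / 2.0, height / 2.0
    pixels = []
    for x, y, z in points:
        if z <= 0:
            raise ValueError("points must lie in front of the camera (z > 0)")
        pixels.append((focal * x / z + cx, focal * y / z + cy))
    return pixels


def reprojection_error(points3d: Sequence[Point3],
                       keypoints2d: Sequence[Point2],
                       confidences: Sequence[float],
                       focal: float, width: int, height: int) -> float:
    """Confidence-weighted sum of squared 2D distances between projected
    3D points and detected keypoints (sparse joints or dense surface points)."""
    projected = project(points3d, focal, width, height)
    total = 0.0
    for (u, v), (ku, kv), c in zip(projected, keypoints2d, confidences):
        total += c * ((u - ku) ** 2 + (v - kv) ** 2)
    return total


if __name__ == "__main__":
    W, H = 1000, 800
    f = focal_from_fov(90.0, W)              # roughly 500 px for a 90-degree FoV
    body_point = [(0.5, 0.0, 5.0)]           # one 3D point, 5 units in front of the camera
    detected = project(body_point, f, W, H)  # treat its projection as the "detection"
    print(reprojection_error(body_point, detected, [1.0], f, W, H))      # zero with the right focal length
    print(reprojection_error(body_point, detected, [1.0], 2 * f, W, H))  # nonzero (about 2500) when doubled
```

Doubling the focal length shifts the projected point about 50 px from its detection, so a fit that assumes the wrong camera has to compensate by distorting depth, pose or shape, which illustrates why the description stresses knowing the focal length.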

Michael Black

99,050 subscribers

21,696 views • 1 year ago • via X (Twitter)

9 Comments

Jerin Philip • 1 year ago

Nice work, I'm currently looking into using it. When is the release of CamSMPLify + HumanFoV + DenseKP expected? What are the hardware requirements to train this end-to-end? 4D-Humans mentions 8 A100 GPUs for 7 days; does this require something similar?

Michael Black • 1 year ago

All code will be out before 3DV takes place.

Digital Currency • 2 years ago

From 3D modeling to VR/AR development, our MSc in Metaverse program equips you with the technical skills to excel in the rapidly evolving digital world. Don't miss out—enroll today! #UNIC #MScMetaverse

Duke 'Burrito Haver' Zero • 1 year ago

could you embed a sentiment layer keyed to facial expressions / body language?

Michael Black • 1 year ago

This method is just about pixels to parameters but, yes, relating these parameters to expressions and body language is something we are interested in. E.g., have a look at our work on generating moving people from audio:

lotsoflittleprojects • 1 year ago

Will this model get folded into Meshcapade?

Michael Black • 1 year ago

The next Meshcapade release will have many goodies that go beyond CameraHMR. Coming soon!

Matt Jaynes • 1 year ago

@camenduru Damn! Impressive!

JohnYue122333 • 1 year ago

MPI, awesome!

Related Videos

The BEDLAM2.0 dataset (B2) is here, just in time to train your 3D human pose and shape estimation methods for CVPR. B2 goes beyond BEDLAM (B1) to include widely varied and natural camera motions and fields of view, more diverse body shapes, strand-based hair, more garments, shoes, more body motions, and more 3D scenes. Compared with B1, training on B2 produces more accurate 3D human pose, resulting in SOTA accuracy, particularly for estimates in world coordinates. B2 lets you jointly train camera motion and human motion regressors, and we also provide depth maps. Check out data, code, dataset statistics, and much more. BEDLAM2.0 will appear in the 2025 NeurIPS Datasets and Benchmarks Track. Joint work with Joachim Tesch, Giorgio Becherini, Prerana Achar, Anastasios Yiannakidis, Muhammed Kocabas, Priyanka Patel.

Michael Black

27,536 views • 9 months ago

Physics-based Motion Retargeting from Sparse Inputs paper page: Avatars are important to create interactive and immersive experiences in virtual worlds. One challenge in animating these characters to mimic a user's motion is that commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose. Another challenge is that an avatar might have a different skeleton structure than a human and the mapping between them is unclear. In this work we address both of these challenges. We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies. Our method uses reinforcement learning to train a policy to control characters in a physics simulator. We only require human motion capture data for training, without relying on artist-generated animations for each avatar. This allows us to use large motion capture datasets to train general policies that can track unseen users from real and sparse data in real-time. We demonstrate the feasibility of our approach on three characters with different skeleton structure: a dinosaur, a mouse-like creature and a human. We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available. We discuss and ablate the important components in our framework, specifically the kinematic retargeting step, the imitation, contact and action reward as well as our asymmetric actor-critic observations. We further explore the robustness of our method in a variety of settings including unbalancing, dancing and sports motions.

AK

106,527 views • 3 years ago

Drivable 3D Gaussian Avatars paper page: present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require either accurate 3D registrations during training, dense input images during testing, or both. The ones based on neural radiance fields also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time framerates, using dense calibrated multi-view videos as input. To deform those primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. Given their smaller size, we drive these deformations with joint angles and keypoints, which are more suitable for communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions obtain higher-quality results than state-of-the-art methods when using the same training and test data.

AK

327,105 views • 2 years ago

Do you know how we simulate electromagnetic radiation around real systems like antennas, aircraft, and waveguides? We break the geometry into a mesh, a collection of small elements that lets the solver compute Maxwell’s equations locally instead of attacking the whole shape at once. This makes the simulations practical, fast, and accurate enough to use in real engineering.

Mathelirium

73,166 views • 3 months ago

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation Contributions: • We propose a novel approach to inject SfM priors into diffusion-based depth estimation, enabling highly accurate and multi-view consistent depth predictions for each viewpoint. • Based on the proposed depth estimator, we design a new multi-view 3D geometry reconstruction framework and process some synthetic datasets to facilitate training. • We evaluate our method on diverse real-world scene data, including objects, indoor environments, streetscapes, and aerial scenes, demonstrating the superior performance and generalization capability of our approach.

MrNeRF

25,651 views • 1 year ago

“The problems we are facing may seem insurmountable, but there are always solutions to be found and applied. It is a question of mindset. We cannot remain stuck in the past, focused on differences and grievances. We must look forward, and shape the future we want for our people. That is how we really wish to see the invitation to meet here, as an opportunity to collaborate and find solutions, and make the investments required to get results. Time is not on our side, but together, we are much stronger.” President Kagame | Saudi-African Summit.

Presidency | Rwanda

104,632 views • 2 years ago

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking paper page: introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch in our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks.

AK

122,533 views • 3 years ago

Mike Dunleavy Jr. says the Warriors are not in a situation to fix everything in free agency but they’re looking to put together a good team going into training camp: “I think there’s where you want to be in September when training camp starts, and there’s where you want to be in the spring as you try and compete for the postseason and get in the postseason. We have a couple transaction cycles here in the summer and then once again in February where we feel like we can build and maximize our team. And so I think the biggest thing is to put together a good team going into camp. And as I mentioned before, a lot of the improvement for us is going to have to come from within our habits, taking care of the ball, doing the little things. Get that knocked out, you get on your way to having a good season. We need to stay healthy and then get those guys (Jimmy & Moses) back, and then we’ll see at the trade deadline and we’ll evaluate everything. But this is not a fix-everything over the next couple weeks in free agency. We’re not in that situation based on some stuff we have going on, but we can get a lot better. We can have a good start to the season, and we can be in really good shape to make a run. But we got to get off to a good start.” (via NBC Sports Bay Area & CA with Bonta Hill and Monte Poole)

aly ✶

54,795 views • 27 days ago

Google announces PALP Prompt Aligned Personalization of Text-to-Image Models paper page: Content creators often aim to create personalized images using personal subjects that go beyond the capabilities of conventional text-to-image models. Additionally, they may want the resulting image to encompass a specific location, style, ambiance, and more. Existing personalization methods may compromise personalization ability or the alignment to complex textual prompts. This trade-off can impede the fulfillment of user prompts and subject fidelity. We propose a new approach focusing on personalization methods for a single prompt to address this issue. We term our approach prompt-aligned personalization. While this may seem restrictive, our method excels in improving text alignment, enabling the creation of images with complex and intricate prompts, which may pose a challenge for current techniques. In particular, our method keeps the personalized model aligned with a target prompt using an additional score distillation sampling term. We demonstrate the versatility of our method in multi- and single-shot settings and further show that it can compose multiple subjects or use inspiration from reference images, such as artworks. We compare our approach quantitatively and qualitatively with existing baselines and state-of-the-art techniques.

AK

90,199 views • 2 years ago

#F1Testing | Lewis Hamilton on the plan for the rest of the shakedown: “For us it’s still to continue and try to get as much mileage and knowledge on this engine, and on the car, on the aero side. We went through a programme this morning and found some learnings, Charles is doing a different set this afternoon, which is great… My role is to listen as much as possible, at the end of the day we both come together and we both talk about our problems, the positives and the negatives, and we’ll come up with a plan of what we want to tackle tomorrow as our last day.” “But we got good data so far so it’s just understanding that and making sure that you’re making really clear and concise decisions in terms of what to test moving forwards before we get to Bahrain.”

deni

42,540 views • 6 months ago

3D Object Tracking without Training Data? In our Nature Machine Intelligence paper ( we recast 3D tracking as an inverse neural rendering task where we fit a scene graph to an image that best explains this image. The method generalizes to completely unseen datasets and is explainable. Project and Code: Fun collaboration between Princeton Computer Science and Torc Robotics, with Julian Ost and Tanushree Banerjee leading this project.

Felix Heide

27,858 views • 11 months ago

Muslims are now specifically targeting California to run for office and take over the government “We want to make sure we activate California Muslims in so we can shape the rest of the country, because we can shape California. We need you to run for office” “Muslims. We will build a network of activists at every masjid, every mosque, to ensure that every eligible Muslim, every eligible person is registered to vote, that every elected official at every level engages with the Muslim community, that more American Muslims run for office — And we want to make sure that we activate California Muslims in so we can shape the rest of the country, because we can shape California. We need you to run for office, and I want to salute the dozens and dozens of American Muslims who were, who had the courage and the commitment and the resolve to run for office. Dozens of them won, others did not. They will win next time. What Allah promises is guaranteed 100%.” “How many phone calls have you made? How many protests have you tried to attend? Have you met or called your elected officials, your member of Congress? Have you gone to call the White House? Have you joined the efforts, the political efforts happening in every city to organize the Muslim votes as we deal with elections? Ask yourself, where am I in the equation? Because until there is a critical mass, Allah is withholding what he has promised us, because we haven't fulfilled our part of the deal” “Mahmoud Saifi in Redlands. Dr. Asif Mahmud, running for Congress. Fatima Eqbal Zubair in LA for Assembly, and many others. But as important as running is, it is equally important to build the power of the grassroots at the grassroots level. 
This year, we've established the Muslim Community Action Network, we know it as MCAN, which aims to train activists as community organizers who can inform and mobilize their local community, set up candidate forums, engage with local politicians, and advocate for local, state, and national issues of importance to our community.”

Wall Street Apes

338,914 views • 8 months ago

#WATCH | India AI Impact Summit 2026 | Delhi: Founder Chairman and CEO of Sampark Foundation & former CEO of HCL Technologies, Vineet Nayar says, "...From an employment point of view I think it is very important for us to understand that Indian companies, including Indian IT companies, are going to be profit-driven and therefore if you believe that they are going to create employment you must be dreaming. Therefore, the question is how do we create employment in this environment, and that employment comes from mass scale startups, which is what this government has already doing. So, how do we create new sets of people who are trying to solve new sets of problems not new sets of technology and if we do that we will get it right. I think we as Indians have to be very careful on who does data belong to and that is the debate we have a problem with. The LLM models which exist worldwide are far superior than the Indian models. Unfortunately, in India, we never develop products, so therefore we do not have SLMs and LLMs which are world-class. On one side, we have global LLM products which are coming to India and trading on our Indian data. Should we allowed that or should we not allowed that? But on the other side if we don't allow that then we have the data but we don't have the LLM models. So, how do we encourage technology completely to develop the LLM models. This needs radicals strategic thinking and a very important aspect otherwise we will either give up a data. So, I think it's a very critical aspect for us to think about - who does this data belong, what is the kind of incentives we are going to give to develop LLM technologies or SLM technologies fast so that we train on our data otherwise an LLM will come in with our data and we'll immediately see return and we'll celebrate and we will do all these kind of press releases but the India will lose a competitive advantage on something which is very critical for the next decade."

ANI

18,753 views • 5 months ago

3D-LLM: Injecting the 3D World into Large Language Models paper page: Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs.

AK

249,708 views • 3 years ago

✨ Made a new mini feature on Photo AI: [ Grab from 3d model ]. So the problem is we're at that stage in time (typical for AI) where image-to-3d models are not good enough but are fun to play with, though we know they'll be good enough in 1-2 years. With [ Make 3d model ] you can already turn any Photo AI pic into a 3d model; it still looks hyper clunky and deformed, but it works! One cool idea I had to make that more useful, and made now: let people make a 3d model, then change the view of it with the 3d viewer, then press [ o ] and it grabs a frame of the 3d model. That image you can then [ Remix ] (img2img), and it becomes a real photo again, and that in turn you can turn into a video with [ Make video ]. So that essentially gives you fully freeform camera position control to take photos with. One thing I need to fix is the background/skybox: I kinda need to take the original photo, remove the person, and just get the background for the 3d model viewer (in this case it should be white), but it's a start!

@levelsio

119,210 views • 1 year ago

3D editing is hard: you need to ground an image + instruction and generate a faithful 3D shape in one forward pass -- no test-time optimization. So, we steer pretrained image-to-3D representations to do text-guided 3D edits; no massive 3D edit-pair dataset needed. Key trap: the “no-edit” solution is a nasty local minimum. We fix it with preference optimization, pushing the model to actually edit. Steer3D is the second work that adapts alignment ideas from LLMs to the 3D modality. SAM 3D also used DPO to improve its 3D generations.

Georgia Gkioxari

116,061 views • 7 months ago

Former Liberal Finance Minister John Manley on the eve of the Carney budget: "We're rich in natural resources. We've been very poor at getting those out of the ground and into foreign hands where we want to sell them in order to get the returns on them." "That's the quickest way for us to earn the kind of money that we need to make is to get those resources to markets." "And it means regulatory reforms and it means that we've got to have the Prime Minister and a cabinet and provincial premiers that are out there saying we can do this, we are going to do this." "We're going to strip down those interprovincial trade barriers once and for all." "We're going to build the infrastructure necessary in our ports, on our highways, our transportation systems, not just pipelines, but all of the other things that we need to do in order to get our goods into foreign hands."

cbcwatcher

82,593 views • 8 months ago

If we want to resolve our current income, wealth, and productivity gap, we need to get back to the basics. We need good education, so our children are productive and can work well together. We need a civil society with rule of law, and good capital markets that produce good income. And we need to double down on the fundamentals — earning more than we spend, and having more assets than liabilities. Do all of that, and we’ll be in great shape.

Ray Dalio

92,359 views • 11 months ago

"One of the very confusing things about the models right now: how to reconcile the fact that they are doing so well on evals. And you look at the evals and you go, 'Those are pretty hard evals.' But the economic impact seems to be dramatically behind. There is [a possible] explanation. Back when people were doing pre-training, the question of what data to train on was answered, because that answer was everything. So you don't have to think if it's going to be this data or that data. When people do RL training, they say, 'Okay, we want to have this kind of RL training for this thing and that kind of RL training for that thing.' You say, 'Hey, I would love our model to do really well when we release it. I want the evals to look great. What would be RL training that could help on this task?' If you combine this with generalization of the models actually being inadequate, that has the potential to explain a lot of what we are seeing, this disconnect between eval performance and actual real-world performance"

Dwarkesh Patel

502,249 views • 8 months ago