Infinite Photorealistic Worlds using Procedural Generation paper page: We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition. Infinigen offers broad coverage of objects and scenes in the natural world including plants, animals, terrains, and natural phenomena such as fire, cloud, rain, and snow. Infinigen can be used to generate unlimited, diverse training data for a wide range of computer vision tasks including object detection, semantic segmentation, optical flow, and 3D reconstruction. We expect Infinigen to be a useful resource for computer vision research and beyond.

AK

505,955 subscribers

275,321 views • 3 years ago • via X (Twitter)

Education · Art · Science · Technology

10 comments

AK • 3 years ago

github:

bellicose_bestie • 3 years ago

Are research labs just publishing procedural algorithm work as "ai" on hugging face (aside from just arxiv) just to get more exposure?

Xenofy🛸👽 • 3 years ago

@JCorvinusVR

burgesst 🗿🪣 • 3 years ago

How the hell do you define a chameleon procedurally?

Sir Mr Meow Meow • 3 years ago

wow getting pretty amazing tbh

baloblack • 3 years ago

👍🏾

Stalin Kay • 3 years ago

@readwise save thread

0ptim • 3 years ago

Hey Sean, this might be of interest to you. @NoMansSky

Ivan Parfenchuk • 3 years ago

Infinigen Twitch stream soon?

John R. Lawson 🌦 • 3 years ago

We're in the Computational Universe, after all.

Related videos

Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video. Contribution quote from the paper: In summary, our main contributions are • a comprehensive pipeline for reconstructing the shape, appearance, and behavior of real-world garments using Gaussian splatting, • an algorithm for registering garment meshes to multi-view videos with an optimization procedure based on Gaussian splatting, and • a Gaussian Garment representation that combines triangle meshes with Gaussian textures to capture photorealistic appearance and can be used as a fully controllable 3D asset.

MrNeRF

27,277 views • 1 year ago

This might be the craziest workflow you'll see this week. From #VR greyboxing to 3D worlds with the help of #AI. The 2 tools I used are ShapesXR and Marble from World Labs. Let me tell you why and how.
- ShapesXR is the most powerful and fastest way to assemble 3D scenes at scale.
- The library of assets + ability to sketch from procedural primitives gives the perfect balance between speed and freedom.
- ShapesXR allows you to export your creation as a glTF that can be used as a "3D prompt" into Marble.
- Marble now allows you to create a panorama using a 3D scene as input, increasing DRAMATICALLY the control you have during the creation process.
- Through the workflow you can adjust textures and objects in the scene using AI inpainting.
- There are tons of options on how to use the final output (create a #GaussianSplatting, export a collision mesh and even a detailed mesh 🤯).
This is a perfect example of how VR creative tools improve and accelerate the design process... so make sure to follow me if that's your jam 😉.

Gabriele Romagnoli

20,126 views • 7 months ago

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors paper page: We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images.

AK

305,643 views • 3 years ago

New open-source 3D world-generation model. I'm rendering a couple of worlds in the video, so check it out. You'll find the GitHub and the Hugging Face links to the model below. This is a multi-modal world model that you can use for a bunch of things: • To generate new worlds • To reconstruct worlds • To simulate 3D interactive worlds from a prompt, images, or a video You can edit the 3D outputs in Unity and Unreal Engine (they export as meshes, 3DGS files, and point clouds). You can also generate 3D characters in the world and walk around. Pretty fun stuff!

Santiago

65,446 views • 2 months ago

Tanguy Talbert is creating an entirely procedural environment with no manual placement using Unreal Engine's Procedural Content Generation (PCG). Here is one of its tools – a green bridge generator: #UnrealEngine5 #unrealengine #3D #3dart #digitalart #art

80 LEVEL

35,216 views • 1 year ago

Tracking Anything with Decoupled Video Segmentation paper page: Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation.

AK

305,560 views • 2 years ago

Real-world robot data is expensive and slow to collect, creating a major challenge for humanoid development. 🤖 The NVIDIA GR00T N1.6 open vision language action model is pre-trained on a diverse mix of data, including thousands of hours of Stanford Vision and Learning Lab’s BEHAVIOR simulation data, which covers long-horizon everyday manipulation tasks. This diverse training is the key to robust cross-embodiment performance and real-world adaptability. 🌍 Read the blog 🔗

NVIDIA Robotics

13,429 views • 5 months ago

3D-LLM: Injecting the 3D World into Large Language Models paper page: Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs.

AK

249,572 views • 2 years ago

ViPE: Video Pose Engine for 3D Geometric Perception Contributions: • A robust and efficient framework, ViPE, for estimating camera parameters and dense depth from diverse, in-the-wild videos. • A system design that integrates the strengths of classical SLAM (efficiency, scalability) and learned models (robustness), with key improvements in efficiency, dynamic object handling, and depth quality over prior work. • A large-scale dataset of annotated videos, created using ViPE, to facilitate future research in 3D computer vision.

MrNeRF

42,553 views • 10 months ago

A pattern generator I made in Goo Engine a while ago. Can generate an infinite number of procedural tiling patterns! #gooengine #b3d #bnpr Blender 3D

ruki

23,659 views • 3 years ago

NEW: Touchable Objects In A Holographic Display. In a new study on the HAL open archive, scientists explored how three-dimensional holograms could be grabbed and poked using elastic materials as a key component of volumetric displays. This innovation means 3D graphics can be interacted with (for example, grasping and moving a virtual cube with your hand) without damaging a holographic system. The research has not yet been peer-reviewed, although the scientists demonstrated their findings in a video showcasing the technology. "We are used to direct interaction with our phones, where we tap a button or drag a document directly with our finger on the screen; it is natural and intuitive for humans. This project enables us to use this natural interaction with 3D graphics to leverage our innate abilities of 3D vision and manipulation," study lead author Asier Marzo, a professor of computer science at the Public University of Navarra, said in a statement. Paper:

Brian Roemmele

17,306 views • 1 year ago

Cancer is a shape shifting disease. We have figured a way to outsmart it through the power of IL-15 and natural killer cells. By 2026 we will grow trillions of natural killer cells from healthy donors and generate the 'world bank of natural killer cells' which can be administered to any patient without donor matching. That is how natural killer cells work. Special Report Bret Baier

Dr. Pat Soon-Shiong

349,049 views • 4 months ago

✨We are excited to open-source Tencent HY-Motion 1.0, a billion-parameter text-to-motion model built on the Diffusion Transformer (DiT) architecture and flow matching. Tencent HY-Motion 1.0 empowers developers and individual creators alike by transforming natural language into high-fidelity, fluid, and diverse 3D character animations, delivering exceptional instruction-following capabilities across a broad range of categories. The generated 3D animation assets can be seamlessly integrated into typical 3D animation pipelines.🎮🎥 Highlights: 🔹Billion-Scale DiT: Successfully scaled flow-matching DiT to 1B+ parameters, setting a new ceiling for instruction-following capability and generated motion quality. 🔹Full-Stage Training Strategy: The industry’s first motion generation model featuring a complete Pre-training → SFT → RL loop to optimize physical plausibility and semantic accuracy. 🔹Comprehensive Category Coverage: Features 200+ motion categories across 6 major classes—the most comprehensive in the industry, curated via a meticulous data pipeline. 🌐Project Page: 🔗Github: 🤗Hugging Face: 📄Technical report:

Tencent Hy

328,171 views • 6 months ago

Introducing HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction. We built a multi-camera system and a semi-automatic method for annotating the shape and pose of hands and objects Project page:

Yu Xiang

57,228 views • 1 year ago

EgoLifter Open-world 3D Segmentation for Egocentric Perception In this paper we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically

AK

41,117 views • 2 years ago

Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields paper page: Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.

AK

62,768 views • 3 years ago

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking paper page: We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch in our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks.

AK

122,533 views • 2 years ago

I’ve created procedural generators of ICs and connectors using geometry nodes. Due to the large number of different packages of ICs and connectors, it might be better to generate their models as and when needed rather than storing an unlimited number of unique 3D models. Also using the new hobby mat material from Chipp Walters #b3d #geometrynodes #blender3d #electronics #engineering #science #kicad #circuits #pcbdesign #manufacturing #technology #3dmodeling

Sam M

50,726 views • 1 year ago

Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives paper page: Given a set of calibrated images of a scene, we present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives. While many approaches focus on recovering high-fidelity 3D scenes, we focus on parsing a scene into mid-level 3D representations made of a small set of textured primitives. Such representations are interpretable, easy to manipulate and suited for physics-based simulations. Moreover, unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images through differentiable rendering. Specifically, we model primitives as textured superquadric meshes and optimize their parameters from scratch with an image rendering loss. We highlight the importance of modeling transparency for each primitive, which is critical for optimization and also enables handling varying numbers of primitives. We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points, while providing amodal shape completions of unseen object regions. We compare our approach to the state of the art on diverse scenes from DTU, and demonstrate its robustness on real-life captures from BlendedMVS and Nerfstudio. We also showcase how our results can be used to effortlessly edit a scene or perform physical simulations.

AK

38,571 views • 3 years ago

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior paper page: We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes the geometry consistency but compromises the texture fidelity. We further propose Bootstrapped Score Distillation to specifically boost the texture. We train a personalized diffusion model, Dreambooth, on the augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state-of-the-art in 3D content generation.

AK

161,530 views • 2 years ago