Introducing WorldView 3D, a first-of-its-kind satellite tasking product that delivers up-to-date 3D ground truth anywhere on Earth. From GPS-denied autonomy and intelligence analysis to consumer-grade mapping and disaster response, the world's most important missions increasingly depend on accurate, up-to-date 3D terrain. But for years, keeping 3D maps current at global scale hasn't been possible. WorldView 3D changes that. Built on Vantor’s trusted spatial foundation, our new tasking capability empowers customers to task and receive updated 3D data within 24 hours of image collection. For missions that require greater fidelity, they can also produce high-definition 3D maps anywhere in the world. 3D is the future of spatial intelligence, and Vantor is leading the change. Our trusted spatial foundation already includes 100 million sq km of the world mapped in highly accurate 3D. Now customers can update the locations that matter most with the speed, scale, and fidelity required for critical missions. Read the full announcement:

Vantor

139,247 subscribers

15,434 views • 4 days ago • via X (Twitter)

Related videos

Physical AI for defense and intelligence starts with an accurate model of the world – high-fidelity, geometrically grounded, built for the environments where it actually matters. Today, Niantic Spatial 🌎 is awardable on the Department of War 🇺🇸’s Tradewinds Solutions Marketplace for 3D Reconstruction, our second capability to clear that bar after Visual Positioning, announced earlier this month. Together they deliver two critical layers of the spatial stack: high-fidelity 3D models of contested environments, and precise location within them; a foundation for mission-critical operations that need to visualize and navigate the physical world. The capability is ready. The procurement path is open.

asim ᯅ

14,166 views • 1 month ago

3D-LLM: Injecting the 3D World into Large Language Models paper page: Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs.

AK

249,572 views • 2 years ago

The first global, city-scale 3D map is taking shape, and it’s machine-readable. City-scale environments can now be reconstructed in high fidelity, reaching a level of detail that is becoming almost indistinguishable from reality. This leap is made possible by large-scale mapping using just an Insta360 X5 and Over the Reality technology. So far, our dataset includes:
- 220,000+ 3D mapped locations
- 97M images
- 1,000TB of spatial data
Growing by more than 10,000 newly mapped locations every week. Enabling Visual AI, Robotics, VPS, XR, and Digital twins.

Over the Reality 🌐

69,605 views • 2 months ago

for physical AI to work in the real world, it needs high-fidelity 3D models of the real world. not just for visualization — for localization, spatial reasoning, and real-to-sim workflows that close the gap between training and deployment. at Niantic Spatial 🌎, we're turning Spexi drone imagery into exactly that — city-scale Gaussian splats grounded in the environments that matter most.

asim ᯅ

49,239 views • 29 days ago

From 3D mesh optimization for games to accurate CAD-to-3D conversions for enterprise and workflows that turbo-charge every 3D asset pipeline: InstaLOD is everything you need for the production and automatic optimization of 3D content.

InstaLOD

17,283,631 views • 3 years ago

We are thrilled to share that our first paper from my new lab, Spateo, for spatiotemporal modeling of molecular holograms, is now online in Cell: Spateo is a comprehensive analytical framework for 3D whole-embryo spatiotemporal modeling. Its advanced features include: • 3D alignment and reconstruction at the whole-mouse-embryo scale (see the animation). • 3D spatial domain digitization and cell-cell communication analysis to understand spatial gene expression gradients and both inter- and intracellular communication. • 3D morphometric and volumetric analyses along with 3D morphogenesis vector field modeling to quantify dynamics such as surface area, volume, and cell density across organs, and to dissect the interplay between morphogenesis factors and cell migration. • A “Google Earth”-like browser, Spateo-viewer, for interactive and intuitive exploration of 3D spatial data. • Additional features, such as RNA signal-based single-cell segmentation. We are also honored that Nature “News and Views” has highlighted this work as well: This is really an amazing outcome after a heroic two-year revision process that rewrote the entire paper using new data for whole mouse embryos.

evo-devo

58,302 views • 1 year ago

Large-scale Gaussian splats have reached a new level of realism. This is a well-known temple in Bangkok, reconstructed as a high-fidelity 3D environment from 360 captures. At this level, the boundary between video and 3D starts to disappear. But what you’re looking at is not a video. It’s a dense spatial representation of a real place, where geometry, texture, and structure are preserved and made machine-readable. This kind of 3D data can power Visual AI, Robotics navigation, VPS localization, XR experiences, world models, and next-generation spatial computing systems. Built with Over the Reality.

Over the Reality 🌐

347,044 views • 1 month ago

📣 New research from GenAI at Meta, introducing Meta 3D Gen: A new system for end-to-end generation of 3D assets from text in <1min. Meta 3D Gen is a new combined AI system that can generate high-quality 3D assets, with both high-resolution textures and material maps end-to-end, producing results that are superior to existing solutions — at 3-10x the speed of existing work in this space. Details in the technical report ➡️

AI at Meta

408,750 views • 2 years ago

🚨 Hunyuan 3D 3.1 Pro and Rapid are here on fal! 🎯 Pro: High-fidelity Image-to-3D and Text-to-3D generation ⚡ Rapid: Speed-optimized 3D generation ✨ Smart Topology and Part generation for advanced 3D workflows

fal

26,138 views • 5 months ago

🚀 Announcing Echo — our new frontier model for 3D world generation. Echo turns a simple text prompt or image into a fully explorable, 3D-consistent world. Instead of disconnected views, the result is a single, coherent spatial representation you can move through freely. This is part of a bigger shift in AI: from generating pixels and tokens to generating spaces. Echo predicts a geometry-grounded 3D scene at metric scale, meaning every novel view, depth map, and interaction comes from the same underlying world — not independent hallucinations. Once generated, the world is interactive in real time. You control the camera, explore from any angle, and render instantly — even on low-end hardware, directly in the browser. High-quality 3D world exploration is no longer gated by expensive equipment. Under the hood, Echo infers a physically grounded 3D representation and converts it into a renderable format. For our web demo, we use 3D Gaussian Splatting (3DGS) for fast, GPU-friendly rendering — but the representation itself is flexible and can be easily adapted. Why this matters: consistent 3D worlds unlock real workflows — digital twins, 3D design, game environments, robotics simulation, and more. From a single photo or a line of text, Echo builds worlds that are reliable, editable, and spatially faithful. Echo also enables scene editing and restyling. Change materials, remove or add objects, explore design variations — all while preserving global 3D consistency. Editing no longer breaks the world. This is only the beginning. Echo is the foundation for future world models with dynamics, physical reasoning, and richer interaction — environments that don’t just look right, but behave right. Explore the generated worlds on our website and sign up for the closed beta. The era of spatial intelligence starts here. 🌍 #Echo #WorldModels #SpatialAI #3DFoundationModels Check it out:

SpAItial AI

175,909 views • 6 months ago

Apple Maps 3d is insanely impressive now (see video below). Google on the other hand decided to build an insanely clunky solution to display 3d maps to avoid using mobile GPUs: a Google server streams (!) a live video of its 3d rendered map to you. The result is disastrous: a super laggy and clunky 3d view full of video compression artefacts with a response time of 2-3 seconds for every movement. And I know WHY they built that: someone felt it was not inclusive to only have 3d maps that are high performing on iOS because Apple's GPUs are insanely fast and can deal with it (as you can see below in Apple Maps), but they wanted to support inferior phones, I get it. But then they also decided to worsen the experience for iOS users who could easily run 3d natively on its GPU. It's just an insane product decision, there's no way to cut it: they could have super smooth 3d like Apple Maps but neutered it with a clunky live stream. Even more insane because Google does have native 3d on iOS in a completely different app that nobody uses: Google Earth. Is 3d important? Not rly for nav no, but it shows again something about Google's engineering decisions, like the redesign of the Google OAuth Login screen that made zero sense. Nobody is speaking up against bad ideas in Google meetings, everyone's too scared of HR maybe, it's the opposite of a meritocracy and it's just super visible from using Google products right now.

@levelsio

1,488,824 views • 1 year ago

Marble is the first product from World Labs and is powered by our multimodal world model, which lets anyone create high-fidelity, persistent 3D worlds from just a single image, video, text prompt, or 3D layout. Read more at

World Labs

117,146 views • 7 months ago

#ICCV2025 🤩3D world generation is cool, but it is cooler to play with the worlds using 3D actions 👆💨, and see what happens! — Introducing *WonderPlay*: Now you can create dynamic 3D scenes that respond to your 3D actions from a single image! Web: 🧵1/7

Hong-Xing (Koven) Yu

57,590 views • 1 year ago

Introducing SAM 3D, the newest addition to the SAM collection, bringing common sense 3D understanding of everyday images. SAM 3D includes two models: 🛋️ SAM 3D Objects for object and scene reconstruction 🧑‍🤝‍🧑 SAM 3D Body for human pose and shape estimation Both models achieve state-of-the-art performance transforming static 2D images into vivid, accurate reconstructions. 🔗 Learn more:

AI at Meta

858,407 views • 7 months ago

SAM 3D enables accurate 3D reconstruction from a single image, supporting real-world applications in editing, robotics, and interactive scene generation. Matt, a SAM 3D researcher, explains how the two-model design makes this possible for both people and complex environments. 🔗 Read the SAM 3D Objects research paper: 🔗 Read the SAM 3D Body research paper:

AI at Meta

17,858 views • 7 months ago

Adobe is entering the image-to-3D game! Their new method, LRM, can create high-fidelity 3D meshes from a single image in just 5 seconds 🔥 It's trained on 1 million objects and is able to generate objects from real-world images and generative AI models.

Dreaming Tulpa 🥓👑

270,942 views • 2 years ago

Doppelgangers: Learning to Disambiguate Images of Similar Structures paper page: We consider the visual disambiguation task of determining whether a pair of visually similar images depict the same or distinct 3D surfaces (e.g., the same or opposite sides of a symmetric building). Illusory image matches, where two images observe distinct but visually similar 3D surfaces, can be challenging for humans to differentiate, and can also lead 3D reconstruction algorithms to produce erroneous results. We propose a learning-based approach to visual disambiguation, formulating it as a binary classification task on image pairs. To that end, we introduce a new dataset for this problem, Doppelgangers, which includes image pairs of similar structures with ground truth labels. We also design a network architecture that takes the spatial distribution of local keypoints and matches as input, allowing for better reasoning about both local and global cues. Our evaluation shows that our method can distinguish illusory matches in difficult cases, and can be integrated into SfM pipelines to produce correct, disambiguated 3D reconstructions.

AK

76,878 views • 2 years ago

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior paper page: We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes the geometry consistency but compromises the texture fidelity. We further propose Bootstrapped Score Distillation to specifically boost the texture. We train a personalized diffusion model, Dreambooth, on the augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state-of-the-art in 3D content generation.

AK

161,530 views • 2 years ago

Today, we're announcing an expansion of Vantor’s industry-leading imaging constellation—eliminating a long-standing tradeoff between accurate, high-resolution imagery and high-frequency monitoring. For the first time, both come together in a single commercial system. 20 cm-class imaging. Global revisits as often as every 15 minutes. This expansion introduces two new satellite classes: 🛰️ Vantor Vantage™, next-generation 20 cm-class imaging satellites delivering the highest commercial resolution on orbit, coming online as early as 2029 🛰️ Vantor Pulse™, a fleet of 40 cm-class satellites designed for persistent, high-frequency monitoring, coming online as early as 2027 These new satellites build on the strength of the Vantor constellation—including our WorldView Legion satellites, which already deliver the most accurate, high-resolution imagery on orbit. By increasing capacity and refresh rates, the expansion reinforces the spatial foundation behind our Tensorglobe spatial intelligence platform, powering a real-time, AI-ready view of the world in 2D and 3D. These capabilities automate the full intelligence cycle within customer environments, enabling a new level of sovereign capability that helps governments and businesses detect change, track activity, and act faster. 👉 Read the full announcement here: Total clarity from space to ground.

Vantor

23,995 views • 2 months ago

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness. Recent advancements in Large Multimodal Models (LMMs) have greatly enhanced their proficiency in 2D visual understanding tasks, enabling them to effectively process and understand images and videos. However, the development of LMMs with 3D-awareness for 3D scene understanding has been hindered by the lack of large-scale 3D vision-language datasets and powerful 3D encoders. In this paper, we introduce a simple yet effective framework called LLaVA-3D. Leveraging the strong 2D understanding priors from LLaVA, our LLaVA-3D efficiently adapts LLaVA for 3D scene understanding without compromising 2D understanding capabilities. To achieve this, we employ a simple yet effective representation, 3D Patch, which connects 2D CLIP patch features with their corresponding positions in 3D space. By integrating the 3D Patches into 2D LMMs and employing joint 2D and 3D vision-language instruction tuning, we establish a unified architecture for both 2D image understanding and 3D scene understanding. Experimental results show that LLaVA-3D converges 3.5x faster than existing 3D LMMs when trained on 3D vision-language datasets. Moreover, LLaVA-3D not only achieves state-of-the-art performance across various 3D tasks but also maintains comparable 2D image understanding and vision-language conversation capabilities with LLaVA.

AK

41,713 views • 1 year ago