📢📢📢 RoMo: Robust Motion Segmentation Improves Structure from Motion TL;DR: boost your SfM pipeline on dynamic scenes. We use epipolar cues + SAMv2 features to find robust masks for moving objects in a zero-shot manner. 🧵👇

Andrea Tagliasacchi @CVPR

16,805 subscribers

18,603 views • 1 year ago • via X (Twitter)

Education • Health & Wellness • Science & Technology

7 Comments

Andrea Tagliasacchi 🇨🇦 • 1 year ago

Let's look at some results. An optimization process finds the moving components of the video, disentangling camera ego motion from scene motion.

Andrea Tagliasacchi 🇨🇦 • 1 year ago

Our masks are robust to slow/fast camera movements, and can find multiple moving objects, even when they are in the background (look at the pedestrian🧐)

Andrea Tagliasacchi 🇨🇦 • 1 year ago

Why care about motion masks? We show that good motion masks improve SfM performance, making COLMAP+our masks the SOTA on synthetic benchmarks. We also collect a real evaluation dataset with GT camera pose using a robotic arm, to evaluate our method in real casual captures.
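A minimal sketch of how motion masks might be handed to COLMAP, under assumptions. COLMAP's feature extractor has an `ImageReader.mask_path` option; as its documentation describes, no features are extracted where the mask image is black, and the mask for `abc/012.jpg` is expected at `abc/012.jpg.png` under the mask folder. The helper names `to_colmap_mask` and `colmap_mask_name` are hypothetical, and this is not the authors' code.

```python
import numpy as np


def to_colmap_mask(moving):
    """Convert a boolean motion mask into a COLMAP-style feature mask.

    moving: H x W boolean array, True where a pixel is predicted to move.
    Returns a uint8 image: 0 (black) on moving pixels so that no features
    are extracted there, 255 on static pixels.
    """
    return np.where(moving, 0, 255).astype(np.uint8)


def colmap_mask_name(image_name):
    """Mask file name for an image: same relative path plus '.png'."""
    return image_name + ".png"
```

The resulting array would still need to be written as a PNG (with any image library) into the folder passed as `ImageReader.mask_path`.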

Andrea Tagliasacchi 🇨🇦 • 1 year ago

How does it work? (three steps) 1) We find the Fundamental matrix between adjacent frames in the video with RANSAC. 2) We then identify parts of the frame that have a very low or a very high epipolar error, as weak supervision signals to find the moving objects.
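Steps 1 and 2 can be sketched roughly as follows. This is an illustration under assumptions, not the authors' implementation: the thread does not say which epipolar error is used, so the Sampson distance is assumed here, the fundamental matrix `F` is taken as already estimated (for example with a RANSAC-based estimator), and the thresholds `low_thresh` and `high_thresh` are hypothetical parameters.

```python
import numpy as np


def sampson_error(F, pts1, pts2):
    """Squared Sampson distance for correspondences pts1[i] <-> pts2[i].

    F: 3 x 3 fundamental matrix satisfying x2^T F x1 = 0 for static points.
    pts1, pts2: N x 2 arrays of pixel coordinates in the two frames.
    """
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])  # homogeneous coordinates, N x 3
    x2 = np.hstack([pts2, ones])
    Fx1 = x1 @ F.T                # row i is F @ x1[i]
    Ftx2 = x2 @ F                 # row i is F.T @ x2[i]
    num = np.sum(x2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / np.maximum(den, 1e-12)


def weak_labels(err, low_thresh, high_thresh):
    """Label points: 0 = likely static, 1 = likely moving, -1 = undecided."""
    labels = np.full(err.shape, -1, dtype=np.int64)
    labels[err < low_thresh] = 0   # very low error: consistent with camera motion
    labels[err > high_thresh] = 1  # very high error: likely an independent mover
    return labels
```

Points whose error falls between the two thresholds stay unlabeled, which is what makes the signal weak supervision rather than a full mask.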

Andrea Tagliasacchi 🇨🇦 • 1 year ago

3) Finally, we train a tiny MLP that classifies SAMv2 features as moving or static given the weak supervisory signal from high and low error masks. These features help complete the motion masks over the video effectively!
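Step 3 might look something like the sketch below. It is an assumption-laden illustration rather than the authors' code: the feature vectors are generic stand-ins for SAMv2 features, and the network size, learning rate, and the names `train_tiny_mlp` and `predict_moving` are all hypothetical. Labels follow the convention 0 = static, 1 = moving, -1 = unlabeled, and unlabeled samples are ignored during training.

```python
import numpy as np


def train_tiny_mlp(feats, labels, hidden=16, lr=0.1, steps=300, seed=0):
    """Fit a one-hidden-layer MLP on the weakly labeled features only.

    feats: N x D feature array; labels: N ints in {-1, 0, 1}.
    Uses full-batch gradient descent on the mean binary cross-entropy.
    """
    rng = np.random.default_rng(seed)
    keep = labels >= 0                      # drop unlabeled samples
    X, y = feats[keep], labels[keep].astype(np.float64)
    d = X.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)
    b2 = 0.0
    for _ in range(steps):
        h = np.maximum(X @ W1 + b1, 0.0)                 # ReLU hidden layer
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))         # P(moving)
        g = (p - y) / len(y)                             # dLoss/dlogit
        gh = np.outer(g, W2) * (h > 0)                   # backprop through ReLU
        W2 -= lr * (h.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gh)
        b1 -= lr * gh.sum(axis=0)
    return W1, b1, W2, b2


def predict_moving(params, feats):
    """Return P(moving) for every feature vector, labeled or not."""
    W1, b1, W2, b2 = params
    h = np.maximum(feats @ W1 + b1, 0.0)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
```

Because prediction runs on every feature vector, including those the epipolar test left undecided, the classifier is what fills in the rest of the mask.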

Andrea Tagliasacchi 🇨🇦 • 1 year ago

and just like that... we get good quality masks, without human annotation or synthetic supervision! Find more results on our website →

Andrea Tagliasacchi 🇨🇦 • 1 year ago

This work was led by @lily_goli and @sabour_sara. In collaboration with Mark Matthews, @marcusabrubaker, Dmitry Lagun, @fleet_dj and @srbhsxn at Google DeepMind, and @_AlecJacobson at the University of Toronto.

Related Videos

Text2video models are getting interesting!📽️ Check out how we leverage their space-time features in a zero-shot manner for transferring motion across objects and scenes! Led by Danah Yatim Rafail Fridman,Yoni Kasten Tali Dekel [1/3]

Omer Bar Tal

63,301 views • 2 years ago

📢📢 𝐏𝐞𝐫𝐜𝐇𝐞𝐚𝐝: 𝐏𝐞𝐫𝐜𝐞𝐩𝐭𝐮𝐚𝐥 𝐇𝐞𝐚𝐝 𝐌𝐨𝐝𝐞𝐥 𝐟𝐨𝐫 𝐒𝐢𝐧𝐠𝐥𝐞-𝐈𝐦𝐚𝐠𝐞 𝟑𝐃 𝐇𝐞𝐚𝐝 𝐑𝐞𝐜𝐨𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧 & 𝐄𝐝𝐢𝐭𝐢𝐧𝐠📢📢 PercHead reconstructs realistic 3D heads from a single image and enables disentangled 3D editing via geometric controls and style inputs from images or text. At its core is a generalized 3D head decoder trained with perceptual supervision from DINOv2 and SAM 2.1. We find that our new perceptual loss formulation improves reconstruction fidelity compared to commonly-used methods such as LPIPS. Our trained reconstruction model is able to generate 3D-consistent heads from a single input image. Even with challenging side-view inputs, the model robustly infers missing regions for a coherent, high-fidelity output. In addition, our architecture seamlessly adapts to downstream tasks: by swapping the encoder, we can transform the model into a disentangled 3D editing pipeline. In this scenario, we can control geometry through - potentially hand-drawn - segmentation maps, and condition style via image or text prompt. We also provide an interactive GUI to enable the exploration of our editing pipeline. 🌍 📽️ Great work by Antonio Oroz and Tobias Kirschstein

Matthias Niessner

18,855 views • 8 months ago

Humanoids finally move like humans… and can do more than copy. [Details + demos in thread 👇] A new framework, BeyondMimic, shows how to learn naturalistic whole-body control from human motion. But then goes further by composing those skills into versatile, zero-shot policies. Two key pieces make it work: •Motion tracking pipeline: robustly turns human mocap into dynamic robot motions (jump spins, sprinting, cartwheels) on real hardware •Guided diffusion policy: distills motion primitives and enables zero-shot downstream tasks with simple cost functions The result: ✅ State-of-the-art motion quality from mocap tracking ✅ Versatile task execution; waypoint navigation, joystick teleop, obstacle avoidance ✅ Stable sim-to-real transfer for diverse humanoid skills Thank you for sharing, Qiayuan Liao! 📍 Paper: Video:

Ilir Aliu - eu/acc

72,522 views • 11 months ago

Progressively Optimized Local Radiance Fields for Robust View Synthesis paper page: We present an algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video. The task poses two core challenges. First, most existing radiance field reconstruction approaches rely on accurate pre-estimated camera poses from Structure-from-Motion algorithms, which frequently fail on in-the-wild videos. Second, using a single, global radiance field with finite representational capacity does not scale to longer trajectories in an unbounded scene. For handling unknown poses, we jointly estimate the camera poses with radiance field in a progressive manner. We show that progressive optimization significantly improves the robustness of the reconstruction. For handling large unbounded scenes, we dynamically allocate new local radiance fields trained with frames within a temporal window. This further improves robustness (e.g., performs well even under moderate pose drifts) and allows us to scale to large scenes. Our extensive evaluation on the Tanks and Temples dataset and our collected outdoor dataset, Static Hikes, show that our approach compares favorably with the state-of-the-art.

AK

140,624 views • 3 years ago

SAMURAI vs. MetaAI's SAM 2! Traditional visual object tracking struggles in crowded, fast-moving, or self-occluded scenes, as does SAM2. Meet SAMURAI: a completely open-source adaptation of the Segment Anything Model for zero-shot visual tracking! Here's why it's a game-changer: 🚫 No need for retraining or finetuning 🎯 Boosts success rate and precision 🤖 Motion-aware memory selection 💪 Zero-shot performance on diverse datasets But that's not all: 🔬 Refines mask selection 🔮 Predicts object motion effectively 📈 Gains: 7.1% AUC on LaSOT, 3.5% AO on GOT-10k 🏆 Competes with fully supervised methods without extra training Link to the GitHub repo in the next tweet! _____ Find me → Akshay 🚀 ✔️ For more insights & tutorials on AI and Machine Learning.

Akshay 🚀

363,389 views • 1 year ago

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking paper page: We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch in our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks.

AK

122,533 views • 3 years ago

"If the Labour Party is only for people in London, only for people who went to university, then it's not worth existing and we should just pack it in" 🌹👩‍🎓 Jonny Ball of UnHerd Blue Labour @ #BattleFest 2025 on "From Reform to Blue Labour: will the working class find a voice at last?"👷‍♂️📢 👇

Academy of Ideas

29,230 views • 7 months ago

[SIGGRAPH '26] Anchored Temporal Gaussian Splatting for Long Volumetric Video Representation TL;DR: We present ATGS, a novel framework for volumetric video reconstruction that effectively handles long sequences and complex motions. By utilizing time-conditioned anchors and a temporal windowing strategy, ATGS enhances temporal coherence and scalability. Abstract (excerpt): Key insight is that explicitly tracking long term complex motion with individual Gaussian primitives is inherently unstable. Instead, we organize Gaussians around time conditioned anchors that localize their spatial and temporal support, thereby reducing long range motion complexity. We further introduce a temporal windowing strategy to activate only anchors relevant to the queried time, which improves scalability and temporal coherence. In addition, to ensure spatial and temporal stability, we design a compact set of multi level anchor features that encode global features, local spatial features, and local temporal features, jointly constraining Gaussian generation. Extensive experiments demonstrate that ATGS consistently outperforms prior methods on long sequence volumetric videos with complex motions.

MrNeRF

26,905 views • 3 months ago

SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes abs: paper page: Existing methods for the 4D reconstruction of general, non-rigidly deforming objects focus on novel-view synthesis and neglect correspondences. However, time consistency enables advanced downstream tasks like 3D editing, motion analysis, or virtual-asset creation. We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner. Our dynamic-NeRF method takes multi-view RGB videos and background images from static cameras with known camera parameters as input. It then reconstructs the deformations of an estimated canonical model of the geometry and appearance in an online fashion. Since this canonical model is time-invariant, we obtain correspondences even for long-term, long-range motions. We employ neural scene representations to parametrize the components of our method. Like prior dynamic-NeRF methods, we use a backwards deformation model. We find non-trivial adaptations of this model necessary to handle larger motions: We decompose the deformations into a strongly regularized coarse component and a weakly regularized fine component, where the coarse component also extends the deformation field into the space surrounding the object, which enables tracking over time. We show experimentally that, unlike prior work that only handles small motion, our method enables the reconstruction of studio-scale motions.

AK

76,380 views • 2 years ago

📢📢 𝐀𝐯𝐚𝐭𝟑𝐫 📢📢 Avat3r creates high-quality 3D head avatars from just a few input images in a single forward pass with a new dynamic 3DGS reconstruction model. Video: Project: Our core idea is to make Gaussian Reconstruction Models animatable. We find that a simple cross-attention to an expression code sequence is already sufficient to model complex facial expressions. We then incorporate position maps from DUSt3R and feature maps from Sapiens to facilitate the prediction task. While DUSt3R's position maps act as a pixel-aligned initialization for the Gaussians' positions, the Sapiens feature maps help the cross-view transformer to match corresponding image tokens in the 4 input images. One major challenge in creating a 3D head avatar from smartphone images comes from inconsistent facial expressions when the subject could not remain perfectly static during the capture. We eliminate this static requirement by simply showing our model input images with different facial expressions during training. This technique makes our model robust to inconsistent input images later on. Finally, we show that despite the model has been trained with 4 input images, one can even create a 3D head avatar when only a single image is available. To achieve this, we employ a pre-trained 3D GAN to lift the single image to 3D and then render the 4 input images for our model. This allows us to create 3D head avatars from single images and even highly out-of-distribution examples like AI generated faces, paintings or statues. Great work by Tobias Kirschstein from his internship at Meta with Javier Romero, Artem Sevastopolsky, and Shunsuke Saito

Matthias Niessner

74,763 views • 1 year ago

📢 Eyeline and Netflix latest research, 🌟Go-with-the-Track🌟, has been accepted to #SIGGRAPH2026. We propose a video generation framework that jointly conditions on multiple reference images and reference-anchored point tracks, enabling a range of content creation applications within a unified model: • Motion-preserving video restylization using multiple reference images; • Mesh- or keypoint-driven compositing and stylization; • Camera motion control for both static and dynamic scenes using multiple reference images captured from different view-points and time steps. ✊Congrats to the team: Koichi Namekata, Yash Kant, Zhizheng Liu, Ryan Burgert, Yuancheng Xu, Jordan Lin, Emmett Steven, Julien Philip, LiMa, Andrea Vedaldi, Paul Debevec, Ning Yu 📰 Paper: 🌐 Project: ⌨️ Code: *** This work is part of the ongoing research and development at Eyeline and Netflix, and we look forward to seeing its techniques and workflows adopted in future productions.

Ning Yu

23,561 views • 1 month ago

Domain Randomization (DR) is a key component of the data augmentation pipeline at Axis Robotics. By applying DR, we are able to scale verified, high-quality human trajectories by 10x to 100x. During training, we systematically introduce variances in environmental parameters. This prevents the model from relying on spurious visual correlations. The objective is to ensure the policy learns rather than overfitting. To demonstrate the necessity and effectiveness of this approach, we evaluated both DR and No-DR models on Task 74 (pour_water_into_mug). The empirical results show a definitive impact on real-world deployment reliability: integrating DR into the pipeline increased the success rate from 0% to 90% (Fig. 1). This divergence stems from how the respective policies process visual observations (Fig. 2). The baseline (No DR) model overfits to the static visual background. It essentially memorizes the poses from the training dataset but fails to generalize when subjected to the inevitable variances of real-world deployment. Consequently, it cannot execute the correct manipulation on the target object. Conversely, the DR-trained model learns to extract essential geometric features and physical constraints, filtering out superficial visual noise. This leads to significantly higher robustness in dynamic environments. The structural difference in execution is clearly reflected in the end-effector trajectory data: These real-world deployment recordings further illustrate this difference (Videos 1 and 2). Scaling Physical AI requires turning raw trajectory data into robust policies, and a rigorously engineered DR infrastructure is an essential bridge to close the Sim2Real gap.

Axis Robotics

27,125 views • 3 months ago

GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views TL;DR: Are we witnessing the first steps towards 3DGS live streaming? Contributions: • We introduce a generalizable 3D Gaussian Splatting methodology that employs pixel-wise Gaussian parameter maps defined on 2D source image planes to formulate 3D Gaussians in a feed-forward manner. • We propose a fully differentiable framework composed of an iterative depth estimation module and a Gaussian parameter regression module. The intermediate depth prediction bridges the two components and allows them to benefit from joint training. • We introduce a regularization term and an epipolar attention mechanism to preserve geometry consistency between the two source views when using only rendering loss. Our method generalizes well to unseen characters even in complicated scenes. • We develop a real-time FVV system that achieves high-resolution rendering of characters in the scene without any geometry supervision.

MrNeRF

25,862 views • 1 year ago

Another day of solving problem Introducing Monad Multi Sender: a tool to airdrop any ERC-20 tokens on Monad testnet (soon NFTs) from your wallet to hundreds of wallets in one click for under 0.2 MON gas fees I had to airdrop p⨀ppies token $IPY to Genesis NFT holders but couldn't find a way and I'm not a human faucet to send them manually. So here we go I made this for you guys Here’s a detailed guide on how to use it 🧵👇

p⨀pnad

35,185 views • 1 year ago

Last week, we shared a refresher on our favorite Flow features. Today, we're doing a deep dive into prompting with a few tips on how to craft the optimal prompt, create your dream video, and #FindYourFlow 🕺 Clearly identify your characters or objects and describe their movements 🎥 Frame your shot with compositional terms like "wide shot" or “close-up" and direct the camera with instructions like "tracking shot" or "aerial view" 💡The lighting and environment set the entire mood. Instead of "a room" use more descriptive terms like "a dusty attic filled with forgotten treasures, a single beam of afternoon light cutting through a grimy window" 🧸 Flow is not limited to realistic visuals. You can experiment with a wide array of animation styles like "stop motion" or "knitted animation" 🗣️ With Veo 3, you can include sound effects and dialogue in your prompt. Describe the ambient sounds happening throughout the scene and generate dialogue by including it in your prompt. For more Flow tips, see here ➡️ 🚨Pro tip: this is another great bookmark 😉

Google Labs

33,868 views • 1 year ago

[📢 FOR OUR NINE] The OT9 Archive is live! 🌹 You can now explore 700+ heartfelt letters from ZEROSES around the world, all written for our nine. Next week, a physical letter carrying the page's QR code will be sent directly to the boys’ agencies. 🔍 Use the search tool to type in your nickname and find the message you've submitted ♡ The website is not temporary, so you can visit it anytime. 🔊 Sound ON 🔗 #ZEROBASEONE #ZB1 #제로베이스원

ZB9 MARKETING

14,340 views • 3 months ago

📢 Announcing one of the most exciting works from us this year on **scalable robot policy evaluation through real-to-sim transfer**, moving toward a scalable evaluation engine with structured world models that capture the appearance, geometry, and dynamics of environments involving deformable objects. 🤖 Evaluation remains one of the biggest bottlenecks in building general-purpose robots. Today, robots are still evaluated only in the real world, which is **orders of magnitude slower** than the development of language agents. We propose a new framework where simulation performance **strongly correlates** with the real world (r > 0.9), even for deformable objects. The key difference from existing work lies in the correlation between simulation and reality: if a robot model performs better in the digital world, does it also perform better in the real world? This question has long made people hesitant about simulation-based evaluation — especially for deformable objects. We are changing that. Our pipeline achieves effective real-to-sim transfer, establishing **state-of-the-art correlation** between simulation and reality for deformable object manipulation. It provides a **scalable and reproducible evaluation engine** for robot learning. 🌐

Yunzhu Li

39,960 views • 8 months ago

JOIN DR. SH. YASIR QADHI IN HONOURING THE LEGACY OF ABU DIAA KHALED NABHAN FROM GAZA Across Gaza, Lebanon, and Yemen, children are facing unimaginable hardship—losing their homes, schools, and loved ones. 💔 Dr. Yasir Qadhi joins READ Foundation to support children in crisis, providing: ✅ Emergency aid ✅ Education & psychosocial care ✅ Rebuilding efforts for schools & safe spaces 💚 In honour of Abu Diaa Khaled Nabhan’s heartbreaking loss, we stand together to restore hope. 🙏 Your donation can change lives. Support now and be the reason a child smiles again. 📢 Use the blessed 30 days of Ramadan to change a life. 🌙✨ Click on the link in the bio! #READFoundation #SupportChildren #RamadanGiving #Charity #EducationForAll

Dr. Yasir Qadhi

13,587 views • 1 year ago

I’m really proud of a lot of the shots I have been able to achieve lately. With adherence to prompts getting so good, it’s a level of creative control I’ve been dying for, and it’s really starting to make my actual dreams come true, just by myself working with these tools! Prompting is such a unique and creative aspect of what we all do in AI. Knowing when to prompt long form and when not to, which keyword tokens to use and how they are applied in sequence, structure, and formatting: they all matter. Especially with the models not needing extensive prompts, that’s actually where you can shine with how you use it, because sometimes simpler is better, but in other instances extensive motion/movement detail can be the difference between a cool shot and a spectacular one. I’ve been focusing a lot on shot development prompts and structure testing, and I’ve managed to find a sweet spot. I’m going to put together a little rundown next week on how I did some of mine for this film. 🙌😎🔥💯🎥

Dustin Hollywood

15,964 views • 1 year ago

🧵1/3 I partnered with Kling to make a promo for their new 3.0 model. I came up with the concept, created it, and delivered it all on my own in 3 days of early access. I wanted to make something that showed how Kling could be used to tell a diverse range of stories in a diverse range of styles. I've honestly been blown away by how incredible the model is, and I haven't even scratched the surface of features like the new 3.0 OMNI model. For me, Kling 2.6 was already best in class for most things, but 3.0 sees massive improvements to video quality and detail (no need to add grain to mask imperfections) and prompt adherence. Also, you can time your generations to be as long as 15 seconds and as short as 3, which is fantastic for pacing your scenes. Read on to find out more about the new Performance and Multi shot features!👇

Uncanny Harry AI

28,977 views • 5 months ago