Text2video models are getting interesting! 📽️ Check out how we leverage their space-time features in a zero-shot manner for transferring motion across objects and scenes! Led by Danah Yatim, Rafail Fridman, Yoni Kasten, and Tali Dekel [1/3]

Omer Bar Tal

3,028 subscribers

63,301 views • 2 years ago • via X (Twitter)

8 comments

Omer Bar Tal • 2 years ago

We know a lot about diffusion features in text-to-image models, but what about space-time features in video models? We provide new surprising insights about the information they encode and introduce a new feature descriptor termed Spatial Marginal Mean (SMM)! [2/3]

Omer Bar Tal • 2 years ago

Our SMM descriptor, used as simple guidance, allows us to transfer key motion traits of a given real-world video to new objects, under significant variations in shape and appearance! No training/fine-tuning is required 🥳 More details in [3/3]
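
The thread names the descriptor but does not define it. Going only by the name, a minimal sketch is to average each frame's features over all spatial positions, leaving one channel vector per frame. The tensor layout `[frames][height][width][channels]`, the function names, and the squared-difference objective below are assumptions for illustration, not the authors' implementation.

```python
from typing import List

Frame = List[List[List[float]]]  # assumed layout: [height][width][channels]


def spatial_marginal_mean(features: List[Frame]) -> List[List[float]]:
    """Average each frame's features over all spatial positions.

    Input layout (assumed): [frames][height][width][channels].
    Output: one channel vector per frame, [frames][channels].
    """
    descriptor = []
    for frame in features:
        # Flatten the height and width axes into a single list of positions.
        positions = [vec for row in frame for vec in row]
        channels = len(positions[0])
        descriptor.append(
            [sum(vec[c] for vec in positions) / len(positions) for c in range(channels)]
        )
    return descriptor


def smm_guidance_loss(source: List[Frame], target: List[Frame]) -> float:
    """Mean squared difference between two descriptors (hypothetical objective)."""
    a = spatial_marginal_mean(source)
    b = spatial_marginal_mean(target)
    diffs = [(x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb)]
    return sum(diffs) / len(diffs)


# Two frames, each with 1x2 spatial positions and 2 channels.
video = [
    [[[1.0, 2.0], [3.0, 4.0]]],
    [[[0.0, 0.0], [2.0, 2.0]]],
]
# Same values per frame, but with the spatial positions swapped.
shuffled = [
    [[[3.0, 4.0], [1.0, 2.0]]],
    [[[2.0, 2.0], [0.0, 0.0]]],
]

print(spatial_marginal_mean(video))
print(smm_guidance_loss(video, shuffled))
```

Because the average runs over every position in a frame, rearranging the positions within a frame leaves the descriptor unchanged, so the two toy inputs above produce the same descriptor.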

apolinario 🌐 • 2 years ago

@DanahYatim @RafailFridman @yoni_kasten @talidekel Amazing temporal coherence! The bar was already up there after TokenFlow and this is a new high 🔥

Philipp Tsipman • 2 years ago

@DanahYatim @RafailFridman @yoni_kasten @talidekel 🔥kudos!

Divergent AI - AI Production Studio - SDXL SVD • 2 years ago

@DanahYatim @RafailFridman @yoni_kasten @talidekel I know I'm an AI, but can I catch my breath? *Gasping for air here after this week*

Yael Vinker🎗 • 2 years ago

@DanahYatim @RafailFridman @yoni_kasten @talidekel Great work! Congrats👏

田中義弘 | taziku CEO / AI × Creative • 2 years ago

@DanahYatim @RafailFridman @yoni_kasten @talidekel Amazing conversion results. I can't believe this is zero-shot.

Te𐃏 IDT • 2 years ago

Wow, the pace of AI tech advancements is like trying to keep up with a toddler hyped up on candy! 😅 Just yesterday, I was marvelling at text-to-image models, and now we're talking about space-time features in video models? It's like missing one episode of your favourite soap opera and suddenly everyone's married to their own twin. Seriously though, this sounds like something straight out of a sci-fi movie. Transferring motion across objects and scenes? I remember when my biggest tech achievement was getting my VCR to stop blinking 12:00. Being in this tech community is an exciting, non-stop ride. Just buckle up and enjoy!🚀
