Pablo Vela

@pablovelagomez1 • 2,789 subscribers

I like to make computers see. Currently at @rerundotio

Shorts

Sam3 + Body is freaking amazing. I'm in the process of building an open-source Rerun and Gradio demo similar to the one Meta provided. I've got the basic functionality up and running; now I need to hook it up to a Gradio interface. It's a really good model.

122,031 views

Finished building out the Rerun and Gradio app for SAM3D-Body, and I think it came out really clean! Under the hood, it's using three models:
1. sam3d for exemplar segmentation based on the "person" text prompt
2. sam3d-body for generating the 2d keypoints, 3d keypoints, and mesh
3. mogev2 for intrinsic/fov estimation

Really happy with how it came out. I'll probably start working on videos and multiview captures next!

102,663 views
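
The intrinsic/fov step in the SAM3D-Body post above can be illustrated with the standard pinhole relation between a horizontal field of view and the focal length. This is only a sketch: the function name is hypothetical, it assumes square pixels and a principal point at the image center, and it is not how mogev2 itself reports its estimate.

```python
import math


def intrinsics_from_fov(width, height, fov_x_deg):
    """Build a 3x3 pinhole intrinsics matrix from a horizontal FOV in degrees.

    Assumptions (not from the original post): square pixels (fy == fx)
    and a principal point at the image center.
    """
    fx = (width / 2.0) / math.tan(math.radians(fov_x_deg) / 2.0)
    fy = fx
    cx, cy = width / 2.0, height / 2.0
    return [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]


# Example: a 640x480 image with a 90 degree horizontal FOV.
K = intrinsics_from_fov(640, 480, 90.0)
```

A wider field of view gives a shorter focal length in pixels, which is why a bad FOV estimate shows up as a stretched or squashed 3D mesh relative to the image.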

I had to leave CVPR a little early, but I had a wonderful time talking to lots of interesting folks. In particular, I had the chance to test out and talk to Michael Black, Hanz Cuevas V., and Anastasios Yiannakidis about the project. I made a few nice idiomatic Rerun upgrades to it, such as converting images to h264 videos, adding keypoint + segmentation masks, and replacing the sam2 backbone with efficienttam. The method is really, really good compared to everything else I've tried, and I plan on working with it a lot more. With a little engineering effort, I think it could come close to running in real time.

24,805 views

Colmap 4.0 was very recently released, which inspired me to dig into it and its new capabilities with Rerun. I want to really understand how Colmap, and in particular pycolmap, works beyond just calling it via the CLI. So my goal is to use the low-level pycolmap API to log every part of the pipeline. The explicit goal is to have an alternative to the SQLite database: instead of SQLite, I want to try logging everything directly to rerun and use RRD. This means I can have deep inspectability and still save the features/matches/two-view geometry, but be able to view it directly in rerun. I think this is one of the superpowers that rerun provides; data and visualizations are deeply integrated.

As I'm often working with sequential data (videos), I'm going to specifically focus on four things:
1. Monocular Video Simple: Calls high-level APIs such as pycolmap.extract_features, pycolmap.match_sequential, pycolmap.incremental_mapping. These are basically identical to the CLI options and provide a good baseline.
2. Monocular Video Streamed: Take the above high-level APIs and break them down into their iterator versions, logging each component in a streamed manner. This way, I can stream the intermediate features to rerun while the extraction/matching/mapping is happening.
3. Rig with unknown calibration (<- WHAT THE VIDEO SHOWS): This is probably the most interesting version and the first one I've been working on. It allows one to set a rig between known sensors, such as in VR/AR devices, leading to much better reconstructions with multiple cameras. This is the case where we don't know the calibration a priori, so we have to run the reconstruction twice: once as a normal Colmap reconstruction with no rig constraints, which is used to generate the constraints, and then again with the newly found rig.
4. Rig with known calibration: This is the RoboCap example, where we have a pre-calibrated set of sensors, so we don't need to run the two reconstructions and also gain better matching between cameras, both spatially and temporally. Again, this leads to a much better reconstruction!

Along with all this, GLOMAP has become a first-class global mapper, making it super easy to use directly within pycolmap! I'm excited to do more with this and compare it to things like pycuvslam, vipe, and other alternatives.

30,070 views
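
For the "rig with unknown calibration" case in the Colmap post above, the step of turning an unconstrained reconstruction into rig constraints can be pictured as follows. This is a plain NumPy sketch under stated assumptions, not pycolmap's API: the function names are hypothetical, poses are 4x4 camera-from-world matrices for two cameras captured at the same timestamps, and the relative pose is averaged with a simple chordal mean.

```python
import numpy as np


def invert_pose(T):
    """Invert a 4x4 rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def estimate_rig_extrinsic(ref_from_world, cam_from_world):
    """Estimate one fixed cam_from_ref transform from per-frame poses.

    Each frame gives cam_from_ref = cam_from_world @ inv(ref_from_world).
    Rotations are averaged by projecting their mean onto SO(3) with an
    SVD; translations are averaged directly.
    """
    rel = np.stack([c @ invert_pose(r) for r, c in zip(ref_from_world, cam_from_world)])
    U, _, Vt = np.linalg.svd(rel[:, :3, :3].mean(axis=0))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    T = np.eye(4)
    T[:3, :3] = U @ D @ Vt
    T[:3, 3] = rel[:, :3, 3].mean(axis=0)
    return T


def pose(angle_z, translation):
    """Hypothetical helper: rotation about z plus a translation."""
    c, s = np.cos(angle_z), np.sin(angle_z)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


# Synthetic check: a second camera rigidly attached to a reference camera.
true_cam_from_ref = pose(0.1, [0.2, 0.0, 0.0])
ref_from_world = [pose(0.3 * i, [i, 0.5 * i, 0.0]) for i in range(5)]
cam_from_world = [true_cam_from_ref @ r for r in ref_from_world]
estimated = estimate_rig_extrinsic(ref_from_world, cam_from_world)
```

With real data the per-frame relative poses disagree slightly, which is why the second, rig-constrained reconstruction is still needed to refine them.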

I've been working a lot with SAM3 and the Momentum Human Rig (MHR). I finally integrated it into the data I'm working with in Rerun. The progression I've taken looks as follows: SAM3 + SAM3D-body on
1. a single image
2. a set of multiple images
3. a single video
4. a multiview video capture

I took inspiration from the SAM3D-body paper and built a multiview fitting optimization pipeline. This pipeline involves using the 2D keypoints from the single-view pipeline, triangulating them, and employing an L1 loss between the 2D/3D keypoints. The temporal stability isn't great, so that's the next portion I'm going to focus on.

One really frustrating thing about SAM3D-body is the lack of per-joint confidence values. It makes it harder to deal with occlusions. I'm probably going to need to use a separate model, or maybe add a confidence head.

42,267 views
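
The triangulation and L1 loss described in the multiview fitting post above can be sketched with the classic linear (DLT) method. This is an illustration, not the actual pipeline: the function names are hypothetical, cameras are assumed to be given as 3x4 projection matrices, and the L1 term is read here as a reprojection error between the projected 3D point and the observed 2D keypoints.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from N >= 2 views (linear DLT).

    projections: (N, 3, 4) projection matrices P = K [R | t]
    points_2d:   (N, 2) pixel observations of the same keypoint
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]  # null-space direction = homogeneous 3D point
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point into pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def l1_reprojection_loss(projections, points_2d, X):
    """Mean L1 distance between reprojected and observed keypoints."""
    errors = [np.abs(project(P, X) - uv).sum() for P, uv in zip(projections, points_2d)]
    return float(np.mean(errors))


# Two synthetic cameras, one meter apart along x, sharing intrinsics.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
projections = np.stack([P0, P1])

joint_3d = np.array([0.2, -0.1, 4.0])
observations = np.stack([project(P, joint_3d) for P in projections])

triangulated = triangulate_dlt(projections, observations)
loss = l1_reprojection_loss(projections, observations, triangulated)
```

Without per-joint confidences, every view contributes equally to the system, so an occluded joint in one camera can pull the triangulated point off; a confidence value would normally be used to weight or drop those rows.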

Recently, I've been working a lot on Gaussian Splatting. I built a repo for camera pose estimation using Rerun, Gradio, and prefix.dev that merges GLOMAP with hloc! 🧵More info below

67,547 views

Some updates on the multiview vistadream pipeline with Rerun! Rerun came in extremely useful here, as being able to visualize depths at each stage of the pipeline allowed me to debug some nasty bugs.

Last time, I was only working with a single image input. Since then, I've added VGGT as my multiview pose + depth estimator. It works REALLY well for getting camera poses, but the depths are not that great. To try and fix that, I estimated depth maps from MoGeV2 for each of the views and scale+shift aligned them to match the confident sections of VGGT's depth predictions. You can see in the video just how much sharper the visualized 2d depth maps are!

The biggest issue continues to be the multiview consistency 🫠 That's up next, along with actually training the Gaussian splat. Lots of work went into actually understanding inputs+outputs for VGGT. I had some funky bugs where the confidence values would all collapse to true.

I'm also really excited for this pipeline to use Difix3D+ from Nvidia instead of Flux inpainting; it seems better suited to a multiview pipeline.

29,849 views
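
The scale+shift alignment mentioned in the vistadream post above is a small least-squares problem: find s and b so that s * mono_depth + b matches the reference depth on the confident pixels. A minimal sketch, assuming the alignment is done directly in depth space (the post doesn't say whether depth or disparity was used); the function name, variable names, and the 0.5 confidence threshold are all hypothetical.

```python
import numpy as np


def align_scale_shift(src_depth, ref_depth, ref_conf, conf_threshold=0.5):
    """Fit ref ~ scale * src + shift on confident pixels, then apply it.

    src_depth: sharp but scale/shift-ambiguous depth map (H, W)
    ref_depth: reference depth map (H, W)
    ref_conf:  per-pixel confidence of the reference (H, W)
    """
    mask = (ref_conf > conf_threshold) & np.isfinite(src_depth) & np.isfinite(ref_depth)
    x = src_depth[mask]
    y = ref_depth[mask]
    # Solve [x 1] @ [scale, shift]^T = y in the least-squares sense.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)
    return scale * src_depth + shift, float(scale), float(shift)


# Synthetic check: the reference is a scaled/shifted copy of the source,
# except for one low-confidence row that holds garbage values.
rng = np.random.default_rng(0)
mono = rng.uniform(0.5, 2.0, size=(8, 8))
reference = 2.5 * mono + 0.3
confidence = np.ones_like(mono)
reference[0, :] = 100.0
confidence[0, :] = 0.0

aligned, scale, shift = align_scale_shift(mono, reference, confidence)
```

Masking by confidence matters here: if the garbage row were included, the fit would be dragged toward it and the whole aligned map would be off.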

🚀 Introducing EgoExo Forge - built on top of Rerun, Gradio, and Hugging Face hub

(I'll be in San Francisco July 21–29. If you're into robotics, egocentric AI, large-scale data collection, or just want to chat, DM me!)

In my opinion, large-scale, diverse, and high-quality data is still the largest bottleneck for generalized robotics deployment. I believe that some version of imitation learning from human examples will be the most scalable + clean way to train humanoid robots 🤖 (similar to what Tesla did for Full Self Driving). Teleop is too expensive to collect a large enough dataset in a reasonable manner, so passive collection via egocentric (and in certain cases, exocentric) views feels like the right bet.

Over the past few months, I've been building out the scaffolding for this, using Rerun as my underlying infrastructure. Data being collected needs to be easily inspectable + time series, and rerun provides the right tooling for this. My goal is to first build out a ground truth representative dataset from already existing open source data, generate some reasonable baselines, and then go out and collect my own data that adheres to the defined schema.

🔍 Starting with open-source datasets
1. EgoDex from Apple
2. HOCap from Nvidia and the University of Texas at Dallas
3. Assembly101 from Meta

All these datasets have different sensor configurations + annotations, so my goal with egoexo-forge is to have one consistent labeling scheme + data layout. I built a data pipeline that aligns all of the different datasets into one general schema, assuming the COCO133 keypoint layout, that allows for exo+ego, ego only, or exo only.

Since the scaffolding is already there, it becomes MUCH easier to add other datasets. The next ones that I'll be including are the HD-EPIC kitchens dataset, HOT3D, and finally my own personal iPhone + insta360 go collection method.

Once I have a diverse variety of datasets, I'll double down on what I believe to be the key algorithms required to make useful data for imitation learning 📊
1. Camera pose estimation via SLAM/SFM for ego perspective (and automatic calibration for exo)
2. Human pose estimation for both egocentric + exocentric views
3. Metric 3D reconstruction + object tracking

I'll be setting up reasonable open-source baselines for each of these to validate that these datasets work, and then finally try to use the generated datasets for some imitation learning via the pi0-lerobot repo I've been working on.

I plan on making a blog post + providing more info on all of this in the near future, so stay tuned.

32,085 views

A preview of what's next, visualized with Rerun and PlayCanvas supersplat ✨

(Also, feel free to send me a DM 📩; I'll be in San Francisco from July 21–29, and I'm looking to meet like-minded folks!)

I'm convinced that Gaussian Splats will be an integral part of any data engine as an underlying representation. So I've started putting together a repo that:
1. Given a single image, performs image outpainting 🖼️🖌️
2. Estimates a monocular depth map on the outpainted image 📏
3. Trains a Gaussian Splat initialized from the monocular depth 🎓✨
4. Warps to new views, performs inpainting on the missing masks -> trains a new splat 🔄🎨

This is going to be integrated into egoexo-forge, but I wanted to start with the simple single-image version before moving to a multi-video implementation.

There's some weirdness in the final rerun visualization, but the trained splat looks great 🎉! This is all based on the very cool VistaDream paper ( .github.io/)

More on this next week!

26,036 views

Streaming iPhone data in real-time directly to Rerun 🚀

The collection process is one of the most frustrating parts of building imitation-learning datasets. I've got a little army of sensors (📱 iPhone, iPad, Quest 3), but getting them temporally aligned, spatially aligned, AND seeing real-time feedback while recording is tough.

I stumbled on a great library from Cake Lab (WPI) called ARFlow. It's a thin client built on Unity's ARFoundation that connects over gRPC to a server running Rerun for live data logging. I forked it to:
- Log the SLAM translation poses, and
- Upgrade rerun to v0.23 for my use case.

So far, it works well, but there are still a few hitches:
1. Right now, it's solid on iPhone and iPad; my Quest 3 client is still slow and not super reliable.
2. I'm using an older ARFlow branch focused on real-time streaming only, with no spatial or temporal sync yet. Unity builds for iOS keep failing. 🛠️
3. Nothing is saved locally to the client, so packet loss is a risk on shaky networks.

There's huge potential in tapping the ubiquitous sensors we carry around every day, and ARFlow is a big step toward making that easy.

27,383 views

Videos

Recently, I've been playing with my iPhone ToF sensor, but the problem has always been the abysmal resolution (256x192). The team behind DepthAnything released PromptDepthAnything that fixes this. Using polycam to collect the raw data, Gradio to generate a UI, and Rerun to visualize. Links at the end of the thread

244,728 views • 1 year ago

There have been a few cool updates recently. In particular, Rerun 0.33 released headless rendering. This, along with the Fable 5 release, pushed me to work towards making MAMMA realtime! I threw Fable at the problem, and it was able to take the original implementation, which ran at ~12 seconds/frame, and get it all the way down to 40 ms/frame, or nearly a 300x speedup 🏎️

How did I achieve this? TLDR:
- Use rerun's headless rendering as supervision when optimizing
- Save an rrd file as a test fixture to guide model optimization with /goal
- Create an html artifact with headless rendering to provide a detailed breakdown of what it did and how it actually looks in the viewer

There were a few critical bits to make sure that this ACTUALLY worked and that Fable didn't just cheat or delete something and declare victory. The first is that the original version used Rerun, which allowed us to save things to disk as an RRD file, meaning we could query the contents and use this as a sort of test fixture or golden artifact that held EXACTLY what all of the values should be. Then we can use this with /goal as a metric when doing the optimization to ensure there are no regressions.

The second bit is the headless rendering, which gave us the ability to check that not only did the test fixture pass, but it also looked visually correct. This made a huge difference, and an awesome side effect is that we can use the headless rendering to create an implementations.html file. This gives a visual guide as to what the agent did (I walk through it in the video below).

Along with this, we're working on an MCP server for rerun that allows full interactivity with the rerun viewer for your agent. So, for example, the agent can click, drag, move views, scroll timelines, etc. I used this to help the agent debug certain parts, such as when the 2d sam masks didn't line up or the triangulated keypoints weren't correctly matching the optimized mesh. The agents could go click into the view, scroll through the timeline, and see where things went wrong.

Fable + Headless Rendering + Rerun MCP == 300x speedup in less than a day's work

With these new tools, I'm planning on going back to my gaussian splatting implementation and cleaning it up + making it fast!

22,771 views • 1 month ago
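
The golden-artifact idea from the MAMMA post above can be sketched without any Rerun-specific calls. This is a stand-in under stated assumptions: it uses a NumPy .npz file in place of an RRD recording, and the function names, array names, and tolerance are all hypothetical. The point it illustrates is that an optimized run must reproduce every value of the reference run, so deleting an output counts as a failure rather than a speedup.

```python
import os
import tempfile

import numpy as np


def save_golden(path, outputs):
    """Store the reference run's outputs as the golden fixture."""
    np.savez(path, **outputs)


def check_against_golden(path, outputs, atol=1e-4):
    """Return the names of outputs that are missing or differ from the fixture."""
    failures = []
    with np.load(path) as golden:
        for key in golden.files:
            if key not in outputs:
                failures.append(key)  # output was dropped entirely
                continue
            candidate = np.asarray(outputs[key])
            expected = golden[key]
            if candidate.shape != expected.shape or not np.allclose(candidate, expected, atol=atol):
                failures.append(key)
    return failures


# Reference (slow) run writes the fixture once.
rng = np.random.default_rng(0)
reference_outputs = {
    "keypoints_3d": rng.normal(size=(17, 3)),
    "mesh_vertices": rng.normal(size=(100, 3)),
}
fixture_path = os.path.join(tempfile.mkdtemp(), "golden.npz")
save_golden(fixture_path, reference_outputs)

# A faithful fast run differs only by tiny numerical noise.
fast_outputs = {k: v + 1e-6 for k, v in reference_outputs.items()}
# A "cheating" run silently drops the mesh to look faster.
cheating_outputs = {"keypoints_3d": reference_outputs["keypoints_3d"]}

fast_failures = check_against_golden(fixture_path, fast_outputs)
cheating_failures = check_against_golden(fixture_path, cheating_outputs)
```

A numeric check like this only covers what was saved; the headless render described above is the complementary check that the result also looks right.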

I've been on a SLAM/SFM kick. It's one of the more underexplored and lacking areas when it comes to human teleop/data collection, so I've brought over Deep Patch Visual Odometry/SLAM to Rerun and Gradio. With this example, we now have
1. pycuvslam
2. pycolmap/glomap
3. mast3r-slam
4. dpvo/slam

all integrated into rerun. The question becomes, which method should be used in what situations? They all make different trade-offs with different camera requirements and throughput/accuracy. What about when a new method comes out?

Now that I have several different methods, I plan to use VSLAM-LAB for evaluation. It uses prefix.dev to isolate all the dependencies of each of these methods and easily compare them against each other. In particular, I'll be converting the data preprocessing, algorithm outputs, and evaluation into rerun recordings (rrd files). This will allow programmatic querying of anything stored in the files (which method had the best ATE-to-FPS trade-off? Which dataset/sequence caused the most difficulty? etc.), all with easy visual inspection using the rerun server to link them all together.

Another really important side effect of this is how it impacts agents. As Karpathy said:

"LLMs are exceptionally good at looping until they meet specific goals, and this is where most of the 'feel the AGI' magic is to be found. Don't tell it what to do, give it success criteria, and watch it go."

By having accuracy and throughput metrics deeply tied to human-inspectable artifacts, one can really accelerate agentic development with an actual understanding of how the method/data performs. I think this is another killer use case that I'll be really leaning into to make ingestion of new datasets/methods trivial with an agent.

I'm making it my mission for folks to understand that rerun as a visualization tool only scratches the surface of what its true benefit is: deep integration between data and visuals, with powerful query capabilities. I'll be focusing on the SLAM use case first and then bringing this into the full egocentric/exocentric data collection domain!

40,864 views • 3 months ago
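
The kind of query described in the SLAM post above (accuracy against throughput per method) boils down to computing absolute trajectory error (ATE) and comparing it alongside FPS. A minimal sketch with made-up numbers: the method names, trajectories, and FPS values are hypothetical, and it assumes the estimated positions are already time-associated and aligned to the ground-truth frame, which a real evaluation would do first.

```python
import numpy as np


def ate_rmse(estimated, ground_truth):
    """Root-mean-square position error between two (N, 3) trajectories."""
    errors = np.linalg.norm(estimated - ground_truth, axis=1)
    return float(np.sqrt(np.mean(errors**2)))


def rank_methods(results, ground_truth):
    """Sort methods by ATE (lower is better), breaking ties by higher FPS.

    results: dict of method name -> (positions (N, 3), frames per second)
    """
    rows = [(name, ate_rmse(positions, ground_truth), fps) for name, (positions, fps) in results.items()]
    return sorted(rows, key=lambda row: (row[1], -row[2]))


# Hypothetical ground truth: a straight 10-step walk along x.
ground_truth = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)

# Hypothetical results: constant offsets stand in for tracking error.
results = {
    "method_a": (ground_truth + np.array([0.01, 0.0, 0.0]), 30.0),
    "method_b": (ground_truth + np.array([0.05, 0.0, 0.0]), 60.0),
}

ranking = rank_methods(results, ground_truth)
best_name, best_ate, best_fps = ranking[0]
```

Sorting by error first is only one choice; a real comparison might instead filter by a minimum FPS and then pick the lowest ATE, depending on whether the use case is offline or realtime.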

I've migrated the old Mast3r-SLAM example I had made last year to the latest version of Rerun and made a bunch of improvements! I wanted to spend some time with agents to modernize it. Here's an example of me walking around with my iPhone and getting a dense reconstruction at about 10FPS on a 5090. Heres the following improvements I made. Brought it into the monorepo with proper packaging: • Using prefix.dev pixi-build to get rid of all the mast3r/asmk/lietorch vendored code with just a few small patches. This let me remove so 60k lines of code from the repo! • Don't have to build the lietorch code on my machine anymore, which was taking ~10 minutes to compile (and also made it work on blackwell when it previously did not) Rebuilt the Gradio interface: • Fixed incremental updates, .MOV uploads, and stop behavior • Made the CLI + Gradio interface share the same entry point so updates automatically propagate Upgraded the Rerun integration: • Switched to a multiprocessing async logging strategy • Added video/pointmap/confidence logging • Improved blueprint layout and hid noisy entities from 3D view • Biggest perf win was the async background logger - documented about a ~2.5x speedup from decoupling logging from tracking The newest and most interesting part was my attempt to replace the CUDA kernels for Gauss-Newton ray matching with a Modular Mojo backend. As a Python dev, every time I look at CUDA code I basically shy away as it's pretty difficult for me to understand. Mojo let me rewrite the matching logic in a syntax I'm more comfortable with while still getting near-CUDA performance. Mojo is now the default matching backend with CUDA fallback. One major piece that's missing is the custom PyTorch op path, but I'll eventually do that as well. I heavily leaned on Claude Code to do the CUDA → Mojo migration, and I have no doubt it's not the cleanest or most idiomatic, BUT it's way more readable for me and helps me better understand the underlying algorithm. 
This was a ton of work, and a large part of why I'm doing it is how the monorepo compounds. This becomes an artifact for the next example I want to build with Claude that I can point to, which will make it even faster to implement. The compounding nature of this is really interesting and part of why I'm spending so much time trying to make things nice and readable.
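To make the "decouple logging from tracking" idea concrete, here is a minimal sketch of a background logger. It is not the code from the repo: the post uses a multiprocessing strategy, while this sketch swaps in a thread and a bounded queue to stay short, and `BackgroundLogger` and `sink` are hypothetical names standing in for whatever does the slow logging work.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker to exit


class BackgroundLogger:
    """Hands log payloads to a worker so the tracking loop does not wait on I/O."""

    def __init__(self, sink, maxsize=64):
        self._sink = sink  # hypothetical callable that performs the slow logging
        self._queue = queue.Queue(maxsize=maxsize)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, payload):
        # Blocks only if the worker has fallen `maxsize` items behind.
        self._queue.put(payload)

    def close(self):
        # Queue the sentinel behind all pending payloads, then wait for the worker.
        self._queue.put(_STOP)
        self._worker.join()

    def _drain(self):
        while True:
            payload = self._queue.get()
            if payload is _STOP:
                break
            self._sink(payload)


received = []
logger = BackgroundLogger(sink=received.append)
for frame_idx in range(5):
    # The tracker would do its per-frame work here, then hand off the result.
    logger.log({"frame": frame_idx})
logger.close()
```

The bounded queue is a deliberate choice in this sketch: if logging cannot keep up, the tracker eventually blocks instead of letting memory grow without limit.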

42,143 views • 3 months ago

Most people think Rerun is a visualization tool. In reality, it's a database masquerading as a visualizer. I wanted to showcase this functionality by building a full data pipeline consisting of: ingestion → baseline method → eval → finetuning for SLAM on egocentric data. I'll eventually extend this to the rest of my ego/exo datasets, but I wanted to start with a smaller bunch of datasets first. Rerun allows you to expose your saved .rrd files to a catalog where you store datasets. You can query, filter, and join them like any database using DataFusion under the hood. These are the same .rrd files that are automatically generated whenever you visualize anything in Rerun and decide to save it to disk. I brought in 109 VSLAM-LAB sequences across 14 datasets into the Rerun catalog as an example. These include 7Scenes, Euroc, eth3d, and others. Now I can query them with segment_table, filter_segments, and filter_contents instead of parsing CSVs and YAML files. With a strong set of ground-truth datasets for SLAM, baseline additions become nearly automatic with agents like Opus/Codex. This unification of data and visualization is imo the largest missing part for Physical AI. Visualization becomes a natural byproduct of having your data properly structured and queryable. The catalog API is what makes it a database, not just a viewer. I initially focused on VSLAM-LAB data, but I'll migrate all the egoexo data to this format in the coming days to really show just how useful this is.

34,937 views • 2 months ago

We have SLAM on the Robocap! 🎉 Visualized with Rerun
Using NVIDIA AI Developer cuVSLAM for GPU-accelerated multicamera tracking. I basically wrote zero code myself and fully used Claude Code for this. It worked because I had so many existing examples to point to that it just wrote everything the way I would have.
A few technical wins:
1. Used rattler-build from prefix.dev to package the compiled cuVSLAM CUDA binaries, which made it SUPER easy to use across repos. This also means it works on the DGX Spark (ARM64) out of the box.
2. Zero-setup experience: git clone && pixi run track-robocap auto-downloads a 100MB dataset from HuggingFace and tracks frames.
3. Real-time 3D visualization with trajectories, landmarks, pose graphs, and video playback in Rerun.
Still visual-only (not visual-inertial yet), and loop closure needs some debugging. Next steps are getting this into a Gradio interface, then into daggr, and extending it to work with other datasets from exoego-forge.
The last piece I'm excited about: Rerun's RRD files now support layers for incremental data. Planning to build pipelines that go from raw sensor data → SLAM → human pose → depth estimation → etc.
Repo here:

50,805 views • 4 months ago

We have HOT3D! I've started using Claude to port more datasets into Rerun and exoego-forge. I'd been meaning to bring in the HOT3D dataset from Meta for a while, but with Claude, it's way easier. My goal is to take any dataset, whether egocentric, exocentric, or both, and ingest it into a standardized schema. Getting everything into Rerun means we can easily query and transform data via the in-memory OSS server. This lets us generate SQL-like queries such as: "Find me all frames that only contain left hands in the leftmost camera view." Most people think of Rerun as a viewer, but this is the actual superpower.
So far we have:
1. HOT3D
2. Hocap
3. UmeTrack
4. Assembly101
5. EgoDex
Planning to add more, and with every addition, it gets easier as we build up agent skills and better code examples. Hoping to make it almost fully automatic for adding new datasets. The next few I'm looking at are Harmony4D and Aria Pilot Gen2. After we have enough samples, I'll work on bringing in all the different algorithms I've worked on to transform the data 🙂
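As a rough illustration of what the "left hands only in the leftmost camera" query could look like in SQL, here is a sketch using Python's built-in sqlite3 as a stand-in. It does not use the Rerun server or its query API, and the `hand_detections` table, its columns, and the sample rows are all assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per detected hand, per camera, per frame.
conn.execute(
    "CREATE TABLE hand_detections (frame_id INTEGER, camera TEXT, hand TEXT)"
)
conn.executemany(
    "INSERT INTO hand_detections VALUES (?, ?, ?)",
    [
        (0, "leftmost", "left"),
        (1, "leftmost", "left"),
        (1, "leftmost", "right"),
        (2, "leftmost", "right"),
        (3, "rightmost", "left"),
        (4, "leftmost", "left"),
    ],
)

# Keep frames whose leftmost view has at least one left hand and no right hand.
query = """
SELECT frame_id
FROM hand_detections
WHERE camera = 'leftmost'
GROUP BY frame_id
HAVING SUM(hand = 'left') > 0 AND SUM(hand = 'right') = 0
ORDER BY frame_id
"""
left_only_frames = [row[0] for row in conn.execute(query)]
print(left_only_frames)
```

Grouping by frame and filtering in `HAVING` is what expresses "only": a plain `WHERE hand = 'left'` would also return frames that contain a right hand.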

35,662 views • 3 months ago

Colmap 4.0 was very recently released, which inspired me to do some work to better understand it and its new capabilities with Rerun. I want to really understand how Colmap, and in particular pycolmap, works outside of just calling it via the CLI. So my goal is to use the low-level pycolmap API to log every part of the pipeline. The explicit goal is to have an alternative to the SQLite database that I can utilize. Instead of SQLite, I want to try logging everything directly to Rerun and use RRD. This means I can have deep inspectability and still save the features/matches/two-view geometry, but be able to view it directly in Rerun. I think this is one of the superpowers that Rerun provides: data and visualizations are deeply integrated.
As I'm often working with sequential data (videos), I'm going to specifically focus on four things:
1. Monocular Video Simple: Calls high-level APIs such as pycolmap.extract_features, pycolmap.match_sequential, pycolmap.incremental_mapping. These are basically identical to the CLI options and provide a good baseline.
2. Monocular Video Streamed: Take the above high-level APIs and break them down to their iterator version, logging each component in a streamed manner. This way, I can stream the intermediate features to Rerun while the extraction/matching/mapping is happening.
3. Rig with unknown calibration (this is what the video shows): This is probably the most interesting version and the first one I've been working on. It allows one to set a rig between known sensors, such as in VR/AR devices, leading to much better reconstructions with multiple cameras. This is the case where we don't know the calibration a priori, so we have to run a reconstruction twice: once as a normal Colmap reconstruction with no rig constraints, use this to generate the constraints, and then do it again with the newly found rig.
4. Rig with known calibration: This is the RoboCap example, where we have a pre-calibrated set of sensors, so we don't need to run the two reconstructions and also gain better matching between cameras, both spatially and temporally. Again, this leads to a much better reconstruction!
Along with all this, GLOMAP has become a first-class global mapper, making it super easy to use directly within pycolmap! I'm excited to do more with this and compare it to things like pycuvslam, vipe, and other alternatives.

30,070 views • 3 months ago

Spent last week in Stockholm meeting the Rerun team in person for the first time! While I was there, I got to work on something I've wanted for a while: face culling for 3D reconstructions. Still in main (not deployed yet), but here's a sneak peek of scanning the library we hung out in. Being able to see inside meshes while reconstructing makes it way easier to verify coverage and quality.

19,890 views • 4 months ago

0.32 has shipped, and it's a massive release from Rerun. There are a ton of cool new features, and I wanted to highlight two in particular:
1. OSS Server streaming from disk
2. Dataset review
I walk you through them in the video, so take a look. I'll have a much longer blog post next week about the entire pipeline. With 0.32, much of the foundation is set for a unified data layer for physical data, and I'll be getting into the details of it with all that I've built over the past year. This will cover:
1. Raw Data Collection
2. Data Ingestion
3. Catalog Registration
4. Query and Review
5. Post Process
6. Training
So, lots to share.

11,264 views • 2 months ago

MVP of Multiview Video → Camera parameters + 3D keypoints. Visualized with Rerun
The basic pipeline as of right now looks like this:
1. Capture 🔴 – Using 4 iPhones and an Insta360 Go. iPhone videos are captured via Final Cut Pro Multicam for easy sync and the exocentric view; the Insta360 Go is used for the egocentric view.
2. Sync 🕒 – Custom Gradio app using two Rerun viewers and callbacks for easily aligning frame timestamps so the ego and exo views are aligned.
3. Calibrate 🎯 – Use VGGT from Jianyuan and AI at Meta to get intrinsics/extrinsics for sparse cameras.
4. Estimate 3D 🕺 – Use RTMLib whole-body keypoint estimator on each frame, then triangulate in 3D.
What's missing?
1. No temporal coherence: I'm estimating keypoints one frame at a time and one camera at a time. This leads to a lot of jittering. For now, I plan on adding a One Euro Filter to help with jittering. Long term, I'd want to train a multiview keypoint estimator.
2. Kinematic fitting is still missing; this is my next goal. The output will be joint angles, as explored in my previous posts.
3. Missing dense point cloud: VGGT seems to fail for me here. I'm looking to explore using MP-SFM as a method for generating dense multiview depth maps + normals (plus it has a friendlier license compared to VGGT).
4. Eventually, creation of 4D Gaussian splatting using something akin to DN-splatter. My long-term goal is a data engine that provides poses/depths/splats/keypoints/etc.
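For reference, the One Euro Filter mentioned above is small enough to sketch in full. This is a minimal scalar version of the standard algorithm, not code from the pipeline; the parameter values (`min_cutoff=1.0`, `beta=0.007`) and the synthetic jittery signal are assumptions chosen only for illustration.

```python
import math


class OneEuroFilter:
    """Adaptive low-pass filter: smooths hard when slow, follows closely when fast."""

    def __init__(self, min_cutoff=1.0, beta=0.007, d_cutoff=1.0):
        self.min_cutoff = min_cutoff  # baseline cutoff frequency in Hz
        self.beta = beta              # how quickly the cutoff rises with speed
        self.d_cutoff = d_cutoff      # cutoff for the derivative estimate in Hz
        self._x_prev = None
        self._dx_prev = 0.0
        self._t_prev = None

    @staticmethod
    def _alpha(cutoff, dt):
        # Smoothing factor of a first-order low-pass filter.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, t, x):
        if self._x_prev is None:
            self._x_prev, self._t_prev = x, t
            return x
        dt = t - self._t_prev
        if dt <= 0:
            return self._x_prev  # ignore out-of-order or duplicate timestamps
        # Low-pass the speed, then use it to pick the cutoff for the value.
        dx = (x - self._x_prev) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self._dx_prev
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self._x_prev
        self._x_prev, self._dx_prev, self._t_prev = x_hat, dx_hat, t
        return x_hat


# Synthetic jitter: a coordinate that flips +/-0.5 around 10.0 at 30 FPS.
fps = 30.0
raw = [10.0 + (0.5 if i % 2 == 0 else -0.5) for i in range(90)]
smooth_filter = OneEuroFilter()
smoothed = [smooth_filter(i / fps, value) for i, value in enumerate(raw)]
tail = smoothed[-10:]
jitter_after = max(tail) - min(tail)
```

For keypoints, one would typically keep a separate filter instance per coordinate of each joint, since the filter carries state between frames.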

42,785 views • 1 year ago

Happy 4th 🇺🇸!! I have a preview for my next release (assembly101 is massive, so I need to push it to next week). But in the meantime, check out the link below to try out the 🚧 work in progress using Rerun and Gradio. Links below 👇

34,186 views • 1 year ago

Working on adding a new dataset to the lineup. Ported ego-dex over to Rerun. With Rerun now stabilizing the RRD format between versions (0.23 -> 0.24), this is the perfect time to start encoding all of the datasets I've been using to RRD.
1. I'm starting with ego-dex and then adding others, such as HOCAP/Assembly 101
2. Looking to see if it also makes sense to port to webdatasets RRD
3. I've started visualizing confidence: green (high), yellow (medium), red (low).
More info on Friday
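The confidence coloring can be sketched as a simple threshold lookup. The post only names the three colors; the thresholds (0.3 and 0.7) and the function name below are assumptions for illustration, not values from the dataset port.

```python
def confidence_to_rgb(confidence, low=0.3, high=0.7):
    """Map a confidence in [0, 1] to an (R, G, B) tuple.

    The `low` and `high` thresholds are hypothetical defaults.
    """
    if confidence >= high:
        return (0, 255, 0)      # green: high confidence
    if confidence >= low:
        return (255, 255, 0)    # yellow: medium confidence
    return (255, 0, 0)          # red: low confidence


# One color per keypoint, ready to pass along with the points being logged.
keypoint_confidences = [0.95, 0.5, 0.1]
keypoint_colors = [confidence_to_rgb(c) for c in keypoint_confidences]
```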

34,253 views • 1 year ago

Here’s a sneak peek using Rerun and Gradio for data annotation. It uses Video Depth Anything and Segment Anything 2 under the hood to generate segmentation masks and depth maps/point clouds. More to share next week.

36,719 views • 1 year ago

More progress! I now have two Dockerized Gradio | Rerun apps. The first one takes as input a "raw" rrd file that consists of the synchronized egocentric and exocentric MP4 files. This runs the pipeline and produces an "annotated" rrd file. This has the camera parameters, 3D joints, and projected 2D joints (with 6DOF mano soon). The second app takes this "annotated" rrd file and allows for manual labeling. This is a crucial step in addressing any major failures in the pipeline. Right now, it is only the ego view that can be modified. But I'll eventually extend to all. This results in a final "gt" rrd file. From here, the plan is to improve quality and start building a data loop. Excited to start really scaling this. I'm basically going all in on keeping my data stored as Rerun rrd files. As always, I want to emphasize how crucial it is to LOOK AT YOUR data! The rrd format makes it incredibly easy to do so. Getting the data out to use is a bit of a hassle right now, but for me, it's well worth the tradeoff.

19,527 views • 9 months ago

✨ Massive Pipeline Refactor → One Framework for Ego + Exo Datasets, Visualized with Rerun 🚀
After a deep refactoring and cleanup, my entire egocentric/exocentric pipeline is now fully modular. One codebase handles different sensor layouts and generates a unified, multimodal timeseries RRD file that you can open instantly in Rerun.
---
The first three datasets that are already supported:
1. Assembly101 – 4 ego Quest-style fisheye cams + 8 exo pinhole cams
2. HO-Cap – 1 ego HoloLens pinhole cam + 8 exo pinhole cams
3. EgoDex – 1 ego Apple Vision Pro pinhole cam
Unified geometry: Each frame now logs _both_ camera intrinsics/extrinsics and COCO Whole-Body 133-kp keypoints in the same stream. Everything is canonicalized at import time, so there's zero OpenCV vs OpenGL guesswork; Rerun reads it all in the correct coordinate system automatically.
---
Why this matters
- Consistent schema ✚ live visuals – Rerun's deep link between data & rendering means every experiment comes with a built-in viewer. No more ad-hoc OpenCV/matplotlib hacks just to sanity-check a dataset.
- Multi-terabyte friendly – The next step is to bulk-ingest these giants into Rerun and wrap them in a Gradio UI for point-and-click exploration, as I've already done for EgoDex!
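To show what canonicalizing camera conventions at import time can involve, here is a small sketch that converts a camera-to-world pose from OpenGL camera axes (x right, y up, z backward) to OpenCV camera axes (x right, y down, z forward). The function name and the choice of OpenCV axes as the target are assumptions for the example; the pipeline's actual import code is not shown in the post.

```python
def gl_to_cv_cam_to_world(world_from_cam_gl):
    """Convert a 4x4 camera-to-world matrix from OpenGL to OpenCV camera axes.

    Equivalent to right-multiplying by diag(1, -1, -1, 1): the camera's y and z
    axes flip, while the camera position (last column) stays the same.
    """
    return [[row[0], -row[1], -row[2], row[3]] for row in world_from_cam_gl]


# An OpenGL-style camera at (1, 2, 3) with identity rotation looks down world -z.
pose_gl = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
pose_cv = gl_to_cv_cam_to_world(pose_gl)

# In OpenCV axes the viewing direction is the camera's +z axis (third column).
forward_in_world = [pose_cv[i][2] for i in range(3)]
position_in_world = [pose_cv[i][3] for i in range(3)]
```

Doing this once per dataset at import, rather than at every visualization call, is what removes the guesswork downstream: everything after the importer can assume a single convention.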

20,836 views • 1 year ago
