We have SLAM on the Robocap! 🎉 Visualized with Rerun, using NVIDIA AI Developer cuVSLAM for GPU-accelerated multicamera tracking.

I basically wrote zero code myself and fully used Claude Code for this. It worked because I had so many existing examples to point to that it just wrote everything... the way I would have.

A few technical wins:
1. Used rattler-build from prefix.dev to package the compiled cuVSLAM CUDA binaries, which made it SUPER easy to use across repos. This also means it works on the DGX Spark (ARM64) out of the box.
2. Zero-setup experience: git clone && pixi run track-robocap auto-downloads a 100MB dataset from HuggingFace and tracks frames.
3. Real-time 3D visualization with trajectories, landmarks, pose graphs, and video playback in Rerun.

Still visual-only (not visual-inertial yet), and loop closure needs some debugging. Next steps are getting this into a Gradio interface, then into daggr, and extending it to work with other datasets from exoego-forge.

The last piece I'm excited about: Rerun's RRD files now support layers for incremental data. Planning to build pipelines that go from raw sensor data → SLAM → human pose → depth estimation → etc.

Repo here:

Pablo Vela

2,808 subscribers

50,805 views • 4 months ago • via X (Twitter)

Related videos

We have HOT3D! I've started using Claude to port more datasets into Rerun and exoego-forge. I'd been meaning to bring in the HOT3D dataset from Meta for a while, but with Claude, it's way easier.

My goal is to take any egocentric, exocentric, or both datasets and ingest them into a standardized schema. Getting everything into Rerun means we can easily query and transform data via the in-memory OSS server. This lets us generate SQL-like queries such as: "Find me all frames that only contain left hands in the leftmost camera view." Most people think of Rerun as a viewer, but this is the actual superpower.

So far we have:
1. HOT3D
2. Hocap
3. UmeTrack
4. Assembly101
5. EgoDex

Planning to add more, and with every addition, it gets easier as we build up agent skills and better code examples. Hoping to make it almost fully automatic for adding new datasets. The next few I'm looking at are Harmony4D and Aria Pilot Gen2.

After we have enough samples, I'll work on bringing in all the different algorithms I've worked on to transform the data 🙂

Pablo Vela

35,662 views • 3 months ago
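
The example query in the HOT3D post ("all frames that only contain left hands in the leftmost camera view") can be illustrated in plain Python. This is only a sketch of the filtering logic: the row layout (`frame`, `camera`, `hand`), the camera name `cam_left`, and the toy rows are all hypothetical, and it does not use the Rerun query API.

```python
from collections import defaultdict


def frames_with_only_left_hand(detections, camera):
    """Return sorted frame ids where `camera` saw left hands and nothing else."""
    hands_per_frame = defaultdict(set)
    for row in detections:
        # Only detections from the requested camera count toward the answer.
        if row["camera"] == camera:
            hands_per_frame[row["frame"]].add(row["hand"])
    return sorted(
        frame for frame, hands in hands_per_frame.items() if hands == {"left"}
    )


# Hypothetical toy rows, one per detected hand (not real dataset contents).
TOY_DETECTIONS = [
    {"frame": 0, "camera": "cam_left", "hand": "left"},
    {"frame": 0, "camera": "cam_left", "hand": "right"},
    {"frame": 1, "camera": "cam_left", "hand": "left"},
    {"frame": 2, "camera": "cam_right", "hand": "left"},
    {"frame": 3, "camera": "cam_left", "hand": "left"},
    {"frame": 3, "camera": "cam_right", "hand": "right"},
]

left_only_frames = frames_with_only_left_hand(TOY_DETECTIONS, "cam_left")
```

With a standardized schema, the same predicate would presumably be written once as a query and reused across every ingested dataset.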

I've migrated the old Mast3r-SLAM example I had made last year to the latest version of Rerun and made a bunch of improvements! I wanted to spend some time with agents to modernize it. Here's an example of me walking around with my iPhone and getting a dense reconstruction at about 10FPS on a 5090. Here are the improvements I made.

Brought it into the monorepo with proper packaging:
• Using prefix.dev pixi-build to get rid of all the mast3r/asmk/lietorch vendored code with just a few small patches. This let me remove some 60k lines of code from the repo!
• Don't have to build the lietorch code on my machine anymore, which was taking ~10 minutes to compile (and also made it work on Blackwell when it previously did not)

Rebuilt the Gradio interface:
• Fixed incremental updates, .MOV uploads, and stop behavior
• Made the CLI + Gradio interface share the same entry point so updates automatically propagate

Upgraded the Rerun integration:
• Switched to a multiprocessing async logging strategy
• Added video/pointmap/confidence logging
• Improved blueprint layout and hid noisy entities from 3D view
• Biggest perf win was the async background logger - documented a ~2.5x speedup from decoupling logging from tracking

The newest and most interesting part was my attempt to replace the CUDA kernels for Gauss-Newton ray matching with a Modular Mojo backend. As a Python dev, every time I look at CUDA code I basically shy away as it's pretty difficult for me to understand. Mojo let me rewrite the matching logic in a syntax I'm more comfortable with while still getting near-CUDA performance. Mojo is now the default matching backend with CUDA fallback. One major piece that's missing is the custom PyTorch op path, but I'll eventually do that as well. I heavily leaned on Claude Code to do the CUDA → Mojo migration, and I have no doubt it's not the cleanest or most idiomatic, BUT it's way more readable for me and helps me better understand the underlying algorithm.

This was a ton of work, and a large part of why I'm doing it is how the monorepo compounds. This becomes an artifact for the next example I want to build with Claude that I can point to, which will make it even faster to implement. The compounding nature of this is really interesting and part of why I'm spending so much time trying to make things nice and readable.

Pablo Vela

42,143 views • 3 months ago
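
The "decoupling logging from tracking" idea in the Mast3r-SLAM post can be sketched as a producer/consumer queue. This is a minimal illustration, not the post's implementation: it swaps in a background thread where the post describes multiprocessing, and `BackgroundLogger`, `sink`, and the entity path `world/camera` are hypothetical names. In a real setup the sink would presumably be whatever call writes to the viewer.

```python
import queue
import threading
from typing import Any, Callable

_STOP = object()  # sentinel that tells the worker to exit


class BackgroundLogger:
    """Hands log payloads to a worker thread so the tracking loop does not wait on I/O."""

    def __init__(self, sink: Callable[[str, Any], None], maxsize: int = 256) -> None:
        self._sink = sink
        self._queue: queue.Queue = queue.Queue(maxsize=maxsize)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, entity_path: str, payload: Any) -> None:
        # Blocks only when the queue is full, which bounds memory use.
        self._queue.put((entity_path, payload))

    def close(self) -> None:
        # Everything queued before the sentinel is delivered before the worker stops.
        self._queue.put(_STOP)
        self._worker.join()

    def _drain(self) -> None:
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            entity_path, payload = item
            self._sink(entity_path, payload)


def run_demo() -> list:
    received = []
    logger = BackgroundLogger(sink=lambda path, payload: received.append((path, payload)))
    for frame_idx in range(5):
        # Stand-in for one tracking step; only the hand-off to the logger is shown.
        logger.log("world/camera", frame_idx)
    logger.close()
    return received
```

A single worker draining a FIFO queue keeps log order identical to submission order, which matters when entries are timestamped per frame.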

I've been on a SLAM/SFM kick. It's one of the more underexplored and lacking areas when it comes to human teleop/data collection, so I've brought over Deep Patch Visual Odometry/SLAM to Rerun and Gradio.

With this example, we now have
1. pycuvslam
2. pycolmap/glomap
3. mast3r-slam
4. dpvo/slam
all integrated into rerun.

The question becomes, which method should be used in what situations? They all make different trade-offs with different camera requirements and throughput/accuracy. What about when a new method comes out?

Now that I have several different methods, I plan to use VSLAM-LAB for evaluation. It uses prefix.dev to isolate all the dependencies of each of these methods and easily compare them against each other. In particular, I'll be converting the data preprocessing, algorithm outputs, and evaluation into rerun recordings (rrd files). This will allow programmatic querying of anything stored in the files (which method had the highest ATE-to-FPS ratio? Which dataset/sequence caused the most difficulty? etc.), along with easy visual inspection using the rerun server to link them all together.

Another really important side effect of this is how it impacts agents. As Karpathy said:

"LLMs are exceptionally good at looping until they meet specific goals, and this is where most of the 'feel the AGI' magic is to be found. Don't tell it what to do, give it success criteria, and watch it go."

By having accuracy and throughput metrics deeply tied to human-inspectable artifacts, one can really accelerate agentic development with an actual understanding of how the method/data performs. I think this is another killer use case that I'll be really leaning into to make ingestion of new datasets/methods trivial with an agent.

I'm making it my mission for folks to understand that rerun as a visualization tool only scratches the surface of what its true benefit is: deep integration between data and visuals, with powerful query capabilities.

I'll be focusing on the SLAM use case first and then bringing this into the full egocentric/exocentric data collection domain!

Pablo Vela

40,864 views • 3 months ago
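
The "ATE-to-FPS ratio" mentioned in the SLAM evaluation post can be made concrete with a small NumPy sketch. Two assumptions to flag: the alignment here is translation-only (centroid matching), whereas trajectory evaluation tools commonly use a rigid or similarity alignment first, and the function names are illustrative rather than taken from VSLAM-LAB.

```python
import numpy as np


def ate_rmse(est_xyz, gt_xyz) -> float:
    """RMSE of per-frame position error after centroid (translation-only) alignment."""
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    if est.shape != gt.shape or est.ndim != 2 or est.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of time-associated positions")
    # Remove the constant offset between the two trajectories.
    est_centered = est - est.mean(axis=0)
    gt_centered = gt - gt.mean(axis=0)
    per_frame_error = np.linalg.norm(est_centered - gt_centered, axis=1)
    return float(np.sqrt(np.mean(per_frame_error ** 2)))


def ate_to_fps_ratio(ate: float, fps: float) -> float:
    """Error per unit of throughput; defined only for positive frame rates."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return ate / fps
```

Since lower ATE and higher FPS are both desirable, a smaller ratio reflects a better accuracy/throughput balance under this definition.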

More progress! I now have two Dockerized Gradio | Rerun apps. The first one takes as input a "raw" rrd file that consists of the synchronized egocentric and exocentric MP4 files. This runs the pipeline and produces an "annotated" rrd file. This has the camera parameters, 3D joints, and projected 2D joints (with 6DOF mano soon). The second app takes this "annotated" rrd file and allows for manual labeling. This is a crucial step in addressing any major failures in the pipeline. Right now, it is only the ego view that can be modified. But I'll eventually extend to all. This results in a final "gt" rrd file. From here, the plan is to improve quality and start building a data loop. Excited to start really scaling this. I'm basically going all in on keeping my data stored as Rerun rrd files. As always, I want to emphasize how crucial it is to LOOK AT YOUR data! The rrd format makes it incredibly easy to do so. Getting the data out to use is a bit of a hassle right now, but for me, it's well worth the tradeoff.

Pablo Vela

19,527 views • 9 months ago
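
The "projected 2D joints" stored in the annotated rrd file follow from the 3D joints and camera parameters. Below is a sketch of the standard pinhole projection, assuming an undistorted camera with intrinsics `K` and a world-to-camera rotation and translation; the function and argument names are hypothetical and the post does not say which camera model its pipeline uses.

```python
import numpy as np


def project_joints(joints_world, K, R_cam_from_world, t_cam_from_world):
    """Project (N, 3) world-space joints to (N, 2) pixels with a pinhole model.

    Returns the pixel coordinates and a boolean mask of joints in front of the
    camera; joints behind the camera get NaN pixel coordinates.
    """
    joints_world = np.asarray(joints_world, dtype=float)
    R = np.asarray(R_cam_from_world, dtype=float)
    t = np.asarray(t_cam_from_world, dtype=float)
    K = np.asarray(K, dtype=float)

    # World frame -> camera frame.
    joints_cam = joints_world @ R.T + t
    in_front = joints_cam[:, 2] > 0

    uv = np.full((joints_world.shape[0], 2), np.nan)
    # Apply intrinsics, then divide by depth to get pixel coordinates.
    homogeneous = joints_cam[in_front] @ K.T
    uv[in_front] = homogeneous[:, :2] / homogeneous[:, 2:3]
    return uv, in_front
```

Returning the visibility mask alongside the pixels makes it straightforward to skip joints that cannot appear in a given view when drawing or labeling.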

Most people think Rerun is a visualization tool. In reality, it's a database masquerading as a visualizer. I wanted to showcase this functionality by building a full data pipeline consisting of: ingestion → baseline method → eval → finetuning for SLAM on egocentric data. I'll eventually extend this to the rest of my ego/exo datasets, but I wanted to start with a smaller bunch of datasets first. Rerun allows you to expose your saved .rrd files to a catalog where you store datasets. You can query, filter, and join them like any database using DataFusion under the hood. These are the same .rrd files that are automatically generated whenever you visualize anything in Rerun and decide to save it to disk. I brought in 109 VSLAM-LAB sequences across 14 datasets into the Rerun catalog as an example. These include 7Scenes, Euroc, eth3d, and others. Now I can query them with segment_table, filter_segments, and filter_contents instead of parsing CSVs and YAML files. With a strong set of ground-truth datasets for SLAM, baseline additions become nearly automatic with agents like Opus/Codex. This unification of data and visualization is imo the largest missing part for Physical AI. Visualization becomes a natural byproduct of having your data properly structured and queryable. The catalog API is what makes it a database, not just a viewer. I initially focused on VSLAM-LAB data, but I'll migrate all the egoexo data to this format in the coming days to really show just how useful this is.

Pablo Vela

34,937 views • 2 months ago

🖨️ It took over a year and lots of help from Claude Code, but I have now been able to create a real dot matrix printer on the web that prints from Windows 3.11 in your browser:

Here's the test page:

I couldn't get the LPT1 port to work but COM2 did work (the dial-up modem already uses COM1). The JS receives the COM2 serial data (just raw print data) and then prints it.

Next is to make the dot matrix printer better looking (kinda 3d like on pieter dot com), have the paper also flowing in 3d out of the printer, and integrate it on the page.

My idea was that it might work because we also capture the COM1 data (from the dial-up modem) and send it to a Websocket as a dial-up connection, so we should be able to capture the COM2 traffic too. I didn't know how, but now I do.

IT WORKS!! 😊

@levelsio

85,937 views • 1 month ago

Today, I'm revealing my Roblox Game FULLY scripted using Claude Opus 4.6 Extended. This project took about 4 days to complete, and not a single line of code was touched by an actual scripter. In the conversation I had with Claude, I guided it to complete the entire coding, this game started with a Frontend and a Backend - It is now a fully complete frontpage Roblox Game. From UI Animations, to Visual effect movements, and UI Icons, it was all Claude. The game releases Tomorrow, and I'm expecting it to reach front page on Roblox within a few weeks. 300 Likes, and I'll share the FULL conversation, which was used to build the game. Claude is incredible, this is genuinely an insane movement for coders enhancing their productivity with AI. This is FRICKEN INSANE.

Henry

680,331 views • 5 months ago

UPDATE: Ended up with this cool little tool to both build and verify the training data quality; it turned out very accurate! So I am at least happy about that (tho it obviously was the easy part).

The pipeline to produce this kind of training data (from any Warsh recitation) is in the GitHub below. The repo also documents my full exploration of the Tarteel-for-Warsh problem (and quite frankly, of the offline open source tarteel problem in general): what I tried, what's promising, what's not, and what I think the next steps worth taking are, from a non-engineer with little technical expertise trying to figure it out.

I tried to document everything, so the repo might be excessively comprehensive at times. It's basically my entire convo with Claude on the subject, condensed in one repo.

Please someone take this and run with it! I genuinely believe the right engineers who can make something real out of this are out there.

First public repo I've ever shared, so if I made dumb mistakes, please tell me! 🔗

Yousr

37,936 views • 4 months ago

Colmap 4.0 was very recently released, so it inspired me to do some work to better understand it and its new capabilities with Rerun. I want to really understand how Colmap, and in particular, pycolmap, works outside of just calling it via the CLI. So my goal is to use the low-level pycolmap API to log every part of the pipeline.

The explicit goal is to have an alternative to the SQLite database that I can utilize. Instead of SQLite, I want to try logging everything directly to rerun and use RRD. This means I can have deep inspectability and still save the features/matches/2D view geometry, but be able to view it directly in rerun. I think this is one of the superpowers that rerun provides; data and visualizations are deeply integrated.

As I'm often working with sequential data (videos), I'm going to specifically focus on four things:
1. Monocular Video Simple: Calls high-level APIs such as pycolmap.extract_features, pycolmap.match_sequential, pycolmap.incremental_mapping. These are basically identical to the CLI options and provide a good baseline.
2. Monocular Video Streamed: Take the above high-level APIs and break them down to their iterator version, logging each component in a streamed manner. This way, I can stream the intermediate features to rerun while the extraction/matching/mapping is happening.
3. Rig with unknown calibration: <- WHAT THE VIDEO SHOWS This is probably the most interesting version and the first one I've been working on. It allows one to set a rig between known sensors, such as in VR/AR devices, leading to much better reconstructions with multiple cameras. This is the case where we don't know the calibration a priori, so we have to run a reconstruction twice: once as a normal Colmap reconstruction with no rig constraints, use this to generate the constraints, and then do it again with the newly found rig.
4. Rig with known calibration: This is the RoboCap example, where we have a pre-calibrated set of sensors, so we don't need to run the two reconstructions and also gain better matching between cameras, both spatially and temporally. Again, this leads to a much better reconstruction!

Along with all this, GLOMAP has become a first-class global mapper, making it super easy to use directly within pycolmap! I'm excited to do more with this and compare it to things like pycuvslam, vipe, and other alternatives.

Pablo Vela

30,070 views • 3 months ago
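
The "Monocular Video Simple" baseline in the Colmap post chains the three high-level calls it names. A rough sketch follows, assuming the video has already been split into image files; the wrapper name `reconstruct_video_frames` and the directory layout are hypothetical, and optional arguments may differ between pycolmap versions. It still writes the usual SQLite database, which is the part the post hopes to replace with RRD logging.

```python
from pathlib import Path


def reconstruct_video_frames(image_dir: Path, work_dir: Path) -> dict:
    """Run feature extraction, sequential matching, and incremental mapping in order."""
    # Imported lazily so this module can be loaded on machines without pycolmap.
    import pycolmap

    work_dir.mkdir(parents=True, exist_ok=True)
    database_path = work_dir / "database.db"  # hypothetical layout
    sparse_dir = work_dir / "sparse"
    sparse_dir.mkdir(exist_ok=True)

    # 1. Detect and describe keypoints in every image.
    pycolmap.extract_features(database_path, image_dir)
    # 2. Match each frame against its temporal neighbours (suits video input).
    pycolmap.match_sequential(database_path)
    # 3. Register images one by one and triangulate points.
    reconstructions = pycolmap.incremental_mapping(database_path, image_dir, sparse_dir)

    for index, reconstruction in reconstructions.items():
        print(index, reconstruction.summary())
    return reconstructions
```

Calling it as `reconstruct_video_frames(Path("frames"), Path("out"))` would be the Python counterpart of the equivalent CLI sequence, with both paths being placeholders.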

People used to pay me, on average, $30K USD to build their apps in less than a month. I was the go-to guy because I knew how to build exactly what they wanted with little instruction and I always met their insane deadlines. I was known as "the machine". I wasn't the 10x developer, I was the 100x developer. But it took its toll on my health and I left that to do tech education.

Claude Code can now do it. No computer science degree. No dev team. No $30K invoice. But most non-tech people are still stuck on the outside looking in because there is a tech knowledge gap.

What if someone showed you all the steps, used plain speak, provided technical strategy, technical pathing, technical guidance, made it practical, and took you to production? That's what my Claude Code from Zero course would be.

But I'll be honest with you, this won't be a Udemy $20 priced course. It will be $300, $500 or $1200 depending on your needs. But do you want me to build it?

Andrew Brown

26,909 views • 4 months ago

This is a pretty wild model! You can use it to turn an image into a 3D object with texture. The quality is out of this world! I'm not even a designer, and I've been using this nonstop for the last 2 hours. The model is Hunyuan 3D 2.1. It's open source. You'll find model weights, training/inference code, data pipelines, and architecture on their repository. You can even fine-tune it if you want! GitHub Repository: By the way, the model runs on consumer-grade GPUs. You don't need a datacenter for this! I've been using the model from the HuggingFace demo page: To use it, go to the link and upload an image. That's it! Check out the video I recorded for a couple of examples.

Santiago

44,783 views • 1 year ago

n8n vs cursor, building a keyword research workflow

Plus I'll give you my code no strings attached... you can use my version, my code, or try to use the logic to build it as a node-based workflow...whatever!

what you need:
- firecrawl API key
- perplexity API key
- dataforseo API key

goal:
- input a URL
- use Firecrawl to scrape the site
- use Perplexity to generate seed keywords and understand the target market, competitors
- use DataforSEO to research the keywords and associated data

my process:

1) used claude code to build the n8n workflow to go from 0-1, then debug (still debugging after an hour)
- structure was good but too many nodes and some errors.
- debugging was a pain. building it manually would have taken a long time.
- I much prefer natural language w/ an agent.

2) used claude code to just build a simple app where I input a couple API keys (working in 30 minutes)
- a few tweaks during the testing process, mostly just around output formatting.
- working from the first go, debugging was way easier.
- I can easily expand on the functionality.
- I have a simple and clean UX out of the gate, no need to use google docs, airtable, etc etc

conclusion: OPTION 2 IS MUCH MORE EFFICIENT AND FUN.

I've attached a video demo. I'm also going to share a link for you to test it, and the GitHub repo -- it's all yours!

The Boring Marketer

33,377 views • 1 year ago
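The three-stage flow in the post above (scrape the site, generate seed keywords, research them) can be sketched as a small pipeline. This is only an illustration of the shape of the logic, not the author's code: `run_keyword_research`, `KeywordReport`, and the `fake_*` stages are hypothetical names, and the stages are plain stand-in functions rather than real Firecrawl, Perplexity, or DataForSEO calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class KeywordReport:
    url: str
    seed_keywords: list[str]
    metrics: dict[str, dict]


def run_keyword_research(
    url: str,
    scrape: Callable[[str], str],
    suggest_seeds: Callable[[str], list[str]],
    fetch_metrics: Callable[[list[str]], dict[str, dict]],
) -> KeywordReport:
    """Chain the three stages: scrape -> seed keywords -> keyword data."""
    page_text = scrape(url)            # stage 1: site content
    seeds = suggest_seeds(page_text)   # stage 2: seed keywords
    metrics = fetch_metrics(seeds)     # stage 3: data per keyword
    return KeywordReport(url=url, seed_keywords=seeds, metrics=metrics)


# Hypothetical stand-ins so the sketch runs without API keys.
def fake_scrape(url: str) -> str:
    return f"Landing page for {url}: trail running shoes and gear"


def fake_suggest_seeds(page_text: str) -> list[str]:
    return ["trail running shoes", "running gear"]


def fake_fetch_metrics(keywords: list[str]) -> dict[str, dict]:
    # Made-up numbers derived from keyword length, for illustration only.
    return {kw: {"search_volume": len(kw) * 100} for kw in keywords}


if __name__ == "__main__":
    report = run_keyword_research(
        "https://example.com", fake_scrape, fake_suggest_seeds, fake_fetch_metrics
    )
    print(report.seed_keywords)
    print(report.metrics)
```

Passing the stages in as functions keeps each API behind one call site, so a real client could replace a stand-in without touching the pipeline itself.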

If you could only learn one thing that will be relevant for the next 10-20 years, focus on learning how to deal with data. The future is not about faster hardware, smarter algorithms, or better ideas. The future is about DATA, and those who know how to deal with it will stay relevant much longer than anyone else. I recorded a video to show you how easy it is to get started. In the video, I'm using Kestra. For a long time, I was a fan of AirFlow. Then, I moved to AWS Step Functions. Today, I only use Kestra. Kestra is open-source (repo link below) and kind enough to sponsor my work. The video will show you how easy it is to do the following: 1. Run Kestra locally (literally, one command) 2. Build a simple flow 3. Run Python scripts as part of your flow 4. Connect to HuggingFace models If you have never built a data pipeline, open Kestra's Quick Start Guide and follow their examples. (I think it will take you one weekend to feel comfortable with the application and build the courage you need to get into more serious work.)

Santiago

51,012 views • 1 year ago

We live in the solo founder era. That's for sure, and building with AI has accelerated. I am building software to use for myself and replace the expensive tools I used to use. And partially thanks to AI we can build faster, but not dramatically faster if you want to build a commercial grade product. Before, I was using a live chat and customer support tool that cost around $50-100 a month. This helped me to interact with the customers of my agency or prospects. But I decided a few months ago to build my own and stop paying for these more expensive solutions. I did, and turned it into a product for everyone to use. The build was not 1-2 hours of work using AI, but a several-month process, on and off as I needed it. Finally, it has turned into a commercial grade product. I have built several of these, and this is when we will see more and more solutions like these. Let's start bootstrapping before showing MRR figures.

andrei saioc

14,422 views • 1 year ago

Firstly, my work isn’t AI GENERATED PROMPT. The reason why I privated my account was because I was busy yesterday and couldn’t explain things. Fonts that I used: • For the “HAMKU” I used PECKHAM PRESS, I rasterized the letters a bit to achieve a close font from the official account. • For the texts I used CHANTAL medium in lower case. • For my username I used CHALKDUSTER. Brushes that I used: • Studio Pen • HB Pencil (for sketching) • Shale Brush For the colors I won’t list it, kindly check the third photo to see the palettes. I have to admit that I get lazy to use it and even lost in the palettes, so I mainly use the eyedropper tool. Here is the original version, as you can see my original version is so low quality, the lines are too pixelated and not clean—that is why I chose to post the ai enhanced version. Since I used ai to enhanced the lines of my original work, it definitely generated a new version which I still decided to post as the lines are cleaner. I really don’t know the way around on procreate. If you have suggestions on how to avoid the pixelated lines on procreate, kindly please tell me. This is my first time using it for line art cause I mostly use it for painting. I actually don’t know how many layers I exactly have as I already deleted some layers during the process. I didn’t expect I had to provide it. I spent hours on making it from ideas down to the final product. I really wanted it to be similar to the official design. This is my first and last explanation regarding this because if you are still not convinced that I made it, I don’t know what else to say.

Wyn 🐧; EN⁷- (💤/busy = priv)

30,000 views • 2 months ago

I genuinely don't understand why everyone isn't using this yet. Andrej Karpathy, OpenAI co-founder, posted a simple idea that went massively viral: Stop using AI to write code. Use it to build a second brain. You point Claude Code at a folder. Drop in any source: an article, a transcript, a PDF. Claude reads it, links it, files it into a living wiki of everything you know. It compounds like interest. The more you feed it, the smarter it gets. Here's the whole thing: 1) Install Obsidian 2) Create a vault 3) Open it in Claude Code 4) Paste Karpathy's wiki idea and tell Claude to build it 5) Claude makes three folders: - raw (for sources) - wiki (for its pages) - CLAUDE.md (that runs it) 6) Drop any source into raw and say: "ingest this" 7) Ask questions across everything, forever Five minutes to set up and you never start from a blank chat again. Full step by step guide below.

CyrilXBT

321,125 views • 28 days ago
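The vault layout from step 5 of the post above can be sketched in a few lines. In the post it is Claude that creates these entries; this sketch only shows the resulting structure. The function name `scaffold_vault` and the placeholder text written into `CLAUDE.md` are assumptions for illustration, not Karpathy's or the author's actual instructions.

```python
import tempfile
from pathlib import Path

# Hypothetical placeholder; the real CLAUDE.md would hold the wiki instructions.
PLACEHOLDER_INSTRUCTIONS = (
    "# Wiki instructions (placeholder)\n"
    "- Sources are dropped into raw/\n"
    "- Generated pages live in wiki/\n"
)


def scaffold_vault(root: Path) -> Path:
    """Create raw/, wiki/ and CLAUDE.md under root; return the CLAUDE.md path."""
    (root / "raw").mkdir(parents=True, exist_ok=True)   # incoming sources
    (root / "wiki").mkdir(parents=True, exist_ok=True)  # generated pages
    instructions = root / "CLAUDE.md"
    if not instructions.exists():  # never overwrite existing instructions
        instructions.write_text(PLACEHOLDER_INSTRUCTIONS, encoding="utf-8")
    return instructions


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vault = Path(tmp) / "vault"
        scaffold_vault(vault)
        print(sorted(p.name for p in vault.iterdir()))
        # prints ['CLAUDE.md', 'raw', 'wiki']
```

Note that `CLAUDE.md` is a file rather than a folder, so the layout is two directories plus one instruction file.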

I genuinely don't understand why everyone isn't using this yet. Andrej Karpathy, OpenAI co-founder, posted a simple idea that went massively viral: Stop using AI to write code. Use it to build a second brain. You point Claude Code at a folder. Drop in any source: an article, a transcript, a PDF. Claude reads it, links it, files it into a living wiki of everything you know. It compounds like interest. The more you feed it, the smarter it gets. Here's the whole thing: 1) Install Obsidian 2) Create a vault 3) Open it in Claude Code 4) Paste Karpathy's wiki idea and tell Claude to build it 5) Claude makes three folders: - raw (for sources) - wiki (for its pages) - CLAUDE.md (that runs it) 6) Drop any source into raw and say: "ingest this" 7) Ask questions across everything, forever Five minutes to set up and you never start from a blank chat again. Full step by step guide below.

CyrilXBT

159,824 views • 12 days ago

I just compared Claude Code vs Codex vs Cursor CLI The task was to build a Next.js app with Tailwind 4 and shadcn components to collect customer feedback and showcase it with a widget. I gave all three the same prompt and let them go for 30 minutes to see what they came up with. Claude Code with Opus 4.1 Even though I told it to set up the app in the existing project folder, it tried to create a directory for it. After I interrupted and told it not to do that, it built a demo form and landing page with no errors. I had to ask it to make the demo interactive so users could submit a testimonial and preview it. The landing page looked like AI and was pretty basic, but it worked and it was done in a fraction of the time of the others. Total tokens used: 33k Codex with GPT-5 At the end of the 30 minutes I just could not get Codex to produce a working app. It got stuck in a loop of not being able to set up Tailwind 4 and despite many, MANY, attempts, I ended up with a "failed to compile" error. Total tokens used: 102k Cursor Agent with GPT-5 This was the slowest agent by far and a couple of times I actually thought it got stuck in a loop and was close to Ctrl+C'ing to cancel it. The TUI is really nice though, especially how it shows diffs and it did eventually build a working app (after one or two slight errors that needed fixing) The demo was interactive and it had a very minimal design that looked bare but also a lot less like an "AI generated" app than the Opus 4.1 design. It also wasn't too chatty and just did what it needed to do! Code quality was on a par with Opus 4.1, but it did use 5.5x as many tokens to get there. Still cheaper than Opus on a direct comparison but not when you factor in a Claude Code Max subscription. Total tokens: 188k I'll be able to do a proper comparison and record some videos when I'm back from holiday but for now, Opus is still the more capable model out of the box and Claude Code is the more complete CLI product. 
It will be interesting to see how Cursor evolve their CLI though with commands and subagents because I think with GPT-5 they have a real shot at providing competition for Claude Code if they can optimise output to get similar quality with less tokens. Jump to 0:40 in the video to see the two apps. Which do you think is which? ;)

Ian Nuttall

194,949 views • 11 months ago

What's in my bag 💼 #whatsinmybag #mynameisnanon 🪐 "(1) The first one is Antihistamines. Cause I have allergies to changing weather and pollen. It happens rather randomly, sometimes it acts up and sometimes not. So I have to take medicines everyday (he means on working days, I guess) to avoid the symptoms. (2) Next, what I also bring with me everyday is a notebook, which I take memo about every work project in it, together with a pen. I'm a gen z kid who doesn't like using tablets or electric devices to take note. 😂 I love the feeling when I write straight from what's in my brain. I'm not familiar with typing on tablet or other electronic devices. The same thing as for how I prefer to read scripts. I can't read from digital screens and need real paper version. (3) Next is mobile phone. I input all work schedule in it after I already wrote in the notebook. This is my everyday phone. (4) Then the last one... I had some free time recently and found this in a box during organizing my home. I'm a short-focused person, so I bring a zippo with me just to play with it like this *demonstrate* when I want to relieve some stress or when I couldn't focus well. I have a set of things used for this purpose. Sometimes it's zippo, sometimes balisong, or rubik and coin that I can play around with. I just change thing I bring from time to time. And that's about it krub. 🙂"

Demiane 🫧🎼

16,034 views • 1 year ago

Ever since I wired Claude Code to WhatsApp 3 weeks ago, I built a stupidly large infra around it. I mean, opus built it. No clue how the code even looks. The entire thing was vibe coded using my phone. I wanted to see how far I could push it without touching the computer. Everything via WhatsApp. Build what I need on the fly. So the resulting infrastructure will already be battle tested for software development. The entire thing was streamlined with nearly no manual interventions, everything was communicated via WhatsApp using a single script establishing this connection. If the script is down, I need to get home to start it again to resume the development. Claude was upgrading it, debugging it, restarting it while maintaining constant uptime so it could keep communicating with me. I stressed Claude about it, telling it that it will be “in the dark” and other words that deliberately sound scary about losing communications if the script dies. I also refused git and refused cloning the code, I wanted to see Claude adapting to work on a *LIVING* system. The way this whole thing works: Claude has its own dedicated phone number that I am paying for. A real WhatsApp account for it is installed on a real iPhone that is sitting on my desk. All is registered under my name, this is legit setup with no hacks and tricks. I’ve set up a WhatsApp “Community” and multiple different groups under it. Both me and Claude are the admins, so Claude could edit it on my behalf. Each group is a project I am working on and has its own isolated context. The Group description is a system prompt that gets auto-appended to the larger system prompt explaining this setup in general. When I send a message it’s an instant interrupt to Claude Code’s process, just like in the terminal. Voice notes are seamlessly transcribed with a local Whisper model. Images are used with multimodal reading in an isolated parallel session. 
Multiple groups running in parallel so I can work on all projects at the same time. No cross-talking, everything has an isolated context and history. And because it’s local on my own machine: Everything is REAL. The browser is REAL. I am connected as myself on it to all services because I actually use it in real life. Claude has unlimited internet access, just like humans who use actual browsers. It utilizes custom-made browser tools that I made to control any browser session it wants. Depending on the situation, it can either connect to my existing session or create one of its own. (You can tell it ‘look at my browser for a sec’ then talk about the current page you are on and it just works, pretty cool) My custom browser tools are not perfect (not by a long shot) but I managed to make them work well to the point they are somewhat reliable. This gives Claude full access to my real creds and all the services I actually use. I’m productive AS HELL with this. It really feels like a personal assistant. I ask it to read my emails and msgs, check x .com for news, research arxiv papers, write code, run experiments for me, investigate and reverse engineer github repos, even use my credit card and order things. [I try not to do this one a lot lol so far no disasters]. All from my phone. Super convenient. This is not a product or an open source project (maybe soon if it will make sense). This is just an ugly script I hacked; the entire thing is ~600 lines. (ok maybe i did look at the code, but i swear i didn’t edit!) You can also vibe code this from scratch pretty fast and it will probably even end up better. This is just a cool thing so I’m sharing. It is a real speed booster for many things I do on a daily basis, mostly boring things. Forcing my routine into some new “agent platform” just didn’t feel right for me. WhatsApp is where I already communicate and look for messages, so I decided that my agents will live there too. AGI in my pocket 24/7.

Yam Peleg

419,627 views • 7 months ago
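The per-group isolation described in the post above (one group per project, the group description appended to a larger system prompt, no shared history) can be sketched as a small router. This is a guess at the general shape, not the author's ~600-line script: `GroupSession`, `GroupRouter`, and `BASE_SYSTEM_PROMPT` are hypothetical names, and nothing here talks to WhatsApp or Claude Code.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stand-in for the "larger system prompt" that explains the setup.
BASE_SYSTEM_PROMPT = "You are reached through WhatsApp groups; each group is one project."


@dataclass
class GroupSession:
    name: str
    description: str  # the group description doubles as a per-project prompt
    history: list[str] = field(default_factory=list)

    def system_prompt(self) -> str:
        # The group description is appended to the shared base prompt.
        return f"{BASE_SYSTEM_PROMPT}\n\n{self.description}"


class GroupRouter:
    """Keeps one isolated session (context and history) per group."""

    def __init__(self) -> None:
        self.sessions: dict[str, GroupSession] = {}

    def register(self, name: str, description: str) -> GroupSession:
        session = GroupSession(name=name, description=description)
        self.sessions[name] = session
        return session

    def route(self, group: str, message: str) -> GroupSession:
        session = self.sessions[group]  # KeyError for unknown groups
        session.history.append(message)
        return session


if __name__ == "__main__":
    router = GroupRouter()
    router.register("site-redesign", "Project: redesign the landing page.")
    router.register("paper-notes", "Project: summarize arXiv papers.")
    router.route("site-redesign", "make the header sticky")
    router.route("paper-notes", "summarize today's papers")
    for session in router.sessions.values():
        print(session.name, session.history)
```

Keeping the history on the session object rather than on the router is what prevents cross-talk: a message routed to one group never touches another group's list.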