Full-circle Test-driven Firmware Development with OpenClaw

Ladyada: "I've only had OpenClaw installed on this Raspberry Pi 5 for a couple of days, but boy, have we burned through a lot of tokens and learned a lot. Including what I think is a really fun improvement in my development process:...

25,105 views • 4 months ago • via X (Twitter)

Similar Videos

OpenClaw vs. a 475-page datasheet: let the robot do the transcribing 🦞🤖

The u-blox SAM-M8Q has been sitting on my bench for months. This little GPS module has a built-in antenna and coin cell backup, and it speaks both the NMEA and UBX binary protocols over UART or I2C. So why isn't it in the shop already? Well, it's mostly because of the 475-page interfacing datasheet documenting every command, struct, and config register. Hundreds of message types. I got partway through by hand with some Claude Code Sonnet assistance, but ran out of time, plus it was still tedious babysitting Sonnet. However, now we're living in an Opus + Codex era! So I pointed my Raspberry Pi OpenClaw at it.

Here's the setup: a Raspberry Pi 5 running OpenClaw, wired to a QT Py RP2040, which talks to the SAM-M8Q. Opus 4.6 reads the datasheet (converted to markdown first by Sonnet 4.6 with 1M context, to avoid re-parsing that PDF every session) and builds the implementation plan. I review the plan to make sure it prioritizes the most common commands and reports, and I flag some nonessential sections, like the automotive-assist and RTK-specific ones. Then Codex is assigned each message implementation task as a sub-agent and writes the actual C code for the Arduino library. Opus suggested struct-based parsing rather than digging through each uint8_t array: we just memcpy the raw bytes of each checksum-verified message onto the matching struct and extract the typed fields. We've got four message types done so far. After each message is implemented, Codex also writes a test sketch that exercises and pretty-prints the results of that message, which is great for self-testing as well as for regression testing later.

Tonight I'm telling it to keep going while I sleep: code, parse, test against live satellite data, fix failures, commit and push on success, then move on to the next one. To me this is a great use of "agentic" firmware development: there's no creativity in transcribing 84 different structs from a 475-page datasheet.
Once the LLMs are done, I can review the PRs as if they came from an everyday contributor and even make revision suggestions.
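The struct-based approach can be illustrated with the minimal C sketch below. It validates a UBX frame, which consists of the sync bytes 0xB5 0x62, then class, ID, a little-endian 16-bit payload length, the payload, and a two-byte 8-bit Fletcher checksum computed over class through payload. If the frame is valid, the sketch memcpys a NAV-POSLLH payload (class 0x01, ID 0x02, 28 bytes) onto a matching struct. The type and function names are hypothetical and are not the library's actual API. The memcpy assumes a little-endian target (true of the RP2040) and a struct with no padding, which holds here because every field is 4 bytes wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UBX_SYNC1 0xB5
#define UBX_SYNC2 0x62

/* Hypothetical mirror of the UBX NAV-POSLLH payload (class 0x01, id 0x02). */
typedef struct {
  uint32_t iTOW;   /* GPS time of week, ms */
  int32_t lon;     /* longitude, 1e-7 deg */
  int32_t lat;     /* latitude, 1e-7 deg */
  int32_t height;  /* height above ellipsoid, mm */
  int32_t hMSL;    /* height above mean sea level, mm */
  uint32_t hAcc;   /* horizontal accuracy estimate, mm */
  uint32_t vAcc;   /* vertical accuracy estimate, mm */
} ubx_nav_posllh_t;

/* Compile-time check that the struct layout matches the 28-byte payload. */
static_assert(sizeof(ubx_nav_posllh_t) == 28, "NAV-POSLLH payload is 28 bytes");

/* 8-bit Fletcher checksum over class, id, length and payload bytes. */
static void ubx_checksum(const uint8_t *buf, size_t len, uint8_t *ck_a, uint8_t *ck_b) {
  uint8_t a = 0, b = 0;
  for (size_t i = 0; i < len; i++) {
    a = (uint8_t)(a + buf[i]);
    b = (uint8_t)(b + a);
  }
  *ck_a = a;
  *ck_b = b;
}

/* Returns 1 and fills *out if frame is a valid NAV-POSLLH message, else 0.
   Assumes a little-endian host, so no byte swapping is done after memcpy. */
static int ubx_parse_nav_posllh(const uint8_t *frame, size_t frame_len,
                                ubx_nav_posllh_t *out) {
  if (frame_len < 8) return 0;  /* 2 sync + 4 header + 2 checksum */
  if (frame[0] != UBX_SYNC1 || frame[1] != UBX_SYNC2) return 0;
  uint16_t payload_len = (uint16_t)(frame[4] | (frame[5] << 8));
  if (frame_len != (size_t)payload_len + 8) return 0;
  if (frame[2] != 0x01 || frame[3] != 0x02) return 0;
  if (payload_len != sizeof(ubx_nav_posllh_t)) return 0;
  uint8_t ck_a, ck_b;
  ubx_checksum(frame + 2, (size_t)payload_len + 4, &ck_a, &ck_b);
  if (ck_a != frame[6 + payload_len] || ck_b != frame[7 + payload_len]) return 0;
  memcpy(out, frame + 6, sizeof *out);  /* raw bytes straight onto the struct */
  return 1;
}
```

For messages with mixed field widths, the struct would usually need careful field ordering or a compiler packing attribute to match the datasheet layout. In either case the `static_assert` catches a size mismatch at build time.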

adafruit industries

60,039 views • 3 months ago

I just compared Claude Code vs Codex vs Cursor CLI. The task was to build a Next.js app with Tailwind 4 and shadcn components to collect customer feedback and showcase it with a widget. I gave all three the same prompt and let them go for 30 minutes to see what they came up with.

Claude Code with Opus 4.1: Even though I told it to set up the app in the existing project folder, it tried to create a new directory for it. After I interrupted and told it not to do that, it built a demo form and landing page with no errors. I had to ask it to make the demo interactive so users could submit a testimonial and preview it. The landing page looked AI-generated and was pretty basic, but it worked, and it was done in a fraction of the time of the others. Total tokens used: 33k.

Codex with GPT-5: At the end of the 30 minutes, I just could not get Codex to produce a working app. It got stuck in a loop of failing to set up Tailwind 4, and despite many, MANY attempts, I ended up with a "failed to compile" error. Total tokens used: 102k.

Cursor Agent with GPT-5: This was by far the slowest agent, and a couple of times I actually thought it had got stuck in a loop and came close to Ctrl+C'ing to cancel it. The TUI is really nice, though, especially the way it shows diffs, and it did eventually build a working app (after one or two minor errors that needed fixing). The demo was interactive and had a very minimal design that looked bare but also a lot less like an "AI-generated" app than the Opus 4.1 design. It also wasn't too chatty and just did what it needed to do! Code quality was on a par with Opus 4.1, but it used about 5.7x as many tokens to get there (188k vs. 33k). That is still cheaper than Opus in a direct comparison, but not once you factor in a Claude Code Max subscription. Total tokens: 188k.

I'll be able to do a proper comparison and record some videos when I'm back from holiday, but for now, Opus is still the more capable model out of the box and Claude Code is the more complete CLI product.
It will be interesting to see how Cursor evolves its CLI, though, with commands and subagents, because I think that with GPT-5 they have a real shot at competing with Claude Code if they can optimise output to get similar quality with fewer tokens. Jump to 0:40 in the video to see the two apps. Which do you think is which? ;)

Ian Nuttall

194,803 views • 10 months ago

Congrats to EloShapes.com on their launch! On a similar note, is out of beta and available to the public. A lot of you have been asking me what my plan is now that EloShapes is also doing 3D scans... the answer is: nothing has changed! I think competition is good. I'm a mouse nerd and a LONG TIME EloShapes user, so the more options I get as a user, the better. I really think, though, that my vision and EloShapes' are fundamentally different. I've had the domain for FMM for years, and I have a much larger platform in mind than what I've built so far. The 3D scans are a necessary step for my vision, not the end goal: they never were. I am also very proud that I've been able to scan at least 5 mice in full color EVERY DAY with my own pipeline, and I think I will easily have almost full coverage of the mice people would want on the website within months. Check the video for the mice I've added in just the last few days, all textured. Apart from some of the features I've mentioned before (like the virtual hand/grip), FMM is meant to be kind of a "MyAnimeList" for mice, so that you can share your profile, like this: and then later on leverage the curated list of mice you build for yourself to be shown even more mice you might like. FMM is also a tool for reviewers and anyone who wants to share their opinions on mice. Links like this: allow you to easily share your specific points about a mouse via annotations, viewpoint sharing, and measurements. And of course, as I've said since the beginning, I will use FMM to centralize all my hard testing of sensors, DPI, and such, so that you'll be able to find EVERYTHING about a mouse, including the raw results of my standardized sensor implementation testing. So, with that said, I will keep scanning as many mice as possible and adding as many features as possible until FMM becomes what I envisioned so long ago. Join my Discord if you want to keep up! I hope you'll all give it a try! Ciao!

bardOZ (Giovanni Laerte Frongia)

24,967 views • 1 month ago

This is probably the most complex workflow I’ve ever built using only open-source tools. It took me four days. It takes four inputs: author, title, and style; and generates a full visual animated story in one click in ComfyUI. There are still some bugs, but here’s the first preview.

Here’s a quick breakdown:
- The four inputs are sent to LLMs with precise instructions to generate, first, prompts for images and image modifications; second, prompts for animations; and third, prompts for generating music.
- All voices are generated from the text and timed precisely, as they determine the length of each animation segment.
- The first image and video are generated to serve as the title, but also as the guide for all other images created for the video.
- Titles and subtitles are also added automatically in Comfy.
- I also developed a lot of custom nodes for minor frame calculations, mostly to match audio and video.
- The full system is a large loop that, for each line of text, generates an image and then a video from that image. The loop was the hardest part of this workflow to build, and it is what lets it process either a 20-second video or a 2-minute video with the same input.
- There are multiple combinations of LLMs that try to understand the text as well as possible, to provide the best prompts for images and video.
- The final video is assembled entirely within ComfyUI.
- The music is generated based on the LLM output and matches the exact timing of the full animation.
- Done!

For reference, this workflow uses a lot of models and only works on an RTX 6000 Pro with plenty of RAM. My goal is not to replace humans: as I’ll try to explain later, this workflow is highly controlled and can be adapted or reworked at any point by real artists! My aim was to create a tool that can animate text in one go, giving the AI some freedom while keeping a strict flow.
I don’t know yet how I’ll share this workflow with people; I still need to polish it properly, but maybe through Patreon. Anyway, I hope you enjoy my research, and let’s always keep pushing further! :)
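The post does not show the custom frame-calculation nodes. One general way to derive per-line frame counts from voice timing is sketched below in C: each voice clip's length, in audio samples, is converted into a video frame count. This is not the author's node or a ComfyUI API. `segment_frames` is a hypothetical name, and the sample rate and frame rate are parameters chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts the length of each voice clip (in audio
   samples) into a video frame count for that segment. It rounds the
   cumulative end time of each segment rather than each segment on its own,
   so rounding error never accumulates across a long story. */
static void segment_frames(const uint64_t *clip_samples, size_t n,
                           uint32_t sample_rate, uint32_t fps,
                           uint32_t *frames_out) {
  assert(sample_rate > 0);
  uint64_t cum_samples = 0;
  uint64_t prev_end_frame = 0;
  for (size_t i = 0; i < n; i++) {
    cum_samples += clip_samples[i];
    /* round(cum_samples * fps / sample_rate) in integer arithmetic */
    uint64_t end_frame = (cum_samples * fps + sample_rate / 2) / sample_rate;
    frames_out[i] = (uint32_t)(end_frame - prev_end_frame);
    prev_end_frame = end_frame;
  }
}
```

Rounding the cumulative end time keeps the total frame count within half a frame of the full narration length. As a result, audio and video do not drift apart whether the story runs 20 seconds or 2 minutes.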

Lovis Odin

56,518 views • 8 months ago

Ever since I wired Claude Code to WhatsApp 3 weeks ago, I've built a stupidly large infra around it. I mean, Opus built it. No clue what the code even looks like. The entire thing was vibe coded from my phone. I wanted to see how far I could push it without touching the computer. Everything via WhatsApp. Build what I need on the fly, so the resulting infrastructure is already battle-tested for software development. The entire thing was streamlined with almost no manual intervention; everything was communicated via WhatsApp through a single script that maintains the connection. If the script is down, I need to get home and start it again to resume development. Claude was upgrading it, debugging it, and restarting it while maintaining constant uptime so it could keep communicating with me. I stressed this to Claude, telling it that it would be “in the dark” and using other deliberately scary-sounding words about losing communication if the script dies. I also refused to use git or to clone the code; I wanted to see Claude adapt to working on a *LIVING* system.

The way this whole thing works: Claude has its own dedicated phone number that I am paying for. A real WhatsApp account for it is installed on a real iPhone sitting on my desk. Everything is registered under my name; this is a legit setup with no hacks or tricks. I’ve set up a WhatsApp “Community” with multiple groups under it. Claude and I are both admins, so Claude can edit it on my behalf. Each group is a project I am working on and has its own isolated context. The group description is a system prompt that gets auto-appended to the larger system prompt explaining this setup in general. When I send a message, it’s an instant interrupt to Claude Code’s process, just like in the terminal. Voice notes are seamlessly transcribed with a local Whisper model. Images are read with multimodal input in an isolated parallel session.
Multiple groups run in parallel, so I can work on all projects at the same time. No cross-talk; everything has an isolated context and history.

And because it’s local on my own machine, everything is REAL. The browser is REAL. I am logged in as myself to all services on it because I actually use it in real life. Claude has unlimited internet access, just like humans using actual browsers. It uses custom browser tools I made to control any browser session it wants. Depending on the situation, it can either connect to my existing session or create its own. (You can tell it ‘look at my browser for a sec’, then talk about the page you’re currently on, and it just works. Pretty cool.) My custom browser tools are not perfect (not by a long shot), but I managed to make them work well enough that they’re somewhat reliable. This gives Claude full access to my real creds and all the services I actually use.

I’m productive AS HELL with this. It really feels like a personal assistant. I ask it to read my emails and msgs, check x.com for news, research arxiv papers, write code, run experiments for me, investigate and reverse engineer GitHub repos, even use my credit card and order things [I try not to do this one a lot lol, so far no disasters]. All from my phone. Super convenient.

This is not a product or an open source project (maybe soon, if it makes sense). This is just an ugly script I hacked together; the entire thing is ~600 lines. (OK, maybe I did look at the code, but I swear I didn’t edit it!) You can also vibe code this from scratch pretty fast, and it will probably even end up better. This is just a cool thing, so I’m sharing it. It is a real speed booster for many things I do on a daily basis, mostly boring things. Forcing my routine into some new “agent platform” just didn’t feel right for me. WhatsApp is where I already communicate and check for messages, so I decided that my agents will live there too. AGI in my pocket 24/7.
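The per-group isolation described above can be sketched as a lookup table from group ID to session. In this sketch each group's description is appended to a shared base prompt, and each group keeps its own history buffer. This is a minimal C illustration under assumptions, not the author's ~600-line script. `group_session_t`, `BASE_PROMPT`, `find_session`, `build_system_prompt` and `append_history` are hypothetical names. The WhatsApp, Whisper and Claude Code plumbing is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PROMPT_CAP 1024

/* Hypothetical per-group session: one WhatsApp group = one project = one context. */
typedef struct {
  char group_id[64];
  char description[256];     /* group description, used as a per-project prompt */
  char history[PROMPT_CAP];  /* message history visible to this group only */
} group_session_t;

/* Hypothetical general prompt explaining the overall setup. */
static const char BASE_PROMPT[] =
    "You are reachable over WhatsApp. Each group is a separate project.";

/* Returns the session for group_id, or NULL if the group is unknown. */
static group_session_t *find_session(group_session_t *sessions, size_t n,
                                     const char *group_id) {
  for (size_t i = 0; i < n; i++) {
    if (strcmp(sessions[i].group_id, group_id) == 0) return &sessions[i];
  }
  return NULL;
}

/* Writes BASE_PROMPT followed by the group's description into out.
   Returns 1 on success, 0 if the result did not fit in cap bytes. */
static int build_system_prompt(const group_session_t *s, char *out, size_t cap) {
  assert(s != NULL && out != NULL);
  int n = snprintf(out, cap, "%s\n\n%s", BASE_PROMPT, s->description);
  return n >= 0 && (size_t)n < cap;
}

/* Appends msg to this group's history only, so projects never cross-talk.
   Returns 1 on success, 0 if msg had to be truncated to fit. */
static int append_history(group_session_t *s, const char *msg) {
  size_t used = strlen(s->history);
  size_t room = sizeof s->history - used;
  int n = snprintf(s->history + used, room, "%s\n", msg);
  return n >= 0 && (size_t)n < room;
}
```

Keeping history per session is what prevents cross-talk in this sketch: a message routed to one group never touches another group's buffer or prompt.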

Yam Peleg

419,379 views • 5 months ago

Pi was built when there were already agent harnesses around. Here’s why Mario Zechner found them suboptimal and built Pi, a minimalist self-modifying agent:

#1 - Mario was initially a believer in Claude Code: "I was a believer in Claude Code because they were the first that packaged agentic search up in a really compelling package. And at the time that fit my workflow really well. Everything around the LLM was kind of nice and tidy and easy to understand. I was super happy. I was proselytising Claude Code."

#2 - Reverse engineering Claude Code highlighted the degradation that Mario felt as a user: "I personally like simple tools that are stable and that I can rely on. Even if they have non-deterministic parts, all the deterministic parts should be as stable as possible. That was just not the experience with Claude Code around summer 2025. They would take away your control of the context. They would inject stuff behind your back, which is bad. Then, your workflows stopped working because there's now a system reminder that you don't even see in the UI that would modify the behaviour of the model. They would also do this to the system prompt. I built a little service where I can track the progression or evolution of the system prompt and tool definitions and, with every release, it was messing with stuff. That just messed with my workflows and I don't appreciate that."

#3 - Pi was built with an appreciation for simple and reliable tools: "If I commit to a development tool, I want it to be a stable, reliable thing like a hammer. I don't want my hammer to break in a different spot every day. That's terrible. We need somebody who goes the full velocity kind of way. But I don't want to work with a tool like that."
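Mario does not describe how his tracking service works. The C sketch below illustrates one simple way to flag which releases changed the system prompt or the tool definitions: fingerprint each snapshot with 64-bit FNV-1a and compare it against the previous release. It is a hypothetical example, not his service. `release_snapshot_t`, `diff_releases` and the sample strings are assumptions, and the sketch assumes no hash collisions between snapshots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash: a cheap fingerprint of a prompt or tool-definition text. */
static uint64_t fnv1a64(const char *s) {
  uint64_t h = 0xcbf29ce484222325ULL;  /* FNV-1a 64-bit offset basis */
  for (; *s; s++) {
    h ^= (uint8_t)*s;
    h *= 0x100000001b3ULL;             /* FNV 64-bit prime */
  }
  return h;
}

/* Hypothetical snapshot of what one release sends to the model. */
typedef struct {
  const char *version;
  const char *system_prompt;
  const char *tool_defs;
} release_snapshot_t;

/* For each release i, sets changed[i] to a bitmask relative to release i-1:
   bit 0 = system prompt changed, bit 1 = tool definitions changed.
   changed[0] is always 0. Returns how many releases changed anything. */
static size_t diff_releases(const release_snapshot_t *r, size_t n, unsigned *changed) {
  assert(n == 0 || (r != NULL && changed != NULL));
  size_t count = 0;
  for (size_t i = 0; i < n; i++) {
    changed[i] = 0;
    if (i == 0) continue;
    if (fnv1a64(r[i].system_prompt) != fnv1a64(r[i - 1].system_prompt)) changed[i] |= 1u;
    if (fnv1a64(r[i].tool_defs) != fnv1a64(r[i - 1].tool_defs)) changed[i] |= 2u;
    if (changed[i]) count++;
  }
  return count;
}
```

Comparing full texts with `strcmp` would work just as well. Hashing only keeps a long release history small to store, at the cost of a vanishingly unlikely missed change on a collision.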

The Pragmatic Engineer

62,545 views • 1 month ago

Day 8-9 of building an AMM (repos in the next post).

I added the protocol fee + the swap fees to LPs yesterday. It's the same as Uniswap v2: 0.25% (goes to LPs) + 0.05% (goes to the protocol) per swap. It was actually not that easy, as you want to spend as little gas (CUs) as possible, and you have to compound the fees into the pool. But I figured it out, and in the end it hopefully works. I don't transfer the protocol's fee out of the pool on every swap; instead, fees go to the liquidity pool's LP token associated token account on deposits/withdrawals, and I'll be able to withdraw them later (the instruction still needs to be implemented).

I still haven't written a test in Mollusk; instead I started working on the frontend, for reasons I won't tell you now... but let me just say there may be good things coming out of this whole AMM project besides the AMM itself. So we'll put the tests on hold for now.

About the frontend... well, yeah. This is the part I hate the most. I don't like frontend development. I'm not really good at it, though I can get by fine, especially with Claude. If vibecoding was invented for anything, it was frontend. God bless Claude for being a React beast. All I really did on the frontend today was initialize it from the Solana template and build the basic interface without hooking it up to the program, so everything you see is just mock data. I've attached a short video of how it looks now. The recorder is laggy for some reason, so bear with me, but the actual site is fine, for the most part. (Also, the video quality is really shitty on X, idk why.)

Just FYI, I'm gonna be taking a break from building this AMM for a couple of days, and there will be fewer updates for a bit: I have university exams soon and have to study, and I also have to work on some other stuff. Maybe I'll limit these updates to once a week? We'll see. I like doing it a lot, so I might end up working anyway lmfaooooo.
Your patience & attention is always appreciated, and thank you for following along!
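The fee scheme in this post matches Uniswap v2's. The 0.30% swap fee stays in the pool, and the protocol's 0.05% share (1/6 of that fee) is minted as LP tokens only when liquidity is deposited or withdrawn. The two formulas are sketched in C below. This is not the author's Solana program, and the function names are hypothetical. It assumes amounts small enough that the intermediate 64-bit products cannot overflow; an on-chain version would need 128-bit or checked arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Constant-product swap output with a 0.30% fee (997/1000), as in Uniswap v2.
   The whole amount_in is later added to the reserve, so the fee compounds
   into k for LPs instead of being transferred out on each swap. */
static uint64_t get_amount_out(uint64_t amount_in, uint64_t reserve_in,
                               uint64_t reserve_out) {
  assert(reserve_in > 0 && reserve_out > 0);
  uint64_t in_with_fee = amount_in * 997;
  uint64_t num = in_with_fee * reserve_out;
  uint64_t den = reserve_in * 1000 + in_with_fee;
  return num / den;
}

/* Floor integer square root via Newton's method. */
static uint64_t isqrt_u64(uint64_t x) {
  if (x < 2) return x;
  uint64_t z = x, y = (x >> 1) + (x & 1);  /* ceil(x / 2) without overflow */
  while (y < z) {
    z = y;
    y = (x / y + y) / 2;
  }
  return z;
}

/* Uniswap v2 style protocol fee: LP tokens minted to the protocol on
   deposit/withdraw, equal to 1/6 of the growth in sqrt(k) since k_last. */
static uint64_t protocol_fee_liquidity(uint64_t total_supply, uint64_t k_last,
                                       uint64_t reserve0, uint64_t reserve1) {
  uint64_t root_k = isqrt_u64(reserve0 * reserve1);
  uint64_t root_k_last = isqrt_u64(k_last);
  if (root_k <= root_k_last) return 0;
  uint64_t num = total_supply * (root_k - root_k_last);
  uint64_t den = root_k * 5 + root_k_last;
  return num / den;
}
```

Deferring the protocol cut to deposit/withdraw time lets each swap skip an extra transfer. That is the same compute saving the post describes.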

8bitpenis.sol

60,037 views • 5 months ago

🔥 The battle for the top reasoning LLM intensifies! QwQ-32B-Preview is a very good reasoning LLM. Full video of my tests here: Summary of my findings and thoughts: It was able to solve a couple of hard math problems, so it looks very promising for math. It didn’t do so well on my coding task (generating a bash script), and judging by the results reported on LiveCodeBench, it has room for improvement. One thing that’s become very clear to me is that the reasoning capabilities of these LLMs are significantly closing the gap between open and closed-source models. The competition is now going to be on a different level, focused on which model produces the most efficient, optimized, accurate, and fastest reasoning steps, beyond just accurate responses. That's what developers will care about. Traditional benchmarks are not going to be good enough for this. On that note, it's getting harder to assess these models, especially the consistency, efficiency, and quality of their reasoning steps. After experimenting with this model, I realized that the reasoning paths are not fully optimized and that a lot more optimization needs to happen before these models are used in production settings. There might be a need to build some type of native and efficient self-assessment or self-reflection capability that prevents these reasoning LLMs from going in loops or producing unnecessarily lengthy sequences. I also noticed that this model, at least in the HF demo, doesn’t separate the reasoning from the response. I think that actually hurts the performance of the model. On the other hand, o1 and R1 do that really well. In addition, I believe the training on reasoning is hurting the model's performance in other areas, such as helpfulness (check the code example in the video). Something that’s necessary at the moment is validating or evaluating the quality of the reasoning chains and figuring out a better strategy to optimize them.
Current methods are probably not sufficient to solve this problem, but that's where the next innovation will come. I recognize that this is a first effort, so kudos to the Qwen team on this release. These issues highlight the importance of transparency with reasoning LLMs. We need to know how a model was trained, with exactly what data and optimization strategy. Understanding that will enable researchers and developers to build better intuition and improve reasoning capabilities and components at a faster rate. There is an opportunity for someone or some company to build a truly open reasoning LLM. The race is on! I will continue to track the state of the art in reasoning LLMs and report my takes and observations here. Stay tuned for more.

elvis

14,740 views • 1 year ago

The most interesting part for me is where Andrej Karpathy describes why LLMs aren't able to learn like humans. As you would expect, he comes up with a wonderfully evocative phrase to describe RL: “sucking supervision bits through a straw.” A single end reward gets broadcast across every token in a successful trajectory, upweighting even wrong or irrelevant turns that lead to the right answer. > “Humans don't use reinforcement learning, as I've said before. I think they do something different. Reinforcement learning is a lot worse than the average person thinks. Reinforcement learning is terrible. It just so happens that everything that we had before is much worse.” So what do humans do instead? > “The book I’m reading is a set of prompts for me to do synthetic data generation. It's by manipulating that information that you actually gain that knowledge. We have no equivalent of that with LLMs; they don't really do that.” > “I'd love to see during pretraining some kind of a stage where the model thinks through the material and tries to reconcile it with what it already knows. There's no equivalent of any of this. This is all research.” Why can’t we just add this training to LLMs today? > “There are very subtle, hard to understand reasons why it's not trivial. If I just give synthetic generation of the model thinking about a book, you look at it and you're like, 'This looks great. Why can't I train on it?' You could try, but the model will actually get much worse if you continue trying.” > “Say we have a chapter of a book and I ask an LLM to think about it. It will give you something that looks very reasonable. But if I ask it 10 times, you'll notice that all of them are the same.” > “You're not getting the richness and the diversity and the entropy from these models as you would get from humans. How do you get synthetic data generation to work despite the collapse and while maintaining the entropy? It is a research problem.” How do humans get around model collapse? 
> “These analogies are surprisingly good. Humans collapse during the course of their lives. Children haven't overfit yet. They will say stuff that will shock you. Because they're not yet collapsed. But we [adults] are collapsed. We end up revisiting the same thoughts, we end up saying more and more of the same stuff, the learning rates go down, the collapse continues to get worse, and then everything deteriorates.” In fact, there’s an interesting paper arguing that dreaming evolved to assist generalization, and resist overfitting to daily learning - look up The Overfitted Brain by Erik Hoel. I asked Karpathy: Isn’t it interesting that humans learn best at a part of their lives (childhood) whose actual details they completely forget, adults still learn really well but have terrible memory about the particulars of the things they read or watch, and LLMs can memorize arbitrary details about text that no human could but are currently pretty bad at generalization? > “[Fallible human memory] is a feature, not a bug, because it forces you to only learn the generalizable components. LLMs are distracted by all the memory that they have of the pre-trained documents. That's why when I talk about the cognitive core, I actually want to remove the memory. I'd love to have them have less memory so that they have to look things up and they only maintain the algorithms for thought, and the idea of an experiment, and all this cognitive glue for acting.”
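The "straw" image can be made concrete with the standard REINFORCE policy-gradient estimator. This is offered as an illustration of outcome-reward RL in general, not as the exact objective Karpathy has in mind. Suppose a single scalar reward R(τ) is given for a whole trajectory τ of T tokens a_1, ..., a_T with contexts s_1, ..., s_T. Then every token's log-probability gradient is scaled by that same number:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      R(\tau) \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
    \right]
```

R(τ) does not depend on t. A wrong or irrelevant token that happens to sit in a successful trajectory is therefore upweighted exactly as much as the token that actually solved the problem. One scalar of supervision gets spread across all T tokens.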

Dwarkesh Patel

1,049,820 views • 7 months ago

REMINDER: The genome for SARS-CoV-2 is a "consensus sequence." Anybody who says the gene sequence for SARS2 is confirmed is *confused or lying.* "[The reality is that] nobody has the code of the pathogen..." "What they set for the control [for the PCR 'test'] is a consensus sequence, which [means] they took AI, they averaged out a section of the genome that they want as that test, and they set it for that. So it doesn't even exist in nature anywhere." This is a clip from a recent discussion between former medical coder and whistleblower Zowe Smith and retired pharma R&D executive Sasha Latypova (sashalatypova.substack.com "Due Diligence and Art"). Smith and Latypova discuss the many shortcomings of PCR "tests," as well as the fact that no genome for SARS-CoV-2 has ever been characterized, only a "consensus sequence," which Smith notes is developed when AI "average[s] out... a section of the genome that they want as a test." She adds, "it doesn't even exist in nature anywhere." Latypova confirms, "when they're saying, 'Oh, we have the COVID virus, the full genome...it's been sequenced. Look at all these papers.' [The reality is that] nobody has the code of the pathogen... Ralph Baric also wrote about it in his work all the time. So nobody has the pathogenic sequence." The pharma insider adds, "What they upload to GenBank is... averaged... And once it's averaged, it's no longer pathogenic anything. It's just a model. And then for PCR, [it] doesn't test the full genome. They do these, like, snippets, and then whatever snippet you wanna set it to, you will find it, and that's how they find... positive COVID. So all of this is total BS." Interestingly, Smith notes that when she worked with PCR in a lab at the Oregon Health and Science University "[she] realized that everything was controlled through EUA [Emergency Use Authorization] and the CDC, so there was no way to independently verify [the controls that were used]."

Sense Receptor

22,788 views • 1 year ago

Eric Rohmer on the use of colour in "La Collectionneuse" (1967) and "Claire's Knee" (1970): "I didn't use color as a dramatic element, as some filmmakers have done. For me it's something inherent in the film as a whole. I think that in 'La Collectionneuse' (1967) color above all heightens the sense of reality and increases the immediacy of the settings. In this film color acts in an indirect way; it's not direct and there aren't any color effects, as there are for example in Bergman's most recent film, his second one in color, where the color is very deliberately worked out and he gets his effects mainly by the way he uses red. I've never tried for dramatic effects of this kind, but, for example, the sense of time (evening, morning, and so on) can be rendered in a much more precise way through color. Color can also give a stronger sense of warmth, of heat, for when the film is in black-and-white you get less of a feeling of the different moments of the day, and there is less of what you might call a tactile impression about it. In 'Claire's Knee' (1970), I think it works in the same way: the presence of the lake and the mountains is stronger in color than in black-and-white. It's a film I couldn't imagine in black-and-white. The color green seems to me essential in that film; I couldn't imagine it without the green in it. And the blue too, the cold color as a whole. This film would have no value for me in black-and-white. It's a very difficult thing to explain. It's more a feeling I have that can't be reasoned out logically." (Eric Rohmer's interview with Graham Petrie, Film Quarterly, 1971)

DepressedBergman

56,251 views • 11 months ago