Introducing /visual-plan - a skill to generate rich, visual plans for Claude Code and Codex. Plan mode in Claude Code is incredible. But I always find my eyes glazing over when it gives me this huge markdown essay in my terminal. I found I can make much better visual...

118,354 views • 4 days ago • via X (Twitter)

Related videos

Bash is all you need! Which is why I'm introducing my holiday project: just-bash

just-bash is a pretty complete implementation of bash in TypeScript designed to be used as a bash tool by AI agents. Because it turns out agents love exploring data via shell scripts, even beyond coding. It comes with grep, sed, awk and the 99th-percentile features that an agent like Claude Code or Cursor would use. In fact, Claude Code can use it for secure bash execution.

In the package:
- A bash-tool for AI SDK
- A binary for use by yourself or your coding agents
- An overlay filesystem to feed files to your agent securely
- A Vercel Sandbox-compatible API, so you can quickly upgrade to a real VM if you need to run binaries
- An example AI agent that explores the just-bash code base using just-bash
- I imported the Oils shell bash compatibility suite, and just-bash passes a very good chunk of it

What is interesting about this codebase: it was essentially entirely written by Opus 4.5. Coding agents love bash and they are good at reproducing it. They are also great at textbook recursive descent parsers and AST tree-walk interpreters. That said, it is, like, a lot of code and I didn't read it all 😅.

This is very much a hack, but it also seems to be _really_ useful. I haven't really found anything agents want to use that it doesn't support, and it's fast and secure (caveats apply). It doesn't have write access to your computer, and the filesystem is given a root that the agent cannot escape from.

Find it at

Related: our recent blog post on how we migrated our data analysis agent to bash tools and achieved incredible quality improvements

The video shows the example agent investigating the just-bash code base
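The post says the agent's filesystem is given a root it cannot escape from, but not how just-bash enforces that. One common approach in a virtual filesystem is to normalize every path the agent supplies against a fixed root, so `..` segments can never climb above it. The sketch below assumes that approach; `resolveInRoot` is a hypothetical helper, not just-bash's actual API.

```typescript
// Hypothetical helper (not part of just-bash): resolves an agent-supplied path
// inside a virtual root so that ".." can never climb above that root.
// `cwd` is the agent's current directory, expressed relative to the root.
function resolveInRoot(root: string, cwd: string, input: string): string {
  // Absolute inputs start at the virtual root; relative ones start at cwd.
  const parts: string[] = input.startsWith("/") ? [] : cwd.split("/").filter(Boolean);
  for (const segment of input.split("/")) {
    if (segment === "" || segment === ".") continue; // skip empty and current-dir segments
    if (segment === "..") {
      parts.pop(); // popping an empty array is a no-op, so we stay at the root
    } else {
      parts.push(segment);
    }
  }
  // Join onto the real root; the result always keeps the root as its prefix.
  const base = root.endsWith("/") ? root.slice(0, -1) : root;
  return parts.length === 0 ? base || "/" : `${base}/${parts.join("/")}`;
}

console.log(resolveInRoot("/sandbox", "/src", "../../../etc/passwd"));
// prints "/sandbox/etc/passwd"
```

String normalization alone does not cover symlinks; a real sandbox would also have to resolve links inside the root before applying the same confinement check.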

Malte Ubl

124,713 views • 5 months ago

Ever seen a fresh (20x) Claude Max account's 5-hour usage allowance get drained in ~14 minutes? Feast your eyes on my bizarre life now with this screen recording of a recent live work session, something I've gotten at least 100 requests for over the past month. Maybe you can understand now why I need so many accounts and how I can work on so many different projects. You can also see the truth of what I was saying recently about how, once your plan is done and the beads made and polished, it's mostly just machine-tending the swarm, which doesn't require much thought. Lots of just telling it to get the next bead and work on it, to review code, to re-read AGENTS.md after a compaction, etc. And you can see how I use gemini-cli for code review. I give Google a lot of crap for the harness being broken and the capacity overloads, but when it works, it's actually really good for this code review use case. I don't usually let it write new code, though, because I think Opus and 5.2 do a better job. Also, sorry the recording is a bit blurry; I have a 5K resolution monitor, and screen recordings from it are usually hard to watch. And btw, this really wasn't a normal session for me; it was more frenetic than usual, because I didn't want to dox myself or my clients by accident. Hence all the ceaseless terminal tab swirling. I usually do more planning work while this stuff is going on, but I wanted to minimize the chances of leaking important information. That's also why I didn't refresh the Gemini login in the WezTerm window, which killed me, trust me. It's the reason I hate doing these screen recordings in the first place; it kills my productivity. Anyway, hope you liked it. I will also post to YouTube, see reply for link. Thanks for watching.

Jeffrey Emanuel

86,013 views • 5 months ago

New Andrej Karpathy interview: he says AI agent failures stem from user skill, not model capability, and that poor instructions cause errors. He suggests delegating 20-minute macro actions like coding and research to parallel agents and reviewing their work. --- "I think everything, like so many things, even if they don't work, I think to a large extent you feel like it's a skill issue. It's not that the capability is not there; it's that you just haven't found a way to string together what's available. Like, I didn't give good enough instructions to the agents in the file, or whatever it may be. I don't have a nice enough memory tool that I put in there, or something like that. So, it all kind of feels like a skill issue when it doesn't work to some extent. You want to see how you can parallelize them, and you want to be a 'Pierce tender,' basically. Pierce famously has a funny photo where he's in front of lots of these Codex agents behind the monitor. They all take about 20 minutes if you run them correctly and use high effort. You have multiple (you know, 10 or 20) pull requests checked out. It's just like you can do much larger macro actions. It's not just, 'Here's a line of code, here's a new function.' It's like, 'Here's a new functionality, delegate it to agent one. Here's a new functionality that's not going to interfere with the other one, give it to agent two.' Then, you try to review their work as best as you can, depending on how much you care about that code. You look for these macro actions that you can manipulate your software repository by. Another agent is doing some research, another agent is writing code, another one is coming up with a plan for some new implementation. Everything just happens in these macro actions over your repository. You're just trying to become really good at it and develop a muscle memory for it. It's very rewarding when it actually works, but it's also a new thing to learn. Hence, the psychosis."
--- From No Priors YT channel (link in comment)
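Karpathy's workflow of handing independent macro actions to several agents at once and reviewing the results afterwards maps onto a bounded-concurrency worker pool. This is a minimal sketch under stated assumptions: `Task`, `AgentResult`, and the `runAgent` callback are hypothetical stand-ins for however you actually launch an agent (a CLI session, an API call), and the tasks are assumed not to interfere with one another, as in the quote.

```typescript
// Hypothetical types: a Task is one self-contained macro action (a feature,
// a research question, a plan) and AgentResult is what comes back for review.
type Task = { id: string; instructions: string };
type AgentResult = { id: string; output: string };

// Runs every task with at most `maxParallel` agents in flight, returning the
// results in the original task order so they can be reviewed one by one.
async function delegate(
  tasks: Task[],
  runAgent: (task: Task) => Promise<AgentResult>,
  maxParallel: number,
): Promise<AgentResult[]> {
  const results: AgentResult[] = new Array(tasks.length);
  let next = 0; // index of the next unclaimed task

  // Each worker claims tasks until none remain. Claiming needs no lock because
  // `next++` runs synchronously on JavaScript's single thread.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await runAgent(tasks[i]);
    }
  }

  const workerCount = Math.max(0, Math.min(maxParallel, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Returning results in task order rather than completion order keeps the review step predictable: you go through agent one's work, then agent two's, and so on, regardless of which finished first.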

Rohan Paul

23,090 views • 3 months ago

I asked Garry Tan how to use meta prompting to get better at AI: "My partners at YC Jared Friedman and Pete Koomen showed me how to do this. You can take almost anything that you do all the time and just drop it into a context window. And then say, “Here’s a bunch of inputs and outputs.” And maybe you also add a bunch of notes. And then you tell it, “Write me a prompt that can act as an agent that takes this input and makes this output over here.” You can do this for almost any type of knowledge work. And you can even introspect: “What are things you notice that I did to convert this from the input to the output?” And then you can just start using the prompt. Initially, it’s going to suck, because it’s just not that smart yet. But what’s funny is now, I also use it to iterate on my writing. You can be very direct: “I would never say that,” “Don’t say it like this,” or “Oh, you used the long word there, use the short word.” Just speak to it conversationally. And then when you're happy with the output, you can use that new output to make a new prompt: “Based on this conversation, give me a better initial prompt that incorporates all the things we talked about.” And you can do this with literally everything. And in theory, there’s so much it applies to that people do day-to-day. You could use it for tweets. You could use it for editing podcasts. You can use it for pretty much everything. I have a folder of prompts that I use all the time. My YouTube prompt is on v27 or something. I'll go through this process with all the different max models. I'll use GPT-5.2 Pro. I’ll use Grok. I'll use Claude. Then, I’ll take all the outputs from all the models and put them into Claude and say, “Here’s my prompt, here’s the output from four LLMs, including yourself. Rate each response and tell me what the pros and cons of each approach are.” And I usually say, “Give it to me in numbered form.” And then you can agree with one, disagree with two, tell it three is this or that. And then after that, you say, given all of this, synthesize it."
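The two prompting steps Garry Tan describes, asking a model to derive a reusable prompt from input/output examples and then having one model rate several models' outputs in numbered form, can be captured as small prompt builders. This is a hedged sketch: the function names and the `Example` type are hypothetical, the wording is adapted from the quote, and no model API is called; you would pass the returned strings to whatever client you use.

```typescript
// Hypothetical shape for one worked example of a task you do repeatedly.
type Example = { input: string; output: string };

// Step 1: ask a model to infer a reusable prompt from input/output pairs,
// optionally with your own notes about what you did.
function buildMetaPrompt(examples: Example[], notes: string[] = []): string {
  const shown = examples
    .map((e, i) => `Example ${i + 1}\nInput:\n${e.input}\nOutput:\n${e.output}`)
    .join("\n\n");
  const noteBlock = notes.length > 0 ? `\n\nNotes:\n${notes.map((n) => `- ${n}`).join("\n")}` : "";
  return (
    `Here's a bunch of inputs and outputs.\n\n${shown}${noteBlock}\n\n` +
    "What are things you notice that I did to convert the input to the output? " +
    "Then write me a prompt that can act as an agent that takes this input and makes this output."
  );
}

// Step 2: have one model compare several models' outputs for the same prompt.
// Numbering lets you reply "agree with 1, disagree with 2" afterwards.
function buildJudgePrompt(prompt: string, outputs: { model: string; text: string }[]): string {
  const numbered = outputs.map((o, i) => `${i + 1}. [${o.model}]\n${o.text}`).join("\n\n");
  return (
    `Here's my prompt:\n${prompt}\n\nHere's the output from ${outputs.length} LLMs:\n\n${numbered}\n\n` +
    "Rate each response and tell me the pros and cons of each approach. Give it to me in numbered form."
  );
}
```

The final "synthesize it" step from the quote would simply be a follow-up message in the same conversation, so it needs no separate builder.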

The Peel

51,632 views • 3 months ago

If you watch this ~50 minute screen recording closely (yeah, I know, it's long; there are also some times when my computer was very slow and laggy, just skip past that part. And at one point I had to run and get my 9-month-old a new bottle and left it on a boring screen, sorry!), I believe you can see real signs of the kind of runaway, recursive AI self-improvement that people have been warning of for a while (Mr. Kurzweil most notably and prophetically). Why do I say that? What's different now? Well, there's a reason my set of agent coding tooling is called the Flywheel. These tools all mutually self-reinforce each other. And they all flow directly into my ntm tool (short for "named_tmux_manager"), which acts as a sort of integration point and nerve center for the tools (this is becoming more true by the minute as I'm now seriously working on ntm). Now, ntm was something I started making to automate some aspects of my workflow, but it was the kind of thing where, until it was perfect, it sort of just slowed me down. So I didn't actually use it even though I kept working on it and trying to improve it, and suggested to users that they try it in my tutorials. Well anyway, I finally got around to "dogfooding" ntm last night, and now it's going to get very dramatically better at an alarming rate. Some of that is from applying my "idea wizard" prompt to generate more useful features and building that stuff out and addressing obvious pain points I encountered during my newfound usage of the tool. But a lot comes from my realization that, once again, ntm's true utility is not as a tool for ME, but for an agent. That is, ntm lets one instance of Claude Code or Codex act as, well, me, do the things that I had been doing manually. Do I wish I had started using ntm earlier? 
No, for two big reasons: 1) Doing it manually helped me build up my intuition massively, which directly led me down the path of creating useful prompt strategies and workflows; these often began as ad-hoc prompts that I realized could be generalized and made more versatile/universal. Lesson: don't prematurely automate until you have an intimate, intuitive feel for your "core value-add loop." Otherwise you'll have a fully automated system quickly that efficiently and automatically does a stupid or otherwise sub-optimal thing. 2) My eyes have been opened to the beauty and power of Skills. I'm not talking about your garden-variety skills that are just a simple markdown file. I'm talking about true tour-de-force directories of perfectly structured and organized files that are filled with good information, insights, workflows, etc., but presented in a way that is highly optimized for consumption by AI agents, with extreme attention paid to things like perfect progressive disclosure, token density, agent-ergonomics, agent-intuitiveness, etc. And also Skills that go way beyond markdown files, with full integration into Claude Code where it makes sense via hooks, sub-agents, and even Python scripts. These kinds of skills are a qualitative difference in expressive power and usefulness and a total game changer. They are also effectively composable, creating almost an algebra of skills that let you use them together in powerful ways. I'm working on a subscription service website and CLI tool now to share what I've learned here most effectively, stay tuned for that in the coming days. Anyway, I now know what to make and how to make it. So, getting back to that screen recording, what does it show that makes me claim recursive self-improvement is here? If you keep your eye on the upper left tmux pane, that's the "controller" agent. 
It is using ntm to control all the other panes which are also running Claude Code (but ntm fully supports other agent types like Codex and Gemini-CLI, and it's trivially easy to mix and match them if you wanted to have, say, 8 CCs and 6 Codexes for writing the code and 3 Gemini-CLIs for reviewing code.) Now, there's nothing that crazy about this much so far. But where it starts to get very cool is that as the session continues and we encounter real-world problems, things like my ridiculously overloaded computer that keeps hanging for long periods, Claude Code instances that crash and get into a frozen, unresponsive state, it can learn from that. And you can see it using my skill writing skill to refine its ntm vibe coding skill in real time. And then take that skill and refine it to be more intuitive for itself. Or use my cass tool skill to search all the session histories to look for problems that came up and strategize how to solve them. The most useful part was when, towards the end of the session, I told it to reflect on all the things we had done and problems we encountered. One way it can usefully leverage those reflections is by improving its ntm vibe coding skill to make it cover more edge cases and exigencies. But the other, more fundamental, way is for it to conceive of and design the optimal new features and functionality for ntm itself so that the tool embodies those lessons in a first-class way. This offloads cognition from its brain onto its tooling, just like how a person can lean on spellcheck or a calculator. It codifies correct, effective reasoning at the tool level, where it's more reliable and robust and repeatable. And btw, did you notice what code base it was working on the whole time? It was none other than ntm itself! So as it worked on its own tool, it had reflections and ideas about how to further improve the tool. 
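The controller pane drives the other agents through ntm, whose commands the post does not show. One plausible underlying mechanism is plain tmux: `tmux send-keys -t <target>` types into a pane, and `-l` sends the text literally instead of interpreting words as key names. The following is only a hypothetical sketch of that mechanism, not ntm's interface; the function names, the injected `exec` callback, and the `swarm` session name are assumptions for illustration.

```typescript
// Hypothetical executor signature. In Node you could pass a wrapper around
// child_process.spawnSync; keeping it injectable lets the sketch run without tmux.
type Exec = (cmd: string, args: string[]) => void;

// Argument lists for typing `text` into one tmux pane and submitting it.
// Targets use tmux's session:window.pane syntax, e.g. "swarm:0.1".
function sendKeysCommands(target: string, text: string): string[][] {
  return [
    // -l disables key-name lookup, so the instruction is typed literally.
    ["send-keys", "-t", target, "-l", text],
    // A second call presses Enter (as a key name) to submit it to the agent.
    ["send-keys", "-t", target, "Enter"],
  ];
}

// Sends the same instruction to every worker pane and returns the argument
// lists it built, which is handy for logging what the controller did.
function broadcast(panes: string[], text: string, exec?: Exec): string[][] {
  const commands: string[][] = [];
  for (const pane of panes) commands.push(...sendKeysCommands(pane, text));
  if (exec) {
    for (const args of commands) exec("tmux", args);
  }
  return commands;
}
```

Calling `broadcast` without `exec` is a dry run that only returns the commands; in Node, `exec` could be `(cmd, args) => { spawnSync(cmd, args, { stdio: "inherit" }); }`.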
Now, it could have just as easily gotten those insights and ideas while using ntm to work on a different project, but the fact that it was working on itself is almost gloriously meta and recursive. So by the end, after learning from tending to a big group of agent workers (btw, I have previously emphasized doing everything in a really distributed/decentralized way, where each fungible agent gets identical marching orders that tell it to use my bv tool to find the optimal bead to work on. This does work very well, but it occasionally results in some contention and overlap from a thundering-herd effect, or at least wastes time, tokens, and communication as the agents coordinate to avoid duplicating work. But in this new ntm-oriented workflow, I was able to have the controller agent in the upper left use bv itself and then optimally parcel out the instructions to each agent so that we could know for sure that there's no overlap), I ended up with a ton of new beads for new features, which I had it optimize and polish a few times. Now I can swap to a new Claude Max account and have the swarm implement all those new features! It should only take a couple of passes like the one shown in the screen recording to get everything implemented. Then we can rinse and repeat, having the agent read through the full session histories of each agent and its experience from its own session in sending ntm commands and seeing how they worked out in practice, to come up with the next batch of changes to both its ntm vibe coding skill AND to the ntm tool itself. Do you see how rapidly this turns into Skynet? My mistake earlier was focusing on making myself a "faster horse," as Henry Ford used to joke about customers wanting before he showed them what they should really want (a Model T). That is, something that would make my experience nicer while doing this agent-swarm-based development workflow.
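The contrast between the two dispatch modes can be made concrete. In the decentralized mode every agent asks bv for the best bead on its own and several may pick the same one; in the centralized mode one controller ranks the beads and hands each agent a non-overlapping one. The sketch assumes a greedy version of the centralized mode; the `Bead` shape, the `score` field standing in for bv's ranking, and knowing each bead's files up front are all hypothetical, not bv's real output format.

```typescript
// Hypothetical bead shape: an ID, a priority score (higher = work on first,
// standing in for whatever ranking bv produces), and the files it will touch.
type Bead = { id: string; score: number; files: string[] };

// Greedy central dispatch: hand out the highest-ranked beads first and skip
// any bead whose files overlap a bead already assigned in this round.
function assignBeads(beads: Bead[], agents: string[]): Map<string, Bead> {
  const assignments = new Map<string, Bead>();
  const claimed = new Set<string>();
  const ranked = [...beads].sort((a, b) => b.score - a.score); // don't mutate input
  let agentIdx = 0;
  for (const bead of ranked) {
    if (agentIdx >= agents.length) break; // every agent already has work
    if (bead.files.some((f) => claimed.has(f))) continue; // would collide
    for (const f of bead.files) claimed.add(f);
    assignments.set(agents[agentIdx], bead);
    agentIdx++;
  }
  return assignments;
}
```

Greedy selection does not guarantee the highest possible total score, but it is simple and ensures that no two agents in the same round touch the same file.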
But the obvious lesson is that you should make all your tooling agent-first because the agents are just better at this stuff. You can still watch, and of course I did add a ridiculous number of very nice human-centric features to ntm that you'll be seeing in the next day or two, but those are really kind of "for fun" to make us humans feel better about the process. All the real value-add is happening "by agents, for agents." PS: Towards the end, you can see me switch to my Mac and tell Claude to improve the skill that I made earlier today for taking the mkv screen recording files from OBS Studio and muxing them into MP4 files for sharing, while downloading songs from YouTube to serve as the background music. I made it so it can also grab the thumbnails and generate little song credit cards that show up in the lower right corner. This worked perfectly the first time! I'll include some screenshots in a response post showing how that worked, but it was awesome to witness. Skills are POWERFUL. I'll also post a link to this video on YouTube if you prefer to watch it there.
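The MKV-to-MP4 step in the PS is essentially an ffmpeg remux with an added music track. A hedged sketch of the arguments such a skill might pass to ffmpeg, assuming a single song looped for the length of the recording and a video codec that MP4 can hold as-is (such as H.264); the function names and the injected `exec` callback are hypothetical:

```typescript
// Hypothetical executor; in Node this could wrap child_process.spawnSync.
type Exec = (cmd: string, args: string[]) => void;

// ffmpeg arguments that combine an OBS recording and a background song into one MP4.
function remuxWithMusicArgs(videoIn: string, musicIn: string, out: string): string[] {
  return [
    "-y",                 // overwrite `out` if it already exists
    "-i", videoIn,        // input 0: the MKV screen recording
    "-stream_loop", "-1", // input option: loop the next input indefinitely...
    "-i", musicIn,        // input 1: the background music
    "-map", "0:v",        // take video from the recording
    "-map", "1:a",        // take audio from the music file
    "-c:v", "copy",       // copy the video stream as-is (no re-encode)
    "-c:a", "aac",        // encode audio as AAC for MP4 compatibility
    "-shortest",          // ...and stop when the shortest stream (the video) ends
    out,
  ];
}

function remuxWithMusic(videoIn: string, musicIn: string, out: string, exec: Exec): void {
  exec("ffmpeg", remuxWithMusicArgs(videoIn, musicIn, out));
}
```

The song credit cards described in the post would need a video filter such as `overlay`, and filtering a stream means re-encoding it, so `-c:v copy` would have to be replaced with an encoder in that case.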

Jeffrey Emanuel

25,483 views • 5 months ago

New Tools for a New Era. Coding agents like Claude and Cursor have dramatically reduced the time it takes to go from idea to functional software. But the experience of designing and refining with them sucks. One reason for this is that while the terminal is an incredible tool for communicating direction with language, it is a terrible tool for defining and exploring visual and interactive objects. Here is one idea for how we might fix a small part of that. In the old world, when you wanted to create a transition or animation in your app, you would type some code, refresh your local server, and click to run your animation. It probably wasn't right, because after all no one can know what 'cubic-bezier(0.3, 0.05, 0.45, 1)' really feels like when you read it. You need to see it. Feel it. Interact with it in a real world context. So you'd edit some values, save, refresh, and keep guessing and checking until it felt right. Today, you can write a quick, single-use tool that's a visual studio for designing animations. You can then configure some components and containers common in all apps, and explore different animations in real time, adjusting key properties, and getting it just right. Then, you can copy a highly detailed prompt (or export a skill containing all your animations) that captures your intent and direction with perfect clarity. Paste this into your terminal and your agent instantly implements it everywhere. To me, this is an improvement over the old world, and a better way to work in today's. I'm extremely excited to see the ways in which our ability to rapidly create software will shape how we design software tomorrow. Feedback, ideas, and critiques welcome!
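The curve in the post is easier to judge once you can sample it. CSS `cubic-bezier(x1, y1, x2, y2)` defines a Bézier curve from (0, 0) to (1, 1) with control points (x1, y1) and (x2, y2); the eased value for a given animation progress is found by solving for the curve parameter t whose x-coordinate equals that progress, then reading off y. This sketch uses bisection for the solve (a production implementation would likely use a faster root-finder such as Newton's method), and the function name is hypothetical:

```typescript
// Returns an easing function equivalent to CSS cubic-bezier(x1, y1, x2, y2).
function cubicBezier(x1: number, y1: number, x2: number, y2: number): (x: number) => number {
  // CSS requires the x control values to lie in [0, 1], which keeps x(t) monotonic.
  if (x1 < 0 || x1 > 1 || x2 < 0 || x2 > 1) {
    throw new RangeError("x1 and x2 must lie in [0, 1]");
  }
  // One coordinate of the Bernstein form with P0 = 0 and P3 = 1:
  // B(t) = 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3
  const bez = (t: number, p1: number, p2: number): number => {
    const u = 1 - t;
    return 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t;
  };
  return (x: number): number => {
    if (x <= 0) return 0;
    if (x >= 1) return 1;
    // Bisection on the monotonic x(t) to find t with x(t) = x.
    let lo = 0;
    let hi = 1;
    for (let i = 0; i < 50; i++) {
      const mid = (lo + hi) / 2;
      if (bez(mid, x1, x2) < x) lo = mid;
      else hi = mid;
    }
    return bez((lo + hi) / 2, y1, y2);
  };
}

// Sample the curve from the post at a few progress values.
const ease = cubicBezier(0.3, 0.05, 0.45, 1);
for (const p of [0, 0.25, 0.5, 0.75, 1]) {
  console.log(p.toFixed(2), ease(p).toFixed(3));
}
```

A visual tool like the one in the post effectively runs this kind of evaluation every frame, which is why dragging the control points and watching the result is so much more informative than reading the four numbers.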

joshpuckett

81,788 views • 5 months ago