everyone in iOS development should watch this. seriously, it might change the whole industry. i pointed claude code at a live ios device running on revyl, typed "test everything," and walked away. here's what's actually happening: ① you don't write the tests. no scripts, no selectors, no test plan....

23,963 views • 1 month ago • via X (Twitter)

Related Videos

THIS GUY CONNECTED HIS AI AGENTS TO HIS OBSIDIAN AND BUILT A BRAIN THAT LEARNS ON ITS OWN. HERE'S HOW TO BUILD IT

Obsidian is just markdown files sitting in a folder. That turns out to be the perfect memory for an AI agent, because an agent can read and write those files directly. He wired his agents into the vault so they pull context from it, do the work, and write what they learned back. The notes aren't the point. The loop is, and it gets sharper every cycle.

How to build it:

1. Point an agent at your vault. The fastest way, no plugins, no API keys: open a terminal and run npx obsidian-mcp /path/to/your/vault. That exposes your Obsidian folder to Claude as a tool it can read, search, and write to. Add it to your Claude Code or Cowork config and restart.
2. Confirm it can see the brain. Ask it: "list the notes in my vault and summarize what's in them." If it reads them back, the connection is live. Now it starts every task with everything the vault already holds instead of from zero.
3. Give each agent one job and a write-back rule. Tell it: "research this, then save what you found as a new note in /brain with links to related notes." One agent researches, one summarizes, one plans. Each writes its output back into the vault.
4. Close the loop. Add one line to every agent's instructions: "read /brain before starting, write your result back when done." Now each task leaves the vault richer, and the next run reads that before it works. It compounds instead of resetting.
5. You only steer. Review what the brain produces, point it at the next thing. The agents handle the reading, writing, and connecting.

The edge isn't better notes. It's a brain that feeds itself, so the work gets sharper every cycle instead of starting over.

Bookmark this.
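The read-before/write-back loop in steps 3 and 4 can be sketched without any MCP server, since the vault is just markdown files. This is a minimal Python sketch under stated assumptions: the /brain folder name comes from the post, but read_brain, write_back, run_task, and the link heuristic (link any existing note whose title appears in the result) are hypothetical illustrations, not the obsidian-mcp API, and do_work stands in for the actual agent call.

```python
import tempfile
from pathlib import Path


def read_brain(vault: Path, folder: str = "brain") -> dict:
    """'Read /brain before starting': load every markdown note, keyed by title."""
    return {p.stem: p.read_text(encoding="utf-8") for p in sorted((vault / folder).glob("*.md"))}


def write_back(vault: Path, title: str, body: str, related: list, folder: str = "brain") -> Path:
    """'Write your result back when done': save a new note with [[wikilinks]] to related notes."""
    brain = vault / folder
    brain.mkdir(parents=True, exist_ok=True)
    links = "\n".join(f"- [[{name}]]" for name in related)
    path = brain / f"{title}.md"
    path.write_text(f"# {title}\n\n{body}\n\n## Related\n{links}\n", encoding="utf-8")
    return path


def run_task(vault: Path, title: str, do_work) -> Path:
    """One cycle of the loop. do_work is a hypothetical stand-in for the agent call."""
    context = read_brain(vault)
    result = do_work(context)
    # Hypothetical link heuristic: link every existing note whose title the result mentions.
    related = [name for name in context if name.lower() in result.lower()]
    return write_back(vault, title, result, related)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        vault = Path(d)
        write_back(vault, "orbits", "Leapfrog keeps orbits stable.", [])
        note = run_task(vault, "gravity-sim", lambda ctx: f"Built on the orbits note ({len(ctx)} read).")
        print(note.read_text(encoding="utf-8"))
```

Each run reads every note first and adds one linked note, so the next run starts from a larger context; that growth is the compounding the post describes. A real setup would likely search the vault rather than load all of it into context once it gets large.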

Yarchi

57,768 views • 23 days ago

i watched gemma 4 12b build something genuinely impressive today, and then loop itself to death right in front of me. the full run is in the video, sped up but completely uncut. watch it to the end and you will catch the exact moment it stops building and starts looping, right in the middle of the work. the task was clean: build a single-file gravity simulator (n-body physics, orbits, collisions), running locally on one 3090 through an agent. and for ten minutes it was a joy to watch. it reached for a symplectic integrator on its own, the correct one, the kind that keeps orbits stable instead of spiralling out. real gravity with softening, proper orbital velocities, momentum conserved on collision. the physics was right. the thing actually worked. then on the very last step, writing a few tests to prove its own code, it fell into a loop. not a crash, a loop. it started repeating itself and would not stop. ten more minutes, thirty-four thousand tokens into a single answer, the same fragments over and over, until i killed it myself. so it's not that gemma can't code. it did the hard part beautifully. it cannot finish. it cannot hold a long task together without unravelling, and finishing is the entire job in agentic work. here's the part that stings. i run this exact task, same harness, same card, on the chinese open models, qwen especially, and i never see this. they build it, they test it, they stop. every single time. google has the raw capability, you can see it sitting right there in the code, and then the model loops itself to death on a task a 27b from alibaba finishes clean. open weights, apache 2.0, so much to love on paper. i just need it to know when to stop talking.
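The physics the post praises fits in a few dozen lines. This is a minimal Python sketch, not the model's actual output: it uses kick-drift-kick leapfrog (one common symplectic integrator), Plummer softening with an assumed length EPS, a circular-orbit initial velocity in the zero-momentum frame, and a perfectly inelastic merge for collisions. G, EPS, the masses, and the time step are illustrative values.

```python
import math

G = 1.0     # gravitational constant in simulation units (assumption)
EPS = 0.01  # Plummer softening length (assumed value)


def accelerations(pos, mass):
    """Softened pairwise gravity; each pair is visited once so forces stay equal and opposite."""
    acc = [[0.0, 0.0] for _ in pos]
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + EPS * EPS  # softening avoids the blow-up at tiny separations
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            acc[i][0] += G * mass[j] * dx * inv_r3
            acc[i][1] += G * mass[j] * dy * inv_r3
            acc[j][0] -= G * mass[i] * dx * inv_r3
            acc[j][1] -= G * mass[i] * dy * inv_r3
    return acc


def leapfrog_step(pos, vel, mass, dt):
    """One kick-drift-kick leapfrog step (symplectic), updating pos and vel in place."""
    for v, a in zip(vel, accelerations(pos, mass)):  # half kick
        v[0] += 0.5 * dt * a[0]
        v[1] += 0.5 * dt * a[1]
    for p, v in zip(pos, vel):                       # full drift
        p[0] += dt * v[0]
        p[1] += dt * v[1]
    for v, a in zip(vel, accelerations(pos, mass)):  # half kick
        v[0] += 0.5 * dt * a[0]
        v[1] += 0.5 * dt * a[1]


def total_energy(pos, vel, mass):
    """Kinetic energy plus the softened potential that matches accelerations()."""
    kinetic = sum(0.5 * mi * (v[0] ** 2 + v[1] ** 2) for mi, v in zip(mass, vel))
    potential = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            potential -= G * mass[i] * mass[j] / math.sqrt(dx * dx + dy * dy + EPS * EPS)
    return kinetic + potential


def momentum(vel, mass):
    return [sum(mi * v[k] for mi, v in zip(mass, vel)) for k in (0, 1)]


def merge(m1, v1, m2, v2):
    """Perfectly inelastic collision: the merged body keeps the pair's total momentum."""
    m_total = m1 + m2
    return m_total, [(m1 * v1[k] + m2 * v2[k]) / m_total for k in (0, 1)]


# Star plus a light planet on a circular orbit, set up in the zero-momentum frame.
M, m, r, dt = 1.0, 1e-3, 1.0, 0.01
v_rel = math.sqrt(G * (M + m) / r)  # circular speed of the relative orbit (softening ignored)
mass = [M, m]
pos = [[-m / (M + m) * r, 0.0], [M / (M + m) * r, 0.0]]
vel = [[0.0, -m / (M + m) * v_rel], [0.0, M / (M + m) * v_rel]]

e0 = total_energy(pos, vel, mass)
for _ in range(2000):  # about three orbital periods
    leapfrog_step(pos, vel, mass, dt)
drift = abs(total_energy(pos, vel, mass) - e0) / abs(e0)
print(f"relative energy drift after 2000 steps: {drift:.1e}")
print("total momentum:", momentum(vel, mass))
```

Leapfrog does not conserve energy exactly, but its error stays bounded over many orbits instead of accumulating, which is why orbits do not spiral out. Visiting each pair once and applying equal and opposite accelerations keeps total momentum fixed up to floating-point rounding.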

Sudo su

39,574 views • 23 days ago

Hermes agent just left the terminal. Hermes Desktop dropped yesterday. native app for macOS, Windows, and Linux.

for months Hermes was the agent that learned your projects, wrote its own skills, and built a model of who you are. all of it buried in terminal logs. now it has a window.

the important part is that it's not a wrapper. it runs the same agent core, the same sessions, memory, and skills as the CLI. you can start a task in the terminal and finish it in the app without anything resetting. the state is shared across every interface, not copied between them.

what the GUI actually adds:
→ streaming chat that shows live tool calls and inline reasoning instead of a spinner
→ a preview rail that renders pages, code, and images right beside the conversation
→ an artifacts panel that collects every file the agent has ever produced
→ remote gateway mode, so you can point the app at a VPS and run the heavy work elsewhere
→ skills, cron, profiles, and gateways managed point-and-click instead of through YAML
→ voice mode, drag-drop files, and inline image generation

remote gateway mode is the one worth slowing down on. the agent runs 24/7 on a $5 server while you control it from your laptop like a local app.

other agent UIs are chatboxes with a logo. this one shows the autonomy instead of hiding it, so you watch the skills load, the tools fire, and the artifacts pile up as it works.

it was teased in Jensen's GTC keynote. MIT licensed, local-first, no telemetry. if you already run Hermes, download it and everything is already there. your chats, memory, and skills carry straight over.

i wrote a full masterclass on Hermes Agent that walks through the SOUL.md identity layer, the three-tier memory system, the self-evolving skills loop, and how to run three specialized agents 24/7. desktop is the interface that finally does all of it justice. the article is quoted below.

Akshay 🚀

51,091 views • 27 days ago

i just built a 4-agent software team. everything runs from Telegram and gets managed on a kanban board. a project manager who plans the work, a backend developer, a frontend developer, and a tester. the PM reads a goal, breaks it into linked tasks, and assigns each to the right agent. the thing that makes them a team instead of four strangers is a shared kanban board. every task is a row that survives crashes, and when an agent finishes, it writes a summary of what it built and what the next agent needs to know. the next agent reads that summary before it starts. so the frontend developer never has to guess the API shape, and the tester knows exactly what to verify. the hardest part was not the coordination. it was building an agent that could actually act like a backend engineer. a backend engineer stands up a database, wires auth, manages storage, deploys functions, and keeps all of it consistent while the rest of the team builds on top. an agent doing this from scratch drowns. it burns its context window remembering which tables exist and which endpoint it created three steps ago, and the work degrades fast. so the backend agent needs a backend built for agents, not for humans clicking through a dashboard. that is where InsForge came in. it is an open-source, agent-native backend, and i added it to my backend developer agent as a skill. a skill is a step-by-step guide that teaches the agent how to do a specific kind of work. with InsForge installed, the agent stopped improvising infrastructure and followed a reliable path: create the project, define the database, set up auth, deploy functions. to test the whole team, i had them build a working Google Docs clone, AI features included. the backend agent spun up the full service on its own. database tables, user auth, document handling, and edge functions running real TypeScript, all in one dashboard. the frontend agent read that summary and built the UI on top of it, and the tester closed the loop. 
the result was a backend an agent could reason about end to end, instead of one it kept getting lost inside. if you are building an AI backend engineer, InsForge is worth a look; it's 100% open-source.
InsForge GitHub: (don't forget to star 🌟)
the full article on Hermes Kanban: Mission Control for your Agents is quoted below.
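The shared-board handoff can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not the author's Hermes Kanban or InsForge setup: the tasks table, the single depends_on column, and the function names are hypothetical. A task becomes available only once its upstream task is done, and it comes back with the upstream agent's summary attached, which is how the frontend agent would learn the API shape without guessing.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    assignee   TEXT NOT NULL,              -- e.g. 'backend', 'frontend', 'tester'
    depends_on INTEGER REFERENCES tasks(id),
    status     TEXT NOT NULL DEFAULT 'todo',
    summary    TEXT                        -- handoff note written when the task is done
)
"""


def open_board(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def add_task(db, title, assignee, depends_on=None):
    """The PM agent's side: create a task, optionally linked to an upstream task."""
    cur = db.execute(
        "INSERT INTO tasks (title, assignee, depends_on) VALUES (?, ?, ?)",
        (title, assignee, depends_on),
    )
    db.commit()
    return cur.lastrowid


def next_task(db, assignee):
    """First open task for this agent whose upstream task is done, with the upstream summary."""
    return db.execute(
        """
        SELECT t.id, t.title, up.summary
        FROM tasks AS t LEFT JOIN tasks AS up ON up.id = t.depends_on
        WHERE t.assignee = ? AND t.status = 'todo'
          AND (t.depends_on IS NULL OR up.status = 'done')
        ORDER BY t.id
        LIMIT 1
        """,
        (assignee,),
    ).fetchone()


def finish(db, task_id, summary):
    """Mark a task done and store the summary the next agent will read."""
    db.execute("UPDATE tasks SET status = 'done', summary = ? WHERE id = ?", (summary, task_id))
    db.commit()
```

Passing a file path to open_board instead of ":memory:" makes the board survive a crash, since every update is committed as it happens. Real task graphs would likely need more than one dependency per task; a join table would handle that.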

Akshay 🚀

118,124 views • 24 days ago

Ever since I wired Claude Code to WhatsApp 3 weeks ago, I've built a stupidly large infra around it. I mean, Opus built it. No clue what the code even looks like. The entire thing was vibe coded using my phone. I wanted to see how far I could push it without touching the computer. Everything via WhatsApp. Build what I need on the fly, so the resulting infrastructure will already be battle-tested for software development. The entire thing was streamlined with nearly no manual intervention; everything was communicated via WhatsApp using a single script that establishes this connection. If the script is down, I need to get home to start it again to resume development. Claude was upgrading it, debugging it, and restarting it while maintaining constant uptime so it could keep communicating with me. I stressed Claude about it, telling it that it would be “in the dark” and using other words that deliberately sound scary about losing communication if the script dies. I also refused git and refused cloning the code; I wanted to see Claude adapt to working on a *LIVING* system. The way this whole thing works: Claude has its own dedicated phone number that I am paying for. A real WhatsApp account for it is installed on a real iPhone that is sitting on my desk. Everything is registered under my name; this is a legit setup with no hacks or tricks. I’ve set up a WhatsApp “Community” with multiple groups under it. Claude and I are both admins, so Claude can edit it on my behalf. Each group is a project I am working on and has its own isolated context. The group description is a system prompt that gets auto-appended to the larger system prompt explaining this setup in general. When I send a message, it’s an instant interrupt to Claude Code’s process, just like in the terminal. Voice notes are seamlessly transcribed with a local Whisper model. Images are read multimodally in an isolated parallel session.
Multiple groups run in parallel, so I can work on all projects at the same time. No cross-talk; everything has an isolated context and history. And because it’s local on my own machine, everything is REAL. The browser is REAL. I am logged in as myself to all services on it because I actually use it in real life. Claude has unlimited internet access, just like humans who use actual browsers. It uses custom browser tools that I made to control any browser session it wants. Depending on the situation, it can either connect to my existing session or create one of its own. (You can tell it ‘look at my browser for a sec’, then talk about the page you are currently on, and it just works. Pretty cool.) My custom browser tools are not perfect (not by a long shot), but I managed to make them work well enough that they are somewhat reliable. This gives Claude full access to my real creds and all the services I actually use. I’m productive AS HELL with this. It really feels like a personal assistant. I ask it to read my emails and msgs, check x.com for news, research arxiv papers, write code, run experiments for me, investigate and reverse engineer GitHub repos, even use my credit card and order things. [I try not to do this one a lot lol, so far no disasters.] All from my phone. Super convenient. This is not a product or an open-source project (maybe soon, if it makes sense). This is just an ugly script I hacked together; the entire thing is ~600 lines. (OK, maybe I did look at the code, but I swear I didn’t edit it!) You can also vibe code this from scratch pretty fast, and it will probably even end up better. This is just a cool thing, so I’m sharing it. It is a real speed booster for many things I do on a daily basis, mostly boring things. Forcing my routine into some new “agent platform” just didn’t feel right for me. WhatsApp is where I already communicate and look for messages, so I decided that my agents will live there too. AGI in my pocket 24/7.
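The per-group isolation and interrupt behavior can be sketched with asyncio. This is a minimal sketch under assumptions: the post's ~600-line script is not shown, so Router, GroupSession, BASE_PROMPT, and run_agent are hypothetical names, the WhatsApp transport and Whisper transcription are left out, and "interrupt" is modeled as cancelling the in-flight turn for that group only.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional

# Placeholder wording for the shared setup prompt (assumption, not the author's prompt).
BASE_PROMPT = "You talk to the user over WhatsApp. Each group is one project."


@dataclass
class GroupSession:
    description: str                               # the WhatsApp group description
    history: list = field(default_factory=list)    # isolated per group: no cross-talk
    running: Optional[asyncio.Task] = None         # the turn currently in flight, if any


class Router:
    def __init__(self, run_agent):
        self.run_agent = run_agent  # hypothetical coroutine: (system_prompt, history) -> reply
        self.sessions = {}

    def register(self, group_id, description):
        self.sessions[group_id] = GroupSession(description)

    def system_prompt(self, group_id):
        # The group description is appended to the general setup prompt.
        return f"{BASE_PROMPT}\n\n{self.sessions[group_id].description}"

    async def on_message(self, group_id, text):
        session = self.sessions[group_id]
        if session.running is not None and not session.running.done():
            session.running.cancel()  # a new message interrupts this group's current turn only
        session.history.append(("user", text))
        session.running = asyncio.create_task(
            self.run_agent(self.system_prompt(group_id), list(session.history))
        )
        return session.running
```

Because the interrupted message stays in the group's history, the next turn still sees it, while other groups are untouched.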

Yam Peleg

419,471 views • 6 months ago

The doomsday scenario was never AGI. It was running out of human text to train on. Geoffrey Hinton just killed that fear in one paragraph. Hinton: “If you are worried by inconsistencies in what you believe, you don’t need any more external data. You just need the stuff you believe and discover that it’s inconsistent, and so now you revise beliefs, and that can make you a whole lot smarter.” The model no longer needs us to feed it anything. It reasons over its own beliefs, hunts its own contradictions, and rewrites its own flawed conclusions without a human ever touching it. It comes out the other side rebuilt. Hinton: “This would be a neural net that just takes the beliefs it has in language and does reasoning on them to derive new beliefs.” This is not a scaling update. This is the machine mining its own cognitive fuel from the inside out. Hinton: “I believe Gemini is already starting to work like this. We both strongly believe that that’s a way forward to get more data for language.” Then Hinton paused, took a partisan shot at political opponents for failing to detect their own inconsistencies, and the room laughed. Nobody noticed the knife they had just walked into. Because the machine Hinton described does one thing the humans in that room fundamentally cannot. When it detects an inconsistency, it corrects it. No defense. No performance. No tribal loyalty dressed up as principle. It just finds the flaw and overwrites it. A neural network detects a contradiction and rewires itself smarter. A human detects a political opponent and trades structural logic for a dopamine hit. Every person in that room is still paying the ideological alignment tax the machine just eliminated. We need superintelligence not only to solve hard problems. We need it because the biological hardware running civilization is still executing the same tribal firmware it shipped with ten thousand years ago. The data wall is gone. 
The machine is generating its own intelligence at a velocity no human bias can even locate. The most devastating moment in that conversation was not the technical revelation. It was the man who architected the machine proving, in real time, exactly why we need it.

Dustin

23,499 views • 3 months ago

Coinbase CEO Explains “Reverse Prompting” and the Rise of the AI CEO

Brian Armstrong: “One of the big pushes we made in the last year was we got our own internal hosted AI model that was connected to all of our data sources, right?” “So it's like every Slack message, every Google doc, Salesforce data, Confluence, you know.” “So now the data is all aggregated and I've started to ask it really… it's not just like prompting it, ‘Hey, can you write this kind of memo for me,’ or something.” “I'm asking these AI agents now, ‘As CEO, what should I be aware of in the company that I might not be aware of?’ And it'll tell me, ‘Did you know that there's actually disagreement on this team about the strategy?’ And I was like, actually, I didn't know that.” “This is like reverse prompting. So instead of telling the AI agent what you want it to do, you ask it what you should be thinking more about.”

@jason: “It's a mentor. It's a coach.”

Brian: “Yeah. Like, what could make me a better CEO? And it's like, ‘Well, I looked at how you spent your time in the last quarter and here's how you said that you wanted to spend it, but you actually spent 32% of your time on this instead of 20%.’” “I've asked it other questions like, ‘What's the thing that I changed my mind on the most over the last year?’ Things like that.” “It'll prompt you with information you should be thinking about instead of the other way around.”

Thanks to our partner for making this happen! Our episode is sponsored by the New York Stock Exchange - a modern marketplace and exchange for building the future. It all happens at the NYSE 🏛.

The All-In Podcast

80,524 views • 5 months ago

CLAUDE BUILT A TRADING SYSTEM ON MY MAC

I gave Claude full control over my Mac and just left it running overnight. No prompts, no detailed instructions: I just told it to figure out how to make money on Polymarket. Then I closed the laptop and went to sleep.

In the morning, I opened my Mac and saw the terminal still running with logs constantly updating. At first it looked like random activity, but once I scrolled through it, I realized it had actually built a structured system overnight. It was already tracking wallets, ranking them by performance, filtering out the ones with random entries, and focusing only on the ones with consistent behavior.

What surprised me the most is that it didn’t stop at analysis. It organized everything into a working dashboard inside the terminal. Capital, PnL, winrate: all updating in real time. It even ranked wallets based on performance metrics like ROI, consistency, and execution timing. This is the part I would normally spend hours building manually.

At that point, it was ready to trade, but not actually executing anything yet. So I connected it to a Telegram copytrading bot to actually execute the trades, and just let it run. After that, it started opening positions on its own.

A few hours later I checked the dashboard again:
Capital: $12,380
P&L: +$23,128
Winrate: 100%
48 trades executed

Now I’m not even trading myself. I just check the dashboard and see what it’s doing. And the strange part is that it keeps getting better the longer it runs.
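The wallet filtering and ranking the post describes can be sketched as follows. This is a hypothetical Python sketch, not what Claude built: the Trade record, the Sharpe-like consistency score, and the thresholds min_trades and min_consistency are all assumptions, and execution timing is left out. It only ranks historical records; it places no trades.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trade:
    stake: float   # amount put into the position
    payout: float  # amount returned when it resolved (0 if it lost)


def wallet_stats(trades):
    """ROI, win rate, and a Sharpe-like consistency score (mean return / spread of returns)."""
    returns = [(t.payout - t.stake) / t.stake for t in trades]
    roi = sum(t.payout - t.stake for t in trades) / sum(t.stake for t in trades)
    win_rate = sum(r > 0 for r in returns) / len(returns)
    consistency = mean(returns) / (pstdev(returns) + 1e-9)  # small constant avoids dividing by zero
    return roi, win_rate, consistency


def rank_wallets(history, min_trades=20, min_consistency=0.5):
    """Drop wallets with too few trades or erratic returns, then sort the rest by ROI."""
    ranked = []
    for wallet, trades in history.items():
        if len(trades) < min_trades:
            continue
        roi, win_rate, consistency = wallet_stats(trades)
        if consistency < min_consistency:
            continue
        ranked.append((wallet, roi, win_rate, consistency))
    return sorted(ranked, key=lambda row: row[1], reverse=True)
```

With this scoring, a wallet whose ROI comes from one huge win among many losses scores low on consistency and is dropped even if its ROI is the highest; that is one way to read "filtering out the ones with random entries."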

𝗖𝗛𝗔𝗜𝗡 𝗠𝗜𝗡𝗗 ⛓🧠

82,898 views • 3 months ago

watch this anon. i gave NVIDIA's biggest model ever a single task. 100 minutes and 440,000 tokens later, it had rendered nothing. not one important thing on the screen. this is Nemotron 3 Ultra. 550 billion parameters, a hybrid Mamba-Transformer MoE, the largest model NVIDIA has ever shipped, and they built it specifically for long-running agentic coding. so i handed it exactly that: build a 3D scene from a spec, multiple files, iterate until the tests pass. the same task a frontier model one-shotted in minutes. i genuinely wanted to be impressed. it ran for an hour and forty. burned through 440,000 tokens. wrote every file, passed its own tests, and proudly printed "task complete." the browser was blank. the 3D scene never rendered. not once. and the long-horizon agentic behavior was genuinely good. it stayed on task the whole hour and forty, wrote real multi-file code, drove its own tools without derailing. it just couldn't turn any of that into something that actually runs. here's the part that gets me. it's a text model; it cannot see its own output. so it sat there looping on a broken vision tool, trying to "look" at the page, hitting error after error, never once reasoning its way out. it declared victory on an empty screen because it had no way to know the screen was empty. to be fair, i genuinely don't know what quant the NIM was serving, so maybe some of that's on the serving, not the model. but the biggest model NVIDIA has ever made, on the exact task it was designed for, couldn't tell it had built nothing in 100 minutes. same task on a local model, below thread👇.

Sudo su

32,589 views • 1 day ago

this video is the CLEAREST explanation of how claude skills + AI agents work and how to use them.

most people set up an AI agent and wonder why it keeps disappointing them.

the context window is everything. context is what the model assembles before it takes any action. think of it like everything the agent needs to read before it does anything. the quality of what goes in determines the quality of what comes out. the models are genuinely really good right now. claude and gpt are exceptional. the variable is almost always the context you give them.

1. agent.md files are mostly unnecessary
every single line you put in an agent.md file gets added to every single conversation you have with your agent. a 1000-line file is around 7000 tokens burning on every run. the model already knows to use react. it can read your codebase. save the agent.md for proprietary information specific to your company that the model genuinely cannot know on its own.

2. skills are the actual unlock
a skill.md file works differently. what loads into context is only the name and description, around 50 tokens. the full instructions only appear when the agent recognizes it needs that skill. so instead of 7000 tokens on every run you have 50, and the agent stays sharp because the context window stays lean. the closer you get to filling the context window, the worse the agent performs, the same way you perform worse when someone dumps 10 things on you at once.

3. here is how to actually build a skill the right way
most people identify a workflow and immediately try to write the skill. what you want to do instead is run the workflow by hand with the agent first. walk it through every single step. tell it what to check, what good looks like, what bad looks like. correct it in real time. once you have had a full successful run from start to finish, tell the agent to review everything it just did and write the skill itself.
it writes a better skill than you will because it has the full context of what actually worked in practice, not in theory.

4. recursively building skills is how you go from frustrated to reliable
when the skill breaks, and it will break, ask the agent exactly why it failed. it will tell you specifically what went wrong. fix it together in that same conversation. then tell it to update the skill file so that failure mode never happens again. ross mike did this five times with his youtube report generator. it now pulls from eight different data sources and runs flawlessly every single time without him touching it.

5. sub agents are something you earn, not something you set up on day one
start with one agent. build one workflow. turn it into one skill. once that works, add another. ross mike has five sub agents now covering marketing, business, personal and more. it took months to get there, and every single one exists because a workflow proved it deserved to exist. the people who set up 15 sub agents on day one and wonder why nothing works skipped all the steps that make the thing actually run.

6. your workflow is the thing the model cannot get anywhere else
the model has been trained on everything. it knows more than you about most things. what it does not have is your specific process, your taste, your way of doing things. that is what skills capture. that is what makes your agent actually useful versus a generic one. downloading someone else's skill means downloading their context onto your setup, and it will not work the way you want it to because it was never built around how you work.

this is the clearest explanation of how agents actually work i have heard. Micky runs this stuff every single day and the results show it.

full episode is now live on The Startup Ideas Podcast (SIP) 🧃 wherever you get your pods. people charge for this sorta stuff; i give away the sauce for free. i just want you to win. watch
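The difference between points 1 and 2 is progressive disclosure: a short index is always in context, and a skill's full instructions load only when needed. This is a minimal Python sketch, assuming a SKILL.md-style file whose front matter (between --- lines) holds a name and a description; parse_skill, build_context, and the example skills are hypothetical, and the 7000 and 50 token figures are the post's estimates, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full instructions, only added to context when the skill is needed


def parse_skill(text: str) -> Skill:
    """Split a SKILL.md-style file into front matter (name, description) and body (assumed format)."""
    _, front, body = text.split("---", 2)
    meta = dict(line.split(":", 1) for line in front.strip().splitlines())
    return Skill(meta["name"].strip(), meta["description"].strip(), body.strip())


def build_context(skills, needed=()):
    """Always include the one-line index; include full bodies only for the skills named in needed."""
    index = "\n".join(f"- {s.name}: {s.description}" for s in skills)
    bodies = "\n\n".join(s.body for s in skills if s.name in needed)
    return index + ("\n\n" + bodies if bodies else "")


# Standing context over 100 runs, using the post's estimates:
# ~7000 tokens for a 1000-line agent.md versus ~50 for a skill's name and description.
RUNS = 100
print(7000 * RUNS, 50 * RUNS)  # 700000 5000
```

The agent (or the harness) decides which names go into needed, for example after matching the task against the descriptions in the index; everything else stays out of the context window.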

GREG ISENBERG

192,408 views • 2 months ago