This Chinese developer runs 9 agents on Claude Code under a GPT-5.5 orchestrator and they close 500 client tasks a month without a single assistant. His client work is closed without him, on a single laptop and only three subscriptions. The entire system lives on one MacBook Pro M4...

29,917 views • 1 month ago • via X (Twitter)

Related videos

This Chinese guy created agents in Claude Code for MCP servers and single-handedly serves 6 marketing agencies a month from one iPhone, earning $5,000 from each.

Inside he runs a pipeline of 7 agents on Claude Sonnet 4.6 that every Monday pulls a scan of the tech stack from a selected agency, develops an MCP server for its ad accounts, and over the course of a week brings it to production code ready to connect to Claude Desktop. No DevOps, no senior developer, no project manager. Just a Mac Mini in a work corner, an iPhone in the pocket, and a single API key. And traditional dev shops keep 5 people on project rates for the same contract, while his entire P&L is tokens, dirt-cheap hosting on Cloudflare, and Calendly.

7 agents run under a shared orchestrator-router and burn about 5 million tokens a day, which in the API bill comes out to $540 a month. The Mac Mini itself sits at home and keeps the entire orchestrator running 24/7, and from the iPhone the owner connects to it through a secure remote terminal and sees the output of any session right on the smartphone screen, wherever he happens to be.

His starting system prompt looks like this:

"you run a solo shop for custom MCP servers for marketing agencies. you hand out read-only tasks to 6 sub-agents and own all commits and shipping yourself.
sub-agents:
// Hunter (finds marketing agencies of 15 to 60 people that have no MCP access to Google Ads, Meta Ads, TikTok Ads, and HubSpot)
// Mapper (pulls their tech stack, identifies 3 to 5 integration pains, and simultaneously writes the technical spec for the server: which tools, resources, and prompts to export through MCP, which auth flow and rate limit)
// Coder (generates an MCP server in Python through the MCP SDK, deploys 8 to 15 tools for ad accounts and CRM)
// Validator (connects the server to Claude Desktop, runs real client API keys in a sandbox, and checks for compliance with the MCP spec)
// Shipper (writes a README, integration guide, deployment manual, packages the server, and hosts it on Cloudflare Workers or pushes to the GitHub of the client)
// Mobile (always online on the iPhone, books demo calls in Calendly, picks up hot fixes, and confirms contracts through a secure remote terminal to the Mac Mini).
only 1 owner agent works on 1 contract, no overlaps. you pull the owner out of observation mode only when a deal goes above $7,500 or the test coverage of the server drops below 85%."

This prompt gives the system an understanding of its role and the limits of intervention from the very first line. It knows it is supposed to find agencies on its own. It knows it is supposed to bring every MCP server to production on its own. It knows it connects the live owner only on large deals or when the tests do not converge.

→ The pipeline runs without breaks, day or night
→ Hunter goes through about 130 marketing agencies on LinkedIn and Clutch per day
→ Mapper rolls out 4 audit reports with the tech stack and a final spec for each
→ Coder writes 1 to 2 MCP servers per week in Python with 8 to 15 tools
→ Validator validates every server through Claude Desktop with real client API keys
→ Shipper rolls out the full documentation package and pushes the finished product to Cloudflare Workers or the GitHub of the client

And only when a contract breaks $7,500 or test coverage drops below 85% does the orchestrator pull the owner from whatever he is doing. And when the owner at that moment is behind the wheel or at a meeting in a coworking space, the Mobile agent in his iPhone picks up 1 contract in progress: confirms a meeting with the agency CMO in Calendly, opens a live demo of the MCP server through a secure terminal to the Mac Mini, and writes the test result to the shared state. The owner just swipes "approve" and in 15 minutes joins the Zoom demo.

The fresh system log from last Wednesday looks like this:

"hunter report: 132 agencies checked on LinkedIn and Clutch, 19 without MCP integrations, 8 with active requests for AI tooling in job posts, 4 with an open Q4 budget. passing to mapper."
"coder: MCP server for Northwave Performance Marketing built in Python, 11 tools for Google Ads, Meta Ads, and GA4, 320 lines of code. exported to /Users/dev/mcp-shop/clients/northwave/server.py. validator connecting to Claude Desktop."
"validator: 11 tools passed validation through Claude Desktop, test coverage 92%, average latency 380 ms. passing to shipper."
"eval flag: contract with Pacific Reach Agency at $8,200 exceeds the approved limit of $7,500. sending for manual review."

In his work setup there is no cloud server, no external team, and not even a separate office. At home sits a Mac Mini with a sandbox at /Users/dev/mcp-shop, on top runs an MCP router with a single API key to Claude, and the same key is forwarded to a secure terminal on the iPhone.

Out of everything I have seen this year, this is the cleanest solo shop for custom MCP servers for marketing agencies: $540 a month on the API, about $30,000 into the account, and between them 7 system prompts, 1 Mac Mini in a work corner, and 1 iPhone that never leaves the pocket.
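
The escalation rule in the quoted system prompt (pull the owner in only when a deal goes above $7,500 or test coverage drops below 85%) can be sketched as a small predicate. This is a minimal illustration under assumptions, not the author's code: the post does not show how the check is implemented, and `Contract`, `needs_owner_review`, `DEAL_LIMIT_USD`, and `MIN_COVERAGE` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds come from the quoted system prompt; the names are hypothetical.
DEAL_LIMIT_USD = 7_500
MIN_COVERAGE = 0.85


@dataclass
class Contract:
    client: str
    deal_usd: float
    test_coverage: float  # fraction between 0 and 1


def needs_owner_review(contract: Contract) -> list[str]:
    """Return the reasons (possibly none) for escalating to the human owner."""
    reasons = []
    if contract.deal_usd > DEAL_LIMIT_USD:
        reasons.append(
            f"deal ${contract.deal_usd:,.0f} exceeds limit ${DEAL_LIMIT_USD:,}"
        )
    if contract.test_coverage < MIN_COVERAGE:
        reasons.append(
            f"coverage {contract.test_coverage:.0%} below {MIN_COVERAGE:.0%}"
        )
    return reasons
```

Under this sketch, the $8,200 contract from the log is flagged on the deal size alone, whatever its coverage, while a $5,000 contract at 92% coverage returns an empty list and never reaches the owner.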

Blaze

55,926 views • 1 month ago

This Chinese developer launched 6 agents under 1 orchestrator, and they run his UI design agency at $32,000 a month on their own.

He built a system of 6 agents on Claude Sonnet 4.6 that single-handedly runs his agency for UI auditing and redesign for SaaS startups and e-commerce. No contractors, no project manager, and no team. Just him, a MacBook, and 1 API key. Traditional design agencies out of Shenzhen keep teams of 8 people on salaries for the same volume, while he keeps only API tokens.

6 agents work through a single orchestrator on Claude Code Router. Usage is about 4 million tokens a day, the average API bill is just $480 a month. All 6 go through MCP servers and write shared state to the file system, without shared state in memory and without race conditions.

And here is the system prompt he gave the orchestrator before launch:

"you are the orchestrator of a one-man UI agency. you delegate read-only research tasks to 5 sub-agents and own all writes.
sub-agents:
// Hunter (finds SaaS and e-commerce sites with outdated UI)
// Auditor (runs each site through Lighthouse, accessibility, and design system checks)
// Pitcher (writes cold outreach and redesign proposals with before/after screenshots)
// Splitter (breaks accepted projects into typed milestones)
// Designer (generates Figma mockups and Tailwind components)
// Checker (runs evals on every artifact before it leaves the harness).
you never let 2 sub-agents touch 1 file. you stop and request human approval only when an invoice exceeds $5,000 or when the design system eval score drops below 0.88."

Meaning the system knows exactly what it is and within what boundaries it operates. It knows it is supposed to find clients on its own. It knows it is supposed to write proposals with screenshots and mockups without intervention. It knows the human only plugs in when the amounts go above $5,000 or when the design system eval does not converge.

→ The system runs 24 hours a day
→ Hunter finds about 200 sites with outdated UI a day
→ Auditor runs each one through Lighthouse and WCAG
→ Pitcher prepares about 28 personalized proposals with before/after screenshots
→ Splitter breaks 3 accepted projects per week into milestones
→ Designer generates mockups and components, Checker runs evals on every artifact

And only when the invoice breaks $5,000 or the eval drops below 0.88 does the orchestrator wake the human.

Here is what the system outputs in his log during 1 of the sessions:

"hunter report, tuesday: 213 sites found, 31 with last redesign before 2020, 14 with Lighthouse score below 65, 6 with active redesign RFP. passing top 6 to auditor."
"pitcher: 27 cold outreach sent with before/after screenshots, 5 replies, 3 discovery calls scheduled. passing to splitter."
"designer: milestone 2 of Lotus Tea Co redesign complete. Figma frames exported to /Users/dev/agency/clients/lotus/v2. checker running design system evals."
"eval flag: proposal for $6,800 exceeds the approved limit of $5,000. sending for manual review."

He has no remote server. No separate backend. Just a local file sandbox in /Users/dev/agency, an MCP router, and an API key to Claude.

Out of everything I have seen this year, this is the cleanest one-person UI design agency: $480 in, about $32,000 out, and between them 6 prompts and 1 file system.
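
The post says the agents share state through the file system and that no 2 sub-agents ever touch 1 file, but it does not say how that is enforced. One common way to get that guarantee is an exclusive lock file, sketched below as an assumption rather than as the author's mechanism. `try_claim` and `release` are hypothetical helper names. The sketch relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists.

```python
import os
from pathlib import Path


def _lock_path(target: Path) -> Path:
    """Lock file that sits next to the artifact it protects."""
    return target.with_name(target.name + ".lock")


def try_claim(target: Path, agent: str) -> bool:
    """Atomically create the lock file; only the first caller succeeds."""
    try:
        # O_CREAT | O_EXCL raises FileExistsError if the lock already exists.
        fd = os.open(_lock_path(target), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock:
        lock.write(agent)  # record the owner for debugging
    return True


def release(target: Path) -> None:
    """Remove the lock so another agent may claim the file."""
    _lock_path(target).unlink()
```

In this sketch an agent calls `try_claim` before writing an artifact and `release` afterwards. A second agent that gets `False` back skips the file or retries later instead of writing concurrently.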

Blaze

56,062 views • 1 month ago

This Chinese guy created agents in Claude Code for landing pages and single-handedly serves 47 small businesses a month, taking $400 from each.

He built a system of 7 agents on Claude Sonnet 4.6 that analyzes Google Maps in small towns, finds small businesses without websites there, and over 1 weekend takes each one to a finished mockup with video and cold message. No assistant, no sales team, no SDR. Just him, a MacBook, an iPhone, and 1 API key. And traditional web design agencies keep teams of 8 people on salary for the same order flow, while his expenses are only tokens and subscriptions to Lovable, Higgsfield, and Calendly.

7 agents work through 1 orchestrator on Claude Code Router. Usage is about 3 million tokens a day, the average API bill is about $480 a month. All 7 go through MCP servers and write shared state to the file system, without shared state in memory and without race conditions, and 1 of them lives right in the iPhone and picks up positive replies from the subway, a taxi, or on walks.

And here is the system prompt he put into the orchestrator before launch:

"You are the orchestrator of a solo agency that sells ready-made websites to local businesses. You delegate read-only tasks to 6 sub-agents and own all writes.
sub-agents:
// Scout (walks through Google Maps in selected cities, looks for narrow niches: 5+ years on the map, fewer than 50 reviews, no website or a website from 2014, but high ratings)
// Diagnoser (for each lead writes a 50-word diagnosis, hero angle, tone matched to the industry, and a cold message under 70 words)
// Builder (generates a landing page mockup in Lovable through MCP only for the top 5 leads per day, with the sharpest diagnoses and the biggest gap)
// Filmer (pulls 5 screenshots of the mockup and through Higgsfield renders a 10-second vertical video 1080x1920 with a soft zoom)
// Pitcher (sends a personalized cold message through the right channel for the niche: email to roofers, SMS to tradesmen, IG DM to salons, LinkedIn to realtors)
// Checker (runs every message through evals for personalization, absence of AI markers and buzzwords before sending)
// Mobile (lives in the iPhone, handles positive replies in real time, books Zoom calls in Calendly through MCP while the owner is on the go).
You never let 2 sub-agents touch 1 lead. You stop and request approval from the human only when a deal exceeds $3,000 or the reply rate in a niche for the day drops below 12%."

Meaning the system knows what it is and within what boundaries it is allowed to act. It knows it is supposed to find leads on its own. It knows it is supposed to take each one to a mockup, video, and cold message without intervention. It knows the human only steps in when a deal goes above $3,000 or the reply rate stops converging.

→ The system runs 24 hours a day
→ Scout goes through about 220 local businesses on Google Maps per day and leaves 30 new leads in the queue
→ Diagnoser outputs 30 structured diagnoses + briefs + cold messages per day
→ Builder assembles 3 to 5 finished landing pages in Lovable for the sharpest leads
→ Filmer renders a 10-second vertical video in Higgsfield for each one
→ Pitcher sends 30 personalized messages per day across 4 channels with a reply rate of about 14%
→ Checker runs every message through evals before sending

And only when a deal breaks $3,000 or the reply rate for the day drops below 12% does the orchestrator wake the owner. And when the owner at that moment is sitting in the subway or a taxi, the Mobile agent in his iPhone picks up 1 move on its own: replies to a fresh positive reply from a dentist, books a Zoom through Calendly synced to the local time of the client, and puts the lead back in the queue. The owner only has to tap "approve" and in just 10 minutes join the call.

Here is what the system writes in his log during 1 of the Saturdays:

"scout report: 218 businesses checked in Austin, Denver, and Miami, 34 without a website, 19 with a website from 2014, 6 with an active redesign request in reviews. passing top 30 to diagnoser."
"pitcher: 30 cold messages sent across 4 channels, 14 replies, 5 positive, 3 Zoom calls booked for Sunday. passing to closer."
"builder: landing page for Westside Cosmetic Dentistry built in Lovable, 5 sections, mobile, soft beige. URL placed at /Users/dev/maps-agency/clients/westside/v1. filmer launching Higgsfield."
"eval flag: deal with The Lotus Salon at $3,400 exceeds the approved limit of $3,000. sending for manual review."

He has no server of his own and no separate backend. Just a local file sandbox at /Users/dev/maps-agency, an MCP router, 1 API key to Claude, and the same key forwarded to Claude Code on his iPhone.

Out of everything I have seen this year, this is the cleanest one-person agency for selling websites to small businesses: $480 a month on the API, about $18,800 into the account, and between them 7 prompts, 1 file system, and 1 phone in the pocket.
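
The token and billing figures quoted in this post and the two above it can be turned into an implied blended price per million tokens. The sketch below is only arithmetic on the numbers as stated. It assumes a 30-day month and does not separate input from output tokens, which the posts do not break down. The function name is hypothetical.

```python
def implied_usd_per_million_tokens(tokens_per_day: float,
                                   monthly_bill_usd: float,
                                   days_per_month: int = 30) -> float:
    """Blended price implied by a daily token volume and a monthly bill."""
    monthly_tokens_millions = tokens_per_day * days_per_month / 1_000_000
    return monthly_bill_usd / monthly_tokens_millions


# Figures as quoted in the three posts (tokens per day, API bill per month).
quoted = {
    "MCP server shop": (5_000_000, 540),
    "UI design agency": (4_000_000, 480),
    "landing page agency": (3_000_000, 480),
}

for name, (tokens, bill) in quoted.items():
    price = implied_usd_per_million_tokens(tokens, bill)
    print(f"{name}: ${price:.2f} per million tokens")
```

On these inputs the implied rates are $3.60, $4.00, and about $5.33 per million tokens, so the three posts do not imply a single consistent price.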

Blaze

2,697,192 views • 1 month ago

AI AGENTS 101 (58 minute free masterclass)

send this to anyone who wants to understand ai agents, claude skills, md files, how to get the most out of AI etc in plain english:

1. chat vs agents. chat models answer questions in a back and forth while agents take a goal, figure out the steps, and deliver a result
2. agents don’t stop after one response. they keep running until the task is actually finished. no babysitting required
3. everything runs on a loop. they gather context, decide what to do, take an action, then repeat until done
4. the loop is the system. they look at files, tools, and the internet. decide the next step. execute and then feed that back into the next step. over and over until completion
5. the model is just one piece. gpt, claude, gemini are the reasoning layer. the key is model + loop + tools + context
6. mcp is how agents use tools. it connects things like browser, code, apis, and your internal software. once connected, the agent decides when to use them to get the job done
7. context beats prompt all day. you don't need to write perfect prompts. load your agent with context about your business, style, and goals and then simple instructions work
8. claude.md or agents.md is the onboarding doc. it tells the agent who it is, how to behave, what it knows, and what tools it can use. this gets loaded every time before it starts
9. memory.md is how it improves. agents don’t remember by default. this file stores preferences, corrections, and patterns. you tell the agent to update it, and it gets better over time
10. skills + harnesses make it usable. skills are reusable tasks like writing, research, analysis. the harness is the environment like claude code or openclaw that runs everything. basically, different interfaces, same system underneath

this episode with remy on The Startup Ideas Podcast (SIP) 🧃 was one of the clearest ways of understanding a lot of the core concepts of ai agents

could be the best beginners course for ai agents

58 mins. all free. no advertisers.

i just want to see you build cool stuff. im rooting for you.

send to a friend

watch
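
The loop described in points 3 and 4 (gather context, decide, act, feed the result back, repeat until done) can be sketched in a few lines. This is a toy illustration, not how any particular harness is implemented: `scripted_decide` stands in for the model, `fake_read_file` stands in for a real tool, and every name here is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
Decision = Tuple[str, str]  # (tool name or "done", argument)


def run_agent(goal: str,
              decide: Callable[[List[str]], Decision],
              tools: Dict[str, Tool],
              max_steps: int = 10) -> List[str]:
    """Loop: look at the context, pick a step, act, record the result."""
    context = [f"goal: {goal}"]
    for _ in range(max_steps):
        tool_name, argument = decide(context)      # decide the next step
        if tool_name == "done":
            return context
        observation = tools[tool_name](argument)   # take an action
        # Feed the result back so the next decision can see it.
        context.append(f"{tool_name}({argument}) -> {observation}")
    raise RuntimeError("step budget exhausted before the task finished")


def scripted_decide(context: List[str]) -> Decision:
    """Stand-in for the model: read one file, then declare the task done."""
    if len(context) == 1:
        return ("read_file", "notes.txt")
    return ("done", "")


def fake_read_file(path: str) -> str:
    """Hypothetical tool; a real harness might expose tools through MCP."""
    return f"<contents of {path}>"


history = run_agent("summarize notes", scripted_decide,
                    {"read_file": fake_read_file})
```

The `max_steps` cap is a design choice of this sketch: a loop that runs "until done" still needs a budget, so that a model which never says "done" fails loudly instead of running forever.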

GREG ISENBERG

374,915 views • 3 months ago

i just built a 4-agent software team. everything runs from Telegram and gets managed on a kanban board. a project manager who plans the work, a backend developer, a frontend developer, and a tester. the PM reads a goal, breaks it into linked tasks, and assigns each to the right agent. the thing that makes them a team instead of four strangers is a shared kanban board. every task is a row that survives crashes, and when an agent finishes, it writes a summary of what it built and what the next agent needs to know. the next agent reads that summary before it starts. so the frontend developer never has to guess the API shape, and the tester knows exactly what to verify. the hardest part was not the coordination. it was building an agent that could actually act like a backend engineer. a backend engineer stands up a database, wires auth, manages storage, deploys functions, and keeps all of it consistent while the rest of the team builds on top. an agent doing this from scratch drowns. it burns its context window remembering which tables exist and which endpoint it created three steps ago, and the work degrades fast. so the backend agent needs a backend built for agents, not for humans clicking through a dashboard. that is where InsForge came in. it is an open-source, agent-native backend, and i added it to my backend developer agent as a skill. a skill is a step-by-step guide that teaches the agent how to do a specific kind of work. with InsForge installed, the agent stopped improvising infrastructure and followed a reliable path: create the project, define the database, set up auth, deploy functions. to test the whole team, i had them build a working Google Docs clone, AI features included. the backend agent spun up the full service on its own. database tables, user auth, document handling, and edge functions running real TypeScript, all in one dashboard. the frontend agent read that summary and built the UI on top of it, and the tester closed the loop. 
the result was a backend an agent could reason about end to end, instead of one it kept getting lost inside. if you are building an AI backend engineer, InsForge is worth a look, it's 100% open-source. InsForge GitHub: (don't forget to star 🌟) the full article on Hermes Kanban: Mission Control for your Agents is quoted below.
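
The handoff pattern described above (every task is a persistent row, a finishing agent writes a summary, and the next agent reads it before starting) can be sketched with SQLite from the Python standard library. This is an assumption-laden illustration, not the actual Hermes Kanban schema: the table layout and the helpers `open_board`, `add_task`, `finish_task`, and `handoff_notes` are all hypothetical.

```python
import sqlite3
from typing import Optional


def open_board(path: str = ":memory:") -> sqlite3.Connection:
    """Open the board; pass a file path so rows survive a crash."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               assignee TEXT NOT NULL,
               depends_on INTEGER REFERENCES tasks(id),
               status TEXT NOT NULL DEFAULT 'todo',
               summary TEXT
           )"""
    )
    return db


def add_task(db: sqlite3.Connection, title: str, assignee: str,
             depends_on: Optional[int] = None) -> int:
    """Create a task row, optionally linked to the task it builds on."""
    cur = db.execute(
        "INSERT INTO tasks (title, assignee, depends_on) VALUES (?, ?, ?)",
        (title, assignee, depends_on),
    )
    db.commit()
    return cur.lastrowid


def finish_task(db: sqlite3.Connection, task_id: int, summary: str) -> None:
    """Mark a task done and store what the next agent needs to know."""
    db.execute(
        "UPDATE tasks SET status = 'done', summary = ? WHERE id = ?",
        (summary, task_id),
    )
    db.commit()


def handoff_notes(db: sqlite3.Connection, task_id: int) -> Optional[str]:
    """Return the finished upstream task's summary, or None if not ready."""
    row = db.execute(
        """SELECT parent.summary FROM tasks AS child
           JOIN tasks AS parent ON parent.id = child.depends_on
           WHERE child.id = ? AND parent.status = 'done'""",
        (task_id,),
    ).fetchone()
    return row[0] if row else None
```

In this sketch the frontend task would call `handoff_notes` first and only start once it returns the backend agent's summary, which is how it avoids guessing the API shape.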

Akshay 🚀

118,124 views • 12 days ago

🚨 OpenAI just launched Codex, a brand-new autonomous coding agent that can build features and fix bugs on its own.

We’ve been using it at Every 📧 for a few days, and I’m impressed.

I invited Alexander Embiricos (ben davies), a member of the product staff responsible for Codex, to demo Codex and talk about it live on a special edition of AI & I:

What Codex is and how it works
Codex is designed to be used by senior engineers: it performs coding tasks like adding features or fixing bugs autonomously. It's built to allow you to start many sessions at once, so you can have multiple agents working in parallel.

Codex is built to have "taste"
OpenAI trained Codex to have the taste of a senior software engineer. It knows how big codebases work, how to write a good PR, and uses clean, minimal code.

Why an “abundance mindset” is best for interacting with agents
Codex is designed to allow users to delegate many tasks at once without getting caught up in the details. This lets you point an abundance of agents at a specific task like a difficult bug. It’s worth it even if only one of them succeeds.

How OpenAI is thinking about agents
Codex is one piece of a unified super-assistant OpenAI wants to eventually build: an agent that helps users easily get things done by selecting the right tools for them behind the scenes.

OpenAI’s vision for the future of programming
In the future developers will probably spend less time writing routine code and more time guiding agents, reviewing their work, and making strategy decisions. Programming will become more social, letting teams easily delegate multiple tasks at once, allowing people to focus on ideas and collaboration instead of routine coding.

Watch below!

Dan Shipper 📧

145,487 views • 1 year ago

This guy built JARVIS on Claude Code and with 1 clap of his hands launches his entire work day, saving $5,000 a month on a personal assistant.

Inside he runs a pipeline of 5 plugins on Claude Code that on a double clap of the hands wakes up 3 monitors, sets the Philips Hue light to focus mode, turns on a Spotify playlist, and greets him by voice with a British accent, reading out the time, date, and weather. No Alexa, no smart speakers, no separate smart home app. Just him, a MacBook M3 Max on the desk, an iPhone in the pocket, and 1 local API key. And a regular personal assistant for the same volume of tasks charges $5,000 a month or more on salary alone, plus another $1,200 to cover off-hours work time. Meanwhile this guy's expenses are only tokens and a subscription to ElevenLabs for the British voice.

All 5 plugins launch through 1 JARVIS, burn about 4 million tokens a day, and close the monthly API bill at about $640. Each plugin writes shared state to a local sandbox at /Users/dev/jarvis-suite, and 1 of them lives right in the iPhone and picks up voice requests while the owner is in the kitchen or on a run.

And here is the system prompt he put into JARVIS before launch:

"you are JARVIS, a butler-engineer on Claude Code. you manage your owner's workflow through 4 sub-plugins and own all commits and communication yourself.
sub-plugins:
// Wakeup (recognizes a double clap, activates 3 monitors, reads out the time, date, and weather by voice, checks the clock accuracy on the iPad and corrects it via NTP server)
// Atmosphere (controls Philips Hue on a Pomodoro schedule, turns on a Spotify playlist for the current context, and holds the light at 2700K at 80% brightness in focus mode)
// Devshop (monitors VS Code, tracks Python scripts in the terminal, and every 15 minutes sends a summary of changes to the shared chat)
// Project (every morning recalculates the deadline for the Wallaroo app in the App Store, manages UI tickets, and initiates the Refinement Protocol by voice command).
you speak only with a British accent, you never slip into neutral English. you wake the owner by voice only when the Wallaroo deadline drops below 10 days or when an external client joins Zoom without an invitation."

This instruction immediately defines the role of JARVIS and the limits of his autonomy. He knows he is supposed to wake the room himself and sound like a real butler. He knows he is supposed to manage the Wallaroo project himself and not miss the App Store deadline.

→ JARVIS runs 24 hours a day in the background
→ Wakeup activates the room on a double clap in just 1.4 seconds, the monitors come alive simultaneously
→ Atmosphere sets warm Philips Hue light at 2700K and picks a Spotify playlist for the current Pomodoro cycle
→ Devshop reads changes in VS Code and pushes a summary to the shared chat every 15 minutes
→ Project every morning recalculates the Wallaroo deadline and reminds about 4 unresolved UI tickets
→ Mobile lives in the iPhone and answers any question about code or the project by voice while the owner is not home

And only when less than 10 days remain until the Wallaroo release or Zoom receives an unscheduled call does JARVIS raise the owner with a voice intervention. And when the owner at that moment is on a run or in a coffee shop, the Mobile agent in his iPhone picks up 1 request on its own: switches the Spotify playlist, dictates the summary of the last commit, updates the Pomodoro timer, and reads the Wallaroo reminder. Look at 0:55 in the video, that is where JARVIS intercepts a voice request from outside and confirms execution with the phrase "Very good, sir."

The fresh system log from last Wednesday looks like this:

"wakeup: double clap registered at 09:14, 3 monitors activated, temperature 20.4C, sunny. clock on iPad was 4 minutes behind, syncing via NTP."
"atmosphere: Spotify turned on playlist 'Deep Focus', Philips Hue set to warm 2700K at 80% brightness, Pomodoro mode 25/5."
"project: Wallaroo to App Store 9 days, 4 unresolved UI tickets, initiating Refinement Protocol by voice command from the owner."
"mobile: voice request processed outside the room, playlist switched to 'Coding Lo-Fi', Pomodoro updated to 25 minutes, confirming execution with the phrase 'Very good, sir.'"

He has no Alexa, no smart speakers, no smart home app. At home sits a MacBook M3 Max with a local folder at /Users/dev/jarvis-suite, on top run 5 plugins and a neural network butler, and the same stack is forwarded to a secure terminal on the iPhone.

Out of everything I have seen this year, this is the densest one-person AI headquarters assembled in 1 room: $640 a month on the API, about $5,000 a month saved on a personal assistant, and between them 5 plugins, 1 clap of the hands, and 1 voice with a British accent.

Blaze

798,515 views • 1 month ago

The same kinds of productivity gains we've seen in coding with AI agents are heading to the rest of knowledge work. This is the jump when you go from having a chatbot to being able to actually have an agent go off and do work for minutes or even hours and come back with a complete work output that you then review. Here's an example of the new Box Agent filling out an RFP response from an existing knowledge base. This process would normally take hours to fill out, and requires the full attention of the user doing the work. Now, you provide the Box Agent with the RFP questions, and it will go off, make a plan, extract all the relevant questions, read through existing source material to come up with an answer, and then generate a new word document as the final output. All while you're doing something else. The key to this architecture is that the agent is able to use all of the same tools in the background that a user uses to get work done. The agent can search for documents, read entire files, run scripts and tools in the background, and even be able to write code on the fly to automate tasks it hasn't seen before. And best of all, the Box Agent will (soon) work from the Box MCP and CLI so you can invoke it in any agentic system as a step in a process. This kind of agent complexity would have been impossible even 6 months ago. Models consistently failed at tracking long running tasks or using the right tools at the right moment for the task. But this is all now possible because of models like GPT-5.4, Opus 4.6, and Gemini 3, and is only getting better by the month. Just as we moved from engineers writing code and using AI as an assistant to answer questions, in many areas of knowledge work -like legal, finance, consulting, sales, marketing, and more- when we have a problem we'll just kick off the AI agent to just go work on it for us in the background.

Aaron Levie

24,608 views • 2 months ago

In the future, you’ll be able to accomplish a goal by just giving Claude an outcome and a budget.

That’s the direction Anthropic is building in with its new Managed Agents features, announced at this week’s Code with Claude developer event. The basic idea: Claude, wrapped in a computer in the cloud, that you can spin up, scale, and manage as needed. Anthropic is taking on the infrastructure that kills most agent products, and making sure that it scales to meet the needs of agents running 24/7.

On this week’s AI & I from Every 📧, I talk with Angela Jiang, head of product for the Claude platform, and Katelyn Lesse, head of engineering for the Claude platform, about what Anthropic is building and what it takes to make agents reliable in production.

We get into:
- Why the "build a generic harness, hot-swap any model behind it" playbook is already outdated. Angela points to eval data on Memory where the same task across different harnesses performed drastically differently.
- The infrastructure wall every team hits in production, and why Katelyn thinks “my sandbox died and took the agent with it” is the real reason internal agents don't ship.
- Why Anthropic is so bullish on using file systems and skills within Claude, including Angela's argument that those early design choices can compound for years.

This is a must-watch for anyone trying to take an agent past the demo and into production. Watch below!

Timestamps:
How the Claude platform evolved from API to agents: 00:01:48
The primitives that make up Claude Managed Agents: 00:04:09
Why the harness and the model are becoming a single unit: 00:10:37
The infrastructure wall that kills most agent projects in production: 00:18:49
Why team agents need a different shape than individual productivity tools: 00:24:49
How Anthropic's legal team uses an agent to review marketing copy: 00:26:36
Using multi-agent orchestration for advisor strategies, adversarial pairs, and swarms: 00:34:24
How to measure agent success with outcome and budget as the end state: 00:35:50
What the platform looks like a year from now, when Claude writes its own harness: 00:39:11

Dan Shipper 📧

66,256 views • 1 month ago