Karpathy's Agentic Engineering finally has proper tooling! (built by Google) Karpathy defined agentic engineering as the discipline that separates production agent work from vibe coding. The core skills he listed were spec design, eval loops, and security oversight. The problem has been that practicing this still requires a different...

242,901 views • 2 days ago • via X (Twitter)

Related Videos

OpenAI's AgentKit will be so insane: build every step of agents on one platform. These visual agent builders make the whole process of iterating and launching agents far more efficient.

AgentKit sits on top of the Responses API and unifies the tools that were previously scattered across SDKs and custom orchestration. It lets developers create agent workflows visually, connect data sources securely, and measure performance automatically without coding every layer by hand.

The core of AgentKit is the Agent Builder, a drag-and-drop canvas where each node represents an action, guardrail, or decision branch. Developers can link these nodes into multi-agent workflows, preview results instantly, and version each setup. It supports inline evaluation so that developers can see how changes affect output before deploying.

The Connector Registry is a single admin panel that manages how data and tools connect across the OpenAI ecosystem. It centralizes integrations like Google Drive, SharePoint, Dropbox, and Microsoft Teams. Large organizations can govern access and flow of data between agents securely under one global console.

ChatKit provides a ready-to-use chat interface for embedding agents inside apps or websites. It manages streaming, message threads, and model reasoning displays automatically. Developers can skin the interface to match their product without writing custom front-end code.

Under the hood, all these blocks use the same execution core that runs agent reasoning through OpenAI's APIs. Workflows in Agent Builder compile down to structured instructions for the Responses API, which handles model calls, tool use, and context passing. Connector Registry handles authentication and routing for external tools, while Evals and RFT provide feedback loops that improve agents over time.

This integration means developers no longer need to handle orchestration logic, model evaluation pipelines, or safety layers separately. Everything runs natively within OpenAI's control plane with managed security, automatic versioning, and built-in testing. In short, AgentKit standardizes the entire life cycle of an AI agent, from visual design to deployment and performance tuning, inside a single unified system.
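To make the node model above concrete, here is a minimal sketch of a workflow graph whose nodes are actions, guardrails, or decision branches. It is not AgentKit's API or file format: `Node`, `run_workflow`, the `edges` mapping, and the example nodes are all hypothetical names chosen for illustration, and the "agent" steps are plain Python functions rather than model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Node:
    """One box on the canvas (hypothetical structure, not AgentKit's)."""
    name: str
    kind: str  # "action", "guardrail", or "decision"
    run: Callable[[dict], dict]
    # Maps an outcome label to the next node's name; "default" is the fallthrough.
    edges: Dict[str, str]


def run_workflow(nodes: Dict[str, Node], start: str, state: dict) -> dict:
    """Walk the graph from `start`, recording which nodes ran."""
    current: Optional[str] = start
    trace = []
    while current is not None:
        node = nodes[current]
        trace.append(node.name)
        state = node.run(state)
        # A guardrail that sets "blocked" stops the workflow immediately.
        if node.kind == "guardrail" and state.get("blocked"):
            break
        # A decision node picks the outgoing edge via the "branch" key.
        label = state.pop("branch", "default") if node.kind == "decision" else "default"
        current = node.edges.get(label)
    state["trace"] = trace
    return state


def check_input(state: dict) -> dict:
    return {**state, "blocked": not state["text"].strip()}


def route(state: dict) -> dict:
    return {**state, "branch": "long" if len(state["text"]) > 20 else "short"}


def summarize(state: dict) -> dict:
    return {**state, "output": state["text"][:20] + "..."}


def echo(state: dict) -> dict:
    return {**state, "output": state["text"]}


WORKFLOW = {
    "guard": Node("guard", "guardrail", check_input, {"default": "route"}),
    "route": Node("route", "decision", route, {"long": "summarize", "short": "echo"}),
    "summarize": Node("summarize", "action", summarize, {}),
    "echo": Node("echo", "action", echo, {}),
}

result = run_workflow(WORKFLOW, "guard", {"text": "hello"})
print(result["trace"], result["output"])
```

Keeping the graph as plain data is what makes versioning and previewing cheap in a sketch like this: a changed workflow is just a changed dictionary.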

Rohan Paul

178,460 views • 8 months ago

HTML Artifacts are a big part of how I work with agents now. Artifacts can be more than just static files. When combined with agents, they can take action or help you take action. This unlocks all kinds of interesting ways to work with agents. This is clearly the future.

Check out this writing and scheduler artifact I built in a few minutes. It uses a bit of HTML and JS. All the data is in markdown (Obsidian vaults), so the agent can access and modify it at any time. No DB needed. No sophisticated functionality. The agent decides all that for me based on the skills, context, and memory it has access to.

The best part about this simple stack is that all the important information stays with me. This has allowed me to build a recursive self-improving system and automations that can better tap into coding agents like Codex or Claude Code.

I could have paid for or built an entire app for scheduling posts, and there are so many of them out there. But I don't need to. I've realized a simple artifact does the job. And the simplicity of it is actually an advantage. Very little maintenance for very high returns on personalization, time, and efficiency.

The other benefit of this is that I can add features as I please. That level of personalization feels magical, and we should all be pursuing more of it. All of this just keeps compounding.

Of course, this example is just about writing. But I have similar artifacts for research, design, experimentation, evaluation, and so much more.

And no, I didn't actually publish the post example I shared in the clip. It was just for demonstration purposes. I actually spend more time than this when writing together with agents.

Lastly, having built my own agent orchestrator tool has made me realize that simplifying the tool stack is a superpower. If you are curious about how all this works, I will do a live session next week:
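The "all the data is in markdown, no DB needed" idea can be sketched in a few lines. The post does not show its file layout, so the checklist format below (`- [ ] YYYY-MM-DD | title`) and the names `Post`, `parse_schedule`, `render_schedule`, and `due_posts` are assumptions for illustration only; the post's own artifact uses HTML and JS, while this sketch uses Python to show just the data side.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed line format: "- [ ] 2025-01-10 | Post title" ("x" marks published).
LINE = re.compile(r"^- \[( |x)\] (\d{4}-\d{2}-\d{2}) \| (.+)$")


@dataclass
class Post:
    done: bool
    date: str  # ISO date, so string comparison matches date order
    title: str


def parse_schedule(markdown: str) -> List[Post]:
    """Read scheduled posts from a markdown note, ignoring other lines."""
    posts = []
    for line in markdown.splitlines():
        match = LINE.match(line.strip())
        if match:
            posts.append(Post(match.group(1) == "x", match.group(2), match.group(3)))
    return posts


def render_schedule(posts: List[Post]) -> str:
    """Write posts back out in the same checklist format."""
    return "\n".join(
        f"- [{'x' if p.done else ' '}] {p.date} | {p.title}" for p in posts
    )


def due_posts(posts: List[Post], today: str) -> List[Post]:
    """Unpublished posts scheduled on or before `today`."""
    return [p for p in posts if not p.done and p.date <= today]
```

Because the note stays human-readable, both the person and the agent can edit the same file directly, which is the property the post is relying on.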

elvis

18,374 views • 1 month ago

Claude Code + Google Stitch 2.0 is f*cking cracked 🤯

Google just dropped a free AI design agent that solves Claude Code's biggest weakness: frontend design. One screenshot of a high-converting landing page → a production-ready site for your brand in minutes. All inside Google Stitch + Claude Code.

Perfect for DTC brands and agencies who are building advertorial pages and product launch pages for Meta but burning days on designer back-and-forth. If you're running Meta ads and need 5-10 different landing pages testing different hooks, angles, and offers, each one targeting a different audience and pain point, you know the bottleneck isn't the ads. It's the pages. Briefing designers, waiting for revisions, paying $2-5K per page.

Stitch eliminates the design bottleneck:
→ Find a high-converting advertorial that's scaling on Meta
→ Screenshot it and drop it into Stitch (powered by Gemini 3.1)
→ Stitch redesigns it with your brand's colors, fonts, and imagery using Nano Banana 2
→ Edit sections visually (headlines, CTAs, layouts) without touching code
→ Export the code and paste it into Claude Code
→ Claude builds the full production site and deploys to Vercel or Netlify in 60 seconds

No designer. No $3K per landing page. No Claude Code frontend that looks like a template from 2019.

What you get:
→ Designer-quality landing pages and advertorials built in minutes, not weeks
→ Visual editing so you actually see the design before you code it
→ Nano Banana 2 generating on-brand product imagery and hero shots
→ A repeatable system: new angle, new page, same pipeline

Built 100% with Google Stitch 2.0 + Claude Code.

I put together a full playbook showing the exact workflow: how to find winning pages, redesign them in Stitch, and deploy with Claude Code. Want it for free?
> Like this post
> Comment "STITCH"
And I'll send it over (must be following so I can DM)

Mike Futia

125,355 views • 3 months ago

The Visual Studio Code Insiders version that just shipped and will ship in the next few days will come with an insane amount of new capabilities. A few highlights:
- You can now run sub-agents in parallel. Yes, really. I even attached a video.
- Major UX improvements for sub-agents, especially visible in the chat window
- A new search tool wrapped as a sub-agent that iteratively runs multiple search tools: semantic_search, file_search, grep_search
  Which connects nicely to the point above: multiple searches running in parallel, efficiently and fast
- Anthropic's Messages API is now enabled by default
- You can choose the model for the cloud agent (three available, all premium)
- Extended thinking support when using the Claude cloud agent
  This is part of the broader multi-vendor cloud support under AgentsHQ I wrote about a few weeks ago
- Tasks sent to the background agent (basically the CLI tool) now always run in isolation, each with its own git worktree
- In a multi-repo workspace, assigning a task to a cloud agent prompts you to choose the target repo
  Same behavior when opening an empty workspace with no repo
- Support for building an external index for files not supported by GitHub's default indexing
- UI/UX improvements for starting new sessions and switching between local / background / cloud agents
- Skills are now first-class citizens, just like prompt files, with better UX indicating when a skill is loaded
- Improved API for dynamic contribution of prompt files
  New V2 includes skills as part of the model. Curious to see the extensions that will leverage this
- Finally, initial support for showing context usage percentage per session
- Skills are enabled by default
- Resizable chat window and session view. Small thing, but it was driving me crazy 😁
- A new integrated browser meant to replace the old simple browser
  Maybe the beginning of real browser use?
- Better UI/UX for token streaming in chat
- Ability to index external files not supported by GitHub

There's a lot more. Some of it hasn't fully landed yet, but everything that has is already in Insiders. The next stable release should drop in early February.

As usual, I'm just shocked by the volume of features this team ships every month. After the holiday slowdown, this one is shaping up to be a wild release.
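The parallel search item in the list above can be illustrated with a small fan-out sketch. Only the tool names `semantic_search`, `file_search`, and `grep_search` come from the post; their bodies here are toy stand-ins over an in-memory `FILES` dictionary, `search_all` is a hypothetical helper, and none of this reflects how VS Code implements the sub-agent. It also shows a single parallel round, not the iterative loop the post mentions.

```python
import asyncio
from typing import Dict, List

# Toy workspace standing in for real files (assumption for illustration).
FILES: Dict[str, str] = {
    "src/auth.py": "def login(user): ...",
    "src/session.py": "def refresh_token(session): ...",
    "docs/auth.md": "How login and tokens work",
}


async def file_search(query: str) -> List[str]:
    """Match the query against file paths."""
    return [path for path in FILES if query in path]


async def grep_search(query: str) -> List[str]:
    """Match the query against file contents."""
    return [path for path, text in FILES.items() if query in text]


async def semantic_search(query: str) -> List[str]:
    """Crude stand-in: case-insensitive match on path plus contents."""
    needle = query.lower()
    return [p for p, text in FILES.items() if needle in (p + " " + text).lower()]


async def search_all(query: str) -> List[str]:
    """Run all three tools concurrently and merge hits without duplicates."""
    results = await asyncio.gather(
        semantic_search(query), file_search(query), grep_search(query)
    )
    merged: List[str] = []
    for hits in results:
        for path in hits:
            if path not in merged:
                merged.append(path)
    return merged


print(asyncio.run(search_all("login")))
```

`asyncio.gather` returns results in the order the coroutines were passed, so the merged list is deterministic even though the searches run concurrently.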

Oren Melamed

29,555 views • 5 months ago