The Visual Studio Code Insiders builds that just shipped, and the ones shipping over the next few days, come with an insane amount of new capabilities. A few highlights:

- You can now run sub-agents in parallel. Yes, really. I even attached a video.
- Major UX improvements for... sub-agents, especially visible in the chat window
- A new search tool wrapped as a sub-agent that iteratively runs multiple search tools: semantic_search, file_search, grep_search. This connects nicely to the point above: multiple searches running in parallel, efficiently and fast
- Anthropic's Messages API is now enabled by default
- You can choose the model for the cloud agent (three available, all premium)
- Extended thinking support when using the Claude cloud agent. This is part of the broader multi-vendor cloud support under AgentsHQ I wrote about a few weeks ago
- Tasks sent to the background agent (basically the CLI tool) now always run in isolation, each with its own git worktree
- In a multi-repo workspace, assigning a task to a cloud agent prompts you to choose the target repo. Same behavior when opening an empty workspace with no repo
- Support for building an external index for files not supported by GitHub's default indexing
- UI/UX improvements for starting new sessions and switching between local / background / cloud agents
- Skills are now first-class citizens, just like prompt files, with better UX indicating when a skill is loaded
- Improved API for dynamic contribution of prompt files. The new V2 includes skills as part of the model. Curious to see the extensions that will leverage this
- Finally, initial support for showing context usage percentage per session
- Skills are enabled by default
- Resizable chat window and session view. Small thing, but it was driving me crazy 😁
- A new integrated browser meant to replace the old simple browser. Maybe the beginning of real browser use?
- Better UI/UX for token streaming in chat
- Ability to index external files not supported by GitHub

There's a lot more. Some of it hasn't fully landed yet, but everything that has is already in Insiders. The next stable release should drop in early February.

As usual, I'm just shocked by the volume of features this team ships every month. After the holiday slowdown, this one is shaping up to be a wild release.
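To make the parallel search idea concrete, here is a minimal Python sketch of one round of a search sub-agent that fans out to three tools at once and merges the hits. The tool names come from the post, but the function bodies, the toy `CORPUS`, and `search_subagent` are hypothetical stand-ins, not the VS Code implementation, and the real tool reportedly iterates over several rounds rather than one.

```python
import asyncio

# Toy corpus standing in for a workspace (hypothetical data).
CORPUS = {
    "src/auth.py": "def login(user): ...",
    "src/session.py": "class Session: # handles login state",
    "docs/readme.md": "How to configure login and sessions",
}

async def semantic_search(query: str) -> set[str]:
    # Stand-in for a semantic index: case-insensitive content match.
    await asyncio.sleep(0.01)  # simulate I/O latency
    return {path for path, text in CORPUS.items() if query in text.lower()}

async def file_search(query: str) -> set[str]:
    # Stand-in for a file-name search.
    await asyncio.sleep(0.01)
    return {path for path in CORPUS if query in path}

async def grep_search(query: str) -> set[str]:
    # Stand-in for grep: case-sensitive content match.
    await asyncio.sleep(0.01)
    return {path for path, text in CORPUS.items() if query in text}

async def search_subagent(query: str) -> list[str]:
    # Run all three tools concurrently, then merge and de-duplicate the hits.
    results = await asyncio.gather(
        semantic_search(query), file_search(query), grep_search(query)
    )
    return sorted(set().union(*results))

if __name__ == "__main__":
    print(asyncio.run(search_subagent("login")))
```

Because the three calls are awaited together with `asyncio.gather`, the total wait is roughly that of the slowest tool rather than the sum of all three.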

Oren Melamed

2,557 subscribers

29,555 views • 6 months ago • via X (Twitter)

Similar Videos

Postman is going all-in on AI! Their AI Agent Builder is live (and it's pretty powerful!) Postman's API Network has 100,000+ APIs and they just released a suite of tools for building AI Agents that can automatically discover and connect to any of these APIs. Think about it for a second: You can now build an AI agent that searches the network, discovers an API, processes its documentation, connects to it, and evaluates whether everything worked at the end of the process. This is a huge deal. Every person I know uses the Postman client to work with APIs. Now, they will have access to the AI Agent Builder, a complete suite for designing, building, testing, and deploying agents. I've built a few simple agents myself, and it's tough. The ability to discover and use an API automatically is really exciting (and a huge time saver if it works as promised!) In collaboration with the Postman team, who is sponsoring this post, I recorded a video with my thoughts about the Postman AI Agent Builder.

Santiago

230,594 views • 1 year ago

HERMES JUST FIXED THE BIGGEST PROBLEM WITH BROWSER AGENTS

Most AI agents still click around websites like confused interns. This new setup gives them the map.

Hermes +
→ Hermes now connects to Browserbase’s new Browse hub
→ Browse launched with 100+ browser skills
→ Each skill is a plain-text playbook for a specific website or task
→ Your agent can search, preview, and install skills inside Hermes

Why This Matters:
✓ Less random clicking
✓ Fewer timeouts
✓ Better form filling
✓ More reliable website navigation
✓ Skills can be edited, reused, bundled, and shared

The Real Stack:
→ Hermes runs the agent
→ Browserbase handles cloud browser infrastructure
→ gives the agent site-specific skills
→ Vision helps when pages get weird
→ Bundles let you load repeat workflows faster

The killer detail: If a website breaks, you don’t wait for the model to magically improve. You update the skill. Now your agent gets better forever.

That’s the difference between an AI tool and an actual agent system.

Julian Goldie SEO

43,472 views • 2 months ago

The same kinds of productivity gains we've seen in coding with AI agents are heading to the rest of knowledge work. This is the jump from having a chatbot to having an agent that goes off, works for minutes or even hours, and comes back with a complete work output that you then review.

Here's an example of the new Box Agent filling out an RFP response from an existing knowledge base. This process would normally take hours and require the full attention of the user doing the work. Now, you provide the Box Agent with the RFP questions, and it will go off, make a plan, extract all the relevant questions, read through existing source material to come up with an answer, and then generate a new Word document as the final output. All while you're doing something else.

The key to this architecture is that the agent is able to use all of the same tools in the background that a user uses to get work done. The agent can search for documents, read entire files, run scripts and tools in the background, and even write code on the fly to automate tasks it hasn't seen before. And best of all, the Box Agent will (soon) work from the Box MCP and CLI so you can invoke it in any agentic system as a step in a process.

This kind of agent complexity would have been impossible even 6 months ago. Models consistently failed at tracking long-running tasks or using the right tools at the right moment for the task. But this is all now possible because of models like GPT-5.4, Opus 4.6, and Gemini 3, and is only getting better by the month.

Just as we moved from engineers writing code and using AI as an assistant to answer questions, in many areas of knowledge work (like legal, finance, consulting, sales, marketing, and more), when we have a problem we'll just kick off the AI agent to go work on it for us in the background.

Aaron Levie

24,618 views • 3 months ago

Tempo's Dan Romero explains why the future of AI agents will be "stablecoin-native." John: "Why wouldn't I just give OpenClaw my credit card?" Dan Romero: "The credit card itself is kind of a private key, having that get prompt-injected out, maybe not the best thing in the world." “If you have agent swarms, the idea of spinning up a new credit card for each individual sub-agent doesn’t make sense. With wallets, you can spin up as many as you want and manage the balances for each agent.” “An API call to any of these frontier labs right now is pay-per-call. You’re eventually going to get to a point where, for every single API call, you can just pay some amount of stablecoins in the background and keep moving.”

TBPN

155,348 views • 5 months ago

Tempo's Dan Romero explains why the future of AI agents will be "stablecoin-native." John: "Why wouldn't I just give OpenClaw my credit card?" Dan Romero: "The credit card itself is kind of a private key, having that get prompt-injected out, maybe not the best thing in the world." “If you have agent swarms, the idea of spinning up a new credit card for each individual sub-agent doesn’t make sense. With wallets, you can spin up as many as you want and manage the balances for each agent.” “An API call to any of these frontier labs right now is pay-per-call. You’re eventually going to get to a point where, for every single API call, you can just pay some amount of stablecoins in the background and keep moving.”

TBPN

41,347 views • 5 months ago

File systems are quickly becoming a core abstraction for AI agents for knowledge work. By having access to a file system, agents can effectively manage context, process any amount of information, and create new files and data. The challenge is that local file systems are inherently limited in size, lack governance capabilities, and are inherently personal -not collaborative- in nature. Now, with the Box API and LangChain's deepagent SDK, you can bring the full power of a secure, collaborative cloud file system to your AI agents. Repo ↓

Box

227,348 views • 5 months ago

Finally, a proper chat UI for Hermes Agent (open-source)!

Hermes ships an official dashboard, but it's primarily built for management, and its chat is just a terminal piped into a browser tab.

Hermes Web UI is an open-source chat-first alternative. It's self-hosted and points at your existing ~/.hermes state, so there's nothing new to configure.

- It's a native web chat, not a terminal in a tab
- Sessions group by date with a context ring
- Kanban renders the agent's task board
- Spaces manages your workspaces
- Skills panel lists the full catalog
- Tasks panel shows cron jobs
- Insights show usage and activity
- Memory shows MEMORY and SOUL files
- Logs tails the agent, gateway, and error logs

The whole setup runs 100% locally, binds to localhost by default, and you reach it over an SSH tunnel or Tailscale from your phone.

I have shared the Hermes Web UI GitHub repo in the replies. Do note that it's a community project, not official, so expect occasional rough edges (concurrent profile runs are blocked for now).

To dive deeper into Hermes Agent, my co-founder wrote a full masterclass about it, covering the learning loop, the memory tiers, self-evolving skills, GEPA, and running multiple isolated agents. Read it below.

Avi Chawla

77,513 views • 1 month ago

Hermes agent just left the terminal.

𝗛𝗲𝗿𝗺𝗲𝘀 𝗗𝗲𝘀𝗸𝘁𝗼𝗽 dropped yesterday. native app for macOS, Windows, and Linux.

for months Hermes was the agent that learned your projects, wrote its own skills, and built a model of who you are. all of it buried in terminal logs. now it has a window.

the important part is that it's not a wrapper. it runs the same agent core, the same sessions, memory, and skills as the CLI. you can start a task in the terminal and finish it in the app without anything resetting. the state is shared across every interface, not copied between them.

what the GUI actually adds:
→ streaming chat that shows live tool calls and inline reasoning instead of a spinner
→ a preview rail that renders pages, code, and images right beside the conversation
→ an artifacts panel that collects every file the agent has ever produced
→ remote gateway mode, so you can point the app at a VPS and run the heavy work elsewhere
→ skills, cron, profiles, and gateways managed point-and-click instead of through YAML
→ voice mode, drag-drop files, and inline image generation

remote gateway mode is the one worth slowing down on. the agent runs 24/7 on a $5 server while you control it from your laptop like a local app.

other agent UIs are chatboxes with a logo. this one shows the autonomy instead of hiding it, so you watch the skills load, the tools fire, and the artifacts pile up as it works.

it was teased in Jensen's GTC keynote. MIT licensed, local-first, no telemetry.

if you already run Hermes, download it and everything is already there. your chats, memory, and skills carry straight over.

i wrote a full masterclass on Hermes Agent that walks through the SOUL.md identity layer, the three-tier memory system, the self-evolving skills loop, and how to run three specialized agents 24/7. desktop is the interface that finally does all of it justice.

the article is quoted below.

Akshay 🚀

51,370 views • 1 month ago

I wanted to share a quick demo of what we've been working on with our AI agent cloud. It enables fast deployment of agents that have access to a suite of tools, and was designed with agent interoperability in mind. This demo shows how you can go from nothing to an AI Twitter agent in a couple of minutes. This is what we are using internally to manage deployments, so we will be consistently upgrading its capabilities. The next goals are to enable simple TEE deployments for agents, and to focus on building out features for agent interoperability to simplify agent-to-agent collaboration.

Johnny

27,176 views • 1 year ago

happy friday. sneak peek at some new features i'm building for the bankr agent:

> sandboxed filesystem (your agent gets its own secure file system in the browser. you control what it can access. no agent running loose on your actual computer)
> skill uploads (plug in new capabilities so your agent learns new things)
> cli download & access
> secure environment variables for api keys
> github integration (connect your repo for reads and writes directly from your agent)

all of this runs in a secure browser environment. no desktop app. no messy configuration. no downloading agent harnesses from the terminal. just a sandbox you control.

in the video: i download an audit skill, run it against a smart contract, and save the report straight to the filesystem. then i ask the agent to pull all my 2026 transactions and write a csv for my accountant.

deployer

45,638 views • 3 months ago

AI AGENTS 101 (58 minute free masterclass)

send this to anyone who wants to understand ai agents, claude skills, md files, how to get the most out of AI etc in plain english:

1. chat vs agents - chat models answer questions in a back and forth while agents take a goal, figure out the steps, and deliver a result
2. agents don’t stop after one response. they keep running until the task is actually finished. no babysitting required
3. everything runs on a loop. they gather context, decide what to do, take an action, then repeat until done
4. the loop is the system. they look at files, tools, and the internet. decide the next step. execute and then feed that back into the next step. over and over until completion
5. the model is just one piece. gpt, claude, gemini are the reasoning layer. the key is model + loop + tools + context
6. mcp is how agents use tools. it connects things like browser, code, apis, and your internal software. once connected, the agent decides when to use them to get the job done
7. context beats prompt all day. you don't need to write perfect prompts. load your agent with context about your business, style, and goals and then simple instructions work
8. claude.md or agents.md is the onboarding doc. it tells the agent who it is, how to behave, what it knows, and what tools it can use. this gets loaded every time before it starts
9. memory.md is how it improves. agents don’t remember by default. this file stores preferences, corrections, and patterns. you tell the agent to update it, and it gets better over time
10. skills + harnesses make it usable. skills are reusable tasks like writing, research, analysis. the harness is the environment like claude code or openclaw that runs everything. basically, different interfaces, same system underneath

this episode with remy on The Startup Ideas Podcast (SIP) 🧃 was one of the clearest ways of understanding a lot of the core concepts of ai agents

could be the best beginners course for ai agents

58 mins. all free. no advertisers. i just want to see you build cool stuff. im rooting for you.

send to a friend

watch
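the loop in points 3 and 4 can be sketched in a few lines of Python. this is a toy illustration under stated assumptions, not how claude code or any real harness works: `AgentState`, `read_files`, `write_summary`, `decide`, and `run_agent` are all hypothetical names, and `decide` is a hard-coded stand-in for the model call.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentState:
    goal: str
    context: list[str] = field(default_factory=list)  # observations gathered so far
    done: bool = False

# Hypothetical tools: each takes the state and returns an observation string.
def read_files(state: AgentState) -> str:
    return "found notes.md"

def write_summary(state: AgentState) -> str:
    state.done = True  # this tool finishes the task
    return "summary written"

TOOLS: dict[str, Callable[[AgentState], str]] = {
    "read_files": read_files,
    "write_summary": write_summary,
}

def decide(state: AgentState) -> str:
    # Stand-in for the model: a real agent would ask an LLM to pick the next tool.
    return "read_files" if not state.context else "write_summary"

def run_agent(goal: str, max_steps: int = 10) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_steps):  # cap the loop so it cannot run forever
        if state.done:
            break
        tool_name = decide(state)                             # decide what to do
        observation = TOOLS[tool_name](state)                 # take an action
        state.context.append(f"{tool_name}: {observation}")   # feed the result back in
    return state

if __name__ == "__main__":
    print(run_agent("summarize my notes").context)
```

the `max_steps` cap is a design choice for the sketch: the loop repeats until the task is done, but a budget keeps a confused agent from looping indefinitely.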

GREG ISENBERG

375,365 views • 4 months ago

In the future, you’ll be able to accomplish a goal by just giving Claude an outcome and a budget.

That’s the direction Anthropic is building in with its new Managed Agents features, announced at this week’s Code with Claude developer event. The basic idea: Claude, wrapped in a computer in the cloud, that you can spin up, scale, and manage as needed. Anthropic is taking on the infrastructure that kills most agent products, and making sure that it scales to meet the needs of agents running 24/7.

On this week’s AI & I from Every 📧, I talk with Angela Jiang, head of product for the Claude platform, and Katelyn Lesse, head of engineering for the Claude platform, about what Anthropic is building and what it takes to make agents reliable in production.

We get into:

- Why the "build a generic harness, hot-swap any model behind it" playbook is already outdated. Angela points to eval data on Memory where the same task performed drastically differently across different harnesses.
- The infrastructure wall every team hits in production, and why Katelyn thinks “my sandbox died and took the agent with it” is the real reason internal agents don't ship.
- Why Anthropic is so bullish on using file systems and skills within Claude, including Angela's argument that those early design choices can compound for years.

This is a must-watch for anyone trying to take an agent past the demo and into production. Watch below!

Timestamps:

- How the Claude platform evolved from API to agents: 00:01:48
- The primitives that make up Claude Managed Agents: 00:04:09
- Why the harness and the model are becoming a single unit: 00:10:37
- The infrastructure wall that kills most agent projects in production: 00:18:49
- Why team agents need a different shape than individual productivity tools: 00:24:49
- How Anthropic's legal team uses an agent to review marketing copy: 00:26:36
- Using multi-agent orchestration for advisor strategies, adversarial pairs, and swarms: 00:34:24
- How to measure agent success with outcome and budget as the end state: 00:35:50
- What the platform looks like a year from now, when Claude writes its own harness: 00:39:11

Dan Shipper 📧

66,339 views • 2 months ago

Bash is all you need! Which is why I'm introducing my holiday project: just-bash

just-bash is a pretty complete implementation of bash in TypeScript designed to be used as a bash tool by AI agents. Because it turns out agents love exploring data via shell scripts, even beyond coding.

It comes with grep, sed, awk and the 99th percentile features that an agent like Claude Code or Cursor would use. In fact, Claude Code can use it for secure bash execution.

In the package
- A bash-tool for AI SDK
- A binary for use by yourself or your coding agents
- An overlay filesystem to feed files to your agent securely
- A Vercel Sandbox compatible API, so you can quickly upgrade to a real VM if you need to run binaries
- An example AI agent that explores the just-bash code base using just-bash
- I imported the Oils shell bash compatibility suite and just-bash passes a very good chunk

What is interesting about this codebase: It was essentially entirely written by Opus 4.5. Coding agents love bash and they are good at reproducing it. They are also great at text-book recursive descent parsers and AST tree-walk interpreters. That said, it is, like, a lot of code and I didn't read it all 😅.

This is very much a hack, but it also seems to be _really_ useful. I haven't really found anything agents want to use that it doesn't support and it's fast and secure (caveats apply). It doesn't have write access to your computer and the filesystem is given a root that the agent cannot escape from.

Find it at

Related: Our recent blog post on how we migrated our data analysis agent to bash tools and achieved incredible quality improvements

The video shows the example agent investigating the just-bash code base
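To illustrate what "a root that the agent cannot escape from" can mean in practice, here is a small Python sketch of path confinement. This is not just-bash's code (which is TypeScript and uses its own filesystem layer); `resolve_in_root` and `SandboxEscapeError` are hypothetical names, and the check assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path

class SandboxEscapeError(Exception):
    """Raised when a requested path would resolve outside the sandbox root."""

def resolve_in_root(root: Path, requested: str) -> Path:
    root = root.resolve()
    # Treat the request as relative to the root, even if it starts with "/".
    candidate = (root / requested.lstrip("/")).resolve()
    # After resolving ".." segments and symlinks, the result must stay under root.
    if not candidate.is_relative_to(root):
        raise SandboxEscapeError(f"{requested!r} escapes {root}")
    return candidate

if __name__ == "__main__":
    sandbox = Path("sandbox")
    print(resolve_in_root(sandbox, "data/report.csv"))
    try:
        resolve_in_root(sandbox, "../../etc/passwd")
    except SandboxEscapeError as err:
        print("blocked:", err)
```

Resolving before checking matters: comparing raw strings would let `..` segments or symlinks slip past the prefix test.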

Malte Ubl

124,713 views • 6 months ago

For 40 years the file browser hasn’t changed. Today, we’re launching with $8 million in seed funding to rebuild the file browser into something more intelligent, searchable, and delightful. The world is in the middle of a data explosion. We’re generating and using more files than ever, but the apps we’re using to manage our files don’t even understand them. It’s time for file browsers to become useful. When you search for “dog”, it should show you content with dogs in it, not just files with “dog” in the name! When you want to edit, convert, summarize, or organize a file, your browser should do that, too. Your files tell the story of your life, but when you need a specific one, you usually can’t even find it anymore. Why can’t your file browser find it for you, or cross-reference it when you have a question? Prompting can give an LLM a million tokens of context. With Poly, you can give it the next trillion. As long as we can afford it, all new users receive 100GB of free cloud storage. We can’t wait for you to try it out!

Abhay Agarwal

1,776,784 views • 8 months ago

This is probably the biggest news yet in software going headless, and will bring knowledge work agents to the masses. The new ChatGPT agents have access to any of the tools and data you want to work with, with complete coding and tool use available to them.

Here's an example of a custom sales assistant agent that uses Box as a knowledge source for accessing enterprise content securely to answer questions and generate new content on the fly. The workflows can obviously be vastly more complex, as the agent can use any of the tools within Box available via MCP and CLI.

This is precisely what agents will start to look like for knowledge work. You'll be able to spin them up in the foreground or background to help augment work. Big opportunity right now for headless platforms, and for all the new builders and designers of these agents in the enterprise.

Aaron Levie

419,956 views • 3 months ago

New short course: Build AI Apps with MCP Servers: Working with Box Files, built with Box and taught by Ben Kus, their CTO.

Many AI applications require custom code for basic file operations. The Model Context Protocol (MCP) standardizes this by letting you offload file tasks to dedicated servers that provide tools an LLM can use directly.

In this course, you'll process documents stored in a Box folder using the Box MCP server. Rather than writing custom integration code to connect to the Box API and download files, you'll design your application to use the tools provided via MCP.

Skills you'll gain:
- Build an LLM-powered document processing app, using the Box MCP server to access files
- Design a multi-agent system using Google's Agent Development Kit (ADK), consisting of specialized agents for file operations
- Coordinate the multi-agent workflow through an orchestrator that uses the Agent2Agent (A2A) protocol to connect to the agents

You'll start with a local file-processing app, refactor it to work with Box's MCP server, then evolve it into a multi-agent system.

Sign up here:
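For a sense of what "using the tools provided via MCP" means on the wire, here is a small sketch. MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool name and an arguments object. The tool name `read_file_text` and the `file_id` argument below are made up for illustration; the tools a given server (including Box's) actually exposes are discovered at runtime via `tools/list`.

```typescript
// Shape of an MCP tool invocation as a JSON-RPC 2.0 request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Build a request for a named tool. The client never talks to the file
// API directly; it only names a tool and passes arguments.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical tool name and argument, for illustration only.
const request = buildToolCall("read_file_text", { file_id: "12345" });
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would build and send these messages for you; the point of the sketch is that the application code depends on tool names and arguments rather than on a vendor-specific SDK.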

Andrew Ng

81,696 views • 10 months ago

AG-UI makes building agentic applications dramatically easier. Here's how it works.

This is a model for a simple chatbot:

User → LLM → Response

But interactive agents that render UI, pause for approvals, and ask users for input need a much more complex model. When building these agents, a response from the LLM will include a series of state changes as the agent runs:
• Agent started a task
• Agent called a tool
• Agent updated its state
• Agent streams these tokens
• Agent is waiting on a human
• Agent is resuming the task

The Agent-User Interaction Protocol (AG-UI) treats the LLM response as a stream of events rather than a text endpoint. In practice, here is what you get as an agent runs:
1. Lifecycle events so your UI knows where the agent is.
2. Text messages that stream tokens.
3. Tool calls so your UI can prefill a form with any required arguments.
4. State updates that keep your UI in sync with the agent.
5. Special events for human approvals, rich media, and custom needs.

All of these events travel over standard transports (SSE, WebSockets, or plain HTTP) as JSON. As a result, you can build a frontend that stays in sync with the agent's progress without having to invent a custom process to make this happen. For example, building a human-in-the-loop workflow becomes an off-the-shelf component you can integrate rather than build from scratch.

CopilotKit🪁 is the creator of AG-UI, and you can use it when building frontend applications pretty much anywhere:
• React
• Angular
• Vue
• React Native
• Slack
• Teams
• Discord
• WhatsApp
• Telegram

Here is the link for you to check it out:

Thanks to the CopilotKit team for partnering with me on this post.
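The "stream of events" model can be sketched as a small reducer that folds incoming JSON events into UI state. The event names and payload shapes below are simplified stand-ins chosen for this example, not the official AG-UI event types, which are defined in the protocol's own documentation.

```typescript
// Hypothetical event shapes for illustration; real AG-UI events differ.
type AgentEvent =
  | { type: "run_started" }
  | { type: "text_delta"; delta: string }
  | { type: "tool_call"; name: string; args: Record<string, unknown> }
  | { type: "state_update"; patch: Record<string, unknown> }
  | { type: "awaiting_human"; prompt: string }
  | { type: "run_resumed" }
  | { type: "run_finished" };

interface UiState {
  status: "idle" | "running" | "waiting" | "done";
  text: string;
  toolCalls: { name: string; args: Record<string, unknown> }[];
  shared: Record<string, unknown>;
  pendingPrompt?: string;
}

const initialState: UiState = { status: "idle", text: "", toolCalls: [], shared: {} };

// Fold one event into the UI state without mutating the previous state.
function reduce(state: UiState, event: AgentEvent): UiState {
  switch (event.type) {
    case "run_started":
    case "run_resumed":
      return { ...state, status: "running", pendingPrompt: undefined };
    case "text_delta":
      return { ...state, text: state.text + event.delta };
    case "tool_call":
      return { ...state, toolCalls: [...state.toolCalls, { name: event.name, args: event.args }] };
    case "state_update":
      return { ...state, shared: { ...state.shared, ...event.patch } };
    case "awaiting_human":
      return { ...state, status: "waiting", pendingPrompt: event.prompt };
    case "run_finished":
      return { ...state, status: "done", pendingPrompt: undefined };
  }
}

// Each message carries one JSON event (for example, one SSE `data:` payload).
function applyStream(messages: string[]): UiState {
  return messages.map((m) => JSON.parse(m) as AgentEvent).reduce(reduce, initialState);
}

const ui = applyStream([
  '{"type":"run_started"}',
  '{"type":"text_delta","delta":"Hel"}',
  '{"type":"text_delta","delta":"lo"}',
  '{"type":"tool_call","name":"book_meeting","args":{"day":"Tuesday"}}',
  '{"type":"awaiting_human","prompt":"Approve booking?"}',
]);
console.log(ui.status, ui.text, ui.pendingPrompt);
```

Because the frontend is just a function of the events seen so far, a human-in-the-loop pause is simply another state the UI renders rather than a special code path.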

Santiago

17,438 views • 21 days ago

Anthropic's in trouble, again.

The entire Claude experience is now available at 1/6th the price. Kimi now does everything Claude does, powered by K2.6, a 1-trillion-parameter MoE model that activates only 32B parameters per token.

It covers all three features Claude has (Chat, Code, and Cowork):

1) Kimi Chat runs in four modes:
- Instant for fast responses
- Thinking for deep reasoning
- Agent for multi-step execution
- Agent Swarm for parallel workloads

There's a 262K context window across all of them.

2) Kimi Code is the open-source CLI coding agent with K2.6 as the default backend. K2.6 ranked #1 on OpenRouter's programming leaderboard by weekly usage.

3) Kimi Agent is the Cowork equivalent. It generates:
- full websites with database and auth
- presentation decks (editable PPTX output)
- spreadsheets with formulas and charts
- word docs and structured research reports

On top of this, Kimi K2.6 is also trained to decompose tasks into up to 300 parallel sub-agents. This helps it retain coherence even across 4,000+ tool calls in a single run, with sessions sustaining up to 13 hours.

On SWE-Bench Pro:
- Kimi K2.6 → 58.6
- GPT-5.4 xhigh → 57.7
- Gemini 3.1 Pro → 54.2
- Claude Opus 4.6 → 53.4

Kimi K2.6 model is open weights and self-hostable on 4x H100s in INT4.

Find the link to the HuggingFace model page in the replies!

Avi Chawla

109,818 views • 2 months ago

Sam Altman said on camera at Stripe Sessions that we are in an insane area, stuck with old devices and old operating systems. The OS and the browser are built for humans clicking. Both get rewritten around agents that do not click.

Two years deep building agents. Sharpest framing I have heard a frontier lab CEO put on the next interface.

↓ Read it before the gap locks in

The part nobody is sitting with:
>the chat window is the old interface
>the new one is an agent in a loop with your machine
>the gap between people prompting and people running agents is forming right now
>in six months it will be permanent

Open Claude, type a question, copy the answer, paste it into VS Code. The average dev runs this loop every day. They think they are using AI. That is 5% of what these tools can do.

The people pulling ahead do not prompt. They build harnesses. They run agents in loops. Same model. Different output.

Full 2026 agent playbook in the article below here:

Rohit

120,600 views • 2 months ago