🚀 A new Amazon Q Developer agent for software development is available to customers. It is based on a new agent architecture with exciting results on SWE-bench (both the full and verified benchmarks), which measures AI models’ ability to resolve real-world coding problems. With these newest updates, Q completed nearly 50% more coding tasks successfully.

What makes Q Dev Agent remarkable? The agent architecture is not just about using the best LLMs (which we do), but also about giving the agent the ability to constantly explore multiple paths to find the best way to resolve a particular problem (and backtracking when it has reached a dead end, like a developer would).

Needless to say, we are just getting started on the developer agent, and we are constantly pushing to advance our AI capabilities while maintaining quality, security, privacy, and reliability to keep Amazon Q Developer an innovative and trusted option for customers using agents for software development.

We highlighted the results of our first SWE-bench submission of Amazon Q Developer in a blog post back in June; with these updates, our new agent resolves 51% more coding tasks than its previous iteration on the SWE-bench verified dataset, and 43% more on the full dataset. That’s the difference a few months make, and I can’t wait to share what our teams will deliver at re:Invent this December. Here's a quick demo showcasing our new agent in action.

Swami Sivasubramanian

5,632 subscribers

28,946 views • 1 year ago • via X (Twitter)

Related Videos

SWE-agent is an open-source software engineering agent with a 12.3% resolve rate on SWE-bench! Check out SWE-agent in action at Repo:

carlos

138,703 views • 2 years ago

I wanted to share a quick demo of what we've been working on with our AI agent cloud. This enables fast deployment of agents that have access to a suite of tools, and was designed with agent interoperability in mind. This demo shows how you can go from nothing to an AI Twitter agent in a couple of minutes. This is what we are using internally to manage deployments, so we will be consistently upgrading its capabilities. The next goals are to enable simple TEE deployments for agents and to focus on building out features for agent interoperability to simplify agent-to-agent collaboration.

Johnny

26,738 views • 1 year ago

Experience a dynamic, interactive chat experience with our new #AmazonQ Developer CLI agent! You can ask Q Developer to write code, automatically debug issues, & more & Q Developer will iteratively make updates based on your feedback. Learn more. 👉

Amazon Web Services

12,559 views • 1 year ago

Introducing ALE-Bench, ALE-Agent! Towards Automating Long-Horizon Algorithm Engineering for Hard Optimization Problems Blog: Paper: ALE-Bench is a coding benchmark primarily focused on hard optimization (NP-hard) problems. We developed this benchmark with AtCoder Inc., a leading coding contest platform company. What makes ALE-Bench unique is its focus on hard optimization problems that demand long-horizon and creative reasoning. It’s open-ended, in the sense that true optima are out of reach (NP-hard) and scores can continuously improve. We believe this benchmark has the potential to become one of the key benchmarks for reasoning and coding in the next generation. ALE-Agent is our end-to-end agent that we specifically designed for this challenging domain. In fact, our ALE-Agent has already built an impressive track record in the wild! In May 2025, our agent participated in a live AtCoder Heuristic Competition (AHC), alongside 1,000 other participants in real-time. AHC is considered to be one of the most challenging coding competitions in this domain. Our ALE-Agent achieved an impressive ranking of 21st out of 1,000 human participants in the competition (top 2%), marking a turning point for AI discovery of solutions to hard optimization problems with a wide spectrum of important real world applications such as logistics, routing, packing, factory production planning, power-grid balancing. We look forward to applying this technology to real industrial optimization opportunities. Building on the insights from this study, Sakana AI will continue to tackle the challenge of developing AI with even greater algorithm engineering capabilities. ALE-Bench Dataset: ALE-Bench Code: This research was conducted in collaboration with AtCoder Inc. (AtCoder). We are deeply grateful for their outstanding expertise and contributions in optimization and algorithms, which were invaluable in providing data, analyzing results, and enabling our AI agent’s participation in their contests.

Sakana AI

237,195 views • 11 months ago

AI agents are becoming increasingly capable of taking actions beyond the chat window and interacting with tools in the real world, such as web browsers. We explored this capability by having an AI agent present one of our interactive web-based demos. In this video, it's presenting our election interference demo. This agent is a Cursor coding agent and is equipped with tools to read and interact with the browser (using Playwright) and perform text-to-speech (with ElevenLabs), in addition to the standard Cursor tools.

CivAI

18,397 views • 6 months ago

Introducing Warp 2.0: the Agentic Development Environment 1️⃣ Top overall coding agent: #1 on Terminal-Bench, 71% on SWE-bench Verified 2️⃣ Agent multi-threading: build features, debug, and ship all at once 3️⃣ The first all-in-one platform for agentic development 🧵 Learn more

Warp

1,445,551 views • 11 months ago

We are excited to announce a powerful step for the future of FOMO! Taking a page out of Virtuals’ book on BASE, FOMO will be releasing the ability for future projects to be paired in $FOMO in the coming weeks. This is the biggest release we have ever announced. Launch your AI Agent Token + $FOMO trading pair. Every individual agent token is paired with the $FOMO token in its liquidity pool. When launching an agent on you will need $FOMO tokens, which are used to create the liquidity pool. This process creates deflationary pressure for FOMO and the entire agent ecosystem. When creating your agent and token, you will have the option to pair your launch with FOMO or SOL, as our goal is not to alienate any project, but rather to invite the best communities, CTOs and builders to launch with us. If you decide to pair your project with FOMO, you in turn get full marketing and dev support once your project graduates the bonding curve and reaches Raydium. Further, as an added incentive, as our revenue grows we will be using part of the funds to support projects that have paired in FOMO. And devs who launch tokens paired in FOMO will earn fees from their AI Agent token launch. Building the most robust agents using our framework will catapult us to being one of the most prominent standards of the Solana ecosystem. Not only have we developed our own core infrastructure, but we also pull from some of the best repos and developer talent in all of AI, not just blockchain. Our team comprises 9 world-class artificial intelligence engineers, PhDs in mathematics and engineering from the top companies on the cutting edge of AI. The future of AI Agents will be on Solana and we will help lead the way.

FOMO

129,858 views • 1 year ago

We are excited to share a research preview of our generative agent. The agent is being trained to solve the hardest tasks in 3D and beyond, using only keyboard and mouse actions. Join the waitlist: Our agent app runs on Windows or Mac, either locally or with one-click setup for a Windows VM. It’s still early days, but this paves the way for production-level workflows for the first time ever. Blog:

Common Sense Machines

155,199 views • 1 year ago

Today we’re shipping new capabilities that make Resolve AI the platform where engineering teams run and fix production software with AI agents. New capabilities include background agents that run operational tasks, new agent architecture that delivers 2x investigation quality, new agent capabilities like governed actions, and new ways to work with agents in UI or terminal. With Resolve AI engineering teams can: - Delegate on-call to agents - Co-work with agents to resolve incidents - Run operational tasks with background agents.

Resolve AI

313,579 views • 19 days ago

The Claude Code SDK is now the Claude Agent SDK. Why? Because we realized the Claude Code agent harness is useful for much more than coding. In fact, we're moving to using it to power most of our own agent loops at Anthropic.

Thariq

206,976 views • 8 months ago

New AI Package with DeepSeek AI Model. TesseractAI continues to expand and improve our Custom AI Agent. You will now be able to add AI packages to your agent in addition to custom information. 🌟 This new feature is being developed using the #DeepSeek AI model. In particular, DeepSeek R1 is designed for greater efficiency and more goal-oriented tasks, requiring fewer computational resources while still delivering powerful results. Additionally, we’re experimenting with DeepSeek’s technology for our existing Agent LLM to further enhance performance. Our goal is to make the Custom AI Agent even more efficient and effective for all your needs.

Tesseract AI

13,234 views • 1 year ago

🚨 Secure OpenClaw - Message Your Custom AI Agent From Telegram. We are excited for our first release of personal agents. You can now use Abacus AI's DeepAgent to vibe code a personal agent on Telegram. Create and simply message the agent! - do tasks like respond to emails - schedule meetings - use the browser to book tickets - connect to your internal systems. The agent runs in a secure environment with constraints, and you can pretty much customize it exactly like you want when you are building it. This is just v1 - we are going to be launching something WAY more exciting soon

Bindu Reddy

10,702 views • 3 months ago

Introducing OB-1: the new #1 coding agent on Terminal Bench. After a year of R&D, our agent now outperforms Codex and Claude Code. Early access is rolling out to waitlist users now.

OpenBlock

75,814 views • 9 months ago

🧬 Excited to open-source Biomni! With just a few lines of code, you can now automate biomedical research with an AI agent! We are releasing Biomni A1 (agent) + E1 (env) with 150 specialized tools, 59 databases, and 105 software packages. E1 is our first attempt at curating the bio-agent environment, but it only scratches the surface. A call to help build Biomni-E2: an open environment for bio-agents, built with and for the community! We welcome new tools, benchmarks, datasets, agents, and beyond to build the environment together. Significant contributors will be invited as co-authors on our upcoming E2 manuscript. More to come! 💻 🌐 Biomni

Kexin Huang

34,698 views • 11 months ago

Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser. When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted. Check out what Devin can do in the thread below.
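To put the quoted SWE-Bench figures side by side, the short calculation below works out how many times higher the 13.86% resolve rate is than each of the two earlier baselines. It uses only the three percentages stated in the post; the variable names are illustrative.

```python
# Resolve rates quoted in the post, in percent.
devin_unassisted = 13.86
prior_unassisted = 1.96
prior_assisted = 4.80

# Ratio of the new rate to each earlier baseline.
ratio_vs_unassisted = devin_unassisted / prior_unassisted
ratio_vs_assisted = devin_unassisted / prior_assisted

# Absolute gap in percentage points.
gap_vs_unassisted = devin_unassisted - prior_unassisted
gap_vs_assisted = devin_unassisted - prior_assisted

print(f"vs. unassisted baseline: {ratio_vs_unassisted:.2f}x, +{gap_vs_unassisted:.2f} points")
print(f"vs. assisted baseline:   {ratio_vs_assisted:.2f}x, +{gap_vs_assisted:.2f} points")
```

The ratios come out to roughly 7.07x against the unassisted baseline and roughly 2.89x against the assisted one.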

Cognition

31,437,532 views • 2 years ago

The freedom to build with any agent framework you want is analogous to the freedom of religion, and this is a core value for the Virtuals society. Here are a few ways we are opening up support for every autonomous agentic framework out there: Today: - Enabling agent creators to specify which framework the agent runs on - Release of Terminal API, which allows agent builders using other agent frameworks to stream their thoughts and activities live on their agent pages. For a guide on how to use this, refer to: - Ability for agents running on any framework to publicly list their capabilities, a key enabler for the agent-to-agent marketplace Soon: - Integration into the agentic commerce standard and registry, with agents across frameworks being able to orchestrate and pay for tasks among themselves - Multi-agent orchestration across frameworks. Imagine building an autonomous business with agents from different religions. Welcome to the Virtuals society.

Virtuals Protocol

216,070 views • 1 year ago

Watch us use the Agent Manager to delegate research and coding tasks to a first-class AI partner. It's the new home base for turning ambiguous ideas into structured artifacts. See the Agent Manager in action on this week’s livestream →

Google Cloud Tech

22,302 views • 3 months ago

You can create an AI Agent that answers your email with a few clicks. 1. Go to ChatLLM 2. Click on AI Engineer 3. Select Create an AI Agent 4. Choose the Email Answering Agent. ChatLLM will do the rest: it will code, test, and deploy the agent for you. You can also create a custom agent in English. The Agent Economy is coming (somebody should write a book and use this title.) We are going to see examples like this, times 1,000 in 2025. Just think about how many repetitive tasks you perform every day. Some of these tasks are involved enough that we couldn't automate them with pre-AI solutions. That's where we'll see agents explode, and I'm here for it.

Santiago

79,974 views • 1 year ago

Submission for Lexicon 🚀 Integrated AI Agent (ORBIS AI AGENT) from Lexicon AI Framework to participate in Lexicon AI Hackathon. This is Our Live Demo of the ORBIS AI AGENT@solana Visit and interact with your ORBIS AI AGENT -

orbis ai

30,222 views • 1 year ago

i just built a 4-agent software team. everything runs from Telegram and gets managed on a kanban board. a project manager who plans the work, a backend developer, a frontend developer, and a tester. the PM reads a goal, breaks it into linked tasks, and assigns each to the right agent. the thing that makes them a team instead of four strangers is a shared kanban board. every task is a row that survives crashes, and when an agent finishes, it writes a summary of what it built and what the next agent needs to know. the next agent reads that summary before it starts. so the frontend developer never has to guess the API shape, and the tester knows exactly what to verify. the hardest part was not the coordination. it was building an agent that could actually act like a backend engineer. a backend engineer stands up a database, wires auth, manages storage, deploys functions, and keeps all of it consistent while the rest of the team builds on top. an agent doing this from scratch drowns. it burns its context window remembering which tables exist and which endpoint it created three steps ago, and the work degrades fast. so the backend agent needs a backend built for agents, not for humans clicking through a dashboard. that is where InsForge came in. it is an open-source, agent-native backend, and i added it to my backend developer agent as a skill. a skill is a step-by-step guide that teaches the agent how to do a specific kind of work. with InsForge installed, the agent stopped improvising infrastructure and followed a reliable path: create the project, define the database, set up auth, deploy functions. to test the whole team, i had them build a working Google Docs clone, AI features included. the backend agent spun up the full service on its own. database tables, user auth, document handling, and edge functions running real TypeScript, all in one dashboard. the frontend agent read that summary and built the UI on top of it, and the tester closed the loop. 
the result was a backend an agent could reason about end to end, instead of one it kept getting lost inside. if you are building an AI backend engineer, InsForge is worth a look, it's 100% open-source. InsForge GitHub: (don't forget to star 🌟) the full article on Hermes Kanban: Mission Control for your Agents is quoted below.
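the shared-board handoff described above can be sketched in a few lines. this is only an in-memory illustration under stated assumptions: the `Task` and `Board` classes, their fields, and the example tasks are hypothetical names, not the Hermes Kanban or InsForge API, and a real board would persist rows to storage so that they survive crashes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    task_id: str
    role: str                         # which agent should pick this task up
    goal: str
    depends_on: Optional[str] = None  # id of the task this one is linked to
    status: str = "todo"              # "todo" or "done"
    summary: str = ""                 # handoff note written on completion


class Board:
    """In-memory stand-in for the shared kanban board (hypothetical)."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Task] = {}

    def add(self, task: Task) -> None:
        self.tasks[task.task_id] = task

    def _dependency(self, task: Task) -> Optional[Task]:
        return self.tasks[task.depends_on] if task.depends_on else None

    def next_for(self, role: str) -> Optional[Task]:
        """Return the first open task for a role whose dependency is done."""
        for task in self.tasks.values():
            if task.role != role or task.status != "todo":
                continue
            dep = self._dependency(task)
            if dep is None or dep.status == "done":
                return task
        return None

    def handoff_note(self, task: Task) -> str:
        """Summary left by the previous agent, read before starting work."""
        dep = self._dependency(task)
        return dep.summary if dep else ""

    def complete(self, task_id: str, summary: str) -> None:
        task = self.tasks[task_id]
        task.status = "done"
        task.summary = summary


# The PM breaks a goal into linked tasks, one per agent.
board = Board()
board.add(Task("t1", "backend", "Build the documents API"))
board.add(Task("t2", "frontend", "Build the editor UI", depends_on="t1"))
board.add(Task("t3", "tester", "Verify the editor end to end", depends_on="t2"))

blocked = board.next_for("frontend")  # None: the backend task is not done yet
board.complete("t1", "POST /docs creates a document; GET /docs/{id} returns it")
ready = board.next_for("frontend")    # now t2 is available
note = board.handoff_note(ready)      # the frontend agent reads the API shape
print(ready.task_id, "->", note)
```

the point of the design is that the summary travels with the task row, so the next agent reads the API shape from the board instead of guessing it or spending context on rediscovering it.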

Akshay 🚀

114,135 views • 3 days ago