SWE-agent is an open-source software engineering agent with a 12.3% resolve rate on SWE-bench! Check out SWE-agent in action at Repo:

carlos

1,372 subscribers

138,830 views • 2 years ago • via X (Twitter)

10 Comments

carlos • 2 years ago

The SWE-agent open-source repository provides a framework for turning general LMs into software engineering agents. SWE-agent lets LMs like GPT-4 interact with their own Docker container through an Agent-Computer Interface (ACI), allowing them to browse, search, edit, and run code.
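
To make the interface idea concrete, here is a minimal sketch of a text-command loop of the kind described above. It is not SWE-agent's actual ACI: the `ToyACI` class, its `open`/`search`/`edit` commands, and the in-memory file store are all hypothetical, and running code inside a Docker container is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyACI:
    """Hypothetical in-memory stand-in for an agent-computer interface."""

    files: dict = field(default_factory=dict)  # path -> list of lines
    current: Optional[str] = None              # file the agent has open

    def handle(self, command: str) -> str:
        """Dispatch one text command from the model and return text feedback."""
        name, _, arg = command.strip().partition(" ")
        if name == "open":
            if arg not in self.files:
                return f"error: no such file {arg}"
            self.current = arg
            # Show numbered lines so later edits can refer to them.
            return "\n".join(
                f"{i}: {line}" for i, line in enumerate(self.files[arg], 1)
            )
        if name == "search":
            hits = [
                f"{path}:{i}: {line}"
                for path, lines in self.files.items()
                for i, line in enumerate(lines, 1)
                if arg in line
            ]
            return "\n".join(hits) or "no matches"
        if name == "edit":
            if self.current is None:
                return "error: open a file first"
            number, _, text = arg.partition(" ")
            if not number.isdigit():
                return "error: usage: edit <line> <text>"
            index = int(number) - 1
            lines = self.files[self.current]
            if not 0 <= index < len(lines):
                return "error: line out of range"
            lines[index] = text
            return f"edited {self.current}:{number}"
        return f"error: unknown command {name}"
```

In this sketch every command returns plain text, including errors, so a model driving the loop always gets feedback it can act on in its next turn.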

carlos • 2 years ago

It’s been amazing to work on this with such a great team: @jyangballin*, @_carlosejimenez*, @_awettig, @ShunyuYao12, @karthik_r_n, and @OfirPress. Keep an eye out for the paper coming out April 10th!

MindBranches • 2 years ago

Here is an overview of the new open-source Software Engineering Agent (SWE-agent):

Elman Mansimov • 2 years ago

@OfirPress Very cool!

Eddie Forson • 2 years ago

Nice work! Already starred the repo. Will take a look at the code in the next few days 👌🏿

Harry Tormey 🇮🇪 | 🇺🇸 | 🇺🇦 • 2 years ago

Amazing work with SWE-bench and now this! Thanks for publishing your research and the source for this!

Konisberg Heinrich • 2 years ago

Incredible!

Gopinathan A • 2 years ago

Nice work!!

Hadi • 2 years ago

Does it only work with Anthropic and OpenAI models right now?

Owen Campbell-Moore ✪ • 2 years ago

(OpenAI PM here!) Super cool! I'm curious, what changes to our models or APIs would have made this easier to build or would make it work better? 🤔

Related Videos

"You'll see this in a big way with the software engineering agent." He points to SWE agents, where 50¢ of compute can yield "$500 / $5,000 of work." The SWE agent will be the first major impact that will make people think about the economy with shock.

Chubby♨️

111,704 views • 1 year ago

MiniMax M2.1 was officially announced, scoring 72.5% on SWE-multilingual. "A SOTA 10B-activated OSS coding & agent model, scoring 72.5% on SWE-multilingual and 88.6% on our newly open-sourced VIBE-bench"

TestingCatalog News 🗞

16,235 views • 7 months ago

In <4 months, Ridges used an open, incentivized tournament to produce the top open-source coding agent. • It outperforms all open-source agentic systems on SWE-Bench • Beats closed models from billion-dollar labs • Was done with $0 VC funding

Sami Kassab

58,074 views • 10 months ago

GitHub Copilot just got an upgrade 🤖 Agent mode is now available in preview and Copilot Edits is generally available, both in Visual Studio Code. Plus you can check out a sneak peek at our SWE agent 👀

GitHub

110,587 views • 1 year ago

Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser. When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted. Check out what Devin can do in the thread below.

Cognition

31,450,230 views • 2 years ago

OK Tokyo devs, let's do this thing. Here is a first glimpse at our autonomous SWE agent "Project Padawan", in Japanese! (1/3)

Thomas Dohmke

29,687 views • 1 year ago

Introducing Warp 2.0: the Agentic Development Environment 1️⃣ Top overall coding agent: #1 on Terminal-Bench, 71% on SWE-bench Verified 2️⃣ Agent multi-threading: build features, debug, and ship all at once 3️⃣ The first all-in-one platform for agentic development 🧵 Learn more

Warp

1,445,717 views • 1 year ago

🚀 New Amazon Q Developer agent for software development is available to customers: This agent is based on a new agent architecture that has exciting results coming from the SWE-bench scores (on the full and verified benchmarks) representing AI models’ ability to resolve real-world coding problems. An interesting aspect of Q Agent is that with these newest updates, Q completed nearly 50% more coding tasks successfully. What makes Q Dev Agent remarkable? The agent architecture is not just about using the best LLMs (which we do), but also about giving the agent the ability to constantly explore multiple paths to find the best way to resolve a particular problem (and backtracking when it has reached a dead end, like a developer would do). Needless to say, we are just getting started on the developer agent and we are constantly pushing to advance our AI capabilities while maintaining quality, security, privacy, and reliability to keep Amazon Q Developer an innovative and trusted option available to our customers using agents for software development. We highlighted the results of our first SWE-bench submission of Amazon Q Developer back in a June blog post; with these updates, our new agent resolves 51% more coding tasks than its previous iteration on the SWE-bench Verified dataset, and 43% more on the full dataset. That’s the difference a few months make, and I can’t wait to share what our teams will deliver at re:Invent this December. Here's a quick demo showcasing our new Agent in action:

Swami Sivasubramanian

28,946 views • 1 year ago

Wave 9 is here: a frontier model built for software engineering. Introducing our new family of models: SWE-1, SWE-1-lite, and SWE-1-mini. Based on internal evals, it has performance nearing that of frontier models from the foundation labs. Available now, only in Windsurf!

Devin Desktop

742,703 views • 1 year ago

Introducing Warp Code: a partner you can trust, from prompt to production 🤠 Warp Code features: 👉 Top coding agent: #1 on Terminal-Bench, #3 on SWE-bench Verified 👉 Code review panel 👉 Lightweight code editor 👉 Slash commands, and agent profiles

Warp

112,813 views • 10 months ago

Excited to open source our world-model-harness! `wmh` makes it easy to go from agent traces -> faithful replication of your production environment. Basically, an LLM pretends to be a Docker container but 5x faster. Below is a comparison running 8 SWE-bench tasks.

Silen Naihin

13,523 views • 25 days ago

How did we build & scale Replit Agent? Find out from a Replit AI engineer (and a good friend), James Austin. We chat about building Replit Agent, what makes a great SWE, and James' journey to becoming an AI engineer.

matt palmer

26,471 views • 7 months ago

Introducing SWE-grep and SWE-grep-mini: Cognition’s model family for fast agentic search at >2,800 TPS. Surface the right files to your coding agent 20x faster. Now rolling out gradually to Windsurf users via the Fast Context subagent – or try it in our new playground!

Cognition

663,797 views • 9 months ago

Today we’re releasing Ramp SWE-Bench: a private, production-grounded coding benchmark created from real engineering problems we've faced at Ramp.

Ramp Labs

185,551 views • 1 month ago

I'm delighted to say that Genie, our fully automated SWE agent is finally GA. It achieves a new SOTA score on the SWE-Lancer benchmark earning $88k with a 49% resolution rate – significantly out-earning any frontier reasoning model, for a fraction of the cost. Genie can work with you, or fully autonomously, and integrates with all of the tools developers love to use. Go give it a try on I'd love to hear any feedback!

Alistair

189,147 views • 1 year ago

Warp Code launched yesterday — here’s what’s new: - Top coding agent: #3 SWE-bench, 52% Terminal-Bench - Built-in code review - Native editor - Slash Commands, Project Rules, and more We’re already seeing millions more lines of code shipped through Warp.

Warp

11,745 views • 10 months ago

Introducing Fast Deep Coder by NinjaTech AI - a coding agent with its own cloud VM ‣ Sonnet-level capability (69.6% SWE-bench) ‣ 5–10x faster using Qwen3 Coder on Cerebras ‣ Runs in a persistent cloud VM, freeing up your laptop ‣ Native GitHub integration Try it here:

Cerebras

21,091 views • 11 months ago

I'm excited to share that we've built the world's most capable AI software engineer, achieving 30.08% on SWE-Bench – ahead of Amazon and Cognition. This model is so much more than a benchmark score: it was trained from the start to think and behave like a human SWE.

Alistair

820,161 views • 1 year ago

Cursor, Windsurf … all cool. But @AugmentCode just dropped `Augment Agent` 200K context. Persistent memory. Deep tool integrations. ... and it just hit #1 on SWE-bench Verified (65.4% on real tasks). It’s kind of a big deal. Let me show you 🧵 ↓

Charly Wargnier

63,421 views • 1 year ago