SWE-Agent is an open-source software engineering agent with a 12.3% resolve rate on SWE-Bench! Check out SWE-agent in action at Repo:

carlos

1,348 subscribers

138,703 views • 2 years ago • via X (Twitter)

10 comments

carlos • 2 years ago

The SWE-agent open-source repository provides a framework for turning general LMs into software engineering agents. SWE-agent lets LMs like GPT-4 interact with their own Docker container through an Agent Computer Interface (ACI), allowing them to browse, search, edit, and run code.

carlos • 2 years ago

It’s been amazing to work on this with such a great team: @jyangballin*, @_carlosejimenez*, @_awettig, @ShunyuYao12, @karthik_r_n, and @OfirPress. Keep an eye out for the paper coming out April 10th!

MindBranches • 2 years ago

Here is an overview of the new open-source Software Engineering Agent (SWE-Agent):

Elman Mansimov • 2 years ago

@OfirPress Very cool!

Eddie Forson • 2 years ago

Nice work! Already starred the repo. Will take a look at the code in the next few days 👌🏿

Harry Tormey 🇮🇪 | 🇺🇸| 🇺🇦 • 2 years ago

Amazing work with SWE-Bench and now this! Thanks for publishing your research and the source to this!

Konisberg Heinrich • 2 years ago

Incredible!

Gopinathan A • 2 years ago

Nice work!!

Hadi • 2 years ago

Does it just work with Anthropic and OpenAI models right now?

Owen Campbell-Moore ✪ • 2 years ago

(OpenAI PM here!) Super cool! I'm curious, what changes to our models or APIs would have made this easier to build or would make it work better? 🤔

Related videos

"You'll see this in a big way with the software engineering agent." He points to SWE agents, where 50¢ of compute can yield "$500 / $5,000 of work." The SWE agent will be the first major impact that will make people think about the economy with shock.

Chubby♨️

111,704 views • 1 year ago

MiniMax M2.1 was officially announced, scoring 72.5% on SWE-multilingual. "A SOTA 10B-activated OSS coding & agent model, scoring 72.5% on SWE-multilingual and 88.6% on our newly open-sourced VIBE-bench"

TestingCatalog News 🗞

16,235 views • 5 months ago

In <4 months, Ridges used an open, incentivized tournament to produce the top open-source coding agent. • It outperforms all open-source agentic systems on SWE-Bench • Beats closed models from billion-dollar labs • Was done with $0 VC funding

Sami Kassab

58,074 views • 8 months ago

GitHub Copilot just got an upgrade 🤖 Agent mode is now available in preview and Copilot Edits is generally available, both in Visual Studio Code. Plus you can check out a sneak peek at our SWE agent 👀

GitHub

110,497 views • 1 year ago

Ok Tokyo devs, lets do this thing. Here is a first glimpse at our autonomous SWE agent "Project Padawan" -- in Japanese! (1/3)

Thomas Dohmke

29,687 views • 1 year ago

Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser. When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted. Check out what Devin can do in the thread below.

Cognition

31,438,774 views • 2 years ago

Introducing Warp 2.0: the Agentic Development Environment 1️⃣ Top overall coding agent: #1 on Terminal-Bench, 71% on SWE-bench Verified 2️⃣ Agent multi-threading: build features, debug, and ship all at once 3️⃣ The first all-in-one platform for agentic development 🧵 Learn more

Warp

1,445,551 views • 11 months ago

🚀New Amazon Q Developer agent for software development is available to customers: This agent is based on a new agent architecture that has exciting results coming from the SWE-bench scores (on the full and verified benchmarks) representing AI models’ ability to resolve real-world coding problems. Interesting aspect of Q Agent is that with these newest updates, Q drove nearly 50% more successful coding tasks completed. What makes Q Dev Agent remarkable? The agent architecture is not just about using the best LLMs (which we do), but also giving the agent the ability to constantly explore multiple paths to find the best way to resolve a particular problem (and back tracking when it has reached dead end like a developer would do). Needless to say, we are just getting started on the developer agent and we are constantly pushing to advance our AI capabilities while maintaining quality, security, privacy, and reliability to keep Amazon Q Developer an innovative and trusted option available to our customers using agents for software development. We highlighted the results of our first SWE-bench submission of Amazon Q Developer back in June blog post; with these updates, our new agent resolves 51% more coding tasks than its previous iteration on the SWE-bench verified dataset, and 43% more on the full dataset. That’s the difference a few months make, and I can’t wait to share what our teams will deliver at re:Invent this December. Here's a quick demo showcasing our new Agent in action:

Swami Sivasubramanian

28,946 views • 1 year ago

Wave 9 is here: a frontier model built for software engineering. Introducing our new family of models: SWE-1, SWE-1-lite, and SWE-1-mini. Based on internal evals, it has performance nearing that of frontier models from the foundation labs. Available now, only in Windsurf!

Devin Desktop

742,599 views • 1 year ago

BREAKING 🚨: Github Copilot got an agent mode on VSCode, and the SWE agent integrated straight into github 👀 There you can assign an issue to Copilot, review PR and profit! 🤯

TestingCatalog News 🗞

40,928 views • 1 year ago

Introducing Warp Code: a partner you can trust, from prompt to production 🤠 Warp Code features: 👉 Top coding agent: #1 on Terminal-Bench, #3 on SWE-bench Verified 👉 Code review panel 👉 Lightweight code editor 👉 Slash commands, and agent profiles

Warp

112,732 views • 9 months ago

How did we build & scale Replit Agent? Find out from a Replit AI engineer (and a good friend), James Austin ⠕ We chat about building Replit Agent, what makes a great SWE, and James' journey to becoming an AI engineer

matt palmer

26,222 views • 6 months ago

Introducing SWE-grep and SWE-grep-mini: Cognition’s model family for fast agentic search at >2,800 TPS. Surface the right files to your coding agent 20x faster. Now rolling out gradually to Windsurf users via the Fast Context subagent – or try it in our new playground!

Cognition

663,379 views • 8 months ago

Today we’re releasing Ramp SWE-Bench: a private, production-grounded coding benchmark created from real engineering problems we've faced at Ramp.

Ramp Labs

165,111 views • 3 days ago

I'm delighted to say that Genie, our fully automated SWE agent is finally GA. It achieves a new SOTA score on the SWE-Lancer benchmark earning $88k with a 49% resolution rate – significantly out-earning any frontier reasoning model, for a fraction of the cost. Genie can work with you, or fully autonomously, and integrates with all of the tools developers love to use. Go give it a try on I'd love to hear any feedback!

Alistair

189,147 views • 1 year ago

Warp Code launched yesterday — here’s what’s new: - Top coding agent: #3 SWE-bench, 52% Terminal-Bench - Built-in code review - Native editor - Slash Commands, Project Rules, and more We’re already seeing millions more lines of code shipped through Warp.

Warp

11,745 views • 9 months ago

Introducing Fast Deep Coder by NinjaTech AI - a coding agent with its own cloud VM ‣ Sonnet-level capability (69.6% SWE-bench) ‣ 5–10x faster using Qwen3 Coder on Cerebras ‣ Runs in a persistent cloud VM, freeing up your laptop ‣ Native GitHub integration Try it here:

Cerebras

21,091 views • 9 months ago

I'm excited to share that we've built the world's most capable AI software engineer, achieving 30.08% on SWE-Bench – ahead of Amazon and Cognition. This model is so much more than a benchmark score: it was trained from the start to think and behave like a human SWE.

Alistair

819,997 views • 1 year ago

Cursor, Windsurf … all cool. But @AugmentCode just dropped `Augment Agent` 200K context. Persistent memory. Deep tool integrations. ... and it just hit #1 on SWE-bench Verified (65.4% on real tasks). It’s kind of a big deal. Let me show you 🧵 ↓

Charly Wargnier

63,321 views • 1 year ago

📣 Introducing SWE-PolyBench: A new open-source multilingual benchmark for evaluating #AI coding agents SWE-PolyBench is the first benchmark to evaluate AI coding agents' ability to understand complex codebases, helping advance AI performance in the real world. Learn more. 👉

Amazon Web Services

10,866 views • 1 year ago