SWE-agent is an open-source software engineering agent with a 12.3% resolve rate on SWE-Bench! Check out SWE-agent in action at Repo:

carlos

1,348 subscribers

138,703 views • 2 years ago • via X (Twitter)

10 comments

carlos • 2 years ago

The SWE-agent open-source repository provides a framework for turning general LMs into software engineering agents. SWE-agent lets LMs like GPT-4 interact with their own Docker container using an Agent Computer Interface (ACI) - allowing it to browse, search, edit, and run code.

carlos • 2 years ago

It’s been amazing to work on this with such a great team: @jyangballin*, @_carlosejimenez*, @_awettig, @ShunyuYao12, @karthik_r_n, and @OfirPress Keep an eye out for the paper coming out April 10th!

MindBranches • 2 years ago

Here is an overview of the new open source Software Engineering Agent (SWE-Agent):

Elman Mansimov • 2 years ago

@OfirPress Very cool!

Eddie Forson • 2 years ago

Nice work! Already starred the repo. Will take a look at the code in the next few days 👌🏿

Harry Tormey 🇮🇪 | 🇺🇸| 🇺🇦 • 2 years ago

Amazing work with SWE-Bench and now this! Thanks for publishing your research and the source to this!

Konisberg Heinrich • 2 years ago

Incredible!

Gopinathan A • 2 years ago

Nice work!!

Hadi • 2 years ago

Does it just work with Anthropic and OpenAI models right now?

Owen Campbell-Moore ✪ • 2 years ago

(OpenAI PM here!) Super cool! I'm curious, what changes to our models or APIs would have made this easier to build or would make it work better? 🤔

Related videos

"You'll see this in a big way with the software engineering agent." He points to SWE agents, where 50¢ of compute can yield "$500 / $5,000 of work." The SWE agent will be the first major impact that will make people think about the economy with shock.

Chubby♨️

111,704 views • 1 year ago

MiniMax M2.1 was officially announced, scoring 72.5% on SWE-multilingual. "A SOTA 10B-activated OSS coding & agent model, scoring 72.5% on SWE-multilingual and 88.6% on our newly open-sourced VIBE-bench"

TestingCatalog News 🗞

16,235 views • 5 months ago

In <4 months, Ridges used an open, incentivized tournament to produce the top open-source coding agent. • It outperforms all open-source agentic systems on SWE-Bench • Beats closed models from billion-dollar labs • Was done with $0 VC funding

Sami Kassab

58,074 views • 8 months ago

GitHub Copilot just got an upgrade 🤖 Agent mode is now available in preview and Copilot Edits is generally available, both in Visual Studio Code. Plus you can check out a sneak peek at our SWE agent 👀

GitHub

110,497 views • 1 year ago

OK Tokyo devs, let's do this thing. Here is a first glimpse at our autonomous SWE agent "Project Padawan" -- in Japanese! (1/3)

Thomas Dohmke

29,687 views • 1 year ago

Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser. When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted. Check out what Devin can do in the thread below.

Cognition

31,438,184 views • 2 years ago

Introducing Warp 2.0: the Agentic Development Environment 1️⃣ Top overall coding agent: #1 on Terminal-Bench, 71% on SWE-bench Verified 2️⃣ Agent multi-threading: build features, debug, and ship all at once 3️⃣ The first all-in-one platform for agentic development 🧵 Learn more

Warp

1,445,551 views • 11 months ago

🚀New Amazon Q Developer agent for software development is available to customers: This agent is based on a new agent architecture that has exciting results coming from the SWE-bench scores (on the full and verified benchmarks) representing AI models’ ability to resolve real-world coding problems. Interesting aspect of Q Agent is that with these newest updates, Q drove nearly 50% more successful coding tasks completed. What makes Q Dev Agent remarkable? The agent architecture is not just about using the best LLMs (which we do), but also giving the agent the ability to constantly explore multiple paths to find the best way to resolve a particular problem (and back tracking when it has reached dead end like a developer would do). Needless to say, we are just getting started on the developer agent and we are constantly pushing to advance our AI capabilities while maintaining quality, security, privacy, and reliability to keep Amazon Q Developer an innovative and trusted option available to our customers using agents for software development. We highlighted the results of our first SWE-bench submission of Amazon Q Developer back in June blog post; with these updates, our new agent resolves 51% more coding tasks than its previous iteration on the SWE-bench verified dataset, and 43% more on the full dataset. That’s the difference a few months make, and I can’t wait to share what our teams will deliver at re:Invent this December. Here's a quick demo showcasing our new Agent in action:

Swami Sivasubramanian

28,946 views • 1 year ago

BREAKING 🚨: Github Copilot got an agent mode on VSCode, and the SWE agent integrated straight into github 👀 There you can assign an issue to Copilot, review PR and profit! 🤯

TestingCatalog News 🗞

40,928 views • 1 year ago

Wave 9 is here: a frontier model built for software engineering. Introducing our new family of models: SWE-1, SWE-1-lite, and SWE-1-mini. Based on internal evals, it has performance nearing that of frontier models from the foundation labs. Available now, only in Windsurf!

Devin Desktop

742,599 views • 1 year ago

Introducing Warp Code: a partner you can trust, from prompt to production 🤠 Warp Code features: 👉 Top coding agent: #1 on Terminal-Bench, #3 on SWE-bench Verified 👉 Code review panel 👉 Lightweight code editor 👉 Slash commands, and agent profiles

Warp

112,732 views • 9 months ago

How did we build & scale Replit Agent? Find out from a Replit AI engineer (and a good friend), James Austin ⠕ We chat about building Replit Agent, what makes a great SWE, and James' journey to becoming an AI engineer

matt palmer

26,222 views • 6 months ago

Introducing SWE-grep and SWE-grep-mini: Cognition’s model family for fast agentic search at >2,800 TPS. Surface the right files to your coding agent 20x faster. Now rolling out gradually to Windsurf users via the Fast Context subagent – or try it in our new playground!

Cognition

663,379 views • 7 months ago

I'm delighted to say that Genie, our fully automated SWE agent is finally GA. It achieves a new SOTA score on the SWE-Lancer benchmark earning $88k with a 49% resolution rate – significantly out-earning any frontier reasoning model, for a fraction of the cost. Genie can work with you, or fully autonomously, and integrates with all of the tools developers love to use. Go give it a try on I'd love to hear any feedback!

Alistair

189,147 views • 1 year ago

Warp Code launched yesterday — here’s what’s new: - Top coding agent: #3 SWE-bench, 52% Terminal-Bench - Built-in code review - Native editor - Slash Commands, Project Rules, and more We’re already seeing millions more lines of code shipped through Warp.

Warp

11,745 views • 9 months ago

I'm excited to share that we've built the world's most capable AI software engineer, achieving 30.08% on SWE-Bench – ahead of Amazon and Cognition. This model is so much more than a benchmark score: it was trained from the start to think and behave like a human SWE.

Alistair

819,975 views • 1 year ago

Introducing Fast Deep Coder by NinjaTech AI - a coding agent with its own cloud VM ‣ Sonnet-level capability (69.6% SWE-bench) ‣ 5–10x faster using Qwen3 Coder on Cerebras ‣ Runs in a persistent cloud VM, freeing up your laptop ‣ Native GitHub integration Try it here:

Cerebras

21,091 views • 9 months ago

Cursor, Windsurf … all cool. But @AugmentCode just dropped `Augment Agent` 200K context. Persistent memory. Deep tool integrations. ... and it just hit #1 on SWE-bench Verified (65.4% on real tasks). It’s kind of a big deal. Let me show you 🧵 ↓

Charly Wargnier

63,321 views • 1 year ago

📣 Introducing SWE-PolyBench: A new open-source multilingual benchmark for evaluating #AI coding agents SWE-PolyBench is the first benchmark to evaluate AI coding agents' ability to understand complex codebases, helping advance AI performance in the real world. Learn more. 👉

Amazon Web Services

10,866 views • 1 year ago

CFO Sarah Friar revealed that OpenAI is working on: "Agentic Software Engineer — (A-SWE)" unlike current tools like Copilot, which only boost developers. A-SWE can build apps, handle pull requests, conduct QA, fix bugs, and write documentation.

Haider.

803,705 views • 1 year ago