Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is an autonomous agent that solves engineering tasks through...

Cognition

163,024 subscribers

31,439,969 views • 2 years ago • via X (Twitter)

Education • Science & Technology

9 comments

Cognition • 2 years ago

1/4 Devin can learn how to use unfamiliar technologies.

Cognition • 2 years ago

2/4 Devin can contribute to mature production repositories.

Cognition • 2 years ago

3/4 Devin can train and fine tune its own AI models.

Cognition • 2 years ago

4/4 We even tried giving Devin real jobs on Upwork and it could do those too!

Cognition • 2 years ago

For more details on Devin, check out our blog post here: See Devin in action. If you have any project ideas, drop them below and we'll forward them to Devin.

Cognition • 2 years ago

We'd like to thank all our supporters who have helped us get to where we are today, including @patrickc, @collision, @eladgil, @saranormous, Chris Re, @eglyman, @karimatiyeh, @bernhardsson, @t_xu, @FEhrsam, @foundersfund, and many more. If you’re excited to solve some of the world’s biggest problems and build AI that can reason, learn more about our team and apply to join us here.

Jack Forge • 2 years ago

Please sirs. My family needs to eat, sirs. Please stop development on this, sirs.

Robert Sterling • 2 years ago

CS majors right now

steve • 2 years ago

we had a good run bros

Related videos

📣 Introducing SWE-PolyBench: A new open-source multilingual benchmark for evaluating #AI coding agents SWE-PolyBench is the first benchmark to evaluate AI coding agents' ability to understand complex codebases, helping advance AI performance in the real world. Learn more. 👉

Amazon Web Services

10,866 views • 1 year ago

Announcing: The Devin Open Source Initiative! We’re now providing 500 free Devin Agent Compute Units each to a limited number of open-source projects. If you’re a maintainer of an open-source project, reach out at osi@cognition.ai. Below are some OSS contributions Devin has already helped to make:

Cognition

55,479 views • 1 year ago

SWE-Agent is an open-source software engineering agent with a 12.3% resolve rate on SWE-Bench! Check out SWE-agent in action at Repo:

carlos

138,765 views • 2 years ago

Devin, the first AI software engineer, created by Cognition, acts as an "army of junior engineers." Cognition’s president Russell Kaplan discusses the goal to “promote” Devin to senior engineer, which could transform AI innovation.

Goldman Sachs

27,213 views • 8 months ago

I'm excited to share that we've built the world's most capable AI software engineer, achieving 30.08% on SWE-Bench – ahead of Amazon and Cognition. This model is so much more than a benchmark score: it was trained from the start to think and behave like a human SWE.

Alistair

820,015 views • 1 year ago

New episode with Scott Wu, CEO and co-founder of Cognition, the company behind Devin. Unlike other AI coding tools, Devin works like an autonomous engineer that you interact with through Slack, Linear, and GitHub, just like with a remote engineer. Each of Cognition's 15 engineers works with a team of Devins, and these Devins are writing about 25% of Cognition’s code today. By the end of the year, they expect over 50% of their code to be written by Devin.
In our conversation, we discuss:
🔸 How Devin has evolved from a high school CS student to a strong junior engineer just in the past year
🔸 Why software engineering will shift from “bricklayers” to “architects”
🔸 Why AI tools will lead to more engineering jobs rather than fewer
🔸 The eight pivots Cognition went through before landing on their current approach
🔸 The cultural shifts required to successfully adopt AI engineers like Devin at your company
🔸 Much more
Listen now 👇
• YouTube:
• Spotify:
• Apple:
Thank you to our wonderful sponsors for supporting the podcast:
🏆 Enterpret - Transform customer feedback into product growth:
🏆 Paragon - Ship every SaaS integration your customers want:
🏆 Attio - The powerful, flexible CRM for fast-growing startups:

Lenny Rachitsky

249,352 views • 1 year ago

Software Engineers are in BIG TROUBLE Cognition introduced Devin, first AI software engineer, and it's insane It literally can: - Code entire projects - Complete Upwork jobs - Fix issues in large repos - Deploy in a sec Here's some wild examples: 1/5 Doing real jobs on Upwork

Rahul

225,349 views • 2 years ago

Nubank refactors millions of lines of code with Devin, reducing a large scale ETL migration from an estimated 1.5 year project to 2 months. Devin successfully delivered 12x efficiency improvement on engineering time, helping reduce the developer toil for Nubank engineers as they’ve scaled to 110+ million customers. See why Vitor Olivier, Nubank’s CTO, is excited about the future of software engineering:

Cognition

47,179 views • 1 year ago

🚀New Amazon Q Developer agent for software development is available to customers: This agent is based on a new agent architecture that has exciting results coming from the SWE-bench scores (on the full and verified benchmarks) representing AI models’ ability to resolve real-world coding problems. Interesting aspect of Q Agent is that with these newest updates, Q drove nearly 50% more successful coding tasks completed. What makes Q Dev Agent remarkable? The agent architecture is not just about using the best LLMs (which we do), but also giving the agent the ability to constantly explore multiple paths to find the best way to resolve a particular problem (and back tracking when it has reached dead end like a developer would do). Needless to say, we are just getting started on the developer agent and we are constantly pushing to advance our AI capabilities while maintaining quality, security, privacy, and reliability to keep Amazon Q Developer an innovative and trusted option available to our customers using agents for software development. We highlighted the results of our first SWE-bench submission of Amazon Q Developer back in June blog post; with these updates, our new agent resolves 51% more coding tasks than its previous iteration on the SWE-bench verified dataset, and 43% more on the full dataset. That’s the difference a few months make, and I can’t wait to share what our teams will deliver at re:Invent this December. Here's a quick demo showcasing our new Agent in action:

Swami Sivasubramanian

28,946 views • 1 year ago

Devin can manage a team of Devins. Managed Devins run in parallel to help break down complex tasks. Each managed session is a full Devin, with its own VM, terminal, browser, and testing infrastructure. The main session coordinates, monitors, and compiles results.

Cognition

31,502 views • 2 months ago

AI has changed software engineering more in the last 3 years than it has changed in the previous 30. What’s needed is not a debate about whether it’s going away; instead, it’s a serious discussion about its future: what are the new primitives, techniques, and best practices for software engineering in the age of AI? That’s why I brought Scott Wu on AI & I. He’s the founder of Cognition, the company behind the world’s first autonomous AI coding agent, Devin. Cognition got to $73M ARR in less than 2 years, and they just acquired Windsurf to accelerate their growth. I had Scott on the show to talk about where programming goes from here. We get into:
- What the new tools and workflows are for AI engineers. In the near term, Scott sees software engineering defined by a spectrum of tools. At one end are AI features that speed up coding, like tab complete; at the other are agentic systems, like Devin, that can take on tasks independently. Until engineers can operate entirely at the higher layer of abstraction, he argues, both are essential.
- Why Scott thinks AGI is already here. By the benchmarks of a decade ago (passing the Turing test, solving hard math problems, and operating agentically), AGI is already here. The line keeps moving, he argues, because humans constantly redefine work around what machines can’t yet do.
- Why developers will turn into product architects. Scott sees the long-term future of software engineering as a steady climb up the ladder of abstraction. Just as programming went from assembly to languages like Python and JavaScript, he thinks the future is humans focusing on the product, while AI agents execute.
- How Devin stacks up against Anthropic’s Claude Code. Scott credits Claude Code’s success to great product design and the models becoming capable enough to support autonomous workflows. But according to him, the CLI itself isn’t the breakthrough; it’s how a tool fits into a developer’s workflow. Claude Code’s paradigm is that the AI is you, taking the wheel of your computer, he says, while Devin is like the engineer sitting beside you: it runs in its own cloud environment, manages the repo, and improves over time at testing and refining code.
This episode of Every 📧’s AI & I is a must-watch for anyone interested in the brass tacks of how AI changes the future of programming. Watch below!
Timestamps:
Introduction: 00:02:02
Why Scott thinks AGI is here: 00:02:32
Scott’s personal journey as a founder: 00:09:27
Why the fundamentals of computer science still matter: 00:16:55
How the future of programming will evolve: 00:22:30
A new workflow for the AI-first software engineer: 00:26:50
How Devin stacks up against Claude Code: 00:29:33
Reinforcement learning to build better coding agents: 00:40:05
What excites Scott about AI beyond Cognition: 00:50:05

Dan Shipper 📧

34,753 views • 9 months ago

Introducing ALE-Bench, ALE-Agent! Towards Automating Long-Horizon Algorithm Engineering for Hard Optimization Problems Blog: Paper: ALE-Bench is a coding benchmark primarily focused on hard optimization (NP-hard) problems. We developed this benchmark with AtCoder Inc., a leading coding contest platform company. What makes ALE-Bench unique is its focus on hard optimization problems that demand long-horizon and creative reasoning. It’s open-ended, in the sense that true optima are out of reach (NP-hard) and scores can continuously improve. We believe this benchmark has the potential to become one of the key benchmarks for reasoning and coding in the next generation. ALE-Agent is our end-to-end agent that we specifically designed for this challenging domain. In fact, our ALE-Agent has already built an impressive track record in the wild! In May 2025, our agent participated in a live AtCoder Heuristic Competition (AHC), alongside 1,000 other participants in real-time. AHC is considered to be one of the most challenging coding competitions in this domain. Our ALE-Agent achieved an impressive ranking of 21st out of 1,000 human participants in the competition (top 2%), marking a turning point for AI discovery of solutions to hard optimization problems with a wide spectrum of important real world applications such as logistics, routing, packing, factory production planning, power-grid balancing. We look forward to applying this technology to real industrial optimization opportunities. Building on the insights from this study, Sakana AI will continue to tackle the challenge of developing AI with even greater algorithm engineering capabilities. ALE-Bench Dataset: ALE-Bench Code: This research was conducted in collaboration with AtCoder Inc. (AtCoder). We are deeply grateful for their outstanding expertise and contributions in optimization and algorithms, which were invaluable in providing data, analyzing results, and enabling our AI agent’s participation in their contests.

Sakana AI

237,195 views • 1 year ago

Introducing NEO: The first Autonomous Machine Learning Engineer. NEO is a multi-agent system that automates the entire ML workflow, saving engineers thousands of hours of grunt work. Our recent advancements in multi-step reasoning and memory orchestration have enabled NEO to excel at solving complex Machine Learning problems on its own from data engineering to deployments of ML models. Benchmarks: NEO was tested against 50 Kaggle competitions and scored a medal in 26% of them, far superior than the previous state of the art performance of 16.9% by Open AI’s O1 with AIDE scaffolding on the MLE bench With these capabilities, NEO aims to help every ML engineer become superhuman and focus on real innovation instead of the existing grunt work Check out NEO in the thread below:

Neo AI

321,099 views • 1 year ago

Scott Wu runs Cognition, the team behind Devin, an AI software engineer built on Claude. He wants to make building software 10x faster for every engineering team:

Claude

565,377 views • 1 month ago

The terminal hasn’t changed much since the 1970s. What you do with it has. Introducing Devin for Terminal: everything we learned building Devin, now as a local agent, available right in your shell. And when your work outgrows your laptop, hand it off to the cloud.

Cognition

389,087 views • 1 month ago

1/5 Devin is built to collaborate with engineering teams and starts at $500/month. Here’s how some of the best teams are using Devin today:

Cognition

245,312 views • 1 year ago

Kura AI (YC S24) just built the new state-of-the-art for Browser Agents. It scored a world-first 87% on the WebVoyager benchmark, beating Claude's Computer Use demo by 31% and the previous SOTA by 14%. Congrats on the launch, Ronit Basu + Darren Hwang!

Y Combinator

56,948 views • 1 year ago

I paid the $500 for Devin, the mega-hyped AI coding agent, so you don't have to. Here's an in-depth review of how it compares to Cursor:

Steve (Builder.io)

368,028 views • 1 year ago

Agent Trace: Capturing the Context Graph of Code We are delighted to collaborate with Cursor, OpenCode, Vercel, Jules, Amp, Cloudflare, and Sasha Varlamov in an open standard for mapping back code:context. here's how we see the potential of code context graphs and the new era of better tooling and better agents it enables. (yes the following is vibe-videoed with Remotion's Skill and Windsurf is now Devin Desktop, 100% ai edits incl audio)

Cognition

39,670 views • 4 months ago

4/4 We even tried giving Devin real jobs on Upwork and it could do those too!

Cognition

936,169 views • 2 years ago