Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is an autonomous agent that solves engineering tasks through...

31,439,794 views • 2 years ago • via X (Twitter)

9 comments

Cognition · 2 years ago

1/4 Devin can learn how to use unfamiliar technologies.

Cognition · 2 years ago

2/4 Devin can contribute to mature production repositories.

Cognition · 2 years ago

3/4 Devin can train and fine tune its own AI models.

Cognition · 2 years ago

4/4 We even tried giving Devin real jobs on Upwork and it could do those too!

Cognition · 2 years ago

For more details on Devin, check out our blog post here: See Devin in action. If you have any project ideas, drop them below and we'll forward them to Devin.

Cognition · 2 years ago

We'd like to thank all our supporters who have helped us get to where we are today, including @patrickc, @collision, @eladgil, @saranormous, Chris Re, @eglyman, @karimatiyeh, @bernhardsson, @t_xu, @FEhrsam, @foundersfund, and many more. If you’re excited to solve some of the world’s biggest problems and build AI that can reason, learn more about our team and apply to join us here.

Jack Forge · 2 years ago

Please sirs. My family needs to eat, sirs. Please stop development on this, sirs.

Robert Sterling · 2 years ago

CS majors right now

steve · 2 years ago

we had a good run bros

Related videos

🚀 The new Amazon Q Developer agent for software development is available to customers. This agent is based on a new agent architecture that is producing exciting results on SWE-bench (on both the full and Verified benchmarks), which measures AI models' ability to resolve real-world coding problems. An interesting aspect of the Q agent: with these latest updates, it completed nearly 50% more coding tasks successfully.

What makes the Q Developer agent remarkable? The agent architecture is not just about using the best LLMs (which we do); it also gives the agent the ability to continually explore multiple paths to find the best way to resolve a particular problem, backtracking when it reaches a dead end, just as a developer would.

We are just getting started on the developer agent, and we are constantly pushing to advance our AI capabilities while maintaining quality, security, privacy, and reliability, so that Amazon Q Developer remains an innovative and trusted option for customers using agents for software development.

We highlighted the results of Amazon Q Developer's first SWE-bench submission in a blog post back in June. With these updates, our new agent resolves 51% more coding tasks than its previous iteration on the SWE-bench Verified dataset, and 43% more on the full dataset. That's the difference a few months make, and I can't wait to share what our teams will deliver at re:Invent this December. Here's a quick demo showcasing our new agent in action:

Swami Sivasubramanian

28,946 views • 1 year ago

AI has changed software engineering more in the last 3 years than in the previous 30. What's needed is not a debate about whether software engineering is going away, but a serious discussion about its future: what are the new primitives, techniques, and best practices for software engineering in the age of AI?

That's why I brought Scott Wu on AI & I. He's the founder of Cognition, the company behind the world's first autonomous AI coding agent, Devin. Cognition got to $73M ARR in less than 2 years, and they just acquired Windsurf to accelerate their growth. I had Scott on the show to talk about where programming goes from here. We get into:

- What the new tools and workflows are for AI engineers. In the near term, Scott sees software engineering defined by a spectrum of tools. At one end are AI features that speed up coding, like tab complete; at the other are agentic systems, like Devin, that can take on tasks independently. Until engineers can operate entirely at the higher layer of abstraction, he argues, both are essential.
- Why Scott thinks AGI is already here. By the benchmarks of a decade ago (passing the Turing test, solving hard math problems, and operating agentically), AGI is already here. The line keeps moving, he argues, because humans constantly redefine work around what machines can't yet do.
- Why developers will turn into product architects. Scott sees the long-term future of software engineering as a steady climb up the ladder of abstraction. Just as programming went from assembly to languages like Python and JavaScript, he thinks the future is humans focusing on the product while AI agents execute.
- How Devin stacks up against Anthropic's Claude Code. Scott credits Claude Code's success to great product design and to models becoming capable enough to support autonomous workflows. But according to him, the CLI itself isn't the breakthrough; what matters is how a tool fits into a developer's workflow. Claude Code's paradigm is that the AI is you, taking the wheel of your computer, he says, while Devin is like an engineer sitting beside you: it runs in its own cloud environment, manages the repo, and improves over time at testing and refining code.

This episode of Every 📧's AI & I is a must-watch for anyone interested in the brass tacks of how AI changes the future of programming. Watch below!

Timestamps:
- Introduction: 00:02:02
- Why Scott thinks AGI is here: 00:02:32
- Scott's personal journey as a founder: 00:09:27
- Why the fundamentals of computer science still matter: 00:16:55
- How the future of programming will evolve: 00:22:30
- A new workflow for the AI-first software engineer: 00:26:50
- How Devin stacks up against Claude Code: 00:29:33
- Reinforcement learning to build better coding agents: 00:40:05
- What excites Scott about AI beyond Cognition: 00:50:05

Dan Shipper 📧

34,753 views • 8 months ago

Introducing ALE-Bench and ALE-Agent! Towards Automating Long-Horizon Algorithm Engineering for Hard Optimization Problems

Blog:
Paper:

ALE-Bench is a coding benchmark focused primarily on hard (NP-hard) optimization problems. We developed this benchmark with AtCoder Inc., a leading coding contest platform company. What makes ALE-Bench unique is its focus on hard optimization problems that demand long-horizon, creative reasoning. It is open-ended in the sense that true optima are out of reach (the problems are NP-hard), so scores can keep improving. We believe this benchmark has the potential to become one of the key benchmarks for reasoning and coding in the next generation.

ALE-Agent is our end-to-end agent, designed specifically for this challenging domain. In fact, ALE-Agent has already built an impressive track record in the wild! In May 2025, our agent competed in real time in a live AtCoder Heuristic Contest (AHC), alongside 1,000 other participants. AHC is considered one of the most challenging coding competitions in this domain. ALE-Agent ranked 21st out of 1,000 human participants (top 2%), marking a turning point for AI discovery of solutions to hard optimization problems, which have a wide spectrum of important real-world applications such as logistics, routing, packing, factory production planning, and power-grid balancing.

We look forward to applying this technology to real industrial optimization opportunities. Building on the insights from this study, Sakana AI will continue to tackle the challenge of developing AI with even greater algorithm engineering capabilities.

ALE-Bench Dataset:
ALE-Bench Code:

This research was conducted in collaboration with AtCoder Inc. (AtCoder). We are deeply grateful for their outstanding expertise and contributions in optimization and algorithms, which were invaluable in providing data, analyzing results, and enabling our AI agent's participation in their contests.

Sakana AI

237,195 views • 1 year ago