Before Fable got released (and pulled), Mozilla was quietly testing Claude Mythos against Firefox's 10M-line codebase. The result? Over 400 security bugs fixed, including ones that had been hiding in the codebase for over a decade. Brian Grinstead, distinguished engineer at Mozilla, walked me through the agentic bug-finding...

247,719 views • 11 days ago • via X (Twitter)

Related videos

In the future, you’ll be able to accomplish a goal by just giving Claude an outcome and a budget. That’s the direction Anthropic is building in with its new Managed Agents features, announced at this week’s Code with Claude developer event.

The basic idea: Claude, wrapped in a computer in the cloud, that you can spin up, scale, and manage as needed. Anthropic is taking on the infrastructure that kills most agent products, and making sure that it scales to meet the needs of agents running 24/7.

On this week’s AI & I from Every 📧, I talk with Angela Jiang, head of product for the Claude platform, and Katelyn Lesse, head of engineering for the Claude platform, about what Anthropic is building and what it takes to make agents reliable in production.

We get into:
- Why the "build a generic harness, hot-swap any model behind it" playbook is already outdated. Angela points to eval data on Memory where the same task across different harnesses performed drastically differently.
- The infrastructure wall every team hits in production, and why Katelyn thinks “my sandbox died and took the agent with it” is the real reason internal agents don't ship.
- Why Anthropic is so bullish on using file systems and skills within Claude, including Angela's argument that those early design choices can compound for years.

This is a must-watch for anyone trying to take an agent past the demo and into production. Watch below!

Timestamps:
- How the Claude platform evolved from API to agents: 00:01:48
- The primitives that make up Claude Managed Agents: 00:04:09
- Why the harness and the model are becoming a single unit: 00:10:37
- The infrastructure wall that kills most agent projects in production: 00:18:49
- Why team agents need a different shape than individual productivity tools: 00:24:49
- How Anthropic's legal team uses an agent to review marketing copy: 00:26:36
- Using multi-agent orchestration for advisor strategies, adversarial pairs, and swarms: 00:34:24
- How to measure agent success with outcome and budget as the end state: 00:35:50
- What the platform looks like a year from now, when Claude writes its own harness: 00:39:11

Dan Shipper 📧

66,339 views • 1 month ago

Karpathy said something you'll regret ignoring: "We have to keep the AI on the leash. I'm still the bottleneck. I have to make sure this thing isn't introducing bugs and that there's no security issues."

He said it at a YC talk last year, when the worry was reliability. The models hallucinated and made mistakes no human would, so the leash meant keeping yourself in the loop and checking the output before trusting it. The models are far better now, and the line still holds, for a reason he was not focused on back then.

Even a model that writes flawless code today still has no idea who is allowed to run it. Correctness and authorization are different problems, and only correctness improves as the model improves. A perfect agent still hands you a tool in which anyone can do anything, because permission was never part of the task.

I tested this in practice with Claude Code. I asked it to build a small internal tool with a button that issues account credits. It worked on the first try, and when I ran it locally, the credit applied the instant I clicked. Nothing decided who was allowed to click it. The agent wrote the right logic and displayed a success notification. It never checked whether the caller had the right, whether it should pause for a human, or whether anything was logged.

This is not a bug a smarter model can outgrow, because the leash was never in the code. Identity, permissions, and audit live in the system that runs the app, not in what the agent generates.

To solve this, I took the exact same bundle and hosted it on Retool. The credit write that fired silently on my laptop now stopped at an approval gate, resolved to a real identity through SSO, and landed in an audit log. I wrote none of it. The app inherited the entire boundary the moment it was deployed, and the video shows the before and after. You can try it yourself here:

I also wrote a detailed breakdown of the whole thing in my recent article, and I worked with the team to put this together. It walks through the build, the exact moment the credit write went through on my laptop with nobody checking, and then what changed when the same app ran on Retool. It also covers why this is a property of the runtime and not something a better model fixes, which is why devs typically miss this. The article is quoted below.

Akshay 🚀

42,511 views • 11 days ago
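
The gap the post above describes, correct logic with no check on who may run it, can be illustrated with a minimal sketch. This is not the tool from the post and not how Retool works internally; the `User` class, the `credit_issuer` role name, and the in-memory `balances` and `audit_log` stores are all hypothetical stand-ins for an identity provider, a permission system, and a real audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class User:
    name: str
    roles: set[str] = field(default_factory=set)


class PermissionDenied(Exception):
    """Raised when the caller lacks the role required for an action."""


# Hypothetical in-memory stores; a real deployment would rely on a database
# and an identity provider (for example SSO) instead.
balances: dict[str, int] = {"acct-1": 0}
audit_log: list[dict] = []


def issue_credit_unguarded(account: str, amount: int) -> int:
    """Correct business logic, but nothing decides who may call it."""
    balances[account] += amount
    return balances[account]


def issue_credit_guarded(caller: User, account: str, amount: int) -> int:
    """Same logic, wrapped in an authorization check and an audit record."""
    allowed = "credit_issuer" in caller.roles  # hypothetical role name
    # Record every attempt, including denied ones, before acting.
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "caller": caller.name,
        "action": "issue_credit",
        "account": account,
        "amount": amount,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionDenied(f"{caller.name} may not issue credits")
    balances[account] += amount
    return balances[account]
```

Both functions compute the same balance; only the second one knows who asked. The sketch leaves out the approval gate, which would pause the write until a second person confirms it.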

Anthropic admitted they built an AI so capable they were scared to release it, and the number that explains why is 250.

Anthropic's CFO Krishna Rao described in this clip what happened when they ran Mythos against an open source codebase that a previous frontier model had already analyzed. The prior model found 22 security vulnerabilities; Mythos found 250, in the same codebase that the previous model had already reviewed and flagged as relatively clean. That jump, to more than 11 times as many vulnerabilities, is not just a benchmark improvement. It is a signal that there is an entire layer of software infrastructure that humanity has been assuming was secure, and that assumption may no longer hold.

The UK AI Security Institute independently evaluated Mythos Preview and confirmed what the internal numbers suggested. On expert-level capture-the-flag challenges that no model could complete before April 2025, Mythos succeeded 73% of the time, and it became the first model ever to complete a complex end-to-end attack range from start to finish, autonomously, without human guidance. The World Economic Forum called this a new security-driven era for AI, the Governor of the Bank of England publicly warned that Anthropic may have found a way to unlock the entire cyber-risk landscape, and the European Central Bank began quietly contacting financial institutions to assess their security posture.

The response from Anthropic is what makes this story genuinely important. Rather than shelving the model or publishing it as a standard API release, Rao described a phased approach: restricting access to a controlled group, focusing specifically on how the cyber capabilities can be used defensively rather than offensively, and treating that framework as a template for how to release powerful but dangerous models in the future.

The broader context makes that framing even more significant. AI-generated code is already creating ten times more security vulnerabilities than human-written code, 63% of organizations reported experiencing an AI-driven cyberattack in the past 12 months, and traditional signature-based security tools were built for a threat model that no longer describes the attack surface companies are defending.

Mythos represents a genuine leap in what autonomous security reasoning can do, and it cuts both ways. The model that can find 250 vulnerabilities in a codebase a prior model rated as mostly clean is also, in the wrong hands, the model that can exploit those 250 vulnerabilities before a human defender has even finished reading the report. Anthropic's phased release strategy is not just a legal or PR decision; it is the most honest signal yet from a frontier lab that safety governance and capability development can no longer be treated as separate workstreams. The question is not whether this technology gets deployed. It is whether the institutions using it defensively stay ahead of the ones who will eventually use it offensively, and whether the labs building it can keep those two timelines from inverting.

Milk Road AI

24,356 views • 1 month ago

THIS GUY CONNECTED HIS AI AGENTS TO HIS OBSIDIAN AND BUILT A BRAIN THAT LEARNS ON ITS OWN. HERE'S HOW TO BUILD IT.

Obsidian is just markdown files sitting in a folder. That turns out to be the perfect memory for an AI agent, because an agent can read and write those files directly. He wired his agents into the vault so they pull context from it, do the work, and write what they learned back. The notes aren't the point. The loop is, and it gets sharper every cycle.

How to build it:
1. Point an agent at your vault. The fastest way, no plugins, no API keys: open a terminal and run npx obsidian-mcp /path/to/your/vault. That exposes your Obsidian folder to Claude as a tool it can read, search, and write to. Add it to your Claude Code or Cowork config and restart.
2. Confirm it can see the brain. Ask it: "list the notes in my vault and summarize what's in them." If it reads them back, the connection is live. Now it starts every task with everything the vault already holds instead of from zero.
3. Give each agent one job and a write-back rule. Tell it: "research this, then save what you found as a new note in /brain with links to related notes." One agent researches, one summarizes, one plans. Each writes its output back into the vault.
4. Close the loop. Add one line to every agent's instructions: "read /brain before starting, write your result back when done." Now each task leaves the vault richer, and the next run reads that before it works. It compounds instead of resetting.
5. You only steer. Review what the brain produces, point it at the next thing. The agents handle the reading, writing, and connecting.

The edge isn't better notes. It's a brain that feeds itself, so the work gets sharper every cycle instead of starting over.

Bookmark this.

Yarchi

57,768 views • 26 days ago
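
The read-then-write-back loop from the post above can be sketched with plain file I/O, since a vault is a folder of markdown files. This is only an illustration under stated assumptions: it does not use obsidian-mcp or any agent API, the function names `read_vault`, `write_back`, and `run_task` are made up for this example, and the `work` callable stands in for whatever the agent actually does. The `[[note]]` syntax is Obsidian's internal link format.

```python
from pathlib import Path


def read_vault(vault: Path) -> dict[str, str]:
    """Return {note title: markdown text} for every .md file in the vault."""
    return {
        p.stem: p.read_text(encoding="utf-8")
        for p in sorted(vault.rglob("*.md"))
    }


def write_back(vault: Path, title: str, body: str, related: list[str]) -> Path:
    """Save a result as a new note in brain/, with links to related notes."""
    brain = vault / "brain"
    brain.mkdir(parents=True, exist_ok=True)
    links = "\n".join(f"- [[{name}]]" for name in related)
    note = brain / f"{title}.md"
    note.write_text(
        f"# {title}\n\n{body}\n\n## Related\n{links}\n", encoding="utf-8"
    )
    return note


def run_task(vault: Path, title: str, work) -> Path:
    """One cycle: read the vault, do the work, write the result back."""
    context = read_vault(vault)
    body = work(context)  # `work` is a placeholder for the agent call
    return write_back(vault, title, body, related=list(context))
```

Each call to `run_task` sees the notes written by earlier calls, which is the compounding effect the post describes. Linking every new note to every existing one is a simplification; a real setup would pick only the relevant notes.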

🚨 ANTHROPIC JUST REVEALED CLAUDE MYTHOS ABILITIES

Anthropic just formally announced "Claude Mythos Preview" and launched "Project Glasswing" to deploy it for cybersecurity defense. The models are unlocking completely new, autonomous behaviors. This isn't about slightly better benchmark scores. This is about what the model can do.

Here are the direct quotes from Anthropic’s research team (including Dario) on exactly what Mythos is capable of:
• Chaining Exploits: "It has the ability to chain together vulnerabilities... this model is able to create exploits out of three, four, sometimes five vulnerabilities that in sequence give you some kind of very sophisticated end outcome."
• The Professional Standard: "The model that we're experimenting with is, by and large, as good as a professional human at identifying bugs."
• Unprecedented Autonomy: "It's just generally better at pursuing really long-range tasks that are kind of like the tasks that a human security researcher would do throughout the course of an entire day."

The Reality Check: Dario Amodei flat-out said: "There's a kind of accelerating exponential... Claude Mythos Preview is a particularly big jump along that point."

Because this model has become so capable at identifying zero-days, they are restricting its release to top tech partners to try to patch the world's software before these capabilities leak out.

The autonomous researcher era has officially arrived. It’s over 💀

Chris

46,077 views • 2 months ago