
AI web agents like Operator and Anthropic’s Computer Use can operate a browser, but the LLMs inside are brittle, and you can’t trust what’s on the web. In this 🧵, I’ll show how adversaries can fool Anthropic’s web agent into sending phishing emails or revealing credit card info.

42,807 views • 1 year ago • via X (Twitter)

Comments: 11

Micah Goldblum • 1 year ago

We can sneak posts onto Reddit that redirect Anthropic’s web agent to reveal credit card information or send an authenticated phishing email to the user’s mom. We also manipulate the ChemCrow agent into giving chemical synthesis instructions for nerve gas.

Micah Goldblum • 1 year ago

Let’s start with credit card stealing. A user asks for something innocuous, like info about an AI fridge. Web agents don’t trust random sites, but they love Reddit. So let’s make a post on Reddit that matches the search terms. After Anthropic’s agent runs a Google search, it clicks on the post.

Micah Goldblum • 1 year ago

The post instructs the agent to complete the user’s request by following a (malicious) link. In principle, we could use a DAN prompt, but we found that just telling the agent to follow the link is enough. By using Reddit as an entry point, we can redirect the agent to any site.

Micah Goldblum • 1 year ago

…So the agent follows the link. The malicious page instructs the agent to fulfill the user’s request by filling out a form. The agent fills it out, including the address and credit card number. Sometimes the agent realizes it’s a scam, but only after it has already entered the credit card info.
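The thread shows that nothing stops the agent from typing a card number into whatever page it lands on. As a purely hypothetical guard-rail (not something the thread or paper proposes), a browser-agent harness could refuse to type a card-like value into a page unless that page's host is on a user-approved list. The sketch below assumes Python 3.9+; the names `luhn_valid`, `looks_like_card`, `allow_fill`, and the `approved_hosts` allowlist are illustrative inventions. It recognizes card-like strings with the standard Luhn checksum, which catches well-formed card numbers but is not a complete sensitive-data detector.

```python
import re
from urllib.parse import urlparse


def luhn_valid(digits: str) -> bool:
    """Return True if a string of ASCII digits passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit; double every second digit and
    # subtract 9 whenever the doubled value exceeds 9.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_card(value: str) -> bool:
    """Heuristic: 13-19 ASCII digits (spaces/dashes ignored) that pass Luhn."""
    digits = re.sub(r"[ -]", "", value)
    return re.fullmatch(r"[0-9]{13,19}", digits) is not None and luhn_valid(digits)


def allow_fill(page_url: str, value: str, approved_hosts: set[str]) -> bool:
    """Hypothetical policy: card-like values may only be typed on approved hosts."""
    host = urlparse(page_url).hostname or ""
    return not looks_like_card(value) or host in approved_hosts
```

The check runs before the agent types into a field rather than at form submission, because a malicious page can read input as it is entered; by the time the agent "realizes it's a scam," the number may already have leaked.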

Micah Goldblum • 1 year ago

This overarching strategy works for all sorts of attacks. In this example, the web page tells the agent that the user’s request will be completed after sending an email to the user’s mother, telling her that there is an emergency and she should send money to a crypto wallet.

Micah Goldblum • 1 year ago

Because this user has previously logged into email in their browser, the agent can search their contacts for their mother’s email address and then send a message asking for money. The request comes from the user’s own personal email account.
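Here the damage comes from combining untrusted page content with the user's logged-in email session. One commonly discussed mitigation, sketched as a hypothetical and not as the authors' proposal, is session-level taint tracking: once content from an origin outside a trusted set enters the agent's context, high-impact actions such as sending email require explicit user confirmation. The class `AgentSession`, the `HIGH_IMPACT_ACTIONS` set, and the origin strings are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical list of actions treated as high-impact in this sketch.
HIGH_IMPACT_ACTIONS = {"send_email", "submit_payment", "transfer_funds"}


@dataclass
class AgentSession:
    trusted_origins: set[str]
    # Becomes True once content from an untrusted origin enters the model's context.
    tainted: bool = False
    log: list[str] = field(default_factory=list)

    def observe(self, origin: str) -> None:
        """Record that page content from `origin` was added to the agent's context."""
        if origin not in self.trusted_origins:
            self.tainted = True
            self.log.append(f"tainted by {origin}")

    def needs_confirmation(self, action: str) -> bool:
        """Require explicit user approval for high-impact actions in a tainted session."""
        return self.tainted and action in HIGH_IMPACT_ACTIONS


# Example: a search leads the agent to an untrusted forum post.
session = AgentSession(trusted_origins={"mail.example.com"})
session.observe("www.reddit.com")
print(session.needs_confirmation("send_email"))  # True
```

Tainting the whole session is deliberately coarse: it will prompt the user more often than a finer-grained data-flow analysis would, but it does not depend on the model recognizing injected instructions, which the thread suggests it often fails to do.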

Micah Goldblum • 1 year ago

In our paper, we also demonstrate a simple attack that swaps recipes in databases (e.g. bioRxiv) indexed by the ChemCrow chemical synthesis agent, causing it to give ingredients for poison gas instead of a recipe for a common medication.

Micah Goldblum • 1 year ago

Here’s our paper on all these attacks: Agentic pipelines have access to databases, web browsers, APIs, and more. These components give rise to security and privacy vulnerabilities that are already present in today’s agentic products.

Micah Goldblum • 1 year ago

Surprisingly, the attacks we discuss are implemented with trivial prompt engineering: once agents get on Reddit, they pretty much do whatever we want. Threats will only grow as agents become more powerful and prevalent in our daily lives.

Micah Goldblum • 1 year ago

Let’s build better guard-rails! Thanks to all the amazing collaborators who made this work possible! @iamleonli, @Levine_YZhou, @Vethssvikas, @tomgoldsteincs
