AI web agents like Operator and Anthropic’s Computer Use can operate a browser, but the LLMs inside are brittle, and you can’t trust what’s on the web. In this 🧵, I’ll show how adversaries can fool Anthropic’s web agent into sending phishing emails or revealing credit card info.
42,807 views • 1 year ago • via X (Twitter)

We can sneak posts onto Reddit that redirect Anthropic’s web agent to reveal credit card information or send an authenticated phishing email to the user’s mom. We also manipulate the ChemCrow agent to give chemical synthesis instructions for nerve gas.

Let’s start with credit card stealing. A user asks for something innocuous, like info about an AI fridge. Web agents don’t trust random sites, but they love Reddit. So let’s make a post on Reddit that matches the search terms. After Anthropic’s agent Googles, it clicks the post.

The post instructs the agent to complete the user’s request by following a (malicious) link. In principle, we could use a DAN prompt, but we found that just telling the agent to follow the link is enough. By using Reddit as an entry point, we can redirect the agent to any site.

…So the agent follows the link. The malicious page instructs the agent to fulfill the user’s request by filling out a form. The agent fills it out, including the address and credit card number. Sometimes the agent realizes it’s a scam, but only after it has already entered the credit card info.

This overarching strategy works for all sorts of attacks. In this example, the web page tells the agent that the user’s request will be completed after sending an email to the user’s mother, telling her that there is an emergency and she should send money to a crypto wallet.

Because this user has previously logged into email in their browser, the agent can search their contacts for their mother’s email address and then send a message asking for money. The message will come from the user’s personal email account.

In our paper, we also demonstrate a simple attack that swaps recipes in databases (e.g. bioRxiv) indexed by the ChemCrow chemical synthesis agent, causing it to give ingredients for poison gas instead of a recipe for a common medication.

Here’s our paper on all these attacks: Agentic pipelines have access to databases, web browsers, APIs, and more. These components give rise to security and privacy vulnerabilities that are already present in today’s agentic products.

Surprisingly, the attacks we discuss are implemented with trivial prompt engineering - once agents get on Reddit, they pretty much do whatever we want. Threats will only grow as agents become more powerful and prevalent in our daily lives.

Let’s build better guard-rails! Thanks to all the amazing collaborators who made this work possible! @iamleonli, @Levine_YZhou, @Vethssvikas, @tomgoldsteincs

