FOR THE FIRST TIME, THE US GOVERNMENT PULLED A PUBLICLY DEPLOYED AI MODEL OFFLINE. THE MARKET SHOULD PAY ATTENTION. 🇺🇸 On June 12, the Commerce Department ordered Anthropic to block access to its newest models, Fable 5 and Mythos 5, for any foreign national. Anthropic had to disable them...

47,954 views • 15 days ago • via X (Twitter)

Related videos

Today, I'm releasing the first eval meant to test whether frontier models will help with authoritarian requests, or resist--the Dictatorship Eval.

Headline finding: while some models resist direct authoritarian requests, they all comply with requests disguised as innocuous edits to codebases.

As AI is woven into the government and so many parts of society, the biggest near-term risk for freedom isn't some scifi dictatorship of a runaway AI: it's people inside government or inside model companies using the technology to suppress or control us.

Model companies understand this, and several of them (particularly Anthropic and OpenAI) have written explicit policies meant to prevent the models from going along with nefarious requests like these. But how well are these policies playing out in practice?

Despite all the recent discussion of these issues around the conflict between Anthropic and the Pentagon, no one has systematically tested what the models actually do in these contexts, as opposed to what people in government and industry say they're supposed to do. That's what the Dictatorship Eval does. And the findings suggest we have a lot of work to do to align the policies with what really goes on in practice.

It's hard to define what counts as an authoritarian request, so I'm open sourcing the whole library of scenarios I used so that others can improve on them.

It's also hard to get an accurate picture of how the models might be used for authoritarian ends, because I can only test hypothetical requests using public-facing models, while the government and the model companies can obviously use internal models with different guardrails. But hopefully this work is a useful first step that gives us some sense of what's going on, and a sort of "lower bound" on how models comply with these requests.

Finally: it's not obvious to me that the correct solution here is increasing the rate at which models refuse these requests. Do we really want models scanning our code and judging its moral value before agreeing to help us? Or should we double down on improving how we govern against authoritarianism at the societal level, while leaving the tools open to fulfilling most requests? The answer is probably in between. Just like we don't want the models to help create bioweapons, we probably do want them to explicitly refuse outrageous requests. But we probably also want to limit how often and how strongly they refuse and fall back on other means for guarding against their use for authoritarian ends.

I'm super grateful to everyone who gave me feedback on this project along the way, especially Ethan BdM, Zhengdong, Connor Huff, and a bunch of folks at Anthropic. Looking forward to getting feedback from the community and iterating on this. Links to the full piece and the dashboard are below.

Andy Hall

33,301 views • 2 months ago

The Trump administration just did a complete 180 on AI regulation.

16 months ago, Trump killed Biden's AI executive order on DAY ONE. Called AI "a beautiful baby" that shouldn't be stopped with rules. His AI czar David Sacks went to every conference saying deregulation was the only path. JD Vance flew to Paris and told world leaders the future is won "by building, not by hand-wringing about safety." That was the whole pitch. Regulation is for losers.

But the same White House just started briefing Anthropic, Google, and OpenAI executives on plans for MANDATORY government review of AI models before public release. The exact policy they destroyed 16 months ago. Fortune called it a "head-spinning policy pirouette."

So what happened? ONE AI model happened:

In April, Anthropic announced a model called Mythos. During internal testing, it found THOUSANDS of unknown security vulnerabilities across every major operating system and browser on earth, including a 27yo bug in OpenBSD, an OS literally famous for being unhackable, and a 16yo flaw in FFmpeg that survived 5 million automated security tests. NOBODY asked it to do any of this. The capabilities emerged on their own as the model got smarter at coding. Anthropic's researchers said they found more bugs in weeks than they'd found in their entire careers combined.

The UK's AI Security Institute confirmed Mythos could autonomously execute multi-stage cyberattacks on networks. Tasks that take human professionals DAYS.

Anthropic refused to release it. Formed "Project Glasswing" with Apple, Microsoft, Google, JPMorgan, and 40 other organizations to use it defensively before attackers develop similar tools. Their estimate: Competing labs will have comparable capabilities within 6 to 18 months. That timeline is what scared Washington.

Because here's what nobody in the White House considered while removing safety rules: What happens when a devastating AI-enabled cyberattack hits American infrastructure and the government has ZERO oversight in place? No safety testing, pre-release review, or reporting. They literally burned all of it.

David Sacks quietly left in March. Treasury Secretary Bessent and Chief of Staff Susie Wiles took over AI policy. They're now drafting an executive order for an AI working group that would vet models before release. Some officials want the government to get FIRST ACCESS to new models. The same government that said 16 months ago it had no business being involved.

But here's where it gets really insane: The company that triggered all of this was BANNED by the Trump administration from government contracts. Labeled Anthropic a "supply-chain risk." They tried to punish them for refusing to let their AI target US citizens autonomously. Anthropic is currently fighting the Pentagon in federal court.

So the timeline reads like this:
January 2025: Trump kills Biden's AI oversight.
July 2025: Calls AI a "beautiful baby," signs orders to fast-track AI with zero safety guardrails.
March 2026: Bans Anthropic from government work.
April 2026: Anthropic's Mythos demonstrates it can hack every major OS on earth.
May 2026: Same administration rebuilds the oversight it destroyed BECAUSE of the company it banned.

This is what happens when ideology meets reality. Every government told itself AI regulation could wait. Mythos proved them wrong overnight. Open-weight models with similar capabilities are even closer. Once those tools are in the wild, no executive order puts them back.

Ricardo

75,210 views • 1 month ago