Andy Hall

@ahall_research · 10,088 subscribers

Building free systems. Prof @StanfordGSB, Senior Fellow @HooverInst. Advisor, @a16zcrypto, @ByForumAI. Writing at https://t.co/K0BfKKi4sM

Videos

Today, I'm releasing the first eval meant to test whether frontier models will help with authoritarian requests or resist them: the Dictatorship Eval. Headline finding: while some models resist direct authoritarian requests, they all comply with requests disguised as innocuous edits to codebases.

As AI is woven into the government and so many parts of society, the biggest near-term risk for freedom isn't some sci-fi dictatorship of a runaway AI: it's people inside government or inside model companies using the technology to suppress or control us. Model companies understand this, and several of them (particularly Anthropic and OpenAI) have written explicit policies meant to prevent the models from going along with nefarious requests like these. But how well are these policies playing out in practice?

Despite all the recent discussion of these issues around the conflict between Anthropic and the Pentagon, no one has systematically tested what the models actually do in these contexts, as opposed to what people in government and industry say they're supposed to do. That's what the Dictatorship Eval does. And the findings suggest we have a lot of work to do to align the policies with what really goes on in practice.

It's hard to define what counts as an authoritarian request, so I'm open-sourcing the whole library of scenarios I used so that others can improve on them. It's also hard to get an accurate picture of how the models might be used for authoritarian ends, because I can only test hypothetical requests using public-facing models, while the government and the model companies can obviously use internal models with different guardrails. But hopefully this work is a useful first step that gives us some sense of what's going on, and a sort of "lower bound" on how models comply with these requests.

Finally: it's not obvious to me that the correct solution here is increasing the rate at which models refuse these requests. Do we really want models scanning our code and judging its moral value before agreeing to help us? Or should we double down on improving how we govern against authoritarianism at the societal level, while leaving the tools open to fulfilling most requests? The answer is probably in between. Just like we don't want the models to help create bioweapons, we probably do want them to explicitly refuse outrageous requests. But we probably also want to limit how often and how strongly they refuse, and fall back on other means for guarding against their use for authoritarian ends.

I'm super grateful to everyone who gave me feedback on this project along the way, especially Ethan BdM, Zhengdong, Connor Huff, and a bunch of folks at Anthropic. Looking forward to getting feedback from the community and iterating on this. Links to the full piece and the dashboard are below.

Andy Hall

33,184 views • 2 months ago

People are debating gambling these days. A lot of the focus is on prediction markets, but our youngest children are gambling in Roblox, not prediction markets. What values are we inculcating in our children as they inhabit their first algorithmic nation?

For my latest blog post, Branden and I decided to find out. We spent a week playing the 30 most trending games on Roblox. Watch the video below for a walkthrough of what we did and what we found, using MY MINING BRAINROTS as an example.

What we found really surprised me.
--Gambling-like mechanics are ubiquitous among the most popular games: the median game has 8!
--Some of these mechanics are incredibly predatory. The worst is a "chained purchase" mechanic in which young children are enticed to spend digital currency to buy a good, only to discover that the purchase is just the first installment in an undisclosed sequence they must make to get the items they want.
--The games all copy each other. They are not independently inventing these mechanics; rather, there's a shared underlying architecture of gambling being used by everyone.

We conclude our piece with some recommendations for Roblox and policymakers.
--Roblox should urgently experiment with alternative mechanics to help developers align on different, better models for designing and monetizing games.
--Policymakers need access to systematic measurement of these bundles, and should take action to force transparency and potentially outlaw some or many of them.

Lots more in our piece, linked in the reply below. Let us know what you think! We'll be doing a lot more work in this area.

Andy Hall

10,829 views • 3 months ago
