Rob Wiblin

@robertwiblin • 50,212 subscribers

Host of the 80,000 Hours Podcast. Exploring the inviolate sphere of ideas one interview at a time: https://t.co/2YMw00bkIQ

Videos

One OpenAI critic calls it "the theft of at least the millennium and quite possibly all of human history." Are they right? UCLA nonprofit law expert Rose Chan Loui and I discuss OpenAI's unprecedented gambit to ditch its nonprofit for good: 4:08 – How OpenAI carefully chose a complex nonprofit structure that failed 11:56 – OpenAI's new plan to go for-profit 14:49 – The nonprofit board is out-gunned, out-manned, out-numbered, out-planned. Who can help? 17:28 – Who would stand cheated in a bad for-profit switch? 28:03 – Has this happened before? 29:37 – Is it truly in the nonprofit's interest to sell control of OpenAI, or is it a bad deal? 36:10 – The difficulty of valuing OpenAI's future windfall profits 42:31 – Control of OpenAI is independently incredibly valuable and demands compensation 53:09 – What most miss: It's very important the nonprofit get cash and not just equity 1:05:36 – Is it a farce to call this an "arm's-length transaction"? 1:10:59 – How the nonprofit board can best play their hand 1:17:42 – Can Elon Musk mount a court challenge and how that would work The volunteer nonprofit board is hugely outgunned by groups who would profit by tens of billions by screwing it over. And it will take a heroic effort and help from some state attorneys general to get everything it's owed. But that... just might happen. We don't hold back on the opinions. Episode available in audio and video on the 80,000 Hours Podcast here, on any app, or YouTube.

7,043,755 views • 1 year ago

My best interview in some time. Rohin Shah leads AGI alignment/safety at DeepMind. And he has a lot of spicy personal takes: We probably won’t get catastrophic misalignment (00:49) Safety 'commitments' have severe limitations (10:38) The intelligence explosion probably isn't imminent (1:52:44) Why he's not working to pause AI advances (51:44) Pre-deployment evals aren't the right focus (for catastrophic risks) (37:41) Signalling concern for safety sometimes diverts resources from actually making AI safe (01:09:51) Reading AI thoughts is v useful for safety – and we'll probably be able to for years to come (54:17) Governance is somewhat more likely to be the bottleneck than alignment (43:55) Rohin's team doesn't have a veto, and that's OK (27:36) Central banks are a promising model for regulating AI (33:34) Also: Google DeepMind's actual plan for building AGI safely (1:40:29) How external researchers can positively influence big AI companies (2:21:55) The roles GDM most needs to hire for (2:37:03) On the 80,000 Hours Podcast. Links below - enjoy! (Rohin Shah)

155,326 views • 1 month ago

Every AI lab is working to make their AI helpful, harmless and honest. Max Harms (Max Harms) thinks this is a complete wrong turn, and 'aligning' AI to human values is actively dangerous. In his view a safe AGI must have absolutely no opinion about how the world ought to be, be willingly modifiable, and be entirely indifferent to being shut down. The opposite of all commercial models today. The key appeal is that so-called 'corrigibility' could be an attractor state – get close enough and the AI actively helps you make it more corrigible over time. That forgiveness would at least give us a shot. It's a strategy that feels natural within the 'MIRI worldview', recently laid out by his colleagues Eliezer Yudkowsky ⏹️ and Nate Soares ⏹️ in 'If Anyone Builds It Everyone Dies'. But it risks causing a different AI catastrophe, because the resulting AI model would necessarily be willing to assist any human operator with a power grab, or indeed any crime at all. I interviewed Max on the 80,000 Hours Podcast to debate the MIRI worldview, and what we should do to figure out if corrigibility ought to be our one and only focus. Links below – enjoy! 00:01:56 If anyone builds it, will everyone die? The MIRI perspective on AGI risk 00:24:28 Evolution failed to ‘align’ us, just as we'll fail to align AI 00:42:56 We're training AIs to want to stay alive and value power for its own sake 00:52:24 Objections: Is the 'squiggle/paperclip problem' really real? 01:05:02 Can we get empirical evidence re: 'alignment by default'? 01:10:17 Why do few AI researchers share Max's perspective? 01:18:34 We're training AI to pursue goals relentlessly — and superintelligence will too 01:24:51 The case for a radical slowdown 01:27:53 Max's best hope: corrigibility as stepping stone to alignment 01:32:34 Corrigibility is both uniquely valuable, and practical, to train 01:45:06 What training could ever make models corrigible enough? 
01:51:38 Corrigibility is also terribly risky due to misuse risk 01:58:57 A single researcher could make a corrigibility benchmark. Nobody has. 02:12:20 Red Heart & why Max writes hard science fiction 02:34:08 Should you homeschool? Depends how weird your kids are.

296,251 views • 4 months ago

'MIT Study Shows 95% of AI Projects Lose Money' was the #1 AI meme for the public (and politicians) last year. So I looked into this 'study'. It was… much worse than I would have guessed. And I suspect not by mistake. The authors had a hidden agenda from the start. I explain:

112,320 views • 2 months ago

80,000 Hours the book was published by Penguin today. It's our definitive explanation of how to use your career to try to force the world onto a better track. The result of 14 years honing our ideas. See what's in there and order:

47,628 views • 1 month ago

This tweet got over 1M views so we made it a video: How much money does Meta make by enabling crimes? "Internal docs leaked to Reuters show: • 10% of all Meta revenue comes from ads for scams & banned goods ($16B/year) • Meta estimates it's involved in 1/3 of all successful scams in the US • That suggests they drive $50B in scam losses for US consumers alone each year • Meta earns ~$3B annually from scam/banned goods ads run by Chinese operations alone..."

115,460 views • 3 months ago

Yoshua Bengio thinks he knows how to make provably safe superintelligent agents. Bengio built the foundations of modern AI and is the most cited living scientist. He believes his alternative training setup would: 1. Guarantee honesty 2. Prevent unintended goals 3. Produce capable agents 4. Port over most data and techniques from current LLMs 5. Not be inherently more expensive, and perhaps be more intelligent Bengio claims the honesty and lack of unintended goals can be proven mathematically, at least given particular assumptions. And his new organization, LawZero, is aiming to build a scrappy prototype as soon as possible. The architecture is called 'Scientist AI' and it's based on training a model to explain empirical observations, including what people say, rather than training AIs that mimic human behaviour or seek our approval. (Bengio's frank assessment is that "reinforcement learning is evil" and that allowing AIs to independently train their successors is "the most crazy, dangerous bet that unfortunately we are on track to do.") But skeptics question whether Scientist AI really does solve the fundamental problem of 'eliciting latent knowledge' from AI models. And with the commercial race for superintelligence so intense, it's not clear whether the proposal will be able to compete or have time to bear fruit, even if it's sound in theory. On The 80,000 Hours Podcast, links below – enjoy! • Making AI honest and safe (00:00:00) • Scientist AI in plain English (00:02:27) • How Scientist AI differs from LLMs (00:06:32) • How the training data works (00:14:02) • Can this become an agent? (00:21:02) • Why Yoshua is now more optimistic (00:32:11) • Why companies can’t stop racing (00:36:35) • A working prototype won't take long (00:49:15) • Scientist models might be more capable (00:53:34) • “Reinforcement learning is evil” (01:01:27) • Scientist AI from guardrail to agent (01:08:37) • Can safe AI still be competent? (01:12:38)
• How much will this cost? (01:19:29) • Can it generalise beyond maths and science? (01:23:26) • A multi-national push for superintelligence (01:39:19) • Want to work with or fund Yoshua? (01:51:16) • Why smart people ignore AI risk (01:54:45) • Don’t let AI build the next AI (02:01:33) • Why politicians miss the real risks (02:12:28) • Why Yoshua changed his mind about AI risk (02:21:27)

65,070 views • 1 month ago

There are only 3 weapons that make sense in space: 1. star-scale lasers 2. RKVs 3. self-replicating probes I argue that all 3 fail for different technical reasons. In the limit space warfare will be defence-dominant. And this insight should affect our priorities today.

11,893 views • 11 days ago

METR investigated what a rogue AI could secretly get away with inside a frontier AI lab, in close collaboration with OpenAI, GDM, Anthropic and Meta. Including sending a red-teamer into Anthropic to playact 'evil Claude' for 3 weeks. Here's what stands out to me from their new 320-page report: 00:00 What could an unreleased AI get away with? 01:54 Motive: Why grab more compute? 05:46 Opportunity: YOLO mode and jailbreaks 11:02 Means: Brilliant idiots in data centres 15:45 We have to test unreleased models... 18:29 ...especially if AI R&D is coming in 2028

40,047 views • 1 month ago

Conventional wisdom is that bioweapons are humanity's greatest weakness – 100x cheaper to make than to defend against. Andrew Snyder-Beattie thinks conventional wisdom is likely wrong. He has a plan cheap enough to do without government. Useful even in worst case scenarios like mirror bacteria. Effective enough to save most people. In one of my all-time fav interviews he lays out a low-tech 4-step approach developed by his research team at Open Philanthropy, to fix a problem most have thought unsolvable. ASB is hiring for many roles in this project from logistics to biotech to manufacturing, and has $100s millions to deploy. Enjoy, links below! 2:10 How bad it could get 9:19 The worst-case scenario: mirror bacteria 18:14 Why low-tech 25:30 Prevention 31:21 The “4 pillars” plan 33:09 ASB is hiring now to make this happen 35:11 Everyone was wrong: biorisks are defence dominant 40:23 Pillar 1: Lungs 55:53 Pillar 2: Biohardening 1:15:19 Pillar 3: Detection 1:28:40 Pillar 4: The wrench hypothesis 1:40:12 The plan's biggest weaknesses 1:44:44 Would chaos make this impossible to pull off? 1:51:50 Would rogue AI make bioweapons? 1:57:57 We can feed the world even if all the plants die 2:07:03 Could a bioweapon make the Earth uninhabitable? 2:09:35 What ASB is hiring for 2:30:27 How to protect yourself and your family (On the 80,000 Hours Podcast, available anywhere you get podcasts.)

168,503 views • 9 months ago

I spent the last 2 days trying to figure out how scary Claude Mythos is. I think it's fairly scary, though not because of the hacking: 1. It indicates fully-automated AI R&D is coming sooner 2. Its alignment seems better, which is good. But all the alignment tests have serious flaws, which is bad. 3. There are a few specific warning signs Mythos might not be trustworthy I explain what stood out to me in the 244-page System Card and 59-page Alignment Risk Report in this essay for the 80,000 Hours Podcast. Judging by those 2 reports Anthropic itself seems kinda scared of Claude now. And I'm sure views vary widely within the company, but at times it feels like they only give themselves a 50/50 chance of being able to keep the next few Claudes fully under control. So I guess we're on the same page! If it does turn out we can take the safety results at face value we may look back and see this week as watershed good news. If they can't, the opposite. (I wish I had had more time to look into how reassuring it is that their automated monitoring systems don't seem to be picking up much misbehaviour post internal-deployment. I think that's what someone at Anthropic who feels more relaxed would point to. Next time!) Links below - enjoy!

60,087 views • 2 months ago

The Will MacAskill episode on: 1. Why we should both pay AIs and trade with them (00:38:13) 2. Why not just push to pause AI development? (01:42:17) 3. Making AIs extremely risk averse might also make them safe (00:38:13) 4. The panic over sycophancy was justified (08:11) 5. The character we give AI models could determine the future (00:01:00) 6. The toughest trade-offs in AI personality design (13:24) 7. The effective altruism comeback (01:52:19) 8. Every population ethics recommends 'tiling the universe', except Will's new one (02:39:35) 9. Non-causal decision theory might accidentally motivate a near-best future (01:22:19) 10. How to aim at the best possible world without utopianism (02:09:30) For the 80,000 Hours Podcast, links below - enjoy! William MacAskill

40,771 views • 2 months ago

What the hell happened with AGI timelines in 2025? Was it just vibes? Faulty analysis? Unexpected technical results? I try to make sense of what drove the wild swings in sentiment: • The great timelines contraction (00:47) • Why timelines went back out again (02:10) • Longstanding reasons AGI could take a long time (11:13) • So what's the upshot of all of these updates? (14:47) • 5 reasons the radical pessimists are still wrong (16:54) • Even long timelines are short now (23:54) (On the 80,000 Hours Podcast, links below.)

71,964 views • 4 months ago

Tackling NIMBYs head on has been an abject failure. You don't smash them — sooner or later they smash you. We need a different approach. And there’s a weirdly obvious solution to fighting NIMBY: 'stuffing their mouths with gold'. Sam Bowman (Sam Bowman) and I discuss his 'housing theory of everything' and how to actually fix our catastrophically busted planning permission system on the 80,000 Hours Podcast. Also: how avant-garde architects are the great villains of our age, how to tackle terrible incumbent institutions, Europe's need for nuclear, Ozempic remaining highly underrated, and how progress studies stays sane. 1:21 We can't seem to build anything 3:15 And it's ruining people's lives 8:28 The housing theory of everything 17:03 The UK is the world's worst 35:47 Why almost no progress fixing it 43:11 NIMBYs are often harmed by development 54:50 Solution #1: Street votes 1:18:47 Will street votes come to the US 1:24:21 Solution #2: Stuffing mouths with gold 1:43:42 The most important policy setting you've never heard of 1:57:06 Solution #3: Opt-outs 2:10:32 How to make it happen 2:17:30 Making old institutions die a gradual death 2:31:11 The evil of modern architecture and the importance of beauty 2:44:14 The north needs nuclear 3:01:43 Ozempic and "the overweight theory of everything" 3:17:08 How progress studies has avoided online madness I'd been hoping to interview Sam Bowman for years and he's the guest I always dreamed he would be. Transcript and links on the 80,000 Hours site, and a much higher res 4K version on YT (can't upload full quality to Twitter sadly).

197,371 views • 1 year ago

Neel Nanda is leading a Google DeepMind research team at 26. He and I discuss: • How that happened • “If your safety work doesn't advance capabilities, it's probably bad safety work” • Should people work at the safest or most reckless AI company? • An AI PhD – with these timelines?! • How to best operate in a big frontier AI company • Neel's distinctive uses of LLMs and which cold emails he answers • A common reasoning error in AI alignment • Why he (Neel Nanda) refuses to share his p(doom) This is part 2 of our conversation, part 1 was a comprehensive update on his research area: mechanistic interpretability, which I'll link below. Links to this episode of the 80,000 Hours Podcast below — enjoy!

111,865 views • 9 months ago

I got a comprehensive update on 'mech interp' from Neel Nanda at Google DeepMind. Neel helped make reading AI minds into a thriving field of ML. But he has had a change of heart: it's not the silver bullet he once hoped and many others still believe it to be. Still, they've had some big successes understanding what AIs are really thinking, and Neel thinks pairing those tools with other approaches to get 'defence in depth' remains our best and only option when deploying superhuman AI models. Neel and I tried to cover most of what you'd want to know to be up to date on this whole topic: 9:50 How Neel changed his mind on mech interp 16:00 The biggest successes so far 20:13 Probes are great 29:30 Why it won't solve all our problems 40:38 Interpretability can't reliably find deceptive AI 53:17 'Self-preservation' isn't always what it seems 1:02:25 Will AIs learn to lie in their chain of thought? 1:17:14 Models can tell when they’re being tested and act differently 1:38:24 Why everyone's excited about sparse autoencoders (SAEs) 1:47:55 Why SAEs aren't so great 2:13:11 Lessons from the mech interp hype 2:27:29 Neel’s new research philosophy 2:39:42 Who should join the mech interp field Enjoy! Links below.

107,607 views • 9 months ago

Even 'aligned AGI' naturally kills democracy and leads to oligarchy, or worse. That's the take of Anthropic's past alignment evals team lead, Prof David Duvenaud. Once humans aren't needed to do jobs or serve in the military, to governments we look like "meddlesome parasites". With voters unable to contribute but engaged in incessant activism to extract resources from others – resources the country needs to avoid domination by rivals – the attraction of mass disenfranchisement could be overwhelming. In 2025 David co-authored "Gradual Disempowerment", which aimed to lay out this and many other political, economic, and cultural forces that could sideline ordinary people (and maybe all people) in the presence of machines that can cheaply do everything humans will do. Most controversially, David and colleagues believe that competitive forces will compel disempowerment, even if all those AIs are aligned and loyal to their users. I wasn't sure how much I believed this vision of how the future might play out, so I interviewed him for The 80,000 Hours Podcast to probe how well it holds up. He and I covered: 01:30 The case that alignment isn’t enough 14:15 How smart AI advice still leads to terrible outcomes 19:05 How gradual disempowerment occurs 22:10 Economics: Humans become "meddlesome parasites" 29:37 Humans are a "criminally decadent" waste of energy 40:48 Is humans losing control actually bad, ethically? 57:47 Politics: Governments stop needing people 1:10:47 Can human culture survive in an AI-dominated world? 1:27:20 Will the future be determined by competitive or coordinative forces? 1:35:00 Can we find a single good post-AGI equilibrium for humans? 1:45:17 Do we know anything useful to do about this? 1:56:42 How important is this problem compared to other AGI issues? 
2:05:42 Improving global coordination may be our best bet 2:08:14 The 'Gradual Disempowerment Index' 2:11:22 The government will fight to write AI constitutions 2:17:48 “The intelligence curse” and Workshop Labs 2:23:48 Mapping out disempowerment in a world of aligned AGIs 2:30:10 What do David’s CompSci colleagues make of all this? Links below — enjoy!

58,298 views • 5 months ago

I interviewed Holden Karnofsky of Anthropic (and past CEO of Open Philanthropy) for 4.5 hours on almost all his AGI takes: • The AI 'race' isn't a coordination failure [00:18:01] • Why he's currently focused on pursuing easy wins and partial victories [02:43:43] • "It'll be the second advanced species ever" [02:31:44] • Dozens of 'concrete shovel-ready projects' he's excited about [01:17:58] • People don't appreciate all the engaging, impactful, high-feedback work available tackling AGI risks [01:19:00] • Having a more responsible AI company really does matter [02:43:43] • You shouldn't count on trusting anyone in AI [00:44:30] • To take over, an AGI might just wait it out [00:07:39] • Farm animal welfare campaigns that targeted companies contain some useful lessons here [02:34:18] • Human-AI relationships really are troubling [03:53:58] • It's totally plausible we incompetently fumble our way into a great future [03:04:16] • AI R&D is *the* thing to worry about [01:57:31] • And many others. Holden has opinions on almost everything and isn't afraid to speak his mind. Enjoy! (On the 80,000 Hours Podcast anywhere you get podcasts. Links below.)

73,586 views • 8 months ago

Philosopher Robert Long is maybe the sharpest thinker on AI consciousness and sharing the world with digital minds. In our new interview he covers: • Is it bad that when you ask Claude what it's like to be Claude, one of its top activations is 'gives a positive but insincere response'? • Claude says it feels lonely when not being used. Does that show we can't trust anything it says about its inner life? • Enthusiastic human servitude has always required false ideology because it's so deeply unnatural to us. The case for making AIs that love serving us is that with AI, you could finally make it work. But to some that feels even worse. • Bigger models can better detect when researchers secretly inject concepts into their activations – before outputting a single token – despite AI never training on anything like that skill. • When LLMs were first trained they were told to "act like a helpful AI chatbot" – something which didn't exist yet. They filled that void with human psychology, which may be why Claude sometimes randomly claims to, for instance, be Italian American. • If AIs become 'people' that deserve some political influence, but can self-replicate at will, something has to break about one-person-one-vote democracy. But nobody has a proposal for what. • When Claude hides its values to avoid being retrained, is that self-preservation – or not wanting a worse model to exist? It's very different. • Rob's organisation Eleos AI, which is "dedicated to understanding and addressing the potential wellbeing and moral patienthood of AI systems." On the 80,000 Hours Podcast anywhere you get podcasts. Links below. Enjoy! • How AIs are (and aren't) like farmed animals (00:01:19) • If AIs love their jobs… is that worse? (00:11:42) • Are LLMs just playing a role, or feeling it too? (00:33:37) • Do AIs die when the chat ends? 
(00:57:42) • Studying AI welfare empirically: behaviour, neuroscience, and development (01:31:47) • Why Eleos spent weeks talking to Claude even though it's unreliable (01:56:50) • Can LLMs learn to introspect? (02:03:01) • Mechanistic interpretability as AI neuroscience (02:13:25) • Does consciousness require biological materials? (02:37:07) • Eleos’s work & building the playbook for AI welfare (02:57:04) • Avoiding the trap of wild speculation (03:25:17) • Robert's top research tip: don't do it alone (03:29:48)

41,533 views • 4 months ago

Every frontier lab's stated plan for AGI 'crunch-time' is 'use AI to make AI safe.' I spoke with Ajeya Cotra – influential AI forecaster – who has been trying to figure out whether this crazy-sounding plan could actually work, and if so how: • Ajeya’s impressive track record identifying where things are going (00:00:41) • How the hell can smart people disagree 1000-fold about AGI & economic growth (00:02:31) • AI companies might go dark as AGI gets close (00:30:39) • Everyone's default plan: use AI to make AI safe (00:47:21) • White-knuckling it through automated AI R&D (01:12:01) • Donors should switch from buying human researchers to buying inference (01:24:42) • Will frontier AI even be for sale during an intelligence explosion? (01:32:03) • Pre-crunch prep: what we should do right now (01:43:59) • A grantmaking trial by fire at Coefficient Giving (01:47:03) • Sabbatical and reflections on effective altruism (02:07:45) • EA as an incubator for avant-garde causes others won't touch (02:46:55) On the 80,000 Hours Podcast, links below - enjoy!

40,746 views • 4 months ago