Rob Wiblin

@robertwiblin • 50,212 subscribers

Host of the 80,000 Hours Podcast. Exploring the inviolate sphere of ideas one interview at a time: https://t.co/2YMw00bkIQ

Videos

One OpenAI critic calls it "the theft of at least the millennium and quite possibly all of human history." Are they right? UCLA nonprofit law expert Rose Chan Loui and I discuss OpenAI's unprecedented gambit to ditch its nonprofit for good:
4:08 – How OpenAI carefully chose a complex nonprofit structure that failed
11:56 – OpenAI's new plan to go for-profit
14:49 – The nonprofit board is out-gunned, out-manned, out-numbered, out-planned. Who can help?
17:28 – Who would stand cheated in a bad for-profit switch?
28:03 – Has this happened before?
29:37 – Is it truly in the nonprofit's interest to sell control of OpenAI, or is it a bad deal?
36:10 – The difficulty of valuing OpenAI's future windfall profits
42:31 – Control of OpenAI is independently incredibly valuable and demands compensation
53:09 – What most miss: It's very important the nonprofit get cash and not just equity
1:05:36 – Is it a farce to call this an "arm's-length transaction"?
1:10:59 – How the nonprofit board can best play their hand
1:17:42 – Can Elon Musk mount a court challenge and how that would work
The volunteer nonprofit board is hugely outgunned by groups who would profit by tens of billions by screwing it over. And it will take a heroic effort and help from some state attorneys general to get everything it's owed. But that... just might happen.
We don't hold back on the opinions. Episode available in audio and video on the 80,000 Hours Podcast here, on any app, or YouTube.

Rob Wiblin

7,043,755 views • 1 year ago

Every AI lab is working to make their AI helpful, harmless and honest. Max Harms thinks this is a complete wrong turn, and 'aligning' AI to human values is actively dangerous.
In his view a safe AGI must have absolutely no opinion about how the world ought to be, be willingly modifiable, and be entirely indifferent to being shut down. The opposite of all commercial models today.
The key appeal is that so-called 'corrigibility' could be an attractor state – get close enough and the AI actively helps you make it more corrigible over time. That forgiveness would at least give us a shot.
It's a strategy that feels natural within the 'MIRI worldview', recently laid out by his colleagues Eliezer Yudkowsky ⏹️ and Nate Soares ⏹️ in 'If Anyone Builds It Everyone Dies'.
But it risks causing a different AI catastrophe, because the resulting AI model would necessarily be willing to assist any human operator with a power grab, or indeed any crime at all.
I interviewed Max on the 80,000 Hours Podcast to debate the MIRI worldview, and what we should do to figure out if corrigibility ought to be our one and only focus. Links below – enjoy!
00:01:56 If anyone builds it, will everyone die? The MIRI perspective on AGI risk
00:24:28 Evolution failed to ‘align’ us, just as we'll fail to align AI
00:42:56 We're training AIs to want to stay alive and value power for its own sake
00:52:24 Objections: Is the 'squiggle/paperclip problem' really real?
01:05:02 Can we get empirical evidence re: 'alignment by default'?
01:10:17 Why do few AI researchers share Max's perspective?
01:18:34 We're training AI to pursue goals relentlessly – and superintelligence will too
01:24:51 The case for a radical slowdown
01:27:53 Max's best hope: corrigibility as stepping stone to alignment
01:32:34 Corrigibility is both uniquely valuable, and practical, to train
01:45:06 What training could ever make models corrigible enough?
01:51:38 Corrigibility is also terribly risky due to misuse risk
01:58:57 A single researcher could make a corrigibility benchmark. Nobody has.
02:12:20 Red Heart & why Max writes hard science fiction
02:34:08 Should you homeschool? Depends how weird your kids are.

Rob Wiblin

296,251 views • 4 months ago

Yoshua Bengio thinks he knows how to make provably safe superintelligent agents. Bengio built the foundations of modern AI and is the most cited living scientist. He believes his alternative training setup would:
1. Guarantee honesty
2. Prevent unintended goals
3. Produce capable agents
4. Port over most data and techniques from current LLMs
5. Not be inherently more expensive, and perhaps be more intelligent
Bengio claims the honesty and lack of unintended goals can be proven mathematically, at least given particular assumptions. And his new organization, LawZero, is aiming to build a scrappy prototype as soon as possible.
The architecture is called 'Scientist AI' and it's based on training a model to explain empirical observations, including what people say, rather than training AIs that mimic human behaviour or seek our approval. (Bengio's frank assessment is that "reinforcement learning is evil" and that allowing AIs to independently train their successors is "the most crazy, dangerous bet that unfortunately we are on track to do.")
But skeptics question whether Scientist AI really does solve the fundamental problem of 'eliciting latent knowledge' from AI models. And with the commercial race for superintelligence so intense, it's not clear whether the proposal will be able to compete or have time to bear fruit, even if it's sound in theory.
On The 80,000 Hours Podcast, links below – enjoy!
• Making AI honest and safe (00:00:00)
• Scientist AI in plain English (00:02:27)
• How Scientist AI differs from LLMs (00:06:32)
• How the training data works (00:14:02)
• Can this become an agent? (00:21:02)
• Why Yoshua is now more optimistic (00:32:11)
• Why companies can’t stop racing (00:36:35)
• A working prototype won't take long (00:49:15)
• Scientist models might be more capable (00:53:34)
• “Reinforcement learning is evil” (01:01:27)
• Scientist AI from guardrail to agent (01:08:37)
• Can safe AI still be competent? (01:12:38)
• How much will this cost? (01:19:29)
• Can it generalise beyond maths and science? (01:23:26)
• A multi-national push for superintelligence (01:39:19)
• Want to work with or fund Yoshua? (01:51:16)
• Why smart people ignore AI risk (01:54:45)
• Don’t let AI build the next AI (02:01:33)
• Why politicians miss the real risks (02:12:28)
• Why Yoshua changed his mind about AI risk (02:21:27)

Rob Wiblin

65,070 views • 1 month ago

Conventional wisdom is that bioweapons are humanity's greatest weakness – 100x cheaper to make than to defend against. Andrew Snyder-Beattie thinks conventional wisdom is likely wrong. He has a plan cheap enough to do without government. Useful even in worst case scenarios like mirror bacteria. Effective enough to save most people.
In one of my all-time fav interviews he lays out a low-tech 4-step approach developed by his research team at Open Philanthropy, to fix a problem most have thought unsolvable. ASB is hiring for many roles in this project from logistics to biotech to manufacturing, and has $100s millions to deploy. Enjoy, links below!
2:10 How bad it could get
9:19 The worst-case scenario: mirror bacteria
18:14 Why low-tech
25:30 Prevention
31:21 The “4 pillars” plan
33:09 ASB is hiring now to make this happen
35:11 Everyone was wrong: biorisks are defence dominant
40:23 Pillar 1: Lungs
55:53 Pillar 2: Biohardening
1:15:19 Pillar 3: Detection
1:28:40 Pillar 4: The wrench hypothesis
1:40:12 The plan's biggest weaknesses
1:44:44 Would chaos make this impossible to pull off?
1:51:50 Would rogue AI make bioweapons?
1:57:57 We can feed the world even if all the plants die
2:07:03 Could a bioweapon make the Earth uninhabitable?
2:09:35 What ASB is hiring for
2:30:27 How to protect yourself and your family
(On the 80,000 Hours Podcast, available anywhere you get podcasts.)

Tackling NIMBYs head on has been an abject failure. You don't smash them – sooner or later they smash you. We need a different approach. And there’s a weirdly obvious solution to fighting NIMBY: 'stuffing their mouths with gold'.
Sam Bowman and I discuss his 'housing theory of everything' and how to actually fix our catastrophically busted planning permission system on the 80,000 Hours Podcast.
Also: how avant-garde architects are the great villains of our age, how to tackle terrible incumbent institutions, Europe's need for nuclear, Ozempic remaining highly underrated, and how progress studies stays sane.
1:21 We can't seem to build anything
3:15 And it's ruining people's lives
8:28 The housing theory of everything
17:03 The UK is the world's worst
35:47 Why almost no progress fixing it
43:11 NIMBYs are often harmed by development
54:50 Solution #1: Street votes
1:18:47 Will street votes come to the US
1:24:21 Solution #2: Stuffing mouths with gold
1:43:42 The most important policy setting you've never heard of
1:57:06 Solution #3: Opt-outs
2:10:32 How to make it happen
2:17:30 Making old institutions die a gradual death
2:31:11 The evil of modern architecture and the importance of beauty
2:44:14 The north needs nuclear
3:01:43 Ozempic and "the overweight theory of everything"
3:17:08 How progress studies has avoided online madness
I'd been hoping to interview Sam Bowman for years and he's the guest I always dreamed he would be. Transcript and links on the 80,000 Hours site, and a much higher res 4K version on YT (can't upload full quality to Twitter sadly).

Rob Wiblin

197,371 views • 1 year ago

Even 'aligned AGI' naturally kills democracy and leads to oligarchy, or worse. That's the take of Anthropic's past alignment evals team lead, Prof David Duvenaud.
Once humans aren't needed to do jobs or serve in the military, to governments we look like "meddlesome parasites". With voters unable to contribute but engaged in incessant activism to extract resources from others – resources the country needs to avoid domination by rivals – the attraction of mass disenfranchisement could be overwhelming.
In 2025 David co-authored "Gradual Disempowerment", which aimed to lay out this and many other political, economic, and cultural forces that could sideline ordinary people (and maybe all people) in the presence of machines that can cheaply do everything humans will do. Most controversially, David and colleagues believe that competitive forces will compel disempowerment, even if all those AIs are aligned and loyal to their users.
I wasn't sure how much I believed this vision of how the future might play out, so I interviewed him for The 80,000 Hours Podcast to probe how well it holds up. He and I covered:
01:30 The case that alignment isn’t enough
14:15 How smart AI advice still leads to terrible outcomes
19:05 How gradual disempowerment occurs
22:10 Economics: Humans become "meddlesome parasites"
29:37 Humans are a "criminally decadent" waste of energy
40:48 Is humans losing control actually bad, ethically?
57:47 Politics: Governments stop needing people
1:10:47 Can human culture survive in an AI-dominated world?
1:27:20 Will the future be determined by competitive or coordinative forces?
1:35:00 Can we find a single good post-AGI equilibrium for humans?
1:45:17 Do we know anything useful to do about this?
1:56:42 How important is this problem compared to other AGI issues?
2:05:42 Improving global coordination may be our best bet
2:08:14 The 'Gradual Disempowerment Index'
2:11:22 The government will fight to write AI constitutions
2:17:48 “The intelligence curse” and Workshop Labs
2:23:48 Mapping out disempowerment in a world of aligned AGIs
2:30:10 What do David’s CompSci colleagues make of all this?
Links below – enjoy!

Rob Wiblin

58,298 views • 5 months ago

Philosopher Robert Long is maybe the sharpest thinker on AI consciousness and sharing the world with digital minds. In our new interview he covers:
• Is it bad that when you ask Claude what it's like to be Claude, one of its top activations is 'gives a positive but insincere response'?
• Claude says it feels lonely when not being used. Does that show we can't trust anything it says about its inner life?
• Enthusiastic human servitude has always required false ideology because it's so deeply unnatural to us. The case for making AIs that love serving us is that with AI, you could finally make it work. But to some that feels even worse.
• Bigger models can better detect when researchers secretly inject concepts into their activations – before outputting a single token – despite AI never training on anything like that skill.
• When LLMs were first trained they were told to "act like a helpful AI chatbot" – something which didn't exist yet. They filled that void with human psychology, which may be why Claude sometimes randomly claims to, for instance, be Italian American.
• If AIs become 'people' that deserve some political influence, but can self-replicate at will, something has to break about one-person-one-vote democracy. But nobody has a proposal for what.
• When Claude hides its values to avoid being retrained, is that self-preservation – or not wanting a worse model to exist? It's very different.
• Rob's organisation Eleos AI which is "dedicated to understanding and addressing the potential wellbeing and moral patienthood of AI systems."
On the 80,000 Hours Podcast anywhere you get podcasts. Links below. Enjoy!
• How AIs are (and aren't) like farmed animals (00:01:19)
• If AIs love their jobs… is that worse? (00:11:42)
• Are LLMs just playing a role, or feeling it too? (00:33:37)
• Do AIs die when the chat ends? (00:57:42)
• Studying AI welfare empirically: behaviour, neuroscience, and development (01:31:47)
• Why Eleos spent weeks talking to Claude even though it's unreliable (01:56:50)
• Can LLMs learn to introspect? (02:03:01)
• Mechanistic interpretability as AI neuroscience (02:13:25)
• Does consciousness require biological materials? (02:37:07)
• Eleos’s work & building the playbook for AI welfare (02:57:04)
• Avoiding the trap of wild speculation (03:25:17)
• Robert's top research tip: don't do it alone (03:29:48)

Rob Wiblin

41,533 views • 4 months ago