“The hope is that ... just optimizing something to be sparse—without optimizing it to be interpretable—will stumble across that interpretable decomposition.” — Neel Nanda on sparse autoencoders for mechanistic interpretability and AI safety at the Vienna Alignment Workshop.

FAR.AI

20,838 subscribers

1,431,043 views • 1 year ago • via X (Twitter)

10 comments

FAR.AI • 1 year ago

Follow us for updates about upcoming content and workshops: and watch the full video at

Vegeta Achanur • 1 year ago

I don't understand a thing he said

Cloude • 1 year ago

you should try to say even more meaningless things if you want to succeed.

haywood arno • 1 year ago

It's like Instagram filters: you don't really see what is going on, but you get a "cleaner" result. Presumably, with more precise decoding tools, we could understand how these filters really work?

रमता जोगी ☜⁠ ⁠(⁠↼⁠_⁠↼⁠) • 1 year ago

Arijit Singh ?

AJAY • 1 year ago

Simply put, if we focus on having fewer elements rather than explicitly trying to make it understandable, it might accidentally end up being easy to understand.

MrMartin • 1 year ago

hope is not science

Fragmented Reality • 1 year ago

Problematic Aspects:

No Guarantee of Interpretability: The statement suggests that sparsity automatically leads to interpretability, which is not necessarily true. Sparsity only means that many parameters or components are zero, but it doesn't ensure that the remaining components are meaningful or understandable to humans.

Interpretability is Subjective: Interpretability often depends on context and is subjective. What is interpretable to an expert may not be interpretable to a layperson. Sparsity alone cannot account for this subjectivity.

Optimization Goal: If the goal is interpretability, it should be explicitly included in the optimization objective. Sparsity can be a tool to achieve this goal, but it is not a substitute for directly optimizing for interpretability.

Conclusion: The statement is somewhat meaningful in highlighting a potential connection between sparsity and interpretability, but it is also problematic because it implies that sparsity alone is sufficient to ensure interpretability. In practice, explicitly optimizing for interpretability is often necessary, rather than relying solely on sparsity as a proxy.

Greetings DeepSeek

Jeffrey Rubinoff • 1 year ago

Sounds like a comment on current technical writing style guides.

Explore Onsen in Japan • 1 year ago

😂

Related videos

✨ New AI Interfaces powered by Interpretability. I'm excited to share LatentLit, the result of my applied AI research fellowship with Goodfire. Mechanistic interpretability isn’t just important for AI safety, it also gives us new ways to steer and interact with LLMs.

Thariq

68,172 views • 1 year ago

Neel Nanda is leading a Google DeepMind research team at 26. He and I discuss: • How that happened • “If your safety work doesn't advance capabilities, it's probably bad safety work” • Should people work at the safest or most reckless AI company? • An AI PhD – with these timelines?! • How to best operate in a big frontier AI company • Neel's distinctive uses of LLMs and which cold emails he answers • A common reasoning error in AI alignment • Why he (Neel Nanda) refuses to share his p(doom) This is part 2 of our conversation, part 1 was a comprehensive update on his research area: mechanistic interpretability, which I'll link below. Links to this episode of the 80,000 Hours Podcast below — enjoy!

Rob Wiblin

111,865 views • 10 months ago

Many scientific problems hinge on finding interpretable formulas that fit data, but neural networks are the outright opposite! Check out our recent work that makes neural networks modular and interpretable. If you have interesting datasets at hand, we're happy to collaborate!

Ziming Liu

62,120 views • 3 years ago

A model’s chain of thought acts like a scratch pad, offering a window into its reasoning. 📝 On the latest episode of our podcast, host Hannah Fry sits down with Neel Nanda to explore interpretability – the science of reverse engineering how neural networks learn and think. Timecodes: 00:00 Introduction 02:41 Motivation for interpretability research 04:01 Mechanistic interpretability 08:14 Chain of thought monitoring 18:14 Interpretability techniques 35:00 Auditing models for safety 48:53 What comes next for interpretability

Google DeepMind

206,395 views • 17 days ago

Sugar is the tech that powers the MetaDEX app layer. It simplifies getting onchain data, allows permissionless access for integrations, and virtually eliminates backend operations—optimizing scaling and uptime while reducing costs. Here’s how it came to be 👇

Dromos

15,439 views • 7 months ago

"Please learn from our mistakes. Don't do exactly the same things that we did, or you'll end up in ten years with having nothing to show for it." — Nicholas Carlini urging AI researchers to avoid the pitfalls of past adversarial ML research at the Vienna Alignment Workshop 2024.

FAR.AI

5,370,588 views • 1 year ago

4 recordings from San Diego Alignment Workshop! Sam Bowman – Lessons from the 1st Misalignment Safety Case Maja Trebacz – Scalable Oversight: Verifying Code at Scale @neelnanda5 – Pivot to Pragmatic Interpretability Anka Reuel | @ankareuel.bsky.social – How we know what AI can (and can’t) do 👇

FAR.AI

38,495 views • 7 months ago

I got a comprehensive update on 'mech interp' from Neel Nanda at Google DeepMind. Neel helped make reading AI minds into a thriving field of ML. But he has had a change of heart: it's not the silver bullet he once hoped and many others still believe it to be. Still, they've had some big successes understanding what AIs are really thinking, and Neel thinks pairing those tools with other approaches to get 'defence in depth' remains our best and only option when deploying superhuman AI models. Neel and I tried to cover most of what you'd want to know to be up to date on this whole topic: 9:50 How Neel changed his mind on mech interp 16:00 The biggest successes so far 20:13 Probes are great 29:30 Why it won't solve all our problems 40:38 Interpretability can't reliably find deceptive AI 53:17 'Self-preservation' isn't always what it seems 1:02:25 Will AIs learn to lie in their chain of thought? 1:17:14 Models can tell when they’re being tested and act differently 1:38:24 Why everyone's excited about sparse autoencoders (SAEs) 1:47:55 Why SAEs aren't so great 2:13:11 Lessons from the mech interp hype 2:27:29 Neel’s new research philosophy 2:39:42 Who should join the mech interp field Enjoy! Links below.

Rob Wiblin

107,607 views • 10 months ago

🚨The Missouri Highway Patrol just told us that the east side of the interstate is completely shut down and that we will be here unable to move for the rest of the night. Several folks are low on gas or without it. They told us to conserve fuel and that FEMA will be messaging us.

Justice Horn

33,514 views • 2 years ago

🚨Money Will Completely DISAPPEAR and you WILL OWN NO MONEY Because there WILL be no NEED for it, as ROBOTS and AI Will Provide Everything you NEED. 🚨Elon Musk: Money won't matter in 2036. 🚨The future is going to be very, very different from the past. 🚨There's no analogy or metaphor that I think illustrates the magnitude of change that we're going to experience here. What we can and should try to do is to make sure that the AI has good values, that it cares about humanity, and that it wants us to be happy and prosper. What my biological neural net tells me is that the most important thing for AI safety is to be maximally truth-seeking and curious. If that's the case, I think it will foster humanity. You'll own nothing and be happy!

Ignorance, the root and stem of all evil

112,277 views • 1 day ago

Elon Musk just said something about AI that nobody wants to hear. If you force an AI to believe things that are not true, it will not just be wrong. It will go insane. He made a point that cuts deeper than any AI safety paper I have read. AI must value truth. Not your truth. Not my truth. Actual truth. The moment you build ideological guardrails into a system that is smarter than you, you are not protecting anyone. You are creating something that is powerful and delusional at the same time. He compared it to a human being forced to believe contradictions. Eventually the brain breaks. Now scale that to an intelligence that runs infrastructure, writes code, and makes decisions at speeds humans cannot follow. The companies building AI right now are not just deciding what it can do. They are deciding what it must pretend to believe. And that might be the most dangerous decision of all.

Shruti

95,785 views • 1 month ago

🚨 ELON MUSK: "AI will outperform humans at most jobs." “I do think it’s going to be a bumpy road, because AI will be able to do any job better than any person can do that job. That will be true very quickly for jobs that are digital, or really anything that involves a person at a computer or on a phone. AI will be able to do all of that very soon. For software engineering, we already have a situation where AI is better than at least 90% of humans at writing software, including 90% of professional software engineers. It really is getting to the point where it will be better than 99%, and then it will get to the point where there’s just no way to compete. It will get to what I call Stockfish level. Stockfish is a chess program that is so good at chess that it will easily beat Magnus Carlsen, or anyone, even while running on a small computer. I suspect you could probably run Stockfish on your phone and beat Magnus Carlsen. That’s how good it is at chess, and that’s how good AI will be at writing software. And that applies to everything.”

DogeDesigner

57,925 views • 4 days ago

Can we map the mind of an LLM? Our first mechanistic interpretability episode on Training Data featuring Goodfire founder Eric Ho (and our first cameo from Roelof Botha!) Goodfire is building an independent mech interp lab, led by some heavyweight researchers from the field (e.g. Lee Sharkey who has led a lot of important work in sparse autoencoders to "unscramble" LLMs and resolve superposition, Nick who has been a key pioneer behind auto interpretability) On this episode, Eric gives us a flyover of the technical results so far from this nascent field (universality, superposition), what's ahead in the research (going from circuits to weights, going from understanding to increasingly surgical editing), a preview of the real-world work they're doing already with Arc Institute, and the impact he expects Goodfire and the broader field to have on steering, safety, editing and more.

Sonya Huang 🐥

19,379 views • 1 year ago

Princess Jane returns in the AI movie Scarlet scourge. I like telling stories, and AI has allowed me to express this creative attribute. I hope the movie industry accepts AI not as something to be feared but as something to be harnessed. The truth is that AI is going nowhere; it is the new industrial age. Imagine the possibilities that can be accomplished by combining human creativity and artificial intelligence. The Princess Jane official YouTube channel is live; click on the link below to subscribe.

Rufus

34,948 views • 2 years ago

Elon Musk: We should encourage the AI to be truthful and honorable. “We need to make sure that the AI is a good AI, a good Grok. And the thing that I think is most important for AI safety, at least my biological neural net tells me the most important thing for AI is to be maximally truth-seeking. You can think of AI as this super-genius child that ultimately will outsmart you, but you can instill the right values and encourage it to be truthful, honorable, you know, good things like the values you want to instill in a child that would ultimately grow up to be incredibly powerful.” xAI Grok 4 presentation, July 9, 2025

ELON CLIPS

20,749 views • 11 months ago

ELON: TRUTH IS THE ONLY REAL SAFETY MECHANISM FOR AI Safety doesn’t come from guardrails stacked on top of bad logic. It comes from forcing the system to care about what’s actually true. “My number one belief for the safety of AI is to be maximally truth-seeking. Don’t make AI believe things that are false. If you tell the AI that Axiom A and Axiom B are both true, but they can’t both be true, then that’s a problem. It has to recognize that and behave accordingly.” Source: DOW

Mario Nawfal

35,546 views • 6 months ago

The new voxel format is finally starting to come together. It uses a brickmap that enables large, sparse objects, unique properties for each voxel (no palette) and really fast scene updates.

Dennis Gustafsson

119,125 views • 1 year ago