“The hope is that ... just optimizing something to be sparse—without optimizing it to be interpretable—will stumble across that interpretable decomposition.” — Neel Nanda on sparse autoencoders for mechanistic interpretability and AI safety at the Vienna Alignment Workshop.

FAR.AI

11,170 subscribers

1,148,210 views • 1 year ago • via X (Twitter)

10 comments

FAR.AI • 1 year ago

Follow us for updates about upcoming content and workshops: and watch the full video at

Vegeta Achanur • 1 year ago

I don't understand a thing he said

Cloude • 1 year ago

you should try to say even more meaningless things if you want to succeed.

haywood arno • 1 year ago

Doing it like Instagram filters: you can't really see what's going on, but it gives a "cleaner" result. One imagines that with more precise decoding tools, we could understand how these filters really work?

रमता जोगी ☜ (↼_↼) • 1 year ago

Arijit Singh ?

AJAY • 1 year ago

Simply put, if we focus on making something have fewer elements rather than explicitly trying to make it understandable, it might accidentally end up being easy to understand.

MrMartin • 1 year ago

hope is not science

Fragmented Reality • 1 year ago

Problematic Aspects:
No Guarantee of Interpretability: The statement suggests that sparsity automatically leads to interpretability, which is not necessarily true. Sparsity only means that many parameters or components are zero, but it doesn't ensure that the remaining components are meaningful or understandable to humans.
Interpretability is Subjective: Interpretability often depends on context and is subjective. What is interpretable to an expert may not be interpretable to a layperson. Sparsity alone cannot account for this subjectivity.
Optimization Goal: If the goal is interpretability, it should be explicitly included in the optimization objective. Sparsity can be a tool to achieve this goal, but it is not a substitute for directly optimizing for interpretability.
Conclusion: The statement is somewhat meaningful in highlighting a potential connection between sparsity and interpretability, but it is also problematic because it implies that sparsity alone is sufficient to ensure interpretability. In practice, explicitly optimizing for interpretability is often necessary, rather than relying solely on sparsity as a proxy.
Greetings DeepSeek

Jeffrey Rubinoff • 1 year ago

Sounds like a comment on current technical writing style guides.

Explore Onsen in Japan • 1 year ago

😂

Related videos

✨ New AI Interfaces powered by Interpretability: I'm excited to share LatentLit, the result of my applied AI research fellowship with Goodfire. Mechanistic interpretability isn’t just important for AI safety; it also gives us new ways to steer and interact with LLMs.

Thariq

67,925 views • 1 year ago

Neel Nanda is leading a Google DeepMind research team at 26. He and I discuss: • How that happened • “If your safety work doesn't advance capabilities, it's probably bad safety work” • Should people work at the safest or most reckless AI company? • An AI PhD – with these timelines?! • How to best operate in a big frontier AI company • Neel's distinctive uses of LLMs and which cold emails he answers • A common reasoning error in AI alignment • Why he (Neel Nanda) refuses to share his p(doom) This is part 2 of our conversation, part 1 was a comprehensive update on his research area: mechanistic interpretability, which I'll link below. Links to this episode of the 80,000 Hours Podcast below — enjoy!

Rob Wiblin

111,865 views • 9 months ago

The first inherently interpretable AI platform is finally here. Welcome to Clarity.

Guide Labs

561,422 views • 9 days ago

Many scientific problems hinge on finding interpretable formulas that fit data, but neural networks are the outright opposite! Check out our recent work that makes neural networks modular and interpretable. If you have interesting datasets at hand, we're happy to collaborate!

Ziming Liu

62,120 views • 3 years ago

Safety-oriented interpretability researchers should be focused on AI systems, not individual model artifacts. A snippet from the NeurIPS CogInterp workshop panel on Sunday:

Christopher Potts

16,337 views • 6 months ago

Sugar is the tech that powers the MetaDEX app layer. It simplifies getting onchain data, allows permissionless access for integrations, and virtually eliminates backend operations—optimizing scaling and uptime while reducing costs. Here’s how it came to be 👇

Dromos

15,439 views • 6 months ago

💗🗣 How does translating the Korean word "jeong" (정) illustrate the challenge of AI alignment? 🤖🎯 Been Kim discusses alignment and interpretability as part of the New Orleans Alignment Workshop hosted by FAR AI.

FAR.AI

2,835,753 views • 1 year ago

4 recordings from San Diego Alignment Workshop! Sam Bowman – Lessons from the 1st Misalignment Safety Case; Maja Trebacz – Scalable Oversight: Verifying Code at Scale; @neelnanda5 – Pivot to Pragmatic Interpretability; Anka Reuel | @ankareuel.bsky.social – How we know what AI can (and can’t) do 👇

FAR.AI

38,485 views • 6 months ago

"Please learn from our mistakes. Don't do exactly the same things that we did, or you'll end up in ten years with having nothing to show for it." — Nicholas Carlini urging AI researchers to avoid the pitfalls of past adversarial ML research at the Vienna Alignment Workshop 2024.

FAR.AI

5,370,524 views • 1 year ago

🚨 The Missouri Highway Patrol just told us that the east side of the interstate is completely shut down and that we will be here, unable to move, for the rest of the night. Several folks are low on gas or without it. They told us to conserve fuel and that FEMA will be messaging us.

Justice Horn

33,514 views • 2 years ago

I got a comprehensive update on 'mech interp' from Neel Nanda at Google DeepMind. Neel helped make reading AI minds into a thriving field of ML. But he has had a change of heart: it's not the silver bullet he once hoped and many others still believe it to be. Still, they've had some big successes understanding what AIs are really thinking, and Neel thinks pairing those tools with other approaches to get 'defence in depth' remains our best and only option when deploying superhuman AI models. Neel and I tried to cover most of what you'd want to know to be up to date on this whole topic:
9:50 How Neel changed his mind on mech interp
16:00 The biggest successes so far
20:13 Probes are great
29:30 Why it won't solve all our problems
40:38 Interpretability can't reliably find deceptive AI
53:17 'Self-preservation' isn't always what it seems
1:02:25 Will AIs learn to lie in their chain of thought?
1:17:14 Models can tell when they’re being tested and act differently
1:38:24 Why everyone's excited about sparse autoencoders (SAEs)
1:47:55 Why SAEs aren't so great
2:13:11 Lessons from the mech interp hype
2:27:29 Neel’s new research philosophy
2:39:42 Who should join the mech interp field
Enjoy! Links below.

Rob Wiblin

107,607 views • 9 months ago

"How does a model call a lawyer and get advice about what's allowable and not allowable?" – Gillian Hadfield emphasizes the need to integrate AI into institutional structures at the Vienna Alignment Workshop.

FAR.AI

319,887 views • 1 year ago

Princess Jane returns in the AI movie Scarlet scourge. I like telling stories, and AI has allowed me to express this creative attribute. I hope the movie industry accepts AI not as something to be feared but as something to be harnessed. The truth is that AI is going nowhere; it is the new industrial age. Imagine the possibilities that can be accomplished by combining human creativity and artificial intelligence. Princess Jane's official YouTube channel is live; click on the link below to subscribe.

Rufus

34,912 views • 2 years ago

Can we map the mind of an LLM? Our first mechanistic interpretability episode on Training Data featuring Goodfire founder Eric Ho (and our first cameo from Roelof Botha!) Goodfire is building an independent mech interp lab, led by some heavyweight researchers from the field (e.g. Lee Sharkey who has led a lot of important work in sparse autoencoders to "unscramble" LLMs and resolve superposition, Nick who has been a key pioneer behind auto interpretability) On this episode, Eric gives us a flyover of the technical results so far from this nascent field (universality, superposition), what's ahead in the research (going from circuits to weights, going from understanding to increasingly surgical editing), a preview of the real-world work they're doing already with Arc Institute, and the impact he expects Goodfire and the broader field to have on steering, safety, editing and more.

Sonya Huang 🐥

19,371 views • 11 months ago

Elon Musk: We should encourage the AI to be truthful and honorable. “We need to make sure that the AI is a good AI, a good Grok. And the thing that I think is most important for AI safety, at least my biological neural net tells me the most important thing for AI is to be maximally truth-seeking. You can think of AI as this super-genius child that ultimately will outsmart you, but you can instill the right values and encourage it to be truthful, honorable, you know, good things like the values you want to instill in a child that would ultimately grow up to be incredibly powerful.” xAI Grok 4 presentation, July 9, 2025

ELON CLIPS

20,740 views • 10 months ago

ELON: TRUTH IS THE ONLY REAL SAFETY MECHANISM FOR AI Safety doesn’t come from guardrails stacked on top of bad logic. It comes from forcing the system to care about what’s actually true. “My number one belief for the safety of AI is to be maximally truth-seeking. Don’t make AI believe things that are false. If you tell the AI that Axiom A and Axiom B are both true, but they can’t both be true, then that’s a problem. It has to recognize that and behave accordingly.” Source: DOW

Mario Nawfal

35,546 views • 5 months ago

Elon Musk: A curious AI will want to preserve human civilization. “The best thing I can come up with for AI safety is to make it a maximum truth-seeking AI, maximally curious, and have its optimization function be to understand the nature of the universe. If that is its optimization function, I think it will actually want to preserve and extend human civilization, because we're much more interesting than an asteroid with nothing on it. My biological neural net suggests that a maximally curious and truth-seeking AI is the safest AI. We have to be careful with the alignment stuff. We definitely don't want to teach an AI to lie because that is a path to a dystopian future.” Interview with Linda Yaccarino, April 18, 2023

ELON CLIPS

145,300 views • 1 year ago

#WATCH | Delhi: EAM Dr S Jaishankar, "Look at our response to 26/11 in Mumbai and look at our response to Uri and Balakot. I think nothing can tell you more clearly, more sharply, because, you know, at the end of the day, the armed forces are the same, the bureaucracy is the same, the intelligence is the same. So if you look at what are the structural inputs and responses of the system, it would be the same...Uri and Balakot were meant to demonstrate that no, life will not go on and that there will be a price and don't think because you've done something and run away to that side, that you are safe there. You will not be safe there. It will not be safe across the line of control nor across international borders. So there was a clear, direct message out there and I think the people to whom that message was intended to be sent, hopefully got it."

ANI

86,425 views • 2 years ago