“The hope is that ... just optimizing something to be sparse—without optimizing it to be interpretable—will stumble across that interpretable decomposition.” — Neel Nanda on sparse autoencoders for mechanistic interpretability and AI safety at the Vienna Alignment Workshop.

1,148,210 views • 1 year ago • via X (Twitter)

10 comments

FAR.AI • 1 year ago

Follow us for updates about upcoming content and workshops: and watch the full video at

Vegeta Achanur • 1 year ago

I don't understand a thing he said

Cloude • 1 year ago

you should try to say even more meaningless things if you want to succeed.

haywood arno • 1 year ago

Doing this like Instagram filters: you can't really see what's going on, but it gives a "cleaner" result. One imagines that with more precise decoding tools, we could understand how these filters actually work?

रमता जोगी ☜ (↼_↼) • 1 year ago

Arijit Singh ?

AJAY • 1 year ago

Simply put, if we focus on making it have fewer elements rather than explicitly trying to make it understandable, it might accidentally end up being easy to understand.

MrMartin • 1 year ago

hope is not science

Fragmented Reality • 1 year ago

Problematic Aspects:
No Guarantee of Interpretability: The statement suggests that sparsity automatically leads to interpretability, which is not necessarily true. Sparsity only means that many parameters or components are zero, but it doesn't ensure that the remaining components are meaningful or understandable to humans.
Interpretability is Subjective: Interpretability often depends on context and is subjective. What is interpretable to an expert may not be interpretable to a layperson. Sparsity alone cannot account for this subjectivity.
Optimization Goal: If the goal is interpretability, it should be explicitly included in the optimization objective. Sparsity can be a tool to achieve this goal, but it is not a substitute for directly optimizing for interpretability.
Conclusion: The statement is somewhat meaningful in highlighting a potential connection between sparsity and interpretability, but it is also problematic because it implies that sparsity alone is sufficient to ensure interpretability. In practice, explicitly optimizing for interpretability is often necessary, rather than relying solely on sparsity as a proxy.
Greetings DeepSeek

Jeffrey Rubinoff • 1 year ago

Sounds like a comment on current technical writing style guides.

Explore Onsen in Japan • 1 year ago

😂
