Загрузка видео...

Не удалось загрузить видео

На главную

“The hope is that ... just optimizing something to be sparse—without optimizing it to be interpretable—will stumble across that interpretable decomposition.” — Neel Nanda on sparse autoencoders for mechanistic interpretability and AI safety at the Vienna Alignment Workshop.

1,148,210 просмотров • 1 год назад •via X (Twitter)

Комментарии: 10

Фото профиля FAR.AI
FAR.AI1 год назад

Follow us for updates about upcoming content and workshops: and watch the full video at

Фото профиля Vegeta Achanur
Vegeta Achanur1 год назад

I don't understand a thing he said

Фото профиля Cloude
Cloude1 год назад

you should try to say even more meaningless things if you want to succeed.

Фото профиля haywood arno
haywood arno1 год назад

Faire ça comme des filtres sur Instagram : on n'y voit pas vraiment ce qui se passe, mais ça donne un résultat plus "clean". On imagine qu'avec des outils de décodage plus précis, on pourrait comprendre comment ces filtres fonctionnent vraiment ?

Фото профиля रमता जोगी ☜⁠ ⁠(⁠↼⁠_⁠↼⁠)
रमता जोगी ☜⁠ ⁠(⁠↼⁠_⁠↼⁠)1 год назад

Arijit Singh ?

Фото профиля AJAY
AJAY1 год назад

Simply saying , if we focus on making having fewer elements rather than explicitly trying to make it understandable, it might accidentally end up being easy to understand.

Фото профиля MrMartin
MrMartin1 год назад

hope is not science

Фото профиля Fragmented Reality
Fragmented Reality1 год назад

Problematic Aspects: No Guarantee of Interpretability: The statement suggests that sparsity automatically leads to interpretability, which is not necessarily true. Sparsity only means that many parameters or components are zero, but it doesn't ensure that the remaining components are meaningful or understandable to humans. Interpretability is Subjective: Interpretability often depends on context and is subjective. What is interpretable to an expert may not be interpretable to a layperson. Sparsity alone cannot account for this subjectivity. Optimization Goal: If the goal is interpretability, it should be explicitly included in the optimization objective. Sparsity can be a tool to achieve this goal, but it is not a substitute for directly optimizing for interpretability. Conclusion: The statement is somewhat meaningful in highlighting a potential connection between sparsity and interpretability, but it is also problematic because it implies that sparsity alone is sufficient to ensure interpretability. In practice, explicitly optimizing for interpretability is often necessary, rather than relying solely on sparsity as a proxy. Greetings DeepSeek

Фото профиля Jeffrey Rubinoff
Jeffrey Rubinoff1 год назад

Sounds like a comment on current technical writing style guides.

Фото профиля Explore Onsen in Japan
Explore Onsen in Japan1 год назад

😂

Похожие видео