Video yükleniyor...

Video Yüklenemedi

Ana Sayfaya Dön

“The hope is that ... just optimizing something to be sparse—without optimizing it to be interpretable—will stumble across that interpretable decomposition.” — Neel Nanda on sparse autoencoders for mechanistic interpretability and AI safety at the Vienna Alignment Workshop.

1,148,210 görüntüleme • 1 yıl önce •via X (Twitter)

10 Yorum

FAR.AI profil fotoğrafı
FAR.AI1 yıl önce

Follow us for updates about upcoming content and workshops: and watch the full video at

Vegeta Achanur profil fotoğrafı
Vegeta Achanur1 yıl önce

I don't understand a thing he said

Cloude profil fotoğrafı
Cloude1 yıl önce

you should try to say even more meaningless things if you want to succeed.

haywood arno profil fotoğrafı
haywood arno1 yıl önce

Faire ça comme des filtres sur Instagram : on n'y voit pas vraiment ce qui se passe, mais ça donne un résultat plus "clean". On imagine qu'avec des outils de décodage plus précis, on pourrait comprendre comment ces filtres fonctionnent vraiment ?

रमता जोगी ☜⁠ ⁠(⁠↼⁠_⁠↼⁠) profil fotoğrafı
रमता जोगी ☜⁠ ⁠(⁠↼⁠_⁠↼⁠)1 yıl önce

Arijit Singh ?

AJAY profil fotoğrafı
AJAY1 yıl önce

Simply saying , if we focus on making having fewer elements rather than explicitly trying to make it understandable, it might accidentally end up being easy to understand.

MrMartin profil fotoğrafı
MrMartin1 yıl önce

hope is not science

Fragmented Reality profil fotoğrafı
Fragmented Reality1 yıl önce

Problematic Aspects: No Guarantee of Interpretability: The statement suggests that sparsity automatically leads to interpretability, which is not necessarily true. Sparsity only means that many parameters or components are zero, but it doesn't ensure that the remaining components are meaningful or understandable to humans. Interpretability is Subjective: Interpretability often depends on context and is subjective. What is interpretable to an expert may not be interpretable to a layperson. Sparsity alone cannot account for this subjectivity. Optimization Goal: If the goal is interpretability, it should be explicitly included in the optimization objective. Sparsity can be a tool to achieve this goal, but it is not a substitute for directly optimizing for interpretability. Conclusion: The statement is somewhat meaningful in highlighting a potential connection between sparsity and interpretability, but it is also problematic because it implies that sparsity alone is sufficient to ensure interpretability. In practice, explicitly optimizing for interpretability is often necessary, rather than relying solely on sparsity as a proxy. Greetings DeepSeek

Jeffrey Rubinoff profil fotoğrafı
Jeffrey Rubinoff1 yıl önce

Sounds like a comment on current technical writing style guides.

Explore Onsen in Japan profil fotoğrafı
Explore Onsen in Japan1 yıl önce

😂

Benzer Videolar