Video wird geladen...

Video konnte nicht geladen werden

Zur Startseite

“The hope is that ... just optimizing something to be sparse—without optimizing it to be interpretable—will stumble across that interpretable decomposition.” — Neel Nanda on sparse autoencoders for mechanistic interpretability and AI safety at the Vienna Alignment Workshop.

1,148,210 Aufrufe • vor 1 Jahr •via X (Twitter)

10 Kommentare

Profilbild von FAR.AI
FAR.AIvor 1 Jahr

Follow us for updates about upcoming content and workshops: and watch the full video at

Profilbild von Vegeta Achanur
Vegeta Achanurvor 1 Jahr

I don't understand a thing he said

Profilbild von Cloude
Cloudevor 1 Jahr

you should try to say even more meaningless things if you want to succeed.

Profilbild von haywood arno
haywood arnovor 1 Jahr

Faire ça comme des filtres sur Instagram : on n'y voit pas vraiment ce qui se passe, mais ça donne un résultat plus "clean". On imagine qu'avec des outils de décodage plus précis, on pourrait comprendre comment ces filtres fonctionnent vraiment ?

Profilbild von रमता जोगी ☜⁠ ⁠(⁠↼⁠_⁠↼⁠)
रमता जोगी ☜⁠ ⁠(⁠↼⁠_⁠↼⁠)vor 1 Jahr

Arijit Singh ?

Profilbild von AJAY
AJAYvor 1 Jahr

Simply saying , if we focus on making having fewer elements rather than explicitly trying to make it understandable, it might accidentally end up being easy to understand.

Profilbild von MrMartin
MrMartinvor 1 Jahr

hope is not science

Profilbild von Fragmented Reality
Fragmented Realityvor 1 Jahr

Problematic Aspects: No Guarantee of Interpretability: The statement suggests that sparsity automatically leads to interpretability, which is not necessarily true. Sparsity only means that many parameters or components are zero, but it doesn't ensure that the remaining components are meaningful or understandable to humans. Interpretability is Subjective: Interpretability often depends on context and is subjective. What is interpretable to an expert may not be interpretable to a layperson. Sparsity alone cannot account for this subjectivity. Optimization Goal: If the goal is interpretability, it should be explicitly included in the optimization objective. Sparsity can be a tool to achieve this goal, but it is not a substitute for directly optimizing for interpretability. Conclusion: The statement is somewhat meaningful in highlighting a potential connection between sparsity and interpretability, but it is also problematic because it implies that sparsity alone is sufficient to ensure interpretability. In practice, explicitly optimizing for interpretability is often necessary, rather than relying solely on sparsity as a proxy. Greetings DeepSeek

Profilbild von Jeffrey Rubinoff
Jeffrey Rubinoffvor 1 Jahr

Sounds like a comment on current technical writing style guides.

Profilbild von Explore Onsen in Japan
Explore Onsen in Japanvor 1 Jahr

😂

Ähnliche Videos