Embedding features learned with sparse autoencoders can make semantic edits to text ✨ (+ a reading/highlighting demo). I've built an interface to explore and visualize GPT-4-labelled features learned from a text embedding model's latent space. Here's a little video; more in 👇

51,614 views • 2 years ago • via X (Twitter)

11 comments

Linus • 2 years ago

Over Friday and Saturday, I made this UI that lets me explore features learned by sparse autoencoders trained on my embedding models. I can:
- Search through features and read the GPT-4 autointerp output
- See the significant features for a custom input text
- Turn features on/off and see the generated output
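The thread doesn't describe the autoencoder's architecture, so here is a minimal sketch of the common sparse-autoencoder recipe, assuming a ReLU encoder, a linear decoder, and an L1 sparsity penalty on the feature activations. The class name, the initialization scale, and `l1_coeff` are hypothetical choices for illustration, not the author's settings; training would minimize `loss` with a gradient-based optimizer, which is not shown.

```python
import numpy as np


class SparseAutoencoder:
    """Hypothetical SAE over text embeddings: ReLU encoder, linear decoder."""

    def __init__(self, d_embed, n_features, seed=0):
        rng = np.random.default_rng(seed)
        # Overcomplete dictionary: usually n_features >> d_embed.
        self.W_enc = rng.normal(0.0, 0.1, (d_embed, n_features))
        self.b_enc = np.zeros(n_features)
        self.W_dec = rng.normal(0.0, 0.1, (n_features, d_embed))
        self.b_dec = np.zeros(d_embed)

    def encode(self, x):
        # Non-negative feature activations; training pushes most of them to 0.
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f):
        # Each active feature adds its decoder row (a direction in embedding space).
        return f @ self.W_dec + self.b_dec

    def loss(self, x, l1_coeff=1e-3):
        # Reconstruction error plus an L1 penalty that encourages sparsity.
        f = self.encode(x)
        mse = np.mean(np.sum((x - self.decode(f)) ** 2, axis=-1))
        l1 = np.mean(np.sum(np.abs(f), axis=-1))
        return mse + l1_coeff * l1


# Usage: a batch of 4 toy 8-dim "embeddings" mapped to 32 features.
sae = SparseAutoencoder(d_embed=8, n_features=32)
x = np.random.default_rng(1).normal(size=(4, 8))
features = sae.encode(x)  # shape (4, 32)
```

The L1 coefficient trades reconstruction quality against sparsity; it is one of the hyperparameters that the later posts describe as "still kind of an art" to tune.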

Linus • 2 years ago

It's very interesting to see what kinds of features the autoencoder found, i.e. how the embedding model may be representing salient semantics about an input. Common themes:
- Topics (sports, food, education, family)
- Syntax (code, lists, LaTeX)
- Grammar (first-person, quotations)

Linus • 2 years ago

I can also type in some custom text and see which features are most activated for that input (each feature's activation is normalized to the max value seen for that feature in the training corpus). It's an interesting way of seeing which specific attributes of a text the embedding may "use".
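The per-feature normalization described above can be sketched like this, assuming you already have SAE feature activations for a training corpus (one row per document) and for the input text; the function names are hypothetical. Dividing by each feature's corpus maximum puts features on a roughly 0-to-1 scale (values above 1 mean the input beats anything seen in training), so a feature that only ever fires weakly can still rank highly when it fires near its usual peak.

```python
import numpy as np


def max_activations(corpus_acts):
    """Per-feature maximum over a corpus; corpus_acts has shape (n_docs, n_features)."""
    return np.asarray(corpus_acts, dtype=float).max(axis=0)


def top_features(acts, max_acts, k=5):
    """Return the k strongest (feature_index, normalized_activation) pairs.

    Features that never fired in the corpus (max of 0) are scored 0 rather than
    dividing by zero.
    """
    acts = np.asarray(acts, dtype=float)
    max_acts = np.asarray(max_acts, dtype=float)
    safe_max = np.where(max_acts > 0, max_acts, 1.0)
    normalized = np.where(max_acts > 0, acts / safe_max, 0.0)
    order = np.argsort(-normalized)[:k]
    return [(int(i), float(normalized[i])) for i in order]


# Usage with toy numbers: 2 corpus documents, 3 features.
corpus = np.array([[1.0, 0.0, 4.0],
                   [2.0, 0.5, 1.0]])
max_acts = max_activations(corpus)  # [2.0, 0.5, 4.0]
print(top_features([1.0, 0.5, 1.0], max_acts, k=2))  # [(1, 1.0), (0, 0.5)]
```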

Linus • 2 years ago

I can also turn specific features "off" (manually set to 0) or "on" (set to the max, or 2x the max, value found in my dataset) and see how that influences the embedding, by using Contra/vec2text to decode the embedding back into text. This is very fun to play with. Look at the outputs! (Turning off features doesn't work as well currently; I hypothesize this is because my sparse autoencoders are quite undertrained, as I realized yesterday. I'm fixing this soon. Screenshots are effortfully cherry-picked examples.)
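A sketch of the on/off intervention, assuming the SAE exposes `encode` and `decode` callables over a single embedding vector. The thread doesn't say what happens to the part of the embedding the SAE fails to reconstruct; here, as an assumption, that reconstruction error is added back unchanged so only the chosen feature's contribution moves. `edit_embedding` is a hypothetical helper, and the final step of inverting the edited embedding back to text with Contra/vec2text is not shown.

```python
import numpy as np


def edit_embedding(embedding, encode, decode, feature_idx, value):
    """Hypothetical helper: set one SAE feature to `value`, return the edited embedding.

    The SAE's reconstruction error is added back (an assumption), so whatever the
    SAE fails to capture is kept and only the chosen feature's contribution changes.
    """
    feats = np.asarray(encode(embedding), dtype=float)
    error = embedding - decode(feats)  # part of the embedding the SAE misses
    edited = feats.copy()
    edited[feature_idx] = value        # 0 = "off"; max or 2x max = "on"
    return decode(edited) + error


# Toy usage with a random single-layer "SAE" (illustration only).
rng = np.random.default_rng(0)
W_enc, W_dec = rng.normal(size=(4, 6)), rng.normal(size=(6, 4))
encode = lambda x: np.maximum(0.0, x @ W_enc)
decode = lambda f: f @ W_dec
x = rng.normal(size=4)
x_off = edit_embedding(x, encode, decode, feature_idx=2, value=0.0)
x_on = edit_embedding(x, encode, decode, feature_idx=2, value=2.0 * 1.5)  # e.g. 2x a corpus max of 1.5
```

With a linear decoder, this amounts to shifting the original embedding by `(value - old_value)` times that feature's decoder row.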

Linus • 2 years ago

Lastly, I made a new interface today that highlights every sentence in a passage by how strongly common features activate on it.
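One way such a highlighter could work, assuming hypothetical `embed` (text to embedding) and `encode` (embedding to SAE features) callables and a known corpus maximum for the feature of interest: split the passage into sentences, score each sentence by its normalized activation, and map the score to a background opacity. The regex sentence splitter is deliberately naive; to show several features at once, you could run this per feature with a different colour for each.

```python
import html
import re


def split_sentences(passage):
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", passage.strip()) if s]


def highlight(passage, embed, encode, feature_idx, max_act):
    """Score each sentence by one feature's activation, normalized and capped at 1.

    max_act is the feature's maximum activation in the training corpus (> 0).
    """
    scored = []
    for sentence in split_sentences(passage):
        act = float(encode(embed(sentence))[feature_idx])
        scored.append((sentence, min(act / max_act, 1.0)))
    return scored


def to_html(scored, rgb=(255, 200, 0)):
    # Stronger activation -> more opaque background colour.
    r, g, b = rgb
    return " ".join(
        f'<span style="background: rgba({r},{g},{b},{score:.2f})">{html.escape(s)}</span>'
        for s, score in scored
    )


# Toy usage with stand-in models: the "activation" is just sentence length.
fake_embed = lambda s: [float(len(s))]
fake_encode = lambda e: [e[0]]
print(to_html(highlight("Short one. A noticeably longer sentence!", fake_embed, fake_encode, 0, 40.0)))
```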

Linus • 2 years ago

There are a bunch of imperfections in the current state of these demos.

1. Most importantly, the autoencoders I'm using were the first ones that "worked" for me, so they're quite undertrained, which messes with cleanly turning features on/off. I'm currently training better sparse AEs.
2. Often, a feature will appear to be about one thing when you look at its max-activating examples, but about something else when you look at examples lower in its activation range (the "interpretability illusion"). I want to visualize and run LLM auto-interpretation on the full range of activating samples, as prior work in this space has done.
3. I've gotten much better at training decent sparse autoencoders stably by picking good hyperparameters, but I need to check empirically that my intuitions generalize. It's still kind of an art.
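Looking at "the full range of activating samples" can be sketched as stratified sampling over one feature's activations: rather than showing only the top-k examples, bucket every text on which the feature is active into equal-width activation bins and draw a few examples from each bin, which could then be shown to a human or passed to an LLM for labelling. Equal-width bins are an assumption here (quantile bins are a reasonable alternative), and `samples_across_range` is a hypothetical helper.

```python
import numpy as np


def samples_across_range(texts, acts, n_bins=5, per_bin=3, seed=0):
    """Map each (low, high] activation bin to up to per_bin randomly chosen texts.

    Bins are equal-width between 0 and the feature's max activation; texts where
    the feature is inactive (activation 0) are ignored.
    """
    rng = np.random.default_rng(seed)
    acts = np.asarray(acts, dtype=float)
    active = np.flatnonzero(acts > 0)
    if active.size == 0:
        return {}
    edges = np.linspace(0.0, acts[active].max(), n_bins + 1)
    # Digitizing against the interior edges yields a bin index in [0, n_bins - 1].
    bins = np.digitize(acts[active], edges[1:-1], right=True)
    result = {}
    for b in range(n_bins):
        members = active[bins == b]
        if members.size:
            chosen = rng.choice(members, size=min(per_bin, members.size), replace=False)
            result[(float(edges[b]), float(edges[b + 1]))] = [texts[i] for i in chosen]
    return result


# Toy usage: 10 texts with made-up activations for one feature.
texts = [f"text {i}" for i in range(10)]
acts = [0.0, 0.0, 0.1, 0.2, 0.5, 0.6, 0.9, 1.0, 0.0, 0.3]
for (lo, hi), examples in samples_across_range(texts, acts, n_bins=2, per_bin=2).items():
    print(f"{lo:.2f}-{hi:.2f}: {examples}")
```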

Linus • 2 years ago

Nonetheless, there are two (to me) very exciting results here:

1. A way to edit vectors in latent space to make *precise* semantic edits to text, which wasn't possible before.
2. A fully unsupervised, scalable way to discover useful features in an embedding space.

More soon 🪄

Linus • 2 years ago

For posterity, here's my previous pinned tweet: the invertible text embedding models I'm using for these demos. These are denoising text autoencoders based on T5 that allow semantic steering in latent space.

deter3 • 2 years ago

I believe embedding research is the key to unlocking a lot of potential applications.

Nicholas Macias • 2 years ago

Semantic edits to text are cool, but the multimodal* opportunities might be incredible:
- expert vision (customize it to observe or infer things that would otherwise be missed)
- GAN-like control of text-to-image generation, for noodling with orientation, composition, and more

*applicable?

Linus • 2 years ago

Very true! Multimodal (specifically doing this on CLIP embeddings) is one of my mid-term goals with this. Mostly time and compute constrained at the moment.
