Embedding features learned with sparse autoencoders can make semantic edits to text ✨ (+ a reading/highlighting demo) I've built an interface to explore and visualize GPT-4 labelled features learned from a text embedding model's latent space. Here's a little video, more in 👇

Linus

36,678 subscribers

51,614 views • 2 years ago • via X (Twitter)

11 Comments

Linus • 2 years ago

Over Fri-Sat, I made this UI that lets me explore features learned by sparse autoencoders trained on my embedding models. I can:
- Search thru features + read GPT4 autointerp output
- See significant features for a custom input text
- Turn features on/off and see generated output

Linus • 2 years ago

It's v interesting to see what kinds of features the autoencoder found! i.e. how the embedding model may be representing salient semantics about an input. Common themes:
- Topics (sports, food, education, family)
- Syntax (code, lists, LaTeX)
- Grammar (first-person, quotations)

Linus • 2 years ago

I can also type in some custom text and see which features are most activated for that input (feature activation normalized to the max value for each feature seen in training corpus), which is an interesting way of seeing what specific attributes of a text the embedding may "use"
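The normalization described above can be sketched in a few lines. This is only an illustration, not the code behind the demo: `top_features`, `activations`, and `train_max` are hypothetical names, and the numbers are made up.

```python
def top_features(activations, train_max, k=3):
    """Return the k features with the highest normalized activation.

    activations: raw sparse-autoencoder feature activations for one input text.
    train_max:   largest activation seen for each feature in the training corpus.
    Both arguments are hypothetical stand-ins for whatever the real UI stores.
    """
    # Divide each activation by that feature's max over the training corpus,
    # so features with very different scales become comparable.
    normalized = [
        (a / m if m > 0 else 0.0)
        for a, m in zip(activations, train_max)
    ]
    ranked = sorted(range(len(normalized)), key=lambda i: normalized[i], reverse=True)
    return [(i, normalized[i]) for i in ranked[:k]]


raw = [0.2, 3.0, 0.0, 1.5]        # made-up activations for one input
seen_max = [0.4, 12.0, 5.0, 1.5]  # made-up per-feature training maxima
print(top_features(raw, seen_max, k=2))
```

Note that feature 1 has the largest raw activation here but ranks low once it is scaled by its own maximum, which is the point of normalizing per feature.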

Linus • 2 years ago

I can also turn specific features "off" (manually set to 0) or "on" (set to max or 2x max value found in my dataset) and see how that influences the embedding by using Contra/vec2text to decode the embedding back into text. This is very fun to play with. Look at the outputs! (Turning off features doesn't work as well currently, and I hypothesize this is because my sparse autoencoders are quite undertrained, as I realized yesterday. I'm fixing this soon. Screenshots are effortfully cherry-picked examples.)
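One plausible way to turn a feature "off" or "on" is to shift the embedding along that feature's decoder direction by the change in activation. The thread does not say how the edit is actually applied, so treat this as an assumption; `edit_embedding` and all of its arguments are hypothetical, and the decoding step (Contra/vec2text) is not shown.

```python
def edit_embedding(embedding, feature_acts, decoder_rows, feature_idx, new_value):
    """Shift an embedding so that one SAE feature takes a new activation.

    embedding:    the original embedding vector (list of floats).
    feature_acts: current SAE activations for that embedding.
    decoder_rows: decoder_rows[i] is the decoder direction of feature i.
    new_value:    0.0 for "off"; the dataset max (or 2x max) for "on".
    Assumption: adding delta * direction is one common way to apply the edit;
    it is not confirmed to be the method used in the demo.
    """
    delta = new_value - feature_acts[feature_idx]
    direction = decoder_rows[feature_idx]
    return [e + delta * d for e, d in zip(embedding, direction)]


# Toy 2-d example with an identity decoder (made-up numbers).
emb = [1.0, 2.0]
acts = [0.5, 0.0]
decoder = [[1.0, 0.0], [0.0, 1.0]]

turned_on = edit_embedding(emb, acts, decoder, feature_idx=1, new_value=2.0)
turned_off = edit_embedding(emb, acts, decoder, feature_idx=0, new_value=0.0)
print(turned_on, turned_off)
```

Editing by a delta rather than re-decoding all features keeps whatever part of the embedding the autoencoder fails to reconstruct, which may matter when the autoencoder is undertrained.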

Linus • 2 years ago

Lastly, I made a new interface today to let you highlight every sentence in a passage by how strongly common features activate for each sentence.
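A rough sketch of per-sentence highlighting, assuming some function that scores a sentence for a given feature. Here `score_sentence` is a hypothetical callback, and `toy_score` is a stand-in that counts first-person words instead of running an embedding model and autoencoder.

```python
import re


def highlight_weights(passage, score_sentence):
    """Split a passage into sentences and scale each score to [0, 1].

    score_sentence is a hypothetical callback that returns how strongly a
    feature activates for one sentence; the real demo would embed the
    sentence and read the feature activation from the autoencoder.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", passage.strip()) if s]
    scores = [max(0.0, score_sentence(s)) for s in sentences]
    peak = max(scores, default=0.0)
    # The weight can be used directly as a highlight opacity.
    return [(s, (x / peak if peak > 0 else 0.0)) for s, x in zip(sentences, scores)]


def toy_score(sentence):
    """Stand-in for a 'first-person' feature: count first-person words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return float(sum(w in {"i", "my", "me"} for w in words))


for sentence, weight in highlight_weights(
    "I love my dog. The sky is blue. Give me that!", toy_score
):
    print(f"{weight:.2f}  {sentence}")
```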

Linus • 2 years ago

There are a bunch of imperfections with the current state of these demos.
1. Most importantly, the autoencoders I'm using were the first ones that "worked" for me so they're quite undertrained, which messes with cleanly turning features on/off. I'm currently training better sparse AEs.
2. Often, features will appear to be about one thing when you look at max activating examples, but be about something else when we see examples lower in its activation range (the "interpretability illusion"). I want to visualize and run LLM auto-interpretation on the full range of activating samples, as prior work in this space has done.
3. I've gotten much better at training decent sparse autoencoders stably by picking good hyperparameters, but I need to empirically see my intuitions generalize. It's still kind of an art.
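For readers unfamiliar with sparse autoencoders, here is a minimal sketch of a commonly used objective: reconstruction error plus an L1 penalty on the feature activations. The thread does not state the exact architecture or loss, so the ReLU encoder, the linear decoder, and the `l1_coeff` hyperparameter are all assumptions.

```python
def sae_loss(x, w_enc, b_enc, w_dec, b_dec, l1_coeff=1e-3):
    """Forward pass and loss for a tiny sparse autoencoder (pure Python).

    Assumed formulation (not confirmed by the thread):
      acts  = ReLU(W_enc x + b_enc)
      recon = sum_i acts[i] * w_dec[i] + b_dec
      loss  = mean squared error + l1_coeff * sum(acts)
    """
    acts = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_enc, b_enc)
    ]
    recon = [
        b + sum(a * row[j] for a, row in zip(acts, w_dec))
        for j, b in enumerate(b_dec)
    ]
    mse = sum((r - xi) ** 2 for r, xi in zip(recon, x)) / len(x)
    l1 = sum(acts)  # activations are non-negative after the ReLU
    return mse + l1_coeff * l1, acts, recon


identity = [[1.0, 0.0], [0.0, 1.0]]
loss, acts, recon = sae_loss(
    [1.0, -1.0], identity, [0.0, 0.0], identity, [0.0, 0.0], l1_coeff=0.5
)
print(loss, acts, recon)
```

The L1 coefficient trades reconstruction quality against sparsity, which is one of the hyperparameters that makes training "kind of an art".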

Linus • 2 years ago

Nonetheless, there are two (to me) very exciting results here!
1. A way to edit vectors in latent space to make *precise* semantic edits to text, which wasn't possible before, and
2. A fully unsupervised, scalable way to discover useful features in an embedding space.

more soon 🪄

Linus • 2 years ago

For posterity: here was my previous pinned tweet — the invertible text embedding models I'm using for these demos. These are denoising text autoencoders based on T5 that allow semantic steering in latent space.

deter3 • 2 years ago

I believe embedding research is the key to unlocking a lot of potential applications.

Nicholas Macias • 2 years ago

semantic edits to texts are cool but the multimodal* opportunities might be incredible
- expert vision (customize to observe or infer things otherwise missed)
- GAN-like control of text to image generation for noodling with orientations, composition, and more

*applicable?

Linus • 2 years ago

Very true! Multimodal (specifically doing this on CLIP embeddings) is one of my mid-term goals with this. Mostly time and compute constrained at the moment.
