Google just proved that bigger isn't always better. Their 308M-parameter model is outperforming models 2x its size.

Google just released 𝗘𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴𝗚𝗲𝗺𝗺𝗮, and it's proving that lightweight embedding models can punch way above their weight class. At just 308M parameters (578MB), it's the new state of the art for models under 500M parameters across the MTEB multilingual, English, and code benchmarks.

But the really impressive part is that it ranks 8th overall on MTEB (Multilingual, v2). That's 𝟭𝟳 𝗽𝗹𝗮𝗰𝗲𝘀 above the second-best sub-500M model, and it delivers performance 𝗰𝗼𝗺𝗽𝗮𝗿𝗮𝗯𝗹𝗲 𝘁𝗼 𝗺𝗼𝗱𝗲𝗹𝘀 𝗻𝗲𝗮𝗿𝗹𝘆 𝗱𝗼𝘂𝗯𝗹𝗲 𝗶𝘁𝘀 𝘀𝗶𝘇𝗲.

Three key parts of their training recipe set it apart:

𝟭. 𝗘𝗻𝗰𝗼𝗱𝗲𝗿-𝗗𝗲𝗰𝗼𝗱𝗲𝗿 𝗜𝗻𝗶𝘁𝗶𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻
Instead of starting directly from a decoder-only Gemma 3 model, they first adapted it into an encoder-decoder model, then kept only the encoder. Basing EmbeddingGemma on an LLM that already has world knowledge and language understanding gives it a stronger starting point.

𝟮. 𝗧𝗵𝗿𝗲𝗲-𝗟𝗼𝘀𝘀 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴
They combine three different loss functions instead of just one:
• Contrastive loss (NCE) with in-batch negatives and hardness weighting
• Spread-out regularization, so embeddings use the full space (which helps quantization and ANN retrieval)
• Embedding-matching distillation from Gemini Embedding: not just learning from relevance scores, but directly aligning the embedding space with the teacher model's

𝟯. 𝗠𝗼𝗱𝗲𝗹 𝗦𝗼𝘂𝗽𝗶𝗻𝗴
Rather than just averaging checkpoints from the same training run, they use optimization techniques to find multiple specialized training mixtures. Each mixture produces an "expert" model for a different domain, and averaging all of their parameters yields a final model that's actually better than the individual models.
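To make the first two loss terms concrete, here is a minimal pure-Python sketch of a contrastive (InfoNCE-style) loss with in-batch negatives and a simple spread-out penalty. It assumes a temperature `tau`, cosine similarity on L2-normalized vectors, and a spread-out term defined as the mean squared cosine similarity between distinct embeddings; these formulas, the function names, and the weighting in `combined_loss` are illustrative assumptions rather than EmbeddingGemma's published implementation, and hardness weighting and distillation are left out.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def in_batch_contrastive_loss(queries: List[Vector], docs: List[Vector], tau: float = 0.05) -> float:
    """InfoNCE with in-batch negatives: query i's positive is docs[i],
    and every other document in the batch acts as a negative."""
    q = [normalize(v) for v in queries]
    d = [normalize(v) for v in docs]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, dj) / tau for dj in d]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_partition = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_partition - logits[i]  # equals -log softmax(logits)[i]
    return total / len(q)


def spread_out_penalty(embeddings: List[Vector]) -> float:
    """Mean squared cosine similarity over all pairs of distinct embeddings.
    It is 0 for mutually orthogonal vectors and grows as embeddings cluster together."""
    e = [normalize(v) for v in embeddings]
    pairs = [(i, j) for i in range(len(e)) for j in range(len(e)) if i != j]
    return sum(dot(e[i], e[j]) ** 2 for i, j in pairs) / len(pairs)


def combined_loss(queries: List[Vector], docs: List[Vector], spread_weight: float = 0.1) -> float:
    """Hypothetical weighting of the two terms (the 0.1 default is an assumption)."""
    return in_batch_contrastive_loss(queries, docs) + spread_weight * spread_out_penalty(queries + docs)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    aligned_docs = [[0.9, 0.1], [0.1, 0.9]]  # each positive sits close to its query
    swapped_docs = [[0.1, 0.9], [0.9, 0.1]]  # positives swapped, so the loss should rise
    print(in_batch_contrastive_loss(queries, aligned_docs))
    print(in_batch_contrastive_loss(queries, swapped_docs))
    print(combined_loss(queries, aligned_docs))
```

The in-batch trick is what makes this cheap: one batch of N query/document pairs yields N-1 negatives per query without any extra encoding work.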
Extras:
• Matryoshka embeddings supporting 768, 512, 256, and 128 dimensions
• Quantization-aware training, which maintains quality even at int4 precision
• 100+ languages from Gemma 3 pretraining
• Exceptional performance on low-resource languages (check their XTREME-UP results)

Is it the absolute best embedding model? No: Gemini Embedding still leads overall. But that's not really the point. EmbeddingGemma proves you can achieve state-of-the-art performance in a small package that's actually deployable on-device, in low-latency applications, and in resource-constrained environments.

This makes good embeddings accessible for use cases I'm seeing more and more: offline applications, privacy-sensitive deployments, and high-throughput scenarios where inference cost actually matters.

Full paper:

Shoutout to the EmbeddingGemma team at Google DeepMind for this awesome open-source work 💙 and to Daniel Williams for helping me with this video! 🫶
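Quantization-aware training simulates low-precision arithmetic during training so the model learns weights that survive rounding. Below is a minimal sketch of the simulated ("fake") quantize-then-dequantize step for symmetric int4, assuming a single per-tensor scale. EmbeddingGemma's actual QAT setup (per-channel vs per-tensor scales, which layers are quantized, and so on) is not described in the post, so treat this as an illustration of the general technique only.

```python
from typing import List

INT4_MIN, INT4_MAX = -8, 7  # range of a signed 4-bit integer


def int4_scale(weights: List[float]) -> float:
    """Symmetric per-tensor scale: the largest |w| maps to INT4_MAX."""
    max_abs = max(abs(w) for w in weights)
    return max_abs / INT4_MAX if max_abs > 0 else 1.0


def quantize_int4(weights: List[float], scale: float) -> List[int]:
    """Round each weight to the nearest integer step and clamp to the int4 range."""
    return [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in weights]


def dequantize(q: List[int], scale: float) -> List[float]:
    return [v * scale for v in q]


def fake_quant_int4(weights: List[float]) -> List[float]:
    """Quantize-then-dequantize: the forward pass sees int4-rounded values while
    training still updates full-precision weights. QAT frameworks typically pass
    gradients straight through the rounding (a straight-through estimator)."""
    scale = int4_scale(weights)
    return dequantize(quantize_int4(weights, scale), scale)


if __name__ == "__main__":
    w = [0.7, -0.35, 0.12, -0.01, 0.0]
    print(quantize_int4(w, int4_scale(w)))
    print(fake_quant_int4(w))
```

With only 16 levels available, each weight can move by up to half a step, which is why training against this rounding (rather than quantizing after the fact) helps preserve quality.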

Victoria Slocum

8,534 subscribers

21,211 views • 7 months ago • via X (Twitter)

Related videos

EmbeddingGemma is our new best-in-class open embedding model designed for on-device AI. 📱 At just 308M parameters, it delivers state-of-the-art performance while being small and efficient enough to run anywhere - even without an internet connection.

Google DeepMind

584,545 views • 9 months ago

NEW: Google releases EmbeddingGemma, a state-of-the-art multilingual embedding model perfect for on-device use cases! At only 308M params, the model can run 100% locally in your browser! 🤯 Explore your documents in an interactive 3D universe with our demo: "The Semantic Galaxy"

Xenova

23,338 views • 9 months ago

Most don't know (1) how easy it is to invert embedding vectors back into sentences, or (2) that this is a perfect task for text diffusion models. Here's a 78M-parameter model and live demo that recovers 80% of tokens from Qwen3-Embedding and EmbeddingGemma vectors. It works even on multilingual input.

Jina AI

12,813 views • 4 months ago

Small Language Models (SLMs) are the future of AI: "Small" (SLM) instead of "Large" (LLM).

These small models are highly specialized models with superhuman abilities on specific tasks. Here are two techniques to build them:
• Spectrum
• Model Merging

I give you a short introduction in the attached video, but here is a quick summary:

Spectrum helps us identify the layers most relevant to one specific task. We can ignore everything else and focus on fine-tuning those layers. Using Spectrum, we can fine-tune models in a heartbeat.

Model Merging combines multiple models into a single model that is much better than any of the individual input models. You can also combine models specialized in different tasks and get a model with multiple abilities.

This is the state of the art of productizing models. It's what Arcee.ai's platform does behind the scenes. Arcee collaborated with me on this post and is sponsoring it.

There are three main steps to produce a model for your particular use case:
1. You create a dataset by uploading your data.
2. You train a model. At this step, Arcee uses Spectrum and Model Merging to produce a highly specialized model for your task.
3. You deploy that model to any environment you want.

Three important notes:
• The training process is 2x faster and 2x cheaper than regular fine-tuning.
• The resulting models are smaller and have higher accuracy.
• They create these specialized models from open-source models.

Check this site so you can fully appreciate how this works:

If you want to fine-tune an open-source model, consider Arcee's platform. This is the state of the art.
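Model merging comes in several flavors (linear averaging, SLERP, TIES, and others). The simplest is a weighted linear average of the parameters of models that share one architecture; with equal weights this is the uniform "model soup". The sketch below assumes each model is a plain dict mapping parameter names to flat lists of floats; it is not Arcee's implementation, and Spectrum's layer selection is not shown.

```python
from typing import Dict, List, Optional

Params = Dict[str, List[float]]  # parameter name -> flattened tensor


def linear_merge(models: List[Params], weights: Optional[List[float]] = None) -> Params:
    """Weighted average of same-shaped parameters across models.
    With weights=None every model gets equal weight (a uniform soup)."""
    if not models:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * len(models)
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    names = models[0].keys()
    for m in models[1:]:
        if m.keys() != names:
            raise ValueError("models must have identical parameter names")

    merged: Params = {}
    for name in names:
        tensors = [m[name] for m in models]
        size = len(tensors[0])
        if any(len(t) != size for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all models for this parameter.
        merged[name] = [sum(w * t[i] for w, t in zip(norm, tensors)) for i in range(size)]
    return merged


if __name__ == "__main__":
    math_expert = {"layer.w": [1.0, 2.0], "layer.b": [0.0]}  # hypothetical checkpoints
    code_expert = {"layer.w": [3.0, 4.0], "layer.b": [1.0]}
    print(linear_merge([math_expert, code_expert]))
    print(linear_merge([math_expert, code_expert], weights=[3.0, 1.0]))
```

Linear averaging works best when the models were fine-tuned from the same base, so their weights live in a compatible region of parameter space; methods like TIES add extra steps to reduce interference between experts.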

Santiago

164,162 views • 1 year ago

Let's go! 😍 Qwen just released Qwen3-Embedding, a new series of embedding models: 🏆 SOTA performance on MMTEB, MTEB, and MTEB-Code 📏 Three different sizes (0.6B / 4B / 8B) 🌍 Multilingual (119 languages) 💻 Can run in-browser w/ Transformers.js (+ WebGPU acceleration)

Xenova

47,143 views • 1 year ago

Introducing EmbeddingGemma: our new open, state-of-the-art embedding model designed for on-device AI 📱

Google AI Developers

153,811 views • 9 months ago

WHY IS NO ONE TALKING ABOUT THIS?? Gemma 3n model was one of the best surprises for me. The fact that you can run it on edge devices even with just 2GB of RAM is impressive.

A few weeks back, I was on holiday and used the Gemini Live feature a lot. But I kept running into issues whenever I was in a place where the network wasn’t reliable.

Gemma 3n:
> Multimodal: Supports text and image inputs; video/audio in development.
> Context Window: Up to 128K tokens (32K for 1B model).
> Multilingual: Trained on 140+ languages.
> Privacy: Offline, on-device processing for data security.
> Model Sizes: E2B (5B parameters, 2GB RAM) and E4B (10–12B parameters, 3GB RAM).

Video credit: Google YT

AshutoshShrivastava

210,490 views • 1 year ago

This is a pretty wild model! You can use it to turn an image into a 3D object with texture. The quality is out of this world! I'm not even a designer, and I've been using this nonstop for the last 2 hours. The model is Hunyuan 3D 2.1. It's open source. You'll find model weights, training/inference code, data pipelines, and architecture on their repository. You can even fine-tune it if you want! GitHub Repository: By the way, the model runs on consumer-grade GPUs. You don't need a datacenter for this! I've been using the model from the HuggingFace demo page: To use it, go to the link and upload an image. That's it! Check out the video I recorded for a couple of examples.

Santiago

44,783 views • 1 year ago

🚀 The team at Google DeepMind just released Gemini Embedding 2, a frontier embedding model with 3072 dimensions and state-of-the-art semantic quality.

👩‍💻 We built a demo showing how to integrate it across the LlamaIndex ecosystem, from LlamaParse to LlamaAgents: 𝗮𝘂𝗱𝗶𝗼-𝗸𝗯, a knowledge base for your audio notes.

With audio-kb:
🔹 Upload an MP3 or record directly from your terminal
🔹 LlamaParse extracts the transcript from the audio
🔹 Gemini Embedding 2 generates embeddings
🔹 Metadata + vectors are stored in SurrealDB and indexed with HNSW

🔍 Once ingested, you can search all your audio notes directly from the terminal.
🎙️ Perfect for turning voice memos, meetings, or lectures into a searchable knowledge base.

📖 Full blog:
💻 GitHub:
⚡ Try LlamaParse:

LlamaIndex 🦙

34,765 views • 3 months ago

The future of AI is open source. And Ollama is the easiest way to build AI applications with open-source LLMs.

Here's how to build a free, private RAG app using open-source tools. We'll use:
- Ollama for LLMs and embedding models
- PostgreSQL for data storage and retrieval
- pgai Vectorizer for embedding creation and sync

(I use Nomic for embeddings and TinyLlama as my LLM, but you can swap them for any models on Ollama.)

Avthar

34,261 views • 1 year ago

NVIDIA just released a new open-source transcription model, Nemotron Speech ASR, designed from the ground up for low-latency use cases like voice agents.

Here's a voice agent built with this new model: 24ms transcription finalization and total voice-to-voice inference time under 500ms.

This agent actually uses *three* NVIDIA open-source models:
- Nemotron Speech ASR
- Nemotron 3 Nano 30B in a 4-bit quant (released in December)
- A preview checkpoint of the upcoming Magpie text-to-speech model

These models are all truly open source: weights, training data, training code, and inference code. This is a big deal!

Jensen said in the CES keynote yesterday that he expects open-source models to catch up to proprietary models this year in a number of categories. NVIDIA is putting its weight behind making this happen. (As Alan Kay said, the best way to predict the future is to invent it.)

The code for this agent is open source too, of course. You can deploy it to production with Modal and Pipecat AI cloud, or run it locally on an NVIDIA DGX Spark or RTX 5090.

kwindla

274,254 views • 5 months ago

Prime Intellect's Will Brown says continual learning could be solved in the first half of 2026: "Continual learning is going to fall pretty quickly, I think. It's more of an engineering problem. No one's actually trying." "OpenAI and Anthropic don't want to continuously train their models for each user. It's expensive and annoying and hard to serve at scale. But from a research perspective, we do continue learning where they just keep training the model more and it knows more stuff because they put more of the Internet in it." "I think there's a lot of experimentation around exactly the recipe that's going to be the most reliable. But we kind of have a grab bag of six or seven tricks that kind of work, or they work in different ways, and you can mix and match them. And it's just going to be like whatever's the best combination of these tricks." "People are going to experiment with it and find the versions that work the best. And there doesn't seem to be any big wall inside that prevents that from being practical."

TBPN

72,077 views • 3 months ago

Gemini 3 Pro is the best model in the world for multimodal understanding. One of its most exciting capabilities is document understanding and reasoning. This means you can convert information from any format into the medium that works best for you. Gemini 3 also has leading multilingual capabilities, enabling it to process, reason about, and even capture cultural relevance across a variety of languages. For example, here Gemini 3 is translating handwritten recipes in Korean and English to build a digital family cookbook in different languages.

Google AI

36,696 views • 7 months ago

There is a beautiful story that just happened in AI, so let me share it as a lighter weekend post amid all the doom stories in our AI field this week. It's a story of people on three continents building and sharing, in the open, a new small, efficient, state-of-the-art AI model.

It started a couple of months ago, when a new team on the AI scene released its first model from its headquarters in Paris (France): Mistral 7B. An impressive model: small, with very strong benchmark performance, better than all previous models of its size. And open source! So you could build on top of it.

Lewis in Bern (Switzerland) and Ed (in Lyon, in the south of France), both from the H4 team, a team of researchers working on model fine-tuning and alignment, were talking about it over coffee, in one of those gatherings that often happen at Hugging Face to bridge the distance between people (literal distance, as HF is a remote company). "What about fine-tuning it using this new DPO method that a research team from Stanford in California just posted on arXiv?" says one. "Hey, that's a great idea," replies the other. "We've just built a great code base (with Nathan, Nazneen, Costa, Younes, and all the H4 team and TRL community), let's use it!"

The next day they start diving into the datasets openly shared on the HF Hub and stumble upon two large, high-quality fine-tuning datasets recently open-sourced by OpenBMB, a Chinese team from Tsinghua: UltraFeedback and UltraChat. A few rounds of training experiments confirm the intuition: the resulting model is super strong, by far the strongest they have ever seen on their benchmarks from Berkeley and Stanford (LMSYS and Alpaca).

Enter Clementine, the big boss of the open evaluation leaderboard. Her deep dive into the model's capabilities confirms the results: impressive performance. But the H4 team also hosts a famous faculty member, Prof. Sasha Rush, Associate Professor at Cornell University by day and hacker at HF by night. Joining the conversation, he proposes quickly drafting a research paper to organize and share all the details with the community.

A few days later, the model, called Zephyr (a wind, like Mistral), the paper, and all the details are shared with the world. Companies all over the world quickly start using it. LlamaIndex, a well-known data framework and community, shares how the model exceeded their expectations on real-life use-case benchmarks, while researchers and practitioners discuss the paper and the work on the Hugging Face Hub.

All this happened in just a few weeks, catalyzed by open access to knowledge, models, research, and datasets released all over the world (Europe, California, China), and by the idea that people can build upon one another's work in AI to bring real-world value with efficient and open models.

Stories like this are numerous all around us. They make me really proud of the AI community and show how we can build amazingly useful things together. [the video is just me reading this Friday post hahah]

Thomas Wolf

169,127 views • 2 years ago

Build powerful, offline AI features with EmbeddingGemma. Our new 308M parameter text embedding model enables on-device semantic search, RAG, and more. Learn how to get started ↓

Google AI Developers

36,977 views • 8 months ago

Jason Calacanis: Anthropic, OpenAI and others are trying to kill OpenClaw Why? Because an open source agent platform is an existential threat to frontier model companies. @jason: “People are going to say I'm a conspiracy theorist, but the number one goal, I believe, in the large language model, frontier model space, is to kill (OpenClaw). This is a giant movement to stop it, because this is the equivalent of having an open source Android-like player in the market, and that could be incredibly disruptive. Because, I believe, open source is going to win the day on the large language models and take 90% of the token usage, and I think the entire frontier model space could be undercut by open source. And I think they realize that SLMs, the smaller language models that are verticalized now, that will run on desktops and laptops and are even starting to run on the top ones, that is their biggest competitive threat, and I hope it happens.”

The All-In Podcast

64,267 views • 2 months ago

Matryoshka Representation Learning (MRL) is a super exciting approach to improving the quality and efficiency of embedding models and strategies ✨ MRL allows models to store more information in the earlier dimensions of their data vectors. This method not only boosts performance in tasks like classification and retrieval, but is also a super cool compression technique! Paper: For compression: It’s been so much fun learning and making this video with @DanielW966, thanks for all your help!
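With an MRL-trained model, compressing an embedding amounts to keeping its first k dimensions and re-normalizing. The sketch below shows that step, a cosine comparison of truncated vectors, and the float32 storage arithmetic. The 8-dimensional vectors are made-up stand-ins for real model output, and truncation only preserves quality for models that were actually trained with MRL.

```python
import math
from typing import List


def truncate_and_normalize(embedding: List[float], dim: int) -> List[float]:
    """Keep the first `dim` coordinates (where MRL packs the most information),
    then rescale to unit length so dot products remain cosine similarities."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def storage_bytes(dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one vector, assuming float32 (4 bytes per value) by default."""
    return dim * bytes_per_value


if __name__ == "__main__":
    # Hypothetical embeddings standing in for real model output.
    doc = [0.8, 0.5, 0.1, 0.05, 0.02, 0.01, 0.01, 0.0]
    query = [0.7, 0.6, 0.2, 0.0, 0.03, 0.0, 0.02, 0.01]
    for k in (8, 4, 2):
        sim = cosine(truncate_and_normalize(doc, k), truncate_and_normalize(query, k))
        print(k, round(sim, 4))
    print(storage_bytes(768), storage_bytes(128))  # 3072 vs 512 bytes per vector
```

Going from 768 to 128 dimensions cuts storage and similarity-computation cost by 6x per vector; how much retrieval quality that costs depends on the model and task.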

Victoria Slocum

120,436 views • 1 year ago

John Coogan says the recent reporting on Meta's 'tokenmaxxing' is less of a sign of bad incentives at the company, and more of a tell about its potential strategy for more vertical integration: "I think it makes clearer the strategy with MSL. Because it's clear that they're spending hundreds of millions of dollars on this, just for internal code-gen tooling, like, running their business." "They are going to spend an inordinate amount of money on frontier inference, and so [by] training a model there, they will be able to amortize the training cost of the next model they build not just over, 'Can they get a product out that goes viral and becomes its own standalone chat app that people pay for.'" "Just on the internal usage, they could be running a multi-billion-dollar token bill that they would have to pay another lab. And so if they develop that internally, it's pure vertical integration." "And then you also have everything that's happening on the actual ad targeting and content delivery side..." "The big question has been like, 'Is Meta going to be able to launch an entirely new AI product like Vibes or something like that?'" "And this is a data point that, to me, says they don't need to. Because just from a pure vertical integration [perspective], the investment in MSL can pencil out."

TBPN

32,654 views • 2 months ago

Anthropic CEO Dario Amodei on Open-Source AI Models. "I don't think open source works the same way in AI that it has worked in other areas. Primarily because with open source you can see the source code of the model. Here we can't see inside the model, it's often called open weights instead of open source to kind of distinguish that. But a lot of the benefits, which is that many people can work on it and that it's kind of additive, don't quite work in the same way. So I've actually always seen it as a red herring. When I see a new model come out I don't care whether it's open source or not. If we talk about DeepSeek I don't think it mattered that DeepSeek is open source. I think I ask, is it a good model? Is it better than us at the things that matter? That's the only thing that I care about. It actually doesn't matter either way. Because ultimately you have to host it on the cloud. The people who host it on the cloud do inference. These are big models, they're hard to do inference on. When I think about competition I think about which models are good at the tasks that we do. I think open source is actually a red herring. It's not free. You have to run it on inference and someone has to make it fast on inference." --- From 'Alex Kantrowitz' YT channel

Rohan Paul

944,205 views • 7 months ago

How can you solve complex tasks using a Large Language Model? Here is a 2-minute introduction to everything you need to know to 10x the quality of your results. Let's talk about three techniques, in order of complexity, starting with the easiest one:

• In-Context Learning
• Indexing + In-Context Learning
• Fine-tuning

In-Context Learning

The team that trained GPT-3 found something they couldn't explain: you can condition a model using examples of how you want it to behave. I included an example prompt in the attached video. By giving a few examples, you can "teach" the model how you want it to interpret questions, select the correct answers, and format the results. You can also give the model specific knowledge that will help it formulate answers. We call this approach "grounding the model." There's another example in the video.

Indexing + In-Context Learning

Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size." One version of GPT-4 supports a context of about 8,000 tokens (roughly 6,000 words), while the other supports about 32,000 tokens (roughly 25,000 words). Although this sounds like a lot, many applications need more than that. Imagine you wrote a book and want to build an application that answers any question about your story. What happens if your book is longer than the context? That's where indexing comes in. Using a model, you can turn every passage of the book into an embedding: a vector of numbers that "encodes" the passage's text. You can then store these embeddings in a specialized database (a vector database) that supports fast retrieval of these vectors. When a question comes in, you turn it into an embedding and search the database for the passages most similar to that query. Instead of sending the entire book to the model, you now include only the relevant passages as in-context information, effectively working around the context size limitation.

Fine-tuning

Fine-tuning can give you an extra boost to get reliable outputs from your LLM. It is, however, the most complex approach on the list. There are different approaches to fine-tuning a model with your data. A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM; instead, you are chaining it with your trained classifier. Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers of the original model you fine-tune. Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches.

I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me, Santiago, so you don't miss what comes next.
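The indexing workflow in Santiago's post (embed every passage, store the vectors, embed the question, retrieve the closest passages, and put only those into the prompt) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the post's own code: `embed` is a toy bag-of-words stand-in for a real learned embedding model, `PassageIndex` is an in-memory list standing in for a vector database, and the sample "book", the prompt wording, and every name here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy stand-in (assumption) for a real embedding model:
    an L2-normalized bag-of-words vector keyed by word."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class PassageIndex:
    """Hypothetical in-memory list standing in for a vector database."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, dict[str, float]]] = []

    def add(self, passage: str) -> None:
        # Embed once at indexing time and store the vector next to the text.
        self.entries.append((passage, embed(passage)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Embed the query and return the k most similar passages.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [passage for passage, _ in ranked[:k]]


def build_prompt(question: str, index: PassageIndex, k: int = 2) -> str:
    # Only the retrieved passages go into the prompt, not the whole book.
    context = "\n".join(f"- {p}" for p in index.search(question, k))
    return f"Answer using only these passages:\n{context}\n\nQuestion: {question}\nAnswer:"


# Usage with a made-up three-passage "book".
book = [
    "Mara grew up in a lighthouse on the northern coast.",
    "The storm destroyed the harbor in the third chapter.",
    "Her brother Tomas left for the capital to study medicine.",
]
index = PassageIndex()
for passage in book:
    index.add(passage)
prompt = build_prompt("Where did Mara grow up?", index, k=1)
print(prompt)
```

Swapping `embed` for a learned embedding model and `PassageIndex` for a real vector store should keep the same shape: only the top-`k` passages reach the LLM, which is what keeps the prompt within the context size.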

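The first fine-tuning variant in the post, where the LLM stays frozen and a separate classifier is trained on its outputs, can be sketched as logistic regression over precomputed features. This is an illustration under loud assumptions: `frozen_llm_features` is a hypothetical stand-in that counts keywords instead of calling a real model (a real pipeline would read out an embedding or hidden state), and the training sentences, labels, and hyperparameters (`epochs`, `lr`) are invented for the example.

```python
import math

# Hypothetical stand-in for "process your data with your LLM": two keyword
# counts plus a constant bias feature play the role of the model's outputs.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"awful", "hate", "broken"}


def frozen_llm_features(text: str) -> list[float]:
    words = text.lower().split()
    return [
        float(sum(w in POSITIVE for w in words)),
        float(sum(w in NEGATIVE for w in words)),
        1.0,  # bias feature
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_classifier(examples: list[tuple[str, int]], epochs: int = 200, lr: float = 0.5) -> list[float]:
    # Run the frozen feature extractor once per example; it is never updated.
    features = [(frozen_llm_features(text), label) for text, label in examples]
    weights = [0.0] * len(features[0][0])
    for _ in range(epochs):
        for x, y in features:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            # Gradient of the log loss with respect to the weights is (p - y) * x.
            weights = [w - lr * (p - y) * xi for w, xi in zip(weights, x)]
    return weights


def predict(weights: list[float], text: str) -> int:
    z = sum(w * xi for w, xi in zip(weights, frozen_llm_features(text)))
    return 1 if sigmoid(z) >= 0.5 else 0


training_data = [
    ("i love this and it works great", 1),
    ("excellent build quality", 1),
    ("it arrived broken and i hate it", 0),
    ("awful service", 0),
]
weights = train_classifier(training_data)
print(predict(weights, "great and excellent value"))
print(predict(weights, "totally broken and awful"))
```

The design point is that only the small weight vector is learned while the feature extractor (the "LLM") is left untouched. The post's second variant, updating the LLM's own parameters, is not shown here.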
Santiago

384,482 views • 3 years ago