EmbeddingGemma is our new best-in-class open embedding model designed for on-device AI. 📱 At just 308M parameters, it delivers state-of-the-art performance while being small and efficient enough to run anywhere - even without an internet connection.

Google DeepMind

1,462,279 subscribers

584,565 views • 10 months ago • via X (Twitter)

Related videos

NEW: Google releases EmbeddingGemma, a state-of-the-art multilingual embedding model perfect for on-device use cases! At only 308M params, the model can run 100% locally in your browser! 🤯 Explore your documents in an interactive 3D universe with our demo: "The Semantic Galaxy"

Xenova

23,338 views • 10 months ago

Google just proved that bigger isn't always better. Their 308M parameter model is outperforming models 2x its size.
Google just released EmbeddingGemma, and it's proving that lightweight embedding models can punch way above their weight class. At just 308M parameters (578MB), it's the new state-of-the-art for models under 500M parameters across MTEB multilingual, English, and code benchmarks.
But the really impressive part is that it ranks 8th overall on MTEB(Multilingual, v2) - that's 17 places above the second-best sub-500M model, and it's delivering performance comparable to models nearly double its size.
There are three key parts of their training recipe that set it apart:
1. Encoder-Decoder Initialization
Instead of starting from a decoder-only Gemma 3 model, they first adapted it to encoder-decoder, then used just the encoder. Basing EmbeddingGemma on an LLM that already has world and language understanding gives it a stronger starting point.
2. Three-Loss Training
They combine three different loss functions, instead of just having one:
• Contrastive loss (NCE) with in-batch negatives and hardness weighting
• Spread-out regularization to ensure embeddings utilize the full space (for quantization and ANN retrieval)
• Embedding matching distillation from Gemini Embedding - not just learning from relevance scores, but directly aligning the embedding space with the teacher model
3. Model Souping
Rather than just averaging checkpoints from the same training run, they use optimization techniques to find multiple specialized training mixtures. Each mixture yields an "expert" model for a different domain, and averaging all their parameters creates a final model that's actually better than the individual models.
Extras:
• Matryoshka embeddings supporting 768, 512, 256, and 128 dimensions
• Quantization-aware training - maintains quality even at int4 precision
• 100+ languages from Gemma 3 pretraining
• Exceptional performance on low-resource languages (check their XTREME-UP results)
Is it the absolute best embedding model? No - Gemini Embedding still leads overall. But that's not really the point. EmbeddingGemma proves you can achieve state-of-the-art performance in a small package that's actually deployable on-device, in low-latency applications, and in resource-constrained environments.
This makes good embeddings accessible for use cases that I'm seeing more and more: offline applications, privacy-sensitive deployments, and high-throughput scenarios where inference cost actually matters.
Full paper:
Shoutout to the EmbeddingGemma team at Google DeepMind for this awesome open source work 💙 and to Daniel Williams for helping me with this video! 🫶

Victoria Slocum

21,592 views • 8 months ago
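
Two ideas from Victoria Slocum's post above, Matryoshka truncation and parameter averaging ("souping"), can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not EmbeddingGemma's actual code: the function names `truncate_embedding`, `cosine`, and `uniform_soup` are hypothetical, the vectors are toy data, and a plain uniform average stands in for whatever weighting the team actually used.

```python
import math
from typing import Dict, List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and re-normalize to unit length.

    Matryoshka-style models are trained so that a prefix of the full
    vector is itself a usable (smaller) embedding.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def uniform_soup(
    checkpoints: Sequence[Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Average every named parameter element-wise across checkpoints.

    Assumes all checkpoints share the same parameter names and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    return {
        name: [
            sum(ckpt[name][i] for ckpt in checkpoints) / n
            for i in range(len(values))
        ]
        for name, values in checkpoints[0].items()
    }


if __name__ == "__main__":
    # Toy 4-dimensional "embedding" cut down to 2 dimensions.
    print(truncate_embedding([3.0, 4.0, 12.0, 0.0], 2))  # [0.6, 0.8]

    # Two toy "expert" checkpoints merged into one set of weights.
    experts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(uniform_soup(experts))  # {'w': [2.0, 3.0]}
```

In practice the truncation step is what lets one model serve the 768, 512, 256, and 128 dimension options: the same stored vector can be cut to a shorter prefix when memory or latency is tight, at some cost in quality.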

At the ElevenLabs Summit in Warsaw, we previewed on-device Text to Speech - a new model architecture that delivers human-level quality on limited hardware without an internet connection.

ElevenLabs

35,091 views • 1 month ago

.SnowflakeDB is thrilled to announce #SnowflakeArctic: A state-of-the-art large language model uniquely designed to be the most open, enterprise-grade LLM on the market. This is a big step forward for open source LLMs. And it’s a big moment for Snowflake in our #AI journey as we continue to build best-in-class enterprise-grade products for our customers. The era of enterprise AI is here. 🚀

sridhar

243,719 views • 2 years ago

We’re bringing powerful AI directly onto robots with Gemini Robotics On-Device. 🤖 It’s our first vision-language-action model to help make robots faster, highly efficient, and adaptable to new tasks and environments - without needing a constant internet connection. 🧵

Google DeepMind

819,465 views • 1 year ago

Introducing ✨Tiny Aya✨, a family of massively multilingual small language models built to run where people actually are. Tiny Aya delivers strong multilingual performance in 70+ global languages in a 3.35B parameter model, efficient enough to run locally, even on a phone.

Cohere Labs

193,163 views • 5 months ago

Google just released Muse: a text to image model achieving state of the art performance. It is significantly more efficient than diffusion models. Muse also allows for inpainting, outpainting, and mask-free editing without finetuning or inverting the model.

bleedingedge.ai

183,750 views • 3 years ago

New AI Package with DeepSeek AI Model
TesseractAI continues to expand and improve our Custom AI Agent. You will now be able to add AI packages to your agent in addition to custom information. 🌟 This new feature is being developed using #DeepSeek AI model. In particular, DeepSeek R1 is designed for greater efficiency and more goal-oriented tasks, requiring fewer computational resources while still delivering powerful results. Additionally, we’re experimenting with DeepSeek’s technology for our existing Agent LLM to further enhance performance. Our goal is to make the Custom AI Agent even more efficient and effective for all your needs.

Tesseract AI

13,236 views • 1 year ago

Introducing NexaSDK for Android (Beta): run the latest AI models locally, 9× more energy-efficient and 2× faster, on Android devices, powered by the Qualcomm Hexagon NPU.
This is the first SDK to support NPU, GPU and CPU, unlocking the full power of every Android device. For example, LFM2-1.2B achieves 85 t/s on NPU vs 37 t/s on CPU.
With just 3 lines of code, you can run the latest state-of-the-art models across every task, for example:
• Multimodal (Vision, Audio, Text): OmniNeural-4B
• Embedding: EmbeddingGemma from Google
• ASR: Parakeet-v3 from NVIDIA
• OCR: PaddleOCR from Baidu Inc.
• Rerank: Jina-reranker from Jina AI
• LLM: LFM2-1.2B from
And we continue to deliver Day-0 model support across our framework.
Build on-device AI into your Android app today and enjoy no cloud API cost, full privacy, and offline availability. Check out our Quickstart guide and example app below to get started in minutes.

NEXA AI

26,091 views • 8 months ago

If your AI needs a wire to work, it isn't yours. QVAC is the local-first engine designed to run anywhere, even where the internet can't reach. Fully autonomous. Fully open source. Fully sovereign. If you can dream it, you can build it. Even in another galaxy. Start building the future of edge AI:

QVAC

7,044,078 views • 2 months ago

Alibaba just changed the AI game! Their QwQ-32B model delivers premium performance but it's completely free and open source.
• Outperforms on key benchmarks
• 32B parameters (vs 671B for DeepSeek)
• Run locally or use web interface
• Thinking mode for reasoning
• Web search integration
Save this video if you need powerful AI tools without the expensive subscriptions 💡 Want the SOP? DM me.

Julian Goldie SEO

16,923 views • 1 year ago

Not enough people are talking about NVIDIA's new Nemotron-3-Nano (4B) model! 🤯 Hybrid Mamba + Attention architecture, designed as a unified model for reasoning and non-reasoning tasks. So small and efficient, it can run 100% locally in your web browser at 75 tokens per second.

Xenova

50,063 views • 4 months ago

Meet Molmo: a family of open, state-of-the-art multimodal AI models. Our best model outperforms proprietary systems, using 1000x less data. Molmo doesn't just understand multimodal data—it acts on it, enabling rich interactions in both the physical and virtual worlds. Try it for yourself:

Ai2

515,522 views • 1 year ago

Wow, this new AI is shaking Hollywood. LTX Studio just dropped a 13B-parameter open-source video model, and it's now 30x faster and runs on a normal laptop for free. Check out this film trailer I created in an hour, plus new features we've never seen before:

el.cine

246,707 views • 1 year ago

NEW: Alibaba just released Qwen 3.5 Small — a family of powerful multimodal models available in a range of sizes (0.8B, 2B, 4B, and 9B parameters). Perfect for on-device applications! They can even run 100% locally in your browser on WebGPU, powered by Transformers.js! 🤯

Xenova

102,592 views • 4 months ago

Introducing SmolVLM 256M (& 500M): The world’s smallest multimodal model. Designed for efficiency and perfect for on-device applications! 🔥 It’s so small, it can even run 100% locally in your browser on WebGPU! 🤏 Powered by Transformers.js! ⚡️ Try it out yourself! 👇

Xenova

25,302 views • 1 year ago

Create your own local and free AI agent using an open source model
You can combine:
- IBM's Granite 3.3 8B AI model
- LM Studio to run it on a laptop
- Smolagents to build your agent
Small AI models are now powerful enough to run an autonomous agent.
Thanks to IBM France for sponsoring this content!

Paul Couvert

18,890 views • 1 year ago

made an interactive video model trained on tap-conditioned data.. tap the screen and it generates the next frames in realtime (small enough to run in the browser with webgpu, even on iphone)

sahir

23,151 views • 4 months ago

Everyone wrote Apple off as the AI loser, but one hardware spec might flip that story upside down (Save this).
@jason called Apple a screaming buy on the back of a single chip detail. The rumored M7 Ultra, expected around 2028, is designed to support up to 1.5TB of unified memory, enough to run frontier-class, trillion-parameter AI models locally, with no cloud required.
The Street's bear case on Apple is straightforward. Apple has no frontier model of its own, Siri has stumbled for years, and the company effectively rents OpenAI's models for its hardest queries. That narrative treats Apple as the one Magnificent Seven name that missed the AI wave entirely, but the bull case flips that framing on its head. If frontier AI models keep shrinking and getting cheaper to run, Apple doesn't need the smartest model in the world; it just needs to own the device that model runs on.
Unified memory is the mechanism that makes this possible. Unlike traditional systems where the CPU and GPU each need separate memory, Apple's architecture lets the CPU, GPU and Neural Engine draw from one shared pool. A fully specced M7 Ultra could theoretically run something on the scale of a 1.2 trillion parameter model locally, and that capability plugs directly into the one advantage Apple has spent over a decade building: privacy.
Apple has already shipped Private Cloud Compute, a system designed so even Apple can't access user data processed off-device. Apple doubled down on this at WWDC 2026, framing on-device privacy as non-negotiable while rivals default to the cloud. If the best AI models get small enough to run on Apple silicon, the moat stops being the model and becomes the hardware it has to sit on.
Milk Road Pro remains bullish on Apple, and it remains one of our core positions. If you want the full thesis + our full AI trades, come join us using the link below for just $1.

Milk Road AI

36,711 views • 7 days ago