How does Exa serve billion-scale vector search? We combine binary quantization, Matryoshka embeddings, SIMD, and IVF into a novel system that can beat alternatives like HNSW. Shreyas gave a talk today at the AI Engineer World's Fair explaining our approach! ⬇️
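To make two of the named ingredients concrete, here is a minimal sketch of Matryoshka truncation followed by binary quantization and a Hamming-distance shortlist, with a float rerank at the end. It is an illustration under stated assumptions, not Exa's implementation: the function names, the 64 → 32 dimension cut, the shortlist size, and the random data are all hypothetical, and the IVF partitioning and SIMD popcount mentioned in the post are left out (plain Python bit counting stands in for them).

```python
import math
import random

def truncate_matryoshka(vec, dims):
    """Keep the first `dims` components and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def binary_quantize(vec):
    """Pack one sign bit per component (1 if positive) into a Python int."""
    code = 0
    for x in vec:
        code = (code << 1) | (1 if x > 0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

def search(query, floats, codes, dims, shortlist=50, k=5):
    """Shortlist by Hamming distance on binary codes, then rerank by dot product."""
    q = truncate_matryoshka(query, dims)
    q_code = binary_quantize(q)
    candidates = sorted(range(len(codes)),
                        key=lambda i: hamming(q_code, codes[i]))[:shortlist]

    def dot(i):
        return sum(a * b for a, b in zip(floats[i], q))

    return sorted(candidates, key=dot, reverse=True)[:k]

# Hypothetical toy corpus: 500 random 64-dimensional vectors, cut to 32 dims.
random.seed(0)
DIMS = 32
full = [[random.gauss(0, 1) for _ in range(64)] for _ in range(500)]
floats = [truncate_matryoshka(v, DIMS) for v in full]
codes = [binary_quantize(v) for v in floats]

# Querying with a stored vector should return that vector's index first.
top = search(full[7], floats, codes, DIMS)
```

The binary codes are 32 times smaller than 32-bit floats, which is why the cheap Hamming pass can scan far more candidates than a float comparison could; the rerank step recovers some of the precision that the one-bit-per-dimension encoding discards.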

Exa

51,636 subscribers

85,482 views • 1 year ago • via X (Twitter)

10 comments

Jeffrey Wang • 1 year ago

@shreyas4_ @aiDotEngineer I wanna be nearest neighbors w/ @shreyas4_

Tigran III • 1 year ago

@shreyas4_ @aiDotEngineer i am still struggling to believe how much cracked engineering talent is coming from that one university. @shreyas4_ what's the secret sauce?

Martyn Strydom 🤸 • 1 year ago

@shreyas4_ @aiDotEngineer Unreal @shreyas4_

Karan☕ • 1 year ago

@shreyas4_ @aiDotEngineer Great talk, learned a lot of new things. I had this question: if you use binary quantization on smaller embeddings, won't you get poorer results because the compression is lossy twice over (dimension reduction is already done, and then BQ is applied on top)?

Prashant Dixit • 1 year ago

@shreyas4_ @aiDotEngineer Anyone who wants to build a Matryoshka-embedding-based RAG in a minute, give it a try 🙂

sophia • 1 year ago

@shreyas4_ @aiDotEngineer I'm confused why you said 8TB of memory to hold everything in RAM is too expensive. Back of the envelope Hetzner has 24 core/192GB systems for $366/mo. 8TB would be ~$200k/y or ~18k queries/$ @ 100 QPS
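The back-of-the-envelope figures in this comment can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the comment; it assumes 8 TB means 8 × 1024 GB, that servers are bought in whole units, that 100 QPS is sustained all year, and it ignores replication, headroom, and CPU limits.

```python
import math

TOTAL_GB = 8 * 1024          # assumption: 8 TB of RAM, counted as 8 * 1024 GB
SERVER_GB = 192              # RAM per server, from the comment
SERVER_USD_PER_MONTH = 366   # price per server, from the comment
QPS = 100                    # sustained query rate, from the comment

servers = math.ceil(TOTAL_GB / SERVER_GB)          # whole servers needed
annual_cost = servers * SERVER_USD_PER_MONTH * 12  # USD per year
queries_per_year = QPS * 365 * 24 * 3600
queries_per_dollar = queries_per_year / annual_cost

print(servers, annual_cost, round(queries_per_dollar))
```

Under these assumptions the result is 43 servers, about $189k per year, and roughly 16,700 queries per dollar, which is in the same range as the comment's rounded estimates.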

Hamish Ogilvy • 1 year ago

@shreyas4_ @aiDotEngineer Nice work. So funny how obsessed people were with HNSW…

omkaar • 1 year ago

@shreyas4_ @aiDotEngineer awesome great job guys

Aarush Sah • 1 year ago

@shreyas4_ @aiDotEngineer i love shreyas shreyas is so cool

agi • 1 year ago

@shreyas4_ @aiDotEngineer love this - great insight for my product
