How does Exa serve billion-scale vector search? We combine binary quantization, Matryoshka embeddings, SIMD, and IVF into a novel system that can beat alternatives like HNSW. Shreyas gave a talk today at the AI Engineer World's Fair explaining our approach! ⬇️

85,482 views • 1 year ago • via X (Twitter)
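The post names the ingredients but does not show how they combine. The sketch below is a rough illustration only, since Exa's actual design is not described here. It truncates each embedding to its first k dimensions in the Matryoshka style, binarizes the result to one sign bit per dimension, and ranks documents by Hamming distance computed with XOR and popcount. The function names are hypothetical. SIMD and IVF are left out: a production system would run the popcount over packed machine words with vector instructions, and it would use an IVF index to scan only a few clusters instead of every code.

```python
import math


def matryoshka_truncate(vec, k):
    """Keep the first k dimensions and L2-renormalize them (Matryoshka-style truncation)."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # guard against an all-zero prefix
    return [x / norm for x in head]


def binarize(vec):
    """Binary quantization: pack one bit per dimension, set when the component is positive."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Hamming distance between two packed codes: XOR, then count the set bits (popcount)."""
    return bin(a ^ b).count("1")


def search(query, codes, k_dims, top_k):
    """Exhaustive scan over binary codes; returns the top_k (distance, doc_id) pairs."""
    q = binarize(matryoshka_truncate(query, k_dims))
    scored = sorted((hamming(q, c), doc_id) for doc_id, c in enumerate(codes))
    return scored[:top_k]


docs = [
    [0.9, 0.1, -0.3, 0.4, -0.2, 0.05],
    [-0.8, 0.2, 0.5, -0.1, 0.3, -0.4],
    [0.7, 0.3, -0.2, 0.6, 0.1, -0.1],
]
# Index only the first 4 dimensions of each 6-dimensional embedding.
codes = [binarize(matryoshka_truncate(d, 4)) for d in docs]
print(search([0.8, 0.2, -0.1, 0.5, 0.0, 0.0], codes, k_dims=4, top_k=2))  # [(0, 0), (0, 2)]
```

Renormalizing after truncation does not change any sign bit, but it matters if a shortlist is later rescored with the float vectors. Because only four bits are kept here, documents 0 and 2 get identical codes and tie at distance 0. That collapse illustrates the information loss Karan's comment raises about binarizing already-truncated embeddings.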

10 Comments

Jeffrey Wang • 1 year ago

@shreyas4_ @aiDotEngineer I wanna be nearest neighbors w/ @shreyas4_

Tigran III • 1 year ago

@shreyas4_ @aiDotEngineer i am still struggling to believe how much cracked engineering talent is coming from that one university. @shreyas4_ what's the secret sauce?

Martyn Strydom 🤸 • 1 year ago

@shreyas4_ @aiDotEngineer Unreal @shreyas4_

Karan☕ • 1 year ago

@shreyas4_ @aiDotEngineer Great talk, learned a lot of new things. I had this question: I think if you use binary quantization on smaller embeddings, you will get poorer results because of lossy compression (dimension reduction has already been done, and then BQ is applied on top).

Prashant Dixit • 1 year ago

@shreyas4_ @aiDotEngineer Anyone who wants a quick try can build a Matryoshka-embedding-based RAG in a minute. Give it a try 🙂

sophia • 1 year ago

@shreyas4_ @aiDotEngineer I'm confused why you said 8TB of memory to hold everything in RAM is too expensive. Back of the envelope: Hetzner has 24-core/192GB systems for $366/mo, so 8TB would be ~$200k/yr, or ~18k queries/$ at 100 QPS.
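The estimate in this comment can be reproduced with a few lines of arithmetic. The inputs ($366/month per 192 GB server, 8 TB total, 100 QPS) are the figures quoted in the comment. Treating 8 TB as 8,000 GB and rounding up to whole servers are assumptions of this sketch, and `ram_cost_estimate` is a hypothetical helper.

```python
import math


def ram_cost_estimate(total_gb, gb_per_machine, usd_per_month, qps):
    """Back-of-the-envelope cost of keeping an index entirely in rented RAM."""
    machines = math.ceil(total_gb / gb_per_machine)  # round up to whole servers
    usd_per_year = machines * usd_per_month * 12
    queries_per_year = qps * 365 * 24 * 3600
    return machines, usd_per_year, queries_per_year / usd_per_year


# Figures from the comment: 8 TB of vectors, 192 GB servers at $366/month, 100 QPS.
machines, usd_per_year, queries_per_usd = ram_cost_estimate(8000, 192, 366, 100)
print(machines, usd_per_year, round(queries_per_usd))  # 42 184464 17096
```

This works out to about $184k per year and roughly 17k queries per dollar, in line with the ~$200k/yr and ~18k queries/$ in the comment. The sketch counts only server rental at the quoted price.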

Hamish Ogilvy • 1 year ago

@shreyas4_ @aiDotEngineer Nice work. So funny how obsessed people were with HNSW…

omkaar • 1 year ago

@shreyas4_ @aiDotEngineer awesome great job guys

Aarush Sah • 1 year ago

@shreyas4_ @aiDotEngineer i love shreyas shreyas is so cool

agi • 1 year ago

@shreyas4_ @aiDotEngineer love this - great insight for my product

Similar Videos

Tokenization (turning text into a sequence of integers) is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski.

This course focuses on retrieval-augmented generation (RAG), which has two steps: first, a retriever finds relevant information; then, the generator uses what's retrieved as context to produce a response. You'll learn to optimize the first step (the retriever) by understanding how tokenization works and how it affects the relevance of your search. You'll also learn to measure and improve retrieval quality, speed, and memory.

In detail, you'll:
- Learn about the internal workings of embedding models and how your text turns into vectors.
- Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece, work.
- Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search.
- Understand how to measure the quality of your search across relevance, ranking, and score-related metrics.
- Understand how the main parameters of HNSW, a graph-based algorithm, affect the relevance and speed of vector search, and how to tune them.
- Experiment with the three major quantization methods (product, scalar, and binary) and learn how they affect memory requirements, search quality, and speed.

By the end of this course, you'll have a solid understanding of how tokenization works and how to optimize vector search in your RAG systems. Please sign up here!

Andrew Ng

146,200 views • 1 year ago
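The course description lists scalar quantization among the three major methods. Below is a minimal sketch, assuming a simple per-vector min/max mapping to 8-bit codes. The function names are hypothetical, and vector databases such as Qdrant may calibrate the range differently.

```python
def scalar_quantize(vec, levels=256):
    """Scalar quantization: map each float to an integer code in [0, levels - 1]."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / (levels - 1) or 1.0  # constant vectors: any nonzero scale works
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate reconstruction of the original floats from their codes."""
    return [lo + c * scale for c in codes]


vec = [-1.0, 0.0, 0.5, 1.0]
codes, lo, scale = scalar_quantize(vec)
approx = dequantize(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(vec, approx))
print(codes[0], codes[-1], max_err <= scale / 2 + 1e-12)  # 0 255 True
```

Storing one byte per dimension instead of a 4-byte float32 cuts memory roughly fourfold. The cost is a reconstruction error of at most half a quantization step per component.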