How does Exa serve billion-scale vector search? We combine binary quantization, Matryoshka embeddings, SIMD, and IVF into a novel system that can beat alternatives like HNSW. Shreyas gave a talk today at the AI Engineer World's Fair explaining our approach! ⬇️

Exa

51,636 subscribers

85,482 views • 1 year ago • via X (Twitter)
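
As a rough illustration of two of the ingredients named in the post, the sketch below truncates embeddings to their leading dimensions (the Matryoshka idea), binarizes them to one bit per dimension, and ranks candidates by Hamming distance. It is not Exa's implementation: the function names, the 256-dimensional random vectors, and the 64-dimension cutoff are all assumptions made for this example, and the IVF partitioning and SIMD acceleration mentioned in the post are left out.

```python
import random

def truncate(vec, dims):
    """Matryoshka-style truncation: keep only the leading `dims` coordinates."""
    return vec[:dims]

def binarize(vec):
    """Binary quantization: pack one bit per dimension (1 if the value is positive)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")

def search(query, corpus, dims, k):
    """Brute-force top-k by Hamming distance over truncated binary codes."""
    q = binarize(truncate(query, dims))
    codes = [binarize(truncate(v, dims)) for v in corpus]
    ranked = sorted(range(len(corpus)), key=lambda i: hamming(q, codes[i]))
    return ranked[:k]

# Hypothetical data: 1000 random 256-dim vectors and a noisy copy of item 42.
random.seed(0)
corpus = [[random.gauss(0, 1) for _ in range(256)] for _ in range(1000)]
query = [x + random.gauss(0, 0.1) for x in corpus[42]]

top = search(query, corpus, dims=64, k=5)
print(top)
```

A production system would typically use this cheap binary pass only to shortlist candidates and then re-rank them with higher-precision vectors; that step is omitted here.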

10 comments

Jeffrey Wang • 1 year ago

@shreyas4_ @aiDotEngineer I wanna be nearest neighbors w/ @shreyas4_

Tigran III • 1 year ago

@shreyas4_ @aiDotEngineer i am still struggling to believe how much cracked engineering talent is coming from that one university. @shreyas4_ what's the secret sauce?

Martyn Strydom 🤸 • 1 year ago

@shreyas4_ @aiDotEngineer Unreal @shreyas4_

Karan☕ • 1 year ago

@shreyas4_ @aiDotEngineer great talk learned a lot of new things, had this question: I think if you use binary quantization, for smaller embeddings you will get poorer results because of lossy compression(already dimension reduction is done and then BQ)

Prashant Dixit • 1 year ago

@shreyas4_ @aiDotEngineer Anyone wants to just give a quick try and Build Matryoshka Embedding based RAG in a min, Give it a try 🙂

sophia • 1 year ago

@shreyas4_ @aiDotEngineer I'm confused why you said 8TB of memory to hold everything in RAM is too expensive. Back of the envelope Hetzner has 24 core/192GB systems for $366/mo. 8TB would be ~$200k/y or ~18k queries/$ @ 100 QPS

Hamish Ogilvy • 1 year ago

@shreyas4_ @aiDotEngineer Nice work. So funny how obsessed people were with HNSW…

omkaar • 1 year ago

@shreyas4_ @aiDotEngineer awesome great job guys

Aarush Sah • 1 year ago

@shreyas4_ @aiDotEngineer i love shreyas shreyas is so cool

agi • 1 year ago

@shreyas4_ @aiDotEngineer love this - great insight for my product
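
sophia's back-of-the-envelope estimate in the comments can be reproduced in a few lines. The inputs (8 TB of RAM, 192 GB per server, $366 per server per month, 100 QPS) come from that comment; reading 8 TB as 8000 GB, rounding the server count up, and assuming a constant 100 QPS over a 365-day year are assumptions made here.

```python
import math

# Figures quoted in the comment.
RAM_NEEDED_GB = 8 * 1000        # 8 TB, assuming decimal units (1 TB = 1000 GB)
RAM_PER_SERVER_GB = 192
PRICE_PER_SERVER_MONTH = 366    # USD
QPS = 100                       # assumed constant load all year

# Whole servers needed to hold everything in RAM.
servers = math.ceil(RAM_NEEDED_GB / RAM_PER_SERVER_GB)

# Yearly rental cost and query volume.
yearly_cost = servers * PRICE_PER_SERVER_MONTH * 12
queries_per_year = QPS * 60 * 60 * 24 * 365
queries_per_dollar = queries_per_year / yearly_cost

print(servers, yearly_cost, round(queries_per_dollar))
```

Under these assumptions the result is 42 servers, $184,464 per year, and roughly 17,000 queries per dollar, which is in the same ballpark as the comment's ~$200k/y and ~18k queries/$. The calculation covers server rental only.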
