How does Exa serve billion-scale vector search? We combine binary quantization, Matryoshka embeddings, SIMD, and IVF into a novel system that can beat alternatives like HNSW. Shreyas gave a talk today at the AI Engineer World's Fair explaining our approach! ⬇️

Exa

51,636 subscribers

85,482 views • 1 year ago • via X (Twitter)
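The post names binary quantization and Matryoshka embeddings without showing how they fit together. The following is a minimal sketch of the general idea, not Exa's implementation: the function names, the random placeholder vectors, and the `dims`, `shortlist`, and `k` values are all assumptions made for illustration. Real Matryoshka truncation only works well with an embedding model trained for it, and a production system would use SIMD popcount instructions and an IVF index rather than a NumPy scan over every document.

```python
import numpy as np

def truncate(emb, dims):
    """Matryoshka-style truncation: keep the leading `dims` coordinates, then re-normalize."""
    cut = emb[:, :dims]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)

def binarize(emb):
    """Binary quantization: keep one bit per dimension (the sign), packed 8 bits per byte."""
    return np.packbits(emb > 0, axis=1)

def hamming(query_code, codes):
    """Hamming distance from one packed code (shape 1 x B) to every row of `codes`."""
    # XOR marks differing bits; counting them is what SIMD popcount does in fast implementations.
    return np.unpackbits(query_code ^ codes, axis=1).sum(axis=1)

def search(query, docs, dims=256, shortlist=50, k=5):
    """Coarse pass on binary codes, then re-rank the shortlist with float dot products."""
    q = truncate(query[None, :], dims)
    d = truncate(docs, dims)
    dist = hamming(binarize(q), binarize(d))
    cand = np.argsort(dist)[:shortlist]      # cheapest candidates by Hamming distance
    scores = d[cand] @ q[0]                  # cosine similarity on the truncated floats
    return cand[np.argsort(-scores)[:k]]

# Placeholder data (assumption): random vectors stand in for real embeddings.
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 1024)).astype(np.float32)
query = docs[7] + 0.01 * rng.standard_normal(1024).astype(np.float32)
top = search(query, docs)
print(top)
```

In this sketch a 1024-dimensional float32 vector (4,096 bytes) shrinks to a 256-bit code (32 bytes) for the coarse pass, which is the kind of memory saving that makes a two-stage design attractive; the float re-ranking step recovers some of the accuracy lost to quantization.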

Comments: 10

Jeffrey Wang • 1 year ago

@shreyas4_ @aiDotEngineer I wanna be nearest neighbors w/ @shreyas4_

Tigran III • 1 year ago

@shreyas4_ @aiDotEngineer i am still struggling to believe how much cracked engineering talent is coming from that one university. @shreyas4_ what's the secret sauce?

Martyn Strydom 🤸 • 1 year ago

@shreyas4_ @aiDotEngineer Unreal @shreyas4_

Karan☕ • 1 year ago

@shreyas4_ @aiDotEngineer great talk learned a lot of new things, had this question: I think if you use binary quantization, for smaller embeddings you will get poorer results because of lossy compression(already dimension reduction is done and then BQ)

Prashant Dixit • 1 year ago

@shreyas4_ @aiDotEngineer Anyone wants to just give a quick try and Build Matryoshka Embedding based RAG in a min, Give it a try 🙂

sophia • 1 year ago

@shreyas4_ @aiDotEngineer I'm confused why you said 8TB of memory to hold everything in RAM is too expensive. Back of the envelope Hetzner has 24 core/192GB systems for $366/mo. 8TB would be ~$200k/y or ~18k queries/$ @ 100 QPS
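The back-of-the-envelope estimate in this comment can be reproduced as a short calculation. This is a sketch under stated assumptions: 8 TB is read as 8,000 GB, servers are rounded up to whole machines, the quoted price covers everything, and the 100 QPS load is sustained all year. The variable names are illustrative.

```python
import math

# Figures taken from the comment; the decimal-TB reading is an assumption.
RAM_NEEDED_GB = 8_000
RAM_PER_SERVER_GB = 192
PRICE_PER_SERVER_MONTH_USD = 366
QPS = 100

servers = math.ceil(RAM_NEEDED_GB / RAM_PER_SERVER_GB)   # whole machines only
cost_per_year = servers * PRICE_PER_SERVER_MONTH_USD * 12
queries_per_year = QPS * 60 * 60 * 24 * 365              # sustained load, no downtime
queries_per_dollar = queries_per_year / cost_per_year

print(f"servers: {servers}")
print(f"cost per year: ${cost_per_year:,}")
print(f"queries per dollar: {queries_per_dollar:,.0f}")
```

Under these assumptions the result is 42 servers, about $184k per year, and roughly 17k queries per dollar, which is in the same range as the comment's ~$200k/y and ~18k queries/$. It leaves out replication, networking, and CPU headroom, any of which would raise the real cost.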

Hamish Ogilvy • 1 year ago

@shreyas4_ @aiDotEngineer Nice work. So funny how obsessed people were with HNSW…

omkaar • 1 year ago

@shreyas4_ @aiDotEngineer awesome great job guys

Aarush Sah • 1 year ago

@shreyas4_ @aiDotEngineer i love shreyas shreyas is so cool

agi • 1 year ago

@shreyas4_ @aiDotEngineer love this - great insight for my product

Related videos

Gave a talk on how build AI apps in a weekend that go viral and scale to millions last year. AMA! Thanks for having me AI Engineer!

Hassan

83,260 views • 1 year ago

Our approach to AI infra is simple: build the most fungible and flexible fleet to meet the real world's needs across inference and training as Scott Guthrie shared with Alex Kantrowitz. And we are already doing it at scale today, as we power the biggest AI workloads like Copilot and ChatGPT, APIs that power 3P products & enterprise workloads and high scale training.

Satya Nadella

111,599 views • 8 months ago

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski.

This course focuses on Retrieval augmented generation (RAG), which has two steps: First, a retriever finds relevant information; then, the generator uses what’s retrieved as context to produce a response. You’ll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will also learn to measure and improve retrieval quality, speed, and memory.

In detail, you’ll:
- Learn about the internal workings of the embedding models and how your text turns into vectors.
- Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work.
- Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search.
- Understand how to measure the quality of your search across relevance, ranking, and score-related metrics.
- Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters.
- Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed.

By the end of this course, you’ll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!

Andrew Ng

146,200 views • 1 year ago

How do computers understand data? With semantic search! Instead of just matching keywords, it understands context using vector embeddings. Here’s how: 1) Convert data (text, images, etc.) into vectors (embeddings) 2) Store these vectors in a vector database 3) Search by meaning, not just the keywords Semantic search makes finding data across formats easier. Learn more in this blog post by Leonie, my all-time favorite:

Femke Plantinga

23,911 views • 1 year ago

We (@RohitDilip8 Alex Beatson Ishaan Javali + I) won 2nd place & the Chroma prize at the Scale AI GenAI hackathon w/ Protex: the protein therapeutics universe is immense, we focus the search space w/ OpenAI & Meta ESM embeddings of InterPro, Chroma vector db, InfoNCE & Vercel

Shawn

10,926 views • 2 years ago

My pitch to Sam Altman was simple - "We want to serve a billion people" For that scale, we rebuilt the insurance infrastructure with AI and Bitcoin. Today, meanwhile | Bitcoin Life Insurance can close quarterly books in 2.5 hours. And customers can get a fully underwritten policy in a few days. w/ @TBPN

Zac Townsend

10,738 views • 2 months ago

How does Aaron Rodgers want to see the Steelers learn from last week’s loss and respond? “I don’t like getting too binary, but winning. That’s a good response, but we can’t get attached to the binary system that our league is judged on necessarily because it’s a 17-game season.”

Brooke Pryor

423,820 views • 8 months ago

INTELLIGENT TASKS ARE A STEPPING STONE TO AGI Today, we are launching ChatLLM Tasks. We have hooked up tools like web search, email, and web scrappers to mini-agents that can be triggered on a schedule. These tasks combine intelligent tool use with crons! We think this is the first step towards AGI. The next step is to connect our AI engineer to create more complex tasks. AGI STEP ONE - DONE!

Bindu Reddy

18,486 views • 1 year ago

Supabase can be used as a vector database! This means that you can perform a semantic search against Supabase! This allows you to create RAG apps or content recommendation engines on top of Supabase! Learn what embeddings are, and how you can use them 👇

Tyler Shukert

39,359 views • 1 year ago

Today, Scale AI powers the world's leading AI systems by providing essential data labeling. His company works with OpenAI, Google, Meta, and the Department of Defense. In 2024, Scale AI reached a $13.8 billion valuation. But he didn't just build a company.

Jens Honack

50,336 views • 1 year ago

We raised $85M in Series B funding at a $700M valuation, led by Benchmark. Exa is a research lab building the search engine for AI.

Exa

1,807,827 views • 9 months ago

🚨 Treasury Secretary Scott Bessent gave a FASCINATING insight into Trump's tariff strategy with regard to China today. BESSENT: "At the end of the day... we can probably reach a deal with our allies... and then we can approach China as a group." Helps to explain the 90-day pause with other nations & lowered reciprocal tariffs but the increased tariffs on China. ⬇️

Kayleigh McEnany

150,051 views • 1 year ago

How did we build & scale Replit Agent? Find out from a Replit AI engineer (and a good friend), James Austin ⠕ We chat about building Replit Agent, what makes a great SWE, and James' journey to becoming an AI engineer

matt palmer

26,222 views • 6 months ago

Introducing Exa Monitors - your agent’s radar for the web Exa is a search engine built from scratch, and today we're exposing our "update" layer. Simply define what to find and how often - Monitors will return any new information, via webhook.

Exa

121,344 views • 2 months ago

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action.

Thinking Machines

7,709,515 views • 25 days ago

Get Your Very Own AI Engineer! We present your very OWN AI Engineer as part of ChatLLM! Watch the engineer create a custom bot based on your documents. It will create a vector store, configure your custom bot, and deploy it in minutes. An end-to-end custom RAG system becomes instantly available! This is just the beginning! We hope to upskill the Engineer rapidly.

Bindu Reddy

77,693 views • 1 year ago

We believe AI can be a dedicated research partner to help discover the next breakthrough. Enter Co-Scientist: our latest Gemini-based multi-agent system that can generate, debate and evolve novel hypotheses for complex scientific problems 🧵

Google DeepMind

175,139 views • 4 days ago

I built a Pinterest clone that uses AI to find similar images I crawled tumblr and collected lots of images, then used a model to get vector embeddings. When you click an image it finds the most similar embeddings and returns the images

ab

15,330 views • 2 years ago

We've raised $6.5M to kill vector databases. Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit. We built HydraDB for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time. So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94. More below ⬇️

Nishkarsh

3,854,883 views • 2 months ago

What does the future of TV really look like? “A TV is no longer just a screen — it’s a system.” Hear how industry experts explain the role of SQD-Mini LED and AI at TCL’s CES 2026 Tech Talk.

TCL

10,030,680 views • 4 months ago