How does Exa serve billion-scale vector search? We combine binary quantization, Matryoshka embeddings, SIMD, and IVF into a novel system that can beat alternatives like HNSW. Shreyas gave a talk today at the AI Engineer World's Fair explaining our approach! ⬇️
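To make two of the ingredients named above concrete, here is a minimal sketch of binary quantization applied to a Matryoshka-style truncated prefix, with search done by Hamming distance. It is an illustration under stated assumptions, not Exa's implementation: the function names (`binarize`, `hamming`, `search`), the sign-based quantization rule, the 64-to-32 dimension truncation, and the random toy data are all hypothetical. IVF partitioning and SIMD acceleration are left out; the XOR-plus-popcount step in `hamming` is the kind of operation SIMD instructions speed up.

```python
import random

def binarize(vec, dims=None):
    """Pack the sign bits of the first `dims` components into one int."""
    prefix = vec if dims is None else vec[:dims]  # Matryoshka-style prefix (assumed)
    code = 0
    for x in prefix:
        code = (code << 1) | (1 if x > 0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits: XOR, then popcount."""
    return bin(a ^ b).count("1")

def search(query, codes, k, dims=None):
    """Brute-force scan: rank stored codes by Hamming distance to the query."""
    q = binarize(query, dims)
    return sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))[:k]

# Toy data (hypothetical): 1,000 random 64-dimensional vectors, stored as 32-bit codes.
random.seed(0)
docs = [[random.gauss(0, 1) for _ in range(64)] for _ in range(1000)]
codes = [binarize(d, dims=32) for d in docs]

# A slightly perturbed copy of document 42 serves as the query.
query = [x + random.gauss(0, 0.1) for x in docs[42]]
top = search(query, codes, k=5, dims=32)
print(top)
```

Each stored vector shrinks from 64 floats to 32 bits in this toy setup, which is where the memory savings come from; the cost is that ranking by Hamming distance only approximates ranking by the original similarity.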

Exa

51,636 subscribers

85,482 views • 1 year ago • via X (Twitter)

10 Comments

Jeffrey Wang • 1 year ago

@shreyas4_ @aiDotEngineer I wanna be nearest neighbors w/ @shreyas4_

Tigran III • 1 year ago

@shreyas4_ @aiDotEngineer i am still struggling to believe how much cracked engineering talent is coming from that one university. @shreyas4_ what's the secret sauce?

Martyn Strydom 🤸 • 1 year ago

@shreyas4_ @aiDotEngineer Unreal @shreyas4_

Karan☕ • 1 year ago

@shreyas4_ @aiDotEngineer Great talk, learned a lot of new things. One question: I think that if you use binary quantization on smaller embeddings, you will get poorer results because of lossy compression (dimension reduction has already been applied, and then BQ on top).

Prashant Dixit • 1 year ago

@shreyas4_ @aiDotEngineer Anyone who wants to quickly build a Matryoshka-embedding-based RAG in a minute, give it a try 🙂

sophia • 1 year ago

@shreyas4_ @aiDotEngineer I'm confused why you said 8TB of memory to hold everything in RAM is too expensive. Back of the envelope Hetzner has 24 core/192GB systems for $366/mo. 8TB would be ~$200k/y or ~18k queries/$ @ 100 QPS
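The back-of-the-envelope estimate in this comment can be reproduced roughly as follows. This is a sketch under assumptions that are not in the comment: 8 TB is taken as 8,000 GB, all 192 GB of each server is treated as usable, there is no replication or headroom, and the 100 QPS load is sustained all year.

```python
import math

# Figures quoted in the comment
TOTAL_GB = 8 * 1000           # 8 TB in decimal units (assumption)
SERVER_GB = 192               # RAM per server
SERVER_USD_PER_MONTH = 366    # price per server
QPS = 100                     # sustained queries per second

# Servers needed to hold everything in RAM, with no replication (assumption)
servers = math.ceil(TOTAL_GB / SERVER_GB)

# Yearly hardware rental cost
yearly_cost = servers * SERVER_USD_PER_MONTH * 12

# Queries served in a 365-day year at a constant rate
queries_per_year = QPS * 60 * 60 * 24 * 365

queries_per_dollar = queries_per_year / yearly_cost

print(servers, yearly_cost, round(queries_per_dollar))
```

Under these assumptions the result is 42 servers, about $184k per year, and roughly 17k queries per dollar, which is the same order of magnitude as the figures in the comment.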

Hamish Ogilvy • 1 year ago

@shreyas4_ @aiDotEngineer Nice work. So funny how obsessed people were with HNSW…

omkaar • 1 year ago

@shreyas4_ @aiDotEngineer awesome great job guys

Aarush Sah • 1 year ago

@shreyas4_ @aiDotEngineer i love shreyas shreyas is so cool

agi • 1 year ago

@shreyas4_ @aiDotEngineer love this - great insight for my product

Related Videos

Gave a talk on how to build AI apps in a weekend that go viral and scale to millions last year. AMA! Thanks for having me AI Engineer!

Hassan

83,260 views • 1 year ago

Our approach to AI infra is simple: build the most fungible and flexible fleet to meet the real world's needs across inference and training as Scott Guthrie shared with Alex Kantrowitz. And we are already doing it at scale today, as we power the biggest AI workloads like Copilot and ChatGPT, APIs that power 3P products & enterprise workloads and high scale training.

Satya Nadella

111,599 views • 8 months ago

Tokenization -- turning text into a sequence of integers -- is a key part of generative AI, and most API providers charge per million tokens. How does tokenization work? Learn the details of tokenization and RAG optimization in Retrieval Optimization: From Tokenization to Vector Quantization, created in collaboration with Qdrant and taught by its Developer Relations Lead, Kacper Łukawski.

This course focuses on retrieval-augmented generation (RAG), which has two steps: first, a retriever finds relevant information; then, the generator uses what's retrieved as context to produce a response. You'll learn to optimize the first step (the retriever) by understanding how tokenization works and how it impacts the relevance of your search. In addition, you will learn to measure and improve retrieval quality, speed, and memory.

In detail, you'll:

- Learn about the internal workings of the embedding models and how your text turns into vectors.
- Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece work.
- Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search.
- Understand how to measure the quality of your search across relevance, ranking, and score-related metrics.
- Understand how the main parameters in "HNSW", a graph-based algorithm, affect the relevance and speed of vector search, and how to tune its parameters.
- Experiment with the three major quantization methods – product, scalar, and binary – and learn how they impact memory requirements, search quality, and speed.

By the end of this course, you'll have a solid understanding of how tokenization functions and how to optimize vector search in your RAG systems. Please sign up here!

Andrew Ng

146,200 views • 1 year ago

How do computers understand data? With semantic search! Instead of just matching keywords, it understands context using vector embeddings. Here’s how: 1) Convert data (text, images, etc.) into vectors (embeddings) 2) Store these vectors in a vector database 3) Search by meaning, not just the keywords Semantic search makes finding data across formats easier. Learn more in this blog post by Leonie, my all-time favorite:

Femke Plantinga

23,911 views • 1 year ago

My pitch to Sam Altman was simple - "We want to serve a billion people" For that scale, we rebuilt the insurance infrastructure with AI and Bitcoin. Today, meanwhile | Bitcoin Life Insurance can close quarterly books in 2.5 hours. And customers can get a fully underwritten policy in a few days. w/ @TBPN

Zac Townsend

10,738 views • 2 months ago

We (@RohitDilip8 Alex Beatson Ishaan Javali + I) won 2nd place & the Chroma prize at the Scale AI GenAI hackathon w/ Protex: the protein therapeutics universe is immense, we focus the search space w/ OpenAI & Meta ESM embeddings of InterPro, Chroma vector db, InfoNCE & Vercel

Shawn

10,926 views • 2 years ago

How does Aaron Rodgers want to see the Steelers learn from last week’s loss and respond? “I don’t like getting too binary, but winning. That’s a good response, but we can’t get attached to the binary system that our league is judged on necessarily because it’s a 17-game season.”

Brooke Pryor

423,820 views • 8 months ago

INTELLIGENT TASKS ARE A STEPPING STONE TO AGI Today, we are launching ChatLLM Tasks. We have hooked up tools like web search, email, and web scrapers to mini-agents that can be triggered on a schedule. These tasks combine intelligent tool use with crons! We think this is the first step towards AGI. The next step is to connect our AI engineer to create more complex tasks. AGI STEP ONE - DONE!

Bindu Reddy

18,486 views • 1 year ago

Supabase can be used as a vector database! This means that you can perform a semantic search against Supabase! This allows you to create RAG apps or content recommendation engines on top of Supabase! Learn what embeddings are, and how you can use them 👇

Tyler Shukert

39,359 views • 1 year ago

We raised $85M in Series B funding at a $700M valuation, led by Benchmark. Exa is a research lab building the search engine for AI.

Exa

1,807,827 views • 9 months ago

🚨 Treasury Secretary Scott Bessent gave a FASCINATING insight into Trump's tariff strategy with regard to China today. BESSENT: "At the end of the day... we can probably reach a deal with our allies... and then we can approach China as a group." Helps to explain the 90-day pause with other nations & lowered reciprocal tariffs but the increased tariffs on China. ⬇️

Kayleigh McEnany

150,051 views • 1 year ago

How did we build & scale Replit Agent? Find out from a Replit AI engineer (and a good friend), James Austin ⠕ We chat about building Replit Agent, what makes a great SWE, and James' journey to becoming an AI engineer

matt palmer

26,222 views • 6 months ago

Introducing Exa Monitors - your agent’s radar for the web Exa is a search engine built from scratch, and today we're exposing our "update" layer. Simply define what to find and how often - Monitors will return any new information, via webhook.

Exa

121,180 views • 2 months ago

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action.

Thinking Machines

7,708,890 views • 25 days ago

Get Your Very Own AI Engineer! We present your very OWN AI Engineer as part of ChatLLM! Watch the engineer create a custom bot based on your documents. It will create a vector store, configure your custom bot, and deploy it in minutes. An end-to-end custom RAG system becomes instantly available! This is just the beginning! We hope to upskill the Engineer rapidly.

Bindu Reddy

77,693 views • 1 year ago

We believe AI can be a dedicated research partner to help discover the next breakthrough. Enter Co-Scientist: our latest Gemini-based multi-agent system that can generate, debate and evolve novel hypotheses for complex scientific problems 🧵

Google DeepMind

172,609 views • 3 days ago

I built a Pinterest clone that uses AI to find similar images I crawled tumblr and collected lots of images, then used a model to get vector embeddings. When you click an image it finds the most similar embeddings and returns the images

ab

15,330 views • 2 years ago

We've raised $6.5M to kill vector databases. Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit. We built HydraDB for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time. So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94. More below ⬇️

Nishkarsh

3,854,883 views • 2 months ago

What does the future of TV really look like? “A TV is no longer just a screen — it’s a system.” Hear how industry experts explain the role of SQD-Mini LED and AI at TCL’s CES 2026 Tech Talk.

TCL

10,030,680 views • 4 months ago