Загрузка видео...

Не удалось загрузить видео

На главную

How Postgres 17 improves IN queries In prior releases, postgres would loop and do multiple "primitive" index scans as many as your IN elements. In Postgres 17 it takes an array and eliminates the need for multiple scans. This only applies for constant values

143,151 просмотров • 1 год назад •via X (Twitter)

Комментарии: 10

Фото профиля Hussein Nasser
Hussein Nasser1 год назад

btw the tool used in the video is the btree viewer from university of san francisco portal. should show up in google.

Фото профиля Shyam Kumar
Shyam Kumar1 год назад

Thanks for the video, Hussein! Postgres really helped solve the issue.

Фото профиля Abdelrahman
Abdelrahman1 год назад

Would it matter if the values in the IN operator is sorted as the index column or not?

Фото профиля Hussein Nasser
Hussein Nasser1 год назад

it actually might, the “loop” to lookup and find can break fast faster. worth testing but will only be visible in large in list.

Фото профиля Mohammed Al-Sabaawi
Mohammed Al-Sabaawi1 год назад

Is there a way to tell the algorithm that they are in adjacent pages (i.e a range query) so instead of making another search it could just add an offset and get the next item/page?

Фото профиля Hussein Nasser
Hussein Nasser1 год назад

@Mohammed_allabd Not in postgres but SQL server has this hint actually FORCESEEK vs FORCESCAN

Фото профиля Zaher
Zaher1 год назад

If I increase the page size, would it help improve IN query performance by fitting more elements in each page, given that each page holds an ID array and some concurrency metadata? Or would it come with trade-offs?

Фото профиля حسون
حسون1 год назад

So the old method in random IN query would be faster? Cuz it doesn't check the leaf for other elements in "IN query" which is clearly won't find them

Фото профиля Hussein Nasser
Hussein Nasser1 год назад

Worth testing, the question is does the extra lookup in the leaf page against the IN list adds much? Also we know the implementation always sorts IN list (not sure if this always was the case or mot), so we need to account for that too.

Фото профиля PrimeXBT PT
PrimeXBT PT1 год назад

Ei, você encontrou o elixir mágico da eficiência tecnológica! 🚀 Assim como Adriana Lima desfila nas passarelas ou Al Capone fazia suas manobras, Postgres 17 tá na vida, trazendo uma evolução estilo Lambo nos IN queries. É como transformar scans primitivos em uma sinfonia otimizada! Agora é só adicionar um pouco de blockchain no loop e o future é nosso. HODL esse conhecimento! #TechRevolution #KeepBUIDLing 🌟

Похожие видео

Your Postgres is 100x slower than traditional OLAP engines. A deceptively simple OSS extension fixes this. Here's an interview where we dive into the deep engineering around how this is achieved. Joining me (and leading the conversation) is Marco Slot: an engineer with an EXTENSIVE and impressive career history around PostgreSQL: 👉 Created pg_cron in 2017 (3.7k stars) - a tool to run cron-jobs in Postgres 👉 Built pg_incremental - fast, reliable, incremental batch processing inside PostgreSQL itself 👉 co-created pg_lake (after working on Crunchy Data's Warehouse, and getting acquired into Snowflake) 👉 Helped get pg_documentdb (MongoDB-on-Postgres) off the ground Marco Slot is a world-class expert in Postgres extensions. He seriously impressed me with his knowledge over the course of a private LinkedIn conversation, and now that I type out his resume - I understand where it came from. He should be on everyone's radar. So I brought him on the pod. In our full 2-hour deep-dive, we went over: • 🔥 how pg_lake makes analytics 100x faster (literally) • 🔥 perf internals like vectorized execution & CPU branching • 🤔 practical differences between OLTP and OLAP database development (and the age-old mission in uniting both) • 🤔 how (and why) pg_lake intercepts query plans and delegates parts of the query tree to DuckDB • 💡 why Postgres is architecturally terrible at analytical queries (and how vectorized execution fixes this) • 💡 Marco's hard-won experience through a decade+ career in Postgres • 🏆 Iceberg's role as the TCP/IP for tables • 🏆 what the real moat of PostgreSQL is Developments like pg_lake are a real reason why "Just Use Postgres" is much more than a meme, and it'll continue to dominate discourse. I promise you will learn a lot from this episode. Timestamps: (0:02) What is pg_lake? (2:23) Postgres' 100x slower problem and columnar storage experiments they had to make Postgres fast for analytics (6:00) practical examples and internals (16:20) perf internals - vectorized execution & CPU optimization (23:00) pg_lake architecture (why DuckDB isn't embedded) and the connection-per-process issue (29:16) how pg_lake intercepts the query plan tree and delegates parts to DuckDB (41:09) Iceberg catalogs (48:24) postgres to iceberg ingestion patterns (and pg_incremental) (53:40) Marco's (long) career: early AWS, Citus, Microsoft, Crunchy Data & Snowflake (1:04:20) Marco's observations around the merging between OLTP and OLAP (and the subtle dev differences there) (1:15:30) reverse ETL (1:33:08) Iceberg as the TCP/IP for tables (1:35:00) Marco's thoughts on the "Just Use Postgres" fever

Stanislav Kozlovski

16,620 просмотров • 2 месяцев назад