正在加载视频...

视频加载失败

How Postgres 17 improves IN queries In prior releases, postgres would loop and do multiple "primitive" index scans as many as your IN elements. In Postgres 17 it takes an array and eliminates the need for multiple scans. This only applies for constant values

143,151 次观看 • 1 年前 •via X (Twitter)

10 条评论

Hussein Nasser 的头像
Hussein Nasser1 年前

btw the tool used in the video is the btree viewer from university of san francisco portal. should show up in google.

Shyam Kumar 的头像
Shyam Kumar1 年前

Thanks for the video, Hussein! Postgres really helped solve the issue.

Abdelrahman 的头像
Abdelrahman1 年前

Would it matter if the values in the IN operator is sorted as the index column or not?

Hussein Nasser 的头像
Hussein Nasser1 年前

it actually might, the “loop” to lookup and find can break fast faster. worth testing but will only be visible in large in list.

Mohammed Al-Sabaawi 的头像
Mohammed Al-Sabaawi1 年前

Is there a way to tell the algorithm that they are in adjacent pages (i.e a range query) so instead of making another search it could just add an offset and get the next item/page?

Hussein Nasser 的头像
Hussein Nasser1 年前

@Mohammed_allabd Not in postgres but SQL server has this hint actually FORCESEEK vs FORCESCAN

Zaher 的头像
Zaher1 年前

If I increase the page size, would it help improve IN query performance by fitting more elements in each page, given that each page holds an ID array and some concurrency metadata? Or would it come with trade-offs?

حسون 的头像
حسون1 年前

So the old method in random IN query would be faster? Cuz it doesn't check the leaf for other elements in "IN query" which is clearly won't find them

Hussein Nasser 的头像
Hussein Nasser1 年前

Worth testing, the question is does the extra lookup in the leaf page against the IN list adds much? Also we know the implementation always sorts IN list (not sure if this always was the case or mot), so we need to account for that too.

PrimeXBT PT 的头像
PrimeXBT PT1 年前

Ei, você encontrou o elixir mágico da eficiência tecnológica! 🚀 Assim como Adriana Lima desfila nas passarelas ou Al Capone fazia suas manobras, Postgres 17 tá na vida, trazendo uma evolução estilo Lambo nos IN queries. É como transformar scans primitivos em uma sinfonia otimizada! Agora é só adicionar um pouco de blockchain no loop e o future é nosso. HODL esse conhecimento! #TechRevolution #KeepBUIDLing 🌟

相关视频

Your Postgres is 100x slower than traditional OLAP engines. A deceptively simple OSS extension fixes this. Here's an interview where we dive into the deep engineering around how this is achieved. Joining me (and leading the conversation) is Marco Slot: an engineer with an EXTENSIVE and impressive career history around PostgreSQL: 👉 Created pg_cron in 2017 (3.7k stars) - a tool to run cron-jobs in Postgres 👉 Built pg_incremental - fast, reliable, incremental batch processing inside PostgreSQL itself 👉 co-created pg_lake (after working on Crunchy Data's Warehouse, and getting acquired into Snowflake) 👉 Helped get pg_documentdb (MongoDB-on-Postgres) off the ground Marco Slot is a world-class expert in Postgres extensions. He seriously impressed me with his knowledge over the course of a private LinkedIn conversation, and now that I type out his resume - I understand where it came from. He should be on everyone's radar. So I brought him on the pod. In our full 2-hour deep-dive, we went over: • 🔥 how pg_lake makes analytics 100x faster (literally) • 🔥 perf internals like vectorized execution & CPU branching • 🤔 practical differences between OLTP and OLAP database development (and the age-old mission in uniting both) • 🤔 how (and why) pg_lake intercepts query plans and delegates parts of the query tree to DuckDB • 💡 why Postgres is architecturally terrible at analytical queries (and how vectorized execution fixes this) • 💡 Marco's hard-won experience through a decade+ career in Postgres • 🏆 Iceberg's role as the TCP/IP for tables • 🏆 what the real moat of PostgreSQL is Developments like pg_lake are a real reason why "Just Use Postgres" is much more than a meme, and it'll continue to dominate discourse. I promise you will learn a lot from this episode. Timestamps: (0:02) What is pg_lake? (2:23) Postgres' 100x slower problem and columnar storage experiments they had to make Postgres fast for analytics (6:00) practical examples and internals (16:20) perf internals - vectorized execution & CPU optimization (23:00) pg_lake architecture (why DuckDB isn't embedded) and the connection-per-process issue (29:16) how pg_lake intercepts the query plan tree and delegates parts to DuckDB (41:09) Iceberg catalogs (48:24) postgres to iceberg ingestion patterns (and pg_incremental) (53:40) Marco's (long) career: early AWS, Citus, Microsoft, Crunchy Data & Snowflake (1:04:20) Marco's observations around the merging between OLTP and OLAP (and the subtle dev differences there) (1:15:30) reverse ETL (1:33:08) Iceberg as the TCP/IP for tables (1:35:00) Marco's thoughts on the "Just Use Postgres" fever

Stanislav Kozlovski

16,529 次观看 • 1 个月前