
Let Postgres own the Iceberg catalog and delegate analytics to DuckDB. The result: transactional lakehouse updates combined with fast analytical queries. This isn’t a concept. It’s exactly what pg_lake delivers today. pg_lake combines a set of extensions and components that let you query and modify Iceberg tables (and other...

28,461 views • 4 months ago • via X (Twitter)
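The post doesn't show what "Postgres owns the Iceberg catalog" means mechanically. In the Apache Iceberg model, a catalog maps each table to the location of its current metadata file, and a commit succeeds only if it atomically swaps that pointer away from the version the writer started from (optimistic concurrency). The toy catalog below sketches just that compare-and-swap in plain Python. `ToyCatalog`, `CommitConflict`, and the `s3://bucket/...` paths are hypothetical, and this is not pg_lake's code. If Postgres holds the pointer in a table, the swap can presumably ride on an ordinary Postgres transaction, which is what would make the lakehouse updates transactional.

```python
import threading


class CommitConflict(Exception):
    """Another writer committed first; the caller must re-read and retry."""


class ToyCatalog:
    """Hypothetical catalog: table name -> current metadata file location."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def create(self, table, metadata_location):
        with self._lock:
            self._pointers[table] = metadata_location

    def current(self, table):
        with self._lock:
            return self._pointers[table]

    def commit(self, table, expected, new):
        # Atomic compare-and-swap: succeed only if the pointer still equals
        # the metadata version this writer based its changes on.
        with self._lock:
            found = self._pointers.get(table)
            if found != expected:
                raise CommitConflict(f"{table}: expected {expected!r}, found {found!r}")
            self._pointers[table] = new


catalog = ToyCatalog()
catalog.create("events", "s3://bucket/events/metadata/v1.json")

base = catalog.current("events")  # two writers both start from v1
catalog.commit("events", base, "s3://bucket/events/metadata/v2.json")  # writer A wins
try:
    catalog.commit("events", base, "s3://bucket/events/metadata/v2b.json")  # writer B is stale
except CommitConflict as err:
    print("retry needed:", err)
print(catalog.current("events"))
```

Writer B's commit is rejected because the pointer moved under it; in a real system it would re-read the new metadata, reapply its changes, and try again.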


Similar videos

Your Postgres is 100x slower than traditional OLAP engines. A deceptively simple OSS extension fixes this.

Here's an interview where we dive into the deep engineering behind how this is achieved. Joining me (and leading the conversation) is Marco Slot, an engineer with an EXTENSIVE and impressive career history around PostgreSQL:

👉 Created pg_cron in 2017 (3.7k stars) - a tool to run cron jobs in Postgres
👉 Built pg_incremental - fast, reliable, incremental batch processing inside PostgreSQL itself
👉 Co-created pg_lake (after working on Crunchy Data's Warehouse and being acquired by Snowflake)
👉 Helped get pg_documentdb (MongoDB-on-Postgres) off the ground

Marco Slot is a world-class expert in Postgres extensions. He seriously impressed me with his knowledge over the course of a private LinkedIn conversation, and now that I type out his resume, I understand where it came from. He should be on everyone's radar. So I brought him on the pod.

In our full 2-hour deep-dive, we went over:
• 🔥 how pg_lake makes analytics 100x faster (literally)
• 🔥 perf internals like vectorized execution & CPU branching
• 🤔 practical differences between OLTP and OLAP database development (and the age-old mission of uniting both)
• 🤔 how (and why) pg_lake intercepts query plans and delegates parts of the query tree to DuckDB
• 💡 why Postgres is architecturally terrible at analytical queries (and how vectorized execution fixes this)
• 💡 Marco's hard-won experience from a decade-plus career in Postgres
• 🏆 Iceberg's role as the TCP/IP for tables
• 🏆 what the real moat of PostgreSQL is

Developments like pg_lake are a real reason why "Just Use Postgres" is much more than a meme, and it'll continue to dominate discourse. I promise you will learn a lot from this episode.

Timestamps:
(0:02) What is pg_lake?
(2:23) Postgres' 100x-slower problem, and the columnar storage experiments they tried in order to make Postgres fast for analytics
(6:00) Practical examples and internals
(16:20) Perf internals - vectorized execution & CPU optimization
(23:00) pg_lake architecture (why DuckDB isn't embedded) and the connection-per-process issue
(29:16) How pg_lake intercepts the query plan tree and delegates parts to DuckDB
(41:09) Iceberg catalogs
(48:24) Postgres-to-Iceberg ingestion patterns (and pg_incremental)
(53:40) Marco's (long) career: early AWS, Citus, Microsoft, Crunchy Data & Snowflake
(1:04:20) Marco's observations on the merging of OLTP and OLAP (and the subtle dev differences there)
(1:15:30) Reverse ETL
(1:33:08) Iceberg as the TCP/IP for tables
(1:35:00) Marco's thoughts on the "Just Use Postgres" fever

Stanislav Kozlovski

16,620 views • 2 months ago
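The "vectorized execution & CPU branching" topic contrasts a tuple-at-a-time executor (the classic Volcano model Postgres uses, where operators hand rows up one at a time) with engines like DuckDB that pass batches of column values between operators. The sketch below computes the same filtered sum both ways, purely to show the structural difference; the function names and `BATCH_SIZE` are hypothetical, and it says nothing about how pg_lake or DuckDB are actually implemented. In interpreted Python, neither version gets the real speedup; that comes from compiled engines running tight, branch-free loops over column batches.

```python
BATCH_SIZE = 1024  # hypothetical; real engines tune their batch (vector) size


# Tuple-at-a-time (Volcano-style): operators pass one row at a time, so every
# row pays for a generator hop and a data-dependent branch in the filter.
def scan_rows(prices, qtys):
    for row in zip(prices, qtys):
        yield row


def filter_rows(rows, min_qty):
    for price, qty in rows:
        if qty >= min_qty:  # branch decided row by row
            yield price, qty


def sum_revenue_rows(rows):
    total = 0
    for price, qty in rows:
        total += price * qty
    return total


# Vectorized: operators pass column batches, so per-call overhead is paid once
# per batch and the inner loop is a simple pass over one batch of values.
def scan_batches(prices, qtys):
    for start in range(0, len(prices), BATCH_SIZE):
        yield prices[start:start + BATCH_SIZE], qtys[start:start + BATCH_SIZE]


def sum_revenue_batches(batches, min_qty):
    total = 0
    for price_col, qty_col in batches:
        # Branch-free selection: multiply by the 0/1 comparison result instead
        # of an if per row (a compiled engine can often turn this into SIMD).
        total += sum(p * q * (q >= min_qty) for p, q in zip(price_col, qty_col))
    return total


prices = [10, 20, 30, 40]
qtys = [1, 5, 2, 7]
print(sum_revenue_rows(filter_rows(scan_rows(prices, qtys), min_qty=2)))  # 440
print(sum_revenue_batches(scan_batches(prices, qtys), min_qty=2))  # 440
```

Both pipelines return the same answer; what changes is how often control passes between operators and whether the filter branches per row.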

Your agents can't keep up with real-time data, especially when it's scattered across dozens of sources.

Most teams waste weeks building custom connectors for every database, API, and data warehouse. Then they build ETL pipelines to sync everything. By the time your agent retrieves the data, it's already outdated.

Picture this: your Postgres database updated 5 minutes ago. Your MongoDB collection changed 2 minutes ago. Your agent is still pulling from yesterday's snapshot.

This is why most production RAG systems fail.

There's a better approach: MindsDB is an open-source AI platform with a federated data engine that lets you query multiple data sources in real time using SQL, without moving any data.

Here's what makes it different:
↳ Your data stays in place. No ETL pipelines or data duplication
↳ Query Postgres, MongoDB, REST APIs, and more using consistent SQL
↳ JOIN across different sources in real time with a unified interface
↳ Works with both structured and unstructured data

And here's the best part: you don't even need to write SQL. Just describe what you want in plain English, and MindsDB converts it to SQL automatically. The system does all the heavy lifting.

The breakthrough for AI agents is simple: when data updates at the source, your agent gets fresh results immediately. No sync delays. No stale embeddings. No custom code for each integration.

You can literally write a SQL query that joins a Postgres table with a MongoDB collection and get live results.

This is what production AI applications need but rarely get.

In this video, I give you a complete walkthrough of what we just discussed and how to actually do it. Make sure you watch this till the end.

I've shared the link to MindsDB's GitHub repo in the next tweet!

Akshay 🚀

65,672 views • 7 months ago
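The idea behind "JOIN across different sources without moving any data" is a federated join: each source is queried when the query runs, and the engine joins the fetched rows in memory instead of first copying everything into a warehouse. The sketch below does not use MindsDB or its API. `fetch_postgres_customers` and `fetch_mongo_orders` are hypothetical stand-ins returning hard-coded rows where a real engine would issue live queries, and the join is a plain inner hash join.

```python
from collections import defaultdict


def fetch_postgres_customers():
    # Hypothetical stand-in for a live query against a Postgres table.
    return [
        {"customer_id": 1, "name": "Ada"},
        {"customer_id": 2, "name": "Linus"},
    ]


def fetch_mongo_orders():
    # Hypothetical stand-in for a live read of a MongoDB collection.
    return [
        {"customer_id": 1, "total": 30},
        {"customer_id": 1, "total": 12},
        {"customer_id": 2, "total": 7},
        {"customer_id": 3, "total": 99},  # no matching customer: dropped by the inner join
    ]


def federated_join(left_rows, right_rows, key):
    """Inner hash join over rows fetched at query time; nothing is persisted."""
    index = defaultdict(list)
    for row in left_rows:
        index[row[key]].append(row)
    for right in right_rows:
        for left in index.get(right[key], []):
            yield {**left, **right}


def revenue_by_customer():
    totals = defaultdict(int)
    # Both sources are read when this runs, so results reflect their current state.
    for row in federated_join(fetch_postgres_customers(), fetch_mongo_orders(), "customer_id"):
        totals[row["name"]] += row["total"]
    return dict(totals)


print(revenue_by_customer())  # {'Ada': 42, 'Linus': 7}
```

Because nothing is cached between calls, freshness depends only on the sources; the trade-off is that every query pays the cost of reaching each source.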