Stanislav Kozlovski

@kozlovski • 18,345 subscribers

"The Kafka Guy" 🧠 Have worked on Apache Kafka for 7+ years, now I write about it. (& the general data space) Low-frequency, highly-technical tweets. ✌️

Videos

Your Postgres is 100x slower than traditional OLAP engines. A deceptively simple OSS extension fixes this.

Here's an interview where we dive into the deep engineering around how this is achieved.

Joining me (and leading the conversation) is Marco Slot: an engineer with an EXTENSIVE and impressive career history around PostgreSQL:
👉 Created pg_cron in 2017 (3.7k stars) - a tool to run cron-jobs in Postgres
👉 Built pg_incremental - fast, reliable, incremental batch processing inside PostgreSQL itself
👉 co-created pg_lake (after working on Crunchy Data's Warehouse, and getting acquired into Snowflake)
👉 Helped get pg_documentdb (MongoDB-on-Postgres) off the ground

Marco Slot is a world-class expert in Postgres extensions. He seriously impressed me with his knowledge over the course of a private LinkedIn conversation, and now that I type out his resume - I understand where it came from. He should be on everyone's radar. So I brought him on the pod.

In our full 2-hour deep-dive, we went over:
• 🔥 how pg_lake makes analytics 100x faster (literally)
• 🔥 perf internals like vectorized execution & CPU branching
• 🤔 practical differences between OLTP and OLAP database development (and the age-old mission in uniting both)
• 🤔 how (and why) pg_lake intercepts query plans and delegates parts of the query tree to DuckDB
• 💡 why Postgres is architecturally terrible at analytical queries (and how vectorized execution fixes this)
• 💡 Marco's hard-won experience through a decade+ career in Postgres
• 🏆 Iceberg's role as the TCP/IP for tables
• 🏆 what the real moat of PostgreSQL is

Developments like pg_lake are a real reason why "Just Use Postgres" is much more than a meme, and it'll continue to dominate discourse.

I promise you will learn a lot from this episode.

Timestamps:
(0:02) What is pg_lake?
(2:23) Postgres' 100x slower problem and columnar storage experiments they had to make Postgres fast for analytics
(6:00) practical examples and internals
(16:20) perf internals - vectorized execution & CPU optimization
(23:00) pg_lake architecture (why DuckDB isn't embedded) and the connection-per-process issue
(29:16) how pg_lake intercepts the query plan tree and delegates parts to DuckDB
(41:09) Iceberg catalogs
(48:24) postgres to iceberg ingestion patterns (and pg_incremental)
(53:40) Marco's (long) career: early AWS, Citus, Microsoft, Crunchy Data & Snowflake
(1:04:20) Marco's observations around the merging between OLTP and OLAP (and the subtle dev differences there)
(1:15:30) reverse ETL
(1:33:08) Iceberg as the TCP/IP for tables
(1:35:00) Marco's thoughts on the "Just Use Postgres" fever

Stanislav Kozlovski

16,663 views • 2 months ago

Something amazing is coming to Apache Kafka… Consumer Groups v2!

If you’ve ever used consumer groups in production at any non-trivial scale, you probably know all the problems with them:

- ⛔️ Group-wide synchronization barrier acts as a cap on scalability
A single misbehaving consumer can disturb the whole group. Even if you have cooperative rebalancing and static membership enabled, you will still have rebalances happen. 🤷‍♂️
It’s a fact of life I've heard - death, taxes & consumer group rebalances.
And the problem is that even with cooperative rebalancing (which helped a lot!), you’re bound to wait for the slowest member of the group to complete the rebalance(s). 🐌
The problem is that no consumer can commit offsets while a rebalance is in progress. ❌
Another subtle thing is that with cooperative rebalancing, a rebalance will take longer than usual. Why? Because consumers are allowed to process partitions during rebalances. They will call the poll() method more infrequently -- they're busy processing records after all. Thus, the overall rebalance time will increase.
This makes it pretty hard to scale to 1000s of members.

- 🤯 Complexity
There’s a reason you’re reading this! The current protocol is pretty complex and hard to understand. It's used for a bunch of stuff, including metadata propagation in Kafka Streams.
This complexity results in more:

- 🐛 bugs
The harder a protocol is to reason about and the more moving parts it has, the greater the chance for bugs. There have been quite a few in the protocol. And due to the fact that a lot of the protocol’s logic lives on the clients, that results in:

- 🐌 slow fixes
Bugs require client-side fixes, which are slow to be adopted. If you run a Kafka ops team, you know how hard it is to get all of your clients' teams to upgrade! If you're using a cloud service, you need to wait for a new Kafka release to go out. Can't have the cloud provider handle it for you behind the scenes!

- 🔍 hard to debug
Debugging is harder because you need client logs. In the cloud, that's hard to do again. On-prem, it requires reading through a lot of logs and collecting a lot of files.

- ⚙️ very extendable
There’s a reusable embedded protocol within the rebalance protocol, where clients are free to attach raw bytes that only they can then parse themselves. It's challenging to build compatible software for this cross-client protocol, as well as near-impossible to inspect from the broker side.

- 😢 inconsistent metadata
Clients are responsible for triggering rebalances based on the metadata, but different clients can have different views of the metadata.

- 😵‍💫 interoperability
Different implementations of the clients (i.e. anything besides the Java client) may have bugs. The complex logic needs to be re-implemented quite a few times. This usually means more bugs and slower time to ship features in your favorite client. A combination nobody likes.

...

So what should an open source community do? Move the logic to the broker! Then? Simplify it.

The new protocol is very elegant - it streamlines all of the regular Kafka consumer logic inside a new heartbeat API. It has the broker decide what partition assignments the consumers should have, and totally omits the notion of a Group Leader client.

Another major change is that the notion of a group-wide rebalance is removed. The rebalance is more fine-grained now. 👌
When you think about it, a rebalance is simply a reassignment of some partitions from some consumers to others. 💡
Why does the whole group need to stop and know about this? It had to before because the logic was on the client. It doesn’t now. 🎂

The new protocol is fine-grained in its assignments. It maintains per-member epochs, as well as separate epochs for the general group membership and the global assignment. The goal is simple - get all of those epoch numbers to be the same.

The order is the following:
1. the group-wide epoch is bumped.
2. the target assignment epoch is bumped.
3. individual consumers catch up to the epoch via the heartbeat request, individually. (fine-grained)

In general, what you have is a simple state machine inside the Group Coordinator broker that’s running a constant reconciliation loop. 💥

Because every member converges to the target state independently, the coordinator is free to simplify that convergence member by member. 👍
It has the logic to resolve dependencies between members - the act of:
1. revoking one member’s partition.
2. confirming the success of that.
3. bumping that member's epoch before assigning that partition to another consumer.

[Visual: what happens when two members join a consumer group one by one]
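
To make the epoch ordering and the revoke-before-assign rule from the post concrete, here's a rough Python sketch of a toy coordinator. It's not Kafka's actual code or API: `Coordinator`, `Member`, `join`, `heartbeat` and `converged` are hypothetical names, the round-robin assignor is a stand-in, and revocation plus its confirmation are collapsed into a single heartbeat for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Coordinator-side view of one consumer (hypothetical structure)."""
    epoch: int = 0
    owned: set = field(default_factory=set)


class Coordinator:
    """Toy group coordinator: bumps epochs, then reconciles member by member."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.group_epoch = 0
        self.target_epoch = 0
        self.target = {}   # member id -> partitions it should end up owning
        self.members = {}  # member id -> Member

    def join(self, member_id):
        self.members[member_id] = Member()
        self.group_epoch += 1    # step 1: the group-wide epoch is bumped
        self._compute_target()   # step 2: the target assignment epoch is bumped

    def _compute_target(self):
        # Assumption: plain round-robin over sorted member ids.
        ids = sorted(self.members)
        self.target = {m: set() for m in ids}
        for i, p in enumerate(self.partitions):
            self.target[ids[i % len(ids)]].add(p)
        self.target_epoch = self.group_epoch

    def heartbeat(self, member_id):
        """Step 3: one member catches up on its own; nobody else is paused."""
        me = self.members[member_id]
        want = self.target[member_id]
        # Revoke first: drop partitions this member should no longer own.
        me.owned &= want
        # Hand out only partitions that no other member still holds.
        held_elsewhere = set()
        for other_id, other in self.members.items():
            if other_id != member_id:
                held_elsewhere |= other.owned
        me.owned |= want - held_elsewhere
        # The member's epoch advances only once it fully matches its target.
        if me.owned == want:
            me.epoch = self.target_epoch
        return set(me.owned)

    def converged(self):
        return all(m.epoch == self.target_epoch for m in self.members.values())


# Two members joining one by one.
coordinator = Coordinator(partitions=[0, 1, 2, 3])
coordinator.join("A")
coordinator.heartbeat("A")   # A takes all four partitions
coordinator.join("B")
coordinator.heartbeat("B")   # B gets nothing yet: A still holds everything
coordinator.heartbeat("A")   # A revokes 1 and 3 and reaches the new epoch
coordinator.heartbeat("B")   # now B can pick up 1 and 3
```

Note that B's first heartbeat doesn't block A or force a group-wide stop. B just stays on its old epoch until the partitions it wants are free.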

Stanislav Kozlovski

52,814 views • 3 years ago
