High Watermark: Architecture Patterns
The high watermark helps you prevent inconsistent reads by clients, even when the leader goes down! It's used internally in Kafka.
Prerequisites: Familiarity with the Write-Ahead Log (WAL) pattern.
Problem:
Before we go into what high watermarks are and where they help, let’s look at the problem statement and why they are needed.
We talked about how a WAL can help us recover our state from server crashes and restarts. But if you have a single server & it goes down, then a WAL can’t help you with availability. This is where we resort to a distributed setup: we keep a cluster of servers, and under the leader-follower pattern, the leader replicates the log to a quorum of its followers.
Stateful distributed systems generally keep more than one copy of the data for fault tolerance & high availability. But maintaining multiple copies of the data across multiple nodes makes strong consistency hard, because -
The leader can fail before the entry has been synced to all followers.
The leader can fail after the entry has synced to some but not all followers.
Let me explain the problem in detail -
Let’s assume we have a 5-node cluster, with a leader-follower setup. Each node will act as a leader of a particular partition, and other nodes will hold a replica of the data. In such a system, all the writes are handled by the leader of that partition and the followers will keep a replica, for high resiliency and availability in case the leader goes down.
Our requirement is to have strong consistency with the mentioned setup (highly available & resilient), i.e. our application shouldn’t see data inconsistencies (data disappearing, non-repeatable reads, etc.).
Here’s the flow of events that would happen in general -
Each write to the leader is written to the WAL, to ensure resiliency from crashes or failures.
The followers can then get this write asynchronously, either through PUSH (the leader pushes the write to all followers) or PULL (followers poll the leader continuously for any mutations to the WAL).
The read & write for any partition will always go to the leader of that partition.
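To make this flow concrete, here is a minimal Python sketch of the pull-based variant. All the names here (LogEntry, Leader, Follower, write, fetch, poll) are illustrative for this article only, not Kafka’s actual classes or APIs:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LogEntry:
    offset: int
    payload: str

@dataclass
class Leader:
    wal: List[LogEntry] = field(default_factory=list)

    def write(self, payload: str) -> int:
        """Every write is first appended to the leader's WAL; returns the new entry's offset."""
        entry = LogEntry(offset=len(self.wal), payload=payload)
        self.wal.append(entry)
        return entry.offset

    def fetch(self, after_offset: int) -> List[LogEntry]:
        """Pull model: return everything the leader has beyond the offset the follower already holds."""
        return self.wal[after_offset + 1:]

@dataclass
class Follower:
    wal: List[LogEntry] = field(default_factory=list)

    def poll(self, leader: Leader) -> None:
        """One round of polling: ask the leader for anything past our last offset."""
        last_offset = len(self.wal) - 1   # -1 means our log is still empty
        for entry in leader.fetch(last_offset):
            self.wal.append(entry)
```

A follower that hasn’t polled recently simply lags behind the leader’s WAL, which is exactly the gap the failure scenario below exploits.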
Failure Scenario
Let’s assume the leader received a write operation and wrote the transaction to its WAL. Let’s also say that a consumer read that write immediately after it was written, and that before the write could be propagated to all the followers, the leader crashed.
After the leader crash, the cluster undergoes leader election, & one of the followers becomes the new leader for that partition. However, the latest changes from the previous leader were never replicated to the new leader, i.e. the new leader is behind the old leader.
Now let’s assume another consumer tries to read the latest record. Since the new leader doesn’t have the latest write, this consumer never sees that record. This leads to data inconsistency/data loss, which is exactly what we didn’t want!
Note: We do have these transactions in the WAL on the old leader, but those log entries cannot be recovered until the old leader comes back up.
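Using the toy Leader/Follower classes from the sketch above, the scenario looks roughly like this (purely illustrative):

```python
leader = Leader()
followers = [Follower() for _ in range(4)]

leader.write("entry-0")
leader.write("entry-1")
for f in followers:
    f.poll(leader)                    # every replica now holds offsets 0 and 1

latest = leader.write("entry-2")      # consumer A reads offset 2 from the leader...
# ...and the leader crashes before any follower polls again.

new_leader = followers[0]             # leader election promotes a follower
print(latest)                         # 2: the offset consumer A saw
print(len(new_leader.wal) - 1)        # 1: the new leader never got it, so consumer B can't read it
```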
Solution:
To overcome the problem, we use the concept of a High Watermark.
The leader keeps track of the indexes of the entries that have been successfully replicated on each follower. The high watermark index is the highest index that has been replicated on a quorum of the followers.
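As a rough sketch of that bookkeeping (assuming 0-based offsets and a hypothetical map of the highest offset each replica has confirmed, with the leader counted as one of the replicas):

```python
from typing import Dict

def high_watermark(confirmed_offsets: Dict[str, int], quorum_size: int) -> int:
    """Highest offset that at least `quorum_size` replicas (leader included) have confirmed."""
    offsets = sorted(confirmed_offsets.values(), reverse=True)
    # The quorum_size-th highest confirmed offset is, by definition, present on a quorum.
    return offsets[quorum_size - 1]

# 5-node cluster, quorum of 3: the leader S1 has written up to offset 3,
# but only two followers have caught up to offset 2, so the high watermark is 2.
print(high_watermark({"S1": 3, "S2": 2, "S3": 2, "S4": 1, "S5": 1}, quorum_size=3))  # -> 2
```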
The leader can push the high watermark index to all followers as part of a heartbeat message (in a push-based model), or it can include it in its responses to the followers’ pull requests. We’ll go ahead assuming the pull-based approach for our example.
Red marks the current high watermark. Each follower pulls data from the leader by sending its current offset, so that the leader can respond with the latest data it has beyond that offset. Along with the latest data, the leader also informs the followers of its high watermark index.
Above is a snapshot once the followers are in sync with the leader. You might ask: how does the leader know that all followers (or a quorum of followers) are in sync and that the high watermark can be moved?
The leader gets pull requests from the followers, each carrying the latest offset that follower is in sync with. From these, the leader can easily decide when to advance the high watermark. Once the high watermark is updated on the leader, the leader propagates the updated value to the followers on their next fetch.
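Putting the pull flow together, here’s a hedged sketch of the leader side, reusing the illustrative LogEntry class and high_watermark helper from the earlier sketches (none of these names are real Kafka internals):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class PartitionLeader:
    wal: List[LogEntry] = field(default_factory=list)
    follower_offsets: Dict[str, int] = field(default_factory=dict)  # follower id -> last confirmed offset
    quorum_size: int = 3                                            # majority of a 5-node cluster
    hw: int = -1                                                    # -1: nothing committed yet

    def handle_fetch(self, follower_id: str, last_offset: int) -> Tuple[List[LogEntry], int]:
        # 1. Remember how far this follower has caught up.
        self.follower_offsets[follower_id] = last_offset
        # 2. Recompute the high watermark; the leader itself holds every entry in its WAL.
        confirmed = {"leader": len(self.wal) - 1, **self.follower_offsets}
        if len(confirmed) >= self.quorum_size:
            self.hw = max(self.hw, high_watermark(confirmed, self.quorum_size))
        # 3. Reply with the entries the follower is missing, plus the current high watermark.
        return self.wal[last_offset + 1:], self.hw
```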
Now, from the above image: even though offset 3 was written to the WAL on the leader, the high watermark is still at offset 2, so no consumer will ever get to read the data at offset 3. That is exactly what gives us the consistency we want.
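The read path is then simply bounded by that value. A sketch of a consumer fetch that never crosses the high watermark (continuing the illustrative PartitionLeader above):

```python
def consumer_read(leader: PartitionLeader, from_offset: int) -> List[LogEntry]:
    # Entries beyond the high watermark may exist in the leader's WAL,
    # but they are never exposed to consumers until a quorum has confirmed them.
    return leader.wal[from_offset : leader.hw + 1]
```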
This guarantees that even if the leader fails and another leader is elected, clients will not see any data inconsistencies, because no client could have read anything beyond the high watermark. This is how we prevent inconsistent reads while ensuring high availability and resiliency.
So we successfully handled leader failures, and we won’t see data inconsistencies. Now let’s take the state as of the second snapshot, where the leader has data up to offset 3, the followers only have data up to offset 2, and the high watermark is also at offset 2.
Let’s assume the leader crashed at this point. One of the other servers is elected as the new leader for that partition. Now, let’s assume a new write arrives and is written to this new leader. Below is the current state (notice S1, the old leader, is missing) -
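Written out as each server’s WAL contents, the state looks roughly like this (S2 is assumed to be the newly elected leader; the entry labels and layout are made up for illustration):

```python
# Illustrative state right after the failover and one new write (0-based offsets):
wals = {
    "S2 (new leader)": ["e0", "e1", "e2", "e3-new"],  # accepted a brand-new write after taking over
    "S3":              ["e0", "e1", "e2"],
    "S4":              ["e0", "e1", "e2"],
    "S5":              ["e0", "e1", "e2"],
    # "S1 (old leader)": ["e0", "e1", "e2", "e3-old"],  # down; still holds its own unreplicated entry
}
```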
Now when S1 comes back up, its WAL will still contain entry 3; however, the rest of the cluster does not have that entry. This again leads to an inconsistent state… How do we handle this? We’ll discuss it in detail in the next article! :)
Real-Life Use Cases of High Watermarks:
Kafka keeps a high watermark per partition so that consumers never read records that haven’t yet been replicated to the in-sync replicas.
Apache BookKeeper uses it to keep track of records that have been replicated successfully across a quorum of servers.
I’ve used the high-watermark concept myself to keep my clients from reading inconsistent data. So hopefully you’ll learn from this article and apply it!