Version Vector(II): Resolving Concurrent Updates
Version Vectors are used to detect concurrent updates happening across multiple replicas in a distributed system.
Prerequisite:
In my last article, we saw how Distributed Data Stores use Version Vectors to identify concurrent updates to data records. We looked at one of the techniques of identifying concurrent updates/conflicts by leveraging ClientId as an Actor & the advantages and disadvantages of doing so. In this article, we’ll look at another approach for identifying concurrent updates/conflicts.
Server As An Actor
The problem with Server as an Actor is that of Actor Explosion, as the number of clients can grow to a very high number. To solve that, we can leverage servers as actors.
But, you can ask, we can have very large clusters as well, across multiple regions and that might face the same problem of Actor Explosion.
Yes, You’re right! Hence, we define servers as the number of nodes defined by the replication factor. If you remember, Each data record is tied to Version Vectors & hence for each data record, the maximum size of the version vectors will be the replication factor for the data in that cluster.
Let’s try to understand what’s happening in the above diagram -
Let’s assume we have a key K, with value U. We’re assuming that we have an empty version vector, to begin with. Client’s C2 and C3 sync the same state from the Replica(Assuming all clients are interacting with the same replica) that’s implementing Version Vectors.
C2 updates the value to W & sends a PUT command, with the local state of the Version Vector it has(empty VV).
C3 updates the value to V & sends a PUT command, with the local state of the Version Vector it has(empty VV).
Replica A receives the request from C3 first(C2’s request might be delayed because of network latency). Replica A compares the Version Vector it received with its local state & sees that they match. So it increments the counter to 1 & updates the value to V.
The request from C2 finally arrives at Replica A. Replica A compares the Version Vector it received with its local state & sees that the vector it received does not match its state. The following approaches can be taken —
Replica A can ignore the request from C2, as its local version vector is higher than the incoming version vector from C2.
Replica A can update the value to W and counter to 2. This way, we end up serializing requests from C2 & C3 and applying Last Write Wins strategy.
Replica A can store both the values V & W as siblings & increment the counter to 2. There is again a risk of Siblings' explosions here, and unlike the last two options, we’re leaving the conflict resolution between the two values to be done later!
Advantage -
Does not suffer from Actor Explosion, like the Client As An Actor approach.
Concerns -
Potential missing data if sequential write requests where the local version vector is higher, can be ignored.
The potential issue of Sibling Explosion if siblings are being stored. Also, as siblings grow in size, there would be performance issues during reconciliation.
If siblings are being stored, no way to track causality in the merged state. Eg — K: {(A, 2)}: W, V cannot tell me which update V/W happened before which.
This brings us to the end of this article. Server As An Actor is definitely a promising approach to avoid the explosion problem, but the server’s still a proxy to the clients which are actually performing the operations. Hence, it also suffers from issues based on different approaches, where without siblings, we can have data loss/updates and with siblings, we don’t have a way to track causality in the merged state.