Ensuring both high durability and strong performance in distributed systems is a formidable challenge. Have you ever wondered how these systems stay resilient in the face of server crashes, power failures, or hardware malfunctions? A large part of the answer is the Write-Ahead Log (WAL), a prevailing pattern that guarantees durability and underpins much of the functionality of distributed systems.
WAL — Write Ahead Log
Although not exclusively a distributed design pattern, WAL has become a ubiquitous technique employed in a wide range of distributed systems.
In traditional Relational Database Management Systems (RDBMS), the WAL's purpose is to guarantee atomicity and durability (the A and D in ACID). Essentially, every mutation made to a table is first written to the WAL, also called the Transaction Log, before being asynchronously applied to the actual data files of the tables.
Sample WAL and WALEntry structures in Go:

type WAL struct {
	dir       string     // directory under which WAL files are stored
	file      *os.File   // reference to the open WAL file
	metadata  []byte     // metadata recorded at the head of each WAL
	decoder   *decoder   // decoder to decode records
	encoder   *encoder   // encoder to encode records
	mutex     sync.Mutex // ensures a single writer updates the WAL at a time
	lastIndex uint64     // index of the last entry saved to the WAL
}

type WALEntry struct {
	lsn       uint64 // log sequence number: unique identifier for each log entry
	data      []byte // actual WAL entry in bytes
	crc       uint32 // crc for data integrity validation
	entryType uint32 // type of WAL record ("type" is a reserved word in Go)
}
Why WAL?
Why do we need WAL? Why not flush changes directly to the actual data files? There are two key aspects to consider:
Caches and Durability: A write-to-disk operation does not send data straight to the disk sectors. The data first passes through a series of buffers (RAM, the buffer cache, the disk cache) before it is ultimately written to the platters. These intermediate caches reduce the number of physical disk writes, which improves performance, but they put durability at risk: anything still sitting in a cache is lost on a restart or crash. Forcing a flush (fsync) for every single write would guarantee durability, but it would severely hurt system performance and throughput. A minimal sketch of this durability barrier follows this list.
Random Writes and Performance: Disk writes, particularly random writes, are slower than sequential writes. When a data store holds multiple tables or entities, the likelihood of random writes increases. Additionally, writing a record to disk may require updating on-disk structures, such as keeping records ordered by a clustered index or maintaining on-disk indexes and auxiliary structures. These operations directly reduce system throughput and performance.
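As a minimal sketch of that durability barrier in Go (the file name and payload are illustrative), os.File.Sync() issues an fsync that forces buffered data out of the OS caches and onto the device:

package main

import (
	"log"
	"os"
)

func main() {
	// Open the WAL file in append-only mode ("wal.log" is an illustrative name).
	f, err := os.OpenFile("wal.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// This write lands in OS buffers (the page cache) first, not on the disk.
	if _, err := f.Write([]byte("SET key=value\n")); err != nil {
		log.Fatal(err)
	}

	// Sync() issues fsync: the durability barrier that pushes buffered data
	// down to the device. Doing this on every write is safest but slowest.
	if err := f.Sync(); err != nil {
		log.Fatal(err)
	}
}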
Wow WAL!
WAL addresses these challenges by acting as an append-only log that records each state change to the data store as a command. Rather than flushing mutations for different tables or entities to their data files individually, we append the operations (commands) to the WAL with a single sequential disk write.
An asynchronous process can subsequently read the operations from the WAL and apply the data mutations to the data files on the disk, following the standard flow through various caches. This approach significantly enhances the write throughput of data stores.
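To make this concrete, here is a hedged sketch of what an Append method on the WAL struct above might look like. The on-disk layout (lsn, crc, payload length, payload) and the use of the encoding/binary and hash/crc32 packages are assumptions for illustration, not a prescribed format:

// Assumed imports: encoding/binary, hash/crc32.
// Append serializes one entry and writes it to the tail of the log with a
// single sequential write. Layout (an assumption for this sketch):
// lsn (8 bytes) | crc (4) | payload length (4) | payload.
func (w *WAL) Append(data []byte) (uint64, error) {
	w.mutex.Lock()
	defer w.mutex.Unlock()

	w.lastIndex++
	entry := WALEntry{
		lsn:  w.lastIndex,
		data: data,
		crc:  crc32.ChecksumIEEE(data),
	}

	buf := make([]byte, 16+len(entry.data))
	binary.LittleEndian.PutUint64(buf[0:8], entry.lsn)
	binary.LittleEndian.PutUint32(buf[8:12], entry.crc)
	binary.LittleEndian.PutUint32(buf[12:16], uint32(len(entry.data)))
	copy(buf[16:], entry.data)

	if _, err := w.file.Write(buf); err != nil {
		return 0, err
	}
	// Whether to fsync here on every append, or to batch it, is the
	// durability/performance tradeoff discussed below.
	return entry.lsn, w.file.Sync()
}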
In case of failures, some changes might not have been applied to the data files yet. But since the operations are preserved in the WAL, we can replay them to bring the data store back to a consistent state. This property is what gives WAL-backed data stores their data integrity, reliability, and high write throughput.
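A matching recovery sketch, replaying a log written by the Append sketch above. The layout and error handling are assumptions; production systems typically also truncate a torn final record instead of failing outright:

// Assumed imports: encoding/binary, fmt, hash/crc32, io, os.
// Replay reads the log from the beginning and hands each entry to apply.
func Replay(path string, apply func(WALEntry) error) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	header := make([]byte, 16)
	for {
		if _, err := io.ReadFull(f, header); err != nil {
			if err == io.EOF {
				return nil // clean end of log
			}
			return err // an incomplete header usually marks a torn final write
		}
		entry := WALEntry{
			lsn: binary.LittleEndian.Uint64(header[0:8]),
			crc: binary.LittleEndian.Uint32(header[8:12]),
		}
		n := binary.LittleEndian.Uint32(header[12:16])
		entry.data = make([]byte, n)
		if _, err := io.ReadFull(f, entry.data); err != nil {
			return err
		}
		if crc32.ChecksumIEEE(entry.data) != entry.crc {
			return fmt.Errorf("corrupt entry at lsn %d", entry.lsn)
		}
		if err := apply(entry); err != nil {
			return err
		}
	}
}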
Real-Life Implementation Considerations
Flushing WAL Operations to Disk -
While it is possible to force the operating system to flush changes all the way to the disk sectors, flushing on every write hurts performance in write-intensive systems. Striking the right balance between durability and performance is a tradeoff: you can adjust the flush (fsync) frequency, micro-batch writes into a group commit, or combine strategies. An illustrative micro-batching sketch follows.
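Here is one illustrative micro-batching (group commit) sketch. The batch size of 64 and the ticker interval are made-up tuning knobs; real systems derive them from workload measurements:

// Assumed imports: os, sync, time.
// batchingWAL trades a bounded window of data loss for far fewer fsyncs.
type batchingWAL struct {
	mu      sync.Mutex
	file    *os.File
	pending int // writes buffered since the last fsync
}

func (b *batchingWAL) Append(buf []byte) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, err := b.file.Write(buf); err != nil {
		return err
	}
	b.pending++
	if b.pending >= 64 { // flush once per batch, not once per write
		b.pending = 0
		return b.file.Sync()
	}
	return nil
}

// flushLoop bounds how long a write can sit unsynced, even under low load.
func (b *batchingWAL) flushLoop(every time.Duration) {
	for range time.Tick(every) {
		b.mu.Lock()
		if b.pending > 0 {
			b.file.Sync() // error handling elided in this sketch
			b.pending = 0
		}
		b.mu.Unlock()
	}
}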
Corruption Detection -
To ensure data integrity, each WAL record includes a CRC value that can be used to validate the record upon retrieval from the WAL. This mechanism helps detect and prevent data corruption.
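In Go, the standard hash/crc32 package is enough for this; a minimal verification helper over the WALEntry struct above might look like:

// Assumed import: hash/crc32.
// verify recomputes the checksum over the payload and compares it with the
// crc that was stored in the record when it was written.
func verify(e WALEntry) bool {
	return crc32.ChecksumIEEE(e.data) == e.crc
}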
Operation Duplication -
Since WAL is an append-only file, it is possible to encounter scenarios where duplicate operations are written to the WAL due to client retries caused by communication failures. Therefore, when reading the WAL to apply changes to the actual data store, it is essential to handle duplicates appropriately. This can involve either ignoring duplicates or ensuring that the application of operations to the data store is idempotent.
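One illustrative approach is to carry a client-supplied idempotency key in each command and skip operations whose key has already been applied. The command type and its field names below are hypothetical:

// command is a hypothetical decoded WAL operation. RequestID is an
// idempotency key chosen by the client and reused across retries, so a
// retried operation appears in the WAL twice with the same id.
type command struct {
	RequestID  string
	Key, Value string
}

type applier struct {
	seen  map[string]struct{} // request ids already applied
	store map[string]string   // stand-in for the real data files
}

func (a *applier) apply(cmd command) {
	if _, dup := a.seen[cmd.RequestID]; dup {
		return // duplicate from a client retry: safe to skip
	}
	a.store[cmd.Key] = cmd.Value // the actual mutation
	a.seen[cmd.RequestID] = struct{}{}
}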
Real-Life Usage
WAL is widely employed in databases, including NoSQL databases like Cassandra, to guarantee durability.
Systems like Kafka utilize a similar structure known as a Commit Log, which serves a comparable purpose to WAL.
Key-value stores like RocksDB and LevelDB, and distributed caches like Apache Ignite, also leverage WAL to ensure data integrity and durability.
Summary
In summary, Write-Ahead Logs (WAL) play a pivotal role in achieving both durability and performance in distributed systems. By routing every mutation through an append-only log instead of flushing data files directly, these systems gain higher write throughput, recoverability in the face of failures, and the ability to restore to specific points in time. Understanding WAL's implementation considerations and its real-life applications helps developers build robust distributed systems capable of meeting modern data-storage demands.
Reader Q&A

Q: What guidelines can be employed when encoding or decoding logs? Also, regarding CRC as an error-detection method: can one use a cryptographic hash (SHA-1, for example) to detect corruption instead?

A: I'm not sure I fully get the encoding question, but if it is about which format to store entries in, the answer is ideally something compact for storage and efficient for machine consumption. So a binary or Protobuf format is preferred over JSON.

As for error detection: yes, SHA-1 would work as well, but a CRC is generally faster to compute than a cryptographic hash. So, prefer CRC.
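To make the binary-versus-JSON point concrete, here is a small runnable Go comparison; the entry layout is illustrative:

package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

type entry struct {
	LSN  uint64 `json:"lsn"`
	Data []byte `json:"data"`
}

func main() {
	e := entry{LSN: 42, Data: []byte("SET key=value")}

	// JSON: human-readable, but verbose (field names, base64-encoded bytes).
	j, _ := json.Marshal(e)

	// Binary: fixed-width header plus raw payload; compact and cheap to parse.
	b := make([]byte, 8+len(e.Data))
	binary.LittleEndian.PutUint64(b[0:8], e.LSN)
	copy(b[8:], e.Data)

	fmt.Printf("json: %d bytes, binary: %d bytes\n", len(j), len(b))
}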