Implementing Bloom Filters: Probabilistic Data Structures

Bloom Filters provide a unique approach to set membership verification. Let's implement Bloom Filters in Golang and understand how they work.

Pratik Pandey

Dec 17, 2023

∙ Paid

Bloom Filters: Probabilistic Data Structures

Pratik Pandey

December 5, 2023

Bloom Filters: Probabilistic Data Structures

A Bloom Filter is a data structure designed to quickly and efficiently check whether an element is a member of a set. However, as the title suggests, it’s a probabilistic data structure, meaning it can tell you with certainty if an element is not in a set, but only with a certain

Read full story

In my last post(link above), we talked about what Bloom Filters are, how they work and what are the factors influencing the false positivity rate of Bloom Filters. In this post, we’ll dive into the implementation details and then look into how we can alter different input parameters of Bloom Filter to offer lower false positives!

Implementation

Let’s define our Bloom Filter. It comprises the attributes, size(m) and hashCount(k), along with the bit array to store the values.

type BloomFilter struct {
	bitArray  []bool
	size      int
	hashCount int
}

Each Bloom filter has two functions, one to add an element to the Bloom filter and the other to validate if an element exists in the Bloom filter.


func (bf *BloomFilter) Add(str string) {
	for i := 0; i < bf.hashCount; i++ {
		hash := hash(str, i) % bf.size
		bf.bitArray[hash] = true
	}
}

func (bf *BloomFilter) Contains(str string) bool {
	for i := 0; i < bf.hashCount; i++ {
		hash := hash(str, i) % bf.size
		if !bf.bitArray[hash] {
			return false
		}
	}
	return true
}

Let’s define a basic hash function that we’ll be using to hash the elements and map them to the bits. You can choose to have different hash functions or use something other than murmur3, it is completely your call.


func hash(str string, seed int) int {
	mmr := murmur3.New32WithSeed(uint32(seed))
	_, err := mmr.Write([]byte(str))
	if err != nil {
		return 0
	}
	return int(mmr.Sum32())
}

And that’s it, you’ve implemented your Bloom Filters. But how do we validate the false positiveness %age of your Bloom Filters implementation and prove our theory that the false positivity rate of a Bloom filter is impacted by the factors we discussed in our previous post.

Implementing Bloom Filters: Probabilistic Data Structures

Bloom Filters provide a unique approach to set membership verification. Let's implement Bloom Filters in Golang and understand how they work.

Bloom Filters: Probabilistic Data Structures

Implementation

This post is for paid subscribers