Implementing Bloom Filters: Probabilistic Data Structures
Bloom Filters provide a unique approach to set membership verification. Let's implement Bloom Filters in Golang and understand how they work.
In my last post(link above), we talked about what Bloom Filters are, how they work and what are the factors influencing the false positivity rate of Bloom Filters. In this post, we’ll dive into the implementation details and then look into how we can alter different input parameters of Bloom Filter to offer lower false positives!
Implementation
Let’s define our Bloom Filter. It comprises the attributes, size(m) and hashCount(k), along with the bit array to store the values.
type BloomFilter struct {
bitArray []bool
size int
hashCount int
}
Each Bloom filter has two functions, one to add an element to the Bloom filter and the other to validate if an element exists in the Bloom filter.
func (bf *BloomFilter) Add(str string) {
for i := 0; i < bf.hashCount; i++ {
hash := hash(str, i) % bf.size
bf.bitArray[hash] = true
}
}
func (bf *BloomFilter) Contains(str string) bool {
for i := 0; i < bf.hashCount; i++ {
hash := hash(str, i) % bf.size
if !bf.bitArray[hash] {
return false
}
}
return true
}
Let’s define a basic hash function that we’ll be using to hash the elements and map them to the bits. You can choose to have different hash functions or use something other than murmur3, it is completely your call.
func hash(str string, seed int) int {
mmr := murmur3.New32WithSeed(uint32(seed))
_, err := mmr.Write([]byte(str))
if err != nil {
return 0
}
return int(mmr.Sum32())
}
And that’s it, you’ve implemented your Bloom Filters. But how do we validate the false positiveness %age of your Bloom Filters implementation and prove our theory that the false positivity rate of a Bloom filter is impacted by the factors we discussed in our previous post.