Have we ever wondered how websites validate millions of records within seconds? One of the solutions is using the Bloom filter.
What is it ?
A bloom filter is a probabilistic data structure meant for checking whether a given entry does not occur in a list. It is meant to be quite fast, and is used as a way of not doing costly queries when it can be determined that no results will be returned.
Under the hood, the Bloom filter is just an array of bits with all bits set to zero initially. The below diagram depicts the Bloom filter of size 19:
On this page.
What are the Bloom Filter Operations ?
Insert
We can perform Insert operations in constant space and time complexity. Bloom filter performs the below steps for Insert operation:
Hash the input value
Mod the result by the length of the array
Set corresponding bit to 1
Let’s illustrate this by inserting two strings – John Doe and Jane Doe into the Bloom filter. Let’s assume that hash values for these two strings are 1355 and 8337, respectively. Now, let’s perform the modulo operation to get an index within the bounds of the bit array: 1355%19 = 6 and 8337%19 = 15.
The Bloom filter will look like this after inserting values:
Lookup
We can perform a Lookup operation in constant time complexity. Bloom filter performs the below steps as a part of the Lookup operation:
Hash the input value
Mod the result by the length of the array
Check if the corresponding bit is 0 or 1
If the bit is 0, then that input definitely isn’t a member of the set. But if the bit is 1, then that input might be a member of a set. So let’s check if string John is present or not in the Bloom filter:
Note : the bit at the 10th index is 0, which indicates that the given string isn’t present in the Bloom filter.