For In Depth Operation Working & Practical Applications: Bloom Filter Systems
Used for estimating whether a string has been queried before
Is not 100% accurate (probabilistic)
Used mostly in search indexing
Storing
- String is passed through "K" Hash functions and those values are set 0 -> 1 i.e set (val % m) = 1 in [0,m] ranged arr.
Searching
- Get the "K" hash values and see if all has been set (1)
If all set -> most likely that the search has been made before
Takeaway: With more hash functions, two different Strings have chances of having same values for all hash functions
Resetting
- If a string is to be removed -> unset (0) all its hash values
- if majority is set -> accuracy drops: even unsearched may appear to be searched -> reset
Analysis
Since there's m places of bits, then for 1 hash function:
Probability of setting a bit = 1/m
Probability of not setting a bit = 1 - (1/m)
Then, for 'K' hash functions
Probability of not setting a bit, for a char
> [1 = (1/m)] ^ k
Probability of not setting a bit, for a string length 'i'
> [1 = (1/m)] ^ (k*i)
Multi Dimensional Bloom Filters
Another Version for improved accuracy would be nD Vector bloom filters
It can be of two types:
- Multi Level
- Where hash functions go through filters one by one, if all set in one, go a level deep and so on..
- Parallel
- Each hash function maps to one level (faster)