For In Depth Operation Working & Practical Applications: Bloom Filter Systems

Pasted image 20240725210854.png

Used for estimating whether a string has been queried before
Is not 100% accurate (probabilistic)
Used mostly in search indexing

Storing

String is passed through "K" Hash functions and those values are set 0 -> 1 i.e set (val % m) = 1 in [0,m] ranged arr.

Searching

Get the "K" hash values and see if all has been set (1)

If all set -> most likely that the search has been made before

Takeaway: With more hash functions, two different Strings have chances of having same values for all hash functions

Resetting

If a string is to be removed -> unset (0) all its hash values
if majority is set -> accuracy drops: even unsearched may appear to be searched -> reset

Analysis

Since there's m places of bits, then for 1 hash function:

Probability of setting a bit =  1/m   
Probability of not setting a bit = 1 - (1/m)

Then, for 'K' hash functions

Probability of not setting a bit, for a char 
>	[1 = (1/m)] ^ k

Probability of not setting a bit, for a string length 'i'
>	[1 = (1/m)] ^ (k*i)

Multi Dimensional Bloom Filters

Pasted image 20240725212954.png

Another Version for improved accuracy would be nD Vector bloom filters

It can be of two types:

Multi Level

Where hash functions go through filters one by one, if all set in one, go a level deep and so on..

Parallel

Each hash function maps to one level (faster)