Logo

For In Depth Operation Working & Practical Applications: Bloom Filter Systems

Pasted image 20240725210854.png

Used for estimating whether a string has been queried before
Is not 100% accurate (probabilistic)
Used mostly in search indexing

Storing

  • String is passed through "K" Hash functions and those values are set 0 -> 1 i.e set (val % m) = 1 in [0,m] ranged arr.

Searching

  • Get the "K" hash values and see if all has been set (1)

If all set -> most likely that the search has been made before

Takeaway: With more hash functions, two different Strings have chances of having same values for all hash functions

Resetting

  • If a string is to be removed -> unset (0) all its hash values
  • if majority is set -> accuracy drops: even unsearched may appear to be searched -> reset

Analysis

Since there's m places of bits, then for 1 hash function:

Probability of setting a bit =  1/m   
Probability of not setting a bit = 1 - (1/m)

Then, for 'K' hash functions

Probability of not setting a bit, for a char 
>	[1 = (1/m)] ^ k

Probability of not setting a bit, for a string length 'i'
>	[1 = (1/m)] ^ (k*i)

Multi Dimensional Bloom Filters

Pasted image 20240725212954.png

Another Version for improved accuracy would be nD Vector bloom filters

It can be of two types:

  1. Multi Level
    • Where hash functions go through filters one by one, if all set in one, go a level deep and so on..
  2. Parallel
    • Each hash function maps to one level (faster)

© 2025 All rights reservedBuilt with DataHub Cloud

Built with LogoDataHub Cloud