レナート   PID EINS!   ﻟﻴﻨﺎﺭﺕ

Tue, 20 Apr 2010

A Few Notes on Bloom Filters

For future reference (mostly for myself), here's a little summary of how to use Bloom filters in real world applications.

Most references are terse and vague on how to pick the hash functions for bloom filters, so here's some detail about that: For small filters, just use a boring and fast hash function like the djb hash function and split up the 32bit result into smaller independent chunks for each of the k hash indexes you'll need. Often those 32 bits already provide enough hash bits to get enough independent bloom filter indexes. And if they don't you basically have three options:

The size of the bloom filter and the number of hash functions you should be using depending on your application can be calculated using the formulas on the Wikipedia page:

m = -n*ln(p)/(ln(2)^2)

This will tell you the number of bits m to use for your filter, given the number n of elements in your filter and the false positive probability p you want to achieve. All that for the ideal number of hash functions k which you can calculate like this:

k = 0.7*m/n

And that's already everything you need to know to build good bloom filters. If you know the p and n for your use case the above will tell you the m and k, and how to choose the k hash functions.

Bloom filters are a really really useful tool, and given their simplicity something every developer should be aware of.

(And in case you were wondering what this all is about, Kay Sievers and I were discussing using bloom filters in the libudev netlink BSD socket filters, to allow monitoring a certain subset of devices that is orthogonal to the usual subsystem hierarchy, and all that in a way where the number of wakeups in listening clients is minimized)

posted at: 22:01 | path: /projects | permanent link to this entry | comments


It should be obvious but in case it isn't: the opinions reflected here are my own. They are not the views of my employer, or Ronald McDonald, or anyone else.

Please note that I take the liberty to delete any comments posted here that I deem inappropriate, off-topic, or insulting. And I excercise this liberty quite agressively. So yes, if you comment here, I might censor you. If you don't want to be censored you are welcome to comment on your own blog instead.


Lennart Poettering <mzoybt (at) 0pointer (dot) net>
Syndicated on Planet GNOME, Planet Fedora, planet.freedesktop.org, Planet Debian Upstream. feed RSS 0.91, RSS 2.0
Archives: 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013

Valid XHTML 1.0 Strict!   Valid CSS!