bloomfilter-blocked
bloomfilter-blocked is a Haskell library providing multiple fast and
efficient implementations of
bloom filters. It is a full
rewrite of the
bloomfilter package,
originally authored by Bryan O'Sullivan
bos@serpentine.com.
A bloom filter is a space-efficient data structure representing a set
that can be probablistically queried for set membership. The set
membership query returns no false negatives, but it might return false
positives. That is, if an element was added to a bloom filter, then a
subsequent query definitely returns True. If an element was not
added to a filter, then a subsequent query may still return True if
False would be the correct answer. The probabiliy of false positives
-- the false positive rate (FPR) -- is configurable.
The library includes two implementations of bloom filters: classic, and blocked.
- Classic bloom filters, found in the
Data.BloomFilter.Classicmodule: a default implementation that is faithful to the canonical description of a bloom filter data structure. - Blocked floom filters, found in the
Data.BloomFilter.Blockedmodule: an implementation that optimises the memory layout of a classic bloom filter for speed (cheaper CPU cache reads), at the cost of a slightly higher FPR for the same amount of assigned memory.
The FPR scales inversely with how much memory is assigned to the filter.
It also scales inversely with how many elements are added to the set.
The user can configure how much memory is asisgned to a filter, and the
user also controls how many elements are added to a set. Each
implementation comes with helper functions, like sizeForFPR and
sizeForBits, that the user can leverage to configure filters.
Both immutable (Bloom) and mutable (MBloom) bloom filters, including
functions to convert between the two, are provided for each
implementation. Note however that a (mutable) bloom filter can not be
resized once created, and that elements can not be deleted once
inserted.
For more information about the library and examples of how to use it, see the Haddock documentation of the different modules.
Usage notes
User should take into account the following:
- This package is not supported on 32bit systems.
Differences from the bloomfilter package
The library is a full rewrite of the bloomfilter package, originally authored by Bryan O'Sullivan bos@serpentine.com. The main differences are:
bloomfilter-blockedsupports both classic and blocked bloom filters, whereasbloomfilteronly supports the former.bloomfilter-blockedsupports bloom filters of arbitrary sizes, whereasbloomfilterlimits the sizes to powers of two.bloomfilter-blockedsupports sizes up to2^48for classic bloom filters and up to2^41for blocked bloom filters, instead of2^32.- In
bloomfilter-blocked, theBloomandMBloomtypes are parameterised over aHashabletype class, instead of having aa -> [Hash]typed field. This separation inbloomfilter-blockedallows clean (de-)serialisation of filters as the hashing scheme is static. bloomfilter-blockeduses XXH3 for hashing instead of Jenkins' lookup3, whichbloomfilteruses.- The user can configure hash salts for improved security in
bloomfilter-blocked, whereas this is not supported inbloomfilter.
Packages
bloomfilter-blocked-0.1.0.0
- Data
bloomfilter-blocked-0.1.0.0
- Data