| Safe Haskell | Safe-Inferred |
|---|---|
| Language | GHC2021 |
Data.BloomFilter
Description
By default, this module re-exports the classic bloom filter implementation from Data.BloomFilter.Classic. If you want to use the blocked bloom filter implementation, import Data.BloomFilter.Blocked.
Synopsis
- module Data.BloomFilter.Classic
Documentation
module Data.BloomFilter.Classic
Example: a spelling checker
This example reads a dictionary file containing one word per line,
constructs a Bloom filter with a 1% false positive rate, and
spellchecks its standard input. Like the Unix spell command, it
prints each word that it does not recognize.
>>>import Control.Monad (forM_)>>>import System.Environment (getArgs)>>>import qualified Data.BloomFilter as B
>>>:{main :: IO () main = do files <- getArgs dictionary <- readFile "/usr/share/dict/words" let !bloom = B.fromList (B.policyForFPR 0.01) 4 (words dictionary) forM_ files $ \file -> putStrLn . unlines . filter (`B.notElem` bloom) . words =<< readFile file :}
Differences with the bloomfilter package
This package is an entirely rewritten fork of the bloomfilter package.
The main differences are
- Support for both classic and "blocked" Bloom filters. Blocked-structured
Bloom filters arrange all the bits for each insert or lookup into a single
cache line, which greatly reduces the number of slow uncached memory reads.
The trade-off for this performance optimisation is a slightly worse
trade-off between bits per element and the FPR. In practice for typical
FPRs of
1-e3up to1e-4, this requires a couple extra bits per element. - This package support Bloom filters of arbitrary sizes (not limited to powers of two).
- Sizes over
2^32are supported up to2^48for classic Bloom filters and2^41for blocked Bloom filters. - The
BloomandMBloomtypes are parametrised over aHashabletype class, instead of having aa -> [typed field. This separation allows clean (de-)serialisation of Bloom filters in this package, as the hashing scheme is static.Hash] XXH3hash is used instead of Jenkins'lookup3.