Abstract
Compared with a hash table, a Bloom filter (BF) is more space efficient for supporting fast matching through a controllable and acceptable false positive probability. The space size of the basic BF is predetermined based on the expected number of elements to be stored. However, we cannot predict the space scale of a BF for dynamic sets. It is still challenging for the two existing solutions, scalable BF (SBF) and dynamic BF (DBF), to manipulate dynamic data sets with low memory overhead but achieving high performance. This article presents a partitioned BF (Par-BF) for dynamic data sets. Compared with DBF and SBF, Par-BF is able to leverage a sweet spot between high performance and low overhead by a group of formulas to support fast concurrent matching. Specifically, the size and the range of the false positive probability in Par-BF can be deliberately derived. From our trace-driven experimental results, the input/output operations per second of Par-BF outperforms that of DBF and SBF by 10× to 14× and by 3× to 8×, respectively. Besides, through our proposed garbage collection policy, Par-BF consumes less than half of the memory usage of SBF.
Original language | English (US) |
---|---|
Pages (from-to) | 259-275 |
Number of pages | 17 |
Journal | International Journal of High Performance Computing Applications |
Volume | 30 |
Issue number | 3 |
DOIs | |
State | Published - Aug 1 2016 |
Bibliographical note
Funding Information:This work is partially supported by the following NSF awards: 1439622, 1305237, 1421913, 1217569, and 1115471.
Publisher Copyright:
© SAGE Publications.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
Keywords
- Bloom filter
- Par-BF
- dynamic sets
- false positive
- fast matching