DMTCS Proceedings, 2007 Conference on Analysis of Algorithms, AofA 07

Font Size:  Small  Medium  Large

HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm

P. Flajolet, É. Fusy, O. Gandouet, F. Meunier

Abstract


This extended abstract describes and analyses a near-optimal probabilistic algorithm, HyperLogLog, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, ``short bytes''), HyperLogLog performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about 1.04/√m. This improves on the best previously known cardinality estimator, LogLog, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond 109 with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.

Full Text: GZIP Compressed PostScript PostScript PDF

Valid XHTML 1.0 Transitional