2005 International Conference on Analysis of Algorithms
Conrado Martínez (ed.)
DMTCS Conference Volume AD (2005), pp. 157–166
author:  Frédéric Giroire 

title:  Order statistics and estimating cardinalities of massive data sets 
keywords:  cardinality, estimates, very large multiset, traffic analysis 
We introduce a new class of algorithms to estimate the
cardinality of very large multisets using constant memory
and doing only one pass on the data. It is based on order
statistics rather than on bit patterns in binary
representations of numbers. We analyse three families of
estimators. They attain a standard error of $1/\sqrt{M}$
using $M$ units of storage, which places them in the same class
as the best known algorithms so far. They have a very
simple internal loop, which gives them an advantage in terms
of processing speed. The algorithms are validated on
internet traffic traces.
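
To make the idea of estimating cardinality from order statistics concrete, here is a minimal sketch in Python. It is not one of the three estimator families analysed in the paper; it is a generic k-th minimum estimator, given only as an illustration. The assumptions are: items are hashed to pseudo-uniform values in (0, 1] with SHA-1, the k smallest distinct hash values are kept, and the cardinality is estimated as (k - 1) / x, where x is the k-th smallest hash value. The names hash01 and estimate_cardinality and the default k = 256 are hypothetical.

```python
import hashlib
import heapq


def hash01(item):
    """Map an item to a pseudo-uniform float in (0, 1].

    Assumption: SHA-1 truncated to 64 bits behaves like a uniform hash.
    """
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def estimate_cardinality(stream, k=256):
    """Estimate the number of distinct items in one pass with O(k) memory.

    Illustrative k-th minimum estimator, not the paper's exact estimators.
    """
    heap = []        # max-heap (values negated) of the k smallest hashes
    members = set()  # hash values currently stored in the heap
    for item in stream:
        h = hash01(item)
        if h in members:
            continue  # repeated item among the retained minima: ignore
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the current k-th minimum: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        # Fewer than k distinct hash values were seen: the count is exact.
        return float(len(heap))
    kth_min = -heap[0]
    return (k - 1) / kth_min
```

Because equal items always hash to the same value, repetitions in the multiset do not change the retained minima, so the estimate depends only on the set of distinct items. The inner loop is one hash computation and one comparison for most items, which mirrors the simple internal loop mentioned in the abstract.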

reference:  Frédéric Giroire (2005), Order statistics and estimating cardinalities of massive data sets, in 2005 International Conference on Analysis of Algorithms, Conrado Martínez (ed.), Discrete Mathematics and Theoretical Computer Science Proceedings AD, pp. 157–166 
