## Discrete Mathematics & Theoretical Computer Science, Vol 4, No 2 (2001)

DMTCS vol 4 no 2 (2001), pp. 301-322

Authors: Jessica H. Fong and Martin Strauss

Title: An Approximate $L^p$ Difference Algorithm for Massive Data Streams

Keywords: streaming algorithms, data streams, $L^p$ norms

Abstract: Several recent papers have shown how to approximate the difference $\sum_i |a_i - b_i|$ or $\sum_i |a_i - b_i|^2$ between two functions, when the function values $a_i$ and $b_i$ are given in a data stream, and their order is chosen by an adversary. These algorithms use little space (much less than would be needed to store the entire stream) and little time to process each item in the stream. They approximate with small relative error. Using different techniques, we show how to approximate the $L^p$-difference $\sum_i |a_i - b_i|^p$ for any rational-valued $p \in (0, 2]$, with comparable efficiency and error. We also show how to approximate $\sum_i |a_i - b_i|^p$ for larger values of $p$ but with a worse error guarantee. Our results fill in gaps left by recent work, by providing an algorithm that is precisely tunable for the application at hand.

These results can be used to assess the difference between two chronologically or physically separated massive data sets, making one quick pass over each data set, without buffering the data or requiring the data source to pause. For example, one can use our techniques to judge whether the traffic on two remote network routers is similar without requiring either router to transmit a copy of its traffic. A web search engine could use such algorithms to construct a library of small "sketches," one for each distinct page on the web; one can approximate the extent to which new web pages duplicate old ones by comparing the sketches of the web pages. Such techniques will become increasingly important as the enormous scale, distributional nature, and one-pass processing requirements of data sets become more commonplace.

Citation: Jessica H. Fong and Martin Strauss (2001), An Approximate $L^p$ Difference Algorithm for Massive Data Streams, Discrete Mathematics and Theoretical Computer Science 4, pp. 301-322. A corresponding BibTeX entry is available in the journal's BibTeX file.

Files:

1. dm040217.ps.gz (0 K)
2. dm040217.ps (240 K)
3. dm040217.pdf (165 K)

The first source is the gzipped PostScript file, the second is the plain PostScript file, and the third is the PDF for Adobe Acrobat Reader. Depending on how your web browser is set up, at least one of these should, after some time, open a window showing the full article. If none does, ask your system administrator to configure your browser correctly.

Due to limitations of your local software, PostScript and PDF may render differently on your screen. If, e.g., you use xpdf to view the PDF, some of the graphics in the file may not display. On the other hand, PDF can provide links to sections, the bibliography, and external references, which do not appear in PostScript.
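To make the idea of a stream "sketch" from the abstract concrete, here is a minimal illustration in Python. It is not the authors' algorithm: it assumes the well-known random-sign sketch for the $p = 2$ case only, i.e. the kind of earlier result the abstract cites for $\sum_i |a_i - b_i|^2$. All names (`sign`, `make_sketch`, `estimate_l2_difference`) and the parameters `num_counters` and `group_size` are hypothetical choices for this example, and the hash-based signs stand in for the limited-independence random variables a rigorous analysis would require.

```python
import hashlib
import statistics


def sign(seed: int, index: int) -> int:
    """Pseudo-random +1/-1 for (seed, index); identical on every machine."""
    digest = hashlib.blake2b(f"{seed}:{index}".encode(), digest_size=1).digest()
    return 1 if digest[0] & 1 else -1


def make_sketch(stream, num_counters=64):
    """One pass over (index, value) items in any order; keeps only num_counters numbers."""
    counters = [0] * num_counters
    for index, value in stream:
        for j in range(num_counters):
            counters[j] += sign(j, index) * value
    return counters


def estimate_l2_difference(sketch_a, sketch_b, group_size=8):
    """Estimate sum_i (a_i - b_i)^2 from two sketches built with the same signs."""
    squares = [(x - y) ** 2 for x, y in zip(sketch_a, sketch_b)]
    # Median of group means: a common way to reduce the variance of the estimate.
    means = [
        statistics.fmean(squares[g:g + group_size])
        for g in range(0, len(squares), group_size)
    ]
    return statistics.median(means)


# Two "routers" see the same indices in different orders; only index 1 differs (3 vs 1).
stream_a = [(0, 5), (1, 3), (2, 7)]
stream_b = [(2, 7), (0, 5), (1, 1)]
estimate = estimate_l2_difference(make_sketch(stream_a), make_sketch(stream_b))
```

Each site only needs to ship its list of counters, not its data, which mirrors the router example in the abstract. Because the sketch is linear in the stream values, the order in which items arrive does not affect the result.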
