DMTCS Proceedings, Discrete Random Walks, DRW'03

DMTCS Conference vol AC (2003), pp. 243-258

Cyril Banderier and Christian Krattenthaler (eds.)


author: Pierre Nicodème
title: q-gram analysis and urn models
keywords: Sequence comparison, Bernoulli model, urn models
abstract: Words of fixed size q are commonly referred to as q-grams. We consider the problem of q-gram filtration, a method commonly used to speed up sequence comparison. We are interested in the statistics of the number of q-grams common to two random texts (where multiplicities are not counted) in the non-uniform Bernoulli model. In the exact and dependent model, when omitting border effects, a q-gram in a random sequence depends on the q-1 preceding q-grams. In an approximate and independent model, we draw a q-gram at random at each position, independently of the other positions. Using ball and urn models, we analyze the independent model. Numerical simulations show that this model is an excellent first-order approximation to the dependent model. We provide an algorithm to compute the moments.
reference: Pierre Nicodème (2003), q-gram analysis and urn models, in Discrete Random Walks, DRW'03, Cyril Banderier and Christian Krattenthaler (eds.), Discrete Mathematics and Theoretical Computer Science Proceedings AC, pp. 243-258
bibtex: For a corresponding BibTeX entry, please consider our BibTeX-file.
ps.gz-source: dmAC0124.ps.gz (80 K)
ps-source: dmAC0124.ps (248 K)
pdf-source: dmAC0124.pdf (216 K)

The first source is the gzipped PostScript file, the second the plain PostScript file, and the third the PDF file for Adobe Acrobat Reader. Depending on how your web browser is set up, at least one of these should (after some time) open a window showing the full article. If it does not, ask your system administrator to configure your browser correctly.

Due to limitations of your local software, PostScript and PDF may be displayed differently on your screen. For example, if you use xpdf to view the PDF, some of the graphics in the file may not be rendered. On the other hand, PDF can provide links to sections, the bibliography, and external references, which do not appear in PostScript.


Automatically produced on Tue Sep 27 10:09:21 CEST 2005 by gustedt
