Discrete Random Walks, DRW'03
Cyril Banderier and Christian Krattenthaler (eds.)
DMTCS Conference Volume AC (2003), pp. 243-258
author:  Pierre Nicodème 

title: q-gram analysis and urn models

keywords:  Sequence comparison, Bernoulli model, urn models 
abstract: 
Words of fixed size q are commonly referred to as q-grams.
We consider the problem of q-gram filtration, a method
commonly used to speed up sequence comparison. We are
interested in the statistics of the number of q-grams common
to two random texts (where multiplicities are not counted)
in the non-uniform Bernoulli model. In the exact and
dependent model, when omitting border effects, a q-gram in a
random sequence depends on the q-1 preceding q-grams. In an
approximate and independent model, we draw a q-gram at
random at each position, independently of the other
positions. Using ball and urn models, we analyze the
independent model. Numerical simulations show that this
model is an excellent first-order approximation to the
dependent model. We provide an algorithm to compute the
moments.
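
To make the two models concrete, here is a minimal Python sketch. It is
not the algorithm of the paper, which computes the moments in general;
it covers only the first moment. In the independent model, a q-gram of
probability p is present among n independent draws with probability
1 - (1 - p)^n, so by linearity of expectation the mean number of
distinct q-grams common to two independent texts is a sum over all
q-grams. The dependent model is estimated by plain Monte Carlo
simulation on overlapping q-grams. All function names, the seed, and
the example letter probabilities are assumptions made for illustration.

```python
import itertools
import random


def qgram_probs(letter_probs, q):
    """Probability of every q-gram when letters are drawn i.i.d. (Bernoulli model)."""
    probs = {}
    for gram in itertools.product(letter_probs, repeat=q):
        p = 1.0
        for ch in gram:
            p *= letter_probs[ch]
        probs["".join(gram)] = p
    return probs


def expected_common_independent(letter_probs, q, n, m):
    """Mean number of distinct q-grams common to two texts in the independent
    model, where the texts consist of n and m independently drawn q-grams."""
    return sum(
        (1.0 - (1.0 - p) ** n) * (1.0 - (1.0 - p) ** m)
        for p in qgram_probs(letter_probs, q).values()
    )


def random_text(letter_probs, length, rng):
    """Random text of the given length with i.i.d. letters."""
    letters = list(letter_probs)
    weights = [letter_probs[c] for c in letters]
    return "".join(rng.choices(letters, weights=weights, k=length))


def qgram_set(text, q):
    """Set of distinct q-grams of a text (multiplicities are not counted)."""
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def simulate_common_dependent(letter_probs, q, n, m, trials, seed=0):
    """Monte Carlo estimate of the mean number of common distinct q-grams
    in the dependent model; each text has n (resp. m) overlapping q-grams."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = qgram_set(random_text(letter_probs, n + q - 1, rng), q)
        b = qgram_set(random_text(letter_probs, m + q - 1, rng), q)
        total += len(a & b)
    return total / trials


if __name__ == "__main__":
    # Hypothetical non-uniform letter distribution on a 4-letter alphabet.
    probs = {"a": 0.4, "c": 0.1, "g": 0.1, "t": 0.4}
    print(expected_common_independent(probs, q=3, n=50, m=50))
    print(simulate_common_dependent(probs, q=3, n=50, m=50, trials=2000))
```

Running the script prints the exact mean for the independent model
followed by the simulated mean for the dependent model, so the two can
be compared for any chosen q, text lengths, and letter distribution.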

reference: 
Pierre Nicodème (2003), q-gram analysis and urn models, in
Discrete Random Walks, DRW'03, Cyril Banderier and Christian
Krattenthaler (eds.), Discrete Mathematics and Theoretical
Computer Science Proceedings AC, pp. 243-258

