Discrete Mathematics & Theoretical Computer Science, Vol 6, No 2 (2004)

Rare Events and Conditional Events on Random Strings

Mireille Régnier, Alain Denise

Abstract


Some strings (the texts) are assumed to be randomly generated according to a probability model that is either a Bernoulli model or a Markov model. A rare event is the over- or under-representation of a word or a set of words. The aim of this paper is twofold. First, given a single word, the tail distribution of its number of occurrences is studied, and sharp large deviation estimates are derived. Second, assuming that a given word is overrepresented, the distribution of a second word is studied, and formulae for the expectation and the variance are derived. In both cases, the formulae are accurate and actually computable. These results have applications in computational biology, where a genome is viewed as a text.
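As a point of orientation only, the setting can be illustrated with a minimal Python sketch under the Bernoulli model. It is not the paper's method: it uses the standard fact that a word of length m has expected (overlapping) occurrence count (n - m + 1) times the product of its letter probabilities in a text of length n, and it obtains the exact distribution of the count by brute-force enumeration, which is feasible only for tiny n. The function names and the two-letter alphabet are assumptions made for this illustration.

```python
from itertools import product
from math import prod


def expected_occurrences(word, n, probs):
    """Expected number of (possibly overlapping) occurrences of `word`
    in a random text of length n, letters drawn i.i.d. from `probs`."""
    m = len(word)
    if n < m:
        return 0.0
    return (n - m + 1) * prod(probs[c] for c in word)


def count_occurrences(word, text):
    """Count overlapping occurrences of `word` in `text`."""
    m = len(word)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == word)


def exact_distribution(word, n, probs):
    """Exact law of the occurrence count, by enumerating every text of
    length n. Exponential in n: for illustration on tiny inputs only."""
    dist = {}
    for letters in product(probs, repeat=n):
        text = "".join(letters)
        weight = prod(probs[c] for c in letters)
        k = count_occurrences(word, text)
        dist[k] = dist.get(k, 0.0) + weight
    return dist


def tail_probability(dist, k):
    """P(count >= k), the quantity whose decay a large deviation estimate describes."""
    return sum(p for count, p in dist.items() if count >= k)


# Hypothetical two-letter alphabet with uniform probabilities.
probs = {"A": 0.5, "B": 0.5}
dist = exact_distribution("AA", 4, probs)
mean = sum(k * p for k, p in dist.items())
```

Here `mean` agrees with `expected_occurrences("AA", 4, probs)`, and `tail_probability(dist, 3)` is the probability of the single text `AAAA`. Enumeration becomes impractical long before genome-scale lengths, which is the regime where closed-form estimates such as those derived in the paper are needed.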
