DMTCS Proceedings, 2007 Conference on Analysis of Algorithms, AofA 07

On Correlation Polynomials and Subword Complexity

Irina Gheorghiciuc, Mark Daniel Ward

Abstract


We consider words with letters from a q-ary alphabet A. The kth subword complexity of a word w ∈ A* is the number of distinct words of length k that appear as contiguous subwords of w. We analyze subword complexity from both combinatorial and probabilistic viewpoints. Our first main result is a precise analysis of the expected kth subword complexity of a randomly chosen word w ∈ A^n. Our other main result describes, for w ∈ A*, the degree to which one understands the set of all subwords of w, provided that one knows only the set of all subwords of some particular length k. Our methods rely upon a precise characterization of overlaps between words of length k. We use three kinds of correlation polynomials of words of length k: unweighted correlation polynomials; correlation polynomials associated to a Bernoulli source; and generalized multivariate correlation polynomials. We survey previously known results about such polynomials, and we also present some new results concerning correlation polynomials.
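
As a small illustration of the two central notions, the following Python sketch computes the kth subword complexity of a word directly from the definition above, together with the coefficients of an unweighted autocorrelation polynomial. The function names are hypothetical, and the overlap convention (coefficient i equals 1 when the word matches itself shifted by i positions) is the standard one and is assumed here; the paper itself may index its polynomials differently.

```python
def subword_complexity(w: str, k: int) -> int:
    """Count the distinct contiguous subwords of length k in w."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Collect every window of length k; the set keeps one copy of each.
    # When k > len(w) the range is empty and the count is 0.
    return len({w[i:i + k] for i in range(len(w) - k + 1)})


def autocorrelation(u: str) -> list[int]:
    """Coefficients c[0..k-1] of an unweighted autocorrelation polynomial.

    Assumed convention: c[i] = 1 exactly when the suffix of u starting
    at position i equals the prefix of u of the same length.
    """
    k = len(u)
    return [1 if u[i:] == u[:k - i] else 0 for i in range(k)]


print(subword_complexity("abab", 2))  # 2, the subwords are "ab" and "ba"
print(autocorrelation("abab"))        # [1, 0, 1, 0], i.e. 1 + z^2
```

The set-based count takes time proportional to k times the length of w, which is adequate for experimenting with short words but is not meant to reflect the analytic methods of the paper.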

