On Correlation Polynomials and Subword Complexity
Irina Gheorghiciuc, Mark Daniel Ward
Abstract
We consider words with letters from a q-ary alphabet A. The kth subword complexity of a word w ∈A* is the number of distinct subwords of length k that appear as contiguous subwords of w. We analyze subword complexity from both combinatorial and probabilistic viewpoints. Our first main result is a precise analysis of the expected kth subword complexity of a randomly-chosen word w ∈An. Our other main result describes, for w ∈A*, the degree to which one understands the set of all subwords of w, provided that one knows only the set of all subwords of some particular length k. Our methods rely upon a precise characterization of overlaps between words of length k. We use three kinds of correlation polynomials of words of length k: unweighted correlation polynomials; correlation polynomials associated to a Bernoulli source; and generalized multivariate correlation polynomials. We survey previously-known results about such polynomials, and we also present some new results concerning correlation polynomials.
Full Text: PostScript PDF Compressed PostScript