Efficient repeat finding in sets of strings via suffix arrays
Pablo Barenbaum, Verónica Becher, Alejandro Deymonnaz, Melisa Halsband, Pablo Ariel Heiber
Abstract
We consider two repeat finding problems relative to sets of strings:
(a) Find the largest substrings that occur in every string of a
given set; (b) Find the maximal repeats in a given string that
occur in no string of a given set. Our solutions are based on the
suffix array construction, requiring
O(m) memory, where m is
the length of the longest input string, and
O(n &log;m) time, where n
is the the whole input size (the sum of the length of each string in
the input). The most expensive part of our algorithms is the
computation of several suffix arrays. We give an implementation and
experimental results that evidence the efficiency of our algorithms in
practice, even for very large inputs.
Full Text: PDF PostScript