DMTCS Proceedings, 2005 International Conference on Analysis of Algorithms

Font Size:  Small  Medium  Large

Analysis of biclusters with applications to gene expression data

Gahyun Park, Wojciech Szpankowski


For a given matrix of size n × m over a finite alphabet A, a bicluster is a submatrix composed of selected columns and rows satisfying a certain property. In microarrays analysis one searches for largest biclusters in which selected rows constitute the same string (pattern); in another formulation of the problem one tries to find a maximally dense submatrix. In a conceptually similar problem, namely the bipartite clique problem on graphs, one looks for the largest binary submatrix with all `1'. In this paper, we assume that the original matrix is generated by a memoryless source over a finite alphabet A. We first consider the case where the selected biclusters are square submatrices and prove that with high probability (whp) the largest (square) bicluster having the same row-pattern is of size logQ2 n m where Q-1 is the (largest) probability of a symbol. We observe, however, that when we consider any submatrices (not just square submatrices), then the largest area of a bicluster jumps to A n (whp) where A is an explicitly computable constant. These findings complete some recent results concerning maximal biclusters and maximum balanced bicliques for random bipartite graphs.

Full Text: GZIP Compressed PostScript PostScript PDF original HTML abstract page

Valid XHTML 1.0 Transitional