### Analysis of biclusters with applications to gene expression data

*Gahyun Park, Wojciech Szpankowski*

#### Abstract

For a given matrix of size n × m over a finite alphabet

*A*, a bicluster is a submatrix composed of selected columns and rows satisfying a certain property. In microarrays analysis one searches for largest biclusters in which selected rows constitute the same string (pattern); in another formulation of the problem one tries to find a maximally dense submatrix. In a conceptually similar problem, namely the bipartite clique problem on graphs, one looks for the largest binary submatrix with all `1'. In this paper, we assume that the original matrix is generated by a memoryless source over a finite alphabet*A*. We first consider the case where the selected biclusters are square submatrices and prove that with high probability (whp) the largest (square) bicluster having the same row-pattern is of size*log*Q2 n m where Q-1 is the (largest) probability of a symbol. We observe, however, that when we consider*any*submatrices (not just*square*submatrices), then the largest area of a bicluster jumps to A n (whp) where A is an explicitly computable constant. These findings complete some recent results concerning maximal biclusters and maximum balanced bicliques for random bipartite graphs.Full Text: GZIP Compressed PostScript PostScript PDF original HTML abstract page