Ranking through Integration of Protein-Similarity for Identification of Cell-cyclic Genes

Sumeet Dua, Pradeep Chowriappa and Alan Alex



Gene array experiments are increasingly being conducted. However, biological functional interpretation using (semi-)automated discovery routines has not kept pace with this rapid escalation. Functional genomics using data mining methods potentially offers precise, objective, and more reliable gene identification. The goal of our work is to create a gene-ranking scheme that integrates the phase information of gene expression profiles with protein similarity to identify cell-cyclic genes. We hypothesize that the identification of cell-cyclic genes can be enhanced by integrating each gene's phase with its primary protein sequence similarity to every other gene in the dataset. Comparing two sequences according to the properties of their residues may highlight regions of sequence similarity, emphasizing only identities in the alignment. Regions (subsequences) that may have diverged will then not draw attention away from any remaining common features. We present a unique schema that enables such integration by employing QR factorization of the pairwise similarity matrix formulation. Angular coefficients are derived and subsequently employed for integrated gene ranking. Experimental results on an independent benchmark dataset demonstrate the efficacy of the method when compared to previous results in the area.

Index Terms: Microarray, cell-cyclic genes, gene-phase, protein-phase, periodicity.