Wavelets-Based Dimensionality Reduction for Gene Expression Feature Extraction

Sumeet Dua and Kaustubh S. Sabnis



Successful treatment of a chronic disease depends on how early and how accurately the disease is detected. Molecular diagnosis has the potential to identify disease more precisely than clinical diagnosis. However, molecular biologists frequently lack information about the molecular markers responsible for most solid tumors. Because the human genome contains so many genes, it is practically impossible to find the genes responsible for each cancer class through laboratory experiments alone. In this paper, we present a unique computational approach to finding marker genes that uses the Discrete Wavelet Transform (DWT) as a baseline dimensionality reduction technique. DWT is applied to preprocessed gene expression data, giving orthonormal wavelet coefficients for each sample. These coefficients are passed through two filters. Inverse DWT is then applied to the filtered coefficients, yielding marker genes for each cancer class. We cross-validate these results against a biologically significant database of cancer genes and against previously published results in the area. A total of 21 genes spanning 7 cancer classes are found in common with the cancer gene database. With the exception of two cancer classes (breast and bladder), we identify 41.67% more marker genes than a previous result in this area.
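The transform, filter, and inverse-transform pipeline described above can be illustrated with a minimal sketch. The sketch assumes a one-level orthonormal Haar wavelet and replaces the two filters, which are not specified here, with a single hypothetical magnitude filter that keeps the largest detail coefficients. The function names (haar_dwt, haar_idwt, candidate_genes), the parameter keep, and the toy expression values are illustrative assumptions, not the method or data used in this paper.

```python
import numpy as np

def haar_dwt(x):
    """One-level orthonormal Haar DWT of an even-length 1-D signal."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-pass (pairwise averages)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass (pairwise differences)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from its coefficients."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def candidate_genes(sample, keep=2):
    """Hypothetical stand-in for the filtering stage.

    Keeps the `keep` largest-magnitude detail coefficients, zeroes all
    other coefficients, inverts the transform, and returns the indices
    of genes whose reconstructed value is non-zero.
    """
    approx, detail = haar_dwt(sample)
    filtered = np.zeros_like(detail)
    top = np.argsort(np.abs(detail))[-keep:]     # indices of the largest details
    filtered[top] = detail[top]
    recon = haar_idwt(np.zeros_like(approx), filtered)
    return np.flatnonzero(np.abs(recon) > 1e-12)

# Toy expression profile for one sample (8 genes, made-up values).
sample = np.array([1.0, 1.0, 5.0, 1.0, 2.0, 2.0, 0.0, 4.0])
print(candidate_genes(sample, keep=2))
```

Because the Haar basis is orthonormal, the inverse transform reconstructs the original profile exactly when no coefficient is discarded, so any gene dropped from the output is dropped by the filter rather than by the transform itself.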

Index Terms: Dimensionality reduction, data mining, feature extraction, wavelet coefficients, microarray expression analysis.