Wavelets-Based Dimensionality Reduction for Gene Expression Feature Extraction
Sumeet Dua and Kaustubh S. Sabnis
Successful treatment of a chronic disease depends on
how early and how accurately it is detected. Molecular
diagnosis has the potential to be more precise than
clinical diagnosis. However, molecular biologists frequently
lack information about the molecular markers responsible
for most solid tumors. Because there are so many genes,
it is practically impossible to identify the genes
responsible for each cancer class through laboratory
experiments. In this paper, we present a novel
computational approach to finding marker genes that uses
the Discrete Wavelet Transform (DWT) as a baseline
dimensionality reduction technique. DWT is applied to
preprocessed gene expression data, yielding coefficients in
an orthonormal wavelet basis for each sample.
These coefficients are passed through two filters, and the
inverse DWT is applied to the filtered coefficients to yield
marker genes for each cancer class. We validate these results
against a biologically significant database of cancer genes
and against previously published results in the area. In
total, 21 of the identified genes, spanning 7 cancer classes,
also appear in the cancer gene database. For all cancer
classes except breast and bladder, we identify 41.67% more
marker genes than a previously published method.
Index Terms
Dimensionality reduction, data mining, feature extraction, wavelet
coefficients, microarray expression analysis.