Gene Selection through Association Rule Filtering for Supervised Classification
One of the challenges facing bioinformatics is the
assignment of biochemical and cellular functions to the
thousands of uncharacterized gene products discovered by
international gene-sequencing projects. Similarly, microarray
gene expression analysis, an important component in the design
of in-silico molecular medicine methods, has made it possible to
simultaneously monitor the expression level of thousands of
genes across different samples (conditions). The extraction of
biologically significant knowledge from the gene expression data
is a growing computational challenge because the number of
genes far exceeds the number of evaluated samples.
Our aim is to identify correlated sets of genes that share similar
expression patterns and biological properties such as regulation and
function. In this paper, we present a novel method for gene
selection by discovering unique association rules between
expressed genes. Genes are scored by their degree of participation
in the discovered rules and by the support and confidence of
those rules. The selected genes are evaluated by five
different machine learning classifiers, and success metrics are
derived for comparison with feature ranking approaches based on
information gain and chi-square statistics. Our
results show higher classification accuracy than the other feature
ranking methods, despite the small number of genes selected.
Index Terms
Gene expression data analysis, association rules, classification,
feature selection.
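
As an illustration of the scoring idea summarized in the abstract, the following is a minimal sketch rather than the method itself. It assumes that expression values have already been discretized so that each sample is a set of "expressed" genes, that rules are limited to single-gene pairs (A -> B), and that a gene's score is the sum of support x confidence over the rules it takes part in. The function names, the thresholds, and this particular weighting are hypothetical choices made for the example; the abstract does not state the exact formula.

```python
from collections import defaultdict
from itertools import permutations


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Mine rules A -> B between single genes (assumed simplification).

    transactions: list of sets, one per sample, holding the genes
    considered expressed in that sample.
    Returns a list of (antecedent, consequent, support, confidence).
    """
    n = len(transactions)
    genes = sorted(set().union(*transactions))
    # Number of samples in which each gene is expressed.
    count = {g: sum(1 for t in transactions if g in t) for g in genes}
    rules = []
    for a, b in permutations(genes, 2):
        both = sum(1 for t in transactions if a in t and b in t)
        support = both / n            # fraction of samples with A and B
        confidence = both / count[a]  # estimate of P(B | A)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


def score_genes(rules):
    """Score each gene by the rules it participates in.

    Hypothetical weighting: every rule adds support * confidence to
    both its antecedent and its consequent.
    """
    scores = defaultdict(float)
    for a, b, support, confidence in rules:
        weight = support * confidence
        scores[a] += weight
        scores[b] += weight
    return dict(scores)


def select_top_genes(transactions, k, **thresholds):
    """Return the k highest-scoring genes (ties broken by gene name)."""
    scores = score_genes(mine_pair_rules(transactions, **thresholds))
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]
```

Calling select_top_genes(samples, k) on a list of per-sample gene sets returns the k genes that take part in the strongest rules. Those genes could then be passed to any classifier and compared against an information gain or chi-square ranking of the same size.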