Gene Selection through Association Rule Filtering for Supervised Classification

Prerna Sethi



One of the challenges facing bioinformatics is the assignment of biochemical and cellular functions to the thousands of uncharacterized gene products discovered by international gene-sequencing projects. Similarly, microarray gene expression analysis, an important component in the design of in-silico molecular medicine methods, has made it possible to simultaneously monitor the expression levels of thousands of genes across different samples (conditions). Extracting biologically significant knowledge from gene expression data is a growing computational challenge because the number of genes (features) far exceeds the number of evaluated samples. Our aim is to identify correlated sets of genes that share similar expression patterns and biological properties such as regulation and function. In this paper, we present a novel method for gene selection based on discovering unique association rules between expressed genes. Each gene is scored by its degree of participation in the discovered rules and by the support and confidence of those rules. The selected genes are evaluated with five different machine learning classifiers, and success metrics are derived for comparison with feature ranking approaches based on information gain and chi-square statistics. Our results show higher classification accuracy than these feature ranking methods, even though a smaller number of genes is selected.

Index Terms: Gene expression data analysis, association rules, classification, feature selection.
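
The rule-based scoring idea summarized in the abstract can be illustrated with a minimal sketch. It is not the method of this paper, whose details are not given here. It assumes that each sample has already been discretized into the set of genes considered "expressed", that rules are restricted to single-gene pairs A -> B, and that a gene's score is the sum of support times confidence over the rules it participates in. The function names, the thresholds, and the scoring formula are all hypothetical choices made for illustration. Support and confidence follow their standard definitions: support(A -> B) is the fraction of samples containing both A and B, and confidence(A -> B) is that count divided by the number of samples containing A.

```python
from itertools import permutations


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Mine single-gene rules A -> B (an illustrative simplification).

    transactions: list of sets, one per sample, holding the genes
    treated as expressed in that sample (assumed pre-discretized).
    Returns a list of (antecedent, consequent, support, confidence).
    """
    n = len(transactions)
    genes = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(genes, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        if count_a == 0:
            continue
        support = count_ab / n            # fraction of samples with A and B
        confidence = count_ab / count_a   # estimate of P(B | A)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


def score_genes(rules):
    """Hypothetical score: sum of support * confidence over a gene's rules."""
    scores = {}
    for a, b, support, confidence in rules:
        for gene in (a, b):
            scores[gene] = scores.get(gene, 0.0) + support * confidence
    return scores


def select_top_genes(scores, k):
    """Return the k highest-scoring genes, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:k]]


# Toy data (made up for illustration): four samples, four genes.
transactions = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g1", "g2", "g4"},
    {"g3", "g4"},
]
rules = mine_pair_rules(transactions)
scores = score_genes(rules)
selected = select_top_genes(scores, k=2)
```

In this toy data only g1 and g2 co-occur often enough to form rules, so they are the only genes that receive a score. The selected genes would then be passed as features to a classifier, which is the evaluation step the abstract describes.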