Mining Cytochrome b561 from Plant Genomes
Cytochrome b561 (Cyt-b561) proteins perform important
functions in plants, such as anti-toxin defense reactions, growth
and development, and protection against damage from excess
light under drought conditions. Because of their high sequence
divergence, thorough mining of Cyt-b561 and related proteins
from diverse plant genomes is not easy. For example, only one
Cyt-b561 gene is currently known in the maize genome and none
has been found in the soybean genome, while twenty-two
are known in the Arabidopsis thaliana genome. Alignment-free
methods for protein classification, e.g., multivariate statistical
analysis methods using various amino acid properties as sequence
descriptors, can be more sensitive in identifying remotely
similar proteins than commonly used alignment-based
methods. To identify Cyt-b561 proteins thoroughly from
available plant genomes, we examined alignment-free protein
classifiers based on partial least squares (PLS) and support
vector machines. These classifiers performed better than profile
hidden Markov models and PSI-BLAST in identifying Cyt-b561
related proteins. Furthermore, PLS with a reduced number of
descriptors performed best among all the alignment-based
and alignment-free classifiers we tested. This classifier had the
highest accuracy (96.2%) and the lowest false negative rate
(3.0%), and should be useful for mining Cyt-b561 related proteins
from diverse plant genomes.
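
As a rough illustration of the alignment-free idea described above, the sketch below turns a protein sequence into a fixed-length descriptor vector without any alignment, and computes the two performance measures quoted in the abstract. It rests on assumptions: the abstract does not list the amino acid properties used as descriptors, so plain amino acid composition stands in for them here, and the false negative rate is taken to be FN / (FN + TP). The function names composition_descriptor and accuracy_and_fnr are hypothetical and are not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every
# sequence maps to a vector of the same length.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_descriptor(sequence):
    """Return the 20 amino acid frequencies of a protein sequence.

    Assumption: composition is a stand-in for the property-based
    descriptors mentioned in the abstract, which are not specified there.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy_and_fnr(true_labels, predicted_labels):
    """Return (accuracy, false negative rate) for binary labels.

    Labels: 1 = Cyt-b561-related, 0 = unrelated.
    Assumption: false negative rate = FN / (FN + TP).
    """
    pairs = list(zip(true_labels, predicted_labels))
    if not pairs:
        raise ValueError("no labels given")
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fnr
```

Vectors produced this way all have the same length regardless of sequence length, so in principle they could be passed to a multivariate classifier such as PLS or a support vector machine; the classifier itself is outside the scope of this sketch.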
Index Terms
Cytochrome b561, partial least squares, support vector machines, profile
hidden Markov model.