REPCLASS: A Software Workflow Toolset for Automated Classification of Transposable Elements

Umeshkumar Keswani, Nirmal Ranganathan, Cedric Feschotte and David Levine



Whole genomes of new species are being sequenced at an ever-increasing pace. This creates an urgent need for software tools that support automated (or semi-automated) analysis aimed at fundamental biological understanding. Although computers become faster and store more data every year, using these computational resources effectively remains challenging. Analyzing a new genome quickly yet accurately, and producing biologically meaningful summary overviews without burying the reader in detail, is an ambitious but worthwhile goal. Genomic DNA is highly repetitive, and finding and classifying these repeated segments has been a tedious but valuable effort. In this work, we present REPCLASS, a software workflow toolset that automatically classifies transposable elements (TEs) in genomes. REPCLASS produces biologically valuable reports and views that give a quick overview of a genome. To provide fast response times, REPCLASS scales to clusters of computers, dividing large computational tasks into pieces that run concurrently on many compute nodes. Beyond running quickly, the REPCLASS workflow eliminates many artifacts of automated classification, giving scientists more accurate results.
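The abstract does not describe how REPCLASS partitions its work across cluster nodes. The following is only a minimal sketch of the general split-and-run-concurrently pattern, assuming the input is a list of sequences. All names (`split_into_chunks`, `run_concurrently`, `sequence_lengths`) are hypothetical, and a local thread pool stands in for separate cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_chunks(items: Sequence[T], n_chunks: int) -> List[List[T]]:
    """Split items into at most n_chunks contiguous pieces whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    # Never create more chunks than items (but always at least one chunk).
    n_chunks = min(n_chunks, len(items)) or 1
    base, extra = divmod(len(items), n_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item to balance the load.
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def sequence_lengths(chunk: List[str]) -> List[int]:
    """Hypothetical per-chunk task; a real workflow would classify each sequence here."""
    return [len(seq) for seq in chunk]


def run_concurrently(
    items: Sequence[T], n_workers: int, task: Callable[[List[T]], List[R]]
) -> List[R]:
    """Run `task` on each chunk concurrently and merge results in input order.

    A local thread pool stands in for separate compute nodes (an assumption).
    """
    chunks = split_into_chunks(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in the same order as the chunks.
        per_chunk = pool.map(task, chunks)
        return [r for chunk_result in per_chunk for r in chunk_result]
```

For example, `run_concurrently(["ACGT", "AC", "A"], 2, sequence_lengths)` splits the input into `[["ACGT", "AC"], ["A"]]` and returns `[4, 2, 1]`. On a real cluster, each chunk would typically become an independent job submitted to a scheduler, with a final step merging the per-chunk outputs.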

Index Terms: Software tool, transposable elements, automated classification.
