A Multi-pronged Parallel Approach to Enhance Speed and Accuracy of Sequence
Assembly
Munib Ahmed, Mohammad Saad Ahmad and
Ishfaq Ahmad
Genome assembly is one of the most computationally
complex processes in the field of bioinformatics. This
complexity stems from the requirement that a very large
number of sequenced segments, each 700-1000 bases long, be
put together to reconstruct the full genome, which, depending on
the species, can be several billion bases in length.
Moreover, due to the limitations of some laboratory procedures
used during the sequencing of the genome, a full
reconstruction with 100% accuracy is practically infeasible.
One of the early and critical phases of assembly is the detection
of overlaps among segments. In this work, we propose a holistic
approach to simultaneously enhance the execution speed and
improve the accuracy using error correction logic in a high-performance
computing environment. By leveraging the extra
processing power available in a parallel computing
environment, we attempt to correct errors in the weaker end
regions rather than trimming them off, thereby enhancing the
accuracy of the solution. The speed is improved by dynamically
balancing the load among multiple processors and utilizing
innovative data structures along with a hashing technique that
require less memory than some other
programs.
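As a rough illustration of hash-based overlap detection in general, the sketch below indexes every k-mer of every segment in a hash table and uses the first k-mer of each segment as a seed to find segments whose suffix matches its prefix. This is a minimal sketch under stated assumptions, not the method proposed in this work: the names build_kmer_index and find_overlaps and the parameters k and min_overlap are hypothetical, matching is exact (no error correction), and everything runs in a single process with no load balancing.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map each k-mer to the (read_id, offset) pairs where it occurs.

    Hypothetical helper for illustration only.
    """
    index = defaultdict(list)
    for read_id, seq in enumerate(reads):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((read_id, offset))
    return index


def find_overlaps(reads, k, min_overlap):
    """Return {(a, b): length} where a suffix of reads[a] equals a prefix of reads[b].

    Assumption: exact matching only. Real segments contain sequencing
    errors, so a practical assembler would tolerate mismatches here.
    """
    index = build_kmer_index(reads, k)
    overlaps = {}
    for b, seq in enumerate(reads):
        if len(seq) < k:
            continue
        seed = seq[:k]  # the first k-mer of read b acts as the seed
        for a, offset in index.get(seed, []):
            if a == b:
                continue
            length = len(reads[a]) - offset  # candidate overlap length
            if length < min_overlap or length > len(seq):
                continue
            # Verify that the whole suffix of read a matches the prefix of read b.
            if reads[a][offset:] == seq[:length]:
                overlaps[(a, b)] = max(overlaps.get((a, b), 0), length)
    return overlaps


if __name__ == "__main__":
    segments = ["ACGTACGTGA", "CGTGATTTAC", "TTTACGGG"]
    print(find_overlaps(segments, k=4, min_overlap=5))
```

The hash lookup restricts the expensive suffix-prefix comparison to segment pairs that share a seed, which avoids comparing every pair of segments directly.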
Index Terms
Accuracy, genome assembly, parallel processing.