808 IEEE TRANSACTIONS ON COMPUTERS, VOL. 59, NO. 6, JUNE 2010

A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space

Azzedine Boukerche, Senior Member, IEEE, Jan M. Correa, Alba Cristina M.A. de Melo, Senior Member, IEEE, and Ricardo P. Jacobi

Abstract: The recent and astonishing accomplishments in the field of Genomics would not have been possible without the techniques, algorithms, and tools developed in Bioinformatics. Biological sequence comparison is an important operation in Bioinformatics because it is used to determine how similar two sequences are. As a result of this operation, one or more alignments are produced. DIALIGN is an exact algorithm that uses dynamic programming to obtain optimal biological sequence alignments in quadratic space and time. One effective way to accelerate DIALIGN is to design FPGA-based architectures to execute it. Nevertheless, the complete retrieval of an alignment in hardware requires modifications to the original algorithm because it executes in quadratic space. In this paper, we propose and evaluate two FPGA-based accelerators executing DIALIGN in linear space: one to obtain the optimal DIALIGN score (DIALIGN-Score) and one to retrieve the DIALIGN alignment (DIALIGN-Alignment). Because there appears to be no documented variant of the DIALIGN algorithm that produces alignments in linear space, we propose a linear space variant of the DIALIGN algorithm and have designed the DIALIGN-Alignment accelerator to implement it. The experimental results show that impressive speedups can be obtained with both accelerators when comparing long biological sequences: the DIALIGN-Score accelerator achieved a speedup of 383.4 and the DIALIGN-Alignment accelerator reached a speedup of 141.38.

Index Terms: Biology and genetics, dynamic programming, special-purpose and application-based systems.
1 INTRODUCTION

The rapid evolution of sequencing techniques combined with the intense growth in the number of large-scale genome projects is producing a huge amount of biological sequence data. Nevertheless, determining the genome sequence is only the first step toward deciphering the genetic message encoded in those sequences. In genome projects, newly determined sequences are first compared with those placed in genomic databases, in order to discover similarities [1]. This is done because relevant sequence similarity is evidence of common evolutionary origin and homology relationship.

Pairwise sequence comparison is, therefore, a very basic but important step in genome projects. As a result of this step, one or more sequence alignments can be produced. A sequence alignment has a similarity score associated to it that is obtained by placing one sequence above the other, making clear the correspondence between the characters [2].

Smith-Waterman (SW) [3] is an exact algorithm based on the longest common subsequence (LCS) concept that uses dynamic programming to find optimal local alignments between two sequences of size n in quadratic space and time. In this algorithm, a similarity matrix of size n × n is calculated. Nowadays, SW is the most widely used exact method to locally align two sequences, and it is very accurate if the sequences have a single common region of high similarity. However, if the sequences share more than one region of high similarity, SW is not very effective.

DIALIGN [4] is based on the idea that a biological sequence alignment must be built from significant gapless fragments and is thus able to cope with the situation of sequences sharing many high similarity regions. DIALIGN can be used for either local or global alignment as well as pairwise or multiple sequence alignment. In [5], a variant of DIALIGN was successfully used to obtain multiple sequence alignments of noncoding DNAs. One drawback of DIALIGN is that it is slower than SW.
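To make the difference between quadratic and linear space concrete, the following minimal sketch computes only the best Smith-Waterman local score while keeping two rows of the similarity matrix in memory. It is an illustration, not the architecture proposed in this paper: the function name `sw_score_linear_space`, the linear gap model, and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for the example.

```python
def sw_score_linear_space(s, t, match=1, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of s and t.

    Only the previous and the current row of the similarity matrix are
    stored, so memory grows with len(t) rather than len(s) * len(t).
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(t) + 1)   # row i-1 of the similarity matrix
    best = 0                    # highest cell value seen so far
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)   # row i, column 0 stays 0
        for j in range(1, len(t) + 1):
            # Diagonal move: align s[i-1] with t[j-1].
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # Vertical and horizontal moves: insert a gap.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(sw_score_linear_space("TTACGTTT", "GGACGTGG"))
```

Two rows suffice for the score because each cell depends only on its left, upper, and upper-left neighbors. Recovering the alignment itself normally relies on tracing back through the full matrix, which is where the quadratic space requirement comes from.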
To overcome this drawback, alternatives have been proposed to run DIALIGN in parallel [6] and to combine it with a fast local search similarity tool [7].

Several high performance hardware-based architectures have been proposed in the literature [8]. In this paper, we propose two FPGA-based architectures that execute DIALIGN in linear space. The goal of the first architecture, called DIALIGN-Score, is to obtain the DIALIGN similarity score. A partition technique is used in this design, enabling sequences of any size to be compared.

In many cases, the biologists also need to observe the alignment between the sequences. It is for this reason that DIALIGN-Alignment, a second architecture which is able to retrieve the optimal alignment entirely in hardware, is

• A. Boukerche is with the School of Information Technology and Engineering (SITE), University of Ottawa, 800 King Edward Avenue, Ottawa, Ontario K1N 6N5, Canada. E-mail: boukerch@site.uottawa.ca.
• J.M. Correa and R.P. Jacobi are with the Department of Computer Science, University of Brasilia (UnB), Campus UNB - ICC-Norte - sub-solo 70910-900, Brasilia-DF, Brazil. E-mail: {jan, rjacobi}@cic.unb.br.
• A.C.M.A. de Melo is with the Department of Computer Science, University of Brasilia (UnB), Campus UNB - ICC-Norte - sub-solo 70910-900, Brasilia-DF, Brazil, and with the PARADISE Research Laboratory, University of Ottawa, Canada. E-mail: albamm@cic.unb.br.

Manuscript received 29 July 2008; revised 6 Feb. 2009; accepted 16 July 2009; published online 11 Feb. 2010.
Recommended for acceptance by A. George.
For information on obtaining reprints of this article, please send E-mail to: tc@computer.org, and reference IEEECS Log Number TC-2008-07-0378.
Digital Object Identifier no. 10.1109/TC.2010.42.

0018-9340/10/$26.00 © 2010 IEEE. Published by the IEEE Computer Society