Par-PSSE: Software for Pairwise Statistical Significance Estimation in Parallel for Local Sequence Alignment

*1,2 Yuhong Zhang, 2 Md. Mostofa Ali Patwary, 2 Sanchit Misra, 2 Ankit Agrawal, 2 Wei-keng Liao, 1 Zhiguang Qin, 2 Alok Choudhary
1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China, E-mail: {yuhongzhang, qinzg}@uestc.edu.cn
2 Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, USA, E-mail: {mpatwary, smi539, ankitag, wkliao, choudhar}@eecs.northwestern.edu

Abstract
Pairwise statistical significance (PSS) has been recognized as a very useful method for homology detection. It helps estimate whether the output of a sequence alignment reflects an evolutionary link or has arisen merely by chance. However, pairwise statistical significance estimation (PSSE) poses a big challenge in terms of performance and scalability, since constructing the empirical score distribution during the estimation is both computationally intensive and data intensive. This paper presents a software library for estimating pairwise statistical significance in parallel, named Par-PSSE, implemented in C++ using the OpenMP and MPI paradigms and their hybrid. Further, we apply the parallelization technique to estimate non-conservative PSS using standard, sequence-specific, and position-specific substitution matrices. These extensions have been found superior to standard pairwise statistical significance in terms of retrieval accuracy. By distributing the compute-intensive kernels of pairwise statistical significance estimation across multiple computational units, we achieve a speedup of up to 621.73× over the corresponding sequential implementation when using 1024 cores.

Keywords: pairwise statistical significance, multi-core, OpenMP, MPI, hybrid paradigm

1. Introduction
Accurately identifying similarity among nucleic acid or protein sequences is very important for providing evidence of evolutionary relations among sequences, as well as of functional or structural conservation [6]. Alignment programs such as BLAST [6], PSI-BLAST [7], and FASTA [16] maximize the summed scores of the compared residues and find optimal local alignments using a dynamic programming procedure. However, the alignment score depends on various factors, such as the alignment program, the scoring scheme, the sequence lengths, and the compositions of the sequences under comparison [15]. Therefore, the statistical significance of an alignment score is very useful, since it helps in identifying whether the alignment reflects an evolutionary link or could have arisen simply by chance [14]. Pairwise statistical significance (PSS) is a promising method for evaluating the statistical significance of an alignment; it is specific to the sequence pair being aligned and independent of any database [3]. Although it has been demonstrated to produce much more biologically relevant estimates of statistical significance than database statistical significance, its estimation involves thousands of permutations and alignments, which are enormously time consuming and can make it impractical to estimate the pairwise statistical significance of a large number of sequence pairs [20]. Applying high performance computing (HPC) techniques offers an opportunity to complete the estimation in reasonable time. Multi-core computers and clusters have become increasingly ubiquitous and more powerful. Therefore, it is of interest to unlock the potential of multi-core desktops and clusters using high performance technologies.
In this paper, we present software for pairwise statistical significance estimation in parallel, called Par-PSSE, implemented in C++ using OpenMP, MPI, and their hybrid paradigm. By carefully analyzing the characteristics of pairwise statistical significance estimation (PSSE), we distribute the compute-intensive kernels of the algorithm across processors or cores efficiently to reap the maximum benefit of the OpenMP and/or MPI paradigms. Our experimental results show that our parallelization methodologies achieve high performance for these applications. With 1024 cores spread across 43 nodes, the flat MPI and hybrid implementations achieve speedups (over the corresponding sequential implementation) of 452.93× and 621.73×, respectively.

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 6, Number 5, March 2012, doi:10.4156/jdcta.vol6.issue5.24

2. Background and related work
The optimal alignment scores of random sequences, assessed from high-scoring segment pairs, are expected to follow a Gumbel distribution (a type I extreme value distribution, EVD). The EVD is a limiting distribution for the maximum or the minimum of a large collection of random observations (say, optimal alignment scores) from the same arbitrary distribution [12]. The distribution can be deduced not only from reliability theory [8], but also from the Karlin-Altschul model [14]. The Karlin-Altschul model describes the number of highest-scoring matching regions above a threshold by a Poisson distribution. Briefly, consider the query sequence $A = a_1 a_2 \ldots a_m$ and the subject sequence $B = b_1 b_2 \ldots b_n$, where $m = |A|$ and $n = |B|$ are their respective lengths, and the scoring scheme SC (substitution matrix, gap opening penalty, and gap extension penalty). The number of distinct local alignments with score values of at least x is approximately Poisson distributed with mean
(1) $E(x) \approx K m n e^{-\lambda x}$,
where K and λ are statistical parameters that depend on the scoring system and the composition frequencies of the sequences being aligned. Based on the Gumbel distribution, the probability of finding an ungapped segment pair with a score lower than x is
(2) $P(S(A, B) < x) \approx \exp(-K m n e^{-\lambda x})$.
Correspondingly, the P-value is the probability of observing at least one score S greater than or equal to a score x in the set of alignment scores, which is given by
(3) $P(S(A, B) \ge x) = 1 - P(S(A, B) < x) \approx 1 - \exp(-K m n e^{-\lambda x}) = 1 - e^{-E(x)}$.
However, the Karlin-Altschul model applies rigorously only to the scores of optimal ungapped alignments [10]. Although no asymptotic score distribution has yet been established for gapped alignments, computational experiments strongly suggest that optimal gapped local alignment scores still follow an extreme value distribution.
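As a concrete reading of Equations 1 and 3, the following C++ sketch evaluates E(x) and the P-value for given K and λ. The function names are illustrative and are not taken from Par-PSSE; `std::expm1` is used here only as a numerically careful way of writing 1 - exp(-E) when E is very small.

```cpp
#include <cassert>
#include <cmath>

// Expected number of distinct local alignments scoring at least x (Equation 1).
// m and n are the sequence lengths; K and lambda are assumed to be known.
double expected_hits(double K, double lambda, double m, double n, double x) {
    assert(K > 0.0 && lambda > 0.0);
    return K * m * n * std::exp(-lambda * x);
}

// P-value of score x (Equation 3): 1 - exp(-E(x)).
// -expm1(-E) equals 1 - exp(-E) but avoids cancellation for tiny E.
double gumbel_pvalue(double K, double lambda, double m, double n, double x) {
    return -std::expm1(-expected_hits(K, lambda, m, n, x));
}
```

For example, with K = 0.1, λ = 0.3, m = 100, and n = 200, a score of x = 0 gives E(0) = 2000, so the P-value is essentially 1; for large x the P-value approaches E(x) itself.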
The relevant parameters, such as K and λ, can only be determined by simulation and empirical curve fitting. The simplest fitting approach is linear regression. However, experiments have shown that linear regression is seriously inferior to maximum likelihood fitting [11]. In some scenarios, the low-scoring tail may be contaminated with very poor-scoring sequences that do not conform to an extreme value distribution, so a better scheme, called “Type I censored” fitting, is often used, in which any score $x_i < c$ (with the cutoff c chosen a priori) is excluded from the fit. Given the scoring scheme SC and the number of permutations N, the pairwise statistical significance presented in [2, 3] can be described as the following function:
(4) $PSS(A, B, m, n, SC, N)$.
The PSS function first generates N random sequences of equal length n by permuting sequence B. It then produces N scores by aligning A against the N permuted copies of B and fits these scores to a Gumbel-type EVD. Finally, it reports the pairwise statistical significance of the alignment score of A and B using Equation 3 after estimating the K and λ parameters. Compared to the database statistical significance used by popular alignment tools such as BLAST, FASTA, and SSEARCH, PSSE is specific to the sequence pair being aligned and is independent of any database.
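To make the fitting step more tangible, here is a minimal sketch of maximum likelihood fitting of a Gumbel distribution to a set of scores. It is a simplification and not the Par-PSSE routine: it fits the complete (uncensored) sample, whereas the text above describes Type I censored fitting, and it solves the likelihood equation for λ by bisection, which is an implementation choice made here. All names are hypothetical. K is recovered from the fitted location μ through $K m n = e^{\lambda \mu}$, which follows from matching Equation 2 to the Gumbel form $\exp(-e^{-\lambda(x-\mu)})$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Fitted Gumbel parameters: lambda (scale) and mu (location).
struct GumbelFit {
    double lambda;
    double mu;
};

// Left-hand side of the likelihood equation for lambda:
//   1/lambda - mean(x) + sum(x_i * w_i) / sum(w_i),  w_i = exp(-lambda * x_i).
// Weights are shifted by min(x) so that they stay in (0, 1].
double lambda_equation(const std::vector<double>& x, double lambda) {
    double xmin = *std::min_element(x.begin(), x.end());
    double mean = std::accumulate(x.begin(), x.end(), 0.0) / x.size();
    double sw = 0.0, sxw = 0.0;
    for (double xi : x) {
        double w = std::exp(-lambda * (xi - xmin));
        sw += w;
        sxw += xi * w;
    }
    return 1.0 / lambda - mean + sxw / sw;
}

// Uncensored maximum likelihood fit; the equation is decreasing in lambda,
// so a bracketing bisection is sufficient.
GumbelFit fit_gumbel(const std::vector<double>& x) {
    assert(x.size() >= 2);
    double xmin = *std::min_element(x.begin(), x.end());
    double xmax = *std::max_element(x.begin(), x.end());
    assert(xmax > xmin);  // scores must not all be identical
    double lo = 1e-6, hi = 1.0;
    while (lambda_equation(x, lo) < 0.0) lo *= 0.5;
    while (lambda_equation(x, hi) > 0.0) hi *= 2.0;
    for (int it = 0; it < 200; ++it) {
        double mid = 0.5 * (lo + hi);
        if (lambda_equation(x, mid) > 0.0) lo = mid; else hi = mid;
    }
    double lambda = 0.5 * (lo + hi);
    double sw = 0.0;
    for (double xi : x) sw += std::exp(-lambda * (xi - xmin));
    double mu = xmin - std::log(sw / x.size()) / lambda;
    return {lambda, mu};
}

// K implied by the fit for sequence lengths m and n: K = exp(lambda * mu) / (m * n).
double k_from_fit(const GumbelFit& f, double m, double n) {
    return std::exp(f.lambda * f.mu) / (m * n);
}
```

A censored variant would additionally account for the scores below the cutoff c in the likelihood; that extension is left out to keep the sketch short.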

3. Extensions for standard pairwise statistical significance
To improve the retrieval accuracy of PSSE, in this paper we extend the existing work on the estimation of standard pairwise statistical significance [20] to incorporate three recent developments: (I) customizing the scoring scheme (using position-specific substitution matrices, PSSM) [7]; (II) supporting non-conservative PSS [1]; and (III) supporting analysis of errors per query versus coverage [17]. The former two works, (I) and (II), have demonstrated the potential to improve sequence comparison performance significantly compared to database statistical significance for biologically relevant and practical applications of homology detection. The extended work (III) can be used conveniently to compare different statistical significance estimation schemes in terms of retrieval accuracy [2].

3.1. Customized substitution matrix
In general, the substitution score matrix is indexed by query-sequence residue and subject-sequence residue, as in BLOSUM62, and is therefore insensitive to differences between query sequences or to the position of a residue in the query sequence. The position-specific score matrix (PSSM) introduced in [7] offers an alternative that indexes the substitution score matrix by query-sequence position and subject-sequence symbol. In a PSSM, the substitution scores are further refined. For example, the same residue (e.g., 'A') appearing at different positions of the query sequence has different scores, as shown in Figure 1. For a given query sequence, the PSSM can be pre-trained by PSI-BLAST. The standard substitution matrix of dimension $|w| \times |w|$ is replaced by a query-specific matrix of dimension $|w| \times |L|$, where $|w|$ is the size of the residue alphabet and $|L|$ is the length of the query sequence. This increases the memory requirement compared to the original layout, but reduces the number of substitution-matrix lookups significantly.
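The $|w| \times |L|$ layout can be illustrated with a short sketch. The code below is only an assumption-laden illustration of the indexing scheme: it uses a hypothetical four-letter alphabet and fills the position-indexed table from a standard matrix, whereas a real PSSM would carry position-dependent scores trained by PSI-BLAST. The names `PositionMatrix`, `build_profile`, and `symbol_index` are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical 4-letter alphabet for illustration; protein code would use 20+ symbols.
const std::string kAlphabet = "ACGT";

// Index of a symbol in the alphabet (the "w" dimension).
std::size_t symbol_index(char c) {
    std::size_t pos = kAlphabet.find(c);
    assert(pos != std::string::npos);
    return pos;
}

// Position-indexed score table: row = subject symbol, column = query position.
struct PositionMatrix {
    std::size_t query_len;
    std::vector<int> scores;  // |w| x |L|, stored row-major

    // One lookup per cell: no need to fetch the query residue first.
    int at(char subject_symbol, std::size_t query_pos) const {
        return scores[symbol_index(subject_symbol) * query_len + query_pos];
    }
};

// Expand a standard |w| x |w| matrix into a |w| x |L| table for one query.
PositionMatrix build_profile(const std::string& query,
                             const std::vector<std::vector<int>>& subst) {
    PositionMatrix pm{query.size(),
                      std::vector<int>(kAlphabet.size() * query.size())};
    for (std::size_t w = 0; w < kAlphabet.size(); ++w)
        for (std::size_t i = 0; i < query.size(); ++i)
            pm.scores[w * query.size() + i] = subst[symbol_index(query[i])][w];
    return pm;
}
```

With a match score of 2 and a mismatch score of -1, the query "GATT" yields a table in which `at('A', 1)` is 2 and `at('A', 0)` is -1, so the same subject symbol scores differently depending on the query position.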

Figure 1. Position-specific score matrix

Figure 2. The computation cells of the Smith-Waterman algorithm.

3.2. Non-conservative pairwise statistical significance
The extended work (II) uses the PSS function described earlier twice, with the two sequence inputs in different order [4]. This ensures that both sequences are permuted separately to construct two different EVDs, which results in two different estimates of PSS (PSS1 and PSS2, respectively) for the same sequence pair (A and B). Let $PSS_1 = PSS(A, B, m, n, SC, N)$ and $PSS_2 = PSS(B, A, n, m, SC, N)$; then the non-conservative pairwise statistical significance is defined as $\min\{PSS_1, PSS_2\}$.
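The definition translates directly into code. In the sketch below, `PssFn` stands in for any implementation of the PSS function that permutes its second argument; the type alias and function name are assumptions made for illustration, and the sequence lengths, scoring scheme, and N are taken to be captured inside the callable.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>

// Hypothetical signature: pss(fixed, permuted) returns a P-value in [0, 1].
using PssFn = std::function<double(const std::string&, const std::string&)>;

// Non-conservative PSS: run the estimate in both orders and keep the minimum.
double non_conservative_pss(const std::string& a, const std::string& b,
                            const PssFn& pss) {
    double pss1 = pss(a, b);  // EVD built from permutations of b
    double pss2 = pss(b, a);  // EVD built from permutations of a
    assert(pss1 >= 0.0 && pss1 <= 1.0 && pss2 >= 0.0 && pss2 <= 1.0);
    return std::min(pss1, pss2);
}
```

Because both orders are evaluated, the result is symmetric in A and B, at roughly twice the cost of a single PSS estimate.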

3.3. Errors per Query versus Coverage
Furthermore, in order to evaluate the accuracy of different approaches to statistical significance, our software library offers the popular evaluation measure Errors per Query (EPQ) versus Coverage [2, 17], which can be plotted to visualize and compare the results of statistical significance estimation. The EPQ is defined as
(4) $\mathrm{EPQ} = F_{num} / Q_{num}$,
where $F_{num}$ is the total number of non-homologous sequences detected as homologs (i.e., false positives) and $Q_{num}$ is the total number of queries. The Coverage is given by
(5) $\mathrm{Coverage} = H_d / H_t$,
where $H_d$ and $H_t$ are the number of homologous pairs detected and the total number of homologous pairs present in the sequence database, respectively. To plot the curves of Errors per Query versus Coverage, the list of pairwise comparisons is sorted by increasing P-value (that is, decreasing statistical significance). A higher position in the sorted list implies a higher probability that the two compared sequences are homologs. Therefore, while traversing the sorted list from top to bottom, the coverage count is increased by one if the two sequences of the pair are true homologs; otherwise a false positive has occurred and the error count is increased by one. The Errors per Query versus Coverage curves illustrate how much coverage is obtained at a given error level. At the same EPQ level, a curve shows better performance if it lies further to the right along the x-axis (representing Coverage).
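The traversal described above can be sketched as follows. The struct and function names are hypothetical, and the homology labels are assumed to be supplied by the benchmark; the sketch emits one curve point per comparison in the sorted list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One pairwise comparison: its P-value and whether the pair is truly homologous.
struct Comparison {
    double p_value;
    bool homolog;
};

// One point of the Errors per Query versus Coverage curve.
struct CurvePoint {
    double epq;
    double coverage;
};

// Sort by increasing P-value, then walk the list and update both counters.
std::vector<CurvePoint> epq_coverage_curve(std::vector<Comparison> hits,
                                           std::size_t num_queries,
                                           std::size_t total_homologs) {
    assert(num_queries > 0 && total_homologs > 0);
    std::stable_sort(hits.begin(), hits.end(),
                     [](const Comparison& a, const Comparison& b) {
                         return a.p_value < b.p_value;
                     });
    std::vector<CurvePoint> curve;
    std::size_t errors = 0, covered = 0;
    for (const Comparison& h : hits) {
        if (h.homolog) ++covered; else ++errors;  // true homolog or false positive
        curve.push_back({static_cast<double>(errors) / num_queries,
                         static_cast<double>(covered) / total_homologs});
    }
    return curve;
}
```

For instance, with two queries, four homologous pairs in total, and hits whose P-values sort as (homolog, homolog, non-homolog, homolog), the curve passes through (EPQ, Coverage) = (0, 0.25), (0, 0.5), (0.5, 0.5), and (0.5, 0.75).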

4. Parallel implementations of pairwise statistical significance estimation
Careful analysis of the data pipeline of PSSE shows that it can be decomposed into three main computation kernels: (I) Permutation: produce N random sequences by permuting sequence B; (II) Alignment: align sequence A against the N permuted copies of sequence B using the Smith-Waterman (SW) algorithm [18]; and (III) Fitting: fit the N alignment scores generated in (II) to an EVD in order to obtain the constants K and λ, and finally return the PSS of the sequence pair A and B according to Equation 3. Kernel II accounts for the overwhelming majority of the overall execution time (from 92.4% to 99.9%, depending on the lengths of the aligned sequences). As a result, reducing the alignment time significantly improves the overall performance.
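Kernels I and II can be sketched sequentially as shown below. This is a simplified illustration, not the Par-PSSE code: the alignment uses a fixed match/mismatch score and a linear gap penalty instead of a substitution matrix with gap opening and extension penalties, and the function names and default scores are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Kernel I: a random permutation of b (same length and composition).
std::string permute(std::string b, std::mt19937& rng) {
    std::shuffle(b.begin(), b.end(), rng);
    return b;
}

// Kernel II: Smith-Waterman local alignment score with a linear gap penalty.
// Only two rows of the dynamic programming table are kept in memory.
int sw_score(const std::string& a, const std::string& b,
             int match = 2, int mismatch = -1, int gap = -2) {
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = 0;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up = prev[j] + gap;
            int left = cur[j - 1] + gap;
            cur[j] = std::max({0, diag, up, left});  // local alignment floor at 0
            best = std::max(best, cur[j]);
        }
        std::swap(prev, cur);
    }
    return best;
}

// Kernels I + II: the N scores that form the empirical score distribution.
std::vector<int> permutation_scores(const std::string& a, const std::string& b,
                                    int n_perm, unsigned seed) {
    assert(n_perm >= 0);
    std::mt19937 rng(seed);
    std::vector<int> scores;
    scores.reserve(static_cast<std::size_t>(n_perm));
    for (int k = 0; k < n_perm; ++k)
        scores.push_back(sw_score(a, permute(b, rng)));
    return scores;
}
```

Each call to `sw_score` costs time proportional to the product of the two sequence lengths, and it is repeated N times per pair, which is consistent with the alignment kernel dominating the run time.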

4.1. MPI implementation
As described in the previous section, the individual permutation and alignment tasks in PSSE are independent of one another. Therefore, PSSE maps very well to MPI task parallelism. We design two approaches for the MPI implementation, called the intuitive strategy and the tiling strategy, respectively.

4.1.1. Intuitive strategy
Let P be the number of MPI processes and N be the number of permutations. In [5], the root process broadcasts a query sequence, A, and a subject sequence, B, to all other processes. Each process then computes $\lceil N/P \rceil$ permutations of B and aligns them against A. Subsequently, the root process gathers the $\lceil N/P \rceil$ alignment scores from each process, fits them to an EVD to obtain the statistical parameters K and λ, and then computes the pairwise statistical significance PSS using Equation 3. The intuitive strategy simply extends [5] to multi-pair PSSE: each pair of query and subject sequences chosen from the sequence database is processed in turn until every sequence pair has been enumerated. Clearly, the main disadvantage of this strategy is the higher communication overhead among processes, since the query and subject sequences need to be broadcast to all processes. Furthermore, extra communication is required to gather the alignment scores at the root process for the fitting task, and all processes except the root are idle while the fit is performed.

4.1.2. Tiling strategy
The tiling strategy employs more coarse-grained parallelism than the intuitive strategy. More specifically, this strategy partitions the sequence database into disjoint sets of subject sequences and assigns one set to each process. Each process therefore has all the query sequences and a subset of the subject sequences, which means that the PSSE of each pair is computed independently of the other processes.
This improves data locality within each process and thus reduces the communication overhead among processes. In addition, the workload of each process should be roughly equal in order to achieve high parallel efficiency. For PSSE, the workload of a process mainly depends on the lengths of the query and subject sequences. Before tiling the sequences, we therefore reorder the subject sequences based on their lengths. Instead of distributing a roughly equal number of subject sequences to each process, this approach partitions the sequences into bins based on the total number of sequence residues. Unlike the previous strategy, there is no need to gather the alignment scores generated on every node, as each node can complete PSSE separately. Algorithm 1 outlines the MPI implementation. Before the sequences are partitioned, we first sort them by increasing length (line 6). Then, for each query A chosen from the query sequence database Dq, we load the corresponding substitution matrix (a standard matrix such as BLOSUM62, or a position-specific scoring matrix) (line 11). We then schedule P processes to handle the P partitions in parallel. A given process $p_x$ ($0 \le x \le P - 1$) runs, for each subject sequence B selected from Bin[x], a loop of N iterations (lines 16-19) for the permutation and alignment tasks. In each iteration of the loop, we produce a new random sequence, called Bk, by permuting sequence B (lines 17-18), and use the SW algorithm to align Bk against the query sequence A to obtain an alignment score (line 19). Subsequently, to obtain the statistical parameters K and λ, the N alignment scores are fitted to an EVD using censored maximum likelihood fitting (line 20). Finally, the PSS is obtained using Equation 3 (line 21). If non-conservative PSS is needed, sequences A and B are swapped, $p_x$ jumps back to the beginning of the alignment section (line 15), and the same procedure is repeated (lines 16-21). Correspondingly, the non-conservative PSS is the minimum of the two pairwise statistical significance values (lines 22-25).
After one round is done (lines 10-25), in which one query sequence is aligned against all the subject sequences, all PSS values are gathered at process 0 (the root), and the PSS list is sorted by decreasing statistical significance using quicksort (line 27). After that, Errors per Query and Coverage are obtained according to Equation 4 and Equation 5, respectively (line 28).

Algorithm 1: Pseudo-code of MPI implementation for PSS estimation.
Input: Dq: query sequence database; Ds: subject sequence database
Output: PSS: pairwise statistical significance; EPQ: errors per query
 1  Read sequences from database (Dq, Ds);
 2  Read scoring scheme SC;
 3  P ← size of processes;
 4  R ← rank of process;
 5  if R = 0 then
 6      sort length array of subject sequences ∈ Ds based on their length;
 7      dynamic partition length array into P Bins;
 8      transfer the ID information of each partition to different MPI task;
 9  for i ← 1 to |Dq| do
10      pick up query sequence A from Dq;
11      M ← substitution matrix for A;
12      for j ← 0 to |Bin[x]| − 1 do    /* MPI-level parallelization */
13          pick up subject sequence B from Bin[x];
14          t ← 0;    /* switch variable for non-conservative PSS */
15          ncPss:
16          for k ← 0 to N − 1 do    /* OpenMP-level parallelization */
17              RNs ← rand(time(null) + Tid * Tid);    /* generate random numbers */
18              Bk ← k-th permuted sequence of B;    /* permute sequence B using RNs */
19              Scores[k] ← SW(A, Bk, m, n, SC);    /* align using SW algorithm */
20          (K_j, λ_j) ← EVD_Fitting(Scores);    /* obtain K and λ using censored maximum likelihood fitting */
21          pss_t ← 1 − exp(−K_j m n e^{−λ_j x});    /* compute pairwise statistical significance using Equation 3 */
22          if nc_Pss flag = true & t = 0 then    /* if non-conservative is required */
23              swap(A, B); t ← 1; goto ncPss;
24          if nc_Pss flag = true then PSS[j] ← min(pss_0, pss_1);
25          else PSS[j] ← pss_0;
26      if R = 0 then
27          gather PSS from each process; quick sort PSS[|Ds|] list;
28          compute EPQ and Coverage;
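The length-based binning in lines 6-7 of Algorithm 1 can be illustrated with a small sketch. The paper does not spell out the exact partitioning rule, so the code below uses one common heuristic as an assumption: subject sequences are visited from longest to shortest, and each is assigned to the bin with the smallest total residue count so far. The function name and return layout are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Returns bins[x] = indices of the subject sequences assigned to process x.
// lengths[i] is the residue count of subject sequence i.
std::vector<std::vector<std::size_t>> partition_by_length(
        const std::vector<std::size_t>& lengths, std::size_t num_procs) {
    assert(num_procs > 0);
    // Visit sequences from longest to shortest.
    std::vector<std::size_t> order(lengths.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return lengths[a] > lengths[b]; });

    // Min-heap of (total residues so far, bin id).
    using Load = std::pair<std::size_t, std::size_t>;
    std::priority_queue<Load, std::vector<Load>, std::greater<Load>> heap;
    for (std::size_t x = 0; x < num_procs; ++x) heap.emplace(std::size_t{0}, x);

    std::vector<std::vector<std::size_t>> bins(num_procs);
    for (std::size_t idx : order) {
        Load least = heap.top();  // currently lightest bin
        heap.pop();
        bins[least.second].push_back(idx);
        least.first += lengths[idx];
        heap.push(least);
    }
    return bins;
}
```

With lengths {100, 90, 50, 40, 10, 10} and two processes, both bins end up with 150 residues, whereas splitting by sequence count alone (three sequences each, in input order) would give 240 versus 60.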

4.2. OpenMP implementation
Because the permutation and alignment operations of PSSE are independent of one another, PSSE can also be easily mapped to the OpenMP paradigm. In general, OpenMP can only be applied on a single shared-memory computer; therefore, our pure OpenMP implementation focuses on a single multi-core processor. To enhance the parallelism of the OpenMP implementation, we have investigated two approaches for parallelizing the main kernel of PSSE:
(1) Inter-parallelism: each task as a whole is assigned to one thread. In this case, each thread works on both the permutation and the alignment, and multiple permutation and alignment tasks are executed in parallel by different threads.
(2) Intra-parallelism: one alignment task is assigned to all the threads, which cooperate to carry out the task in parallel by exploiting the independence of the cells on each anti-diagonal, as shown in Figure 2. However, because of the higher overhead of synchronization and communication among threads, it is hard to achieve good performance with this approach. In our experiments, it shows no performance improvement over the former approach.
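The inter-parallelism approach can be sketched with a single OpenMP work-sharing loop. This is an illustrative sketch rather than the Par-PSSE source: the alignment routine is passed in as a callable (`ScoreFn`, a hypothetical alias), and each iteration seeds its own generator from `base_seed + k`, which is a choice made here so that results do not depend on the number of threads. The pragma is guarded so that the code also builds as plain sequential C++ when OpenMP is not enabled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Hypothetical alignment callback: returns the local alignment score of (a, b).
// It must be safe to call concurrently from several threads.
using ScoreFn = std::function<int(const std::string&, const std::string&)>;

// Inter-parallelism: each iteration (one permutation plus one alignment)
// is handled as a whole by a single thread.
std::vector<int> parallel_permutation_scores(const std::string& a,
                                             const std::string& b,
                                             int n_perm, unsigned base_seed,
                                             const ScoreFn& align) {
    assert(n_perm >= 0);
    std::vector<int> scores(static_cast<std::size_t>(n_perm));
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (int k = 0; k < n_perm; ++k) {
        // Private generator per iteration: no shared random state between threads.
        std::mt19937 rng(base_seed + static_cast<unsigned>(k));
        std::string bk = b;
        std::shuffle(bk.begin(), bk.end(), rng);
        scores[static_cast<std::size_t>(k)] = align(a, bk);  // distinct slot per k
    }
    return scores;
}
```

Since every iteration writes to its own element of `scores` and owns its generator, no synchronization is needed inside the loop; this is what makes the inter-parallel variant cheaper than cooperating on the cells of a single alignment.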

4.3. Hybrid implementation of OpenMP and MPI
An application built with the hybrid model of parallel programming matches the characteristics of a cluster of multi-core nodes very well, since it allows for a two-level communication pattern (i.e., intra-node communication among the cores within a node and inter-node communication among nodes). In this pattern, OpenMP can be transparently extended to non-shared memory systems. Our hybrid implementation is a combination of our OpenMP and MPI implementations and consists of a hierarchy of MPI tasks and OpenMP threads. To employ the hybrid paradigm, we simply use OpenMP to parallelize the loop marked by the comment “OpenMP-level parallelization” in Algorithm 1. We use the tiling strategy to first distribute the workload over multiple MPI processes. A given MPI process $p_x$ handles its own partition Bin[x] of the subject sequences. When $p_x$ picks up one pair of sequences (A, B) with B belonging to Bin[x], it spawns T threads that cooperatively complete the permutation and alignment kernels (lines 16-19 in Algorithm 1). Once they are done, the threads exit and the MPI task $p_x$ uses all the alignment scores to perform the fitting and compute the PSS.

5. Experimental results and discussion
The experiments are carried out on Cray XE6 machines (provided by the Hopper system at NERSC), each of which has 24 cores and 32 GB of memory. Each Cray XE6 machine contains two twelve-core AMD® "Magny-Cours" 2.1 GHz processors and runs a 64-bit Linux operating system. For the flat-MPI and hybrid implementations, we use 1024 cores spread across 43 nodes. For the OpenMP implementation, we use the 24 cores provided by a single node, since the current OpenMP paradigm cannot be applied across more than one node on the Cray platform. The sequence data used in this work comprise a non-redundant subset of the CATH 2.3 database [17]. This dataset consists of 2771 domain sequences, which serve as our subject sequence library, and includes 86 CATH queries, which serve as our query set. The number of permutations N is set to 1000, as in previous studies on PSSE [19, 20].

5.1 Results and analysis of MPI implementation
We first compare the performance of the two flat-MPI strategies. When the number of MPI tasks P < 64, there is little difference in speedup between the two strategies. When P > 64, owing to its lower communication overhead and better load balancing, the flat-MPI implementation using the tiling strategy shows much better scalability and higher performance than the intuitive strategy. Figure 3-(a) shows the experimental results of the flat-MPI implementation. All speedups are computed over the corresponding sequential implementation. The maximum speedups of the tiling and intuitive strategies are 452.93× and 92.92×, obtained with 1024 and 256 MPI processes, respectively. As expected, non-conservative PSSE takes nearly twice as long to estimate pairwise statistical significance, since it requires twice as many permutations and alignments as standard PSSE. Its speedup, however, is similar to that of standard PSSE.



Figure 3. The speedup of MPI and OpenMP implementation.

5.2 Results and analysis of OpenMP implementation
The amount of computation for a given query sequence depends on its length. To test the influence of sequence length, we compare the performance for query sequences of several lengths. Four query sequences of length 200, 400, 800, and 1600 are chosen from CATH and aligned against all 2771 subject sequences. We have tested different numbers of OpenMP threads T. Figure 3-(b) shows the experimental results of the OpenMP implementation. The maximum speedup is 6.10×, 9.61×, 14.55×, and 18.94× for query lengths of 200, 400, 800, and 1600, respectively. Observe that performance increases with the length of the query sequence. Although the non-uniform memory architecture (NUMA) used by the Hopper machine provides low-level memory characteristics (such as memory bandwidth) and higher scalability, it also brings a disadvantage (the NUMA effect). Each NUMA node used is a four-chip node, each chip of which contains a six-core die. Although all the memory in NUMA is transparently accessible, if a process running on chip i accesses memory connected to a different chip j (where i ≠ j), a performance penalty is incurred because of the NUMA effect. When the amount of computation (represented by the length of the sequence) is lower, the performance is more sensitive to this extra overhead. This sheds light on why the speedup using 8 threads is even slightly lower than that using 6 threads when the length of the query sequence is 200: the benefit of adding threads is smaller than the overhead generated by the NUMA effect.

5.3 Results and analysis of hybrid implementation
For the hybrid implementation on multiple nodes, the experiments are carried out for different combinations of MPI tasks P and OpenMP threads T, subject to P × T = 1024 and T < 24. With T = 4 and P = 256, the performance reaches a maximum speedup of 621.73×. Again, a substantial reduction in memory usage occurs with an increasing number of OpenMP threads, at the cost of some performance loss. When T > 6, the NUMA effect hurts performance significantly, for the same reason analyzed in the previous subsection. We show the results in Figure 4.

Figure 4. The speedup of hybrid implementation.

In summary, the MPI paradigm usually obtains good performance and scalability but consumes more memory. The OpenMP paradigm may not improve the performance of an application much, since thread-level synchronization and communication bring extra overhead, but it can decrease memory usage, in some cases very dramatically, which gives larger problems a better chance of being addressed. The hybrid paradigm can harvest the benefits offered by both OpenMP and MPI, resulting in the best performance. However, this outcome is not guaranteed: the parameters, such as the number of threads and processes, need to be tuned to find the best performance for the specific architecture of the machine used by a given application.

5.4. Verifications
In this section, we compare the retrieval accuracy of pairwise statistical significance against that of database statistical significance, as well as the retrieval accuracy of pairwise statistical significance using different substitution matrices. In our experiments, the coverage of each of the 86 queries at the 1st, 3rd, 10th, 30th, and 100th error is reported, and the median coverage at each error level is compared among the different sequence-comparison schemes. As shown in Figure 5-(a), pairwise statistical significance performs significantly better than database statistical significance (used by BLAST, PSI-BLAST, and SSEARCH) in terms of retrieval accuracy (by 35%-40%, depending on the scheme). Non-conservative pairwise statistical significance achieves the best performance. As for PSS using different substitution matrices, the methods using PSSM achieve the best performance, followed by those using sequence-specific substitution matrices (SSSM) [2]. At the same level of Errors per Query, the Coverage of the approach using a position-specific scoring matrix is about 25% higher than that of the approach using a standard substitution matrix (such as BLOSUM62), as shown in Figure 5-(b). The software is available for free academic use at http://cucis.ece.northwestern.edu/projects/PSSE/.

Figure 5. Errors per Query vs. Coverage plots for different methods of statistical significance.

6. Conclusions
In this paper, we present a software library for estimating the pairwise statistical significance of local sequence alignment in parallel, which employs the OpenMP, MPI, and hybrid paradigms to build a high-performance accelerator. By distributing the most compute-intensive tasks of permutation and alignment, our accelerator harvests the power of these parallelization paradigms, which results in an end-to-end speedup of up to 621.73×. In addition, the proposed strategies and the efficient framework of PSSE are applicable to many other comparison-based applications in next-generation sequencing, such as phylogenetic tree construction, DNA sequence mapping, and multiple sequence alignment [9, 13].

7. Acknowledgment
This work is supported in part by NSF award numbers CCF-0621443, OCI-0724599, CCF-0833131, CNS-0830927, and OCI-1144061, and in part by DOE grants DE-FC02-07ER25808, DE-FG02-08ER25848, DE-SC0001283, DE-SC0005309, and DE-SC0005340, the National Natural Science Foundation of China (Contract No. 60973118), and the China Scholarship Council. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.



8. References
[1] Ankit Agrawal and Xiaoqiu Huang. “Conservative, non-conservative and average pairwise statistical significance of local sequence alignment”. International Conference on Bioinformatics and Biomedicine, pp. 433-436, 2008.
[2] Ankit Agrawal and Xiaoqiu Huang. “Pairwise statistical significance of local sequence alignment using sequence-specific and position-specific substitution matrices”. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no.1, pp. 194-205, 2011.
[3] Ankit Agrawal, Volker Brendel and Xiaoqiu Huang. “Pairwise statistical significance versus database statistical significance for local alignment of protein sequences”. Bioinformatics Research and Applications, vol.4983, pp. 50-61, 2008.
[4] Ankit Agrawal, Alok N. Choudhary and Xiaoqiu Huang. “Non-conservative pairwise statistical significance of local sequence alignment using position-specific substitution matrices”. International Conference on Bioinformatics & Computational Biology, pp. 262-268, 2010.
[5] Ankit Agrawal, Sanchit Misra, Daniel Honbo and Alok N. Choudhary. “MPIPairwiseStatSig: parallel pairwise statistical significance estimation of local sequence alignment”. High-Performance Parallel and Distributed Computing, pp. 470-476, 2010.
[6] Stephen F. Altschul, Warren Gish, Webb Miller, Eugene Myers and David J. Lipman. “Basic local alignment search tool”. Journal of Molecular Biology, vol.215, pp. 403-410, 1990.
[7] Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller and David J. Lipman. “Gapped BLAST and PSI-BLAST: A new generation of protein database search programs”. Nucleic Acids Research, vol.25, no.17, pp. 3389-3402, 1997.
[8] Bastien Olivier and Maréchal Eric. “Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores”. BMC Bioinformatics, vol.9, no.1, pp. 332, 2008.
[9] Zhanmao Cao, Jianchao Tang, Shizhong Jiang and Huixuan Wu. “Optimization multiple sequence alignment scheme in DC-BTA”. JDCTA: International Journal of Digital Content Technology and its Applications, vol.8, no.8, pp. 55-64, 2010.
[10] Sean R. Eddy. “A probabilistic model of local sequence alignment that simplifies statistical significance estimation”. PLoS Computational Biology, vol.4, no.5, 2008.
[11] Sean R. Eddy. “Maximum likelihood fitting of extreme value distributions”. 1997 (unpublished work).
[12] Emil J. Gumbel. Statistics of extremes. Addison-Wesley, USA, 2004.
[13] Tie-jun Jia and Xi-qiang Zhao. “An improved multiple sequence alignment genetic algorithm”. JDCTA: International Journal of Digital Content Technology and its Applications, vol.5, no.12, pp. 436-444, 2011.
[14] Samuel Karlin and Stephen Altschul. “Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes”. Proceedings of the National Academy of Sciences, vol.87, no.6, pp. 2264-2268, 1990.
[15] Richard Mott. “Accurate formula for p-values of gapped local sequence and profile alignments”. Journal of Molecular Biology, vol.300, pp. 649-659, 2000.
[16] William R. Pearson. “Empirical statistical estimates for sequence similarity searches”. Journal of Molecular Biology, vol.276, pp. 71-84, 1998.
[17] Michael L. Sierk and William R. Pearson. “Sensitivity and selectivity in protein structure comparison”. Protein Science, vol.13, no.3, pp. 773-785, 2004.
[18] Temple F. Smith and Michael S. Waterman. “Identification of common molecular subsequences”. Journal of Molecular Biology, vol.147, pp. 195-197, 1981.
[19] Yuhong Zhang, Md. Mostofa Ali Patwary, Sanchit Misra, Ankit Agrawal, Wei-keng Liao, Zhiguang Qin and Alok N. Choudhary. “Accelerating pairwise statistical significance estimation for local alignment by harvesting GPU’s power”. BMC Bioinformatics, 2012 (Accepted).
[20] Yuhong Zhang, Md. Mostofa Ali Patwary, Sanchit Misra, Ankit Agrawal, Wei-keng Liao, and Alok N. Choudhary. “Enhancing parallelism of pairwise statistical significance estimation for local sequence alignment”. Workshop on Hybrid Multi-core Computing, pp. 1-8, 2011.