
IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 7, NO. 4, OCTOBER-DECEMBER 2010

Guest Editors’ Introduction to the Special Section on Bioinformatics Research and Applications

Ion Mandoiu, Giri Narasimhan, Yi Pan, and Yanqing Zhang

This special section includes a selection of papers presented at the Fifth International Symposium on Bioinformatics Research and Applications (ISBRA), which was held on 13-16 May 2009 at Nova Southeastern University in Fort Lauderdale, Florida. The ISBRA symposium provides a forum for the exchange of ideas and results among researchers, developers, and practitioners working on all aspects of bioinformatics and computational biology and their applications. In 2009, 55 papers were submitted in response to the ISBRA call for papers, of which 26 appeared in the proceedings published as volume 5542 of Springer-Verlag’s Lecture Notes in Bioinformatics series. Following a rigorous review process, extended versions of six of these papers were selected for publication in this special section. The selected papers cover a broad range of bioinformatics topics, ranging from comparative genomics and phylogenetics to population genetics, and from RNA structure prediction to analysis of protein-protein interaction networks. Below, we briefly introduce each of them.

Munoz and Sankoff explore methods for computing rearrangement distances between genomes for which contig sequences, but not necessarily full chromosomes, are available. Addressing this problem is very timely, since an increasing number of genomes are released in draft format, and new algorithms are needed to extract information from such fragmentary data. The authors propose treating each contig as a chromosome, then correcting intercontig distances computed with standard rearrangement distance algorithms to account for the number of extra fusion operations needed to assemble contigs into full chromosome sequences. For the case of comparing a fragmented genome to a complete genome, the authors show that the distance estimate depends linearly on the number of contigs and offer formulas to correct the distance estimates accordingly. A similar linear dependence is observed when both genomes are fragmented.
The authors show that corrected distances can be used to accurately reconstruct the phylogeny of a group of insects, including 10 Drosophila species.

• I. Mandoiu is with the Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Road, Unit 2155, Storrs, CT 06269-2155. E-mail: [email protected].
• G. Narasimhan is with the School of Computing & Information Science, Florida International University, 11200 SW 8th Street, University Park, Miami, FL 33199. E-mail: [email protected].
• Y. Pan and Y. Zhang are with the Computer Science Department, Georgia State University, Atlanta, GA 30303-4110. E-mail: {pan, yzhang}@cs.gsu.edu.

For information on obtaining reprints of this article, please send e-mail to: [email protected].
1545-5963/10/$26.00 © 2010 IEEE

Venkatachalam et al. present new algorithms for several optimization problems on tanglegrams. Tanglegrams, which are widely used in biology to compare phylogenies and infer phenomena such as gene transfers, are pairs of rooted trees whose leaves are joined by a perfect matching. The main problem considered by the authors is finding a drawing of the trees that minimizes the number of crossings between the edges of the matching. The authors give efficient algorithms for the case when one tree is fixed, and a new fixed-parameter tractable algorithm for the case when both trees can be rearranged. The latter algorithm settles an open question on the complexity of minimizing the number of crossings for d-ary trees with d > 2. They also consider a variant of the problem that seeks to minimize Spearman’s footrule distance instead of the crossing number, and give integer programming formulations for optimizing both objectives.

Bonizzoni et al. investigate new algorithms for the pure parsimony XOR haplotyping (PPXH) problem. For diploid organisms, the XOR genotype is a binary vector in which 0 and 1 represent homozygous and heterozygous SNP loci, respectively. Since XOR genotypes can be determined using inexpensive techniques, the problem of reconstructing haplotypes from a set of XOR genotypes naturally arises. Under the pure parsimony model, the objective is to determine a smallest set of haplotypes that explains a given set of XOR genotypes. The authors introduce a graph representation of the set of solutions and establish several interesting combinatorial properties that lead to polynomial-time algorithms for some restricted versions of the PPXH problem. They also give fixed-parameter and approximation algorithms for the general version of the problem, as well as a practical heuristic.
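To make the XOR genotype definition concrete, the following minimal sketch computes the XOR genotype of a pair of haplotypes and checks whether a candidate haplotype set explains a set of XOR genotypes. It only illustrates the problem statement and is not one of the algorithms of Bonizzoni et al.; the encoding of haplotypes as 0/1 tuples and the function names `xor_genotype` and `explains` are assumptions made for illustration.

```python
from itertools import combinations_with_replacement


def xor_genotype(h1, h2):
    """Return the XOR genotype of two haplotypes (equal-length 0/1 tuples).

    A 1 marks a locus where the haplotypes differ (heterozygous);
    a 0 marks a locus where they agree (homozygous).
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNP loci")
    return tuple(a ^ b for a, b in zip(h1, h2))


def explains(haplotypes, genotypes):
    """Check that each genotype is the XOR of some pair of the haplotypes.

    Pairs may reuse the same haplotype twice, which yields the
    all-zero (fully homozygous) genotype.
    """
    reachable = {
        xor_genotype(a, b)
        for a, b in combinations_with_replacement(haplotypes, 2)
    }
    return all(tuple(g) in reachable for g in genotypes)


# Hypothetical toy data: three haplotypes over three SNP loci.
haplotypes = [(0, 0, 0), (1, 0, 1), (0, 1, 1)]
genotypes = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]

print(xor_genotype(haplotypes[1], haplotypes[2]))
print(explains(haplotypes, genotypes))
```

Under the pure parsimony objective, one would look for the smallest haplotype set for which `explains` holds; the exhaustive pair check above is only practical for toy inputs.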
Wu describes a new dynamic programming algorithm for computing the exact likelihood of a set of sequences under the infinite sites coalescent model. His method relies on a classical recurrence due to Ethier, Griffiths, and Tavaré. The key to the improved efficiency is to calculate the probabilities forward in time, from the root of the perfect phylogeny of the data toward its leaves. This results in an appreciable reduction in runtime and memory usage. The author presents experimental results showing the feasibility of exact likelihood computations for simulated and real data sets of moderate size, and uses the method to assess the accuracy of a popular approximation method based on importance sampling.

Published by the IEEE CS, CI, and EMB Societies & the ACM


Rajasekaran et al. present improved parsing algorithms for two types of grammars, Simple Linear Tree Adjoining Grammars (SLTAG) and Extended SLTAG (ESLTAG), previously introduced to model RNA folding with pseudoknots. The new algorithms have worst-case time complexity matching previous results of Uemura et al. but achieve improved practical performance by exploiting the sparsity of the underlying TAG parsing matrix. Experimental results on test sequences from the Rfam, Pseudobase, and tmRNA databases confirm the significantly improved practical performance.

Blin et al. introduce a new algorithm to query protein interaction networks for the presence of a subgraph similar to a given query graph. As with the previously proposed QNet method, the new algorithm uses dynamic programming and color-coding techniques to align a tree query to an arbitrary graph. To transform cyclic query graphs into trees without loss of information, the authors use a feedback vertex set and node duplications, which gives an alternative to the tree decomposition of the query used by QNet. The authors provide a Python implementation of their algorithm, called PADA1, and validate it on several PPI data sets.

We would like to thank all ISBRA authors for their high-quality contributions, and the ISBRA Program Committee and anonymous reviewers for volunteering their time and expertise to evaluate the manuscripts submitted to the symposium and the special section. Last, but not least, we would also like to thank the Editor-in-Chief, Dr. Marie-France Sagot, for providing us with the opportunity to showcase some of the exciting research presented at ISBRA in this special section of the IEEE/ACM Transactions on Computational Biology and Bioinformatics.

Ion Mandoiu
Giri Narasimhan
Yi Pan
Yanqing Zhang
Guest Editors

Ion Mandoiu received the MS degree from Bucharest University in 1992 and the PhD degree from the Georgia Institute of Technology in 2000, both in computer science. He is an associate professor in the Computer Science and Engineering Department at the University of Connecticut, Storrs. His main research interests are in the design and analysis of approximation algorithms for NP-hard optimization problems, particularly in the areas of bioinformatics, design automation, and ad-hoc wireless networks, areas in which he has authored more than 70 refereed journal and conference proceeding papers. He has also coedited the book Bioinformatics Algorithms: Techniques and Applications, published in the Wiley Book Series on Bioinformatics. Dr. Mandoiu has served on the program committee of numerous international conferences, and as program committee cochair for the 2006-2009 International Symposium on Bioinformatics Research and Applications and the 2007 IEEE International Conference on Bioinformatics and Biomedicine. He serves as an associate editor for BMC Bioinformatics and is on the editorial board of the International Journal of Bioinformatics Research and Applications. He has also been a guest editor for the IEEE/ACM Transactions on Computational Biology and Bioinformatics, IEEE Transactions on NanoBioscience, the International Journal of Wireless and Mobile Computing, and the Journal of Universal Computer Science. Dr. Mandoiu is a 2006 recipient of the US National Science Foundation Faculty Early Career Development Award.


Giri Narasimhan received the BTech degree in electrical engineering from the Indian Institute of Technology, Bombay, and the PhD degree in computer science from the University of Wisconsin-Madison. He is a professor in the School of Computing and Information Sciences at Florida International University (FIU). He is also currently the associate dean for Research and Graduate Studies in the College of Engineering and Computing at FIU. His research interests are in the design and analysis of algorithms and bioinformatics. He has more than 100 refereed publications to his credit in the form of journal publications, conference proceedings, and book chapters. He is the coauthor of a monograph titled Geometric Spanner Networks published by Cambridge University Press and has coedited two conference proceedings. He has been on the program committee of numerous international conferences and is on the editorial board of three international journals. His research has been funded by the US National Science Foundation, US National Institutes of Health, state agencies, and industry.

Yi Pan received the BEng and MEng degrees in computer engineering from Tsinghua University, China, in 1982 and 1984, respectively, and the PhD degree in computer science from the University of Pittsburgh, Pennsylvania, in 1991. He is the chair and a professor in the Department of Computer Science and a professor in the Department of Computer Information Systems at Georgia State University. Dr. Pan’s research interests include parallel and distributed computing, networks, and bioinformatics. Dr. Pan has published more than 100 journal papers, with more than 40 papers published in various IEEE journals. In addition, he has published more than 100 papers in refereed conferences. He has also authored/edited 34 books (including proceedings) and contributed many book chapters. Dr. Pan has served as an editor-in-chief or editorial board member for 15 journals, including five IEEE transactions, and was a guest editor for 10 journals, including the IEEE/ACM Transactions on Computational Biology and Bioinformatics and IEEE Transactions on NanoBioscience. He has organized several international conferences and workshops and has also served as a program committee member for several major international conferences such as INFOCOM, GLOBECOM, and ICC. Dr. Pan has delivered more than 10 keynote speeches at international conferences and is a speaker for several distinguished speaker series. He is listed in Men of Achievement, Who’s Who in Midwest, Who’s Who in America, Who’s Who in American Education, Who’s Who in Computational Science and Engineering, and Who’s Who of Asian Americans.

Yanqing Zhang received the BS and MS degrees in computer science from Tianjin University, China, in 1983 and 1986, respectively, and the PhD degree in computer science from the University of South Florida, Tampa, in 1997. He is an associate professor in the Computer Science Department at Georgia State University. His research interests include hybrid intelligent systems, computational intelligence, machine learning, data mining, bioinformatics, Web intelligence, and green computing. He has coauthored two books and coedited two books and four conference proceedings. He has published 15 book chapters, 65 journal papers, and more than 130 conference/workshop papers. He has served as a program committee member for more than 100 international conferences and workshops. He was program cochair for several symposiums and conferences. He is managing editor of the International Journal of Functional Informatics and Personalised Medicine, associate editor of the Journal of Computational Intelligence in Bioinformatics and Systems Biology, and an editorial board member of several international journals. He has received several awards, including the Outstanding Academic Service Award at the Seventh IEEE International Conference on Bioinformatics & Bioengineering (BIBE 2007) and the Outstanding Service Award at the 2005 IEEE International Conference on Granular Computing. He is a member of the Bioinformatics and Bioengineering Technical Committee and the Data Mining Technical Committee of the IEEE Computational Intelligence Society.