[from the GUEST EDITORS]

Nevenka Dimitrova, Judith Klein-Seetharaman, Peter Beyerlein, and Ahmed H. Tewfik

Signal Processing in Genomics and Proteomics: Defining Its Role

Digital Object Identifier 10.1109/MSP.2011.943214
Date of publication: 8 December 2011

It has only been a decade since the first full draft of the human genome, yet we have seen an explosion of genomic and proteomic exploration technologies on the "journey to the center of biology," as Eric Lander put it. Massive amounts of data are available for research in molecular biology and medicine, and new computational approaches are required to elucidate their biological meaning. The January 2007 issue of IEEE Signal Processing Magazine was dedicated to the role of signal processing in the acquisition and analysis of these data (vol. 24, no. 1). The rapid (r)evolution from microarrays to next-generation sequencing and proteomic technologies requires an update of the state of the art and future challenges for signal processing in genomics and proteomics, defining its role in current computational and systems biology.

Recent high-throughput studies probe regulatory mechanisms through gene and noncoding RNA expression [e.g., microRNA (miRNA) and long noncoding RNA (lncRNA)], genomic single-nucleotide and gene copy number variations, DNA methylation, histone methylation and other modifications, and chromatin remodeling. Whole-genome sequencing is approaching wide affordability. The latest sequencing technology is fast producing new data (RNA-seq, ChIP-seq, DNA-seq, and methyl-seq) on a whole-genome scale at single-nucleotide resolution. New approaches in quantitative proteomics estimate the amounts and post-translational modification status of proteins, allowing the building of dynamic models as functions of time, species origin, and internal or external perturbation. One of the interesting developments in this area is integrating approaches to genomic, transcriptomic, and epigenetic regulation via biological pathway knowledge.

The article by Yoon et al., "Comparative Analysis of Biological Networks," provides a thorough mathematical analysis of biological networks, employing the well-understood framework of hidden Markov models (HMMs). Today, many facts about biomolecule interactions are provided in

databases and are subsequently summarized in network models. Network analysis has become a substantial tool for analyzing biomolecular pathways. Comparing complex networks is a difficult task, even for the human brain. The authors review existing comparative network analysis methods, derive optimal network querying and local network alignment using HMMs, and estimate functional similarity through Markov random walks. They succeed in embedding the problem of comparative network analysis into the well-established and powerful HMM framework.

IEEE SIGNAL PROCESSING MAGAZINE [19] JANUARY 2012

One of the major application areas for genomic signal processing is biomarker discovery. The statistical challenges in discovering robust biomarkers with high-throughput molecular profiling technologies stem from the innate heterogeneity of disease coupled with the feature-rich, case-poor nature of oncology research. A review of the evolution of methods that analyze genome-wide molecular profiling data is presented by Varadan et al. in their article "The Integration of Biological Pathway Knowledge in Cancer Genomics." They highlight approaches that explicitly integrate biological pathway information into cancer genomics and chart unsolved problems to which the signal processing community can contribute.

In "Graphical Models and Inference on Graphs in Genomics," Shamaiah et al. review graphical models and their use in genomics. Processing and analysis of high-throughput genomics data are viewed as problems of inference on the resulting large graphs. Typical application areas are microarray data, gene regulatory networks, and next-generation sequencing data. Emphasis is given to message-passing algorithms, their complexity, and implementation issues. Following a survey of the mathematical models currently used to infer gene regulatory networks from genomic data sets, specific attention is given to approaches that rank among the top-performing methods in the Dialogue for Reverse Engineering Assessment and Methods (DREAM) initiative. These methods are then quantitatively compared to message-passing algorithms. Other application areas for message-passing algorithms are also discussed, including motif finding and clustering in microarray data. Attractive features of these algorithms are the intuitive visual interpretation they provide of the underlying problems and the practical solutions they enable for complex inference problems.
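As an illustrative aside (not drawn from the articles themselves), the flavor of message passing can be conveyed by a toy sum-product implementation on a chain-structured model, the simplest case where it is exact and the same computation that underlies HMM forward-backward inference. All names and potentials below are invented for illustration:

```python
import numpy as np

# Toy sum-product (message passing) on a chain-structured model:
# p(x_1..x_n) is proportional to prod_i phi(x_i) * prod_i psi(x_i, x_{i+1}).

def chain_marginals(node_pot, edge_pot):
    """node_pot: (n, k) unary potentials; edge_pot: (k, k) pairwise potential."""
    n, k = node_pot.shape
    fwd = np.ones((n, k))   # messages passed left-to-right
    bwd = np.ones((n, k))   # messages passed right-to-left
    for i in range(1, n):
        fwd[i] = (fwd[i - 1] * node_pot[i - 1]) @ edge_pot
        fwd[i] /= fwd[i].sum()                 # normalize for numerical stability
        j = n - 1 - i
        bwd[j] = edge_pot @ (bwd[j + 1] * node_pot[j + 1])
        bwd[j] /= bwd[j].sum()
    marg = fwd * bwd * node_pot                # combine messages at each node
    return marg / marg.sum(axis=1, keepdims=True)

# Two-state chain of length 4 with an "agreeing neighbors" pairwise potential.
rng = np.random.default_rng(0)
node_pot = rng.random((4, 2)) + 0.1
edge_pot = np.array([[2.0, 1.0], [1.0, 2.0]])
print(chain_marginals(node_pot, edge_pot))
```

On a chain of length n with k states, this costs O(n·k²), whereas brute-force enumeration costs O(kⁿ); that gap is exactly why message passing matters for large genomic graphs.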


Finally, they are highly amenable to parallelization. Their proven performance in a number of important genomics tasks reviewed in this article makes further development an attractive direction for signal processing in genomics.

The article "Robust Approaches for Genetic Regulatory Network Modeling and Intervention," by Pal et al., takes a closer look at the modeling of gene regulatory networks. Not only are different models for describing these networks reviewed but also their integration with other, diverse biological information sources. The relationship between detailed models based on the stochastic master equation and approximate models, such as coarse-scale Markov chain models and deterministic differential equation models, is described, with particular attention to the robustness of the models. While modeling gene regulatory networks greatly aids our understanding of the underlying mechanisms, one particularly attractive motivation is the possibility of exploiting this knowledge for intervention. Two major areas of intervention are systems medicine and synthetic biology. In the former, we wish to steer genetic networks from an undesirable state (such as one associated with disease) to a desirable one (the healthy state). In synthetic biology, the aim is to exploit gene regulatory networks to perform functions such as logical operations in biocomputing or chemical syntheses, for example, to improve the production of biofuels or for other biotechnology applications. The article also takes into account issues that arise in exploiting these models to design intervention strategies, such as mismatch between the actual gene regulatory network and the model, and the difficulty of reliably estimating parameters. Thus, it is possible to elicit a robust response through intervention even in the presence of these uncertainties.
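To make the intervention idea concrete, here is a hypothetical toy sketch (not taken from the article): a coarse-scale Markov chain over the joint on/off states of two genes, in which forcing gene 1 off shifts the steady-state distribution away from a "disease" state. All transition probabilities and state labels are invented for illustration:

```python
import numpy as np

# States are ordered (00, 01, 10, 11) over two binary genes; state 11 plays
# the role of an undesirable "disease" state and is deliberately made sticky.

def stationary(P):
    """Stationary distribution of a row-stochastic transition matrix."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])  # eigenvector for eigenvalue 1
    return v / v.sum()

P = np.array([
    [0.60, 0.10, 0.20, 0.10],
    [0.20, 0.50, 0.10, 0.20],
    [0.10, 0.10, 0.30, 0.50],
    [0.05, 0.05, 0.20, 0.70],   # the disease state tends to persist
])

# Intervention: whenever the chain would enter a state with gene 1 on
# (states 10 or 11), redirect that probability to the matching gene-1-off state.
P_int = P.copy()
P_int[:, 0] += P_int[:, 2]
P_int[:, 1] += P_int[:, 3]
P_int[:, 2] = 0.0
P_int[:, 3] = 0.0

print("P(disease state) before intervention:", stationary(P)[3])
print("P(disease state) after intervention :", stationary(P_int)[3])
```

The point of the robustness discussion in the article is that conclusions like this should survive perturbations of the entries of P, since the true transition probabilities are never known exactly.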
Current unsolved problems in this area are proposed as future directions for the use of signal processing approaches in gene regulatory network inference and design.

Our understanding of the complexity of gene regulation changed fundamentally with the discovery of the importance of miRNAs and their capacity for post-transcriptional regulation. miRNAs are a class of small noncoding RNAs of around 22 nucleotides; they repress transcription and/or inhibit translation of a large number of genes by binding mainly to the 3′ untranslated region (UTR) of target genes. In the article "Understanding microRNA Regulation," Yue et al. present the state of the art in computational approaches for three important topics: miRNA target prediction, prediction of miRNA-regulated pathways, and reconstruction of cooperative regulatory

networks by miRNAs and transcription factors. They also review the high-throughput experiments and databases essential for miRNA functional prediction.

The cancer genomics field is growing at a very fast pace. In their article "Developing Algorithms to Discover Novel Cancer Genes," Saksena et al. articulate, from a platform data analysis point of view, the processing steps that can introduce errors. Assessing statistical significance within an iterative development process provides significant insight into the whole process of implementing new strategies and algorithms to obtain high-confidence results. Their article focuses on the challenge of developing new quantitative techniques that must be continuously adapted to the new biological insights produced by those same methods. One of the biggest challenges is distinguishing signal from noise: detecting the somatic mutations that drive cancer, which are buried among variations irrelevant to the disease. While driver-event callers must ultimately be validated by laboratory experiments, new algorithms are first assessed on their ability to recapitulate the results of wet-lab experiments, reproduce current biological knowledge, demonstrate consistency over multiple sample sets, and show concordance across experiments from multiple types of measurements. The authors illustrate this with their own Genomic Identification of Significant Targets in Cancer (GISTIC) algorithm, which uses copy number alteration data across multiple samples to identify genomic regions that are gained or lost more often than one would expect by chance. They recall the evolution of the thinking behind this important algorithm: in the first version, the null background model assumes that all amplification and deletion events are passenger events, so that their locations are equally probable across the genome; the second version, in contrast, uses separate background models for chromosome arm-level events and short (focal) events and infers the most likely set of overlapping short copy number events.

DNA copies of portions of a genome contribute to natural genetic diversity in healthy individuals and are also a driving cause of cancer and other genetic diseases. In the article "Detecting Changes in the DNA Copy Number," Pique-Regi et al. offer a signal processing view of the substantial problem of detecting copy number variations. Copy number variation is a type of genetic variation with a significant impact on the phenotype; it is crucial to the generation of biological diversity as a driving force of evolution, as well as to the understanding of disease.
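The null-model idea behind the first version of GISTIC described above (passenger events equally likely anywhere on the genome) can be caricatured with a simple permutation test. This is an illustrative sketch on synthetic data, not the published algorithm; every name and number here is invented:

```python
import numpy as np

# Synthetic data: a binary samples-by-loci matrix of amplification events,
# with a recurrently amplified "driver" locus implanted at position 100.
rng = np.random.default_rng(1)
n_samples, n_loci = 50, 200
events = (rng.random((n_samples, n_loci)) < 0.05).astype(int)
events[:30, 100] = 1

def permutation_pvalues(events, n_perm=500, rng=rng):
    """Permute each sample's event positions (the 'passengers are uniform'
    null) and ask how often a locus is hit as frequently as observed."""
    observed = events.sum(axis=0)          # per-locus amplification count
    exceed = np.zeros(events.shape[1])
    for _ in range(n_perm):
        null = np.stack([rng.permutation(row) for row in events])
        exceed += (null.sum(axis=0) >= observed)
    return (exceed + 1) / (n_perm + 1)     # add-one rule avoids p = 0

pvals = permutation_pvalues(events)
print("p-value at the implanted driver locus:", pvals[100])
```

Permuting within each sample preserves that sample's total event burden, which is the essential feature of the null model; the published algorithm adds much more, including separate treatment of arm-level and focal events.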
The authors succeed in recasting copy number detection as a signal processing problem. Once the problem is transformed into the world of signal processing, it is natural to employ standard signal processing analysis. Here the authors employ filtering, segmentation, dynamic programming and HMMs, and, finally, sparse signal processing techniques, in view of the well-known problem of improving signal-to-noise ratios.

Finally, time-course analysis is critical to longitudinal genomic studies. The article "High-Dimensional Longitudinal Genomic Data" by Carin et al. proposes novel computational strategies for studying longitudinal gene expression and proteomic data. The experimental data come from several viral challenge studies performed on healthy human

volunteers. The article presents signal processing methods for analyzing genomic biomarkers that mark the temporal changes of the genes involved in the host response. Multiple perspectives related to factor analysis and dictionary learning are presented, and in each the high-dimensional data trajectories are related to a relatively low-dimensional vector of latent factors or dictionary elements. In effect, the authors present an approach for monitoring high-dimensional data by projecting it onto lower-dimensional spaces. They discuss the use of Bayesian and non-Bayesian inference methods. Examining all of these computational strategies leads to the conclusion that the different methods yield similar lists of genes that play an important role in the host response to a virus.

This special issue amply demonstrates the successes and opportunities for signal processing in genomics and proteomics, and the guest editors hope that it will spark further attention to this biological frontier by the signal processing community. [SP]

[from the EDITOR]

(continued from page 2)

new research results, SPM introduces, in an accessible manner, new areas to research students as well as practitioners. This is what makes SPM such a healthy and high-quality magazine. SPM also includes columns and forums that publish interesting articles on technology as well as emerging areas and trends in signal processing research. Thomson Reuters released its 2010 Journal Citation Report, which is generally accepted as the world’s most influential source of information about highly cited, peer-reviewed publications. SPM continues to rank first among all publications in the electrical engineering field (nearly 250 of them). In 2010, its impact factor reached a high of 6.00. This impressive accomplishment is due to the hard work of the entire magazine’s editorial team. Various initiatives have strengthened the outreach of the magazine. They include the translations of selected articles into Chinese and Brazilian Portuguese

as well as the use of video tags, links associated with an electronic article that allow readers to view various media from electronic devices, ranging from desktop and notebook computers to smart phones. These achievements were made possible only through the dedicated work of staff and volunteers, particularly the reviewers, the associate editors, and the continued excellent leadership of the area editors and the editor-in-chief. Special thanks go to Prof. Li Deng, SPM's former editor-in-chief, for his invaluable support during the transition period. The new era of signal processing brings great challenges with it; only the active support of our community can maintain SPM as the flagship magazine of the IEEE Signal Processing Society. I encourage you to be involved in feature articles, special issues, columns, and forums, and to be a diligent reviewer for the magazine. Given the stature of SPM and the great reputation it enjoys, there are tremendous expectations of us. With your continued support, we will be able to make SPM an even better magazine.

[SP]
