Developing Protein Structure-Function Relationships

Current Organic Chemistry, 2008, Vol. 12, No. 11, 957-971

Developing Protein Structure-Function Relationships in silico

Nasir-ud-Din1*, Ishtiaq Ahmad1, A.R. Shakoori2 and Daniel C. Hoessli3

1Institute of Molecular Sciences & Bioinformatics, Lahore, Pakistan; 2School of Biological Sciences, University of the Punjab, Lahore, Pakistan; 3Department of Pathology and Immunology, Centre Medical Universitaire, Genève, Switzerland

Abstract: Understanding the biological functions of proteins has been facilitated by the availability of powerful computational tools that greatly help in analyzing the complex nature of protein structure-function relationships. The most important challenge faced by functional genomics and proteomics is to elucidate how the three-dimensional structure of a protein may change in vivo and directly produce a new functional state. The key to understanding functional switches in proteins is to define how post-translational modifications contribute to the structural plasticity of those proteins and how new structures display new functions. Deciphering the protein functional switches regulated by transitory structural and conformational changes will require collaboration between computational scientists and experimentalists. The purpose of this review is to critically discuss the tools presently available in computational biology to approach these questions.

INTRODUCTION

Genome sequencing of different organisms has led to a substantial increase in genome and proteome data in the last two decades. Protein sequence data provide useful information about function, yet additional structural data are required to achieve the goals of structural and functional proteomics. The protein sequence defines the three-dimensional structure, which, in turn, determines function. Solving a protein's three-dimensional structure experimentally is a protracted process; consequently, far fewer structures have been elucidated than there are known sequences. Computational (in silico) methods are therefore useful to predict protein structures from those already established. Furthermore, determining protein function on the basis of structure is complex, since any given structure is the sum of many different conformations. It is well known that certain post-translational modifications (PTMs) induce conformational changes in protein structures in vivo, resulting in functional switches [1,2]. Such dynamic PTMs can occur either on the same or on neighboring amino acid residues. For example, the O-GlcNAc modification is short-lived and dynamic, like phosphorylation [3]. The temporary changes induced by alternating O-GlcNAc modification and phosphorylation on the same amino acid can cause the protein to switch functionality.

Much effort has been devoted to unraveling the structure and function of proteins [4-6], as well as to developing methods for visualizing 3D models and structures of DNA, proteins, and DNA-protein and protein-protein complexes. At present, in silico techniques have proved useful for analyzing structural and functional genomics, and for studying structure-function relationships of proteins and their post-translational processing. These techniques also suggest how gene products may function in cells, tissues, organs, systems and whole organisms.
*Address correspondence to this author at the Institute of Molecular Sciences & Bioinformatics, Lahore, Pakistan; E-mail: [email protected]

1385-2728/08 $55.00+.00

Mutations in DNA sequence bring about changes in the protein sequence, thereby modifying its structure and function, which is the modus operandi of evolution. The complex architecture of a particular protein is described at the primary, secondary, tertiary and quaternary structural levels, the higher levels of which are stabilized largely by non-covalent interactions between the component amino acids. Any alteration in these interactions will affect structure and conformation, with an impact on the activation, deactivation, binding or inhibition of a protein complex, or on the triggering of signal transduction events in a cell. Switching from one function to another and back is directly related to the conformations adopted by the protein. Conformational changes of proteins are difficult to assess in vivo. However, in silico approaches can supply information about an unknown protein, provided data exist for similar proteins. The simple concept of one gene – one protein – one function is no longer useful for investigating an ever growing number of proteins that perform more than one function and that are encoded by more than one gene (Fig. 1). Multiple functions performed by proteins may be due to genetic and epigenetic factors and, most importantly, to post-translational modifications, differential protein-protein and protein-ligand interactions, differential sub-cellular location and differential tissue expression of the protein. Furthermore, a single gene may encode more than one protein through differential mRNA splicing, in which exons are joined in different combinations [7] (Fig. 1). Similarly, there are many ways to generate multiple genes for a protein or protein domain [7]. An essential part of functional genomics and proteomics is the evaluation of structure-function relationships, and computational methods based on machine learning have frequently been used as tools [8-11].
© 2008 Bentham Science Publishers Ltd.

The application of computer science, informatics and statistics can provide general and specific knowledge about genes and their protein products. Learning methods developed by statisticians rely on probability theory and statistical methods, a practical example being the hidden Markov models applied to predicting protein structure [12], whereas machine learning approaches are inspired by biological systems (such as genetic algorithms and artificial neural networks) or based on the logic of rule learning. A convergence of the two approaches is manifest in hybrid systems, where both statistics and computer science are used to study pattern recognition [13]. Developing protein structure-function relationships with the help of machine learning is limited by a lack of protein structures and of computer-readable forms of the existing biological knowledge, databases and annotations. Text mining and automatic inference from free text have thus made it necessary to use machine learning to achieve the goals of functional genomics. Using data mining, a method for association analysis of the sequence environment of PTM sites has recently been developed [11]. Mining Association Patterns among preferred amino acids and the Residues targeted for modifications (MAPRes) is a useful technique for large-scale analyses of PTM data and for finding preferred amino acid patterns surrounding a PTM site.

In this review, we shall describe techniques, and some of their implications, that are important for achieving the aims of functional genomics and for developing efficient tools that can predict the structure and conformation of multifunctional proteins. We will also emphasize the possible ways in which proteins are able to perform multiple functions, with special reference to temporary structural and conformational changes.

GENETICS OF PROTEIN MULTIFUNCTIONALITY

The genetic code specifies the position and sequence combination of the 20 standard amino acids in protein synthesis. The substituents of some amino acid residues are added either co- or post-translationally, but before the protein achieves its functional state in a given cellular context.
Protein structural information is not only necessary to define function at the individual protein level, but also to establish the functionality of that protein in the cell and in the organism. The way the flow of genetic information is handled by the synthetic machinery of the cell is therefore as important as the translation of the primary message (Fig. 1). It follows that functional genomics requires determining the structure of proteins, as well as understanding the effects of genotypic variations in genes and proteins. Nucleotide polymorphisms are now being studied for many proteins to determine the effects of genotypic changes on the phenotypic expression of genes (Fig. 1).

Alternatively spliced variants: Evolutionary changes in gene and/or protein sequence by insertion and deletion provide solutions to suit the survival needs of the organism, but the unique capability of eukaryotes to develop phenotypically plastic insertions and deletions is the result of alternative splicing of genes at the mRNA level, generating different protein products [14,15]. These genotypic variations affect protein structure, function and the resulting phenotype in many ways. For instance, gene splicing may operate by way of exon skipping, exon insertion, alternative 5′ initiation, alternative 3′ termination and intron inclusion, and some of these splicing patterns are conserved at the species level [15]. Formation of alternative splice variants occurs during expression of about 74% of human multi-exon genes [16]. Spliced products differ in their binding affinity, enzymatic activity, sub-cellular location, stability and patterns of post-translational modification [17]. In some cases, alternative splicing is so extensive that a single gene may give rise to thousands of variants. A good example is the Dscam gene of Drosophila melanogaster [18], which can produce about 38,016 variants/isoforms from alternative splicing of multiple segments in exons 4, 6, 9 and 17 [19]. In other cases, the spliced segments of mRNA of many different genes are joined to form a multidomain protein or multiple isoforms of the same protein with different binding specificities (Fig. 2).

Very few pairs of 3D structures for two splice isoforms of the same gene have been solved and are currently available in the PDB [20]. An insertion or deletion of a small exon that changes only a few amino acids may significantly alter a protein structure. For example, the two isoforms of human ectodysplasin (EDA) differ by two amino acid residues, so that each isoform binds to a different receptor and triggers a distinct signaling pathway [21]. We can therefore conclude that alternative isoforms constitute a functional switch within a given gene product. Likewise, a deletion may change protein structure and abolish function, as is the case for the PDZ domain of a protein tyrosine phosphatase that lacks five residues because of an exon deletion [22]. There are many cases for which no 3D structure is available for comparing isoforms. Building a homology model may then be considered useful, but homology models are in many cases not reliable enough to accurately predict the effects of minor changes.
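The number of Dscam isoforms quoted above follows from simple multiplication. As a worked illustration, the sketch below assumes the commonly quoted numbers of mutually exclusive alternatives for exons 4, 6, 9 and 17 (12, 48, 33 and 2); these per-exon counts are not given in this review and are used here only as an assumption, and the function name is illustrative.

```python
from math import prod

# Assumed numbers of mutually exclusive alternatives per exon cluster
# (commonly quoted for Drosophila Dscam; not taken from this review).
DSCAM_ALTERNATIVES = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def count_isoforms(alternatives):
    """Number of distinct mRNAs if exactly one alternative is used per cluster."""
    return prod(alternatives.values())


if __name__ == "__main__":
    # 12 * 48 * 33 * 2 = 38016
    print(count_isoforms(DSCAM_ALTERNATIVES))
```

The calculation treats the choices at each exon cluster as independent, which is the simplification behind the upper-bound figure; it says nothing about which isoforms are actually expressed.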
PROTEIN PHYLOGENY AND MULTIFUNCTIONALITY

The first 3D model was published in 1960 (whale myoglobin [23]), and the 3D structures published since then have grown steadily in number and in sophistication of structural detail. Thanks to spectacularly improved experimental techniques, more than 49000 structures of DNA, proteins and their complexes were available at the end of February 2008, and more than 4000 unreleased entries were being processed to become part of the RCSB/PDB database (www.rcsb.org).

Protein sequence and structure in evolution: Knowledge of protein 3D structure showed that proteins with divergent sequences can possess the same 3D structure, and that deletions and insertions may result in an overall conserved protein fold [24-28]. Different mutagenesis studies revealed that many proteins retain their activity [29,30] and stability [31] despite multiple mutations at many positions in the sequence. These findings prompted the development of structure alignment and comparison methods that emphasize the key structural residues and their clusters [4, 32-36]. Hence a very small number of key residues at specific positions in the sequence are responsible for a given 3D fold obtained with divergent sequences. Structural similarity is thus extremely helpful in revealing distant relationships between evolutionarily divergent proteins, information that is impossible to obtain by comparing protein sequences alone. Approximately 40-60% of distantly related proteins
with 20-35% sequence identity, described as the ‘Twilight Zone’ [37], can easily be compared by protein sequence alignment methods alone [38, 39]. The remainder, however, require structural comparison for their identification [40]. Structural information has therefore become necessary to identify the distant homologues of any given protein and to run similarity search algorithms.

Fig. (1). Genetic information flow from gene to protein. The three possible routes are: 1) one gene (containing multiple exons) coding for multiple proteins through alternative mRNA splicing, 2) one gene coding for only one protein, or 3) multiple genes (containing either single or multiple exons) coding for proteins that contain multiple domains.

Fig. (2). Different ways of alternative splicing of mRNA. In this schematic, green boxes represent exons in mRNA joined by introns. There are three different ways in which exons are joined in diverse combinations in processed mRNA coding for diverse proteins: 1) exon deletion, 2) exon insertion and 3) intron inclusion. Exon deletion implies alternative 5′ initiation, when multiple promoters are present, and alternative 3′ termination, resulting in diverse translated proteins.

Structure-function changes by mutations: Most of the 140 known protein domains can be traced back to ancestral domains which have been conserved during evolution [41]. Thus conservation of structure between related proteins is a common trend, which may not necessarily hold for the functions of related proteins. Functional divergence is observed when similar protein structures perform different functions. Conversely, functional convergence may result in proteins with different structures performing similar functions [42]. However, point mutations do not frequently affect protein structure significantly, in contrast to insertions and deletions, which usually produce significant changes in
protein structure [24]. Proteins can accommodate insertions and deletions in their structures, and insertions may even result in functional improvement, the usual course of evolution for functional proteins.

Protein domains, the evolutionary units: Presently, about 900 groups of protein folds are known, and proteins consist of one or more structural domains, each domain having a specific 3D fold [43]. Many common domains occur several times in different proteins, in different combinations that perform different sets of functions. Consequently, a protein can perform multiple functions through activation of one or more domains and inactivation of others, or simply through structural and/or conformational changes within these domains that regulate, for instance, interactions between receptor and ligand or between enzyme and substrate. Proteins with very different sequences may therefore fold into similar structures, and the resulting domains become important structural, functional and evolutionary units. The definition of a domain as an evolutionary unit has been used in the Structural Classification of Proteins (SCOP) database [44]. The SCOP hierarchy describes taxonomic analogies at different levels, including species, protein family, superfamily, fold and class. The different levels of protein domains grouped in SCOP describe the relationships of their sequences, structures and functions, but SCOP shows its limitations when describing the temporary structural and conformational changes related to the functional switch of a protein domain or whole protein. The development of prediction methods for temporary structural and conformational changes induced in proteins by specific PTMs will require organized data to predict multi-functional regulation in proteins.

Fig. (3). There are two hypotheses of protein structure evolution: divergent (left) and convergent (right) evolution. In divergent evolution, different structural elements (helices, strands and coils) are the result of divergence of a complex structure into simpler forms, whereas the hypothesis of convergent evolution suggests that a complex protein structure is the result of convergence of simpler structural units. There are many more examples in support of convergent evolution.

Evolution: Divergent or Convergent: Both convergent and divergent evolution have been suggested to explain the phylogeny of protein families (Fig. 3).
Convergent evolution holds that similarities of protein structure arose independently, without any evidence for common ancestry, while divergent evolution holds that different proteins in different organisms have diverged from a common ancestor protein. Divergent evolution is based on the observation that each copy of an ancestor protein in different species, regardless of point mutations, deletions and insertions of amino acids, generally exhibits a similar 3D fold, and consequently a similar function [24,45]. Thus the divergent evolution model implies that i) the ancestor of a protein is its closest structural homologue, and ii) the oldest proteins have the largest number of descendants.

Directed Evolution: The determination of protein structure-function relationships depends on the protein sequence, and altered protein sequences are frequently generated experimentally to confirm or exclude the functional contribution of an individual amino acid. Production of “adequate” variants through directed mutations is similar to the process of natural evolution, where diverse natural mutants are tested against the demands of optimal adaptation. Directed evolution is not only a tool to produce proteins with improved or altered properties, but also a useful technique to define and investigate the relationships of individual amino acids with the structure and function of a protein [46,47]. The difficulty in finding the contributions of individual residues to protein structure and function arises when a large number of mutants have to be evaluated. Additionally, the contribution of an individual amino acid to the structure of the protein also depends on its neighboring amino acids and on the shape, charge, size or polarity of their side chains. It has been documented that directed evolution can improve protein stability by introducing mutations that strengthen inter-domain interactions, or that rigidify the individual domains without altering the flexibility between domains [48-50]. Such approaches have also been useful for determining the amino acids involved in protein-protein interactions. Selected technical improvements have been applied in this field, but with limited success [51].

FROM SEQUENCE TO STRUCTURE AND FUNCTION

Once a gene/protein sequence is established, the structure and function of the gene and its protein product can be explored. This involves predicting the secondary structure, determining the positions of helices and strands, and finally predicting the three-dimensional fold by folding and threading methods.

Pairwise and multiple sequence alignment methods and algorithms: The first step in predicting protein structure is to find homologous sequences in the databases through a homology search. BLAST (Basic Local Alignment Search Tool) [39] is the most frequently used tool for searching databases for homologous gene/protein sequences. PSI-BLAST iteratively builds position-specific scoring matrices from the sequences retrieved from the selected databases and uses them for alignment. Protein structure prediction by homology modeling involves finding a structural template. Template search in a structure database with PSI-BLAST also relies on pairwise sequence alignment.
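The pairwise alignment step that underlies such searches can be sketched as follows. This is a minimal illustration assuming a simple match/mismatch/gap scoring scheme and textbook Needleman-Wunsch global alignment; BLAST and PSI-BLAST instead use heuristic local alignment with substitution or position-specific matrices, so the function names and scores below are illustrative assumptions only. The percent identity helper divides by alignment length, which is one of several conventions, and the 20-35% bounds are those quoted for the ‘Twilight Zone’ in this review.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment by dynamic programming (illustrative scores)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of alignment length."""
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def in_twilight_zone(identity, low=20.0, high=35.0):
    """True if the identity falls in the 20-35% range quoted in the text."""
    return low <= identity <= high


if __name__ == "__main__":
    aligned = needleman_wunsch("ACDE", "ADE")
    print(aligned, percent_identity(*aligned))
```

For realistic protein comparisons the flat match/mismatch scores would be replaced by a substitution matrix and affine gap penalties; the dynamic-programming structure stays the same.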
There are many other methods that are superior to PSI-BLAST, but they remain underused because they are distributed as stand-alone programs, which discourages developers from comparing their alignments directly with those of PSI-BLAST. Other difficulties include uncommon input or output formats, poor program design and compilation problems. Still other programs are available only as web servers and cannot process large test sets.

In addition to pairwise sequence alignment, multiple sequence alignment methods are equally important for predicting protein structure, function and phylogeny in sequence analysis. When the phylogenetic relationships of a gene or protein have to be determined, multiple alignment is a common approach, but multiple alignment methods themselves rely on pairwise sequence alignments. A number of advances have been made to increase alignment accuracy and to make it possible to align thousands of proteins with sufficient flexibility for comparing distant proteins. The web resources available for protein sequence alignment are summarized in Table 1. Despite the many new and valuable methodologies now available for multiple alignment, CLUSTALW remains the most frequently used alignment tool (Table 1).

SCOP is another resource frequently used for assessing remote homologies [44]. SCOP is rather convenient for this purpose, as nearly all superfamily pairs (i.e. two proteins from different families, but the same superfamily) represent remote protein homologies. Thus we can select various sets of structural pairs by choosing one per fold or superfamily pair. However, using SCOP may also lead to problems, since some superfamilies, and even some folds, are likely to be related to each other.

In summary, there are many ways to identify and align sequences and structures. Alignment accuracy depends on the alignment method, whereas the precision of identification depends on the scoring system and on the estimated statistical significance of the alignment.

Table 1. Different DNA/Protein Sequence Alignment/Analysis Tools

Name | Function | URL
PRC, the Profile Comparer | Stand-alone program for aligning and scoring two profile hidden Markov models | http://supfam.org/PRC/
TMMOD, QC-COMP, CASA | A server for comparison of profile hidden Markov models based on consensus sequence | http://liao.cis.udel.edu/website/webscripts/frame.php?p=servers
BLAST | An algorithm for searching and aligning similar DNA/protein sequences from databases | http://www.ibt.lt/bioinformatics/iss/
BioInfoBank Meta Server | Gateway to well-benchmarked protein structure and function prediction methods | http://meta.bioinfo.pl/submit_wizard.pl
Fold Prediction Metaserver | Gateway to various methods for protein structure prediction | http://meta.bioinfo.pl/submit_wizard.pl
Pcons | A consensus fold recognition predictor | http://www.bioinfo.se/pcons/
MUSCLE | Protein multiple sequence alignment software | http://www.drive5.com/muscle/
Palign | The ProteinALIGNment | http://www.bioinfo.se/palign/
TCoffee | A collection of tools for computing, evaluating and manipulating multiple alignments of DNA and protein sequences and structures | http://www.igs.cnrs-mrs.fr/Tcoffee/tcoffee_cgi/index.cgi
Kalign | A fast and accurate multiple sequence alignment tool | http://msa.sbc.su.se/cgi-bin/msa.cgi
Kalignvu | A lightweight viewer for multiple sequence alignments and phylogenetic trees | http://msa.sbc.su.se/cgi-bin/msa.cgi
Mumsa | A program to assess the quality of multiple sequence alignments | http://msa.sbc.su.se/cgi-bin/msa.cgi
Dialign | Software program for multiple alignment | http://bibiserv.techfak.uni-bielefeld.de/dialign/
MuSiC | A tool for multiple sequence alignment with constraints | http://genome.life.nctu.edu.tw/MUSIC/
Lobster | Protein sequence analysis software | http://www.drive5.com/lobster/
SVM-BALSA | The Bayesian Algorithm for Local Sequence Alignment | http://www.bioinfo.rpi.edu/applications/bayesian/balsa/manual/balsa.html
PROBCONS | Probabilistic consistency-based multiple alignment of amino acid sequences | http://probcons.stanford.edu/
MAFFT | Multiple sequence alignment program | http://align.bmr.kyushu-u.ac.jp/mafft/software/
ALTAVIST | Alternative Alignment Visualization Tool | http://world.altavista.com/
Jalview | A multiple alignment editor written in Java | http://www.jalview.org/
ProDA | Software for multiple alignment of protein sequences with repeated and shuffled elements | http://proda.stanford.edu/

STRUCTURAL MODELS OF PROTEINS

With the completion of genome sequencing projects ranging from bacteria and yeast to man, a large amount of
gene and protein sequence data has been accumulated. This influx of data will keep increasing, with several genome sequencing projects currently under way, and within a few years the complete genomes of more than a hundred species will have been sequenced. The main goal of genome sequencing of different species is to achieve the first step of functional genomics, namely to determine the function of the encoded proteins. The eventual target of functional genomics will be achieved by solving the structure-function dynamics of proteins, providing pertinent answers about different molecular processes in both normal and pathological conditions. Even though complete genome sequencing has almost become a standard undertaking, experimental determination of the 3D structure of proteins remains laborious. For instance, determining the 3D structure of a membrane protein still faces major difficulties, making it clear that the
structure determination step is the bottleneck of protein characterization. The number of sequences with a solved 3D structure is understandably smaller than the number of known protein sequences, and powerful computing techniques, quantum and statistical mechanics, protein models and function simulators are much needed to close this gap. It is widely accepted that the native, functional 3D structure of a protein should have the lowest free energy for its unique amino acid composition. In silico studies for the prediction of protein structure have been pursued by computational biologists for many years, with the aim of obtaining a 3D model of a protein by processing and evaluating its amino acid sequence (Table 2). Unfortunately, as long as the precise energetic determinants and parameters of protein folding are not known, in silico protein structure prediction will remain of limited effectiveness. Interestingly, however, experimental studies performed to solve protein structures following in silico procedures have shown some outstanding similarities, as well as some differences, supporting the theory of convergent evolution for protein structure.

Table 2. Protein-protein Interaction Analysis and Prediction Methods

Name | Function | URL
APID | Agile Protein Interaction Data Analyzer | http://bioinfow.dep.usal.es/apid/index.htm
APID2NET | Exploration and analysis of interactome networks at systems level | http://bioinfow.dep.usal.es/apid/apid2net.html
Cons-PPISP | Consensus neural-network Protein-Protein Interaction Site Predictor | http://pipe.scs.fsu.edu/ppisp.html
FastContact | A free energy scoring tool for protein-protein complex structures | http://structure.pitt.edu/servers/fastcontact/
Interprets | Protein-protein interaction prediction through tertiary structure | http://www.russell.embl.de/cgi-bin/interprets2
InterProSurf | Prediction of functional sites on monomeric protein surfaces | http://curie.utmb.edu/prosurf.html
Potential Interactions of Proteins | Web server for potential protein-protein interactions of human, rat and fission yeast proteins | http://bmm.cancerresearchuk.org/~pip/
P.R.I.S.M | Web server used to explore protein interfaces and predict protein-protein interactions | http://gordion.hpc.eng.ku.edu.tr/prism/
SPPIDER | Solvent accessibility based Protein-Protein Interface iDEntification and Recognition | http://sppider.cchmc.org/

MOLECULAR DESIGN OF PROTEIN-PROTEIN INTERACTIONS

Proteins often work as multimeric complexes and maintain a high degree of binding specificity for the structures they interact with. This ability of a protein to engage in multiple non-covalent interactions forms the basis of the functional architecture of the cell. For example, signal transduction in many physiological and pathological conditions operates by way of such protein-protein interactions [52].

Potential binding sites are predetermined in unbound proteins: Like enzymes, proteins contain active sites that interact non-covalently with corresponding sites in other proteins.
Computational studies to locate potential binding sites in the unbound structure of a protein involve analyzing properties such as solvation potential, amino acid composition, conservation, electrostatics and hydrophobicity. None of these properties alone has high predictive power, but combinations of some of them have given relatively encouraging predictions. Furthermore, not all amino acids in the protein binding interface have the same importance. Some interface residues act as hot spots and contribute significantly to binding, while several others make only a marginal contribution [53]. It was recently shown that the residues in hot spot areas are often predetermined in the unbound protein state [54].

Specificity of Protein Binding: Proteins perform multiple functions through their varying binding affinities for other proteins. Affinity alone is not sufficient for protein binding; a degree of specificity is also required, as in the interactions of antibodies with protein antigens. Identifying the amino acids involved in protein binding and determining the relative specificities of different potential amino acids could help in selecting the best binding activity of two proteins. A very good example of variable binding specificities of the same site for two different ligands is provided by Src homology 3 (SH3) domains, which interact with both type I and type II polyproline ligands. In the Fyn SH3 domain, binding of the two polyproline ligands (type I and type II) is governed by a local conformational change of the SH3 binding pocket, in which a conserved Trp adopts two different orientations [55].

Amino Acid Composition of Binding Sites: Protein binding sites are commonly located in hydrophilic cavities with a specific amino acid composition. For instance, the antigen-binding fragment Fab-YADS, which recognizes vascular endothelial growth factor (VEGF) and is derived from a library in which diversity is restricted to four amino acids (Tyr, Ser, Ala, Asp), was assessed for the amino acids required for high-affinity binding. Studies of the Fab-antigen complex revealed that the side chains of the abundant Tyr residues predominated in binding [56].

Protein Binding Sites are Dynamic: Binding diversities of a protein depend on structural and conformational rearrangements of the unbound proteins.
Such changes in the binding sites mainly involve only small movements of loop regions and side-chain rearrangements, probably through side-chain modification by a charged group. However, major movements of the protein backbone are observed in a small number of proteins. A recent computational study explored many protein complexes undergoing structural/conformational changes and suggested that the changes calculated for monomers correlated with those experimentally observed in bound complexes [57]. Thus the pre-existing equilibrium of different structures or conformations acts as a selection mechanism for protein-protein interactions [57].

TISSUE AND ORGANELLE-SPECIFIC VARIATIONS IN PROTEIN FUNCTION

The subcellular location of a protein determines its functions, and an emerging field, sub-cellular proteomics [58], combines traditional biochemical fractionation techniques for isolating different sub-cellular compartments with peptide mass spectrometry for identifying the constituent proteins [59, 60]. A good example of location-specific functional variation is PutA, which performs different functions in two different cellular locations. In Salmonella typhimurium, the PutA protein acts as a proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase in association with the plasma membrane, but its enzymatic activity is lost in the cytoplasm, where it acts as a transcriptional repressor by binding to DNA [61,62].

POST-TRANSLATIONAL MODIFICATIONS AND PROTEIN STRUCTURE-FUNCTION RELATIONSHIPS

Among the different post-translational processing events, covalent substitution of the –OH, –NH2 and –COOH groups of amino acid side chains or termini by phosphate, sugar, methyl or acetyl groups, or modification of proteins by ubiquitination or sumoylation, endows proteins with their specific functional characteristics. For example, phosphorylation of the –OH group of Ser/Thr/Tyr may result in structural or conformational changes, leading to specific new functions [1,2]. Whether such changes in proteins are minimal [1] or important [2], phosphorylation in both cases is decisive for the regulation of protein function.
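As a first, purely chemical filter, the residues that could carry such a modification can be listed directly from the sequence. The sketch below is an illustration only: it enumerates every Ser, Thr and Tyr, and separately the Ser/Thr subset, on the assumption (general biochemical knowledge, not stated at this point in the review) that O-GlcNAc targets Ser/Thr and can therefore alternate with phosphate on those residues. It is not a PTM predictor, and the function names and the example peptide are hypothetical.

```python
# One-letter codes of residues whose side chains carry a hydroxyl group.
HYDROXYL_RESIDUES = {"S": "Ser", "T": "Thr", "Y": "Tyr"}

# Assumption: only Ser/Thr can alternate between O-GlcNAc and phosphate.
SHARED_RESIDUES = ("Ser", "Thr")


def candidate_hydroxyl_sites(sequence):
    """Return (1-based position, residue name) for every Ser/Thr/Tyr."""
    return [(pos, HYDROXYL_RESIDUES[aa])
            for pos, aa in enumerate(sequence.upper(), start=1)
            if aa in HYDROXYL_RESIDUES]


def shared_site_candidates(sequence):
    """Subset of hydroxyl sites that could carry either modification."""
    return [(pos, name) for pos, name in candidate_hydroxyl_sites(sequence)
            if name in SHARED_RESIDUES]


if __name__ == "__main__":
    peptide = "MSTAYK"  # hypothetical peptide, for illustration only
    print(candidate_hydroxyl_sites(peptide))
    print(shared_site_candidates(peptide))
```

Whether any listed residue is actually modified depends on its sequence environment and on kinase or transferase specificity, which is what dedicated prediction methods attempt to capture.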
Similarly, protein glycosylation is important in protein folding [6] and may also result in structural or conformational changes and thereby regulate function [6]. The dynamic nature of most of these PTMs results in temporary changes [3, 63]. A complex interplay of different PTMs on the same or neighboring amino acid residues provides proteins with the capacity to sequentially perform different functions [63-66]. A number of in silico approaches have produced PTM prediction and analysis methods that help experimentalists select experimental procedures to further define the diverse functions of proteins regulated by PTMs (Table 3).

Glycosylation: The covalent linkage of carbohydrates to linear polymers is catalyzed by the family of glycosyltransferases, and results in the formation of glycoproteins, glycolipids and proteoglycans (glycoconjugates). In glycoproteins, the oligosaccharide chains (glycans) are linked to different amino acids through the –NH2 group in the side chain of Asn (N-linked glycosylation) or through the –OH groups of Ser/Thr, hydroxyproline (hPro) and hydroxylysine (hLys) (O-linked glycosylation). The glycoproteins become functional only after full glycosylation. Glycosylation regulates protein folding, localization, trafficking, solubility, antigenicity, half-life, and cell-cell interactions [67-69].

Current Organic Chemistry, 2008, Vol. 12, No. 11 963

Glycan synthesis involves stepwise addition of monosaccharides catalyzed by multiple glycosyltransferases in different cellular locations. In humans, more than one hundred glycosyltransferases in the endoplasmic reticulum (ER) and Golgi apparatus carry out the precise synthesis of glycans destined to be added to proteins, lipids or proteoglycans. The study of glycosyltransferase families has been termed glycogenetics and the genes encoding glycosyltransferases have been named glycogenes. By analogy with the genome and proteome, the repertoire of glycans from glycoconjugates in tissues or organisms is known as the glycome. Glycomics is the study of glycan structures and functions, which are far more complex than those of proteins. Glycan biosynthesis is more like a chemical reaction: it neither requires a template (as in replication and transcription) nor needs to be proofread (as in translation), but competition for substrates and the different possible linkages and anomery result in microheterogeneity and diverse glycoforms. Determining the covalent and 3D structures of glycans, their dynamics and their interactions is a difficult task, and support from computational tools and databases is mandatory to interpret experimental glycan data. Sugars are important at all levels of functional regulation in the cell and constitute one of the main components of genes, serving as the backbone sugar in DNA (deoxyribose) and in RNA (ribose), and providing the linkages to phosphate (PO4) and nitrogenous bases through their different –OH groups. The functional differences between DNA and RNA stem from a difference in chemical composition of a single –OH on C2 of the monosaccharide structure.
For oligosaccharide chains attached to proteins, the chemistry is even more complex, as different positional and stereo isomers of oligosaccharides provide many combinations of structures, and hence a form of memory, for cellular recognition processes. Thus deciphering the function of glycans has become an important area in post-genomic research. The glycogenes involved in oligosaccharide synthesis are important because the biosynthesis of a complex oligosaccharide structure depends upon a properly ordered set of glycosyltransferases and their isoforms that guarantee sequential linkage of specific incoming monosaccharides with the correct linkage type and anomery. For example, addition of the sugar N-acetylgalactosamine (GalNAc) to the –OH group of Ser/Thr is regulated by 25 different glycogenes and almost 15 isoforms [70]. The uniqueness of Gal/GalNAc compared to other sugars is probably due to the axial position of the –OH group on C4, which cannot be the same for any other sugar. A striking example is that of the human prion peptides, which acquire opposite structural and functional characteristics upon modification of Ser132 and Ser135 [6]. Addition of GalNAc results in the transition of a coiled structure to a β sheet (Fig. 4). Besides the substrate specificity of glycosyltransferases, a major reason for glycan diversity is to be found in the competitive binding of different glycosyltransferases to different substrates, in donor supplying systems and in other complexes regulating the organelle-specific allocation of transferases. Finally, a glycan may take the form of a single sugar residue attached to Ser/Thr, such as GlcNAc in either α or β anomery. When the attached sugar is O-β-GlcNAc, it is as dynamic as phosphorylation [reviewed in 3]. A number of nuclear and


Table 3. PTMs Prediction/Analysis Tools

Tool | Function | URL
ChloroP | Prediction of chloroplast transit peptides | http://www.cbs.dtu.dk/services/ChloroP/
LipoP | Prediction of lipoproteins and signal peptides in Gram negative bacteria | http://www.cbs.dtu.dk/services/LipoP/
MITOPROT | Prediction of mitochondrial targeting sequences | http://ihg.gsf.de/ihg/mitoprot.html
PATS | Prediction of apicoplast targeted sequences | http://gecco.org.chemie.uni-frankfurt.de/pats/pats-index.php
PlasMit | Prediction of mitochondrial transit peptides in Plasmodium falciparum | http://gecco.org.chemie.uni-frankfurt.de/plasmit/index.html
Predotar | A prediction service for identifying putative N-terminal targeting sequences | http://urgi.versailles.inra.fr/predotar/predotar.html
PTS1 | Prediction of peroxisomal targeting signal 1 containing proteins | http://mendel.imp.ac.at/mendeljsp/sat/pts1/PTS1predictor.jsp
SignalP | Prediction of signal peptide cleavage sites | http://www.cbs.dtu.dk/services/SignalP/
DictyOGlyc | Prediction of GlcNAc O-glycosylation sites in Dictyostelium | http://www.cbs.dtu.dk/services/DictyOGlyc/
NetCGlyc | C-mannosylation sites in mammalian proteins | http://www.cbs.dtu.dk/services/NetCGlyc/
NetOGlyc | Prediction of O-GalNAc (mucin type) glycosylation sites in mammalian proteins | http://www.cbs.dtu.dk/services/NetOGlyc/
NetGlycate | Glycation of epsilon amino groups of lysines in mammalian proteins | http://www.cbs.dtu.dk/services/NetGlycate/
NetNGlyc | Prediction of N-glycosylation sites in human proteins | http://www.cbs.dtu.dk/services/NetNGlyc/
OGPET | Prediction of O-GalNAc (mucin-type) glycosylation sites in eukaryotic (non-protozoan) proteins | http://ogpet.utep.edu/OGPET/
YinOYang | O-beta-GlcNAc attachment sites in eukaryotic protein sequences | http://www.cbs.dtu.dk/services/YinOYang/
big-PI Predictor | GPI modification site prediction | http://mendel.imp.ac.at/sat/gpi/gpi_server.html
GPI-SOM | Identification of GPI-anchor signals by a Kohonen Self Organizing Map | http://gpi.unibe.ch/
Myristoylator | Prediction of N-terminal myristoylation by neural networks | http://ca.expasy.org/tools/myristoylator/
NMT | Prediction of N-terminal N-myristoylation | http://mendel.imp.ac.at/myristate/SUPLpredictor.htm
CSS-Palm | Palmitoylation site prediction with CSS | http://bioinformatics.lcd-ustc.org/css_palm/
PrePS | Prenylation Prediction Suite | http://mendel.imp.ac.at/sat/PrePS/index.html
NetAcet | Prediction of N-acetyltransferase A (NatA) substrates (in yeast and mammalian proteins) | http://www.cbs.dtu.dk/services/NetAcet/
NetPhos | Prediction of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins | http://www.cbs.dtu.dk/services/NetPhos/
NetPhosK | Kinase specific phosphorylation sites in eukaryotic proteins | http://www.cbs.dtu.dk/services/NetPhosK/
NetPhosYeast | Serine and threonine phosphorylation sites in yeast proteins | http://www.cbs.dtu.dk/services/NetPhosYeast/
Sulfinator | Prediction of tyrosine sulfation sites | http://ca.expasy.org/tools/sulfinator/
Sulfosite | Prediction of tyrosine sulfation sites | http://sulfosite.mbc.nctu.edu.tw/
SUMOplot | Prediction of SUMO protein attachment sites | http://www.abgent.com/doc/sumoplot
Terminator | Prediction of N-terminal modification | http://www.isv.cnrs-gif.fr/terminator2/index.html
NetPicoRNA | Prediction of protease cleavage sites in picornaviral proteins | http://www.cbs.dtu.dk/services/NetPicoRNA/
NetCorona | Coronavirus 3C-like proteinase cleavage sites in proteins | http://www.cbs.dtu.dk/services/NetCorona/
ProP | Arginine and lysine propeptide cleavage sites in eukaryotic protein sequences | http://www.cbs.dtu.dk/services/ProP/
OGLYC | Identification of O-glycosylation sites in mammalian proteins | http://www.biosino.org/Oglyc
DISPHOS | Disorder-Enhanced Phosphorylation Sites Predictor | http://core.ist.temple.edu/pred/pred.html
SCANSITE | Prediction of phosphorylated site by specific kinase | http://scansite.mit.edu/
KinasePhos | Identification of protein kinase-specific phosphorylation sites | http://kinasephos.mbc.nctu.edu.tw/
MAPRes | Mining Association Patterns among Preferred Amino Acid Residues | http://imsb.edu.pk/mapres/

cytoplasmic proteins are modified by this dynamic glycosylation in the form of O-β-GlcNAc, catalyzed by O-GlcNAc-transferase (OGT), thus regulating replication, transcription and translation [reviewed in 71]. The diversity in functional regulation by O-GlcNAc modification becomes more effective when this modification alternates with phosphorylation on the same or a neighboring –OH group of Ser/Thr. This complex interplay of O-GlcNAc modification with phosphorylation often regulates protein functions in a Yin Yang manner (the Yin Yang hypothesis), and the amino acids involved in the interplay are called Yin Yang sites. The interplay is regulated by the concentration, availability ratio and activation of the enzymes catalyzing the modifications (OGT and kinases) and of the enzymes removing them (O-GlcNAcase and phosphatases) [reviewed in 3]. The number of proteins known to be modified by dynamic O-β-GlcNAc is increasing, but only a small number of sites have been mapped experimentally, which is the main hindrance to developing an efficient prediction tool for this modification. Functional regulation of proteins by O-GlcNAc is expected to prove much more widespread than is currently documented.

Phosphorylation: Phosphorylation, a dynamic protein modification on the –OH group of Ser/Thr/Tyr, is catalyzed by many different kinases and often results in temporary structural/conformational changes in proteins that regulate functional switches [1,2]. Addition of a phosphate group to the –OH group of Ser/Thr/Tyr appears to be a simple chemical reaction, but the process occurs in a regulated and coordinated manner. Although the phosphate donor (ATP) is common to all phosphorylation reactions, the specificity lies in correct recognition of the substrate by a specific kinase among hundreds, all requiring highly organized and regulated contexts.
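The notion of Yin Yang sites described above can be sketched in a few lines of Python. This is only an illustration, assuming that two lists of candidate positions (one for O-GlcNAc, one for phosphorylation) have already been obtained from prediction tools or experiments; the function name `yin_yang_candidates`, the `max_gap` parameter and the toy sequence are hypothetical and do not reproduce the method of any server listed in Table 3.

```python
from typing import Iterable, List, Tuple


def yin_yang_candidates(
    seq: str,
    glcnac_sites: Iterable[int],
    phospho_sites: Iterable[int],
    max_gap: int = 1,
) -> List[Tuple[int, int]]:
    """Pair O-GlcNAc and phosphorylation sites lying on the same or a
    neighboring Ser/Thr residue (positions are 1-based)."""

    def is_ser_thr(pos: int) -> bool:
        # Ignore positions outside the sequence or not on Ser/Thr.
        return 1 <= pos <= len(seq) and seq[pos - 1] in "ST"

    pairs = []
    for g in sorted(set(glcnac_sites)):
        for p in sorted(set(phospho_sites)):
            if abs(g - p) <= max_gap and is_ser_thr(g) and is_ser_thr(p):
                pairs.append((g, p))
    return pairs


# Hypothetical toy sequence and site lists, for illustration only.
toy_seq = "MASTPSRTSA"
print(yin_yang_candidates(toy_seq, glcnac_sites=[3, 8], phospho_sites=[3, 4, 6, 9]))
# -> [(3, 3), (3, 4), (8, 9)]
```

Setting `max_gap=0` restricts the result to residues where both modifications compete for the very same –OH group.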
The addition of a negatively charged phosphate group to the polar side chain of an amino acid residue (Ser/Thr/Tyr) changes the conformation, transforming the hydrophobic part of a protein into a polar and highly hydrophilic one (Fig. 5a and b). This is how the functional state of a protein can be modified: the intramolecular hydrophilic and hydrophobic properties change in a specific domain of the protein, resulting in a new conformation (Fig. 6a-c). Conformational changes in protein structure induced by phosphorylation regulate cellular functions as diverse as cell growth, initiation of signal transduction, activation or inactivation of an enzyme, or simply blocking of the other dynamic PTM on –OH groups of Ser/Thr. A good example of a regulatory role of phosphorylation is the p53 tumor suppressor protein, which is heavily phosphorylated, with over 18 different phosphorylation sites [5]. Phosphorylation of p53 activates the protein and leads to cell cycle arrest, but dephosphorylation of specific p53 amino acids results in apoptotic cell death [72]. This happens when cells are damaged or malfunctioning. As phosphorylation is temporary and phosphate groups are easily removed by phosphatases [reviewed in ref. 3], the –OH group of Ser/Thr becomes available for sugars to replace the phosphate. Additionally, phosphorylation on Tyr may result in functional regulations other than those of Ser/Thr (Fig. 5b). A possible explanation is the resonating aromatic ring of Tyr. The diversity in functional regulation by phosphorylation ranges from mitogenic [73] and growth stimulatory or inhibitory effects [74] to signal transduction pathways [65,75,76] and cytoplasmic and nuclear functional regulation [52, 63-66, 76]. The efficiency of functional regulation by phosphorylation lies in the easy removal of the phosphate group by dephosphorylation [52]. Sometimes phosphorylation also blocks other dynamic PTMs, such as O-β-GlcNAc modification on the –OH group of Ser/Thr [65], or acetylation and methylation on the –NH2 group of Lys or Arg [63].

Acetylation: Acetylation is, like phosphorylation on Ser, Thr and Tyr residues, amongst the most studied functional PTMs, and it is the second most frequent PTM in proteins after phosphorylation. Acetylation is the addition of an acetyl group to an N-terminal amino group and to amino groups in the side chains of lysine and arginine. It took time, after the identification of acetylases and deacetylases, to establish that acetylation is a dynamic and regulatory modification similar to phosphorylation.
Additionally, diversity in protein sequence recognition by acetylases makes acetylation comparable to phosphorylation. Nonetheless, the consequences of acetylation need to be clarified in many different proteins, in particular how acetylation competes with phosphorylation and what the functional implications are. Extended studies of specific inhibitors of acetylases are necessary to map the different in vivo pathways regulated by acetylation and to achieve an understanding comparable to that of phosphorylation. Acetylation of the –NH2 group of the side chains of lysine residues located at the N-terminal region of histone proteins


Fig. (4). Transition of 3D structure in prion peptide from coil to β sheets by sugar modification [6]. Addition of α-GalNAc on Ser 132 favors conversion of the coiled structure to a β sheet, which in turn makes the prion peptides polymerize more rapidly than the recombinant non-glycosylated form, whereas addition of the same α-GalNAc to Ser 135 produces the reverse effect by inhibiting polymerization of the prion peptide into amyloid fibrils [6]. Though the whole structure of the peptide remains almost the same, specific structural changes (coil to β sheets) occur in the prion peptide which affect the protein binding required to form amyloid fibrils. This figure was produced by utilizing the PDB entry 1B10.


Fig. (5). (a). Conformational changes induced by phosphorylation may be subtle and only change the local conformation of the protein structure, as in the case of SpoIIAA, a bacterial sporulation regulator. Phosphorylation on Ser 57 results in conformational changes around the phosphorylation site that are evident in the 2nd and 3rd beta sheets when the non-phosphorylated and phosphorylated forms are compared, resulting in different binding specificities. (b). Similar local conformational changes in protein structure may be caused by Tyr phosphorylation. Human Feline Sarcoma Viral Oncogene Homologue (v-FES) proteins with Tyr in phosphorylated (yellow) and non-phosphorylated (blue) forms are superimposed. Phosphorylated Tyr is shown in red. The superimposition of the two PDB structures, 3bkb and 3cd3, shows overall conservation of the protein fold. The region around the Tyr is a binding site for the SO4 ion in 3bkb (blue), but the conformation of this binding site changes with the incoming phosphate on Tyr, as in 3cd3 (yellow).

Fig. (6). Structural switches can be spectacular, as in the case of K+ channel gating control by protein phosphorylation. The PDB entries 1b4g and 1b4I show the structure of the inactivation gate domain of the human voltage-gated potassium channel subunit. The two structures are shown in backbone style, with phosphorylated Ser in wire frame mode. When Ser 8 (near the N-terminus) is phosphorylated, the overall fold does not change much (a), but phosphorylation at Ser 15 and 21 (near the C-terminus) results in a compact fold of the inactivation gate domain (b). Though different conformations are possible in both (a) and (b), the compaction of the structural fold in (b) is quite spectacular. Superimposing the C-alpha traces of the two structures (in blue and yellow), with phosphorylated Ser 8 (green) and with phosphorylated Ser 15 and 21 (cyan), shows a clear difference between the two structures (c).


neutralizes the positive charges, thereby reducing the affinity between histones and DNA, leading to uncoiling of chromatin and easier access to promoter regions for RNA polymerases and transcription factors [63]. Hence, histone acetylation generally enhances transcription while histone deacetylation represses transcription [63]. Several different forms of acetyltransferases and deacetylases have been identified that regulate acetylation and deacetylation of histones. Acetylation of different site-specific, DNA-binding transcription factors usually takes place adjacent to the DNA-binding domain, and acetylation in this region results in their transactivation and DNA binding [77-80]. In contrast, when the acetylation sites are located within DNA-binding domains, acetylation abolishes the DNA-binding capacity. Thus acetylation can either enhance or inhibit binding of transcription factors to DNA. In addition to regulating protein-DNA binding, acetylation may also regulate protein–protein interactions [81]. Moreover, acetylation could also regulate protein stability, as an acetylated protein has a longer half-life [80]. Likewise, microtubule stability is regulated by α-tubulin acetylation [82], but this stabilizing effect of acetylation remains to be established for other proteins [83]. Thus, acetylation and phosphorylation are effected by acetylases and kinases, and both modifications induce reversible structural and conformational changes that regulate nuclear and cytoplasmic events.

Methylation: Protein methylation is another covalent modification, occurring on the –COOH group of Glu and Asp or on the side-chain –NH2 group of Lys and Arg residues [84]. It has been suggested that protein methylation affects signal transduction and RNA metabolism [85,86], but the precise role of protein methylation remains unknown. Histones H3 and H4 were the first proteins shown to be methylated on different Lys residues [87, 88].
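The charge argument made above for histone acetylation can be illustrated with a rough calculation. The sketch below is not a pKa-based model: it simply counts +1 for each Lys/Arg and -1 for each Asp/Glu, treats acetylated Lys as neutral, and ignores His and the chain termini. The function name `net_charge` is hypothetical, and the 20-residue sequence is taken here, as an assumption for illustration, to represent the N-terminal tail of histone H4 with acetylation at Lys 5, 8, 12 and 16.

```python
def net_charge(seq, acetylated=()):
    """Crude net charge near neutral pH.

    +1 per Lys/Arg, -1 per Asp/Glu; Lys residues whose 1-based position is
    listed in `acetylated` are counted as neutral. His and termini ignored.
    """
    acetylated = set(acetylated)
    charge = 0
    for pos, aa in enumerate(seq, start=1):
        if aa == "K" and pos in acetylated:
            continue  # acetyl-lysine no longer carries a positive charge
        if aa in "KR":
            charge += 1
        elif aa in "DE":
            charge -= 1
    return charge


# Assumed N-terminal tail of histone H4 (illustrative only).
H4_TAIL = "SGRGKGGKGLGKGGAKRHRK"

print(net_charge(H4_TAIL))                              # 8
print(net_charge(H4_TAIL, acetylated=(5, 8, 12, 16)))   # 4
```

Under these simplifications, acetylating four lysines halves the positive charge of the tail, which is the kind of change expected to weaken its electrostatic attraction to the negatively charged DNA backbone.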
In addition to histone methyltransferases, some members of the protein arginine methyltransferase family can also methylate histones in vitro [86]. With the exception of histone H3 lysine methylation in transcriptional regulation [89,90], gene regulation by methylated proteins is not as complex as with phosphorylation and acetylation. Arginine can be either mono- or dimethylated, in both symmetric and asymmetric configurations. Similarly, Lys methylation on the ε-N of the side chain amino group can occur in mono-, di-, or tri-methylated forms. Additionally, histone H3 methylation has been investigated for different and even opposite functional regulations in combination with other PTMs [63]. Thus protein methylation may also generate multifunctionality in certain proteins.

Carboxyl Group Modifications: Two types of carboxyl groups undergo PTMs: the –COOH group of C-terminal amino acids and that of Asp and Glu side chains. Modifications at the carboxyl function of amino acids at the C-terminus include O-methylation, glypiation (GPI anchor addition), amidation, ubiquitination and sumoylation [91-93]. Modifications of the –COOH group of Asp include O-phosphorylation, O-methylation and isoaspartic acid formation, whereas those of Glu include O-methylation, ADP-ribosylation, pyroglutamate formation and γ-carboxylation [91-93].


A prevalent PTM on carboxyl groups is methyl ester formation [94]. The two enzymes associated with this type of modification are the carboxyl methyltransferases (protein O-methyltransferase and S-adenosyl-L-methionine protein O-methyltransferase) [95]. The former is a bacterial enzyme responsible for catalyzing the methylation of chemoreceptors at glutamyl residues as an adaptive response to sensory stimuli [95,96], whereas the latter enzyme is found in both prokaryotes and eukaryotes and has a wider substrate specificity. In human erythrocytes, methyl ester formation at Asp residues is very dynamic [97]. Similarly, dynamic methyl ester formation has also been documented in specific cytoskeletal and membrane proteins [98,99]. The S-adenosylmethionine-dependent methyltransferase catalyzes methyl esterification at carboxyl groups of amino acids located at the C-terminus of certain proteins [100-102] or on the side chain –NH2 group of Lys and Arg [103]. The methyl esters are usually hydrolysed by esterases, making protein methyl esterification reversible, and demethylation may result in the formation of methanol [104]. Moreover, the degree of methylation also depends on attractants binding to their receptors [104], and reversible methylation at specific Glu residues in membrane-receptor proteins provides an essential adaptive function in bacterial chemotaxis [105]. N-terminal pyroglutamate (pGlu) formation from its glutaminyl precursor is an important PTM controlling conformation, receptor binding and protection from N-terminal exopeptidase degradation. The pGlu modification has been studied extensively in human growth hormone [106] and tissue plasminogen activator [107].

DYNAMIC PTMS REGULATE PROTEIN STRUCTURAL AND FUNCTIONAL CHANGES

Protein multifunctionality is often regulated by the interplay of different covalent and post-translational modifications on the same or neighboring residues [108].
Additionally, one PTM may facilitate or prevent other modifications and thus regulate the function of the modified protein. A specific combination of different PTMs may provide a basis for proteins to perform multiple functions. Studying PTMs in biological systems requires identifying the sites likely to be modified and determining under which biological conditions each PTM will occur. Such experimental tasks can benefit considerably from informatic and computational tools. Databases of sequence and structure information should allow scientists to predict possible reaction sites and anticipate the structure-function changes induced by PTMs. One should ultimately be able to use prediction tools that scan proteome-wide databases and suggest potential PTMs and their effect on structure and function, so as to envisage the multifunctional behavior of proteins [8-11]. Protein multifunctionality often involves diverse activities that perform and facilitate coordinated responses to different biological contexts across cellular compartments. The specific environment of a cellular compartment often triggers a functional switch in the protein present in that compartment [109]. Modifications of non-covalent intramolecular interactions by PTMs are often the cause of structural changes resulting in functional switches [110-113]. The anionic group modification of amino acid residues often


affects the non-covalent interactions that induce conformational changes and modulate protein function. Most of the 3-D macromolecular structures in the Protein Data Bank (PDB) were solved by one of three methods: i) X-ray crystallography (approximately 85%), ii) solution nuclear magnetic resonance (NMR) (approximately 15%), and iii) electron microscopy (<1%). Theoretical models have been removed from PDB. A few structures were determined by other methods. X-ray crystal diffraction usually cannot resolve the positions of hydrogen atoms but can reliably distinguish among nitrogen, oxygen and carbon. Structures solved by NMR are determined in solution, but the method is limited to molecules with a molecular weight of less than 30 kDa. NMR is a very good method for solving structures of small proteins and also yields the positions of hydrogen atoms. The results of NMR analysis constitute a collection of alternative models, in contrast to the single unique model obtained by crystallography. Predicting and building a 3D protein model by sequence homology methods, without an experimental 3-D model, also helps to predict the protein’s functions [114,115]. Understanding the function of a protein implies not only determining its 3D structure but also defining the active sites of the folded protein, a useful shortcut for translating structural information into functional information. However, as the protein is subjected in vivo to a number of processing events resulting in opening or closing of active sites, the 3D structure of a protein is dynamic and continuously changing. The 3D models built either experimentally or theoretically are static and silent about the dynamic aspects of protein structure (molecular weight,

isoelectric points, and potential PTMs). Amongst these global properties, PTMs are the most influential as they are dynamic and can induce temporary changes in protein structures or conformations, such as the change of a coil structure to a β-sheet in the prion peptide [1,2,6]. Predicting the functions of proteins through structural and conformational changes by PTMs is based on the hypothesis that the overall structure and function of a mature cellular protein is influenced significantly by PTMs, which can also occur alternately on the same or on neighboring amino acid residues. For example, O-β-GlcNAc addition is temporary and as dynamic as phosphorylation [reviewed in ref. 3], and the temporary changes induced by O-β-GlcNAc modification and phosphorylation alternating on the same amino acids functionally regulate proteins. Considerable development in the area of molecular 3D structural visualization tools (Table 4) has widened our understanding of the structure-function relationships of DNA, proteins and their complexes.

Table 4. Protein and Other Macromolecule Visualization and Modification Tools

Tool | Function | URL
TopDraw | Schematics of protein secondary structure | http://stein.bioch.dundee.ac.uk/~charlie/software/topdraw/
Folding | Dynamics calculations for protein folding | http://folding.stanford.edu/FAQhighperformance.html
TexMol | Rendering atomic models | http://cvcweb.ices.utexas.edu/cvc/projects/project.php?pageIndex=1&proID=8
QuteMol | High quality molecular visualization system | http://qutemol.sourceforge.net/
Chimera | Extensible molecular modeling system | http://www.cgl.ucsf.edu/chimera/
PyMOL | Molecular visualization system | http://pymol.sourceforge.net/
VMD | Visual Molecular Dynamics | http://www.ks.uiuc.edu/Research/vmd/
EMAN2 | Scientific image processing | http://blake.bcm.tmc.edu/eman/eman2/
PeppeR | Visualization of 3D-Electron Microscopy volume maps | http://biocomp.cnb.uam.es/das/PeppeR/index.html
RasMol | Molecular Visualization Freeware for proteins, DNA and macromolecules | http://www.umass.edu/microbio/rasmol/
MolMol | MOLecule analysis and MOLecule display | http://hugin.ethz.ch/wuthrich/software/molmol/
Cn3D | Structure and sequence alignment viewer for NCBI databases entries | http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
Jmol | An open-source Java viewer for chemical structures in 3D | http://jmol.sourceforge.net/
StarBiochem | An application that displays molecules from the Protein Data Bank | http://web.mit.edu/star/biochem/

THEORETICAL AND COMPUTATIONAL APPROACHES TO STRUCTURAL AND FUNCTIONAL GENOMICS AND PROTEOMICS

Different computational methods have been decisive in understanding the structural and functional consequences of post-translational protein modifications. The structure-function relationship and possible changes in the 3-D structure of a modified protein or peptide can be predicted utilizing machine learning techniques, a form of artificial intelligence in which a system acquires or develops new knowledge or skills. Methods for machine learning are: supervised learning, unsupervised learning, reinforcement learning, learning to learn and semi-supervised learning. Historically, the back-propagation algorithm was one of the most significant developments in the area of neuro-computing, as a way to elaborate a powerful mapping network. It is used for parameterized non-linear modeling in regression and classification problems. Back-propagation attempts to learn the training example patterns under a supervised learning mode by adjusting its weights on the basis of the training errors produced for each training data set in each training epoch, and by calculating the gradient vector of the error surface. Back-propagation is associated with the practical problem of slow convergence and thus requires long training periods, which may last more than a hundred hours. A major problem in the development of back-propagation networks is the selection of the number of hidden layers and of processing units in the hidden layer(s). Wrong selection of the network topology can lead to over-learning and/or under-learning, resulting in high false positive or false negative predictions. Other machine learning techniques include hybrid approaches involving evolutionary methods, genetic algorithms, genetic programming, Bayesian methods and Occam’s razor [117]. Several programs have been developed to analyze and/or predict different PTM sites in proteins with reliable accuracy, utilizing neural networks trained on the primary sequence data of proteins [8-11] (Table 3). A simple approach to the identification of PTM sites is based on regular expression search (the regular expressions are constructed from experimentally verified functional sites in proteins). In order to improve the prediction efficiency of this method and to lower the number of false positives, context-based rules and logical filters are applied in the ELM (Eukaryotic Linear Motif) resource [118].
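The regular expression approach mentioned above can be sketched with a standard example. The snippet below, offered as an illustration rather than as the implementation of ELM or any listed server, scans a sequence for the classical N-glycosylation sequon, written in PROSITE notation as N-{P}-[ST]-{P} (Asn, any residue except Pro, Ser or Thr, any residue except Pro). The function name `find_sequons` and the toy sequence are hypothetical.

```python
import re

# N-{P}-[ST]-{P} expressed as a Python regular expression.
# The zero-width lookahead allows overlapping matches to be reported.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(seq):
    """Return (1-based position of the Asn, matched tetrapeptide) per hit."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq.upper())]


# Hypothetical toy sequence, not a real protein.
toy = "MNGTAANPSLNCSQ"
print(find_sequons(toy))
# -> [(2, 'NGTA'), (11, 'NCSQ')]
```

The Asn at position 7 is rejected because it is followed by Pro. A match of this kind only marks a candidate site; as noted in the text, context-based rules and filters are needed to reduce false positives, and this pattern as written also misses a sequon whose Ser/Thr is the last residue of the chain.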
The Sulfinator [119] uses a hidden Markov model (HMM) to recognize sulfated residues and is built on the basis of multiple sequence alignments of 25-amino acid long segments. The NetPhos server utilizes neural networks trained on the PhosphoBase database (version 2.0) in order to characterize a 9-amino acid neighborhood of the serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins [120]. The PredPhospho server predicts phosphorylation sites and the type of protein kinase acting at each site using support vector machines (SVMs) [8]. Another web program, Scansite, uses sequence profiles derived from experimental data for the identification of PTM sites. This motif-based scanning approach is applied to the genome-wide prediction of signaling pathways [121]. The eMOTIF resource [122] reveals conserved sequence motifs in families of proteins, derived from multiple sequence alignments, with a wide range of specificities and sensitivities. The PROSITE database [123] allows the function and classification of a protein to be inferred using a set of local sequence similarity tools. Most of the methods described above for predicting PTMs have utilized data-driven approaches which involve machine learning techniques such as neural networks [120], hidden Markov models [119] and association rule mining [11]. These machine learning methods try to classify or cluster the sequence motifs containing positional correlations by learning their patterns, either in supervised [8-10] or in unsupervised learning mode [11]. These methodologies for data analysis are good at partitioning sets of sequence patterns into groups on the basis of positional correlations.
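The fixed-length sequence neighborhoods on which such predictors operate can be made concrete with a short sketch. The code below extracts a 9-residue window (four flanking residues on each side) around every Ser, Thr or Tyr and then tabulates residue counts per window position, which is the raw material for learning positional correlations. It is a minimal illustration under stated assumptions: the function names, the padding character and the toy sequence are hypothetical, and this is not the preprocessing code of NetPhos or of any other tool cited here.

```python
from collections import Counter


def site_windows(seq, residues="STY", flank=4, pad="-"):
    """Return (1-based position, window) for each candidate residue.

    Windows have length 2 * flank + 1 and are centred on the candidate;
    positions beyond the sequence ends are filled with `pad`.
    """
    padded = pad * flank + seq + pad * flank
    return [
        (i + 1, padded[i : i + 2 * flank + 1])
        for i, aa in enumerate(seq)
        if aa in residues
    ]


def position_counts(windows):
    """Count residues observed at each position across equal-length windows."""
    length = len(windows[0])
    return [Counter(w[k] for w in windows) for k in range(length)]


# Hypothetical toy sequence, for illustration only.
toy = "MSPEKTY"
hits = site_windows(toy)
print(hits)
# -> [(2, '---MSPEKT'), (6, 'SPEKTY---'), (7, 'PEKTY----')]

counts = position_counts([w for _, w in hits])
print(counts[4])  # centre position: one each of S, T and Y
```

In practice each window would be converted to a numerical encoding and paired with an experimentally verified label before being passed to a classifier such as a neural network or an SVM.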


These methods alone are sufficient to discover regularities in the association of amino acids at different positions. A recent study has exploited data mining techniques to collect significant association patterns of amino acids around a PTM site, representing a correlation approach between the PTM site and its surrounding sequence [11]. This method, MAPRes, is capable of finding important amino acid patterns surrounding a PTM site in a large collection of PTM or protease binding site data. The method is effective for primary sequence analysis of large data sets to find the general and specific requirements for a target site in a protein, and was used to analyze the PTM data of experimentally verified O-GalNAc modification sites (Ser/Thr) and phosphorylation sites (Ser/Thr/Tyr) [11].

CONCLUSIONS AND PERSPECTIVES
Proteins regulate cellular functions through different mechanisms. Protein multifunctionality, the coordinated adaptation of a protein to different functional contexts, is regulated directly by isoforms or indirectly by PTMs and the enzymes catalyzing those modifications. Additionally, the dynamics and interplay of different PTMs are controlled by a combination of different enzymes. Despite the great importance of PTMs for biological function, their study on a large scale has been hampered by the lack of suitable methods, and many key modifications were discovered only late in the elucidation of various biological processes. Direct analysis of modifications requires isolation of the correctly processed protein in an amount sufficient for biochemical study. Knowledge of these modifications is extremely important because they may alter the physical and chemical properties, folding, conformation distribution, stability, activity and, consequently, function of proteins. Furthermore, the modification itself can act as an added functional group.
Understanding how multifunctional proteins are generated in vivo by the addition of functional groups to different amino acids (for example by phosphorylation, glycosylation, acetylation and methylation), and how such proteins are processed, is the actual goal of functional genomics and proteomics. A practical appreciation of protein multifunctionality and the mechanisms involved has become a standard expectation, but there is little information to suggest an earlier appreciation of the problem. The development of computational procedures based on theoretical biochemistry is undoubtedly essential for further developing the concept of protein multifunctionality. In silico tools provide software and hardware, database search algorithms optimized for the detection of modified peptides and patterns, prediction of protein-protein and protein-ligand interactions, and simulation.

ACKNOWLEDGEMENTS
NUD thanks the Pakistan Academy of Sciences, the Higher Education Commission, and EMRO-WHO for financial support. DCH was supported by grants from the Swiss National Science Foundation, Oncosuisse, the Geneva Cancer League and the Bernische Krebsliga. We thank Prof. C.W. Jefford for useful critical discussion.

REFERENCES
[1] Clarkson, J.; Campbell, I.D.; Yudkin, M.D. Biochem. J., 2003, 372(Pt 1), 113.

[2] Jang, H.H.; Kim, S.Y.; Park, S.K.; Jeon, H.S.; Lee, Y.M.; Jung, J.H.; Lee, S.Y.; Chae, H.B.; Jung, Y.J.; Lee, K.O.; Lim, C.O.; Chung, W.S.; Bahk, J.D.; Yun, D.J.; Cho, M.J.; Lee, S.Y. FEBS Lett., 2006, 580, 351.
[3] Zachara, N.E.; Hart, G.W. Chem. Rev., 2002, 102, 431.
[4] Artymiuk, P.J.; Poirrette, A.R.; Grindley, H.M.; Rice, D.W.; Willett, P. J. Mol. Biol., 1994, 243, 327.
[5] Ashcroft, M.; Kubbutat, M.H.G.; Vousden, K.H. Mol. Cell Biol., 1999, 19, 1751.
[6] Chen, P-Y.; Lin, C-C.; Chang, Y-T.; Lin, S-C.; Chan, S.I. Proc. Natl. Acad. Sci., 2002, 99, 12633.
[7] Lander, E.S.; Linton, L.M.; Birren, B.; Nusbaum, C.; Zody, M.C.; Baldwin, J.; Devon, K.; Dewar, K.; Doyle, M.; FitzHugh, W.; Funke, R.; Gage, D.; Harris, K.; Heaford, A.; Howland, J.; Kann, L.; Lehoczky, J.; LeVine, R.; McEwan, P.; McKernan, K.; Meldrim, J.; Mesirov, J.P.; Miranda, C.; Morris, W.; Naylor, J.; Raymond, C.; Rosetti, M.; Santos, R.; Sheridan, A.; Sougnez, C.; Stange-Thomann, N.; Stojanovic, N.; Subramanian, A.; Wyman, D.; Rogers, J.; Sulston, J.; Ainscough, R.; Beck, S.; Bentley, D.; Burton, J.; Clee, C.; Carter, N.; Coulson, A.; Deadman, R.; Deloukas, P.; Dunham, A.; Dunham, I.; Durbin, R.; French, L.; Grafham, D.; Gregory, S.; Hubbard, T.; Humphray, S.; Hunt, A.; Jones, M.; Lloyd, C.; McMurray, A.; Matthews, L.; Mercer, S.; Milne, S.; Mullikin, J.C.; Mungall, A.; Plumb, R.; Ross, M.; Shownkeen, R.; Sims, S.; Waterston, R.H.; Wilson, R.K.; Hillier, L.W.; McPherson, J.D.; Marra, M.A.; Mardis, E.R.; Fulton, L.A.; Chinwalla, A.T.; Pepin, K.H.; Gish, W.R.; Chissoe, S.L.; Wendl, M.C.; Delehaunty, K.D.; Miner, T.L.; Delehaunty, A.; Kramer, J.B.; Cook, L.L.; Fulton, R.S.; Johnson, D.L.; Minx, P.J.; Clifton, S.W.; Hawkins, T.; Branscomb, E.; Predki, P.; Richardson, P.; Wenning, S.; Slezak, T.; Doggett, N.; Cheng, J.F.; Olsen, A.; Lucas, S.; Elkin, C.; Uberbacher, E.; Frazier, M.; Gibbs, R.A.; Muzny, D.M.; Scherer, S.E.; Bouck, J.B.; Sodergren, E.J.; Worley, K.C.; Rives, C.M.; Gorrell, J.H.; Metzker, M.L.; Naylor, S.L.; Kucherlapati, R.S.; Nelson, D.L.; Weinstock, G.M.; Sakaki, Y.; Fujiyama, A.; Hattori, M.; Yada, T.; Toyoda, A.; Itoh, T.; Kawagoe, C.; Watanabe, H.; Totoki, Y.; Taylor, T.; Weissenbach, J.; Heilig, R.; Saurin, W.; Artiguenave, F.; Brottier, P.; Bruls, T.; Pelletier, E.; Robert, C.; Wincker, P.; Smith, D.R.; Doucette-Stamm, L.; Rubenfield, M.; Weinstock, K.; Lee, H.M.; Dubois, J.; Rosenthal, A.; Platzer, M.; Nyakatura, G.; Taudien, S.; Rump, A.; Yang, H.; Yu, J.; Wang, J.; Huang, G.; Gu, J.; Hood, L.; Rowen, L.; Madan, A.; Qin, S.; Davis, R.W.; Federspiel, N.A.; Abola, A.P.; Proctor, M.J.; Myers, R.M.; Schmutz, J.; Dickson, M.; Grimwood, J.; Cox, D.R.; Olson, M.V.; Kaul, R.; Raymond, C.; Shimizu, N.; Kawasaki, K.; Minoshima, S.; Evans, G.A.; Athanasiou, M.; Schultz, R.; Roe, B.A.; Chen, F.; Pan, H.; Ramser, J.; Lehrach, H.; Reinhardt, R.; McCombie, W.R.; de la Bastide, M.; Dedhia, N.; Blöcker, H.; Hornischer, K.; Nordsiek, G.; Agarwala, R.; Aravind, L.; Bailey, J.A.; Bateman, A.; Batzoglou, S.; Birney, E.; Bork, P.; Brown, D.G.; Burge, C.B.; Cerutti, L.; Chen, H.C.; Church, D.; Clamp, M.; Copley, R.R.; Doerks, T.; Eddy, S.R.; Eichler, E.E.; Furey, T.S.; Galagan, J.; Gilbert, J.G.; Harmon, C.; Hayashizaki, Y.; Haussler, D.; Hermjakob, H.; Hokamp, K.; Jang, W.; Johnson, L.S.; Jones, T.A.; Kasif, S.; Kaspryzk, A.; Kennedy, S.; Kent, W.J.; Kitts, P.; Koonin, E.V.; Korf, I.; Kulp, D.; Lancet, D.; Lowe, T.M.; McLysaght, A.; Mikkelsen, T.; Moran, J.V.; Mulder, N.; Pollara, V.J.; Ponting, C.P.; Schuler, G.; Schultz, J.; Slater, G.; Smit, A.F.; Stupka, E.; Szustakowski, J.; Thierry-Mieg, D.; Thierry-Mieg, J.; Wagner, L.; Wallis, J.; Wheeler, R.; Williams, A.; Wolf, Y.I.; Wolfe, K.H.; Yang, S.P.; Yeh, R.F.; Collins, F.; Guyer, M.S.; Peterson, J.; Felsenfeld, A.; Wetterstrand, K.A.; Patrinos, A.; Morgan, M.J.; de Jong, P.; Catanese, J.J.; Osoegawa, K.; Shizuya, H.; Choi, S.; Chen, Y.J. International Human Genome Sequencing Consortium. Nature, 2001, 409, 860.
[8] Kim, J.H.; Lee, J.; Oh, B.; Kimm, K.; Koh, I. Bioinformatics, 2004, 20, 3179.
[9] Blom, N.; Sicheritz-Ponten, T.; Gupta, R.; Gammeltoft, S.; Brunak, S. Proteomics, 2004, 4, 1633.
[10] Chen, H.; Xue, Y.; Huang, N.; Yao, X.; Sun, Z. Nucleic Acids Res., 2006, 34, W249.
[11] Ahmad, I.; Qazi, W.M.; Khurshi, A.; Ahmad, M.; Hoessli, D.C.; Khawaja, I.; Choudhary, M.I.; Shakoori, A.R.; Nasir-ud-Din. Proteomics, 2008, 8, In Press.

[12] Bystroff, C.; Krogh, A. Methods Mol. Biol., 2007, 413, 173.
[13] Bishop, C.M. Pattern Recognition and Machine Learning, Springer-Verlag: New York, 2006.
[14] Black, D.L. Cell, 2000, 103, 367.
[15] Xing, Y.; Lee, C. Nat. Rev. Genet., 2006, 7, 499.
[16] Johnson, J.M.; Castle, J.; Garrett-Engele, P.; Kan, Z.; Loerch, P.M.; Armour, C.D.; Santos, R.; Schadt, E.E.; Stoughton, R.; Shoemaker, D.D. Science, 2003, 302, 2141.
[17] Stamm, S.; Ben-Ari, S.; Rafalska, I.; Tang, Y.; Zhang, Z.; Toiber, D.; Thanaraj, T.A.; Soreq, H. Gene, 2005, 344, 1.
[18] Schmucker, D.; Clemens, J.C.; Shu, H.; Worby, C.A.; Xiao, J.; Muda, M.; Dixon, J.E.; Zipursky, S.L. Cell, 2000, 101, 671.
[19] Celetto, A.M.; Gravely, B.R. Genetics, 2001, 159, 599.
[20] Stetefeld, J.; Ruegg, M.A. Trends Biochem. Sci., 2005, 30, 515.
[21] Hymowitz, S.G.; Compaan, D.M.; Yan, M.; Wallweber, H.J.; Dixit, V.M.; Starovasnik, M.A.; de Vos, A.M. Structure, 2003, 11, 1513.
[22] Walma, T.; Aelen, J.; Nabuurs, S.B.; Oostendorp, M.; van den Berk, L.; Hendriks, W.; Vuister, G.W. Structure, 2004, 12, 11.
[23] Kendrew, J.C.; Dickerson, R.E.; Strandberg, B.E.; Hart, R.G.; Davies, D.R.; Phillips, D.C.; Shore, V.C. Nature, 1960, 185, 422.
[24] Chothia, C.; Lesk, A.M. EMBO J., 1986, 5, 823.
[25] Hargbo, J.; Elofsson, A. Proteins, 1999, 36, 68.
[26] Ison, J.C.; Blades, M.J.; Bleasby, A.J.; Daniel, S.C.; Parish, J.H.; Findlay, J.B. Proteins, 2000, 40, 330.
[27] Blake, J.D.; Cohen, F.E. J. Mol. Biol., 2001, 307, 721.
[28] Jennings, A.J.; Edge, C.M.; Sternberg, M.J. Protein Eng., 2001, 14, 227.
[29] Markiewicz, P.; Kleina, L.G.; Cruz, C.; Ehret, S.; Miller, J.H. J. Mol. Biol., 1994, 240, 421.
[30] Milla, M.E.; Brown, B.M.; Sauer, R.T. Nat. Struct. Biol., 1994, 1, 518.
[31] Suckow, J.; Markiewicz, P.; Kleina, L.G.; Miller, J.; Kisters-Woike, B.; Muller-Hill, B. J. Mol. Biol., 1996, 261, 509.
[32] Dosztanyi, Z.; Fiser, A.; Simon, I. J. Mol. Biol., 1997, 272, 597.
[33] Kannan, N.; Vishveshwara, S. J. Mol. Biol., 1999, 292, 441.
[34] Orengo, C.A. Protein Sci., 1999, 8, 699.
[35] Reddy, B.V.; Li, W.W.; Shindyalov, I.N.; Bourne, P.E. Proteins, 2001, 42, 148.
[36] Li, W.W.; Reddy, B.V.; Tate, J.G.; Shindyalov, I.N.; Bourne, P.E. Nucleic Acids Res., 2002, 30, 409.
[37] Rost, B. Protein Eng., 1999, 12, 85.
[38] Thompson, J.D.; Higgins, D.G.; Gibson, T.J. Nucleic Acids Res., 1994, 22, 4673.
[39] Altschul, S.F.; Madden, T.L.; Schaffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Nucleic Acids Res., 1997, 25, 3389.
[40] Pearl, F.M.; Lee, D.; Bray, J.E.; Buchan, D.W.; Shepherd, A.J.; Orengo, C.A. Protein Sci., 2002, 11, 233.
[41] Ranea, J.A.G.; Sillero, A.; Thornton, J.M.; Orengo, C.A. J. Mol. Evol., 2006, 63, 513.
[42] Martin, A.C.; Orengo, C.A.; Hutchinson, E.G.; Jones, S.; Karmirantzou, M.; Laskowski, R.A.; Mitchell, J.B.; Taroni, C.; Thornton, J.M. Structure, 1998, 6, 875.
[43] Marsden, R.L.; Ranea, J.A.; Sillero, A.; Redfern, O.; Yeats, C.; Maibaum, M.; Lee, D.; Addou, S.; Reeves, G.A.; Dallman, T.J.; Orengo, C.A. Phil. Trans. Royal Soc. B-Biol. Sci., 2006, 361, 425.
[44] Murzin, A.G.; Brenner, S.E.; Hubbard, T.; Chothia, C. J. Mol. Biol., 1995, 247, 536.
[45] Lesk, A.M.; Chothia, C. Philos. Trans. Royal Soc. London, 1986, 317, 345.
[46] Yuan, L.; Kurek, I.; English, J.; Keenan, R. Microbiol. Mol. Biol. Rev., 2005, 69, 373.
[47] Johannes, T.W.; Zhao, H. Curr. Opin. Microbiol., 2006, 9, 261.
[48] Hoseki, J.; Okamoto, A.; Takada, N.; Suenaga, A.; Futatsugi, N.; Konagaya, A.; Taiji, M.; Yano, T.; Kuramitsu, S.; Kagamiyama, H. Biochemistry, 2003, 42, 14469.
[49] Acharya, P.; Rajakumara, E.; Sankaranarayanan, R.; Rao, N.M. J. Mol. Biol., 2004, 341, 1271.
[50] Hecky, J.; Muller, K.M. Biochemistry, 2005, 44, 12640.
[51] Fox, R.J.; Davis, S.C.; Mundorff, E.C.; Newman, L.M.; Gavrilovic, V.; Ma, S.K.; Chung, L.M.; Ching, C.; Tam, S.; Muley, S.; Grate, J.; Gruber, J.; Whitman, J.C.; Sheldon, R.A.; Huisman, G.W. Nat. Biotechnol., 2007, 25, 338.

[52] Khwaja, T.A.; Wajahat, T.; Ahmad, I.; Hoessli, D.; Walker-Nasir, E.; Kaleem, A.; Qazi, W.M.; Shakoori, A.R.; Nasir-ud-Din. Cell Biochem., 2008, 103, 479.
[53] Clackson, T.; Wells, J.A. Science, 1995, 267, 383.
[54] Keskin, O.; Ma, B.; Rogale, K.; Gunasekaran, K.; Nussinov, R. Phys. Biol., 2005, 2, S24.
[55] Fernandez-Ballester, G.; Blanes-Mira, C.; Serrano, L. J. Mol. Biol., 2004, 335, 619.
[56] Fellouse, F.A.; Barthelemy, P.A.; Kelley, R.F.; Sidhu, S.S. J. Mol. Biol., 2006, 357, 100.
[57] Tobi, D.; Bahar, I. Proc. Natl. Acad. Sci. USA, 2005, 102, 18908.
[58] Dreger, M. Mass Spectrom. Rev., 2003, 22, 27.
[59] Kumar, A.; Agarwal, S.; Heyman, J.A.; Matson, S.; Heidtman, M.; Piccirillo, S.; Umansky, L.; Drawid, A.; Jansen, R.; Liu, Y.; Cheung, K.H.; Miller, P.; Gerstein, M.; Roeder, G.S.; Snyder, M. Genes Dev., 2002, 16, 707.
[60] Andersen, J.S.; Lyon, C.E.; Fox, A.H.; Leung, A.K.; Lam, Y.W.; Steen, H.; Mann, M.; Lamond, A.I. Curr. Biol., 2002, 12, 1.
[61] Ostrovsky de, S.P.; Maloy, S. Proc. Natl. Acad. Sci. USA, 1993, 90, 4295.
[62] Muro-Pastor, A.M.; Ostrovsky, P.; Maloy, S. J. Bacteriol., 1997, 179, 2788.
[63] Kaleem, A.; Hoessli, D.C.; Ahmad, I.; Walker-Nasir, E.; Nasim, A.; Shakoori, A.R.; Nasir-ud-Din. J. Cell Biochem., 2008, 103, 835.
[64] Ahmad, I.; Hoessli, D.C.; Walker-Nasir, E.; Rafik, S.M.; Shakoori, A.R.; Nasir-ud-Din. Nucleic Acids Res., 2006, 34, 175.
[65] Ahmad, I.; Hoessli, D.C.; Gupta, R.; Walker-Nasir, E.; Rafik, S.M.; Choudhary, M.I.; Shakoori, A.R.; Nasir-ud-Din. J. Cell Biochem., 2007, 100, 1558.
[66] Ahmad, I.; Khan, T.S.; Hoessli, D.C.; Walker-Nasir, E.; Kaleem, A.; Shakoori, A.R.; Nasir-Ud-Din. Protein Pept. Lett., 2008, 15, 193.
[67] Stanley, P. Glycobiology, 1992, 2, 99.
[68] Varki, A. Glycobiology, 1993, 3, 97.
[69] Hounsell, E.F.; Davies, M.J.; Renouf, D.V. Glycoconjug. J., 1996, 13, 19.
[70] Ten Hagen, K.G.; Fritz, T.A.; Tabak, L.A. Glycobiology, 2003, 13, 1R.
[71] Comer, F.I.; Hart, G.W. Biochim. Biophys. Acta, 1999, 1473, 161.
[72] Bates, S.; Vousden, K.H. Curr. Opin. Genet. Dev., 1996, 6, 1.
[73] Crews, C.M.; Alessandrini, A.; Erikson, R.L. Cell Growth Differ., 1992, 3, 135.
[74] Cross, D.A.; Alessi, D.R.; Cohen, P.; Andjelkovich, M.; Hemmings, B.A. Nature, 1995, 378, 785.
[75] Downward, J. FEBS Lett., 1994, 338, 113.
[76] Ahmad, I.; Hoessli, D.C.; Walker-Nasir, E.; Choudhary, M.I.; Rafik, S.M.; Shakoori, A.R.; Nasir-ud-Din. J. Cell. Biochem., 2006, 99, 706.
[77] Gu, W.; Roeder, R.G. Cell, 1997, 90, 595.
[78] Boyes, J.; Byfield, P.; Nakatani, Y.; Ogryzko, V. Nature, 1998, 396, 594.
[79] Zhang, W.; Bieker, J.J. Proc. Natl. Acad. Sci. USA, 1998, 95, 9855.
[80] Martínez-Balbás.; Bauer.; Nielsen.; Brehm.; Kouzarides. EMBO, 2000, 19, 662.
[81] Dhalluin, C.; Carlson, J.E.; Zeng, L.; He, C.; Aggarwal, A.K.; Zhou, M.M. Nature, 1999, 399, 491.
[82] Takemura, R.; Okabe, S.; Umeyama, T.; Kanai, Y.; Cowan, N.J.; Hirokawa, N. J. Cell Sci., 1992, 103, 953.
[83] Herrera, J.E.; Sakaguchi, K.; Bergel, M.; Trieschmann, L.; Nakatani, Y.; Bustin, M. Mol. Cell. Biol., 1999, 19, 3466.
[84] Clarke, S. Curr. Opin. Cell Biol., 1993, 5, 977.
[85] Aletta, J.M.; Cimato, T.R.; Ettinger, M.J. Trends Biochem. Sci., 1998, 23, 89.

[86] Gary, J.D.; Clarke, S. Prog. Nucleic Acid Res. Mol. Biol., 1998, 61, 65.
[87] van Holde, K.E. In Molecular Biology; Rich, A. Ed., Springer, New York, 1988; pp. 111.
[88] Strahl, B.D.; Ohba, R.; Cook, R.G.; Allis, C.D. Proc. Natl. Acad. Sci. USA, 1999, 96, 14967.
[89] Chen, D.; Ma, H.; Hong, H.; Koh, S.S.; Huang, S.M.; Schurter, B.T.; Aswad, D.W.; Stallcup, M.R. Science, 1999, 284, 2174.
[90] Rea, S.; Eisenhaber, F.; O'Carroll, D.; Strahl, B.D.; Sun, Z.W.; Schmid, M.; Opravil, S.; Mechtler, K.; Ponting, C.P.; Allis, C.D.; Jenuwein, T. Nature, 2000, 406, 593.
[91] Ogata, N.; Ueda, K.; Hayaishi, O. J. Biol. Chem., 1980, 255, 7610.
[92] Ahlgren, J.A.; Ordal, G.W. Biochem. J., 1983, 213, 759.
[93] Chelius, D.; Jing, K.; Lueras, A.; Rehder, D.S.; Dillon, T.M.; Vizel, A.; Rajan, R.S.; Li, T.; Treuheit, M.J.; Bondarenko, P.V. Anal. Chem., 2006, 78, 2370.
[94] Paik, W.K.; Kim, S. Protein Methylation; Wiley, New York, 1980; Vol. 1, pp. 112.
[95] Clarke, S.; Vogel, J.P.; Deschenes, R.J.; Stock, J. Proc. Natl. Acad. Sci. USA, 1988, 85, 4643.
[96] Koshland, D.E. Jr. Annu. Rev. Biochem., 1981, 50, 765.
[97] Janson, C.A.; Clarke, S. J. Biol. Chem., 1980, 255, 11640.
[98] Freitag, C.; Clarke, S. J. Biol. Chem., 1981, 256, 6102.
[99] Terwilliger, T.C.; Clarke, S. J. Biol. Chem., 1981, 256, 3067.
[100] Lee, J.; Stock, J. J. Biol. Chem., 1993, 268, 19192.
[101] Xie, H.; Clarke, S. J. Biol. Chem., 1994, 269, 1981.
[102] Favre, B.; Zolnierowicz, S.; Turowski, P.; Hemmings, B.A. J. Biol. Chem., 1994, 269, 16311.
[103] Burgess-Cassler, A.; Ordal, G.W. J. Biol. Chem., 1982, 257, 12835.
[104] Goldman, D.J.; Worobec, S.W.; Siegel, R.B.; Hecker, R.V.; Ordal, G.W. Biochemistry, 1982, 21, 915.
[105] Springer, M.S.; Goy, M.F.; Adler, J. Nature, 1979, 280, 279.
[106] Shimizu, T.; Matsuoka, Y.; Shirasawa, T. Biol. Pharm. Bull., 2005, 28, 1590.
[107] O'Farrell, P. J. Biol. Chem., 1975, 250, 4007.
[108] Khidekel, N.; Hsieh-Wilson, L.C. Org. Biomol. Chem., 2004, 2, 1.
[109] Jeffery, C.J. Trends Biochem. Sci., 1999, 24, 8.
[110] Berlot, S.; Aissaoui, Z.; Pavon-Djavid, G.; Belleney, J.; Jozefowicz, M.; Helary, G.; Migonney, V. Biomacromolecules, 2002, 3, 63.
[111] Boynton, J. C R.; Heinegard, D.; Barry, F. Biochemistry, 2001, 40, 12983.
[112] Varki, A.; Kornfeld, S. J. Biol. Chem., 1980, 255, 10847.
[113] Yuan, Z.Q.; Feldman, R.I.; Busman, G.E.; Coppola, D.; Nicosia, S.V.; Cheng, J.Q. J. Biol. Chem., 2003, 278, 23432.
[114] Marcotte, E.M.; Pellegrini, M.; Ng, H.L.; Rice, D.W.; Yeates, T.O.; Eisenberg, D. Science, 1999, 285, 751.
[115] Eisenberg, D.; Marcotte, E.M.; Xenarios, I.; Yeates, T.O. Nature, 2000, 405, 823.
[116] Hecht-Nielson, R. Neurocomputing, Addison-Wesley: New York, 1989.
[117] Zhang, B.; Muhlenbein, H. Complex Systems, 1993, 7, 199.
[118] Diella, F.; Gould, C.M.; Chica, C.; Via, A.; Gibson, T.J. Nucleic Acids Res., 2008, 36, D240.
[119] Monigatti, F.; Gasteiger, E.; Bairoch, A.; Jung, E. Bioinformatics, 2002, 18, 769.
[120] Blom, N.; Gammeltoft, S.; Brunak, S. J. Mol. Biol., 1999, 294, 1351.
[121] Yaffe, M.B.; Leparc, G.G.; Lai, J.; Obata, T.; Volinia, S.; Cantley, L.C. Nat. Biotechnol., 2001, 19, 348.
[122] Huang, J.Y.; Brutlag, D.L. Nucleic Acids Res., 2001, 29, 202.
[123] Falquet, L.; Pagni, M.; Bucher, P.; Hulo, N.; Sigrist, C.J.; Hofmann, K.; Bairoch, A. Nucleic Acids Res., 2002, 30, 235.
