Letters in Drug Design & Discovery, 2015, 12, 46-59
Structure Based Functional Annotation of Putative Conserved Proteins from Treponema pallidum: Search for a Potential Drug Target

Avni Sinha1, Faizan Ahmad2 and Md. Imtaiyaz Hassan2,*

1Department of Computer Science, 2Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi – 110025, India

Abstract: Treponema pallidum, a Gram-negative bacterium, causes syphilis, a chronic sexually transmitted disease. T. pallidum penetrates dermal microabrasions or intact mucous membranes, resulting in a variety of symptoms. The complete genome of T. pallidum has been sequenced and contains approximately 1,090 genes encoding approximately 1,041 proteins. A large number of these open reading frames encode hypothetical proteins (HPs) for which no experimental evidence is available. Because T. pallidum is virulent and poorly characterized, it is essential to analyze those HPs whose structure and function are not available in the Protein Data Bank. Here, our aim is to predict the structure, and ultimately the function, of HPs using a combination of modern bioinformatics tools. We successfully modeled the structures of six HPs with high accuracy; these are predicted to possess endonuclease, NTP-transferase, transcription regulator and DNA-binding activities. We further performed a virulence search to check their potential role in pathogenicity, and an extensive structure analysis to gain insight into the function of each HP. We believe that these analyses expand our knowledge of the functional roles of HPs in T. pallidum and provide an opportunity to validate novel potential drug targets.
Keywords: Homology modeling, sequence analysis, structural genomics, Treponema pallidum, virulence factor and function prediction.

INTRODUCTION

Treponema pallidum is a Gram-negative spirochaete that causes an epidemic disease known as syphilis [1]. The disease is most commonly acquired by sexual contact, but it can also be congenital, passing from an infected woman to the developing fetus [1]. T. pallidum initially infects the epithelial cells of the genitals during sexual intercourse and subsequently spreads to most organs and tissues of the human body [2]. Syphilis is characterized by multiple clinical stages and long periods of latent, asymptomatic infection [3]. In later stages, it leads to cardiovascular and neurological complications [4]. Furthermore, people infected with syphilis are more susceptible to infection with HIV, which causes AIDS [1, 5]. The treatments currently available against this organism are not specific, and problems are therefore always associated with them. Although T. pallidum is a potent infectious agent, relatively little information is available about it.

The genome of T. pallidum contains 1,138,006 bp [6] and comprises 1041 open reading frames (ORFs), which account for 92.9% of the genomic DNA [7]. The biological roles of 577 ORFs (~55%) have already been predicted. However, 177 ORFs (17%) resemble hypothetical proteins (HPs), and the remaining 288 ORFs (28%) do not match any known protein. Because T. pallidum depends completely on a mammalian host to survive, it is very difficult to work with experimentally; its genome sequence therefore offers a wealth of data that can be analyzed further to extract useful information [7]. The T. pallidum genome also encodes a large number of HPs, i.e., proteins predicted from nucleic acid sequences whose function has not been determined [8]. Extensive analysis of new HPs offers many new structures and new functions [9-11].

Two protein sequences are considered homologous if they share a common evolutionary origin. Evidence for homology should be laid out explicitly, so that the relationship is clearly supported by the level of observed similarity [12]. Unfortunately, the relationship between sequence similarity and functional similarity is not straightforward [13-16]. Structural similarity is more reliable than sequence similarity, because it is the three-dimensional conformation of a protein that is responsible for its function [17, 18]. In other words, the function of a protein depends on its three-dimensional structure [19-21]. When the structure of a protein is not known, however, this does not stop researchers from working on it: in such cases homology modeling is now widely used for structure prediction [22-24]. The predicted structure is then analyzed for its putative function and virulence.

Recently, we have been working on structure based drug design, in which we successfully modeled the structures of many proteins and designed specific ligands for drug design and discovery [25-30]. Here, we selected six HPs for which templates with sufficient coverage are available to model their three-dimensional structures. All models were validated and extensively analyzed to predict their corresponding functions. This method is fast and cost effective, and has considerable accuracy.

*Address correspondence to this author at the Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi – 110025, India; Tel: +91-9990323217; Fax: +91-11-2698-3409; E-mail: [email protected]

1875-628X/15 $58.00+.00 ©2015 Bentham Science Publishers
Structure Based Functional Annotation of Putative Conserved
Letters in Drug Design & Discovery, 2015, Vol. 12, No. 1
METHODOLOGY

Data Mining

The first step was data collection: we retrieved the FASTA sequences of 234 HPs of T. pallidum from NCBI [7, 31]. For each protein we recorded the accession number, length, gene ID, protein ID and sequence. These data are sufficient to begin investigating the proteins for function prediction.

Sequence Analysis

We performed sequence analysis to predict the function of each protein from its conserved regions. Protein sequences were scanned with the InterProScan 5 service for conserved sites and regions from various databases, including Pfam, TIGRFAMs, HAMAP, CATH, PANTHER, Superfamily and the Conserved Domain Database [32-39]. Gene ontology (GO) annotations provided insight into the molecular function of each protein, the process in which it is involved and its localization in the cell [40]. BLASTp [41] and HHpred [42] were used for remote homology detection against available protein databases such as PDB [43], SCOP [44] and CATH [45]. We further performed domain analysis for more precise function prediction of the HPs [47]. Pfam [34], PANTHER [46], SMART [47], SUPERFAMILY [39], CATH [37], CDART [48], SYSTERS [49], ProtoNet [50] and SVMProt [51] were used for precise domain annotation. The scheme for assigning function to HPs has been reported previously [10].

Modeling

To model the three-dimensional structures of the HPs, we followed the basic steps of template search, target-template alignment, model building and model validation [52]. The template search was done with the NCBI BLAST program against the Protein Data Bank (PDB) [41]. We selected templates on the basis of high identity and low E-value, to obtain better predictions of atomic coordinates from the model building tool SWISS-MODEL [53]. Each step of model generation has been described in detail elsewhere [54-56]. The modeled protein structures were validated on the Structural Analysis and Verification Server (http://nihserver.mbi.ucla.edu/SAVES/) to check the stereochemical quality of each model. In addition, the Ramachandran plot of each modeled structure gives the number of residues present in the allowed region [57]. Finally, the energy minimized and validated coordinates were used for structure analysis and function prediction of the HPs.

Structure Analysis

It is well known that the structure of a protein determines its function [58], and that the structures of homologous proteins are more conserved than their sequences [59]. CATH and SCOP analysis provided basic information about the types of fold present in the HPs, from which a possible function can be derived. We used the DALI search program to identify similar structures present in the PDB [60]. The top hits of the DALI search were compared and extensively analyzed to establish a structure-function relationship. Profunc is another important tool; it performs a comprehensive analysis of a three-dimensional structure and proposes a potential function for the HP [61]. Furthermore, active pocket sites in the predicted structures of the HPs were identified using the POCASA [62] and Pocket-Finder [63] servers. The PPM server [64] was used to calculate the spatial positions of the HPs in membranes.

Virulence Factor Prediction

The VirulentPred tool predicts the virulence of bacterial proteins using a bilayer cascade Support Vector Machine (SVM) [65]. The server assesses virulence with several measures: amino acid composition, dipeptide composition, higher order dipeptide composition, and a cascade of SVMs and PSI-BLAST.
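The template selection rule described under Modeling (high identity, low E-value) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Hit` record, the `pick_template` helper, the E-value and coverage cut-offs, and the example hits are all hypothetical and serve only to show the ranking logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hit:
    """One template candidate from a sequence search (hypothetical record)."""
    template: str    # PDB id and chain, written like "XXXX_A"
    identity: float  # percent identity to the target
    evalue: float    # E-value of the hit
    coverage: float  # fraction of the target covered by the alignment


def pick_template(hits: List[Hit],
                  max_evalue: float = 1e-3,
                  min_coverage: float = 0.5) -> Optional[Hit]:
    """Pick the usable hit with the lowest E-value; break ties by identity.

    The two cut-offs are illustrative assumptions, not values from the paper.
    """
    usable = [h for h in hits
              if h.evalue <= max_evalue and h.coverage >= min_coverage]
    if not usable:
        return None
    return min(usable, key=lambda h: (h.evalue, -h.identity))


# Made-up hits for illustration only.
hits = [
    Hit("AAAA_A", identity=31.0, evalue=2e-12, coverage=0.80),
    Hit("BBBB_A", identity=45.0, evalue=5e-30, coverage=0.40),  # low coverage
    Hit("CCCC_B", identity=27.0, evalue=1e-20, coverage=0.90),
]
best = pick_template(hits)
print(best.template if best else "no usable template")
```

Ranking on E-value first and identity second is one reasonable reading of "high identity and lower E-value"; a different weighting would be equally defensible.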
RESULT AND DISCUSSION

After the sequence analysis, we selected six proteins for structure analysis and function prediction. We analyzed all structures using various tools, with the well defined objective of function prediction. Each HP is described separately below.

HP TPFB_0650

HP TPFB_0650 of T. pallidum is a small protein of 160 amino acid residues. It contains a domain of unknown function, UPF0054. Analyses against various protein family and domain databases suggest that this protein may act as a metal-binding endonuclease that modifies RNA (Table 1). CATH analysis suggests the presence of a metalloprotease catalytic domain in the sequence of HP TPFB_0650. Likewise, HAMAP analysis suggests that HP TPFB_0650 acts as a single strand-specific metallo-endoribonuclease. Furthermore, the GO annotations suggest involvement in rRNA processing, with metal dependent endopeptidase activity as the molecular function.

The structure of TPFB_0650 was successfully modeled using as template the ybeY protein from Escherichia coli (PDB id: 1XM5, Chain B), a metal dependent hydrolase [66]. The target and template sequences share only 26.7% sequence identity. Nevertheless, the refined model showed a root mean square deviation (rmsd) of only 0.069 Å over the 107 aligned residues of target and template (Table 2). The Ramachandran plot of this model indicates that 98.6% (139) of the residues are in the allowed region. The overall structure of TPFB_0650 is shown in Fig. (1A). The topology of TPFB_0650 is an α-helix surrounded by many β-strands, attaining a typical ------- fold, which is similar to the template, in which six α-helices and four β-strands are arranged in the same manner. Furthermore, the target and template structures closely resemble each other in their active site and binding site residues (Fig. 1B).
Table 1. Sequence based function prediction.

1. TPFB_0650
   GO_bp: GO:0006364 (rRNA processing)
   GO_mf: GO:0004222 (metalloendopeptidase activity)
   Pfam: PF02130: UPF0054
   TIGRFAMs: TIGR00043: probable rRNA maturation factor YbeY
   HAMAP: MF_00009: Single strand-specific metallo-endoribonuclease
   CATH: Metalloprotease catalytic domain
   Super Family: SSF55486: Metalloproteases ("zincins"), catalytic domain superfamily
   Predicted Function: Metal dependent endonuclease

2. TPFB_0921a
   Pfam: PF12804: MobA-like NTP transferase domain
   CATH: Spore Coat Polysaccharide Biosynthesis Protein
   Super Family: SSF53448: Nucleotide-diphospho-sugar transferases superfamily
   Predicted Function: Resembles a Rossmann fold NTP transferase

3. TPFB_0494
   Pfam: PF02591: zinc ribbon domain DUF164
   Predicted Function: May bind to nucleic acid

4. TPFB_0561
   Pfam: PF04536: TPM Domain
   Predicted Function: Repair protein

5. TPFB_0032
   GO_bp: GO:0006364 (rRNA processing)
   GO_mf: GO:0008168 (methyltransferase activity)
   Pfam: PF04452: RNA methyltransferase
   TIGRFAMs: RNA methyltransferase, RsmE family
   Super Family: SSF75217: alpha/beta knot superfamily (trefoil knot)
   Predicted Function: Methylation of RNA

6. TPFB_0474
   GO_bp: GO:0006355 (regulation of transcription, DNA-dependent)
   GO_mf: GO:0003677 (DNA binding)
   Pfam: PF01709: Transcriptional regulator TACO1-like
   TIGRFAMs: TIGR01033: DNA-binding regulatory protein, YebC/PmpR family
   HAMAP: MF_00693: Transcrip_reg_TACO1
   CATH: Integrase, N-terminal zinc-binding domain-like; Transcriptional regulator TACO1-like, domain 2
   Super Family: SSF75625: YebC-like superfamily
   Predicted Function: May function as transcription regulator
Table 2. Results of homology modeling of HPs.

S. No.  Protein_ID   Template  Length  Sequence Identity (%)  RMSD* (Å)           Residues in the Allowed Region (%)
1       TPFB_0650    1XM5_B    160     26.71                  0.069 (107 to 107)  98.6 (139)
2       TPFB_0921a   3PNN_A    334     36.45                  0.056 (242 to 242)  99.3 (293)
3       TPFB_0494    3NA7_A    237     20.85                  5.464 (213 to 213)  98.6 (219)
4       TPFB_0561    2KW7_A    213     26.51                  0.037 (117 to 117)  98.6 (139)
5       TPFB_0032    4J3C_B    248     20.48                  0.079 (146 to 146)  99.5 (221)
6       TPFB_0474    1LFP_A    244     52.28                  0.063 (223 to 223)  98.1 (196)
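The rmsd values in Table 2 are reported over a number of aligned residues (for example, 107 for TPFB_0650). As a minimal sketch of the quantity itself, the function below computes the rmsd between two equally long lists of paired coordinates. It assumes the structures are already superimposed and paired one-to-one; the superposition step and the tool the authors used are not reproduced here, and the coordinates are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean square deviation between paired, pre-superimposed points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha positions (in Å); the second set is the first one
# shifted by 0.1 Å along x, so every pair contributes the same deviation.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
template = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (0.1, 1.0, 0.0)]
print(round(rmsd(model, template), 3))
```

Because the value is averaged only over the paired atoms, an rmsd is best read together with the number of residues it covers, as Table 2 does.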
Three histidine residues (His118, His122 and His128), which form the main binding site, are fully conserved. Pairwise sequence alignment of HP TPFB_0650 with its template shows that other functionally important residues, Arg59, Asp66 and Ser69, are also highly conserved, which supports the functional relevance of HP TPFB_0650 (Fig. 1C and Table 3). We further searched for structurally similar proteins using DALI. The structure of HP TPFB_0650 shows a remarkable structural similarity to the metal-dependent hydrolase ybeY from E. coli (Z score: 27.18, identity: 28%). Profunc analysis provided further evidence that HP TPFB_0650 belongs to the UPF0054 protein family. Together, these results suggest that HP TPFB_0650 functions as a metal dependent ribonuclease. The active site cavity provides multiple distinct pockets capable of accommodating specific extensions to the basic metal binding scaffold in this HP. The virulence prediction suggests that HP TPFB_0650 is a virulent protein (Table 4); it may therefore contribute to disease and could be a potential drug target. Recently, pharmaceutical companies have designed many endonuclease inhibitors that inhibit endonuclease activity and the replication of influenza A and B viruses [67-69]. This precedent supports the idea that this protein may be a promising novel target for the treatment of syphilis. Furthermore, a nucleoprotein was recently shown to have 3′–5′ exoribonuclease activity that is essential for mediating host immune suppression [70, 71]. This further supports the view that such an HP may be a drug or vaccine target against T. pallidum.

Fig. (1). (A). Superimposed structures of TPFB_0650 and its template (PDB id: 1XM5), shown as cartoon models. (B). Proposed catalytic residues, based on the template structure, shown in ball and stick (purple and pink). (C). Pairwise sequence alignment of TPFB_0650 with its template. Catalytic and binding residues are shaded in teal and blue, respectively. Conserved residues are highlighted in dark grey. Secondary structure is shown above the sequence; cylinders and filled arrows represent α-helices and β-strands, respectively. The structure was drawn in PyMOL.
HP TPFB_0921a

HP TPFB_0921a is a protein of 334 amino acid residues. Sequence comparison suggests that it possesses a MobA-like NTP-transferase domain (PF12804) and belongs to the nucleotide-diphospho-sugar transferases superfamily (SSF53448) (Table 1). The protein presumably has NTP transferase activity and, like the nucleotidyltransferases, catalyzes the transfer of nucleotides to phosphosugars. The products are activated sugars that are precursors for the synthesis of lipopolysaccharide, glycolipids and polysaccharides. Moreover, this protein closely resembles α-D-glucose-1-phosphate cytidylyltransferase, mannose-1-phosphate guanylyltransferase and glucose-1-phosphate thymidylyltransferase. Fold analysis indicates the presence of a Rossmann fold in this protein, a characteristic structural fold found in nucleotidyltransferases and many other nucleotide binding proteins [72]. We successfully modeled the structure of HP TPFB_0921a using a glycosyltransferase from Porphyromonas gingivalis W83 (PDB id: 3PNN, Chain A) as template
Table 3. Structure analysis and function prediction.

1. TPFB_0650
   Overall Structure: ------- fold; 1 central surrounded by 4 and 3
   Active Site Residues: R59, I61, D66, S69
   Binding Residues: H118, H122, H128
   Function: Metal dependent ribonucleases

2. TPFB_0921a
   Overall Structure: 11 central open twisted β-strands surrounded by 8 α-helices in a Rossman fold
   Active Site Residues: R17
   Binding Residues: UDP-Glc binding: A12, G13, E95, G99. Glucose interacts with G168. Mg ion binds to D127
   Function: Catalyzes conversion of UTP and Glc-1-PO4 to UDP-Glc and PyroPO4 (nucleotidyltransferases)

3. TPFB_0494
   Overall Structure: 4 α-helices and 2 β-strands. 2 domains: coiled coil domain of 2 helices; Zn ribbon domain with 2 β-strands which form a scaffold for the Zn knuckle
   Active Site Residues: K180, R183, N199, M206, R230
   Binding Residues: C201, C204, C225, C228
   Function: Flagellum formation and motility. Coiled coil domain involved in P-P interaction. C4 Zn ribbon domain binds to nucleic acid

4. TPFB_0561
   Overall Structure: α/β fold consisting of 4 α-helices and 4 β-strands. Two β-strands in the model are deleted in the template
   Putative cleft residues: C1: L123, E124, K155, A156, R157, H174, I177, R178, A182; C2: E124, L128, A140, R157, E159, V160, G161, E165, D170; C3: R154, K155, A182, F185, Q186; C4: L118, Q119, G120, D121, L123, E153, K155
   Function: Gene is part of an operon which encodes subunits of aspartate carbamoyl transferase, which plays a role in de novo pyrimidine synthesis

5. TPFB_0032
   Overall Structure: Five twisted β-strands and 1 α-helix form a small RNA-binding domain at the N terminal. Has a trefoil knot like the template, of 2 and 7
   Active Site Residues: G199, E201, R202, G203, M222, G223, R225, L227, T231
   Dimerization residues: S100, L101, G102, Q139, M222, L227, R228, E230, T231
   Function: Transfers methyl to nucleic acid, more specifically RNA; trefoil knot at C terminal, majorly active site

6. TPFB_0474
   Overall Structure: 9 helices and 7 strands; 3 domains. Domain 1 is a helix bundle of 3 helices. Domain 2 is a 2-layer (αβ) sandwich with 3 and each. Domain 3 is also a 2-layer (αβ) sandwich with 3 and 4
   Active Site Residues: PAS1, around T103 (K30, Y87, D104, R108, D229); PAS2, around S129 (E88, Y90, Y132)
   Function: May act as transcription regulator. Binds to nucleic acid
Table 4. Virulence factor of proteins determined by VirulentPred.

S. No.  Protein_ID   Amino Acid Composition  Dipeptide Composition  Higher Order Dipeptide Composition  Cascade of SVMs and PSI-BLAST  Overall Result
1       TPFB_0650     0.9474                  0.8493                 0.6353                              0.9929                        Virulent
2       TPFB_0921a   -0.911                  -0.374                 -0.715                              -0.361                         Non-Virulent
3       TPFB_0494    -0.409                  -0.115                 -1.056                              -0.418                         Non-Virulent
4       TPFB_0561     1.549                   0.0515                -0.103                               0.7465                        Virulent
5       TPFB_0032     0.4927                  0.3                   -0.055                               0.9607                        Virulent
6       TPFB_0474    -1.283                  -0.269                 -0.635                              -0.382                         Non-Virulent
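In Table 4, the overall call for all six proteins agrees with the sign of the score in the "Cascade of SVMs and PSI-BLAST" column. The sketch below makes that reading explicit. Treating zero as the decision boundary is an assumption inferred from these six rows, not a documented VirulentPred rule, and the helper name `call_from_cascade` is hypothetical.

```python
# Scores copied from Table 4, in column order:
# (amino acid, dipeptide, higher order dipeptide, cascade of SVMs and PSI-BLAST)
scores = {
    "TPFB_0650":  (0.9474, 0.8493, 0.6353, 0.9929),
    "TPFB_0921a": (-0.911, -0.374, -0.715, -0.361),
    "TPFB_0494":  (-0.409, -0.115, -1.056, -0.418),
    "TPFB_0561":  (1.549, 0.0515, -0.103, 0.7465),
    "TPFB_0032":  (0.4927, 0.3, -0.055, 0.9607),
    "TPFB_0474":  (-1.283, -0.269, -0.635, -0.382),
}


def call_from_cascade(row):
    """Assumed rule: a positive cascade score (last column) means virulent."""
    return "Virulent" if row[3] > 0 else "Non-Virulent"


calls = {protein: call_from_cascade(row) for protein, row in scores.items()}
for protein, call in calls.items():
    print(f"{protein:11s} {call}")
```

Note that TPFB_0561 and TPFB_0032 each have one negative single-feature score yet are called virulent, which is why the sketch keys on the cascade column rather than on a vote across columns.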
which shows a sequence identity of 36.45% [73]. The energy minimized model shows that 99.3% (293) of the residues fall in the allowed region of the Ramachandran plot. Structural alignment of the model with the template gives an rmsd of 0.056 Å for 242 residues. The overall structure of this model contains 11 central open twisted β-strands surrounded by eight α-helices in a characteristic Rossman fold. The topology of this protein is ------------- -------- (Fig. 2A). We further predicted some essential residues that are involved in binding and catalysis. These residues, Ala12, Gly13, Arg101, Gly105, Asp127, Gly168, Trp208 and Arg17, show close sequence and structural similarity to the template (Fig. 2B and 2C). The putative catalytic and substrate binding residues are mainly located on loops rather than on helices or strands. To annotate function, we searched the PDB for proteins structurally similar to HP TPFB_0921a using the DALI online server. The DALI results indicate a close structural similarity to the glycosyltransferase from Porphyromonas gingivalis (PDB id: 3PNN, Z score: 48.8, identity: 40%), UDP-Glc pyrophosphorylase (PDB id: 3JUJ, Z score: 20.1, identity: 19%) and glucose-1-phosphate thymidylyltransferase (PDB id: 3HL3, Z score: 20.1, identity: 16%). All these findings indicate that HP TPFB_0921a functions in nucleotidyl sugar transfer. We further performed Profunc analysis for function annotation based on both sequence and structure, and found several lines of evidence that support a role for HP TPFB_0921a in nucleotide transfer. This protein is predicted to be non-virulent (Table 4). Targeting a protein that is primarily responsible for a metabolic process essential to the survival of the pathogen has recently emerged as a new strategy for antimicrobial development, because it has several advantages over the conventional approach of targeting the in vitro viability of bacterial cells [74]. Our findings indicate that HP TPFB_0921a may be a promising target for this purpose, because it plays a critical role in the biosynthesis of intermediate metabolites.

Fig. (2). (A). Superimposed structures of TPFB_0921a and its template (PDB id: 3PNN), shown in cartoon. (B). Structural representation of functionally significant residues in ball and stick. Binding residues (Purple orange) and the putative catalytic residue (Pink Blue) of the TPFB_0921a model are superimposed on its template. (C). Pairwise sequence alignment of TPFB_0921a and its template, highlighting binding residues in teal and the putative catalytic residue in blue. Conserved cysteine residues are shaded in yellow.

HP TPFB_0494

HP TPFB_0494 of T. pallidum is a 237-residue protein containing a DUF164 domain, a putative zinc ribbon domain presumed to have a nucleic acid binding function
Fig. (3). (A). Superimposed structures of TPFB_0494 and its template (PDB id: 3NA7), shown in cartoon. (B). A zoomed view of the Zn knuckle housing four cysteines, shown in ball and stick (golden and white), which are the principal binding residues. (C). Pairwise sequence alignment of TPFB_0494 with its template, in which the binding residues (Cys) are shaded in yellow. Conserved residues are highlighted in dark grey.
(Table 1), although we did not find sufficient evidence at the sequence level. To understand the function further, we modeled the structure of HP TPFB_0494 using as template the HP0958 protein from Helicobacter pylori (PDB id: 3NA7, Chain A), which also contains a DUF164 domain [75]. The energy minimized coordinates of the HP TPFB_0494 model show that 98.6% (219) of the residues fall in the allowed region of the Ramachandran plot (Table 2). The structure has the consensus sequence CX(2)CX(20)CX(2)C, with two pairs of Cys in a CXXC pattern forming the major binding site. This protein is thought to be responsible for flagellum formation and motility (Table 3). HP TPFB_0494 consists of two distinct domains: a highly elongated and kinked N-terminal anti-parallel α-helical coiled-coil hairpin domain, and a globular C-terminal Zn-ribbon domain containing two pairs of cysteines. The overall structure of HP TPFB_0494 contains four helices and two β-strands, and resembles a hockey stick, like its template structure [75]. The protein has a ---- fold, with two α-helices forming the coiled-coil hairpin structure while the β-strands act as scaffolding for the zinc knuckle. The four Cys residues of the zinc knuckle (Cys201, Cys204, Cys225 and Cys228) hold the structure together and form the major binding site (Fig. 3C), suggesting the involvement of HP TPFB_0494 in nucleic acid interactions [76]. Several conserved residues on the surface could possibly play a role in RNA binding. Moreover, the Zn-ribbon domain of HP TPFB_0494 is rich in positively charged and aromatic residues, which often play important functional roles in RNA binding proteins [77, 78].

The DALI search suggests that this HP closely resembles the HP0958 protein from Helicobacter pylori (PDB id: 3NA7, Z score: 27.18, identity: 28%), a hypothetical protein from Aquifex aeolicus (PDB id: 1OZ9, Z score: 23.3, identity: 21%), CT398 from Chlamydia trachomatis (PDB id: 4ILO, Z score: 18, identity: 21%) and the translocator IPAB (PDB id: 3U0C, Z score: 12.3, identity: 7%). All these observations indicate that HP TPFB_0494 is primarily involved in flagellum formation and hence motility. The Profunc server predicts that HP TPFB_0494 has a zinc ribbon domain and shows close sequence similarity to other putative Zn ribbon domains. Its two structurally distinct domains may carry out separate functions in flagellar protein export [79]. On the basis of the above results, we conclude that the protein has nucleic acid and metal binding functions and may act in flagellum formation in bacteria. The virulence prediction shows that the protein is non-virulent (Table 4). Currently, many proteins with RNA-binding properties are the focus of drug discovery. Structural information on RNA-binding proteins can be used in high-throughput screening to guide structure based rational drug design towards potential therapeutic intervention [80].

HP TPFB_0561

HP TPFB_0561 of T. pallidum, a 213-residue protein, is thought to contain a TPM domain (Pfam: PF04536) and to have a characteristic fold [81]. This family of proteins is thought to be involved in the repair of many proteins in plants, as the domain plays a role in the photosystem II (PSII) repair cycle [82]. It may be involved in regulating the synthesis and degradation of the D1 protein of the PSII core, and in the assembly of PSII monomers into dimers in the grana stacks. Structural comparison suggested that this protein might be a phosphatase (Table 1).

Fig. (4). (A). Overlaid structures of TPFB_0561 and its template (PDB id: 2KW7), shown in cartoon, illustrating the similar folds shared by the two structures. (B). Superimposed residues of the putative clefts that are important for activity. (C). Pairwise sequence alignment of TPFB_0561 with its template. Residues composing the four clefts C1, C2, C3 and C4 are highlighted in teal, blue, green and red, respectively. Conserved residues are highlighted in dark grey.
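Returning to the zinc ribbon of HP TPFB_0494: the consensus CX(2)CX(20)CX(2)C reported for it translates directly into a regular expression. The sketch below scans a sequence for that spacing and reports the 1-based positions of the four cysteines. The toy sequence is invented for illustration and is not the TPFB_0494 sequence; only the spacing (which also matches Cys201, Cys204, Cys225 and Cys228) comes from the text.

```python
import re

# CX(2)CX(20)CX(2)C: Cys, any 2 residues, Cys, any 20 residues, Cys,
# any 2 residues, Cys.
ZN_RIBBON = re.compile(r"C.{2}C.{20}C.{2}C")


def find_zn_ribbon(sequence):
    """Return 1-based positions of the four Cys for each non-overlapping match."""
    found = []
    for match in ZN_RIBBON.finditer(sequence.upper()):
        first = match.start() + 1  # convert 0-based index to residue number
        found.append((first, first + 3, first + 24, first + 27))
    return found


# Invented toy sequence with cysteines at positions 8, 11, 32 and 35.
toy = "MK" + "A" * 5 + "C" + "GS" + "C" + "A" * 20 + "C" + "PE" + "C" + "KR"
print(find_zn_ribbon(toy))
```

The offsets 3, 24 and 27 follow from the motif spacing, which is the same spacing as 201, 204, 225 and 228 in the modeled protein.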
We successfully modeled the three-dimensional structure of HP TPFB_0561 using as template the N-terminal domain of protein PG_0361 from P. gingivalis (PDB id: 2KW7, Chain A), in order to explore its molecular function by structural comparison and functional characterization [83]. The energy minimized coordinates show 98.6% (139) of the residues in the allowed region of the Ramachandran plot. The model showed an rmsd of 0.037 Å for 117 residues of target and template, despite a very low sequence similarity (25.5%) (Table 2). The overall structure of the protein has an α/β fold -------- consisting of 4 α-helices and 6 β-strands [82]. Two strands in the model are not present in the template. It has a sandwich structure, with 4 β-strands sandwiched between two layers of α-helices. Of the 4 α-helices, three (α1, α3 and α4) are on one side of the sandwich and one (α2) is on the other side (Fig. 4A). There are four putative clefts which may be functionally important. These are Cleft 1: Leu123, Glu124, Lys155, Ala156, Arg157, His174, Ile177, Arg178, Ala182; Cleft 2: Glu124, Leu128, Ala140, Arg157, Glu159, Val160, Gly161, Glu165, Asp170; Cleft 3: Arg154, Lys155, Ala182, Phe185, Gln186; and Cleft 4: Leu118, Gln119, Gly120, Asp121, Leu123, Glu153, Lys155. These clefts are thought to be important for activity, but some of their residues are predicted to be functionally more crucial than others: Glu124 (C1 and C2), Lys155 (C1, C3 and C4), Arg157 (C1 and C2) and Ala182 (C1 and C3), shown in Fig. (4B and 4C).

The DALI search shows that the protein is most similar to the template protein, 2KW7 (Z score: 29.5, identity: 31%). Other structurally similar proteins are UPF0603 protein AT1G54780, chloroplastic (PDB id: 3PW9, Z score: 17.9, identity: 21%); UPF0603 protein AT1G54780, chloroplastic (PDB id: 3PTJ, Z score: 17.9, identity: 20%); and UPF0603 protein AT1G54780, chloroplastic (PDB id: 3PVH, Z score: 17.8, identity: 21%). The Profunc results suggest that this protein presumably plays a significant role in de novo pyrimidine synthesis. Taken together, these results suggest that HP TPFB_0561 may be part of the pyrimidine synthesis mechanism. HP TPFB_0561 was also found to be virulent, as shown in Table 4. The HP TPFB_0561 gene is part of an operon that encodes subunits of aspartate carbamoyl transferase, which plays a role in de novo pyrimidine synthesis. Identifying the limiting nutrients, and the bacterial genes that are critical for growth, can pinpoint the biosynthesis and acquisition strategies that are decisive during the bacteremic stage of infection. Enzymes that are critical for the survival and proliferation of pathogenic bacteria can act as potential targets for the treatment of such infections [84].
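The statement that some residues belong to more than one of the four putative clefts of HP TPFB_0561 can be checked mechanically from the cleft lists given in the text. The sketch below is a simple membership count, not a pocket detection method; the residue lists are copied from the text in one-letter code, and the helper name `shared_residues` is hypothetical.

```python
from collections import defaultdict

# Cleft residue lists as given in the text (one-letter code plus position).
clefts = {
    "C1": ["L123", "E124", "K155", "A156", "R157", "H174", "I177", "R178", "A182"],
    "C2": ["E124", "L128", "A140", "R157", "E159", "V160", "G161", "E165", "D170"],
    "C3": ["R154", "K155", "A182", "F185", "Q186"],
    "C4": ["L118", "Q119", "G120", "D121", "L123", "E153", "K155"],
}


def shared_residues(cleft_map):
    """Map each residue found in two or more clefts to the clefts that contain it."""
    membership = defaultdict(list)
    for cleft_name, residues in cleft_map.items():
        for residue in residues:
            membership[residue].append(cleft_name)
    return {res: names for res, names in membership.items() if len(names) > 1}


shared = shared_residues(clefts)
# Print in order of residue number.
for residue, names in sorted(shared.items(), key=lambda item: int(item[0][1:])):
    print(residue, "in", ", ".join(names))
```

Residues shared between clefts are natural candidates for the "functionally more crucial" positions singled out in the text, although membership alone says nothing about catalytic role.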
De novo nucleotide biosynthesis represents the single most critical metabolic function for bacterial growth in many diseases; the corresponding enzymes may therefore act as putative antibiotic targets for the treatment of such infections [85].

HP TPFB_0032

HP TPFB_0032 is a 244-residue protein from T. pallidum. Sequence analysis suggests that this protein may have RNA methyltransferase activity (Pfam: PF04452; TIGR00046: RNA methyltransferase, RsmE family; PTHR30027: rRNA small subunit methyltransferase; SSF75217: alpha/beta knot superfamily). It may have a role in nucleic acid methylation, as is evident from the gene ontology based annotations, which predict involvement in RNA processing as a methyltransferase (Table 1).

The structure of HP TPFB_0032 was modeled using a hypothetical 16S ribosomal RNA methyltransferase, RsmE (PDB id: 4J3C, Chain B), as template. In the model, 99.1% (221) of the residues are in the allowed region of the Ramachandran plot. Despite a low sequence similarity to the template structure (20.48%), an rmsd of 0.079 Å was observed when 146 residues of target and template were aligned, indicating a close structural resemblance and hence a similar function. The structure of HP TPFB_0032 contains two domains: an N-terminal RNA-binding domain, and an alpha/beta knot in a trefoil fold, which forms the major active site [86]. The overall fold is -------------- with eight α-helices and seven β-strands (Fig. 5A). The N-terminal domain consists of five twisted β-strands and an α-helix. The rest forms the deep trefoil folded structure. The active site residues are predicted to be Gly199, Glu201, Arg202, Gly203, Met222, Gly223, Arg225, Leu227 and Thr231, illustrated in Fig. (5B and 5C) and listed in Table 3. A conserved glutamate has previously been shown to be involved in catalysis in tRNA methyltransferases [87].

Proteins with folds similar to this HP were searched with DALI, which showed close matches to several putative methyltransferases, such as rRNA small subunit methyltransferase (PDB id: 4J3C, Z score: 32.4, identity: 21%), hypothetical protein HI0303 (PDB id: 1VHY, Z score: 22, identity: 18%) and another rRNA small subunit methyltransferase (PDB id: 1TVI, Z score: 21.9, identity: 20%). The Profunc server predicts that HP TPFB_0032 may be involved in methyl transfer to RNA, based on its high sequence and structure similarity to many rRNA methyltransferase enzymes. Its ligand binding site is also similar to the ligand binding sites of rRNA methyltransferases (PDB id: 2EGV, 1VHK and 2EGW). The sequence based virulence prediction also indicates that HP TPFB_0032 is a virulent protein and may play a significant role in pathogenesis (Table 4). In addition to its roles in ribosomal function and fidelity, HP TPFB_0032 may also affect the response to antibiotics, given its structural resemblance to other proteins with a similar function. The crystal structure of the ribosome in complex with the antibiotic hygromycin B clearly indicates sequence-specific binding. Furthermore, the drug interacts with several conserved nucleotides in helix 44 of 16S rRNA [88]. The emergence of drug-resistant strains is one of the impediments to curing bacterial infections.
Several ribosomal methyltransferases have already been extensively studied for the role of their methylations in ribosome function and drug resistance [89-91]. We hope that the characterization of HP TPFB_0032 may add a new target against bacterial drug resistance.
HP TPFB_0474
HP TPFB_0474 is a 244-residue protein that presumably acts as a transcriptional regulator, because it belongs to the TACO1-like protein family as indicated by various protein family database searches (Pfam: PF01709, TACO1-like; TIGR01033: DNA-binding regulatory protein, YebC/PmpR family; MF_00693: Transcrip_reg_TACO1; SSF75625: YebC-like superfamily). The GO annotations of the sequence also suggest DNA-binding activity and a role in regulating transcription (Table 1). The three-dimensional structure of HP TPFB_0474 was predicted using the hypothetical protein Aq1575 from Aquifex aeolicus (PDB id: 1LFP, Chain A) as a template. Both proteins share a sequence identity of 52.28% and an rmsd of 0.063 Å for 223 residues (Table 2). The overall structure of this protein has three domains: an N-terminal bundle of three helices forming domain 1, domain 2 comprising a
Fig. (5). (A). Structure of TPFB_0032 overlaid with its template (PDB id: 4J3C) represented in cartoon. Structure contains two distinct domains (N-terminal RNA binding and C-terminal Trefoil domain). (B). Aligned structures of model and template showing binding residues in ball and stick. (C). Pairwise sequence alignment of TPFB_0032 with its template. Residues involved in binding and dimerization are shown in blue and green, respectively. Conserved residues are highlighted in dark grey.
two-layer (αβ) sandwich with an unknown topology of the central β-sheet, and domain 3 a similar two-layer (αβ) sandwich [92]. The structure of HP TPFB_0474 has an α-helix surrounded by β-strands and other helices. It has a -------------- topology, with nine and seven secondary structures. This protein has putative active site (PAS1, PAS2) residues similar to those of its template. The two predicted active sites comprise Lys30, Tyr87, Asp104, Arg108 and Asp229, forming PAS1, and Glu88, Tyr90 and Tyr132, forming PAS2, as depicted in Fig. 6 and listed in Table 3.
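The rmsd values quoted for the model-template comparisons (0.079 Å over 146 residues for TPFB_0032 and 0.063 Å over 223 residues for TPFB_0474) are root-mean-square deviations over paired atoms, rmsd = sqrt((1/N) Σ d_i²), where d_i is the distance between the i-th pair. The following is a minimal sketch of that calculation, assuming the two structures are already superposed and that one Cα position per aligned residue is compared. The function name `rmsd` and the toy coordinates are hypothetical and are not taken from the models.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the two coordinate sets are already superposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy data: three hypothetical C-alpha positions (in angstroms), each pair 0.1 apart.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
template = [(0.0, 0.1, 0.0), (3.8, 0.0, 0.1), (7.6, -0.1, 0.0)]

if __name__ == "__main__":
    print(f"rmsd = {rmsd(model, template):.3f} angstrom")
```

Because the sum runs only over aligned pairs, an rmsd should always be read together with the number of residues it covers, as the text does by reporting 146 and 223 aligned residues.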
The DALI search listed a few close structures, such as HP Aq1575 (PDB id: 1LFP, Z-score 41.2, identity 52%), HP HP0162 (PDB id: 1MW7, Z-score 24.7, identity 34%), protein YebC (PDB id: 1KON, Z-score 22.8, identity 45%), and transcriptional regulatory protein CBU_1566 (PDB id: 4F3Q, Z-score 21.6, identity 52%). The ProFunc server was also used to confirm the function of HP TPFB_0474. It was evident from the ProFunc analysis that HP TPFB_0474 belongs to the YebC protein family and that its sequence is similar to those of transcription regulators. This protein also shares structural similarity with minor core protein lambda 3 and regulatory protein E2. When tested for virulence, the protein was found to be non-virulent, as all the values are negative (Table 4).
CONCLUSION
Here we performed an extensive sequence and structure analysis to predict the function of six HPs from T. pallidum. We found that these proteins possess characteristics of metal-dependent endonuclease and NTP-transferase, and are involved in protein repair, RNA methylation and transcription regulation. We successfully modeled the three-dimensional structures of all six proteins and analyzed them further
Fig. (6). (A). Structure of TPFB_0474 aligned with its template (PDB id: 1LFP), represented in cartoon model depicting three domains present in the protein. (B). Aligned structure comparing two putative active sites (PAS1; Cyan-Pink and PAS2; Purple-Orange) residues represented as ball and stick. (C). Sequence alignment of TPFB_0474 with its template. PAS1 and PAS2 residues are shaded in teal and red, respectively. Conserved residues are highlighted in dark grey.
to gain functional insight. Structure-based functional annotation further suggests that HP TPFB_0650 has metal-dependent ribonuclease activity, HP TPFB_0921a catalyzes the conversion of UTP and glucose 1-phosphate to UDP-glucose and pyrophosphate, HP TPFB_0494 is involved in flagellum formation and motility, HP TPFB_0561 plays a role in de novo pyrimidine synthesis, HP TPFB_0032 transfers a methyl group to nucleic acid, more specifically RNA, and HP TPFB_0474 may act as a transcription regulator. Homology modeling helped us to understand the structures of these proteins and to predict new functions, and provided a route to structure-based drug design. Although the exact molecular function of an HP is not immediately evident, its structure provides a framework to deduce the molecular function based on conserved residues or folds. Our study facilitates rapid identification of the hidden functions of HPs, which are potential therapeutic targets and may play a significant role in better understanding host-pathogen interactions. Once these HPs are established as novel drug/vaccine targets, further research on new inhibitors and vaccines can be conducted.
CONFLICT OF INTEREST
The authors confirm that this article content has no conflict of interest.
ACKNOWLEDGEMENTS
This work is supported by the Indian Council of Medical Research (BIC/12(04)/2012) to MIH and FA.
REFERENCES
[1] Lafond, R. E.; Lukehart, S. A., Biological basis for syphilis. Clin. Microbiol. Rev., 2006, 19 (1), 29-49.
[2] Weinstock, G. M.; Hardham, J. M.; McLeod, M. P.; Sodergren, E. J.; Norris, S. J., The genome of Treponema pallidum: new light on the agent of syphilis. FEMS Microbiol. Rev., 1998, 22 (4), 323-332.
[3] Radolf, J. D.; Steiner, B.; Shevchenko, D., Treponema pallidum: doing a remarkable job with what it's got. Trends Microbiol., 1999, 7 (1), 7-9.
[4] Peeling, R. W.; Hook, E. W., 3rd, The pathogenesis of syphilis: the Great Mimicker, revisited. J. Pathol., 2006, 208 (2), 224-232.
[5] Stamm, W. E.; Handsfield, H. H.; Rompalo, A. M.; Ashley, R. L.; Roberts, P. L.; Corey, L., The association between genital ulcer disease and acquisition of HIV infection in homosexual men. JAMA, 1988, 260 (10), 1429-1433.
[6] Walker, E. M.; Arnett, J. K.; Heath, J. D.; Norris, S. J., Treponema pallidum subsp. pallidum has a single, circular chromosome with a size of approximately 900 kilobase pairs. Infect. Immun., 1991, 59 (7), 2476-2479.
[7] Fraser, C. M.; Norris, S. J.; Weinstock, G. M.; White, O.; Sutton, G. G.; Dodson, R.; Gwinn, M.; Hickey, E. K.; Clayton, R.; Ketchum, K. A.; Sodergren, E.; Hardham, J. M.; McLeod, M. P.; Salzberg, S.; Peterson, J.; Khalak, H.; Richardson, D.; Howell, J. K.; Chidambaram, M.; Utterback, T.; McDonald, L.; Artiach, P.; Bowman, C.; Cotton, M. D.; Fujii, C.; Garland, S.; Hatch, B.; Horst, K.; Roberts, K.; Sandusky, M.; Weidman, J.; Smith, H. O.; Venter, J. C., Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science, 1998, 281 (5375), 375-388.
[8] Galperin, M. Y.; Koonin, E. V., 'Conserved hypothetical' proteins: prioritization of targets for experimental study. Nucleic Acids Res., 2004, 32 (18), 5452-5463.
[9] Kumar, K.; Prakash, A.; Tasleem, M.; Islam, A.; Ahmad, F.; Hassan, M. I., Functional annotation of putative hypothetical proteins from Candida dubliniensis. Gene, 2014, 543 (1), 93-100.
[10] Shahbaaz, M.; Hassan, M. I.; Ahmad, F., Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20. PLoS ONE, 2013, 8 (12), e84263.
[11] Shahbaaz, M.; Ahmad, F.; Hassan, M. I., Structure-based functional annotation of putative conserved proteins having lyase activity from Haemophilus influenzae. 3Biotech, 2014, 1-20. [DOI 10.1007/s13205-014-0231-z].
[12] Hassan, M. I.; Kumar, V.; Singh, T. P.; Yadav, S., Structural model of human PSA: a target for prostate cancer therapy. Chem. Biol. Drug Des., 2007, 70 (3), 261-267.
[13] Bork, P.; Koonin, E. V., Predicting functions from protein sequences--where are the bottlenecks? Nat. Genet., 1998, 18 (4), 313-318.
[14] Karp, P. D., What we do not know about sequence analysis and sequence databases. Bioinformatics, 1998, 14 (9), 753-754.
[15] Hassan, M. I.; Ahmad, F., Structural diversity of class I MHC-like molecules and its implications in binding specificities. In Adv. Protein Chem. Struct. Biol., 2011; Vol. 83, pp 223-270.
[16] Hassan, M. I.; Bilgrami, S.; Kumar, V.; Singh, N.; Yadav, S.; Kaur, P.; Singh, T. P., Crystal Structure of the Novel Complex Formed between Zinc α2-Glycoprotein (ZAG) and Prolactin-Inducible Protein (PIP) from Human Seminal Plasma. J. Mol. Biol., 2008, 384 (3), 663-672.
[17] Weibel, E. R.; Taylor, C. R.; Hoppeler, H., The concept of symmorphosis: a testable hypothesis of structure-function relationship. Proc. Natl. Acad. Sci. U. S. A., 1991, 88 (22), 10357-10361.
[18] Tasleem, M.; Ishrat, R.; Islam, A.; Ahmad, F.; Hassan, M. I., Structural Characterization, Homology Modeling and Docking Studies of ARG674 Mutation in MyH8 gene Associated with Trismus-Pseudocamptodactyly Syndrome. Lett. Drug Des. Discov., 2014, 11, DOI: 10.2174/1570180811666140717190217.
[19] Hegyi, H.; Gerstein, M., The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol., 1999, 288 (1), 147-164.
[20] Hassan, M. I.; Waheed, A.; Grubb, J. H.; Klei, H. E.; Korolev, S.; Sly, W. S., High resolution crystal structure of human β-glucuronidase reveals structural basis of lysosome targeting. PLoS ONE, 2013, 8 (11), e79687.
[21] Zaidi, S.; Hassan, M. I.; Islam, A.; Ahmad, F., The role of key residues in structure, function, and stability of cytochrome-c. Cell. Mol. Life Sci., 2014, 71 (2), 229-255.
[22] Baker, D.; Sali, A., Protein structure prediction and structural genomics. Science, 2001, 294 (5540), 93-96.
[23] O'Donoghue, P.; Amaro, R. E.; Luthey-Schulten, Z., On the structure of hisH: protein structure prediction in the context of structural and functional genomics. J. Struct. Biol., 2001, 134 (2-3), 257-268.
[24] Looger, L. L.; Hellinga, H. W., Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J. Mol. Biol., 2001, 307 (1), 429-445.
[25] Prakash, A.; Kumar, K.; Islam, A.; Hassan, M. I.; Ahmad, F., Receptor Chemoprint Derived Pharmacophore Model for Development of CAIX Inhibitors. J. Carcinog. Mutagen., 2013, S8, 003.
[26] Singh, A.; Thakur, P. K.; Meena, M.; Kumar, D.; Bhatnagar, S.; Dubey, A. K.; Hassan, M. I., Interaction between Basic 7S Globulin and Leginsulin in Soybean [Glycine max]: A Structural Insight. Lett. Drug Des. Discov., 2014, 11 (2), 231-239.
[27] Thakur, P. K.; Hassan, M. I., Discovering a potent small molecule inhibitor for gankyrin using de novo drug design approach. Int. J. Comp. Biol. Drug Des., 2011, 4 (4), 373-386.
[28] Thakur, P. K.; Kumar, J.; Ray, D.; Anjum, F.; Hassan, M. I., Search of potential inhibitor against New Delhi metallo-beta-lactamase 1 from a series of antibacterial natural compounds. J. Nat. Sci. Biol. Med., 2013, 4 (1), 51-56.
[29] Thakur, P. K.; Prakash, A.; Khan, P.; Fleming, R. E.; Waheed, A.; Ahmad, F.; Hassan, M. I., Identification of Interfacial Residues Involved in Hepcidin-Ferroportin Interaction. Lett. Drug Des. Discov., 2014, 11 (3), 363-374.
[30] Hassan, M. I.; Kumar, V.; Somvanshi, R. K.; Dey, S.; Singh, T. P.; Yadav, S., Structure-guided design of peptidic ligand for human prostate specific antigen. J. Pept. Sci., 2007, 13 (12), 849-855.
[31] Zobanikova, M.; Strouhal, M.; Mikalova, L.; Cejkova, D.; Ambrozova, L.; Pospisilova, P.; Fulton, L. L.; Chen, L.; Sodergren, E.; Weinstock, G. M.; Smajs, D., Whole genome sequence of the Treponema Fribourg-Blanc: unspecified simian isolate is highly similar to the yaws subspecies. PLoS Negl. Trop. Dis., 2013, 7 (4), e2172.
[32] Mi, H.; Muruganujan, A.; Casagrande, J. T.; Thomas, P. D., Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc., 2013, 8 (8), 1551-1566.
[33] Jones, P.; Binns, D.; Chang, H. Y.; Fraser, M.; Li, W.; McAnulla, C.; McWilliam, H.; Maslen, J.; Mitchell, A.; Nuka, G.; Pesseat, S.; Quinn, A. F.; Sangrador-Vegas, A.; Scheremetjew, M.; Yong, S. Y.; Lopez, R.; Hunter, S., InterProScan 5: genome-scale protein function classification. Bioinformatics, 2014, 30 (9), 1236-1240.
[34] Punta, M.; Coggill, P. C.; Eberhardt, R. Y.; Mistry, J.; Tate, J.; Boursnell, C.; Pang, N.; Forslund, K.; Ceric, G.; Clements, J.; Heger, A.; Holm, L.; Sonnhammer, E. L.; Eddy, S. R.; Bateman, A.; Finn, R. D., The Pfam protein families database. Nucleic Acids Res., 2014, 40 (Database issue), D290-301.
[35] Haft, D. H.; Selengut, J. D.; White, O., The TIGRFAMs database of protein families. Nucleic Acids Res., 2003, 31 (1), 371-373.
[36] Pedruzzi, I.; Rivoire, C.; Auchincloss, A. H.; Coudert, E.; Keller, G.; de Castro, E.; Baratin, D.; Cuche, B. A.; Bougueleret, L.; Poux, S.; Redaschi, N.; Xenarios, I.; Bridge, A., HAMAP in 2013, new developments in the protein family classification and annotation system. Nucleic Acids Res., 2013, 41 (Database issue), D584-D589.
[37] Sillitoe, I.; Cuff, A. L.; Dessailly, B. H.; Dawson, N. L.; Furnham, N.; Lee, D.; Lees, J. G.; Lewis, T. E.; Studer, R. A.; Rentzsch, R.; Yeats, C.; Thornton, J. M.; Orengo, C. A., New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. Nucleic Acids Res., 2013, 41 (Database issue), D490-498.
[38] Marchler-Bauer, A.; Zheng, C.; Chitsaz, F.; Derbyshire, M. K.; Geer, L. Y.; Geer, R. C.; Gonzales, N. R.; Gwadz, M.; Hurwitz, D. I.; Lanczycki, C. J.; Lu, F.; Lu, S.; Marchler, G. H.; Song, J. S.; Thanki, N.; Yamashita, R. A.; Zhang, D.; Bryant, S. H., CDD: conserved domains and protein three-dimensional structure. Nucleic Acids Res., 2013, 41 (Database issue), D348-352.
[39] Gough, J.; Karplus, K.; Hughey, R.; Chothia, C., Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol., 2001, 313 (4), 903-919.
[40] Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; Harris, M. A.; Hill, D. P.; Issel-Tarver, L.; Kasarskis, A.; Lewis, S.; Matese, J. C.; Richardson, J. E.; Ringwald, M.; Rubin, G. M.; Sherlock, G., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 2000, 25 (1), 25-29.
[41] Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J., Basic local alignment search tool. J. Mol. Biol., 1990, 215 (3), 403-410.
[42] Soding, J.; Biegert, A.; Lupas, A. N., The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res., 2005, 33 (Web Server issue), W244-248.
[43] Bernstein, F. C.; Koetzle, T. F.; Williams, G. J.; Meyer, E. F., Jr.; Brice, M. D.; Rodgers, J. R.; Kennard, O.; Shimanouchi, T.; Tasumi, M., The Protein Data Bank: a computer-based archival file for macromolecular structures. Arch. Biochem. Biophys., 1978, 185 (2), 584-591.
[44] Hubbard, T. J.; Ailey, B.; Brenner, S. E.; Murzin, A. G.; Chothia, C., SCOP: a Structural Classification of Proteins database. Nucleic Acids Res., 1999, 27 (1), 254-256.
[45] Sillitoe, I.; Cuff, A. L.; Dessailly, B. H.; Dawson, N. L.; Furnham, N.; Lee, D.; Lees, J. G.; Lewis, T. E.; Studer, R. A.; Rentzsch, R.; Yeats, C.; Thornton, J. M.; Orengo, C. A., New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. Nucleic Acids Res., 2013, 41 (Database issue), D490-498.
[46] Mi, H.; Lazareva-Ulitsky, B.; Loo, R.; Kejariwal, A.; Vandergriff, J.; Rabkin, S.; Guo, N.; Muruganujan, A.; Doremieux, O.; Campbell, M. J.; Kitano, H.; Thomas, P. D., The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res., 2005, 33 (Database issue), D284-288.
[47] Letunic, I.; Doerks, T.; Bork, P., SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res., 2012, 40 (Database issue), D302-305.
[48] Geer, L. Y.; Domrachev, M.; Lipman, D. J.; Bryant, S. H., CDART: protein homology by domain architecture. Genome Res., 2002, 12 (10), 1619-1623.
[49] Meinel, T.; Krause, A.; Luz, H.; Vingron, M.; Staub, E., The SYSTERS Protein Family Database in 2005. Nucleic Acids Res., 2005, 33 (Database issue), D226-229.
[50] Rappoport, N.; Karsenty, S.; Stern, A.; Linial, N.; Linial, M., ProtoNet 6.0: organizing 10 million protein sequences in a compact hierarchical family tree. Nucleic Acids Res., 2012, 40 (Database issue), D313-320.
[51] Cai, C. Z.; Han, L. Y.; Ji, Z. L.; Chen, X.; Chen, Y. Z., SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res., 2003, 31 (13), 3692-3697.
[52] Webb, B.; Sali, A., Protein structure modeling with MODELLER. Methods Mol. Biol., 2014, 1137, 1-15.
[53] Schwede, T.; Kopp, J.; Guex, N.; Peitsch, M. C., SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res., 2003, 31 (13), 3381-3385.
[54] Bordoli, L.; Schwede, T., Automated protein structure modeling with SWISS-MODEL Workspace and the Protein Model Portal. Methods Mol. Biol., 2012, 857, 107-136.
[55] Bordoli, L.; Kiefer, F.; Arnold, K.; Benkert, P.; Battey, J.; Schwede, T., Protein structure homology modeling using SWISS-MODEL workspace. Nat. Protoc., 2009, 4 (1), 1-13.
[56] Kiefer, F.; Arnold, K.; Kunzli, M.; Bordoli, L.; Schwede, T., The SWISS-MODEL Repository and associated resources. Nucleic Acids Res., 2009, 37 (Database issue), D387-392.
[57] Laskowski, R. A.; Rullmann, J. A.; MacArthur, M. W.; Kaptein, R.; Thornton, J. M., AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR, 1996, 8 (4), 477-486.
[58] Taylor, W. R.; Orengo, C. A., Protein structure alignment. J. Mol. Biol., 1989, 208 (1), 1-22.
[59] Chothia, C.; Lesk, A. M., The relation between the divergence of sequence and structure in proteins. EMBO J., 1986, 5 (4), 823-826.
[60] Holm, L.; Rosenstrom, P., Dali server: conservation mapping in 3D. Nucleic Acids Res., 2010, 38 (Web Server issue), W545-549.
[61] Laskowski, R. A.; Watson, J. D.; Thornton, J. M., ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res., 2005, 33 (Web Server issue), W89-93.
[62] Yu, J.; Zhou, Y.; Tanaka, I.; Yao, M., Roll: a new algorithm for the detection of protein pockets and cavities with a rolling probe sphere. Bioinformatics, 2010, 26 (1), 46-52.
[63] Laurie, A. T.; Jackson, R. M., Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics, 2005, 21 (9), 1908-1916.
[64] Lomize, M. A.; Pogozheva, I. D.; Joo, H.; Mosberg, H. I.; Lomize, A. L., OPM database and PPM web server: resources for positioning of proteins in membranes. Nucleic Acids Res., 2012, 40 (Database issue), D370-376.
[65] Garg, A.; Gupta, D., VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens. BMC Bioinformatics, 2008, 9, 62.
[66] Zhan, C.; Fedorov, E. V.; Shi, W.; Ramagopal, U. A.; Thirumuruhan, R.; Manjasetty, B. A.; Almo, S. C.; Fiser, A.; Chance, M. R.; Fedorov, A. A., The ybeY protein from Escherichia coli is a metalloprotein. Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun., 2005, 61 (Pt 11), 959-963.
[67] Parkes, K. E.; Ermert, P.; Fassler, J.; Ives, J.; Martin, J. A.; Merrett, J. H.; Obrecht, D.; Williams, G.; Klumpp, K., Use of a pharmacophore model to discover a new class of influenza endonuclease inhibitors. J. Med. Chem., 2003, 46 (7), 1153-1164.
[68] Kuzuhara, T.; Iwai, Y.; Takahashi, H.; Hatakeyama, D.; Echigo, N., Green tea catechins inhibit the endonuclease activity of influenza A virus RNA polymerase. PLoS Curr., 2009, 1, RRN1052.
[69] Kowalinski, E.; Zubieta, C.; Wolkerstorfer, A.; Szolar, O. H.; Ruigrok, R. W.; Cusack, S., Structural analysis of specific metal chelating inhibitor binding to the endonuclease domain of influenza pH1N1 (2009) polymerase. PLoS Pathog., 2012, 8 (8), e1002831.
[70] Qi, X.; Lan, S.; Wang, W.; Schelde, L. M.; Dong, H.; Wallat, G. D.; Ly, H.; Liang, Y.; Dong, C., Cap binding and immune evasion revealed by Lassa nucleoprotein structure. Nature, 2010, 468 (7325), 779-783.
[71] Jiang, X.; Huang, Q.; Wang, W.; Dong, H.; Ly, H.; Liang, Y.; Dong, C., Structures of arenaviral nucleoproteins with triphosphate dsRNA reveal a unique mechanism of immune suppression. J. Biol. Chem., 2013, 288 (23), 16949-16959.
[72] Wu, H. Y.; Liu, M. S.; Lin, T. P.; Cheng, Y. S., Structural and functional assays of AtTLP18.3 identify its novel acid phosphatase activity in thylakoid lumen. Plant Physiol., 2011, 157 (3), 1015-1025.
[73] Kim, H.; Choi, J.; Kim, T.; Lokanath, N. K.; Ha, S. C.; Suh, S. W.; Hwang, H. Y.; Kim, K. K., Structural basis for the reaction mechanism of UDP-glucose pyrophosphorylase. Mol. Cells, 2010, 29 (4), 397-405.
[74] Clatworthy, A. E.; Pierson, E.; Hung, D. T., Targeting virulence: a new paradigm for antimicrobial therapy. Nat. Chem. Biol., 2007, 3 (9), 541-548.
[75] Caly, D. L.; O'Toole, P. W.; Moore, S. A., The 2.2-Å structure of the HP0958 protein from Helicobacter pylori reveals a kinked anti-parallel coiled-coil hairpin domain and a highly conserved Zn-ribbon domain. J. Mol. Biol., 2010, 403 (3), 405-419.
[76] Hall, T. M., Multiple modes of RNA recognition by zinc finger proteins. Curr. Opin. Struct. Biol., 2005, 15 (3), 367-373.
[77] Ellis, J. J.; Broom, M.; Jones, S., Protein-RNA interactions: structural analysis and functional classes. Proteins, 2007, 66 (4), 903-911.
[78] Jones, S.; Daley, D. T.; Luscombe, N. M.; Berman, H. M.; Thornton, J. M., Protein-RNA interactions: a structural analysis. Nucleic Acids Res., 2001, 29 (4), 943-954.
[79] Caly, D. L.; O'Toole, P. W.; Moore, S. A., The 2.2-Å structure of the HP0958 protein from Helicobacter pylori reveals a kinked anti-parallel coiled-coil hairpin domain and a highly conserved Zn-ribbon domain. J. Mol. Biol., 2010, 403 (3), 405-419.
[80] DeJong, E. S.; Luy, B.; Marino, J. P., RNA and RNA-protein complexes as targets for therapeutic intervention. Curr. Top. Med. Chem., 2002, 2 (3), 289-302.
[81] Boulin, T.; Rapti, G.; Briseno-Roa, L.; Stigloher, C.; Richmond, J. E.; Paoletti, P.; Bessereau, J. L., Positive modulation of a Cys-loop acetylcholine receptor by an auxiliary transmembrane subunit. Nat. Neurosci., 2012, 15 (10), 1374-1381.
[82] Wu, H. Y.; Liu, M. S.; Lin, T. P.; Cheng, Y. S., Structural and functional assays of AtTLP18.3 identify its novel acid phosphatase activity in thylakoid lumen. Plant Physiol., 2011, 157 (3), 1015-1025.
[83] Eletsky, A.; Acton, T. B.; Xiao, R.; Everett, J. K.; Montelione, G. T.; Szyperski, T., Solution NMR structures reveal a distinct architecture and provide first structures for protein domain family PF04536. J. Struct. Funct. Genomics, 2012, 13 (1), 9-14.
[84] Christopherson, R. I.; Lyons, S. D.; Wilson, P. K., Inhibitors of de novo nucleotide biosynthesis as drugs. Acc. Chem. Res., 2002, 35 (11), 961-971.
[85] Samant, S.; Lee, H.; Ghassemi, M.; Chen, J.; Cook, J. L.; Mankin, A. S.; Neyfakh, A. A., Nucleotide biosynthesis is critical for growth of bacteria in human blood. PLoS Pathog., 2008, 4 (2), e37.
[86] Forouhar, F.; Shen, J.; Xiao, R.; Acton, T. B.; Montelione, G. T.; Tong, L., Functional assignment based on structural analysis: crystal structure of the yggJ protein (HI0303) of Haemophilus influenzae reveals an RNA methyltransferase with a deep trefoil knot. Proteins, 2003, 53 (2), 329-332.
[87] Kumar, A.; Kumar, S.; Taneja, B., The structure of Rv2372c identifies an RsmE-like methyltransferase from Mycobacterium tuberculosis. Acta Crystallogr. D Biol. Crystallogr., 2014, 70 (Pt 3), 821-832.
[88] Brodersen, D. E.; Clemons, W. M., Jr.; Carter, A. P.; Morgan-Warren, R. J.; Wimberly, B. T.; Ramakrishnan, V., The structural basis for the action of the antibiotics tetracycline, pactamycin, and hygromycin B on the 30S ribosomal subunit. Cell, 2000, 103 (7), 1143-1154.
[89] Kumar, A.; Saigal, K.; Malhotra, K.; Sinha, K. M.; Taneja, B., Structural and functional characterization of Rv2966c protein reveals an RsmD-like methyltransferase from Mycobacterium tuberculosis and the role of its N-terminal domain in target recognition. J. Biol. Chem., 2011, 286 (22), 19652-19661.
[90] Okamoto, S.; Tamaru, A.; Nakajima, C.; Nishimura, K.; Tanaka, Y.; Tokuyama, S.; Suzuki, Y.; Ochi, K., Loss of a conserved 7-methylguanosine modification in 16S rRNA confers low-level streptomycin resistance in bacteria. Mol. Microbiol., 2007, 63 (4), 1096-1106.
[91] Johansen, S. K.; Maus, C. E.; Plikaytis, B. B.; Douthwaite, S., Capreomycin binds across the ribosomal subunit interface using tlyA-encoded 2'-O-methylations in 16S and 23S rRNAs. Mol. Cell, 2006, 23 (2), 173-182.
[92] Shin, D. H.; Yokota, H.; Kim, R.; Kim, S. H., Crystal structure of conserved hypothetical protein Aq1575 from Aquifex aeolicus. Proc. Natl. Acad. Sci. U. S. A., 2002, 99 (12), 7980-7985.
Received: April 21, 2014 Revised: July 23, 2014 Accepted: August 05, 2014