
Indian Journal of Biotechnology Vol 8, January 2009, pp 46-52

Structure-function annotation and phylogenetic strategy of nifH domain of a cyanobacterium: Chlorogloeopsis sp.

P T V Lakshmi1*, S Uma Maheswari1 and A Annamalai2

1 Department of Bioinformatics, Bharathiar University, Coimbatore 641 046, India
2 Schools of Biotechnology, Institute of Technology and Science, Karunya University, Coimbatore 641 114, India

Received 26 March 2007; revised 25 April 2008; accepted 27 June 2008

Protein sequences of the cyanobacterial genus Chlorogloeopsis (order Stigonematales) available in the NCBI database were explored to annotate structural and functional domains using BLOCKS SEARCHER and SWISS PDB VIEWER (SWISSMODEL). The analysis revealed eleven different domains, of which the nitrogen-fixing domain occurred most frequently (18.18%) among the sequences explored. This domain was therefore compared with other protein sequences available in the databases in order to assess similarity and to construct a phylogram. A similarity search for the NifH/frxC family sequence using BLAST P showed that it is similar to the nitrogenase protein of the genera Mastigocladus, Nostoc and Anabaena and of other uncultured cyanobacteria. The phylogenetic tree based on the similarities obtained from BLAST P indicates the evolutionary relationships.

Keywords: BLAST P, Cyanobacteria, Chlorogloeopsis, nifH domain, phylogram

*Author for correspondence: Tel: 91-422-2428283; Fax: 91-422-2424387; E-mail: [email protected]

Introduction

A complete understanding of the complex relationships between protein sequence, structure and function is critical. Structure determination typically follows the functional characterization of a protein in order to uncover the details of its molecular mechanism at the atomic level1,2. Domains are defined as structurally and functionally discrete units along the linear amino acid sequence, and novelty in protein function often arises through the gain or loss of domains or the reshuffling of existing ones. Domains are, moreover, regarded as the units of evolution. Both structural and functional similarity between proteins therefore need to be analyzed at the domain level3, especially with BLAST P4 (a tool that performs sequence alignment and lists the best-matching sequences from the protein sequence database), to establish evolutionary relationships that are undetectable by sequence-based methods5.

Cyanobacteria are oxygenic, photosynthetic prokaryotes that are dominant in aquatic and terrestrial environments6 and are found in virtually every major habitat type, including deserts, fresh water, marine and temperate environments, as well as extreme environments such as hot springs and Antarctic lakes6. Cyanobacteria originated 3 billion years ago and are thought to have contributed significantly to the oxygenation of the primitive atmosphere, converting it to an aerobic one. They are a fascinating group of photosynthetic bacteria that fix atmospheric nitrogen and have played an important role in the evolution of life on earth7. Although both academic and applied aspects of cyanobacterial biology have been studied extensively, the genus Chlorogloeopsis has received little attention, which prompted our interest in this organism.

Chlorogloeopsis8, a cyanobacterium belonging to the order Stigonematales and family Stigonemataceae, is a thalloid form composed of irregular, rounded cell packets in aggregates or of uniseriate to multiseriate short rows of cells (trichomes with 3-20 cells), usually without distinct mucilaginous envelopes, although single cell packets are sometimes enclosed in thin, firm sheaths. Heterocysts are terminal and intercalary, and akinetes are not formed. Reproduction takes place by groups of two or more cells or by short, few-celled hormogonia. Chlorogloeopsis is extensively used in agricultural fields for nitrogen fixation9, and this genus was chosen because few proteomic or genomic studies have been carried out on it.

Since no functional studies have been reported for Chlorogloeopsis, the present investigation set out to explore the function of the 22 protein sequences available at the National Center for Biotechnology Information (NCBI) site with the aid of a function prediction tool, BLOCKS SEARCHER. Blocks are local alignments without gaps, that is, alignments of fragments (segments) of sequences, and are reported to be consistent10. Because no 3-D structures of these proteins are stored in the Protein Data Bank, the sequences were also modeled with the bioinformatics tool SWISS PDB VIEWER (SWISSMODEL), which generates models by threading a protein primary sequence onto a 3-D template. The protein structures obtained were functionally annotated through BLOCKS SEARCHER to reveal the different domains, and the results were highlighted on the models.

Cyanobacteria are potent nitrogen fixers11 and are reported to be among the largest global suppliers of fixed nitrogen in the environment12. Much emphasis was therefore given to the nitrogen-fixing components of Chlorogloeopsis, which were compared with the protein database sequences using BLAST P for the following reasons: first, to determine the biological activity of the nitrogen-fixing domain at the family level and so gain insight into the domain; second, to determine the residues that are conserved through evolution; and third, to trace or estimate the probability of sequences.

Materials and Methods


Sequence Retrieval

The 22 available protein sequences of Chlorogloeopsis were obtained from the NCBI site.

Function Annotation

Annotation was performed for each sequence using the BLOCKS SEARCHER tool developed at the Fred Hutchinson Cancer Research Center, USA. Blocks are local alignments without gaps13. The tool searches the database of blocks by aligning the first position of the sequence with the first position of the first block, and a score for the amino acid at each position is obtained from the profile column corresponding to that position. Scores are summed over the width of the alignment, and the block is then aligned with the next position of the sequence. If a block scores sufficiently high against the query, the query can be hypothesized to be related to the group of sequences that the block represents. Since this procedure is carried out exhaustively for all positions of the sequence and for all blocks in the database, the best alignments between a sequence and entries in the Blocks database are reported. The sequences were submitted one after the other, the domains were identified and the results tabulated. From the functions annotated, the frequency of the common domains identified was calculated using the formula:

$$\text{Frequency percentage of domains} = \frac{\text{Frequency in all sequences}}{\text{Total no. of sequences}} \times 100$$

3-D Structure Modeling

Using SWISS PDB VIEWER, which is linked to an automated homology modeling server14 and was developed at GlaxoSmithKline R&D and the Swiss Institute of Bioinformatics (SIB), Structural Bioinformatics Group, Biozentrum, Basel, all 22 protein sequences were modeled as follows:
• The raw protein sequences (1D) were loaded in SWISS PDB VIEWER to find the appropriate ExPdb templates.
• The modeling request was submitted to SWISSMODEL (an automated comparative protein modeling server), and the resulting SWISS MODEL project file was downloaded and viewed in DeepView, the SWISS PDB VIEWER program.
• The functional domains obtained from BLOCKS SEARCHER were highlighted manually by selecting the corresponding residues.

Evolution of Nif Domains

BLAST P (Basic Local Alignment Search Tool, Protein) was used to find sequences similar to the identified nifH family domains (AAB37315 and AAB37316). Appropriate searches were made and the evolution was traced by constructing a phylogenetic tree using BLAST P version 2.2.18, available at the National Center for Biotechnology Information (NCBI), USA, which made it possible to determine the relationship between these organisms. A phylogram was constructed using a number of bootstrap trials, which provided the bootstrap tree.

Results and Discussion

The results indicate that all sequences could be functionally annotated. A total of 22 protein sequences were found for Chlorogloeopsis sp. in the database. Since no structural details were available in the PDB, the structures were modeled; the models showed interesting patterns (Fig. 1), and functional annotation using BLOCKS SEARCHER revealed the presence of a number of domains (Table 1). The frequency percentage of the most significant domains was calculated and is highlighted in Table 2. Among the sequences annotated, the nitrogen-fixing domain was predominant: the highest domain frequency (18.18%) was that of the nitrogenase domain, found in 4 sequences with a significant Block E-value of 3.1e-40 (Table 2). In the protein sequences AAB37315 and AAB37316, the signature occupied positions 25 to 72, comprising 47 residues (GDLELEEVMLTGFRGVKCVESGGPEPGVGCAGRGIITAINFLEENGAY), with glycine as the first residue and tyrosine as the last, while in AAP40828 and AAN63671 the nitrogenase signature occupied positions 76 to 91 (PIKDMIHISHGPVGCG), with proline as the first residue and glycine as the last, and had a Block E-value of 6.6e-12 (Table 1).

The second most significant block was identified as ferredoxin, which occurred at a frequency of 13.63% (Table 2) with an E-value of 0.021. It occupied positions 13 to 52 in the block and showed the amino acid profile CRAGACSTCAGKIKSG (accession nos P00247, 0812213A and 0805212A), with cysteine as the first residue and glycine as the last (Tables 1 & 2). These sequences revealed the domains (A) adrenodoxin, (B) ferredoxin, (C) ribosomal protein S6e, (D) olfactomedin-like protein and (E) respiratory-chain NADH dehydrogenase (Fig. 1). Chlorogloeopsis contains

Fig. 1: Modeled 3-D structures of Chlorogloeopsis sp. highlighting the pockets of functionally annotated domains in their respective locations. *Indicates the sequences coding for nitrogen fixation.
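The manual highlighting shown in Fig. 1 depends on knowing which residue numbers a signature covers. A minimal sketch of that bookkeeping follows. The helper name `locate_signature` is hypothetical, and the surrounding sequence is a made-up placeholder, since the full Chlorogloeopsis sequences are not reproduced here; only the signature string is taken from the text.

```python
from typing import Optional, Tuple


def locate_signature(sequence: str, signature: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based inclusive (start, end) of the first exact match, or None."""
    index = sequence.find(signature)
    if index == -1:
        return None
    return index + 1, index + len(signature)


# NifH/frxC signature as printed for AAB37315 and AAB37316.
NIFH_SIGNATURE = "GDLELEEVMLTGFRGVKCVESGGPEPGVGCAGRGIITAINFLEENGAY"

# Placeholder only: 24 arbitrary residues in front of the signature so that it
# starts at position 25. This is NOT the real AAB37315 sequence.
toy_sequence = "A" * 24 + NIFH_SIGNATURE + "A" * 10

start, end = locate_signature(toy_sequence, NIFH_SIGNATURE)
print(f"residues {start}-{end}, {end - start + 1} positions")
```

With an inclusive range, positions 25 to 72 span 48 residues, which matches the length of the signature string as printed; the figure of 47 quoted in the text corresponds to the difference 72 - 25.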

Table 1: Functionally annotated domains of Chlorogloeopsis sp.

Accession no.   Amino acids   Annotated functional domains (signatures) using BLOCKS SEARCHER
CAC60249        546           AMP-binding signature
BAC76798        295           RNA polymerase Rpb1, domain 1; RNA polymerase Rpb1, domain 3; RNA polymerase I subunit A, N-terminal; RNA polymerase, alpha subunit; RNA polymerase Rpb1, domain 4; RNA polymerase Rpb1, domain 5; CPL; Desmocollin signature; Molluscan insulin-related peptide
BAC76788        396           DNA gyrase, subunit B, C-terminal; DNA gyrase subunit B signature; DNA topoisomerase II; Histidine kinase, dimerisation; Bacterial sensor protein C-terminal; Cell division protein FtsA
BAC21117        396           DNA gyrase, subunit B, C-terminal; DNA gyrase subunit B signature; DNA topoisomerase II; Histidine kinase, dimerisation; Bacterial sensor protein C-terminal
BAC21140        188           Sigma-70, non-essential region; Sigma-70 factor, region 1.1; Sigma-70 region 1.2; Sigma-70 region 3; Sigma-70 factor family; Sigma-70 region 4; Sigma-70 region 2
*AAN63671       491           Nitrogenase component 1 alpha and beta; B-lytic metalloendopeptidase (M23); Shaw voltage-gated K+ channel family
BAC10517        295           RNA polymerase Rpb1, domain 1; RNA polymerase Rpb1, domain 3; RNA polymerase I subunit A, N-terminal; RNA polymerase, alpha subunit; RNA polymerase Rpb1, domain 4; RNA polymerase Rpb1, domain 5; Desmocollin signature; Molluscan insulin-related peptide
AAG01688        198           KaiC; Protein of unknown function DUF1605; Adenylate/cytidine kinase, N-terminal; Uridine kinase signature; Cell division FtsK/SpoIIIE protein; Intimin signature
*AAB37316       108           NifH/frxC family; Ankyrin repeat signature
*AAB37315       108           NifH/frxC family; Ankyrin repeat signature
AAL76316        366           Poly-beta-hydroxybutyrate polymerase; DBP10CT; ParB-like nuclease; Nitrogen regulatory AreA, N-terminal
BAE80667        394           Ribulose bisphosphate carboxylase; Thionin; TatD deoxyribonuclease; Coenzyme Q biosynthesis Coq4
BAE80666        394           Ribulose bisphosphate carboxylase; Thionin; TatD deoxyribonuclease; Coenzyme Q biosynthesis Coq4
BAE80655        247           Spectrin repeat superantigen; Apoptosis regulator Bcl-2 protein; RelB antitoxin
BAE80654        247           Spectrin repeat superantigen; Apoptosis regulator Bcl-2 protein; RelB antitoxin
*AAP40828       497           Nitrogenase component 1 alpha and beta; B-lytic metalloendopeptidase (M23); Shaw voltage-gated K+ channel family
BAC44901        198           Sigma-70, non-essential region; Sigma-70 factor, region 1.1; Sigma-70 region 1.2; Sigma-70 factor family; Sigma-70 region 4; Sigma-70 region 2
P00247          98            2Fe-2S ferredoxin; Ferredoxin; Respiratory-chain NADH dehydrogenase; Adrenodoxin; Ribosomal protein S6e; Olfactomedin-like
0812213A        98            2Fe-2S ferredoxin; Ferredoxin; Respiratory-chain NADH dehydrogenase; Adrenodoxin; Ribosomal protein S6e; Olfactomedin-like
0805212A        98            2Fe-2S ferredoxin; Ferredoxin; Respiratory-chain NADH dehydrogenase; Adrenodoxin; Ribosomal protein S6e; Olfactomedin-like
AAA20897        99            RNA-binding region RNP-1; RBM1CTR; Polyadenylate-binding protein
AAK38139        364           Poly-beta-hydroxybutyrate; Epoxide hydrolase signature; Alpha/beta hydrolase fold signature; ParB-like nuclease; DBP10CT

Protein sequences of Chlorogloeopsis obtained from NCBI, listed with accession numbers, number of amino acid residues and the functional domains annotated with the BLOCKS SEARCHER tool. *Denotes the sequences coding for nitrogen fixation.
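The annotations in Table 1 come from scoring each query against ungapped block profiles, as outlined in Materials and Methods. A minimal sketch of such a scan is given below. The toy profile, its scores and the function name `best_block_hit` are illustrative assumptions and do not reproduce the actual BLOCKS SEARCHER scoring scheme or its data; only the query segment is taken from the text.

```python
from typing import Dict, List, Tuple

# A block profile: one dict per block column, mapping residue -> score.
Profile = List[Dict[str, float]]


def best_block_hit(sequence: str, profile: Profile,
                   default: float = -1.0) -> Tuple[int, float]:
    """Slide the profile along the sequence without gaps.

    Returns (1-based start position, summed score) of the best window.
    Residues absent from a profile column receive the `default` score.
    """
    width = len(profile)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the block")
    best_start, best_score = 0, float("-inf")
    for start in range(len(sequence) - width + 1):
        # Sum the column scores over the width of the block at this offset.
        score = sum(
            column.get(sequence[start + offset], default)
            for offset, column in enumerate(profile)
        )
        if score > best_score:
            best_start, best_score = start, score
    return best_start + 1, best_score


# Toy 4-column profile with made-up scores that favour the segment "GPVG".
toy_profile: Profile = [
    {"G": 3.0, "A": 1.0},
    {"P": 3.0},
    {"V": 2.0, "I": 1.0},
    {"G": 3.0},
]

position, score = best_block_hit("PIKDMIHISHGPVGCG", toy_profile)
print(position, score)
```

A real search repeats this scan for every block in the database and ranks the hits by statistical significance; the sketch stops at the raw summed score.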


ferredoxin, which is an iron-containing protein that mediates electron transfer in a range of metabolic reactions15. Besides the highly significant nitrogenase and ferredoxin domains, other domains with a frequency of 9.09% were identified (Table 2). These included RNA polymerase (BAC76798 & BAC10517); DNA gyrase (BAC76788 & BAC21117); Sigma 70 (BAC21140 & BAC44901), a protein component of RNA polymerase that determines the specific site on DNA where transcription begins; Spectrin repeat superantigen (BAE80655 & BAE80654); ribulose bisphosphate carboxylase, commonly known as RuBisCO (BAE80667 & BAE80666), an enzyme of the Calvin cycle that catalyzes the first major step of carbon fixation; and poly-beta-hydroxybutyrate (AAL76316 & AAK38139), a carbon and energy reserve polymer produced in some prokaryotes when carbon sources are plentiful and other nutrients, such as nitrogen, phosphate, oxygen or sulphur, become limiting16 (Fig. 1). Three more domains were annotated at a frequency of 4.54% each: the AMP-binding signature (CAC60249), the RNA-binding signature (AAA20897) and KaiC (AAG01688). The AMP-binding signature is reported to have a major role in downstream phosphate regulation17,18, while the KaiC domain is essential for the circadian clock in cyanobacteria (Table 2).

However, the highly significant (18.18%) and predominantly occurring nifH domain, comprising 47 amino acid residues between positions 25 and 72, was found by BLAST P comparison to be similar to the nitrogenase protein of the genera Mastigocladus, Nostoc and Anabaena and of other uncultured cyanobacteria. The phylogenetic tree (Fig. 2) indicated the evolutionary relationships among these organisms. It has been postulated that some of the morphological characters used to delineate the cyanobacteria may not reflect true evolutionary relationships within the heterocystous lineage19,20. Ecologically, nitrogen fixation is important because it allows cyanobacteria to invade or dominate nitrogen-deficient environments. Moreover, nitrogen fixation (nif) genes have been highly conserved throughout evolution, even though they are widely distributed throughout the bacterial and archaeal genera21, and their molecular phylogenies have been employed to resolve evolutionary relationships within the cyanobacteria1.

Thus, the present investigation underlines that structures predicted theoretically or computationally should be tested directly in the laboratory. Such testing would refine the theoretical models, which in turn could become the starting point for further experiments and new results that lead to further model refinement3 and eventual exploitation in industry. Likewise, a homologous relationship between query and target protein can be inferred from significant sequence similarity over most of the aligned sequences, which also makes it possible to predict the biochemical function and cellular location of the proteins. Bioinformatics is therefore one platform available to elucidate the relationships between biological sequence, 3-D structure and accompanying function, and then to use this knowledge for predictive purposes22.

Table 2: Frequency percentage of common domains derived using BLOCKS SEARCHER for Chlorogloeopsis sp.

No.   Annotated domain               Frequency in all sequences   Frequency % of domains
1     Nitrogen fixing/Nitrogenase    4                            18.18
2     Ferredoxin                     3                            13.63
3     RNA polymerase                 2                            9.09
4     DNA gyrase                     2                            9.09
5     Sigma 70                       2                            9.09
6     Spectrin repeat superantigen   2                            9.09
7     Ribulose bisphosphate          2                            9.09
8     Poly-beta-hydroxybutyrate      2                            9.09
9     AMP-binding signature          1                            4.54
10    RNA-binding signature          1                            4.54
11    KaiC                           1                            4.54
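The frequency percentages in Table 2 follow from the formula given in Materials and Methods (domain count divided by the 22 sequences, times 100). The short sketch below recomputes them from the tabulated counts. The function names are illustrative, and truncation to two decimals is an assumption made here because the printed values 13.63 and 4.54 match truncation rather than rounding.

```python
import math

TOTAL_SEQUENCES = 22  # Chlorogloeopsis protein sequences analysed

# Number of sequences carrying each domain, as listed in Table 2.
DOMAIN_COUNTS = {
    "Nitrogen fixing/Nitrogenase": 4,
    "Ferredoxin": 3,
    "RNA polymerase": 2,
    "DNA gyrase": 2,
    "Sigma 70": 2,
    "Spectrin repeat superantigen": 2,
    "Ribulose bisphosphate": 2,
    "Poly-beta-hydroxybutyrate": 2,
    "AMP-binding signature": 1,
    "RNA-binding signature": 1,
    "KaiC": 1,
}


def frequency_percentage(count: int, total: int = TOTAL_SEQUENCES) -> float:
    """Frequency in all sequences / total number of sequences x 100."""
    return count / total * 100


def truncate_2dp(value: float) -> float:
    """Cut a positive value to two decimals without rounding up (assumed convention)."""
    return math.floor(value * 100) / 100


for domain, count in DOMAIN_COUNTS.items():
    pct = truncate_2dp(frequency_percentage(count))
    print(f"{domain:<30}{count:>3}{pct:>8.2f}")
```

Conventional rounding would give 13.64 and 4.55 for the counts of 3 and 1; the other values are unaffected.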
References

1 Henson B J, Hasselbrock S M, Watson L E & Barnum S R, Molecular phylogeny of the heterocystous cyanobacteria (subsections IV and V) based on nifD, Int J Syst Evol Microbiol, 54 (2004) 493-497.
2 Todd A E, From protein structure to function, in Bioinformatics, genes, proteins and computers, edited by C Orengo, D Jones & J Thornton (BIOS Scientific Publisher, Oxford) 2003.
3 Nagl S B, Function prediction from protein sequence, in Bioinformatics, genes, proteins and computers, edited by C Orengo, D Jones & J Thornton (BIOS Scientific Publisher, Oxford) 2003.
4 Rama B, Karen R C, Maria C C, Kara D, Selina S D et al, Fungal BLAST and Model Organism BLASTP Best Hits: New comparison resources at the Saccharomyces Genome Database (SGD), Nucleic Acids Res, 33 (2005) D374-D377.
5 Silitoe I & Orengo C, Protein structure comparison, in Bioinformatics, genes, proteins and computers, edited by C Orengo, D Jones & J Thornton (BIOS Scientific Publisher, Oxford) 2003.
6 Castenholz R W & Waterbury J B, Group I. Cyanobacteria, in Bergey's manual of systematic bacteriology, edited by J T Stanley, M P Bryant, N Pfennig & J G Holt (Williams & Wilkins, Baltimore, MD), 3 (1989) 1710-1721.
7 Schopf J W & Walter M R, Origin and early evolution of cyanobacteria: The geological evidence, in The biology of cyanobacteria, edited by N G Carr & B A Whitton (Blackwell Scientific Publications, Oxford) 1982, 543-564.
8 Mitra and Pandey, CYANO-DATABASE: Database of cyanoprokaryotes, Phykos, 5 (1967) 112.
9 Komárek J & Anagnostidis K, Modern approach to the classification system of Cyanophytes. 4-Nostocales. Algological studies, Arch Hydrobiol, 56 (1989) 247-345.
10 Vingron M & Argos P, A fast and sensitive multiple sequence alignment algorithm, Comput Appl Biosci, 5 (1989) 115-121.
11 Steunou A S, Bhaya D, Bateson M M, Melendrez M C, Ward D M et al, In situ analysis of nitrogen fixation and metabolic switching in unicellular thermophilic cyanobacteria inhabiting hot spring microbial mats, Proc Natl Acad Sci, USA, 103 (2006) 2398-403.
12 Sprent J & Sprent P, Nitrogen fixing organisms: Pure and applied aspects (Chapman and Hall, New York) 1990.
13 Pietrokovski S, Henikoff J G & Henikoff S, The BLOCKS database: A system for protein classification, Nucleic Acids Res, 24 (1996) 197-200.
14 Nicolas G, Alexandre D, Peitsch M C & Schwede T, 2006. http://www.expasy.org/spdbv/
15 Takahashi Y, Hase T, Matsubara H, Hutber G N & Rogers L J, Amino acid sequence of Chlorogloeopsis fritschii ferredoxin: Taxonomic and evolutionary aspects, J Biochem, 92 (1982) 1363-1368.
16 Oldenburg O, Qin Q, Sharma A R, Cohen M V, Downey J M et al, Acetylcholine leads to free radical production dependent on KATP channels, Gi proteins, phosphatidylinositol 3, and tyrosine kinase, Cardiovasc Res, 55 (2002) 544-552.
17 Scanlan D J, Bourne J A & Mann N H, A putative transcriptional activator of the Crp/Fnr family from the marine cyanobacterium Synechococcus sp. WH7803, J Appl Phycol, 8 (1996) 565-567.
18 Tetsuya M, Sergei V, Yao Xu, Walter F, Michael M et al, Circadian clock protein KaiC forms ATP-dependent hexameric rings and binds DNA, Proc Natl Acad Sci, USA, 99 (2002) 17203-17208.
19 Mollenhauer D, Nostoc species in the field, Arch Hydrobiol, 80 (1988) 1-4.
20 Rippka R, Recognition and identification of cyanobacteria, Methods Enzymol, 167 (1988) 28-87.
21 Young J P W, Phylogenetic classification of nitrogen fixing organisms, in Biological nitrogen fixation, edited by G Stacey, H J Evans & R H Burris (Chapman and Hall, New York) 1992, 43-86.
22 Weir M, Swindels M & Overington J, Insights into protein function through large-scale computational analysis of sequence and structure, Trends Biotechnol, 19 (2001) 1-66.

Fig. 2: Construction of a phylogram. Branch lengths in the phylogram are proportional to the amount of inferred evolutionary change.
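The bootstrap trials mentioned for the phylogram (Fig. 2) resample alignment columns with replacement and rebuild the tree for each replicate. The sketch below shows only the resampling step and a simple p-distance comparison. The three short aligned sequences and all function names are invented for illustration, and nothing here reproduces the tree-building procedure actually used at NCBI.

```python
import random
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Build one bootstrap replicate by resampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Invented toy alignment (hypothetical data, not from the study).
toy_alignment = {
    "seq_A": "GDLELEEVML",
    "seq_B": "GDLELEEVMI",
    "seq_C": "GDIELDEVMI",
}

rng = random.Random(0)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]

# Count replicates in which seq_A is strictly closer to seq_B than to seq_C.
support = sum(
    p_distance(r["seq_A"], r["seq_B"]) < p_distance(r["seq_A"], r["seq_C"])
    for r in replicates
)
print(f"A-B grouping supported in {support} of {len(replicates)} replicates")
```

In a full analysis each replicate would be turned into a tree, and the bootstrap value of a branch would be the share of replicate trees that contain it.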
