Theory Biosci. (2013) 132:105–113 DOI 10.1007/s12064-012-0174-z
ORIGINAL PAPER
Identification of MFS proteins in sorghum using semantic similarity
Manoj Kumar Sekhwal • Vinay Sharma • Renu Sarin
Received: 12 August 2012 / Accepted: 18 December 2012 / Published online: 9 January 2013
© Springer-Verlag Berlin Heidelberg 2012
Abstract Antiporters, uniporters and symporters are the functional classes of the major facilitator superfamily (MFS). They play a major role in ion homeostasis, regulation of pumps and channels, membrane structure and transporter activity in tolerance to abiotic stresses. The MFS includes Na+/H+ antiporters, which are considered sensors of molecule transport. A large number of MFS proteins have been identified in several plants, such as rice, maize and Arabidopsis. However, the majority of the proteins in sorghum are still described as putative uncharacterized, which suggests that the identification of MFS proteins in sorghum is far from saturation. Hence, we developed a method based on the semantic similarity of gene ontology (GO) terms, using the GOSemSim R package. As a result, a total of 2,568 orthologous proteins with high (100 %) semantic similarity were obtained from 7 plant species. These data were used to predict the function of 257 putative uncharacterized proteins from 18 MFS families in sorghum. The identified proteins were associated with regulation of pumps and channels, membrane structure, transporter activity, ion homeostasis, transporter mechanisms and binding processes. These functions appear to constitute a distinct mechanism of salt-stress adaptation in plants. The proposed method will help in identifying further proteins that can aid the development of agronomically and economically important plants.
Keywords Protein–protein interaction • Major facilitator superfamily • Gene ontology • Salt-stress • Sorghum bicolor
Abbreviations
MFS Major facilitator superfamily
GO Gene ontology
PPI Protein–protein interaction
BP Biological process
CC Cellular component
MF Molecular function
COGs Cluster of orthologous groups
Pfam Protein family
Electronic supplementary material The online version of this article (doi:10.1007/s12064-012-0174-z) contains supplementary material, which is available to authorized users.
M. K. Sekhwal • V. Sharma (✉) Department of Bioscience and Biotechnology, Banasthali University, P.O. Banasthali Vidyapith, 304022 Vanasthali, Rajasthan, India e-mail: [email protected]
R. Sarin Department of Botany and Biotechnology, University of Rajasthan, JLN Marg, Jaipur 302055, Rajasthan, India
Introduction
Since the sequencing of the first plant genome, that of Arabidopsis thaliana (The Arabidopsis Genome Initiative 2000), several other plant genomes, viz. Oryza sativa (Goff et al. 2002), Populus trichocarpa (Tuskan et al. 2006), Physcomitrella patens (Rensing et al. 2008), Cucumis sativus (Huang et al. 2009), Sorghum bicolor (Paterson et al. 2009) and Selaginella moellendorffii (Banks et al. 2011), have been completely sequenced, and the sequences of many other genomes may become available in the next few years. Rapid genome sequencing of plant species has deposited an enormous amount of genomic data in public databases, such as TAIR (Huala et al. 2001), PlantGDB (Dong et al. 2004), Gene Ontology (Harris et al. 2004), NCBI-Plant
Genome (Wheeler et al. 2005) and the Gramene database (Jaiswal 2011). However, despite the availability of complete genome sequences, data on the identification of expressed proteins and their functions are lacking. Sorghum bicolor, which has been recently sequenced (Paterson et al. 2009), is an important cereal crop for food and fodder and a raw material for the production of starch, alcohol and biofuels (Mutegi et al. 2010). The extensive agricultural and other uses of sorghum have made it necessary to improve the resistance of this crop to biotic and abiotic stresses, mainly drought and salt. The major facilitator superfamily (MFS) plays a major role in ion and nutrient homeostasis, regulation of pumps and channels, membrane structure, transporter activity and protein binding in tolerance to abiotic stress (Pao et al. 1998). Antiporters, uniporters and symporters are the functional classes of the MFS. Earlier phylogenetic analyses revealed that 17 distinct families of the MFS have common features (Marger and Saier 1993). The Gramene plant genome database (Release 34b; http://www.gramene.org/) includes 300, 230, 346, 34, 48 and 316 identified MFS proteins from O. sativa, A. thaliana, Zea mays, Glycine max, P. patens and P. trichocarpa, respectively. In contrast, little functional information on the MFS in sorghum is available so far (http://www.gramene.org). Thus, there is an obvious need for the functional identification of MFS proteins in sorghum. In addition, there are a large number of uncharacterized MFS proteins in other plants too, viz. Vitis vinifera and S. moellendorffii (Gramene release 34b; Jaiswal 2011). Thus, the identification of MFS proteins is an important step toward understanding the biological functions of these proteins, especially in abiotic stress. Recently, computational approaches, which are accurate and affordable, have also been widely used to identify the functional roles of proteins (Yin et al. 2008).
In the past, several attempts have been made to annotate proteins using computational approaches, such as gene neighbor (Rogozin et al. 2002), phylogenetic profiling (Cokus et al. 2007), protein–protein interaction (PPI) (Turanalp and Can 2008), sequence similarity (Lee et al. 2009) and co-expression (Ulitsky and Shamir 2009). The established approaches were primarily homology based. The relationship between genes from different genomes is represented as a system of homologues that includes both orthologues and paralogues (Dewey et al. 2011). Orthologue prediction has been based entirely on sequence similarity. Orthologues originate from the same ancestor but exist in different species with similar sequence and function (Chen and Jeong 2000). Hence, computational tools such as BLAST (McGinnis and Madden 2004) are used to check sequence similarity. The proteins that have high similarity scores are considered the main homologues (Chen and Jeong 2000). Orthologous finding
becomes a powerful approach to identify protein function and the evolution of protein families (Towfic et al. 2010). Sequence similarity is used not only for orthologue searching but also to predict protein structure and interaction data (Goh et al. 2000). It has been reported that interacting proteins co-evolve and are co-expressed with their partner proteins (Raman 2010). Protein interactions have been validated based on the assumption that most homologous proteins interact (Bodt et al. 2009). Large PPI data sets are available in several online databases, viz. DIP (Xenarios et al. 2000), MINT (Zanzoni et al. 2002), BIND (Bader et al. 2003), HPRD (Peri et al. 2004) and IntAct (Kerrien et al. 2007). These databases are integrated in APID (Prieto and De Las Rivas 2006), which defines protein interactions based on co-functional, co-expressed and co-located proteins. The methods used earlier for protein prediction have been based on either orthology relationships or sequence similarity (Sjolander et al. 2011). The semantic similarity between two proteins is usually calculated on the basis of the similarity of their GO terms (Gomez et al. 2011). The GO terms in the Gene Ontology database are organized as a directed acyclic graph (DAG) in three ontology aspects, viz. molecular function (MF), biological process (BP) and cellular component (CC) (Harris et al. 2004). Several methods for determining semantic similarity have been reported earlier, viz. Resnik (1999), Jiang and Conrath (1997) and Wang et al. (2007). It has previously been reported that interacting proteins in the cell are likely to be in similar locations or involved in similar biological processes and molecular functions (Jain and Bader 2010). Recently, we also reported that methods based on GO term semantic similarity can be used to predict the function of proteins (Sekhwal et al. 2012; Sharma et al. 2012).
In the present paper, we report a more dedicated approach to predicting protein function, based on the concept that proteins with high semantic similarity have similar functional properties. These highly semantically similar proteins were useful for assessing the physiological importance of PPI in distinct MFS families in sorghum.
Materials and methods
Data retrieval and analysis
A total of 257 putative uncharacterized proteins from 18 distinct MFS families in sorghum were retrieved manually, under the unique identification number IPR016196, from the Gramene genome database (Gramene release 34b, January 2012) (Supplementary file). To identify potential orthologous proteins for these 257 putative uncharacterized proteins, 2,568 orthologous proteins with high semantic similarity from 7 plant species were obtained using PSI-BLAST (http://www.ebi.ac.uk/Tools/sss/psiblast/) (Altschul et al.
1997). This set included orthologous proteins from Zea mays (326), O. sativa indica (383), O. sativa japonica (745), A. thaliana (366), A. lyrata (294), P. patens (136) and S. moellendorffii (318). When searching for orthologous proteins with PSI-BLAST, the following parameters were applied: protein database, UniProt; E-value, 1.0e-3; matrix, BLOSUM62; gap opening, 11; gap extension, 1; scores and alignments, 500; dropoff, 15 (default); final dropoff, 25; and alignment view, pairwise, with the filter active. It has been shown earlier that 43 % of BLAST hits were homologous at E values ranging from 1 to 10 and that over 99 % are homologous below the threshold E value of 1e-03. Hence, a higher E value threshold provides better coverage at lower accuracy (Boekhorst and Snel 2007). Protein sequences, GO terms and families were identified using the UniProt database (http://www.uniprot.org/) (Bairoch et al. 2005), which is cross-linked with other databases such as Gene Ontology (Harris et al. 2004) and Pfam (Bateman et al. 2004). The MFS proteins belonging to the 18 families were further investigated for functional and phylogenetic annotation with the COGnitor program, which compares protein sequences against the Clusters of Orthologous Groups (COG) database (version 66) (http://www.ncbi.nih.gov/COG) (Tatusov et al. 2000).
Semantic similarity measures
The orthologous proteins identified by PSI-BLAST, as described above, were then assessed for the semantic similarity of their GO terms using the GOSemSim R package (http://www.bioc.ism.ac.jp/2.8/bioc/html/GOSemSim.html) (Yu et al. 2010) with the Jiang–Conrath method (Jiang and Conrath 1997). We performed the semantic similarity analysis separately for the three GO aspects (biological process, cellular component and molecular function) for each orthologous protein. The orthologous proteins were classified into two classes, i.e., high (100 %) and low (<100 %) semantic similarity.
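The Jiang–Conrath measure named above can be illustrated with a minimal Python sketch. This is not the GOSemSim implementation: the toy ontology, the annotation counts and the conversion from distance to similarity (1 - min(1, distance)) are assumptions made for illustration only, and the normalization used by the R package may differ.

```python
import math

# Hypothetical toy ontology (child -> parents); not real GO terms.
PARENTS = {
    "root": [],
    "transport": ["root"],
    "binding": ["root"],
    "ion_transport": ["transport"],
    "sugar_transport": ["transport"],
}

# Hypothetical numbers of proteins annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "transport": 3,
    "binding": 1,
    "ion_transport": 4,
    "sugar_transport": 2,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) counts annotations to t and its descendants."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    return {term: -math.log(cumulative[term] / total) for term in PARENTS}


def jiang_conrath_similarity(a, b, ic):
    """Assumed conversion: 1 - min(1, IC(a) + IC(b) - 2 * IC(MICA))."""
    common = ancestors(a) & ancestors(b)
    mica_ic = max(ic[t] for t in common)  # most informative common ancestor
    distance = ic[a] + ic[b] - 2.0 * mica_ic
    return 1.0 - min(1.0, distance)


def classify(similarity):
    """Two classes as in the text: high (100 %) and low (<100 %)."""
    return "high" if math.isclose(similarity, 1.0) else "low"
```

With these toy counts, a term compared with itself scores 1.0 and falls in the high class, whereas any pair with a positive Jiang–Conrath distance falls in the low class.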
Protein interaction algorithm
The algorithm is based on the concept that proteins with high semantic similarity in a species are more likely to interact. In sorghum, interaction was confirmed among highly semantically similar proteins within a family and between distinct families. The algorithm is given as:
$$\mathrm{PPI} = C_1(A \cap B) \cap C_n(A \cap B)$$
where $A$ and $B$ are highly semantically similar proteins and $C_1$ to $C_n$ are clusters of these proteins. Each cluster consists of the highly semantically similar proteins of a family. The symbol $\cap$ denotes protein interaction.
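One possible reading of this rule, sketched in Python under stated assumptions: proteins are grouped into clusters by family, and any pair with 100 % semantic similarity is predicted to interact, both within a cluster and between clusters. The protein names, GO term sets and the stand-in similarity function (Jaccard overlap, not the Jiang–Conrath measure) are hypothetical.

```python
from itertools import combinations

HIGH = 1.0  # 100 % semantic similarity

# Hypothetical GO annotations for four hypothetical proteins.
GO_TERMS = {
    "P1": {"GO:0016021", "GO:0005215"},
    "P2": {"GO:0016021", "GO:0005215"},
    "P3": {"GO:0016021"},
    "P4": {"GO:0016021", "GO:0005215"},
}

# Hypothetical clusters (one per protein family).
CLUSTERS = {"C1": ["P1", "P2", "P3"], "C2": ["P4"]}


def toy_similarity(a, b):
    """Stand-in score: Jaccard overlap of GO term sets (an assumption)."""
    return len(GO_TERMS[a] & GO_TERMS[b]) / len(GO_TERMS[a] | GO_TERMS[b])


def high_similarity_pairs(proteins, similarity):
    """Unordered protein pairs within one cluster that reach the 100 % threshold."""
    return {
        (a, b)
        for a, b in combinations(sorted(proteins), 2)
        if similarity(a, b) >= HIGH
    }


def predict_interactions(clusters, similarity):
    """Predicted interactions within each cluster and between distinct clusters."""
    within = {
        name: high_similarity_pairs(members, similarity)
        for name, members in clusters.items()
    }
    between = set()
    for c1, c2 in combinations(sorted(clusters), 2):
        for a in clusters[c1]:
            for b in clusters[c2]:
                if similarity(a, b) >= HIGH:
                    between.add((c1, a, c2, b))
    return within, between


within, between = predict_interactions(CLUSTERS, toy_similarity)
```

In this toy input, P1 and P2 are predicted to interact within cluster C1, and both are linked to P4 in cluster C2.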
Sorghum protein interaction
The functional annotation of the MFS was examined based on the assumption that highly similar and related GO terms indicate similar functional properties. In this study, we used the algorithm to assign interactions between highly semantically similar proteins of a family in sorghum. These interactions help to annotate the common function of a family. We assumed that highly semantically similar proteins in the 18 distinct families have a high confidence level of interaction, as such proteins are involved in the same cellular component, molecular function and biological process. For instance, mitochondrion (GO: 0005739) is more similar to ribosome (GO: 0005840) than to nucleolus (GO: 0005730), because both are in the cytoplasm. To identify putative uncharacterized proteins in sorghum, we classified the highly semantically similar proteins into 16 distinct clusters (Table 1). These proteins allowed us to illustrate interactions among proteins within a cluster and between distinct clusters, which was highly beneficial for predicting the common functions of the MFS in sorghum. In the interaction graph, clusters were represented as nodes and their GO terms as attributes. The relationship between a node and its attributes was represented by an edge (Fig. 1). The interaction graph was produced with Cytoscape (version 2.8.1) (Shannon et al. 2003).
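The node, attribute and edge representation described here can be sketched as follows. The three GO terms and their cluster lists are a subset of the values reported in the Results; the node naming scheme, the `has_term` relation label and the export to a tab-separated SIF-style file for Cytoscape are assumptions, not details given by the authors.

```python
from itertools import combinations

# GO term -> clusters sharing it at 100 % similarity (subset of the reported values).
TERM_CLUSTERS = {
    "GO:0016021": [1, 2, 3, 5, 6, 8, 11, 12, 13, 15, 16],  # integral to membrane
    "GO:0016020": [2, 3, 7],                                # membrane
    "GO:0005215": [1, 2, 3],                                # transporter activity
}


def node_attribute_edges(term_clusters):
    """One edge per (cluster node, GO term attribute) pair."""
    edges = []
    for term, clusters in sorted(term_clusters.items()):
        for cluster in clusters:
            edges.append((f"cluster_{cluster}", term))
    return edges


def shared_term_links(term_clusters):
    """Map each pair of clusters to the set of GO terms they share."""
    links = {}
    for term, clusters in term_clusters.items():
        for a, b in combinations(sorted(clusters), 2):
            links.setdefault((a, b), set()).add(term)
    return links


def to_sif(edges, relation="has_term"):
    """Serialize edges as 'source<TAB>relation<TAB>target' lines."""
    return "\n".join(f"{node}\t{relation}\t{attr}" for node, attr in edges)


edges = node_attribute_edges(TERM_CLUSTERS)
links = shared_term_links(TERM_CLUSTERS)
sif_text = to_sif(edges)
```

Clusters that share a term become associated through that term; for example, clusters 2 and 3 are linked by all three terms in this subset.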
Results
Orthologue finding is a powerful approach to identify protein function and evolution (Towfic et al. 2010). Hence, unknown proteins from one plant species can be used to identify the conserved proteins in other species. In this study, all putative uncharacterized proteins were classified into 16 clusters based on their families (Table 1). The uncharacterized sorghum proteins in each cluster were used to find their orthologous proteins in seven plant species (described in "Materials and methods"). The results showed high semantic similarity between the orthologous proteins and the sorghum proteins in distinct families. Semantic similarity was classified as high (100 %) or low (<100 %). A total of 2,568 highly semantically similar orthologous proteins for the 18 distinct families were identified from the 7 plant species (Table 1). In evolutionary terms, orthologues originate with similar function (Chen and Jeong 2000). Hence, our analysis identified MFS proteins on the basis that highly semantically similar orthologous proteins belong to the same functional categories.
Table 1 High semantic similar proteins of 18 MFS families assigned in 16 different clusters
Cluster No. | Protein family [Pfam ID] | PUC | COG function | HSS
1 | Sugar transporter [PF00083] | 91 | COG0477 (GEPR) permeases of the major facilitator superfamily | 140
2 | Proton-dependent oligopeptide transport [PF00854] | 81 | COG3104 (E) dipeptide/tripeptide permease | 522
3 | Major facilitator superfamily [PF07690] | 55 | COG0477 (GEPR) permeases of the major facilitator superfamily | 92
4 | Nodulin-like [PF06813] | 10 | COG0477 (GEPR) permeases of the major facilitator superfamily | 32
5 | B cell receptor-associated protein 31-like [PF05529] | 1 | No related COG | 30
6 | Endomembrane protein 70 [PF02990] | 3 | No related COG | 134
7 | ABC transporter [PF01061] | 3 | COG1131 (Q) ABC-type multidrug transport system, ATPase component | 164
8 | Amino acid permease [PF00324] | 1 | COG0531 (E) amino acid transporters |
9 | zf-RING [PF13639] | 1 | No related COG | 662
10 | Aluminium activated malate transporter [PF11744] | 2 | No related COG | 71
11 | Ferroportin1 [PF06963] | 2 | COG0477 (GEPR) permeases of the major facilitator superfamily | 31
12 | Amino acid transporter protein [PF01490] | 1 | COG0814 (E) amino acid permeases | 56
13 | Ion transport protein [PF00520]; cyclic nucleotide-binding [PF00027] | 1 | COG0664 (T) cAMP-binding domains–catabolite gene activator and regulatory subunit of cAMP-dependent protein kinases | 74
14 | Leucine rich repeat [PF08263]; Pkinase [PF00069] | 1 | COG0515 (T) serine/threonine protein kinases | 370
15 | MtN3_saliva [PF03083] | 1 | No related COG | 115
16 | TLC ATP/ADP transporter [PF03219] | 3 | COG0477 (GEPR) permeases of the major facilitator superfamily | 61
14
PUC putative uncharacterized proteins in sorghum, HSS high semantic similar (100 %)
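As a quick consistency check on Table 1, the PUC column can be summed and compared with the 257 putative uncharacterized proteins stated in the text. The short Python sketch below copies the PUC counts from the table; the variable names are illustrative only.

```python
# PUC counts per cluster, copied from Table 1.
PUC = {1: 91, 2: 81, 3: 55, 4: 10, 5: 1, 6: 3, 7: 3, 8: 1,
       9: 1, 10: 2, 11: 2, 12: 1, 13: 1, 14: 1, 15: 1, 16: 3}

total = sum(PUC.values())                        # all 16 clusters
top_four = sum(PUC[c] for c in (1, 2, 3, 4))     # the four largest clusters
share_percent = round(100 * top_four / total, 1)

print(total, top_four, share_percent)  # prints: 257 237 92.2
```

The column total matches the 257 proteins analyzed, and clusters 1 to 4 together account for 237 of them.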
Protein clusters and annotation
To identify the 257 putative uncharacterized target proteins in sorghum, we mapped 16 distinct clusters. These clusters were classified according to protein family. The majority of the proteins in the 16 distinct clusters showed high semantic similarity. Most of the putative uncharacterized sorghum proteins were covered by clusters 1, 2, 3 and 4, with 91, 81, 55 and 10 proteins, respectively (Supplementary file; Table 1). The proteins in clusters 1, 2 and 4 showed high semantic similarity (100 %). However, cluster 3 proteins showed 62–100, 84–100 and 85–100 % semantic similarity in molecular function, cellular component and biological process, respectively. Protein interactions have been defined on the basis of co-functional, co-expressed and co-located proteins (Prieto and De Las Rivas 2006). For instance, SUC2-type transporters were found to interact physically with other transporters, viz. SUC3 and SUC4, in Arabidopsis (Wippel and Sauer 2012). The protein kinase SOS2 physically interacted with the calcium-binding protein SOS3, CBL10 and nucleoside diphosphate kinase in Arabidopsis (Quan et al. 2007). The Ca2+- and phospholipid-binding proteins AnnAt1 and AnnAt4 in Arabidopsis interacted with each other (Huh et al. 2010). All of these proteins have high (>80 %) semantic similarity with their corresponding proteins in all
three ontology aspects, MF, BP and CC. Previously identified kinase–protein interactions in rice also showed high (>70 %) semantic similarity (Ding et al. 2009). In the present work, the 16 classified clusters confirmed interaction between highly semantically similar MFS proteins in sorghum. The proteins identified in the 16 distinct clusters facilitated the annotation of MFS function in sorghum. Proteins in clusters 1, 2, 3 and 4 were involved in the transfer of a specific substance or a group of related substances, such as monosaccharides, sucrose and inorganic phosphate, from one side of a membrane to the other. Proteins in cluster 3 were also involved in response to antibiotic stimulus and in hydrogen antiporter activity. The proteins in cluster 5 were involved in the directed movement of proteins between specific compartments, such as the endoplasmic reticulum. Proteins in clusters 8, 10 and 11 enabled the transfer of organic molecules (amino acids), aluminum ions and iron ions, respectively, from one side of a membrane to the other. The proteins in cluster 13 facilitated the transport of ions and cations through transmembrane channels, such as the transfer of potassium ions by a voltage-gated channel. Cluster 15 comprised carbohydrate (sugar) transporters, which play an important role in abiotic stress tolerance. The proteins in clusters 7, 9, 14 and 16 play an important role in protein phosphorylation and binding processes. For the cellular component, most proteins in different
clusters were annotated as "integral to membrane" and "plasma membrane". As a result, almost all target proteins in the distinct clusters were involved in transport activity across the cell membrane.
Protein cluster linkage
In this study, a large amount of data was analyzed for the functional identification of uncharacterized proteins in sorghum. To classify the orthologous proteins in the 18 families, a data set of GO terms was built. The algorithm confirmed interaction between highly semantically similar proteins of distinct clusters in sorghum. We also noted associations between the 16 distinct clusters through their highly semantically similar (80–100 %) GO terms. Figure 1 shows the association of highly semantically similar proteins in the 16 distinct clusters, which was very useful for annotating protein function in sorghum. Among these clusters we noted several highly semantically similar GO terms that defined the associations between clusters. The terms with high (100 %) semantic similarity were, for cellular component: [integral to membrane (GO: 0016021)] in clusters 1, 2, 3, 5, 6, 8, 11, 12, 13, 15 and 16; [membrane (GO: 0016020)] in clusters 2, 3 and 7; [plasma membrane (GO: 0005886)] in clusters 1, 2, 3, 4, 6, 11 and 15; [vacuole
(GO: 0005773)] in clusters 1, 3, 4 and 6; [plant-type vacuole membrane (GO: 0009705)] in clusters 1, 3 and 11; for molecular function: [transporter activity (GO: 0005215)] in clusters 1, 2 and 3; and for biological process: [transmembrane transport (GO: 0055085)] in clusters 1 and 3; [ATP binding (GO: 0005524)] in clusters 7, 14 and 16; [response to wounding (GO: 0009611)] in clusters 3 and 14 (Fig. 1). Consequently, these terms were associated with important salt stress adaptation mechanisms, such as regulation of pumps and channels, membrane structure, transporter activity, ion homeostasis and protein binding.
Functional annotation by COG
The functional categories of the putative target proteins in sorghum were identified with the COGnitor tool. The proteins in clusters 1, 3, 4, 11 and 16 belong to the functional category [(GEPR) carbohydrate, amino acid and inorganic ion transport and metabolism, and general function prediction]. The proteins in clusters 2, 8 and 12 belong to the functional category [(E) amino acid transport and metabolism]. Proteins in cluster 7 belong to the functional category [(Q) secondary metabolites biosynthesis, transport and catabolism]. Proteins in clusters 13 and 14 belong to the category [(T) signal transduction mechanisms]. For clusters 5, 6, 9, 10 and 15
Fig. 1 High semantic similar proteins consisting of 16 distinct clusters of 18 MFS families in sorghum (red). The association between clusters is illustrated using high semantic similar (100 %) GO terms in pink color and other (80–100 %) semantic similar terms in blue color (color figure online)
no COG records were found (Table 1). We noted that highly semantically similar proteins share similar biological processes, cellular components and molecular functions. Thus, the proteins in the classified clusters were identified on the basis of highly semantically similar orthologous proteins of the same family. Consequently, the functional categories of the proteins in distinct clusters were involved in signal, solute, cation and ion transporter mechanisms. These identified proteins play an important role in salt adaptation mechanisms.
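The COG assignments reported above can be collected into a small lookup, which also makes it easy to confirm that each of the 16 clusters is listed exactly once. The Python sketch below uses the cluster numbers and category letters from the text; the dictionary layout and the "none" label for clusters without a COG record are illustrative assumptions.

```python
# COG functional category -> clusters, as reported in the text.
COG_BY_CATEGORY = {
    "GEPR": [1, 3, 4, 11, 16],   # transport and metabolism, general function prediction
    "E": [2, 8, 12],             # amino acid transport and metabolism
    "Q": [7],                    # secondary metabolites biosynthesis, transport, catabolism
    "T": [13, 14],               # signal transduction mechanisms
    "none": [5, 6, 9, 10, 15],   # no COG record found
}


def invert(by_category):
    """Build a cluster -> category map, rejecting clusters listed twice."""
    cluster_to_category = {}
    for category, clusters in by_category.items():
        for cluster in clusters:
            if cluster in cluster_to_category:
                raise ValueError(f"cluster {cluster} listed more than once")
            cluster_to_category[cluster] = category
    return cluster_to_category


CLUSTER_CATEGORY = invert(COG_BY_CATEGORY)
with_cog = sum(1 for category in CLUSTER_CATEGORY.values() if category != "none")
```

Under this grouping, 11 of the 16 clusters have a COG category and 5 have none.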
Discussion
Whole genome sequences of several organisms are available, but functional information on their proteins is rarely reported. Genome sequencing of S. bicolor has been recently completed (Paterson et al. 2009), and its newly identified proteins are still described as putative uncharacterized (Gramene release 34b). Orthologue detection is a powerful approach for finding proteins that share similar functional properties across different organisms (Towfic et al. 2010). Our study has also validated that orthologous proteins share similar functional properties. In this study, we identified proteins in sorghum based on their highly semantically similar orthologous proteins. A total of 2,568 highly semantically similar (100 %) orthologous proteins were identified for 18 MFS families (Table 1). Proteins with more similar functions may have a greater probability of interacting, because their genes are consistently co-regulated across distantly related organisms (Teichmann and Babu 2002). Here, the 16 classified clusters of highly semantically similar proteins in sorghum showed interaction (Fig. 1). The results revealed that the classified proteins in distinct clusters belonged to the functional categories of regulation of pumps and channels, membrane structure, transporter activity, ion homeostasis and binding. These functions appear to constitute a distinct mechanism of salt-stress adaptation in plants.
Identified proteins involved in salt tolerance
Sorghum is highly tolerant to drought, salt and heat stress, and the expression of MFS protein families specifically has been suggested as one of the probable reasons for its adaptation to salt stress (Swami et al. 2011). Salt stress acts through osmotic effects, nutritional imbalance and ionic effects, but one of the major causes of salinity is a high-NaCl environment. High NaCl leads to an imbalance in the homeostasis of ions such as sodium (Na+) and chloride (Cl-) (Tavakkoli et al. 2011).
The toxicity of Na+ and Cl- ions affects major metabolic processes of plants, such as protein synthesis and lipid metabolism (Lin and Wu 1996). Major causes of
salinity are hyperionic and hyperosmotic stresses, which increase reactive oxygen species (ROS) levels and metabolic toxicity in plants (Mittler 2002). Fortunately, plants have developed a variety of salt stress defense systems involving several plasma membrane proteins, ion transporters and sodium-sensitive enzymes. Toxicity due to several ions (e.g., aluminum, iron, sodium) affects plant growth and development. In this study, the identified proteins in clusters 10 and 11 act as transporters that manage aluminum and iron ions. It has previously been shown that organic acids are involved in the mechanisms of aluminum tolerance in higher plants (Ma et al. 2001). NaCl, however, decreased nitrogen, potassium and calcium in shoot tissue and caused accumulation of sodium, phosphorus, iron and zinc in root tissue (Turan et al. 2010). Aluminum (Al) toxicity has been recognized as the major cause of inhibited root growth in plants (Delhaize et al. 1995). Several genes associated with Al tolerance have been reported in different plants, such as ZmASL, ALMT2 and SAHH in maize and TaMDR1 in wheat (Krill et al. 2010; Sasaki et al. 2002). The ALMT and MATE families, which enhance Al resistance in plants, have also been identified (Ryan et al. 2011). The MFS is a major class of antiporters, uniporters and symporters (Marger and Saier 1993). These functional classes play a major role in ion homeostasis, such as the uptake of highly toxic ions (Na+, Cl-) by phosphate, potassium and hydrogen ions in the cell under saline conditions (Tester and Davenport 2003). The identified proteins in cluster 3 were involved in hydrogen antiporter activity. Na+/H+ antiporters play a major role in the efflux of Na+ from the cell, particularly in salt and osmotic stress adaptation mechanisms. Vacuolar Na+/H+ antiporters play a crucial role in plant salt tolerance. The vacuolar Na+/H+ antiporter gene TaNHX2 was obtained from wheat (Yu et al. 2007).
The salt overly sensitive (SOS) pathway is reported to be responsible for Na+ homeostasis in plants. Na+/H+ antiporters are involved in the pathways of ion homeostasis and oxidative-stress detoxification in salt tolerance (Katiyar-Agarwal et al. 2006). Rice, like other plants, contains a number of sucrose transporter genes. Here, the identified proteins in clusters 1, 2, 3 and 15 were involved in carbohydrate, sucrose and phosphate transporter activity. The role of sucrose transporter genes has been examined under drought and salinity stress conditions. Sucrose transporter genes have been isolated from many different plants and their physiological roles have been studied (Sivitz et al. 2007). The previously reported gene OsSUT2 facilitates the transport of sucrose and is up-regulated during drought and salinity (Ibraheem et al. 2011). The SLC17 family of transporters was initially characterized as phosphate carriers and mediates the transport of organic anions (Reimer and Edwards 2004).
The proteins in clusters 1, 3 and 4 were involved in transmembrane transport activity. In theory, excess sodium (Na+) ions need to be effluxed by transmembrane transport proteins (ATPases). K+/Na+-ATPase proteins mediate the influx and efflux of ions through various pumps and channels (Silva and Geros 2009). In most crop plants, Na+ is the primary cause of ion toxicity; however, a high concentration of chloride ions (Cl-) in the cytosol is also harmful at high salinity. The CLC families have been detected among several ion transporters and voltage-gated anion channels of various plant species (Ward et al. 2009). CLCa has been reported to act as an antiporter that drives vacuolar nitrate accumulation in A. thaliana (Hechenberger et al. 1996). In rice, OsCLCCc functioned in the regulation of anion and cation homeostasis and osmotic adjustment at high salinity (Diedhiou and Golldack 2006). Here, the identified proteins in cluster 13 are involved in the transfer of ions and cations by voltage-gated channels. The MFS proteins identified in this study play a role in plant development and salt stress adaptation in sorghum.
Conclusions
In this study, we performed functional annotation of 257 putative uncharacterized proteins in sorghum using the semantic similarity of GO terms. We observed that the identified proteins were associated with regulation of pumps and channels, membrane structure, transporter activity, ion homeostasis, transporter mechanisms and ion binding. These functions appear to constitute a distinct mechanism of salt-stress adaptation in sorghum. The findings of this study will help to identify further proteins that may aid the development of agronomically and economically important plants.
Acknowledgments The authors gratefully acknowledge liberal use of the facilities of the DBT supported Centre of Bioinformatics at Banasthali University, India.
References Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402 Bader GD, Betel D, Hogue CW (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 31:248–250 Bairoch A, Apweiler R, Wu CH, Winona C, Barker WC, Boeckmann B, Ferro S, Gasteiger E et al (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res 33:D154–D159 Banks JA, Nishiyama T, Hasebe M, Bowman JL, Gribskov M, dePamphilis C, Albert VA, Aono N et al (2011) The Selaginella
111 genome identifies genetic changes associated with the evolution of vascular plants. Science 332:960–963 Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M et al (2004) The Pfam protein families database. Nucleic Acids Res 32:D138–D141 Bodt SD, Proost S, Vandepoele K, Rouze P, de-Peer YV (2009) Predicting protein–protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression. BMC Genomics 10:288–302 Boekhorst J, Snel B (2007) Identification of homologs in insignificant blast hits by exploiting extrinsic gene properties. BMC Bioinformatics 8:356–362 Chen R, Jeong SS (2000) Functional prediction: identification of protein orthologs and paralogs. Prot Sci 9:2344–2353 Cokus S, Mizutani S, Pellegrini M (2007) An improved method for identifying functionally linked proteins using phylogenetic profiles. BMC Bioinformatics 8:S7–S18 Delhaize E, Ryan PR (1995) Aluminum toxicity and tolerance in plants. Plant Physiol 107(31):5–21 Dewey CN (2011) Positional orthology: putting genomic evolutionary relationships into context. Brief Bioinform 12:401–412 Diedhiou CJ, Golldack D (2006) Salt-dependent regulation of chloride channel transcripts in rice. Plant Sci 170:793–800 Ding X, Richter T, Chen M, Fujii H, Seo YS, Xie M, Zheng X, Kanrar S et al (2009) A rice kinase–protein interaction map. Plant Physiol 149:1478–1492 Dong Q, Schlueter SD, Brende V (2004) PlantGDB, plant genome database and analysis tools. Nucleic Acids Res 32:D354–D359 Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92–100 Goh CS, Bogan AA, Joachimiak M, Walther D, Cohen FE (2000) Co-evolution of proteins with their interaction partners. 
J Mol Biol 299:283–293 Gomez A, Cedano J, Amela I, Planas P, Pinol J, Querol E (2011) Gene ontology function prediction in mollicutes using protein– protein association networks. BMC Syst Biol 5:49–59 Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S et al (2004) The gene ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258– D261 Hechenberger M, Schwappach B, Fischer WN, Frommer WB, Jentsch TJ, Steinmeyer K (1996) A family of putative chloride channels from Arabidopsis and functional complementation of a yeast strain with a CLC gene disruption. J Biol Chem 271:33632– 33638 Huala E, Dickerman AW, Garcia-Hernandez M, Weems D, Reiser L, LaFond F, Hanley D, Kiphart D et al (2001) The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res 29:102–105 Huang S, Li R, Zhang Z, Li L, Gu X, Fan W, Lucas WJ, Wang X et al (2009) The genome of the cucumber, Cucumis sativus L. Nature Genet 41:1275–1281 Huh SM, Noh EK, Kim HG, Jeon BW, Bae K, Hu HC, Kwak JM, Park OK (2010) Arabidopsis annexins AnnAt1 and AnnAt4 interact with each other and regulate drought and salt stress responses. Plant Cell Physiol 51:1499–1514 Ibraheem O, Dealtry G, Roux S, Bradley G (2011) The effect of drought and salinity on the expressional levels of sucrose transporters in rice (Oryza sativa Nipponbare) cultivar plants. Plant OMICS. J Plant Mol Biol Omics 4:68–74 Jain S, Bader GD (2010) An improved method for scoring protein– protein interactions using semantic similarity within the gene ontology. BMC Bioinformatics 11:562–575
Jaiswal P (2011) Gramene database: a hub for comparative plant genomics. Methods Mol Biol 678:247–275
Jiang JJ, Conrath DW (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of ROCLING X, Taiwan, pp 19–33
Katiyar-Agarwal S, Zhu J, Kim K, Agarwal M, Fu X, Huang A, Zhu JK (2006) The plasma membrane Na+/H+ antiporter SOS1 interacts with RCD1 and functions in oxidative stress tolerance in Arabidopsis. Proc Natl Acad Sci USA 103:18816–18821
Kerrien S, Faruque YA, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M et al (2007) IntAct-open source resource for molecular interaction data. Nucleic Acids Res 35:D561–D565
Krill AM, Kirst M, Kochian LV, Buckler ES, Hoekenga OA (2010) Association and linkage analysis of aluminum tolerance genes in maize. PLoS ONE 5:e9958–e9968
Lee BJ, Shin MS, Oh YJ, Oh HS, Ryu KH (2009) Identification of protein functions using a machine-learning approach based on sequence-derived properties. Proteome Sci 7:27–45
Lin H, Wu L (1996) Effects of salt stress on root plasma membrane characteristics of salt-tolerant and salt-sensitive buffalo grass clones. Environ Exper Bot 36:239–247
Ma JF, Ryan PR, Delhaize E (2001) Aluminium tolerance in plants and the complexing role of organic acids. Trends Plant Sci 6:273–278
Marger MD, Saier MH Jr (1993) A major superfamily of transmembrane facilitators that catalyse uniport, symport and antiport. Trends Biochem Sci 18:13–20
McGinnis S, Madden TL (2004) BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res 32:W20–W25
Mittler R (2002) Oxidative stress, antioxidants and stress tolerance. Trends Plant Sci 7:405–410
Mutegi E, Fabrice S, Moses M, Ben K, Bernard R, Caroline M, Marangu C, Kamau J et al (2010) Ecogeographical distribution of wild, weedy and cultivated Sorghum bicolor (L.) Moench in Kenya: implications for conservation and crop-to-wild gene flow. Genet Resour Crop Evol 57:243–253
Pao SS, Paulsen IT, Saier MH Jr (1998) Major facilitator superfamily. Microbiol Mol Biol Rev 62:1–34
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U et al (2009) The Sorghum bicolor genome and the diversification of grasses. Nature 457:551–556
Peri S, Navarro JD, Kristiansen TZ, Amanchy R, Surendranath V, Muthusamy B, Gandhi TK, Chandrika KN et al (2004) Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res 32:D497–D501
Prieto C, De Las Rivas J (2006) APID: agile protein interaction data analyzer. Nucleic Acids Res 34:W298–W302
Quan R, Lin H, Mendoza I, Zhang Y, Cao W, Yang Y, Shang M, Chen S et al (2007) SCABP8/CBL10, a putative calcium sensor, interacts with the protein kinase SOS2 to protect Arabidopsis shoots from salt stress. Plant Cell 19:1415–1431
Raman K (2010) Construction and analysis of protein–protein interaction networks. Autom Exp 2:2–12
Reimer RJ, Edwards RH (2004) Organic anion transport is the primary function of the SLC17/type I phosphate transporter family. Pflugers Arch 447:629–635
Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud PF et al (2008) The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319:64–69
Resnik P (1999) Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J Artif Intell Res 11:95–130
Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, Tatusov RL et al (2002) Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res 30:2212–2223
Ryan PR, Tyerman SD, Sasaki T, Furuichi T, Yamamoto Y, Zhang WH et al (2011) The identification of aluminium-resistance genes provides opportunities for enhancing crop production on acid soils. J Exp Bot 62:9–20
Sasaki T, Ezaki B, Matsumoto H (2002) A gene encoding multidrug resistance (MDR)-like protein is induced by aluminum and inhibitors of calcium flux in wheat. Plant Cell Physiol 43:177–185
Sekhwal MK, Swami AK, Sarin R, Sharma V (2012) Identification of salt treated proteins in sorghum using gene ontology linkage. Physiol Mol Biol Plants 18:209–216
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B et al (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13:2498–2504
Sharma V, Sekhwal MK, Swami AK, Sarin R (2012) Identification of drought responsive proteins using gene ontology hierarchy. Bioinformation 8:595–599
Silva P, Geros H (2009) Regulation by salt of vacuolar H+-ATPase and H+-pyrophosphatase activities and Na+/H+ exchange. Plant Signal Behav 4:718–726
Sivitz AB, Reinders A, Johnson ME, Krentz AD, Grof CP, Perroux JM et al (2007) Arabidopsis sucrose transporter AtSUC9. High-affinity transport activity, intragenic control of expression, and early flowering mutant phenotype. Plant Physiol 143:188–198
Sjolander K, Datta RS, Shen Y, Shoffner GM (2011) Ortholog identification in the presence of domain architecture rearrangement. Brief Bioinform 12:413–422
Swami AK, Alam SI, Sengupta N, Sarin R (2011) Differential proteomic analysis of salt stress response in Sorghum bicolor leaves. Environ Exper Bot 71:321–328
Tatusov RL, Galperin MY, Natale DA, Koonin EV (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28:33–36
Tavakkoli E, Fatehi F, Coventry S, Rengasamy P, McDonald GK (2011) Additive effects of Na+ and Cl- ions on barley growth under salinity stress. J Exp Bot 62:2189–2203
Teichmann SA, Babu MM (2002) Conservation of gene co-regulation in prokaryotes and eukaryotes. Trends Biotechnol 20:407–410
Tester M, Davenport R (2003) Na+ tolerance and Na+ transport in higher plants. Ann Bot 91:503–527
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
Towfic F, VanderPlas S, Oliver CA, Couture O, Tuggle CK, Greenlee MHW, Honavar V (2010) Detection of gene orthology from gene co-expression and protein interaction networks. BMC Bioinformatics 11:S7–S16
Turan MA, Elkarim AHA, Taban N, Taban S (2010) Effect of salt stress on growth and ion distribution and accumulation in shoot and root of maize plant. Afr J Agricultural Res 5:584–588
Turanalp ME, Can T (2008) Discovering functional interaction patterns in protein–protein interaction networks. BMC Bioinformatics 9:276–293
Tuskan GA, DiFazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S et al (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596–1604
Ulitsky I, Shamir R (2009) Identifying functional modules using expression profiles and confidence-scored protein interactions. Bioinformatics 25:1158–1164
Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF (2007) A new method to measure the semantic similarity of GO terms. Bioinformatics 23:1274–1281
Ward JM, Maser P, Schroeder JI (2009) Plant ion channels: gene families, physiology, and functional genomics analyses. Annu Rev Physiol 71:59–82
Wheeler DL, Smith-White B, Chetvernin V, Resenchuk S, Dombrowski SM, Pechous SW, Tatusova T, Ostell J (2005) Plant genome resources at the national center for biotechnology information. Plant Physiol 138:1280–1288
Wippel K, Sauer N (2012) Arabidopsis SUC1 loads the phloem in suc2 mutants when expressed from the SUC2 promoter. J Exp Bot 63:669–679
Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D (2000) DIP: the database of interacting proteins. Nucleic Acids Res 28:289–291
Yin Z, Li C, Han X, Shen F (2008) Identification of conserved microRNAs and their target genes in tomato (Lycopersicon esculentum). Gene 414:60–66
Yu JN, Huang J, Wang ZN, Zhang JS, Chen SY (2007) An Na+/H+ antiporter gene from wheat plays an important role in stress tolerance. J Biosci 32:1153–1161
Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S (2010) GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26:976–978
Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G (2002) MINT: a Molecular INTeraction database. FEBS Lett 513:135–140