Methods in Ecology and Evolution 2014, 5, 491–494

doi: 10.1111/2041-210X.12170

APPLICATION

MLST@SNaP: user-friendly software for simplification of multilocus sequence typing and dissemination of microbial population analyses

Inês Soares1,2,3 and Ricardo Araujo1,2*

1IPATIMUP, Institute of Molecular Pathology and Immunology of the University of Porto, Rua Dr. Roberto Frias s/n, Porto, 4200-465, Portugal; 2Faculty of Sciences, University of Porto, Rua do Campo Alegre s/n, Porto, 4169-007, Portugal; and 3Center of Mathematics of the University of Porto, Rua do Campo Alegre s/n, Porto, 4169-007, Portugal

Summary

1. Microbial multilocus sequence typing is a coarse-grained approach for classifying micro-organisms that involves intense gene sequencing, limiting its application in clinical and research laboratories. Multiplexes targeting single-nucleotide polymorphisms might replace this impractical method by providing the SNaP profile. We here describe MLST@SNAP, a fast, user-friendly and automated process to convert thousands of MLST sequences or sequence types into SNaP profiles and vice versa. The software greatly speeds up microbial population data analyses and is presently designed for the micro-organisms Aspergillus fumigatus, Burkholderia cenocepacia, Burkholderia cepacia, Burkholderia contaminans and Pseudomonas aeruginosa.

2. MLST@SNAP is freely available for MACINTOSH, UNIX and WINDOWS systems as supplementary material of this manuscript, at https://www.mediafire.com/folder/kc4fwabtb8lu1/setupMLST%40SNaP and http://www.portugene.com/MLSTSNaP.html, accompanied by a tutorial, example files and some data.

Key-words: epidemiology, genetic diversity, MLST data base, molecular diagnosis, SNP

*Correspondence author. E-mail: [email protected]

© 2014 The Authors. Methods in Ecology and Evolution © 2014 British Ecological Society

Introduction

The current gold standard method for microbial genotyping is multilocus sequence typing (MLST). The strategy is useful in studies of microbial population structure, identification of infection sources and outbreaks, definition of endemic populations and monitoring of microbial evolution. It was first employed in 1998 and was successfully extended over the following years to dozens of prokaryotic and eukaryotic micro-organisms (Maiden 2006). MLST involves nucleotide comparisons of a set of housekeeping genes and the definition of a sequence type (ST) for each new profile. The method therefore ensures excellent reproducibility, high discriminatory power and unambiguous genomic analysis (Curran et al. 2004; Bain et al. 2007; Unemo & Dillon 2011), and it provides information through international data bases freely available online (http://www.mlst.net/ and http://pubmlst.org) (Jolley, Chan & Maiden 2004).

Although MLST is a standard method, it remains labour-intensive, cumbersome and impractical for large-scale studies with hundreds of isolates because of its high cost. Methods targeting single-nucleotide polymorphisms (SNPs) have recently been used for detection of specific lineages and microbial identification (Aydin et al. 2001; Dalmasso, Civera & Bottero 2009; Lomonaco et al. 2011). Such mini-sequencing approaches use single-base extension primers in the presence of fluorescent-labelled dideoxynucleoside triphosphates (ddNTPs), recognizing a set of markers that represent site-specific genomic variation. Standardized methods with several SNPs were recently developed for identification and genotyping of Aspergillus fumigatus (Caramalho et al. 2013), Burkholderia cenocepacia (Eusebio et al. 2013a), Burkholderia cepacia and Burkholderia contaminans (manuscript in preparation) and Pseudomonas aeruginosa (Eusebio et al. 2013b); the strategy will soon be extended to other micro-organisms. These assays target the most relevant polymorphisms of the MLST scheme and represent a practical and cheap alternative to the gold standard method. The genomic positions were selected for their ability to differentiate ST or MLST profiles and for practical considerations such as primer design. Non-polymorphic positions were eliminated, as were redundant polymorphic positions. These redundant positions can be used in the future to replace any panel position that is dropped following MLST sequence revision and/or ST redefinition. The final result is a SNaP profile that facilitates and simplifies genomic analyses: an ordered list of the polymorphic sites and their nucleotides.

The aim of the present study was to develop freely available software that complements these recent multiplex assays used for detection, identification and genotyping of microbial pathogenic species. Such a tool avoids extensive MLST sequence analysis, alignment tools and exhaustive examination of thousands of profiles available in online data bases. The software MLST@SNAP is an automated conversion tool that simplifies routine profiling in laboratories; it also facilitates microbial population analyses and encourages the dissemination of microbial genotyping strategies world-wide.

Design and implementation

PROGRAM FEATURES

Each ST is defined by sequencing seven standard gene fragments that characterize the microbial isolate. The most relevant polymorphic positions of the MLST sequences were previously selected for each micro-organism, and a SNaP profile is obtained after single amplification and mini-sequencing reactions (Caramalho et al. 2013; Eusebio et al. 2013a,b). The program lets the user choose the micro-organism and then select one of two main functions (see Figs 1 and 2 for the software overview, diagrams and algorithm details).

Conversion of MLST profiles into SNaP profiles

To convert a specific isolate into its corresponding SNaP profile, the user can manually insert the ST profile, upload a text file with a list of ST profiles or upload a text file with complete MLST sequencing data. This tool rapidly converts thousands of ST or MLST sequences into the corresponding SNaP profiles without manual manipulation of the data. The software locates the positions of the SNaP profile in each gene and provides the final code that corresponds to the genotyping data. The software detects sequences entered outside the standard format and alerts the user to check the data; it reports which sequences are not formatted according to the MLST standard rules. Furthermore, only sequences with the correct length and orientation are accepted by the software and yield the final SNaP profile. Additionally, the software informs the user whether each sequence is new or was previously submitted (for previously submitted sequences, the ST number is provided with the SNaP profile).

Conversion of SNaP profiles into MLST profiles

This function converts SNaP profiles into complete MLST sequences. The software connects to the MLST data base and complements the information of the SNaP profile with the remaining non-polymorphic and complementary sequences available at the data base. Users can manually insert the genomic information of the polymorphic positions (SNaP profile) for a specific isolate or upload a text file containing a list of SNaP profiles. The software can therefore automatically convert hundreds of SNaP profiles into the corresponding MLST sequences. The final result is a list of sequences within the MLST standards that remains available for data analysis.

SEQUENCES

Our approach was tested with personal data and with additional data of A. fumigatus, B. cenocepacia, B. cepacia, B. contaminans and P. aeruginosa deposited in online MLST data bases (http://www.mlst.net/). The software can be tested with the genomic data of other micro-organisms as soon as specific SNaP panels are designed based on MLST information. The present software encourages the submission of data, particularly new sequences, to the MLST platform. It consults the MLST data bases before every analysis.

SOFTWARE

The algorithm was written in Python, version 2.5.2, and tested on a MACBOOK PRO with MAC OS X 10.9, on LINUX (UBUNTU 12.10) and on WINDOWS 7 (32-bit and 64-bit) systems. The interface was created using VISUALWX, a tool prepared to interact with the Python programming language, and was designed to run on MACINTOSH, UNIX and WINDOWS 7 systems.

[Figure 1: flowchart in which the automated conversion tool MLST@SNaP links the standard microbial genotyping tool, multilocus sequence typing (MLST; sequence types (ST) or MLST profiles; MLST data bases at http://www.mlst.net and http://pubmlst.org), to the SNaP panels (SNaPAfu, SNaPBcen, SNaPBceBcon, SNaPaer; SNaP profiles). One connection is labelled "(Expected to be available in a near future)".]

Fig. 1. Flowchart of MLST@SNAP software interaction among genotyping panels; the software allows communication between MLST and the new SNP-based strategies.

Results

PROGRAM PERFORMANCE

Our software readily converted ST or MLST sequences into SNaP profiles, and vice versa, and the data were suitable for large-scale microbial population analyses. The software was tested with the genomic data of several hundred microbial isolates and, after sequence alignments, proved to be an excellent tool to expedite the analyses by simplifying, automating and greatly improving a previously manual and time-consuming process.
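One feature described under "Conversion of MLST profiles into SNaP profiles", reporting whether a submitted sequence is new or previously submitted (and, if so, its ST number), can be pictured as a simple lookup. The following is only a minimal sketch and not the MLST@SNAP source code: the table `KNOWN_STS`, its ST numbers and the function `classify` are hypothetical, and a small in-memory dictionary stands in for the online MLST data base (real MLST schemes define an ST from per-locus allele numbers rather than from concatenated sequences).

```python
# Hypothetical miniature table: concatenated gene sequences -> ST number.
# The ST numbers are invented for illustration only.
KNOWN_STS = {
    "CTCGTCTTCAAT" + "AATATCGTCAC" + "TTCCGAGGAT": 1,
    "CTCGTCTTCAAT" + "AATATTGTCAC" + "ATCCGACGAT": 2,
}


def classify(genes, known=KNOWN_STS):
    """Report whether the gene sequences of one isolate are already known."""
    key = "".join(genes)
    st = known.get(key)
    if st is None:
        # Unknown combination: mirror the software's suggestion to submit it.
        return "new sequence: submission to the MLST data base is suggested"
    return f"previously submitted: ST {st}"


print(classify(("CTCGTCTTCAAT", "AATATTGTCAC", "ATCCGACGAT")))
print(classify(("CTGGTCTTCCAT", "AATATCGTCAC", "TTCCGACGAT")))
```

The first call finds the sequences in the table and reports the (invented) ST; the second is absent from the table and is flagged as new.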


[Figure 2, panels (a) to (c): flowcharts and a worked example; only the text labels survive and are summarized here.]

(a) Convert sequence types (ST) or MLST sequences into SNaP profiles (MLST→SNaP). Input: n ST (n ≥ 1; manually or .txt file) or n MLST sequences (n ≥ 1; .txt file). If the data are in standard format, then for i = 1 to n: connection with the MLST data base to identify the complete MLST sequence corresponding to ST i; identification of the positions that define SNaP profile i; extraction and concatenation of the SNaP positions. If there is interest in determining the corresponding available ST for the n MLST sequence(s): connection with the MLST data base to identify the ST corresponding to MLST sequence i (for new data, submission to the MLST data base is suggested). Output: n SNaP profile(s).

(b) Convert SNaP profiles into MLST sequences (MLST←SNaP). Input: n SNaP profile(s) (n ≥ 1; manually or .txt file). If the data are in standard format, then for i = 1 to n: connection to the MLST data base to identify the list of MLST sequences corresponding to SNaP profile i (for new data, submission to the MLST data base is suggested); complementing the information of SNaP profile i with the remaining sequence data. Output: n complete MLST sequence(s).

(c) Three MLST sequences per isolate with five polymorphic positions (a to e), the SNaP profiles obtained from them (abcde positions) and the sequences converted back through connection to the MLST data bases:
Isolate 1: CTCGTCTTCAAT AATATCGTCAC TTCCGAGGAT; SNaP profile CACTG
Isolate 2: CTCGTCTTCAAT AATATTGTCAC ATCCGACGAT; SNaP profile CATAC
Isolate 3: CTGGTCTTCCAT AATATCGTCAC TTCCGACGAT; SNaP profile GCCTC

Fig. 2. Design of the MLST@SNAP algorithm: (a) conversion of MLST sequences into SNaP profiles, (b) conversion of SNaP profiles into MLST sequences and (c) example of MLST data simplification by choosing five polymorphic positions that originate the SNaP profiles and vice versa.
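Both flowcharts in Fig. 2 begin with a "Data in standard format" decision. As a rough illustration of such a check, and not the authors' implementation, the sketch below tests each sequence for an expected length and for characters other than A, C, G and T, and reports which entries fail and why. The function names, the error wording and the example length of 12 are assumptions; the orientation check mentioned in the text is not modelled.

```python
VALID_BASES = set("ACGT")


def check_sequence(seq, expected_length):
    """Return a list of problems found in one gene sequence (empty if none)."""
    problems = []
    cleaned = seq.strip().upper()
    if len(cleaned) != expected_length:
        problems.append(f"length {len(cleaned)}, expected {expected_length}")
    bad = sorted(set(cleaned) - VALID_BASES)
    if bad:
        problems.append("unexpected characters: " + "".join(bad))
    return problems


def check_batch(sequences, expected_length):
    """Map the 1-based index of each faulty sequence to its list of problems."""
    report = {}
    for i, seq in enumerate(sequences, start=1):
        problems = check_sequence(seq, expected_length)
        if problems:
            report[i] = problems
    return report


batch = ["CTCGTCTTCAAT", "CTCGTCTTCAA", "CTNGTCTTCAAT"]
print(check_batch(batch, expected_length=12))
# {2: ['length 11, expected 12'], 3: ['unexpected characters: N']}
```

Returning a report for the whole batch, rather than stopping at the first fault, lets the user correct every faulty entry before restarting the analysis.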

The algorithm starts with a thorough check of the input data to verify that the information was entered correctly. If not, the program aborts and alerts the user with a message specifying the type of error found, which makes it easy to correct the data and restart the analysis. When the data have the correct format, the program completes the requested genomic analyses in less than 10 s. A tutorial for the software was added; it can be accessed from within the software or consulted as supplementary information. The program is connected to the online microbial MLST data bases. It is therefore always up to date with the latest and complete collection of microbial information, and it takes into account the complete set of polymorphisms and genetic variation described world-wide. As a result, our software can check whether the inserted information is already available in the online data base or whether the user possesses new data that should be added to the international data base (in which case the software suggests its submission).

RESULTS FROM THE TRAINING SETS AND VALIDATION

Figure 2C illustrates in a simple manner how the software converts MLST sequences into SNaP profiles and vice versa; it shows five polymorphic positions selected from three MLST genes for three isolates.
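The example of Fig. 2C can be reproduced with a few lines of code. The sketch below is illustrative only: the sequences and the resulting profiles are those printed in the figure, whereas the function names are hypothetical and the panel is derived here by simply taking every column that differs among the three isolates. In the published assays the panel positions were selected beforehand, with redundant positions removed.

```python
# MLST sequences of the three isolates in Fig. 2C (one string per gene).
ISOLATES = {
    "Isolate 1": ["CTCGTCTTCAAT", "AATATCGTCAC", "TTCCGAGGAT"],
    "Isolate 2": ["CTCGTCTTCAAT", "AATATTGTCAC", "ATCCGACGAT"],
    "Isolate 3": ["CTGGTCTTCCAT", "AATATCGTCAC", "TTCCGACGAT"],
}


def polymorphic_positions(isolates):
    """Return (gene index, 0-based position) pairs where the isolates differ."""
    positions = []
    # zip(*...) groups the sequences gene by gene across isolates.
    for g, seqs in enumerate(zip(*isolates.values())):
        # zip(*seqs) walks the aligned sequences column by column.
        for p, column in enumerate(zip(*seqs)):
            if len(set(column)) > 1:
                positions.append((g, p))
    return positions


def snap_profile(genes, positions):
    """Extract and concatenate the nucleotides at the panel positions."""
    return "".join(genes[g][p] for g, p in positions)


PANEL = polymorphic_positions(ISOLATES)
PROFILES = {name: snap_profile(genes, PANEL) for name, genes in ISOLATES.items()}
print(PANEL)     # [(0, 2), (0, 9), (1, 5), (2, 0), (2, 6)]
print(PROFILES)  # {'Isolate 1': 'CACTG', 'Isolate 2': 'CATAC', 'Isolate 3': 'GCCTC'}
```

The five positions found correspond to positions a to e of the figure, and the three profiles match those shown there.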

The polymorphic positions for A. fumigatus, B. cenocepacia, B. cepacia, B. contaminans and P. aeruginosa had been previously selected based on their genomic value and their methodological and technical suitability. The positions a to e shown in Fig. 2C yielded the SNaP profiles of the three isolates; each SNaP profile could be converted back into MLST sequences with the MLST@SNAP software, as described above in the 'Design and implementation' section. This strategy of converting MLST sequences into SNaP profiles and vice versa was extended to several hundred genomic sequences of A. fumigatus, B. cenocepacia, B. cepacia, B. contaminans and P. aeruginosa. The original genomic data were aligned with the sequences obtained from the MLST@SNAP software using the Geneious software v4.7 (Biomatters Ltd, Auckland, New Zealand) and the BioEdit sequence alignment editor (available at http://www.ctu.edu.vn/~dvxe/Bioinformatic/Software/BioEdit.htm). The original sequence data were obtained from MLST data bases and from personal data bases containing the genomic data obtained when the SNP panels were previously developed in the laboratory (Caramalho et al. 2013; Eusebio et al. 2013a,b); more than 2700 MLST sequences were compared with the results provided by the MLST@SNAP software. The output showed total agreement (100% for all tested organisms) with the manually processed data, that is, a specific ST or complete MLST sequence was correctly associated with its SNaP profile and vice versa.
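The reverse conversion and the agreement check described above can be sketched in the same spirit. This is an assumption-laden toy and not the validation pipeline used in the study: `DATABASE` is a hypothetical local stand-in for the online MLST data base that holds only the three isolates of Fig. 2C, `PANEL` lists the five positions of that figure as 0-based (gene index, position) pairs, and all function names are invented for illustration.

```python
# Five panel positions of Fig. 2C as (gene index, 0-based position) pairs.
PANEL = [(0, 2), (0, 9), (1, 5), (2, 0), (2, 6)]

# Hypothetical local stand-in for the online MLST data base.
DATABASE = [
    ("CTCGTCTTCAAT", "AATATCGTCAC", "TTCCGAGGAT"),
    ("CTCGTCTTCAAT", "AATATTGTCAC", "ATCCGACGAT"),
    ("CTGGTCTTCCAT", "AATATCGTCAC", "TTCCGACGAT"),
]


def snap_profile(genes, panel=PANEL):
    """MLST -> SNaP: concatenate the nucleotides at the panel positions."""
    return "".join(genes[g][p] for g, p in panel)


def build_index(database, panel=PANEL):
    """Group every known MLST record under its SNaP profile."""
    index = {}
    for genes in database:
        index.setdefault(snap_profile(genes, panel), []).append(genes)
    return index


def to_mlst(profile, index):
    """SNaP -> MLST: return all known records with this profile (may be empty)."""
    return index.get(profile, [])


INDEX = build_index(DATABASE)
print(to_mlst("CATAC", INDEX))
# [('CTCGTCTTCAAT', 'AATATTGTCAC', 'ATCCGACGAT')]

# Round trip: every record must be recovered from its own SNaP profile.
agreement = all(genes in to_mlst(snap_profile(genes), INDEX) for genes in DATABASE)
print(agreement)  # True
```

Returning a list rather than a single record follows panel (b) of Fig. 2, where the data base lookup yields the list of MLST sequences corresponding to a SNaP profile.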


Discussion

We here describe user-friendly software that allows automated conversion of MLST alleles into SNaP profiles or SNaP profiles into complete MLST sequences (see Supplementary Figure S1). The algorithm was implemented as a program that fully automates the analysis of hundreds of profiles or sequences available at online data bases, avoiding exhaustive examination and alignment practices and simplifying microbial population studies. In its current form, the program can already be applied to A. fumigatus, B. cenocepacia, B. cepacia, B. contaminans and P. aeruginosa. In the near future, we intend to extend it to other bacteria and microbial eukaryotes as soon as the most relevant polymorphisms are described for each micro-organism (similarly to what is presently observed with MLST information).

The selection of the most informative nucleotide positions might simplify genomic characterization of microbial isolates, particularly in complex samples with multiple strains. A simple and informative genotyping profile (SNaP profile) is obtained after a single reaction of multiplex PCR amplification and mini-sequencing; non-polymorphic, redundant and low polymorphic positions are discarded. The SNaPAfu (targets A. fumigatus) (Caramalho et al. 2013), SNaPBcen (targets B. cenocepacia) (Eusebio et al. 2013a), SNaPBceBcon (targets B. cepacia and B. contaminans) (manuscript in preparation) and SNaPaer (targets P. aeruginosa) (Eusebio et al. 2013b) assays represent practical, reproducible, sensitive and six to seven times cheaper alternatives to MLST that improve the diagnosis and surveillance of the pathogens A. fumigatus, B. cenocepacia, B. cepacia, B. contaminans and P. aeruginosa.

The software described here is a new tool for the microbial research community, allowing automated analyses of hundreds of genotypes without manual processing. We have designed a graphic interface for microbial organisms to assist extensive genomic and population analyses; these tasks now require reduced running times. The authors intend that MLST@SNAP be available at MLST data bases (Jolley, Chan & Maiden 2004) and become part of the laboratory routine of researchers and technicians, supporting the dissemination of microbial genotyping in laboratories world-wide. The software, manual, algorithm details, tutorial and example files are presently freely available at https://www.mediafire.com/folder/kc4fwabtb8lu1/setupMLST%40SNaP, as supplementary material of this manuscript and at http://www.portugene.com/MLSTSNaP.html.

Data archiving

All genomic data can be accessed at international data bases (http://www.mlst.net/ and http://pubmlst.org). MLST@SNAP is freely available for MACINTOSH, UNIX and WINDOWS systems at https://www.mediafire.com/folder/kc4fwabtb8lu1/setupMLST%40SNaP and http://www.portugene.com/MLSTSNaP.html, accompanied by a tutorial, example files and some data.

Acknowledgements

The authors thank Gonçalo Oliveira, João Carneiro, Nadia Eusebio, Rita Caramalho and Tiago Pinheiro (University of Porto) for their valuable and critical suggestions on the software. The authors declare no conflict of interest.

Author contributions

I.S. and R.A. designed and performed the research, analysed the data, and wrote the paper.

References

Aydin, A., Baron, H., Bähring, S., Schuster, H. & Luft, F.C. (2001) Efficient and cost-effective single nucleotide polymorphism detection with different fluorescent applications. BioTechniques, 31, 920–926.
Bain, J.M., Tavanti, A., Davidson, A.D., Jacobsen, M.D., Shaw, D., Gow, N.A. & Odds, F.C. (2007) Multilocus sequence typing of the pathogenic fungus Aspergillus fumigatus. Journal of Clinical Microbiology, 45, 1469–1477.
Caramalho, R., Gusmão, L., Lackner, M., Amorim, A. & Araujo, R. (2013) SNaPAfu: a novel single nucleotide polymorphism multiplex assay for Aspergillus fumigatus direct detection, identification and genotyping in clinical specimens. PLoS ONE, 8, e75968.
Curran, B., Jonas, D., Grundmann, H., Pitt, T. & Dowson, C.G. (2004) Development of a multilocus sequence typing scheme for the opportunistic pathogen Pseudomonas aeruginosa. Journal of Clinical Microbiology, 42, 5644–5649.
Dalmasso, A., Civera, T. & Bottero, M.T. (2009) Multiplex primer-extension assay for identification of six pathogenic vibrios. International Journal of Food Microbiology, 129, 21–25.
Eusebio, N., Coutinho, C., Sa-Correia, I. & Araujo, R. (2013a) SNaPBcen: a novel and practical tool for genotyping Burkholderia cenocepacia. Journal of Clinical Microbiology, 51, 2646–2653.
Eusebio, N., Pinheiro, T., Amorim, A.A., Gamboa, F., Saraiva, L., Gusmão, L., Amorim, A. & Araujo, R. (2013b) SNaPaer: a practical single nucleotide polymorphism multiplex assay for genotyping of Pseudomonas aeruginosa. PLoS ONE, 8, e66083.
Jolley, K., Chan, M.-S. & Maiden, M. (2004) mlstdbNet - distributed multi-locus sequence typing (MLST) databases. BMC Bioinformatics, 5, 86.
Lomonaco, S., Knabel, S.J., Dalmasso, A., Civera, T. & Bottero, M.T. (2011) Novel multiplex single nucleotide polymorphism-based method for identifying epidemic clones of Listeria monocytogenes. Applied and Environmental Microbiology, 77, 6290–6294.
Maiden, M.C. (2006) Multilocus sequence typing of bacteria. Annual Review of Microbiology, 60, 561–588.
Unemo, M. & Dillon, J.A. (2011) Review and international recommendation of methods for typing Neisseria gonorrhoeae isolates and their implications for improved knowledge of gonococcal epidemiology, treatment, and biology. Clinical Microbiology Reviews, 24, 447–458.

Received 25 November 2013; accepted 10 February 2014
Handling Editor: Matthew Davey

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Figure S1. Screen dumps of MLST@SNAP software: (a) window for selection of micro-organism, and (b) example window of Aspergillus fumigatus for insertion of ST or MLST data.
