SUPPLEMENTARY MATERIALS AND METHODS
Genome assembly and annotation
Reads were quality filtered and used in a hybrid assembly with MIRA v3.4 (Chevreux et al. 1999), giving an initial assembly (52,968 contigs, 26.5 Mb of assembled reads, N50 of 565 bp and N90 of 295 bp). Cardinium contigs were identified in the initial assembly based on their GC content, coverage profile, BLAST similarities, PhymmBL (Brady and Salzberg 2009) identification and paired-end information, and were then reassembled (366 contigs, 1.3 Mb of assembled reads, N50 of 15,164 bp and N90 of 1,240 bp). SSPACE v2.0 BASIC (Boetzer et al. 2011) and GapFiller v1.9 (Boetzer and Pirovano 2012) were used to scaffold the reassembled contigs and to fill scaffold gaps, respectively. The assembly was manually edited with gap4 (Staden et al. 2000), giving a final assembly of 11 chromosomal contigs (N50 of 612 kb) and a closed circular plasmid contig. Homopolymers were refined with polisher v3.0.8 (Foster et al. 2012). Initial ORF predictions were performed with Prodigal (Hyatt et al. 2010) and uploaded to the annotation servers BASys (Van Domselaar et al. 2005) and RAST (Aziz et al. 2008). The annotations were manually curated using Artemis (Rutherford et al. 2000) and the following tools and databases: tRNAscan-SE (Schattner et al. 2005), Pfam (Punta et al. 2012), UniProt (The UniProt Consortium 2012), InterPro (Quevillon et al. 2005), BLAST and CDD (Marchler-Bauer et al. 2011) and PHAST (Zhou et al. 2011). tRNA genes with anticodon CAT were discriminated according to Silva et al. (2006). Ori-Finder was used to predict the replication origin of the plasmid (Gao and Zhang 2008). Metabolic inferences were made using KEGG (Kanehisa et al. 2012) and KAAS (Moriya et al. 2007). Transmembrane domains were predicted with TMHMM 2.0 (Käll et al. 2007). Signal peptides were detected using the SignalP 4.0 Server (Petersen et al. 2011) with SignalP 3.0 sensitivity selected.
Protein domains (TIGR4131 and TIGR4183) were downloaded and searched against the Cardinium cBtQ1 proteome using HMMER (Eddy 2011).
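The N50 and N90 values reported for the assemblies can be illustrated with a minimal sketch. It assumes the usual definition (the length of the shortest contig in the set of longest contigs that together hold at least x% of the assembled bases); the function name `nx` and the toy contig lengths are hypothetical and are not taken from the Cardinium assembly or from the assemblers' own reports.

```python
def nx(lengths, x=50):
    """Return the Nx statistic for a list of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until
    they contain at least x% of the total assembled bases; the length
    of the contig that crosses that threshold is returned.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= total * x:
            return length


# Toy contig lengths in bp (hypothetical values for illustration only).
contigs = [100, 200, 300, 400, 500]
print("N50:", nx(contigs, 50))
print("N90:", nx(contigs, 90))
```

Assemblers and assembly-statistics tools may differ slightly in how they treat ties or minimum contig length, so values computed this way need not match a given program's output exactly.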
Analyses of B. tabaci and Encarsia spp. samples
Encarsia and B. tabaci samples DNA extraction
Four individuals from each Encarsia species were washed and placed individually in tubes (Table S4). Whiteflies were captured from twelve locations belonging to four municipalities in the Valencia province (Table S5) and placed alive in 100% ethanol until DNA extraction. From each location, four females were washed with deionized water and individually placed in 0.5 ml tubes. Each individual was homogenized with a pipette tip in 30 μl of 5% Chelex (Walsh et al. 1991) in Milli-Q water, then incubated for 20 minutes at 65 °C and 20 minutes at 99 °C. DNA extractions were centrifuged at 5000 rpm for 5 minutes; the Chelex pellet was discarded and the supernatants were stored at -20 °C until use.
B. tabaci biotype detection
Biotype Q detection was done as described by Khasdan et al. (2007) (Table S6). The PCR program for mitochondrial cytochrome oxidase I (COI) fragments consisted of a 5-minute denaturation step (95 °C), 35 amplification cycles (95 °C for 30 seconds, 52 °C for 30 seconds and 72 °C for 1 minute) and a final extension (72 °C for 5 minutes). COI amplicons were digested with VspI according to the manufacturer's instructions and the band patterns were analyzed. Four females from sample B, two females from sample D, two females from sample E and four from sample F were selected for sequencing the whole COI fragment using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) following the manufacturer's instructions.
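The band pattern expected from a VspI digestion can be explored in silico. The sketch below is an illustration only, not the procedure used in this work: it assumes the VspI recognition site ATTAAT with the cut placed after the second base (AT^TAAT), treats the amplicon as a linear molecule, and uses a hypothetical toy sequence rather than a real COI fragment. The function name `digest_fragments` is likewise hypothetical.

```python
def digest_fragments(seq, site="ATTAAT", cut_offset=2):
    """Return fragment lengths after cutting a linear sequence at every site.

    `site` is the recognition sequence and `cut_offset` is the number of
    bases of the site that stay on the upstream fragment (2 for AT^TAAT).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        # Resume one base later so overlapping sites are also found.
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Toy 16 bp sequence with a single site (hypothetical, not a COI amplicon).
toy_amplicon = "GGGGATTAATCCCCCC"
print(digest_fragments(toy_amplicon))
```

Applied to aligned COI amplicon sequences from different biotypes, the resulting fragment lengths could be compared with the bands seen on the gel; the diagnostic band sizes themselves should be taken from Khasdan et al. (2007).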
Phylogenetic analysis
Sequences from COI fragments were quality edited with the Staden Package (pregap and gap4) (Staden et al. 2000) and aligned with MAFFT v6.717b (L-INS-i) (Katoh et al. 2002). GTR+I+G was selected as the best model using jModelTest 2 (Darriba et al. 2012).
The 50 best BLASTX hits for CHV_004, CHV_p02, CHV_p06 and CHV_p011 were downloaded and aligned with MAFFT (L-INS-i). ProtTest 3 (Darriba et al. 2011) was used to select the appropriate model for each alignment. In all cases, Gblocks (Castresana 2000) was used to prune the alignment beforehand. Finally, RAxML (Stamatakis 2006) with 1000 rapid bootstrap replicates, branch length optimization and the appropriate model was used to calculate maximum likelihood trees. PhyloBayes 3 (Lartillot et al. 2009) was used for Bayesian phylogenetic inference (GTR+I+G model) with 3 independent chains (maximum discrepancy between chains smaller than 0.1 and an effective sample size greater than 200). Archaeopteryx software was used to display and edit the phylogenetic trees (Han and Zmasek 2009).
PCR symbiont detection
Symbiont detection for all samples was done with the following PCR profile, adjusting the annealing temperature (XX) for each primer set (Table S6): a 5-minute denaturation step (95 °C), 40 amplification cycles (95 °C for 30 seconds, XX °C for 30 seconds and 72 °C for 1 minute) and a final extension (72 °C for 5 minutes). Large fragments of the Cardinium 16S rRNA gene were amplified from the same four females from samples B and F that were used for COI amplification. The PCR profile was the same as for symbiont detection, but with the annealing temperature adjusted (Table S6). Sequences were checked with the Staden Package (pregap and gap4) and aligned with UGENE (Okonechnikov et al. 2012) against the 16S rRNA from Cardinium cBtQ1 and cEper1.
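The symbiont detection profile, with its primer-specific annealing temperature, can be written down as a small data structure. This is a minimal sketch: the names `Step`, `symbiont_pcr_profile` and `hold_minutes` are hypothetical, the annealing temperatures are the values read from Table S6, and the computed time covers only the programmed holds (ramping between temperatures is ignored).

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


def symbiont_pcr_profile(annealing_c, cycles=40):
    """Build the symbiont detection profile for a given annealing temperature."""
    initial = [Step("initial denaturation", 95, 5 * 60)]
    cycle = [
        Step("denaturation", 95, 30),
        Step("annealing", annealing_c, 30),
        Step("extension", 72, 60),
    ]
    final = [Step("final extension", 72, 5 * 60)]
    return initial + cycle * cycles + final


def hold_minutes(profile):
    """Sum of programmed hold times in minutes (ramp times not included)."""
    return sum(step.seconds for step in profile) / 60


# Annealing temperatures (in degrees Celsius) as read from Table S6.
ANNEALING_C = {
    "Hamiltonella": 58,
    "Arsenophonus": 53,
    "Wolbachia": 53,
    "Rickettsia": 58,
}

for symbiont, temp in ANNEALING_C.items():
    profile = symbiont_pcr_profile(temp)
    print(symbiont, temp, "C annealing,", hold_minutes(profile), "min of holds")
```

Only the annealing step differs between primer sets, so the programmed hold time is the same for every symbiont.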
Gliding genes detection
Primers for each gliding gene and for gyrB were designed based on the Cardinium cBtQ1 sequence (Table S6), the latter as a positive internal control for the presence of Cardinium. A LightCycler 2.0 (Roche) was used with the following PCR profile: a 15-minute denaturation step (95 °C), 40 amplification cycles (95 °C for 10 seconds, 58 °C for 20 seconds and 72 °C for 20 seconds) and a melting curve step (68 °C to 95 °C with a ramp rate of 0.2 °C per second). LightCycler FastStart DNA MasterPLUS SYBR Green I (Roche) mix was used as recommended by the manufacturer, adding 2 μl of each DNA extraction (1 mM MgCl2). For each individual, the five genes were analyzed separately with their respective non-template controls. Melting curves were inspected to detect false-positive amplifications (such as primer-dimer amplifications).
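The melting curve inspection described above was done by eye; the sketch below only illustrates the underlying logic of comparing an observed melting peak with the expected one. The expected melting temperatures are those listed in Table S6, whereas the function name `classify_melt_peak` and the 1.0 °C tolerance are hypothetical choices made for this example.

```python
# Expected melting temperatures (degrees Celsius) as listed in Table S6.
EXPECTED_TM_C = {
    "gldK": 77.81,
    "gldL": 77.21,
    "gldM": 77.48,
    "gldN": 79.08,
    "gyrB": 80.5,
}


def classify_melt_peak(gene, observed_tm_c, tolerance_c=1.0):
    """Flag a melting peak as specific or as a possible nonspecific product.

    The tolerance is an illustrative assumption, not a value from this work.
    """
    expected = EXPECTED_TM_C[gene]
    if abs(observed_tm_c - expected) <= tolerance_c:
        return "specific product"
    return "possible nonspecific product"


# Hypothetical observed peaks: one close to the expected value, one far below it.
print(classify_melt_peak("gldK", 78.0))
print(classify_melt_peak("gldK", 72.0))
```

Primer dimers are short and usually melt at a lower temperature than the intended amplicon, which is why a peak well below the expected value is treated as suspect here.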
Whole-mount fluorescent in situ hybridization (FISH)
B. tabaci nymphs were collected with a waterfloss device in a mesh. The FISH procedure followed Gottlieb et al. (2006). Nymphs were directly transferred into modified Carnoy's fixative (6 chloroform : 3 absolute ethanol : 1 glacial acetic acid) and left overnight. Fixed nymphs were washed with ethanol and transferred to a 6% solution of H2O2 (in ethanol) for at least two hours. Hybridization was performed overnight at room temperature in standard hybridization buffer (20 mM Tris-HCl [pH 8.0], 0.9 M NaCl, 0.01% SDS, 30% formamide), and nymphs were washed (20 mM Tris-HCl [pH 8.0], 5 mM EDTA, 0.1 M NaCl, 0.01% SDS) before slide preparation. Whole nymphs were viewed under an Olympus FV1000 confocal microscope. The Portiera-specific probe BTP1-FAM (TGTCAGTGTCAGCCCAGAAG), the Hamiltonella-specific probe BTH-Cy3 (CCAGATTCCCAGACTTTACTCA) (Gottlieb et al. 2008) and the Cardinium-specific probe Card-Cy5 (TATCAATTGCAGTTCTAGCG) (Matalon et al. 2007) were used. Nymphs treated with RNase, no-probe controls and nymphs without Cardinium (Q biotype with Hamiltonella and Rickettsia) were used as probe specificity controls.
References
Aziz RK et al. 2008. The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 9:75.
Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. 2011. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 27:578–9.
Boetzer M, Pirovano W. 2012. Toward almost closed genomes with GapFiller. Genome Biol. 13:R56.
Brady A, Salzberg SL. 2009. Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods. 6:673–6.
Castresana J. 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 17:540–52.
Chevreux B, Wetter T, Suhai S. 1999. Genome Sequence Assembly Using Trace Signals and Additional Sequence Information. Comput Sci Biol Proc Ger Conf Bioinforma. 99:45–56.
Darriba D, Taboada GL, Doallo R, Posada D. 2011. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 27:1164–5.
Darriba D, Taboada GL, Doallo R, Posada D. 2012. jModelTest 2: more models, new heuristics and parallel computing. Nat Methods. 9:772.
Eddy SR. 2011. Accelerated Profile HMM Searches. PLoS Comput Biol. 7:e1002195.
Foster B et al. 2012. POLISHER: a tool for using ultra short reads in genome sequence improvement. 416091.
Gao F, Zhang C-T. 2008. Ori-Finder: a web-based system for finding oriCs in unannotated bacterial genomes. BMC Bioinformatics. 9:79.
Gottlieb Y et al. 2006. Identification and Localization of a Rickettsia sp. in Bemisia tabaci (Homoptera: Aleyrodidae). Appl Environ Microbiol. 72:3646–52.
Gottlieb Y et al. 2008. Inherited intracellular ecosystem: symbiotic bacteria share bacteriocytes in whiteflies. FASEB J. 22:2591–9.
Han MV, Zmasek CM. 2009. phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics. 10:356.
Hyatt D et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 11:119.
Käll L, Krogh A, Sonnhammer ELL. 2007. Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server. Nucleic Acids Res. 35:W429–32.
Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. 2012. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40:D109–14.
Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30:3059–66.
Khasdan V et al. 2007. DNA markers for identifying biotypes B and Q of Bemisia tabaci (Hemiptera: Aleyrodidae) and studying population dynamics. Bull Entomol Res. 95:605–613.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics. 25:2286–8.
Marchler-Bauer A et al. 2011. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 39:D225–9.
Matalon Y, Katzir N, Gottlieb Y, Portnoy V, Zchori-Fein E. 2007. Cardinium in Plagiomerus diaspidis (Hymenoptera: Encyrtidae). J Invertebr Pathol. 96:106–108.
Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. 2007. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35:W182–5.
Okonechnikov K, Golosova O, Fursov M. 2012. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 28:1166–7.
Petersen TN, Brunak S, von Heijne G, Nielsen H. 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 8:785–6.
Punta M et al. 2012. The Pfam protein families database. Nucleic Acids Res. 40:D290–301.
Quevillon E et al. 2005. InterProScan: protein domains identifier. Nucleic Acids Res. 33:W116–20.
Rutherford K et al. 2000. Artemis: sequence visualization and annotation. Bioinformatics. 16:944–945.
Schattner P, Brooks AN, Lowe TM. 2005. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 33:W686–9.
Silva FJ, Belda E, Talens SE. 2006. Differential annotation of tRNA genes with anticodon CAT in bacterial genomes. Nucleic Acids Res. 34:6015–22.
Staden R, Beal KF, Bonfield JK. 2000. The Staden package, 1998. Methods Mol Biol. 132:115–130.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 22:2688–90.
The UniProt Consortium. 2012. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 40:D71–5.
Van Domselaar GH et al. 2005. BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res. 33:W455–9.
Walsh PS, Metzger DA, Higuchi R. 1991. Chelex 100 as a medium for simple extraction of DNA for PCR-based typing from forensic material. Biotechniques. 10:506–513.
Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS. 2011. PHAST: a fast phage search tool. Nucleic Acids Res. 39:W347–52.
SUPPLEMENTARY TABLES
Table S4. Detection of Cardinium gliding genes in different Encarsia species.

| Species | Location | Reproduction phenotype | gyrB | gldK | gldL | gldM | gldN |
|---|---|---|---|---|---|---|---|
| E. pergandiella | Texas (USA) | Cytoplasmic incompatibility | + | - | - | - | - |
| E. pergandiella | Brazil | Parthenogenesis | + | - | - | - | - |
| E. hispida | Italy | Parthenogenesis | + | - | - | - | - |
| E. inaron | Arizona (USA) | None | + | - | - | - | - |
Table S5. Detection of endosymbionts, biotype and genes in field Bemisia tabaci samples from Valencia province (Spain).

| Sample | Municipality | Host plant | Pesticide treatment | Biotype | Hamiltonella | Arsenophonus | Cardinium | Wolbachia | Rickettsia | Gliding genes | CHV_p021 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | Museros | Cucumis melo | no | fail | fail | fail | fail | fail | fail | fail | fail |
| B | Museros | Solanum melongena | no | Q | + | - | + | - | - | + | + |
| C | Museros | Solanum lycopersicum | no | Q | + | - | + | - | - | + | + |
| D | Moncada | Solanum lycopersicum | no | S | - | + | + | + | - | + | + |
| E | Moncada | Solanum lycopersicum | no | Q | + | - | + | - | - | + | + |
| F | Bétera | Cucumis melo | yes | S | - | + | + | + | - | + | + |
| G | Moncada | Solanum lycopersicum | no | Q | + | - | + | - | - | + | + |
| H | Moncada | Solanum lycopersicum | no | Q | + | - | + | - | - | + | + |
| I | Moncada | Solanum lycopersicum | no | Q | + | - | + | - | - | + | + |
| J | Moncada | Solanum lycopersicum | no | Q | + | - | + | - | - | + | + |
| K | Perelló | Solanum lycopersicum | no | Q | + | - | + | - | - | + | + |
| L | Perelló | Malva sp. | no | Q | + | - | + | - | - | + | + |

+: positive PCR detection; -: not detected by PCR
Table S6. PCR primers used in this work.

| Organism | Gene | Primer name | Sequence (5'-3') | Procedure | Annealing temperature (°C) | Product size (bp) | Melting curve (°C) | Reference |
|---|---|---|---|---|---|---|---|---|
| Cardinium | gldK | gldK_F | CAGTATCCATGTCACCGAAG | Gliding genes detection | 59 | 114 | 77.81 | This study |
| Cardinium | gldK | gldK_R | GACTCTGGATCGGATTTAACG | Gliding genes detection | 59 | 114 | 77.81 | This study |
| Cardinium | gldL | gldL_F | CTATTAAACCTACCTGGCGG | Gliding genes detection | 58 | 136 | 77.21 | This study |
| Cardinium | gldL | gldL_R | CAGTTGGTAAATTAGGATGGC | Gliding genes detection | 58 | 136 | 77.21 | This study |
| Cardinium | gldM | gldM_F | CATCTGGGCTTAGGAGTTTAG | Gliding genes detection | 59 | 134 | 77.48 | This study |
| Cardinium | gldM | gldM_R | CGTCTGTATAGGTCTACTACCC | Gliding genes detection | 59 | 134 | 77.48 | This study |
| Cardinium | gldN | gldN_F | GGCTTACGTCTTATATGCCAG | Gliding genes detection | 58 | 153 | 79.08 | This study |
| Cardinium | gldN | gldN_R | AGGGTATCATCCTGATAGGG | Gliding genes detection | 58 | 153 | 79.08 | This study |
| Cardinium | CHV_p021 | RHS_192_214_F | AACCAGGTGAACAATCTTCTAT | Gliding genes detection | 59 | 156 | 80 | This study |
| Cardinium | CHV_p021 | RHS_327_348_R | GTAATTCCGAGTCTTCTTGGG | Gliding genes detection | 59 | 156 | 80 | This study |
| Cardinium | gyrB | Car_gyrBF_629 | GTGAACAAGACGAACAAGGC | Gliding genes detection internal control | 58 | 131 | 80.5 | This study |
| Cardinium | gyrB | Car_gyrBR_757 | CTCCTTCTACGAGAATAGGC | Gliding genes detection internal control | 58 | 131 | 80.5 | This study |
| Cardinium | 16S | CFB-F | GCGGTGTAAAATGAGCGTG | Symbiont detection | 58 | 395 | - | 1 |
| Cardinium | 16S | CFB-R | ACCTMTTCTTAACTCAAGCCT | Symbiont detection | 58 | 395 | - | 1 |
| Cardinium | 16S | CARf | TACTTTACACTGGGGAATAGCC | 16S amplification | 52 | 1300 | - | 2 |
| Cardinium | 16S | CARr | GTCGCTGGTCTAACACTAAACA | 16S amplification | 52 | 1300 | - | 2 |
| Hamiltonella | 16S | Hb-F | TGAGTAAAGTCTGGAATCTGG | Symbiont detection | 58 | 700 | - | 3 |
| Hamiltonella | 16S | Hb-R | AGTTCAAGACCGCAACCTC | Symbiont detection | 58 | 700 | - | 3 |
| Arsenophonus | 16S | ArsF3 | GTCGTGAGGAARGTGTTARGGTT | Symbiont detection | 53 | 581-803 | - | 4 |
| Arsenophonus | 16S | ArsR3 | CCTYTATCTCTAAAGGMTTCGCTGGATG | Symbiont detection | 53 | 581-803 | - | 4 |
| Wolbachia | wsp | 81F | TGGTCCAATAAGTGATGAAGAAAC | Symbiont detection | 53 | 610 | - | 5 |
| Wolbachia | wsp | 691R | AAAAATTAAACGCTACTCCA | Symbiont detection | 53 | 610 | - | 5 |
| Rickettsia | 16S | Rb-F | GCTCAGAACGAACGCTATC | Symbiont detection | 58 | 900 | - | 6 |
| Rickettsia | 16S | Rb-R | GAAGGAAAGCATCTCTGC | Symbiont detection | 58 | 900 | - | 6 |
| B. tabaci | COI | C1-J-2195 | TTGATTTTTTGGTCATCCAGAAGT | B. tabaci biotype detection | 52 | 816 | - | 7 |
| B. tabaci | COI | L2-N-3014 | TCCAATGCACTAATCTGCCATATTA | B. tabaci biotype detection | 52 | 816 | - | 7 |

References:
1. Weeks AR, Velten R, Stouthamer R. 2003. Incidence of a new sex-ratio-distorting endosymbiotic bacterium among arthropods. Proc Biol Sci. 270: 1857–1865.
2. Liu Y, Miao H, Hong X-Y. 2006. Distribution of the endosymbiotic bacterium Cardinium in Chinese populations of the carmine spider mite Tetranychus cinnabarinus (Acari: Tetranychidae). J Appl Entomol. 130: 523–529.
3. Zchori-Fein E, Brown JK. 2002. Diversity of Prokaryotes Associated with Bemisia tabaci (Gennadius) (Hemiptera: Aleyrodidae). Ann Entomol Soc Am. 95: 711–718.
4. Duron O, Hurst GDD, Hornett EA, Josling JA, Engelstädter J. 2008. High incidence of the maternally inherited bacterium Cardinium in spiders. Mol Ecol. 17: 1427–1437.
5. Braig HR, Zhou W, Dobson SL, O’Neill SL. 1998. Cloning and characterization of a gene encoding the major surface protein of the bacterial endosymbiont Wolbachia pipientis. J Bacteriol. 180: 2373–2378.
6. Gottlieb Y, Ghanim M, Chiel E, Gerling D, Portnoy V, et al. 2006. Identification and Localization of a Rickettsia sp. in Bemisia tabaci (Homoptera: Aleyrodidae). Appl Environ Microbiol. 72: 3646–3652.
7. Frohlich D, Torres-Jerez I, Bedford I, Markham P, Brown J. 1999. A phylogeographical analysis of the Bemisia tabaci species complex based on mitochondrial DNA markers. Mol Ecol. 8: 1683–1691.
Table S7. Non-transposase pseudogenes identified in Cardinium cBtQ1.
Locus_tag
Present in
Product
cEper1
AmAs
Duplicated
Zinc dependent phospholipase C/S1-P1 nuclease hypothetical protein Lysozyme M1 Thymidylate kinase Sodium:solute symporter family protein Sodium:solute symporter family protein ankyrin repeat-domain TPR repeat-containing protein
yes yes yes yes yes yes yes yes
yes yes yes yes yes yes yes yes
no no no no no no yes yes
CHV_a0377 CHV_a0391
Ankyrin repeat-containing protein Putative endonuclease 4 exported protein of unknown function
yes
no
yes
CHV_a0446 CHV_a0452
CAHE_0678-like protein Sodium:solute symporter family protein ABC-type multidrug transport system, ATPase
yes yes
no no
no yes
CHV_b0061 CHV_b0067 CHV_c0017 CHV_c0025 CHV_c0068 CHV_e0037 CHV_e0045 CHV_e0046 CHV_e0050 CHV_g0002
and permease components hypothetical protein Putative peroxiredoxin hypothetical protein Cystathionine gamma-lyase 8-amino-7-oxononanoate synthase Thermostable carboxypeptidase 1 ATP-dependent DNA helicase recG Glycine--tRNA ligase Glutamine amidotransferase subunit pdxT
yes yes yes yes yes yes yes yes yes yes
yes no yes no no no yes yes yes yes
yes no no no no no yes yes yes no
CHV_a0073 CHV_a0075 CHV_a0145 CHV_a0202 CHV_a0203 CHV_a0250 CHV_a0263 CHV_a0345
Gene
Present in
acm tmk
cgl bioF recG glyS pdxT
Comments
Identification not possible
CHV_h0005 CHV_k0002
hypothetical protein NUDIX domain-containing protein
yes yes
no no
no no