Accepted Article

MS. MARION SINCLAIR-WATERS (Orcid ID : 0000-0001-7371-4547)

Article type: Original Article

Ancient chromosomal rearrangement associated with local adaptation of a post-glacially colonized population of Atlantic Cod in the northwest Atlantic

Marion Sinclair-Waters1*, Ian R. Bradbury1,2, Corey J. Morris2, Sigbjørn Lien3, Matthew P. Kent3 and Paul Bentzen1

1 Biology Department, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada

2 Fisheries and Oceans Canada, Northwest Atlantic Fisheries Centre, St. John’s, Newfoundland, A1C 5X1, Canada

3 Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, NO-1432, Norway

*Corresponding author: Marion Sinclair-Waters, Biology Department, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada, [email protected]

Running title: Adaptation and divergence of cod ecotypes.

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/mec.14442 This article is protected by copyright. All rights reserved.

Keywords: Atlantic Cod; chromosomal rearrangement; local adaptation; population divergence; conservation units; single nucleotide polymorphism

Abstract

Intraspecific diversity is central to the management and conservation of exploited species, yet knowledge of how this diversity is distributed and maintained in the genome of many marine species is lacking. Recent advances in genomic analyses allow for genome-wide surveys of intraspecific diversity and offer new opportunities for exploring genomic patterns of divergence. Here, we analyzed genome-wide polymorphisms to measure genetic differentiation between an offshore migratory and a non-migratory population and to define conservation units of Atlantic Cod (Gadus morhua) in coastal Labrador. A total of 141 individuals, collected from offshore sites and from a coastal site within Gilbert Bay, Labrador, were genotyped using an ~11k single nucleotide polymorphism array. Analyses of population structure revealed strong genetic differentiation between migratory offshore cod and non-migratory Gilbert Bay cod. Genetic differentiation was elevated for loci within a chromosomal rearrangement found on linkage group 1 (LG1) that coincides with a previously found double inversion associated with migratory and non-migratory ecotype divergence of cod in the northeast Atlantic. This inverted region includes several genes potentially associated with adaptation to differences in salinity and temperature, as well as influencing migratory behaviour. Our work provides evidence that a chromosomal rearrangement on LG1 is associated with parallel patterns of divergence between migratory and non-migratory ecotypes on both sides of the Atlantic Ocean.

Introduction

Understanding the nature and scope of intraspecific genetic diversity is important for effective management and conservation of exploited species. Genetic diversity, particularly when associated with fitness-related traits, is expected to promote sustainability and persistence (Hilborn et al., 2003;
Schindler et al., 2010). The increasing availability of genomic tools for non-model organisms creates the opportunity to study population structure and genetic variation with unprecedented resolution (Allendorf et al., 2010; Benestan et al., 2015; Bradbury et al., 2013; Hemmer-Hansen et al., 2014; Van Wyngaarden et al., 2017), and improves our ability to characterize the evolutionary processes that influence intraspecific diversity (Bourret et al., 2013; Bradbury et al., 2010; Funk et al., 2012; Larson et al., 2016; Moore et al., 2014). As genomic data on the basis of intraspecific diversity accumulate, there is increasing evidence that structural rearrangements, such as chromosomal inversions, may be important to the evolution of diversity by facilitating adaptive differences among populations and ultimately speciation (Hoffmann & Rieseberg, 2008; Jones et al., 2012; Joron et al., 2006; Lamichhaney et al., 2015; Lowry & Willis, 2010). Atlantic Cod (Gadus morhua), a commercially exploited species for hundreds of years (Kurlansky, 1997), is a benthopelagic fish that inhabits continental shelves throughout the temperate regions of the North Atlantic Ocean. It is characterized by high dispersal potential, high fecundity, and large population sizes (COSEWIC, 2010; Robichaud & Rose, 2004). In the northwest Atlantic, studies using microsatellite loci revealed weak, yet significant, population structure over large spatial scales (Beacham et al., 2002; Bentzen et al., 1996; O’Leary et al., 2007; Ruzzante et al., 1996). This pattern has been supported by more recent studies using single nucleotide polymorphisms (SNPs) that include putatively adaptive loci in analyses of population structure (Bradbury et al. 2010, 2013). These SNP-based approaches also indicate that adaptive variation in Atlantic Cod is not randomly distributed across the genome, but rather, is frequently associated with discrete chromosomal regions, and more specifically, chromosomal inversions. 
For example, linkage blocks located on three linkage groups (LGs), specifically LG2, LG7, and LG12, are involved in the separation of cod into temperature-associated southern and northern groups on both sides of the Atlantic Ocean (Bradbury et al., 2013), and, more recently, in fine geographic scale variation in the Gulf of Maine (Barney et al., 2017). In addition, an island of divergence located on LG1 is associated with genetic divergence between migratory and stationary ecotypes in both Icelandic and Norwegian waters (Hemmer-Hansen et al., 2013; Karlsen et al., 2013). LG1 contains the pantophysin (Pan I) locus, which has been
previously shown to display allele frequency differences between stationary coastal Norwegian cod and the offshore migratory North East Arctic cod (Fevolden & Pogson, 1997). More recently, it has been shown that this highly differentiated region comprises two adjacent chromosomal inversions (Kirubakaran et al., 2016). This double inversion on LG1 is observed at higher frequencies in the migratory North East Arctic population than in coastal non-migratory populations in the northeast Atlantic (Berg et al., 2016; Kirubakaran et al., 2016). Inversions such as this were unknown prior to the genomics era because it was difficult to produce banded karyotypes for fish species like Atlantic Cod (Ghigliotti et al., 2012); however, owing to an increase in available genomic data, we are now able to identify these inversions in Atlantic Cod and examine their role in shaping population structure across the species’ range.

Non-migratory populations of Atlantic Cod exist on both sides of the North Atlantic (Robichaud & Rose, 2004). In the northwest Atlantic, the Gilbert Bay population is a small non-migratory population (Green & Wroblewski, 2000) that inhabits Gilbert Bay, a small (~60 km2) and shallow (mean depth = 33 m, max. depth = 163 m) inlet in southern Labrador (Copeland et al., 2007). Gilbert Bay cod occur mostly within the inlet; however, acoustic tracking has shown that some Gilbert Bay cod move to waters outside the MPA during the summer months and that the home range of Gilbert Bay cod includes a ~270 km2 area outside the inlet (Morris et al., 2003; Morris et al., 2014). This contrasts with the migration behaviour of offshore cod in Newfoundland and Labrador, which undergo migrations of up to 1,667 km between coastal feeding grounds and offshore shelf regions (Robichaud & Rose, 2004) and reach depths of up to 375 m (Rose, 1993).
To enable conservation of Gilbert Bay cod and their habitat, Gilbert Bay was designated a Marine Protected Area (MPA) under Canada’s Oceans Act in 2005. Despite efforts to protect the Gilbert Bay cod population, decreases in abundance and recruitment have continued (Morris & Green, 2014), likely due to exploitation in coastal fisheries when Gilbert Bay cod move outside the MPA during late summer (Morris et al., 2014). The Gilbert Bay cod population has been identified as the most genetically distinct population of Atlantic Cod in the northwest Atlantic (Ruzzante et al. 2000; Beacham et al. 2002; Hardie et al. 2006;
Bradbury et al. 2010, 2013); however, the genomic basis of this divergence and of the population’s non-migratory behaviour has yet to be resolved.

Here, we examine genomic differentiation between Gilbert Bay cod and offshore cod in coastal Labrador using data from thousands of SNPs distributed across all chromosomes. We combine genomic data for Atlantic Cod with linkage map information to characterize how differentiation is structured across the genome, which allows us to assess whether the same chromosomal features that are implicated in divergence between migratory and non-migratory ecotypes in the northeast Atlantic (i.e. LG1) occur in Newfoundland and Labrador waters. We also evaluate whether the Gilbert Bay population qualifies as a separate conservation unit through the application of outlier tests, chromosomal rearrangement detection and examination of genes associated with the divergence of this population. More broadly, we demonstrate the value of considering genomic data and genomic architecture when resolving intraspecific diversity and defining conservation units in economically important marine species.

Methods

Sample collection, DNA extraction & genotyping

Atlantic Cod were sampled from four offshore sites located in marine waters off Newfoundland and Labrador and one coastal site located in the innermost region of Gilbert Bay, a narrow inlet on the southeast coast of Labrador. From 2010 to 2015, 78 individuals were collected from the offshore sites over four sampling periods that occurred on 5 December 2010, 7 June 2011, 8 October 2015 and 10 October 2015. Another 63 individuals were collected from the coastal site during two sampling periods: 9 September 2012, and 1-10 June 2015 (Fig. 1, Table 1). Fin clips were collected and immediately preserved in 95% ethanol. Genomic DNA was extracted in two separate batches from a 2 mm3 tissue sample taken from the preserved fin clips. The first batch was extracted using a Qiagen DNeasy 96 blood and tissue kit following the manufacturer’s protocol, and the second batch was extracted using a standard phenol-chloroform
procedure (Sambrook et al., 1987). All DNA was quantified using QuantIT PicoGreen (Life Technologies) and was normalized to a final concentration of 50 ng/μL prior to genotyping.

Genotyping took place at the Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences in Ås, Norway using a custom 11K SNP-array developed for Atlantic Cod. The SNP-array was developed using whole genome sequencing data from seven Atlantic Cod representing diverse locations in the northeast Atlantic (Kent et al. in prep.). Reads were aligned to the gadMor1 reference genome (Star et al., 2011) and 2,877,794 putative SNPs were identified. Of these, 10,913 SNPs were chosen for the array based on their even physical distribution and certain value-added characteristics including 260 SNPs from previous studies (Hubert et al., 2010; Moen et al., 2008), 672 SNPs in close proximity to candidate genes, and 1,595 non-synonymous coding SNPs. In total, 72% of the SNPs in the array are not gene-centric and were chosen to ensure that all large scaffolds (>10 kbp) were represented by at least 2 SNPs. The first SNP in each scaffold was selected and the next was selected from a small search window with a jump distance determined by the scaffold length (longer scaffolds having longer jump intervals), with the longest jumps being 100 kbp and the shortest being 10 kbp. On average, the array includes 409 SNPs per chromosome (Kent et al. in prep.).

Quality control filters and population genetic statistics

To ensure optimal data quality, we filtered out markers and individuals that failed particular quality thresholds. Any locus that did not qualify as a bi-allelic SNP based on a clustering pattern using a large sample set of more than 5,000 individuals (Kent et al. in prep.) was removed prior to the following filtering steps. Any individual with low genotyping success (less than 85% complete) was removed from the dataset. SNPs with a minor allele frequency lower than 0.01 were removed. SNPs were also filtered to ensure that missing data at any locus did not exceed 15%. PLINK, a tool set for manipulating and analyzing large genomic datasets, was used for all filtering procedures (Purcell et al., 2007). The order of filtered SNPs on chromosomes was determined using the linkage map for Atlantic Cod (Grove et al. in prep.) that contains 9,354 markers assigned to 23 linkage groups. The
map, constructed in a comprehensive pedigree of 2,951 individuals, was used to integrate, order, and orient scaffolds into the chromosome sequences in the gadMor2 assembly (Tørresen et al., 2017). Per-locus pairwise genetic differentiation, Weir & Cockerham’s FST (1984), between the two putative populations, Gilbert Bay and offshore, was calculated in Arlequin v3.5 (Excoffier & Lischer, 2010) and 10,000 permutations were performed to assess significance. Observed heterozygosity (Ho) for each locus was also calculated using the R package diveRsity (Keenan et al., 2013). For both the offshore group and coastal group, allele frequencies were calculated per locus using the genepop_allelefreq function in the R package genepopedit (Stanley et al., 2016).
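As a rough illustration of the filtering thresholds and the per-locus FST described above, the following Python sketch applies the same cut-offs (85% individual call rate, minor allele frequency of 0.01, 15% missing data per locus) to a small genotype table and computes Weir & Cockerham's (1984) estimator for a single bi-allelic locus. It is not the PLINK or Arlequin implementation used in the study: the function names qc_filter and wc_fst, the 0/1/2 genotype coding with None for missing calls, and the toy genotypes are all assumptions made for the example.

```python
def qc_filter(genotypes, min_call_rate=0.85, min_maf=0.01, max_missing=0.15):
    """Return (kept individual indices, kept locus indices).

    genotypes: one list per individual; 0/1/2 = copies of one allele,
    None = missing call (coding assumed for this sketch).
    """
    n_loci = len(genotypes[0])
    # 1. Drop individuals genotyped at fewer than min_call_rate of loci.
    kept_ind = [i for i, row in enumerate(genotypes)
                if sum(g is not None for g in row) / n_loci >= min_call_rate]
    kept_loci = []
    for j in range(n_loci):
        calls = [genotypes[i][j] for i in kept_ind if genotypes[i][j] is not None]
        if not calls:
            continue  # locus with no calls at all
        freq = sum(calls) / (2 * len(calls))
        maf = min(freq, 1 - freq)
        missing = 1 - len(calls) / len(kept_ind)
        # 2. Drop rare variants; 3. drop loci with too much missing data.
        if maf >= min_maf and missing <= max_missing:
            kept_loci.append(j)
    return kept_ind, kept_loci


def wc_fst(pops):
    """Weir & Cockerham (1984) theta for one bi-allelic locus.

    pops: list of populations, each a list of 0/1/2 genotypes (no missing).
    Raises ZeroDivisionError if the locus is monomorphic in every population.
    """
    r = len(pops)
    n = [len(p) for p in pops]
    freq = [sum(p) / (2 * len(p)) for p in pops]            # allele frequency
    het = [sum(g == 1 for g in p) / len(p) for p in pops]   # observed heterozygosity
    n_tot = sum(n)
    n_bar = n_tot / r
    n_c = (n_tot - sum(x * x for x in n) / n_tot) / (r - 1)
    p_bar = sum(x * f for x, f in zip(n, freq)) / n_tot
    s2 = sum(x * (f - p_bar) ** 2 for x, f in zip(n, freq)) / ((r - 1) * n_bar)
    h_bar = sum(x * h for x, h in zip(n, het)) / n_tot
    core = p_bar * (1 - p_bar) - (r - 1) / r * s2
    a = n_bar / n_c * (s2 - (core - h_bar / 4) / (n_bar - 1))            # among populations
    b = n_bar / (n_bar - 1) * (core - (2 * n_bar - 1) / (4 * n_bar) * h_bar)  # among individuals
    c = h_bar / 2                                                         # within individuals
    return a / (a + b + c)


# Hypothetical genotypes at one SNP, for illustration only.
gilbert_bay = [2, 2, 2, 1, 2, 2]
offshore = [0, 0, 1, 0, 0, 0]
print(round(wc_fst([gilbert_bay, offshore]), 3))
```

The estimator equals 1 when the two groups are fixed for alternative alleles and can be slightly negative when allele frequencies are identical, which is why per-locus values near zero are usually read as no differentiation.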

Analysis of population structure

Population structure was examined and visualized using three methods. First, the Bayesian clustering program STRUCTURE 2.3.4 was used to determine population structure with the filtered SNP dataset (Pritchard et al., 2000). For each possible value of K (1-4), three replicate runs consisting of a burn-in period of 100,000 and a run length of 500,000 iterations were performed. An approach implemented in CLUMPAK was used to visualize and determine the best number of populations (or clusters, K) for our study samples (Kopelman et al. 2015). The likelihood of each value of K was estimated using the DeltaK statistic of Evanno et al. (2005) to determine the number of populations present. The second method used to examine population structure was a discriminant analysis of principal components (DAPC) as implemented in the R package adegenet (Jombart et al., 2010). A DAPC uses no prior knowledge of population structure and estimates population structure by maximizing differences between clusters while minimizing differences within clusters. The optimal number of populations was determined using the function find.clusters, which uses a Bayesian information criterion method. Finally, an analysis of molecular variance (AMOVA) implemented in Arlequin v3.5 (Excoffier & Lischer, 2010) was used to determine the proportion of variance explained by the two groups, Gilbert Bay and offshore, and to calculate a global fixation index (FST).
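The DeltaK statistic mentioned above can be sketched as follows. This is a minimal Python illustration of the Evanno et al. (2005) idea (mean absolute second-order change in ln P(X|K) across replicates, divided by the standard deviation of ln P(X|K)), not the CLUMPAK code used in the study; tools differ in how they pair replicates, and the log-likelihood values below are invented for the example.

```python
from statistics import mean, stdev


def evanno_delta_k(ln_prob):
    """DeltaK per K from replicate STRUCTURE log-likelihoods.

    ln_prob: dict mapping K to a list of ln P(X|K), one value per replicate.
    Assumes consecutive K values and the same number of replicates for each K.
    """
    ks = sorted(ln_prob)
    delta_k = {}
    for k in ks[1:-1]:  # undefined for the smallest and largest K tested
        second_diff = [abs(up - 2 * mid + down)
                       for down, mid, up in zip(ln_prob[k - 1], ln_prob[k], ln_prob[k + 1])]
        delta_k[k] = mean(second_diff) / stdev(ln_prob[k])
    return delta_k


# Hypothetical ln P(X|K) values for K = 1-4 with three replicates each.
runs = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-895, -897, -893],
    4: [-893, -890, -896],
}
dk = evanno_delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the statistic needs values at K-1 and K+1, testing K from 1 to 4 yields DeltaK only for K=2 and K=3.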

Identification and annotation of highly divergent loci

Loci that displayed elevated divergence between the two putative populations, Gilbert Bay and offshore, were identified. We used two different approaches for identifying highly divergent loci. First, we selected any SNP that had a value of FST greater than 0.3 as calculated using diveRsity. Second, we used three genome scan methods for detecting SNPs with greater than expected levels of divergence (i.e. outlier loci). To reduce the possibility of false positives, only the SNPs that were identified as outliers using all three methods were considered true outliers. Loci that were not identified as outliers by any of the three methods were classified as neutral. Analyses of population structure (STRUCTURE and DAPC), AMOVA and global FST calculations were repeated using the neutral dataset.

The first method for outlier detection was a Bayesian approach as implemented in the program BayeScan v.2.1 (Foll & Gaggiotti, 2008). Following a burn-in period of 50,000, we ran 100,000 iterations with prior odds set at 100. Any SNP with a posterior probability over 0.95 was considered an outlier. The second method used the Fdist approach (Beaumont & Nichols, 1996) as implemented in the program Arlequin v3.5 (Excoffier & Lischer, 2010). In total, 200,000 coalescent simulations were performed and loci with a P-value less than or equal to 0.01 were considered outliers. The final method was based on a principal component analysis (PCA) as implemented in the R package pcadapt (Luu et al., 2017). The statistical method used in pcadapt identifies loci that are significantly associated with population structure as outliers. We used the default method that computed Mahalanobis distance and set K=1 to reflect population structure. Any SNP with a P-value less than 0.05 was considered an outlier.
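The consensus rule described above (outliers must be flagged by all three methods; neutral loci are flagged by none) amounts to a set intersection and a set difference. The Python sketch below assumes that each method's per-SNP statistic has already been exported to a dictionary; the function name classify_loci and the SNP names and values are hypothetical.

```python
def classify_loci(bayescan_post, fdist_p, pcadapt_p):
    """Split loci into consensus outliers and neutral loci.

    Each argument maps a SNP name to that method's statistic, and all three
    are assumed to cover the same SNPs. Thresholds follow the text:
    BayeScan posterior probability > 0.95, Fdist P <= 0.01, pcadapt P < 0.05.
    """
    bayescan = {snp for snp, post in bayescan_post.items() if post > 0.95}
    fdist = {snp for snp, p in fdist_p.items() if p <= 0.01}
    pcadapt = {snp for snp, p in pcadapt_p.items() if p < 0.05}
    consensus = bayescan & fdist & pcadapt                      # flagged by all three
    neutral = set(bayescan_post) - (bayescan | fdist | pcadapt)  # flagged by none
    return consensus, neutral


# Hypothetical results for four SNPs.
post = {"snpA": 0.99, "snpB": 0.97, "snpC": 0.10, "snpD": 0.50}
fd_p = {"snpA": 0.001, "snpB": 0.20, "snpC": 0.60, "snpD": 0.005}
pc_p = {"snpA": 0.01, "snpB": 0.03, "snpC": 0.70, "snpD": 0.40}
consensus, neutral = classify_loci(post, fd_p, pc_p)
print(sorted(consensus), sorted(neutral))
```

Loci flagged by only one or two methods (snpB and snpD in this toy case) fall into neither set, which matches the conservative intent of requiring agreement among methods.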

Gene annotations for the highly divergent SNPs were obtained from Berg et al. (2016). Gene annotations for SNPs that did not overlap with the Berg et al. (2016) study were obtained from Ensembl. A 201 basepair (bp) sequence surrounding the SNP was aligned to the gadMor1 genome using BLAT and parameters that required a 201 bp alignment length and >95% identity.
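The alignment criteria above can be expressed compactly. The sketch below assumes the 201 bp sequence is centred on the SNP with 100 bp on either side, which the text does not state explicitly, and it represents an alignment by two plain numbers rather than parsing real BLAT output; the function names snp_window and passes_filter are hypothetical.

```python
def snp_window(pos, flank=100):
    """1-based inclusive coordinates of the (2 * flank + 1) bp window around a SNP.

    Assumes a symmetric window; near a scaffold end the start may fall below 1
    and would need clipping.
    """
    return pos - flank, pos + flank


def passes_filter(aligned_length, matches, required_length=201, min_identity=0.95):
    """Keep hits that span the full query and exceed the identity threshold."""
    return aligned_length == required_length and matches / aligned_length > min_identity


start, end = snp_window(1000)
print(start, end, end - start + 1)
print(passes_filter(201, 195), passes_filter(201, 190), passes_filter(150, 150))
```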

Chromosomal rearrangement detection

The R package inveRsion (Cáceres et al., 2012) was used to detect large chromosomal rearrangements. Only SNPs with known order in the genome based on the linkage map were included in the analysis. A block size of 3 SNPs was used to determine haplotypes for each candidate breakpoint. Next, a sliding window size of 100 SNPs was used to scan each chromosome and a Bayesian information criterion threshold (thbic) of 0 was used when identifying rearranged regions of the genome. Once candidate breakpoints were identified, the R package invClust (Cáceres & González, 2015) was used to determine the rearrangement genotypes for individuals. To further investigate the possibility of chromosomal rearrangements, patterns of linkage disequilibrium (LD) within each LG were observed. High levels of LD between loci in rearranged regions would be expected due to a reduced rate of recombination (Dobzhansky & Epling, 1948; Feder et al., 2014). We calculated LD between SNPs on each linkage group for both the offshore and Gilbert Bay groups using PLINK (Purcell et al., 2007).
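To make the LD step concrete, the sketch below computes pairwise r² as the squared Pearson correlation between 0/1/2 genotype counts, a common estimator that needs no phased haplotypes. Whether this matches the exact PLINK option used in the study is an assumption, and the function names and genotypes are illustrative only.

```python
def r_squared(x, y):
    """Squared Pearson correlation between genotype counts at two SNPs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when a SNP is monomorphic in the sample
    return sxy * sxy / (sxx * syy)


def ld_matrix(genotypes):
    """Loci x loci r^2 matrix from one list of 0/1/2 genotypes per individual."""
    loci = list(zip(*genotypes))  # transpose to one tuple per SNP
    return [[r_squared(a, b) for b in loci] for a in loci]


# Hypothetical genotypes: SNPs 0 and 1 are perfectly correlated, SNP 2 is not.
toy = [[0, 0, 0], [1, 1, 2], [2, 2, 0], [1, 1, 2], [0, 0, 1], [2, 2, 1]]
ld = ld_matrix(toy)
print(ld[0][1], ld[0][2])
```

Within an inversion, a block of SNPs ordered by map position would show uniformly high values in such a matrix, in contrast to the decay with distance expected in freely recombining regions.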

Results

Quality control filters and population genetic statistics

Genotypes for 141 individuals were obtained. Of the 10,913 genotyped loci, 8,581 were categorized as bi-allelic SNPs and included in the analyses for this study. The overall genotyping rate was 0.98 and no individuals were filtered out due to low overall genotyping success (80% of all individual genotypes associated with one of the two populations. For the DAPC analysis, the lowest BIC value (986.07), and therefore the most probable number of populations, corresponded to K=2. A total of 40 PCA axes and one discriminant function were retained for the analysis of population structure. Finally, in the AMOVA using all loci, 6.01%, 0.03%, and 93.96% of variation was found between Gilbert Bay and offshore, among samples collected within each of these two regions, and within each collected sample, respectively. The global fixation index (FST) was 0.06.

Identifying highly divergent loci

The three methods of outlier detection identified a total of 249 outlier loci, of which 35 were detected by all methods. Global FST of the 35 consensus outlier loci was 0.47 (P-value =