Using the 90k Buffalo SNP array
Ezequiel Luis Nicolazzi, Curtis P. VanTassell, Daniela Iamartino, James M. Reecy, Eric Fritz-Waters, Tad S. Sonstegard, James E. Koltes, Steven G. Schroeder, Ali Ahmad, Jose Fernando Garcia, Luigi Ramunno, Gianfranco Cosenza, John Williams and the International Buffalo Consortium
From SNP array data to application
• International effort to produce the first Buffalo SNP array (Affymetrix Axiom® technology) in the framework of the International Buffalo Consortium, coordinated by PTP. Presented at PAG XXI
• And then?
– Genotyping of (a lot of) individuals
– Extraction of SNP genotypes from raw Affymetrix files
– QC
– Genomic analyses
Starting data
A total of 1605 individuals genotyped with the Affymetrix Axiom Buffalo Genotyping Array
Array: 90k SNPs, 123k probes genotyped (~33k SNPs covered by two probes)
Starting dataset:
– 1376 River Buffalo (Bubalus bubalis)
• 10 countries (ITA, BRA, COL, EGY, IRN, MOZ, PHL, PAK, ROM, TUR)
– 206 Swamp Buffalo (Bubalus bubalis)
• 6 countries (IDN, PHL, BRA, CHN, PHL, THA)
– 15 South African Cape Buffalo (Syncerus caffer)
– 15 Indonesian Anoa (Bubalus depressicornis)
Extract SNP array data
• Affymetrix provides raw intensity files (named “CEL” files) that have to be QC’d to obtain genotype files.
• Software is/may be an issue
– Windows: Affymetrix Genotyping Console (GUI), automated.
– Linux/Unix/Mac: Affymetrix Power Tools (command line) + SNPolisher R package. Step-by-step procedure, not automated.
– Probably not an issue anymore: see Affy’s workshop (MON) for news about this!
AffyPipe
https://github.com/nicolazzie/AffyPipe
Nicolazzi et al. (2014) Bioinformatics
• Created for the Buffalo species purposes, but extended to any species genotyped with Axiom technology
• Open-source (many improvements came from the Affy-users community!)
• Features:
– Avoids the step-by-step procedure
– Does not require ANY programming skills
– Standardizes the workflow
– Automatic editing of individuals
– Automatic editing of SNP probes
– Output file in PLINK format (A/B or A/C/G/T)
First QC editing (AffyPipe)
Default thresholds (Affymetrix Best Practice)

Class    River     Swamp     Cape      Anoa
PHR      74.76%    48.34%     0.85%     2.55%
MHR       6.83%     6.65%    72.26%    55.89%
VINO      0.79%     2.1%      1.19%     1.49%
NMH       1.23%    23.85%     3.43%     8.32%
CRBT      4.55%     3.41%     9.43%    20.72%
Other    11.83%    15.65%    12.84%    11.03%
“High quality” probe classes (shown in green on the slide): PHR, MHR and NMH
From: Affymetrix Best Practice manual
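As a quick check on the table, the share of “high quality” probes per population can be summed from the class percentages. The sketch below is only illustrative: the percentages are copied from the table, while the choice of PHR, MHR and NMH as the three “high quality” classes is an assumption based on the Affymetrix recommended classes, and the function name is hypothetical.

```python
# Percentages of probes per classification, copied from the table above.
CLASS_PERCENT = {
    "River": {"PHR": 74.76, "MHR": 6.83, "VINO": 0.79, "NMH": 1.23, "CRBT": 4.55, "Other": 11.83},
    "Swamp": {"PHR": 48.34, "MHR": 6.65, "VINO": 2.10, "NMH": 23.85, "CRBT": 3.41, "Other": 15.65},
    "Cape": {"PHR": 0.85, "MHR": 72.26, "VINO": 1.19, "NMH": 3.43, "CRBT": 9.43, "Other": 12.84},
    "Anoa": {"PHR": 2.55, "MHR": 55.89, "VINO": 1.49, "NMH": 8.32, "CRBT": 20.72, "Other": 11.03},
}

# Assumption: these three classes are the "high quality" ones.
HIGH_QUALITY = ("PHR", "MHR", "NMH")


def high_quality_share(percentages, classes=HIGH_QUALITY):
    """Sum the percentages of the selected classes, rounded to 2 decimals."""
    return round(sum(percentages[c] for c in classes), 2)


if __name__ == "__main__":
    for population, pct in CLASS_PERCENT.items():
        share = high_quality_share(pct)
        print(f"{population}: {share:.2f}% high quality, {pct['PHR']:.2f}% polymorphic (PHR)")
```

Read this way, most probes pass QC in all four groups, but in Cape Buffalo and Anoa the passing probes are almost all monomorphic (MHR), whereas in River Buffalo they are mostly polymorphic (PHR).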
Problems encountered
• Unstable number of “high quality” SNP probes obtained from extractions on different plates and batches (e.g. more plates processed at different times)
• Two probes corresponding to the same SNP giving inconsistent (sometimes opposite) genotype calls.
Both issues were studied in River Buffalo (largest number of samples)
1) Unstable extraction of genotypes
• Not an “issue” per se. Intrinsic to the Affymetrix procedure of SNP extraction (assigning genotypes using a Bayesian model)
• Low number of individuals/plate (max 96)
• Repeatability: 99.99%
• Much more stable “consensus” results (e.g. same probes falling in the 3 “high quality” classes) after >500 individuals
SOLUTION: Combine multiple plates in 1 single extraction … use as many samples as possible!
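One way to quantify this kind of stability is to compare the sets of “high quality” probes returned by different extraction runs. The following is a minimal sketch under assumptions: the probe IDs and runs are made-up toy data, and the Jaccard overlap is just one possible stability measure, not the one used in this work.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two probe-ID collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consensus(runs):
    """Probes classified as high quality in every extraction run."""
    sets = [set(r) for r in runs]
    return set.intersection(*sets) if sets else set()


def mean_pairwise_overlap(runs):
    """Average Jaccard overlap over all pairs of runs (needs at least 2 runs)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("at least two extraction runs are required")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy example (hypothetical probe IDs): high-quality probes from 3 extractions.
runs = [
    {"p1", "p2", "p3"},
    {"p1", "p2", "p4"},
    {"p1", "p2", "p3"},
]

if __name__ == "__main__":
    print("consensus probes:", sorted(consensus(runs)))
    print("mean pairwise overlap:", round(mean_pairwise_overlap(runs), 3))
```

Repeating such a comparison for extractions with increasing numbers of individuals would show whether the consensus set stabilizes as more samples are combined.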
2) Inconsistent probes
Using full data (all individuals) extraction.
Concordant probe pair:
PROBE 1.a => genotypes “BB”
PROBE 1.b => genotypes “BB”
Discordant probe pair:
PROBE 2.a => genotypes “AA”
PROBE 2.b => genotypes “BB”
Most cases are within-class (mainly monomorphic SNPs). Few are “across genotype class” (1 probe PHR, 1 probe MHR or NMH).
Not an issue in nearly all genomic analyses. BIG issue in biodiversity analyses, especially across buffalo species/populations!
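Checking a probe pair for discordance amounts to comparing the two probes' calls sample by sample. The sketch below assumes A/B-coded genotype strings and a hypothetical missing-call code ("--"); the function name and data layout are illustrative, not AffyPipe's.

```python
MISSING = "--"  # assumption: placeholder used here for a missing genotype call


def compare_probe_pair(calls_a, calls_b, missing=MISSING):
    """Return (compared, discordant) for two probes targeting the same SNP.

    calls_a and calls_b hold one genotype call per individual, in the same
    sample order. Samples with a missing call on either probe are skipped.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("both probes must be called on the same samples")
    compared = discordant = 0
    for a, b in zip(calls_a, calls_b):
        if a == missing or b == missing:
            continue
        compared += 1
        if a != b:
            discordant += 1
    return compared, discordant


if __name__ == "__main__":
    # Concordant pair: both probes call "BB" in every individual.
    print(compare_probe_pair(["BB", "BB", "BB"], ["BB", "BB", "BB"]))
    # Discordant pair: opposite homozygotes, one missing call skipped.
    print(compare_probe_pair(["AA", "AA", "--"], ["BB", "BB", "BB"]))
```

A pair with a non-zero discordant count would then be flagged for closer inspection before any cross-population comparison.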
2) Inconsistent probes
Extent of the problem:
• From the “original” 33k double-probes/SNP, 20k had both probes with a “high quality” classification.
• From these, 5 individual genotypes.
Again, lots of work/conference calls with Affymetrix R&D people. Some better results with a more stringent “intensity” cutoff during genotype extraction.
Still, no final solution to this issue.
PATCH: Identify the “bestprobe” of the two in the largest population and consider only that PROBE throughout the comparison (especially if multiple species/pops are considered!).
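The patch can be sketched as follows. How the “bestprobe” is chosen is not specified here, so this example assumes, purely for illustration, that the best probe is the one with the higher call rate in the largest population; all names, the missing-call code and the data layout are hypothetical.

```python
def call_rate(calls, missing="--"):
    """Fraction of non-missing genotype calls (0.0 for an empty list)."""
    if not calls:
        return 0.0
    return sum(c != missing for c in calls) / len(calls)


def population_size(probe_calls):
    """Number of individuals in a population, taken from its longest call list."""
    return max((len(calls) for calls in probe_calls.values()), default=0)


def pick_best_probes(calls_by_pop, snp_to_probes, missing="--"):
    """Choose one probe per SNP in the largest population.

    calls_by_pop:  {population: {probe_id: [genotype calls]}}
    snp_to_probes: {snp_id: [probe_id, probe_id]}
    Returns (reference_population, {snp_id: chosen probe_id}).
    Assumption: "best" means highest call rate; ties keep the first probe.
    """
    reference = max(calls_by_pop, key=lambda pop: population_size(calls_by_pop[pop]))
    ref_calls = calls_by_pop[reference]
    best = {
        snp: max(probes, key=lambda p: call_rate(ref_calls.get(p, []), missing))
        for snp, probes in snp_to_probes.items()
    }
    return reference, best


# Toy data: one SNP with two probes, genotyped in two populations.
calls_by_pop = {
    "River": {"snp1_a": ["AA", "AB", "--", "BB"], "snp1_b": ["AA", "AB", "AB", "BB"]},
    "Swamp": {"snp1_a": ["AA", "AA"], "snp1_b": ["--", "AA"]},
}
snp_to_probes = {"snp1": ["snp1_a", "snp1_b"]}

if __name__ == "__main__":
    reference, best = pick_best_probes(calls_by_pop, snp_to_probes)
    print(reference, best)
```

The point of the design is that the choice is made once, in the reference population, and the same probe is then used for every other population, even where the other probe would have scored better locally.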
Conclusions
• SNP chip successfully tested on Buffalo (see analyses later on).
• A bioinformatics pipeline, named AffyPipe, was built to automatically extract genotypes in Linux/Mac environments
• Affymetrix is working on a new (multi-platform?) extraction suite (beta-tested)
• 2 main issues encountered, partly solved.
• Advice to users:
– Know the technology and read Affy’s documentation carefully!
– Know your goals (the issues are not a problem for most common analyses)
– Extract genotypes with the highest number of individuals possible
– If comparing multiple populations, always consider the same “bestprobe” probes
Thank you for your attention!