Quality control of array genotyping data with argyle - G3: Genes ...
Quality control of array genotyping data with argyle
Andrew P Morgan
2015-10-08
Introduction

Proper quality control of array genotypes is an important prerequisite to further analysis. Genotype quality can be assessed on two dimensions: markers (sites) or samples. Both are important, but in general sample-level quality checks must be performed first, in order to obtain a cleaned dataset useful for assessing the performance of specific markers.

Sample-level quality checks are implemented at three levels in argyle:
• distribution of hybridization intensities
• distribution of genotype calls
• biological constraints (concordance between biological sex and sex-chromosome genotypes; heterozygosity; ...)

Failed arrays have one or more of the following properties, each of which will be discussed in further detail in subsequent sections:
• aberrant distribution of hybridization intensities (higher variance but asymmetric, skewed towards low intensities)
• excess of no-calls and heterozygous calls relative to homozygous calls genome-wide
• sex-chromosome calls discordant with biological sex

For datasets of sufficient size (say, at least 24 samples), these quality checks can be performed without an external reference panel of known good samples: unless an entire batch has failed, failed samples should be outliers with respect to good samples within the batch.

In this vignette we use a dataset from 96 Diversity Outbred (???) mice genotyped on the MegaMUGA array.
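The idea that failed samples show up as outliers within their own batch can be illustrated with a minimal base-R sketch. It is not argyle's implementation: the no-call rates are made-up numbers, and the function name `flag_outliers` and the cutoff of 5 are assumptions chosen only for illustration.

```r
# Hypothetical per-sample no-call rates for a batch of 24 arrays
# (made-up values; the last two mimic failed arrays).
nocall_rate <- c(0.010, 0.012, 0.008, 0.015, 0.011, 0.009, 0.013, 0.010,
                 0.014, 0.012, 0.011, 0.009, 0.016, 0.010, 0.012, 0.013,
                 0.008, 0.011, 0.014, 0.010, 0.012, 0.009, 0.150, 0.300)
names(nocall_rate) <- sprintf("sample%02d", seq_along(nocall_rate))

# Flag samples whose no-call rate lies far above the batch median,
# measured in units of the median absolute deviation (MAD).
# The cutoff is an arbitrary assumption, not a recommended value.
flag_outliers <- function(x, cutoff = 5) {
  (x - median(x)) / mad(x) > cutoff
}

flagged <- names(nocall_rate)[flag_outliers(nocall_rate)]
print(flagged)
```

The median and MAD are used here because, unlike the mean and standard deviation, they are barely affected by the failed samples themselves. The same pattern would apply to any other per-sample summary, such as the heterozygosity rate.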
Primer on hybridization intensities for Illumina arrays

What follows is a very brief review of the character of hybridization-intensity data produced by Illumina arrays. For a full discussion see (Steemers et al. 2006). Illumina arrays, unlike Affymetrix platforms, are printed with invariant oligonucleotide probes. The two possible alleles at the target SNP are distinguished via a single-base extension reaction with fluorescent-conjugated nucleotides. At a given SNP marker, fluorescence signals are recorded in two channels (denote them x and y) for every sample.

The values recorded on the instrument are raw intensities, which we will denote x0 and y0. The Illumina BeadStudio software uses an “affine polish” algorithm that pools information from many samples and many arrays to transform x0 → x and y0 → y, where the transformed intensities x, y ≥ 0 and the total intensity R = x + y ≈ 1. Samples homozygous for allele A are expected to form a cluster near (1, 0); those homozygous for allele B near (0, 1); and heterozygous samples near the 1 : 1 diagonal at a distance of ≈ 1 from the origin. Samples falling near the origin (i.e. those with low total hybridization intensity) will be no-calls. A clustering algorithm is applied to the transformed intensities to yield discrete genotype calls.

A consequence of the affine intensity transformation is that, at any given marker, samples with non-missing calls are expected to have total intensity ≈ 1 regardless of their genotype call. Although R is often used as the measure of total intensity, argyle defines the alternative parameter $d = \sqrt{x^2 + y^2}$ (the distance from the origin in intensity space). The distribution of d across all markers for each sample is a sensitive indicator of its overall quality. We use d rather than R because it is less prone to overestimation of total hybridization signal in highly heterozygous samples (cf. the triangle inequality).
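The relationship between d and R can be made explicit with a short worked inequality. It uses only the definitions above (R = x + y, d the distance from the origin, and x, y ≥ 0) and is offered as an illustration rather than as part of argyle's documentation.

```latex
\[
  d \;=\; \sqrt{x^2 + y^2} \;\le\; x + y \;=\; R \;\le\; \sqrt{2}\,\sqrt{x^2 + y^2} \;=\; \sqrt{2}\,d ,
  \qquad x, y \ge 0 .
\]
```

The left-hand bound is the triangle inequality and is attained when x = 0 or y = 0, that is, on the axes where homozygous clusters sit. The right-hand bound is attained when x = y, on the diagonal where heterozygous clusters sit. Two points at the same distance d from the origin can therefore differ in R by a factor of up to √2 ≈ 1.41, with the larger value on the heterozygous diagonal.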
Data import

First load argyle and set the working directory.

    library(argyle)
    setwd("~/argyle")

Load the marker map for MegaMUGA.

    load("snps.megamuga.Rdata")

Import genotypes from BeadStudio output files, and check that import was successful.

    geno ...
    Reading genotypes and intensities for 77808 markers x 96 samples from < ./datasets/mega_example/Final
    Constructing genotype matrix...
    Constructing intensity matrices...
    77808 sites x 96 samples
    Done.
Subsequent quality checks run much faster if genotypes have been converted to numeric encoding (i.e. allele counts), which we will do before proceeding.

    geno
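To make "numeric encoding" concrete, here is a minimal base-R sketch of turning character genotype calls into allele counts. It does not use argyle's own recoding routine. The toy matrix, the reference alleles, the function name `to_allele_counts`, and the assumed coding (a nucleotide letter for a homozygous call, "H" for heterozygous, "N" for no-call) are all hypothetical.

```r
# Toy genotype matrix: rows are markers, columns are samples.
# Assumed (hypothetical) coding: nucleotide letter = homozygous call,
# "H" = heterozygous call, "N" = no-call.
calls <- matrix(
  c("A", "H", "G", "N",
    "C", "C", "H", "T"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("m1", "m2"), c("s1", "s2", "s3", "s4"))
)

# Hypothetical reference allele for each marker, in row order.
ref <- c(m1 = "A", m2 = "C")

# Count copies of the non-reference allele per call:
# 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternate, NA = no-call.
to_allele_counts <- function(calls, ref) {
  counts <- matrix(2L, nrow = nrow(calls), ncol = ncol(calls),
                   dimnames = dimnames(calls))
  counts[calls == ref[row(calls)]] <- 0L  # call matches its marker's reference
  counts[calls == "H"] <- 1L
  counts[calls == "N"] <- NA_integer_
  counts
}

counts <- to_allele_counts(calls, ref)
print(counts)
```

With an integer matrix of this kind, per-sample summaries reduce to simple arithmetic. For example, `colMeans(counts == 1L, na.rm = TRUE)` gives the fraction of heterozygous calls among the non-missing calls for each sample.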