Example of Amplicon with one SNP

Optimized High Resolution Melting (HRM) Analysis Algorithm Improves Detection Rate of Genetic Variants
Jernej Kovač (1), Katarina Trebušak Podkrajšek (1), Tadej Battelino (2)

1. University Medical Centre Ljubljana, Division of Paediatrics, Department for Special Diagnostics
2. University Medical Centre Ljubljana, Division of Paediatrics, Department of Endocrinology, Diabetes and Metabolic Diseases

HRM analysis of DNA amplicons is a cost-effective and relatively high-throughput screening method. It detects genetic variants by following the dissociation of saturating DNA dyes from denaturing double-stranded DNA and the resulting loss of fluorescence intensity as a function of temperature. Changes in the DNA sequence (SNPs and mutations) change the melting profile (the shape of the melting curve) of the analysed DNA fragment relative to wild-type DNA, so the quality of the algorithm used to compare melting profiles is crucial for distinguishing between genetic variants. To improve the detection rate of hard-to-detect genomic variants (class IV SNPs, for example), we introduced additional steps into the mathematical analysis and comparison of the melting profiles of the analysed DNA amplicons. First, principal component analysis (PCA) of the melting profiles is performed. It is followed by two different unsupervised clustering algorithms: density-based spatial clustering of applications with noise (DBSCAN) and expectation-maximization (EM) clustering.
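The first two analysis steps named above, PCA of the melting profiles followed by density-based clustering, can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the sigmoid `melt_curve` model, the melting temperatures, the noise levels and the values `eps=0.15` and `min_pts=3` are all assumptions chosen for the example.

```python
import math
import random


def melt_curve(temps, tm, width=0.8):
    """Hypothetical normalised melting curve: a sigmoid centred on tm."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


def pca_scores(curves, n_components=2, iters=200):
    """Project mean-centred curves onto their leading principal components.

    Power iteration with deflation on the covariance matrix; adequate for
    a small illustrative data set.
    """
    n, d = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(d)]
    x = [[c[j] - mean[j] for j in range(d)] for c in curves]
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(0)
    components = []
    for _ in range(n_components):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            v = [t / norm for t in w]
        # Rayleigh quotient: variance explained along v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * v[j] for j in range(d)) for v in components] for r in x]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Synthetic data: three assumed genotype groups, six samples each.
rng = random.Random(1)
temps = [75 + 0.5 * k for k in range(23)]
curves = [[f + rng.gauss(0, 0.002)
           for f in melt_curve(temps, tm + rng.gauss(0, 0.02))]
          for tm in (80.0, 80.6, 81.2) for _ in range(6)]

scores = pca_scores(curves)
labels = dbscan(scores, eps=0.15, min_pts=3)
print(labels)
```

Clustering the PCA scores rather than the raw curves keeps the distance computation low-dimensional. In practice `eps` and `min_pts` would have to be tuned to the instrument and assay.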


The expectation-maximization (EM) algorithm is an iterative method for finding the maximum likelihood estimate of a parameter θ of a parametric probability distribution when the model depends on unobserved (latent) variables. Each EM iteration alternates between an expectation (E) step, which computes the expectation of the log-likelihood using the current parameter estimates, and a maximization (M) step, which computes the parameters that maximize the expected log-likelihood found in the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step.
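The alternation of E and M steps described above can be illustrated with a one-dimensional Gaussian mixture, where the latent variable is the cluster membership of each sample. This is a sketch under stated assumptions, not the poster's code: the function name `em_gmm_1d`, the synthetic data (standing in for, say, scores on one principal component) and the starting means are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, mus, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from initial means `mus`."""
    k = len(mus)
    mus = list(mus)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E step: posterior probability (responsibility) of each component
        # for each observation under the current parameters.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, mus, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M step: parameters that maximise the expected log-likelihood.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsing component
    return weights, mus, variances, resp


# Synthetic data: two assumed groups of 20 samples each.
rng = random.Random(0)
data = ([rng.gauss(-1.0, 0.1) for _ in range(20)]
        + [rng.gauss(1.0, 0.1) for _ in range(20)])

weights, mus, variances, resp = em_gmm_1d(data, mus=[-0.5, 0.5])
# Hard variant call: the component with the highest responsibility.
calls = [max(range(len(r)), key=r.__getitem__) for r in resp]
print(mus, calls)
```

The responsibilities in `resp` also give a per-sample confidence for each call, which is one reason a mixture model can complement a hard clustering such as DBSCAN.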

Principal component analysis (PCA) is a mathematical tool for the analysis of multivariate data. It uses an orthogonal transformation to convert a set of observations (experimental data) of possibly correlated variables into linearly uncorrelated principal components. Each successive principal component describes a smaller portion of the variance in the data. The principal components we retained describe ~95% of the variance found in our datasets (data not shown).

Density-based spatial clustering of applications with noise (DBSCAN) is a density-based clustering algorithm: it finds a number of clusters starting from the estimated density distribution of the corresponding nodes. A cluster is defined by "density reachability", meaning that every point in a cluster can be reached from the core points of that cluster through a chain of neighbouring points. The DBSCAN algorithm is controlled by two parameters: ε and the minimum number of points required to form a cluster (minPts). The parameter ε defines the neighbourhood in which the algorithm searches for neighbours. If the ε-neighbourhood of a point contains at least minPts points, a cluster is formed or extended. A point with too few neighbours that cannot be reached from any such point is labelled as noise.

Experimental workflow

1. Reaction mix: HRM Mastermix (Eva Green dye), DNA sample (10 ng/sample), water and PCR primers; total volume 6 µL.
2. HRM curve capture (ABI 7500 Fast RT-PCR System).
3. PCA and clustering analysis.
4. DNA sequencing of candidate samples.
5. Confirmation of discovered genetic variants by genotyping (KBscience, KASPAR probes).

Example of Amplicon with one SNP

The results on the left compare variant calling by our HRM analysis algorithm (a) and by HRM v2.0.1 software (c). Both analyses were performed on data from the aligned HRM curves (b). When only one SNP is present in the amplicon, variant calling is accurate with the commercial software as well as with our method. One SNP does not pose any obstacle for the compared analysis methods, and the homozygous and heterozygous variants of c.47C>T (SOD2 p.Val16Ala, rs4880) were successfully distinguished. The confidence of the variant calling is relatively high, >80% (data not shown).

Example of Amplicon with two SNPs

When more than one SNP is present in the analysed amplicon, the melting profiles of the samples become more complex and the results are rather difficult to analyse. The pictures on the right show the results of the analysis of an amplicon with two different SNPs. Nine variants are possible when two SNPs are present, but only 4 variants are present in our analysed population. Our analytical algorithm (a) and the commercial software (c) both detected the 4 different clusters present in our population, but the commercial software had trouble clustering the samples correctly (see arrows on pictures a, b and c).

Conclusion

Our HRM analysis algorithm improves the quality of variant calling compared to commercial software. Moreover, our algorithm successfully distinguished a class IV SNP (A>T or T>A) in an amplicon where an additional SNP was present (data not shown). The demonstrated improvement in variant calling increases the success rate of novel SNP and mutation detection in population studies. We plan to automate the calculation of clustering parameters by introducing additional statistical and software solutions, which will further improve variant calling and make the analysis more user-friendly.
