ResqMi - a versatile algorithm and software for Resequencing Microarrays

Stephan Symons, Kirstin Weber, Michael Bonin and Kay Nieselt

GCB 2008, Dresden, September 9-12, 2008

September 12, 2008

1 / 28

Stephan Symons

ResqMi

Outline

1. Introduction
   - Resequencing Microarrays
2. ResqMi Calling Algorithms
   - Efficient calling algorithm
   - ReAnalyze
3. ResqMi
4. Results
   - Experiments
   - Results
5. Discussion

Introduction

Resequencing Microarrays

- Hybridization-based sequencing of known nucleotide sequences
- Applications:
  - Genetic diseases → analyze the respective genes
  - Infectious diseases → analyze characteristic sequences
- Major producers:
  - Affymetrix: GeneChip Sequence Analysis Platform
  - NimbleGen: CGS Platform
- Oligonucleotide arrays, probe length 25 (Affymetrix)
  - The base at position 13 interrogates the current position
  - 4 × 2 probes for each position in the target sequence: one for each possible base, as sense and antisense probes

Analysis Workflow

Workflow focussing on Affymetrix arrays (CEL and CHP files):

Hybridized Array → Scanning → Image Analysis → CEL File (Spot Intensities) → Base Calling → CHP File (Called Sequence)

- Crucial step: base calling
- Task: calculate the sequence from the intensities
- Naïve approach: call the base with the highest intensity
- The naïve algorithm makes unreliable calls on poor data
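The naïve rule can be sketched in a few lines of Python. The data layout (one dict of four probe intensities per position) and the name `naive_call` are assumptions made for illustration, not ResqMi's actual data structures.

```python
BASES = "acgt"

def naive_call(intensities):
    """Return the base whose probe shows the highest intensity.

    `intensities` maps each base (a, c, g, t) to the intensity of its
    probe at one position on one strand (hypothetical layout).
    """
    return max(BASES, key=lambda base: intensities[base])

# Hypothetical intensities for two positions on one strand.
clear_site = {"a": 5200.0, "c": 410.0, "g": 380.0, "t": 450.0}
poor_site = {"a": 510.0, "c": 495.0, "g": 120.0, "t": 130.0}

print(naive_call(clear_site))  # a
print(naive_call(poor_site))   # a, although c is almost as bright
```

The second site shows why the naïve rule alone is unreliable: the winning base leads by only a few percent.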

[Figure: probe intensities for the bases A, C, G and T on the sense and antisense strands, with the resulting called sequence ...A C...]

Calling Algorithms

- ABACUS: Adaptive Background Genotype Calling Scheme [Cutler et al. 2001]
  - Data integrity checks to filter out sites of poor quality
  - Likelihood-based method
- Model-P [Zhan et al. 2005]
  - Employs a physical model based on the target sequence
- All calling algorithms perform quite well.
- But: no-call ratios of ≈ 5% [Affymetrix 2006] → ≥ 500 bases per experiment to inspect.
- Manual inspection and editing is time consuming.


Problems and opportunities

Applications for the analysis of resequencing arrays should provide:
- Efficiency: fast base calling with the fewest possible ambiguous calls
- Usability: user-friendly interaction, swift navigation through sequence and intensity data, identification of the impact of mutations
- Visualization: overview and position-specific visualization of intensity and sequence data

Here we present:
- ResqMi: a free and extensible application offering calling algorithms, visualization, easy navigation and editing
- An efficient base calling algorithm
- ReAnalyze: a scheme to revise no-calls


ResqMi Calling Algorithms

Calling Algorithm

- Model-based algorithm, based on the work of Clark et al. 2007, originally used for whole-genome SNP detection
- General rationale: use naïve calls unless quality is poor
- Two quality measures:
  - Conformance
  - Ratio of brightest to second-brightest intensity
- Natural approach, similar to manual editing
- Extensively applied to whole-genome SNP detection

Conformance

- Non-reference calls: lower intensities in adjacent positions
- Less reliable signals around mismatches [Hacia et al. 1999]
- Calls are reliable if the neighbors are reference calls
- Conformance $C_i$ = fraction of base calls equal to the reference in a sliding window around $i$; the window depends on the raw call:
  - Reference call: $i-10 \ldots i+10$
  - Otherwise: $i-20 \ldots i-10$ and $i+10 \ldots i+20$
- Threshold $\mu$ for $C_i$

[Figure: examples of a mismatch site and a stretch of match sites]
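A minimal Python sketch of the conformance measure follows. The slide does not say whether the window bounds are inclusive, whether position i itself counts, or how windows behave at the ends of a fragment; the choices below (inclusive bounds, i counted for reference calls, clipping at the ends) are assumptions, as is the function name.

```python
def conformance(raw_calls, reference, i):
    """Fraction of raw calls equal to the reference in the window around i.

    Assumptions (not stated on the slide): window bounds are inclusive,
    position i is counted for reference calls, and windows are clipped
    at the sequence ends.
    """
    if raw_calls[i] == reference[i]:
        window = list(range(i - 10, i + 11))
    else:
        window = list(range(i - 20, i - 9)) + list(range(i + 10, i + 21))
    positions = [j for j in window if 0 <= j < len(reference)]
    if not positions:
        return 0.0  # assumption: no usable neighbors means no support
    matches = sum(raw_calls[j] == reference[j] for j in positions)
    return matches / len(positions)

reference = "acgt" * 15                        # 60 hypothetical bases
raw = reference[:30] + "t" + reference[31:]    # one mismatch at position 30

print(conformance(raw, reference, 30))  # flanking windows of the mismatch
print(conformance(raw, reference, 25))  # window that contains the mismatch
```

For the mismatch at position 30 only the two flanking windows are inspected, so an isolated mismatch inside an otherwise clean stretch still receives full conformance.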


Intensities Ratio

- Naïve base calls are unreliable if bases have nearly equally bright intensities
- Small differences might be due to chance
- Positions with larger differences are more reliable
- Ratio of the brightest ($P_i$) to the second-brightest ($Q_i$) intensity:

  $\Delta_i = \frac{P_i}{Q_i}$

  → signal-to-noise ratio
- Threshold $\nu$ for $\Delta_i$

[Figure: examples of ambiguous sites and clearly distinguishable sites]
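The ratio can be computed directly from the four probe intensities of a position. The sketch below is illustrative only: the function name and the handling of a zero runner-up intensity are assumptions, and 1.25 is one of the ν values explored in the talk.

```python
def intensity_ratio(intensities):
    """Return P/Q, the brightest over the second-brightest intensity."""
    second, brightest = sorted(intensities)[-2:]
    if second <= 0:
        return float("inf")  # assumption: a dark runner-up is unambiguous
    return brightest / second

NU = 1.25  # one of the threshold values explored in the talk

ambiguous = [510.0, 495.0, 120.0, 130.0]   # hypothetical intensities
clear = [5200.0, 410.0, 380.0, 450.0]

for site in (ambiguous, clear):
    delta = intensity_ratio(site)
    print(round(delta, 2), delta > NU)
```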


Strict Call

For position $i$, consider the raw base call $B_i$. Using thresholds $\mu$ for the conformance and $\nu$ for the intensities ratio:
- A strict call $S_i = B_i$ is made if $C_i > \mu$ and $\Delta_i > \nu$
- If $C_i$ or $\Delta_i$ is below its threshold: $S_i = n$
- If $B_i$ is non-reference and there is a non-reference call with higher intensity within $i-5 \ldots i+5$: $S_i = n$

→ Filters for reliable calls.
→ Rejects non-reference calls that are likely due to chance.
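The three rules can be sketched for one strand as follows. The per-position input lists, the function name and the default thresholds (µ = 0.8 and ν = 1.25, the values reported as giving good results) are assumptions for illustration; this is not ResqMi's implementation.

```python
def strict_calls(raw, reference, conf, delta, brightest, mu=0.8, nu=1.25):
    """Apply the three strict-call rules to one strand.

    All arguments are per-position sequences of equal length: raw calls,
    reference bases, conformance C_i, ratio Delta_i and the brightest
    intensity P_i (hypothetical layout).
    """
    n = len(raw)
    strict = []
    for i in range(n):
        call = raw[i]
        if conf[i] <= mu or delta[i] <= nu:
            call = "n"
        elif raw[i] != reference[i]:
            # Reject if a brighter non-reference raw call lies within i-5..i+5.
            for j in range(max(0, i - 5), min(n, i + 6)):
                if j != i and raw[j] != reference[j] and brightest[j] > brightest[i]:
                    call = "n"
                    break
        strict.append(call)
    return "".join(strict)

reference = "acgtacgtacgt"
raw = "acgtatgtaagt"               # non-reference raw calls at positions 5 and 9
conf = [0.9] * 12
delta = [1.1] + [2.0] * 11         # position 0 has an ambiguous ratio
brightest = [1000.0] * 12
brightest[5], brightest[9] = 900.0, 2000.0

print(strict_calls(raw, reference, conf, delta, brightest))  # ncgtangtaagt
```

Position 0 fails the ratio threshold, and the weaker of the two neighboring non-reference calls (position 5) is rejected in favor of the brighter one at position 9.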

Consensus Call

- Compare the strict calls from the sense ($S_i^s$) and antisense ($S_i^a$) strands
- If $S_i^s$ is complementary to $S_i^a$, call the respective base
- A non-reference call is rejected if there is a brighter non-reference call within $i-5 \ldots i+5$
- If the strict calls of the two strands differ:
  1. Strict consensus call: return n
  2. Relaxed consensus call: return the IUPAC code covering both bases. Example: $S_i^s$ = a, $S_i^a$ = c (complement g), return r
- The resulting sequence can be used for further analysis
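A sketch of the consensus step for a single position, using the standard IUPAC two-base codes. The slide does not say how an n on one strand is combined with a base on the other; returning n in that case is an assumption, and the neighborhood rejection of non-reference calls is left out for brevity.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a", "n": "n"}

# IUPAC codes for unordered pairs of distinct bases.
IUPAC = {
    frozenset("ag"): "r", frozenset("ct"): "y", frozenset("cg"): "s",
    frozenset("at"): "w", frozenset("gt"): "k", frozenset("ac"): "m",
}

def consensus_call(sense, antisense, relaxed=False):
    """Combine the strict calls of both strands at one position.

    `antisense` is given as read on the antisense strand and is
    complemented before comparison. Returning n when either strand
    is n is an assumption, not taken from the slide.
    """
    other = COMPLEMENT[antisense]
    if sense == "n" or other == "n":
        return "n"
    if sense == other:
        return sense
    return IUPAC[frozenset((sense, other))] if relaxed else "n"

print(consensus_call("a", "t"))                # a
print(consensus_call("a", "c"))                # n
print(consensus_call("a", "c", relaxed=True))  # r
```

The last line reproduces the slide's example: sense a and antisense c (sense-strand g) give the IUPAC code r.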

Base Calling

For all positions $i$ in both strands:
1. Calculate raw (naïve) calls
2. Calculate conformance using the raw calls
3. Calculate intensities ratios
4. Produce strict calls

Using the strict calls from both strands:
5. Build consensus calls

[Figure: flowchart. For each strand: Raw Data → Raw Calls → Conformance and Ratio → Strict Call; both strands → Consensus Call → Called Sequence]

ReAnalyze

- Any calling algorithm produces n calls
- Many ns are easily resolvable:
  1. Homozygous: the highest intensities correspond to complementary bases
  2. The call would be a reference call
  3. High $\Delta$: clear signal in both strands
- ReAnalyze: call the base if all criteria are satisfied
- Relaxed version of the calling algorithm
- Useful to revise data called by any algorithm
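One possible reading of the three criteria, sketched in Python for a single position. The data layout and function names are hypothetical; the default cutoff of 1.2 is the empirically determined value reported in the results, and applying it with ≥ to both strands is an assumption.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}

def brightest_and_ratio(intensities):
    """Return (brightest base, ratio of brightest to second-brightest)."""
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    first, second = intensities[ranked[0]], intensities[ranked[1]]
    ratio = first / second if second > 0 else float("inf")
    return ranked[0], ratio

def reanalyze(call, reference_base, sense, antisense, nu=1.2):
    """Revise an n call if all three criteria hold; otherwise keep the call.

    `sense` and `antisense` map bases to probe intensities at one
    position (hypothetical layout).
    """
    if call != "n":
        return call
    s_base, s_ratio = brightest_and_ratio(sense)
    a_base, a_ratio = brightest_and_ratio(antisense)
    homozygous = COMPLEMENT[a_base] == s_base          # criterion 1
    is_reference = s_base == reference_base            # criterion 2
    clear_signal = s_ratio >= nu and a_ratio >= nu     # criterion 3
    return s_base if homozygous and is_reference and clear_signal else "n"

sense = {"a": 900.0, "c": 300.0, "g": 250.0, "t": 200.0}
antisense = {"a": 210.0, "c": 260.0, "g": 240.0, "t": 800.0}
weak_antisense = {"a": 210.0, "c": 700.0, "g": 240.0, "t": 800.0}

print(reanalyze("n", "a", sense, antisense))       # a
print(reanalyze("n", "c", sense, antisense))       # n: not a reference call
print(reanalyze("n", "a", sense, weak_antisense))  # n: antisense ratio too low
```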


ResqMi

Presenting ResqMi

- “Resequencing using Microarrays”
- Focuses on the Affymetrix GeneChip Sequence Analysis platform
- Interface optimized for usability
  - Editing of the called sequence
  - Fast navigation around bases
- Base calling algorithms: model-based, ReAnalyze
- Visualization and quality control
- Freely available for Windows, MacOS, Linux

Sequence Browsing and editing

- Sequences organized fragment-wise
- Overview of fragment
- Editable sequence table
- Intensity windows connected to adjust to the currently selected position

Quality Control

- Plots of CEL files
  - Intensity, standard deviation, coefficient of variation
  - Highest base per position, base color intensities
- Reference R vs. target T
  - M-values: $M = \log_2(R/T)$
  - A-values: $A = (\log_2 R + \log_2 T)/2$
- Enlarge and export images
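As a worked example of the two formulas, the following Python sketch computes M and A for one pair of intensities. The function name and the sample values are illustrative assumptions.

```python
import math

def ma_values(reference, target):
    """Return (M, A) for one pair of positive intensities R and T."""
    m = math.log2(reference / target)
    a = (math.log2(reference) + math.log2(target)) / 2
    return m, a

# R = 1024 and T = 256: the reference is four times brighter (M = 2),
# and the mean log intensity is (10 + 8) / 2 = 9.
print(ma_values(1024.0, 256.0))  # (2.0, 9.0)
```

Equal intensities give M = 0, so points far from the M = 0 line mark positions where reference and target arrays disagree.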


ResqMi in Detail

- Visualization: intensities profile, spike, sequence logo
- Detailed table of calls: calls, quality, force, original calls
- Annotations for the called sequence
- Overview of all sequences


Results

Experiments

Name   Target                Bases   Fragm.   Exp.
CFTR   CFTR                   9511       84     17
Mito   Human Mitochondrium   37756      480     14
SARS   SARS Coronavirus      30588        3     44

- For CFTR and Mito, calls made using Affymetrix software were available
- Exploring the parameter space: µ ∈ {0.5, 0.6, 0.7, 0.8, 0.9} and ν ∈ {1.01, 1.05, 1.1, 1.25, 1.33} in each combination, for strict and relaxed consensus calls
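The size of the explored parameter space follows from the two threshold sets and the two consensus modes. A short Python sketch (variable names are illustrative) enumerates the settings:

```python
from itertools import product

MU = (0.5, 0.6, 0.7, 0.8, 0.9)          # conformance thresholds
NU = (1.01, 1.05, 1.1, 1.25, 1.33)      # intensity ratio thresholds
CONSENSUS = ("strict", "relaxed")

grid = list(product(MU, NU, CONSENSUS))
print(len(grid))  # 50 settings per experiment
print(grid[0])    # (0.5, 1.01, 'strict')
```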


No call rate

No call ratio: rate of n calls
- Linear effect of µ and ν on the calling rate
- µ has a smaller effect than ν; µ = 0.9: many ns
- Strict calls: more n; relaxed calls: more ambiguous calls

[Figure: mean no-call ratio (0.0 to 0.5) for ν = 1.01, 1.05, 1.1, 1.25, 1.33 in panels µ = 0.5 to µ = 0.9. Series: CFTR, Mito and SARS with relaxed and strict consensus; CFTR and Mito with Affymetrix calls.]

Non-reference call rate

Non-reference call: rate of non-reference calls
- Parameters have a similar impact on non-reference call rates
- Conformance cutoff: more effect than ratio cutoff
- Consensus method: heavy impact

[Figure: mean non-reference ratio (0.00 to 0.15) for ν = 1.01, 1.05, 1.1, 1.25, 1.33 in panels µ = 0.5 to µ = 0.9. Series: CFTR, Mito and SARS with relaxed and strict consensus; CFTR and Mito with Affymetrix calls.]

Results

- Concordance between calls and reference: Cohen's κ ≥ 0.5
- Results comparable to Affymetrix
- Relaxing thresholds: little effect, $C_i \approx 0.25$ for some $i$
- Data called using parameters µ = 0.8 and ν = 1.25 (κ ≥ 0.54)
- Applying ReAnalyze:
  - Cutoff ν ≥ 1.2 (empirically determined)
  - Reduces the n-rate by up to 9.35%
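Cohen's kappa corrects the observed agreement between two label sequences for the agreement expected by chance. The sketch below shows the standard formula on a toy example; how the talk mapped calls to categories (for instance, whether n counts as its own label) is not stated, so treating every symbol as a category is an assumption.

```python
from collections import Counter

def cohens_kappa(calls, reference):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(calls) != len(reference) or not calls:
        raise ValueError("sequences must be non-empty and of equal length")
    total = len(calls)
    observed = sum(c == r for c, r in zip(calls, reference)) / total
    call_freq, ref_freq = Counter(calls), Counter(reference)
    # Chance agreement: product of the marginal label frequencies.
    expected = sum(call_freq[label] * ref_freq[label] for label in call_freq) / total**2
    if expected == 1.0:
        return 1.0  # both sequences consist of one identical label
    return (observed - expected) / (1 - expected)

reference = "aaccggtt"   # hypothetical reference
calls = "aaccggtn"       # one position left uncalled

print(round(cohens_kappa(calls, reference), 2))  # 0.84
```

Here the observed agreement is 7/8 and the chance agreement 14/64, which gives κ = 0.84.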


Discussion

Calling algorithm

- Call ratios better than or comparable to Affymetrix
- Good concordance with the reference
- Faster, easier handling in ResqMi
- Impact of parameters is linear and predictable
- µ = 0.8 and ν = 1.25 give good results
- ReAnalyze can further improve the calling rate

Take home message

- Download ResqMi!
- It's free, open source software
- Fast calling (5 min for our datasets); other software takes far longer
- Fast and flexible navigation, editing, data handling
- Helps identify the impact of mutations
- ReAnalyze reassesses all called sequences
- Visualization, Quality Control, Export...

www-ps.informatik.uni-tuebingen.de/resqmi

Acknowledgements

- Kirstin Weber
- Michael Bonin
- Kay Nieselt
- Gunnar Rätsch

Thank you!
