S9 Appendix. Simulation Set C (variable size). - PLOS


[Figure, panels (a)-(d). Y-axes: FPR, sensitivity, PPV, F1.score (each 0.00 to 1.00). X-axis: association test (Fisher, X2, GC, PCA, DAPC, CMH, treeWAS, Score 1, Score 2, Score 3).]
Performance by association test (Set C, accessory genome). Each association testing method was applied to a set of simulated datasets containing 100 individuals and 5,000 genetic loci, a size typical of gene presence-or-absence data (N = 80). Datasets were simulated with a relatively high level of recombination, R = 0.2, so that performance could be examined under conditions of frequent gain and loss of genetic elements.
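The four metrics plotted in this appendix (FPR, sensitivity, PPV, F1.score) can be illustrated with a minimal sketch using their standard definitions from confusion-matrix counts. The function name and the example counts are hypothetical and do not come from the simulations; the appendix does not state how undefined ratios were handled, so returning NaN for a zero denominator is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return FPR, sensitivity, PPV and F1 score from confusion counts.

    tp/fp/tn/fn are counts of loci called true positive, false positive,
    true negative and false negative (hypothetical inputs).
    """
    def ratio(num, den):
        # Assumption: an undefined ratio (zero denominator) is reported as NaN.
        return num / den if den else float("nan")

    return {
        "FPR": ratio(fp, fp + tn),            # false positives among all true nulls
        "sensitivity": ratio(tp, tp + fn),    # true positives among all true associations
        "PPV": ratio(tp, tp + fp),            # true positives among all positive calls
        "F1.score": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


# Illustrative counts for one dataset of 5,000 loci (made-up numbers).
metrics = classification_metrics(tp=8, fp=2, tn=4988, fn=2)
```

With these made-up counts, sensitivity, PPV, and F1.score are all 0.8, while FPR is 2/4990, which shows how a method can have a very low FPR and still miss a fifth of the true associations.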

[Figure, panels (a)-(d). Y-axes: FPR (0.00 to 0.15), sensitivity, PPV, F1.score (each 0.00 to 1.00). X-axis: Number of Individuals (50-75, 76-100, 101-125, 126-150, 151-175, 176-200).]
Performance by number of individuals (Set C). Each association testing method was applied to datasets simulated across a range of sizes, with the number of individuals varying from 50 to 200 and the number of loci from 10,000 to 100,000 (N = 80). All simulated datasets in this figure were generated with R = 0.01. The interquartile mean performance of each association testing method is shown by number of individuals.
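As a rough illustration of summarising performance by dataset size, the sketch below computes an interquartile mean per size bin. It assumes the common convention of dropping the lowest and highest quarter of the sorted values and averaging the rest; the appendix does not specify the exact trimming rule, and the function names, bin edges, and scores here are hypothetical.

```python
from collections import defaultdict


def interquartile_mean(values):
    """Mean of the values left after trimming the lowest and highest quarter."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    k = n // 4  # values trimmed from each tail (assumed convention)
    middle = ordered[k:n - k]
    return sum(middle) / len(middle)


def iqm_by_bin(records, bins):
    """Group (size, score) records into inclusive (low, high) bins and
    return the interquartile mean score for each non-empty bin."""
    grouped = defaultdict(list)
    for size, score in records:
        for low, high in bins:
            if low <= size <= high:
                grouped[(low, high)].append(score)
                break  # each record belongs to at most one bin
    return {b: interquartile_mean(scores) for b, scores in grouped.items()}


# Made-up (number of individuals, score) pairs for one method.
records = [(60, 0.5), (70, 0.7), (120, 0.9)]
summary = iqm_by_bin(records, bins=[(50, 75), (101, 125)])
```

The same helper applies unchanged to bins over the number of loci; only the first element of each record and the bin edges differ.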

[Figure, panels (a)-(d). Y-axes: FPR (0.00 to 0.15), sensitivity, PPV, F1.score (each 0.00 to 1.00). X-axis: Number of Loci (thousands) (10-30, 30-50, 50-70, 70-90, 90-100).]
Performance by number of genetic loci (Set C). Each association testing method was applied to datasets simulated across a range of sizes, with the number of individuals varying from 50 to 200 and the number of loci from 10,000 to 100,000 (N = 80). All simulated datasets in this figure were generated with R = 0.01. The interquartile mean performance of each association testing method is shown by number of genetic loci.