Supplementary file for:
A comparison of machine learning methods for classification using simulation with multiple real data examples from mental health studies
Mizanur Khondoker, Richard Dobson, Caroline Skirrow, Andrew Simmons, Daniel Stahl
[Figure S1 (plot): boxplot panels for kNN, LDA, SVM and RF; y-axis: Bias (%), 0 to 10; x-axis categories: 10-fold CV and leave-one-out.]
Figure S1: Boxplots of bias as a percentage of the true error (0.50) in leave-one-out and 10-fold CV error rates for classifying two groups of observations with no systematic differences between the groups. All estimates are averages over 500 repeats of simulated datasets. Data were simulated from a normal distribution according to model (1) with a training set size of 100 (50 in each group). Each box plot shows the distribution of bias over the various numbers of variables (1 to 50) considered in classifying the subjects.
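The null simulation behind Figure S1 can be sketched in a few lines. This is an illustrative stand-in, not the paper's own code: it uses a simple 1-nearest-neighbour classifier in place of the four methods compared, and only a single simulated dataset rather than 500 repeats.

```python
import random

def loocv_error_1nn(X, y):
    """Leave-one-out error of a 1-nearest-neighbour classifier
    (squared Euclidean distance)."""
    errors = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            if d < best_d:
                best_d, best_j = d, j
        if y[best_j] != y[i]:
            errors += 1
    return errors / len(X)

random.seed(1)
p = 5  # number of purely noise features (no group difference)
# Two groups of 50 drawn from the same N(0, 1) distribution,
# so the true classification error is 0.50 by construction.
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(100)]
y = [0] * 50 + [1] * 50
est = loocv_error_1nn(X, y)
bias = est - 0.50  # bias of the LOOCV estimate, as plotted in Figure S1
```

Averaging `bias` over many simulated datasets (the figure uses 500 repeats) gives the quantity summarised in each box plot.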
[Figure S2 (plot): line plots comparing RF, SVM, LDA and kNN; panel titles: Leave-one-out error (n=100, p=5) and Leave-one-out error (n=100, p=50); x-axes: biological variability (σb), within-subject variability (σe), lower bound of effect size (δ), replication (r), correlation (ρ); y-axis: average error rate.]
Figure S2: Average leave-one-out cross-validation error at varying levels of different data characteristics: biological variability (σb), experimental noise (σe), lower bound of effect size (δ), replication (r), and correlation between variables (ρ). Top and bottom panels correspond to feature set sizes of 5 and 50 variables, respectively, with a common training set size (n=100). Each plot compares error rates for the four methods at varying levels of the parameter shown on the x-axis, with the other parameters held at given values. The given values are selected from the set (n = 100, σb = 2.5, σe = 1.5, θmin = 2, r = 3), the correlation structure being of the hub-Toeplitz form (except for the plots against ρ, which are based on a single-block exchangeable correlation matrix to make the plot against ρ meaningful).
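The parameters in the caption (σb, σe, δ, r) suggest an additive two-level data-generating process. A minimal sketch of such a generator follows; the exact form of model (1) is in the main paper, so this particular decomposition is an illustrative assumption rather than the paper's specification.

```python
import random

def simulate_subject(p, delta, sigma_b, sigma_e, r, group):
    """One subject's p-vector under an additive two-level decomposition:
    group effect + biological (between-subject) noise + experimental
    noise averaged over r technical replicates. Illustrative only; the
    exact form of model (1) is given in the main paper."""
    x = []
    for _ in range(p):
        mu = delta if group == 1 else 0.0  # effect fixed at its lower bound
        b = random.gauss(0, sigma_b)       # biological variability
        e = sum(random.gauss(0, sigma_e) for _ in range(r)) / r
        x.append(mu + b + e)
    return x

random.seed(0)
# The "given values" from the caption: sigma_b = 2.5, sigma_e = 1.5, r = 3.
data = [simulate_subject(p=5, delta=2.0, sigma_b=2.5, sigma_e=1.5, r=3,
                         group=g) for g in [0] * 50 + [1] * 50]
```

Averaging over `r` replicates shrinks the experimental-noise variance to σe²/r, which is why increasing replication improves all four classifiers in the figure.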
[Figure S3 (plot): line plots comparing RF, SVM, LDA and kNN; panel titles: Standard error (SE) of error estimates (n=100, p=25) and Standard error (SE) of error estimates (n=100, p=75); x-axes: biological variability (σb), within-subject variability (σe), lower bound of effect size (δ), replication (r), correlation (ρ); y-axis: SE of error estimates.]
Figure S3: Standard errors of leave-one-out cross-validation error estimates at varying levels of different data characteristics. Top and bottom panels correspond to feature set sizes of 25 and 75, respectively, with a common training sample size (n=100). Each plot compares SEs of leave-one-out error estimates for the four methods at varying levels of the parameter shown on the x-axis, with the other parameters held at given values. The given values are selected from the set (n = 100, σb = 2.5, σe = 1.5, θmin = 2, r = 3), the correlation structure being of the hub-Toeplitz form (except for the plots against ρ, which are based on a single-block exchangeable correlation matrix to make the plot against ρ meaningful). For p = 75, SVM provides the most stable (lowest SE) estimates of leave-one-out error. LDA was found to give the most precise estimates of error rates for p = 25 and higher correlations between features, but appears to be least stable for larger feature set size to sample size ratios (p/n > 0.5). The strength of RF is visible in situations where the data are more variable and have smaller effect sizes, in which cases it provides more stable error estimates than kNN and LDA.
[Figure S4 (plot): line plots comparing RF, SVM, LDA and kNN; panel titles: Sensitivity (n=100, p=25) and Sensitivity (n=100, p=75); x-axes: biological variability (σb), within-subject variability (σe), lower bound of effect size (δ), replication (r), correlation (ρ); y-axis: average sensitivity.]
Figure S4: Average sensitivity at varying levels of different data characteristics: biological variability (σb), experimental noise (σe), lower bound of effect size (δ), replication (r), and correlation between variables (ρ). Top and bottom panels correspond to feature set sizes of 25 and 75 variables, respectively, with a common training set size (n=100). Each plot compares sensitivity for the four methods at varying levels of the parameter shown on the x-axis, with the other parameters held at given values. The given values are selected from the set (n = 100, σb = 2.5, σe = 1.5, θmin = 2, r = 3), the correlation structure being of the hub-Toeplitz form (except for the plots against ρ, which are based on a single-block exchangeable correlation matrix to make the plot against ρ meaningful).
[Figure S5 (plot): line plots comparing RF, SVM, LDA and kNN; panel titles: Specificity (n=100, p=25) and Specificity (n=100, p=75); x-axes: biological variability (σb), within-subject variability (σe), lower bound of effect size (δ), replication (r), correlation (ρ); y-axis: average specificity.]
Figure S5: Average specificity at varying levels of different data characteristics: biological variability (σb), experimental noise (σe), lower bound of effect size (δ), replication (r), and correlation between variables (ρ). Top and bottom panels correspond to feature set sizes of 25 and 75 variables, respectively, with a common training set size (n=100). Each plot compares specificity for the four methods at varying levels of the parameter shown on the x-axis, with the other parameters held at given values. The given values are selected from the set (n = 100, σb = 2.5, σe = 1.5, θmin = 2, r = 3), the correlation structure being of the hub-Toeplitz form (except for the plots against ρ, which are based on a single-block exchangeable correlation matrix to make the plot against ρ meaningful).
[Figure S6 (plot): average leave-one-out error plotted against sample size (n = 10 to 200) for RF, SVM, LDA and kNN; five panels: p=5, p=25, p=50, p=75, p=100; x-axis: sample size (n); y-axis: average error rate (leave-one-out).]
Figure S6: (Non-normal data) Average leave-one-out cross-validation error at varying levels of training sample and feature set sizes. Error rates are plotted against 9 different values of n as given in Table 1 (main paper). The five plots correspond to five different values of p (feature set size): 5, 25, 50, 75 and 100, respectively. The patterns and order of performance for Poisson data look very similar to those we observed for Gaussian data (Figure 2, main paper). Although SVM performs better than the other methods for larger feature set sizes (relative to sample size), the method can perform poorly if the sample size is too small (n < 20). LDA can theoretically handle feature set sizes as high as the sample size (p < n), but its performance seems to deteriorate as the feature set size (p) grows beyond approximately half the sample size (e.g., see the plot for p = 25).
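The Poisson setting of Figure S6 can be sketched as follows. This is a hedged illustration, not the paper's code: a nearest-centroid classifier stands in for the four methods compared, and the Poisson rates are invented for the example.

```python
import math
import random

def loocv_error_centroid(X, y):
    """Leave-one-out error of a nearest-centroid classifier (a simple
    stand-in for illustration; the paper compares RF, SVM, LDA and kNN).
    Assumes two groups labelled 0/1 with at least two members each."""
    n, p = len(X), len(X[0])
    errors = 0
    for i in range(n):
        sums = {0: [0.0] * p, 1: [0.0] * p}
        counts = {0: 0, 1: 0}
        for j in range(n):  # centroids computed with observation i held out
            if j == i:
                continue
            counts[y[j]] += 1
            for k in range(p):
                sums[y[j]][k] += X[j][k]
        dist = {g: sum((X[i][k] - sums[g][k] / counts[g]) ** 2
                       for k in range(p)) for g in (0, 1)}
        pred = 0 if dist[0] <= dist[1] else 1
        errors += pred != y[i]
    return errors / n

def poisson(lam):
    """Poisson draw via Knuth's algorithm (the stdlib has no sampler)."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

random.seed(2)
p = 5
# Illustrative rates only: group means of 3 vs 8 give well-separated counts.
X = [[float(poisson(3.0)) for _ in range(p)] for _ in range(50)] + \
    [[float(poisson(8.0)) for _ in range(p)] for _ in range(50)]
y = [0] * 50 + [1] * 50
err = loocv_error_centroid(X, y)  # near 0 for this easy configuration
```

Repeating this over the grid of n and p values, with realistic rather than illustrative group separations, yields curves of the kind shown in the five panels.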