A one-way ANOVA test for functional data with graphical interpretation

Mari Myllymäki, based on joint work with Tomáš Mrkvička and Ute Hahn

SSIAB12, 23-25.5.2018, Aalborg

Luonnonvarakeskus

Naturresursinstitutet

Natural Resources Institute Finland

www.luke.fi

The problem

• Classical yearly temperature curves.
• The water temperature data were sampled every day for the last 36 years at the water level of the Rimov reservoir in the Czech Republic.
• The water temperature data are naturally smoothed.

Figure: The temperature curves in the three groups. Panels: First group 79−90, Second group 91−02, Third group 03−14. Axes: days in a year (x), temperature (y).

The problem

We have $J$ groups which contain $n_1, \dots, n_J$ functions, and we denote the functions by $T_{ij}$, $j = 1, \dots, J$, $i = 1, \dots, n_j$. Assume that $\{T_{ij}, i = 1, \dots, n_j\}$ is an i.i.d. sample from a stochastic process $SP(\mu_j, \gamma_j)$ with a mean function $\mu_j$ and a covariance function $\gamma_j(s, t)$, $s, t \in R$, for $j = 1, \dots, J$. We want to test the hypothesis

$$H_0 : \mu_1(r) = \dots = \mu_J(r), \quad r \in R.$$


Solutions

• Asymptotic version of the ANOVA F-test (AsF)
• Random univariate projection method (RPM)
• Bootstrap procedures based on pointwise F-test
• Wavelet smoothing techniques
• Dimension reduction
• Permutation F-max test
• Permutation p-min test

But none of the available methods is able to give a graphical interpretation of the test results in the original space of the functions, which could help the user understand the reasons for a potential rejection and when or where the potential differences appear.


Our solution

We propose to base the ANOVA test on global rank envelope tests (Myllymäki et al., 2017; Mrkvička et al., 2017). We need to choose appropriate test functions for the test, and be able to generate simulations under the null hypothesis.

Myllymäki M., Mrkvička T., Grabarnik P., Seijo H., Hahn U. (2017) Global envelope tests for spatial processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79: 381–404. doi: 10.1111/rssb.12172

Mrkvička T., Myllymäki M., Hahn U. (2017) Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27 (5): 1239–1255. doi: 10.1007/s11222-016-9683-9


Our solution

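The hypothesis $H_0 : \mu_1(r) = \dots = \mu_J(r)$, $r \in R$, can be tested by the rank test where the test vector is taken to be

$$\mathbf{T} = (\bar{T}_1(\mathbf{r}), \bar{T}_2(\mathbf{r}), \dots, \bar{T}_J(\mathbf{r})),$$

where $\bar{T}_j(\mathbf{r}) = (\bar{T}_j(r_1), \dots, \bar{T}_j(r_K))$ is the average of the functions in the $j$th group, evaluated at the points $r_1, \dots, r_K$.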


Our solution

The simulations, which are necessary for applying the rank test, are produced by simple permutation of the raw functions $T_{ij}(r)$ among the groups. The observed and simulated (permuted) test vectors $\mathbf{T}_1, \dots, \mathbf{T}_s$ are exchangeable under permutation under the corresponding null hypothesis in the case of equal variances. In the following example, we used the global extreme rank length (ERL) envelope test (with $s = 2500$).


Temperature curves

Figure: Rank envelope test (using extreme rank length): 0.004. Panels 1, 2 and 3 show T(r) against r, with the data function and the central function.

Each subplot shows the comparison of a group with respect to the variability of all groups. The 95% global envelope demonstrates the variability of the average curve in the original space of curves.

The result corresponds to the finding that summer has come earlier in the last decade than in earlier decades.

Our solution

The hypothesis $H_0$ is equivalent to the hypothesis

$$H_0' : \mu_i(r) - \mu_j(r) \equiv 0, \quad i = 1, \dots, J-1, \quad j = i+1, \dots, J.$$

This hypothesis corresponds to the post-hoc test usually done after the ANOVA test is significant. However, this hypothesis can be tested directly by the rank test if the test vector is taken to consist of the differences of the group averages of test functions. In short,

$$\mathbf{T}' = (\bar{T}_1(\mathbf{r}) - \bar{T}_2(\mathbf{r}), \bar{T}_1(\mathbf{r}) - \bar{T}_3(\mathbf{r}), \dots, \bar{T}_{J-1}(\mathbf{r}) - \bar{T}_J(\mathbf{r})).$$
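A small sketch of the difference-based test vector in plain Python is given below. It assumes curves are stored as lists on a common grid, and the names `group_mean` and `difference_test_vector` as well as the toy data are hypothetical. The pairs are ordered 1−2, 1−3, ..., (J−1)−J, matching the order of the subplots.

```python
from itertools import combinations
from statistics import fmean


def group_mean(curves):
    """Pointwise average of curves sampled on the same K points."""
    return [fmean(values) for values in zip(*curves)]


def difference_test_vector(groups):
    """Concatenate mean_i - mean_j for all group pairs i < j."""
    means = [group_mean(curves) for curves in groups]
    vector = []
    for i, j in combinations(range(len(means)), 2):
        vector.extend(a - b for a, b in zip(means[i], means[j]))
    return vector


# Toy data (hypothetical): J = 3 groups, K = 2 evaluation points.
groups = [
    [[1.0, 2.0], [3.0, 4.0]],  # group mean (2, 3)
    [[0.0, 0.0], [2.0, 2.0]],  # group mean (1, 1)
    [[5.0, 5.0]],              # group mean (5, 5)
]
T_diff = difference_test_vector(groups)
print(T_diff)
```

The vector has J(J−1)/2 blocks of length K, so its length grows quadratically with the number of groups.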


Temperature curves

Figure: Rank envelope test (using extreme rank length): 0.003. Comparison of 3 groups of curves via difference of group averages. Panels 1−2, 1−3 and 2−3 show T(r) against r, with the data function and the central function.

The subplots correspond to the difference of the two group averages: negative values correspond to the situation where the second group average is higher than the first group average.

The test implicitly contains the post-hoc comparison of groups!

Our solution

Recall that both tests described above are done at one common exact significance level α, which means that it is not necessary to perform the ANOVA test prior to the post-hoc test. Instead, it is possible to apply only the post-hoc test, which answers both the overall ANOVA question and the question of which groups differ.


Power comparison - Simulation study

The proposed tests were compared with existing one-way ANOVA tests available in R.

We compared the tests with artificial examples of three groups for four different cases with two different errors (independent and Brownian errors) and six different contaminations (2000 permutations).

The estimated powers of our procedures were significantly greater than the powers of the AsF, RPM, Fb and GPF methods. They were similar to those of the p-min and F-max tests, because these tests are of a similar nature. Some of the tests have a different nature.

Further tests were done for ten groups. Lower power was observed, but increasing the number of permutations increased the power.


Correction for unequal variances

To deal with different variances of functions in different groups, consider the rescaled functions

$$S_{ij}(r) = \frac{T_{ij}(r) - \bar{T}(r)}{\sqrt{\mathrm{Var}(T_j(r))}} \cdot \sqrt{\mathrm{Var}(T(r))} + \bar{T}(r),$$

where the group sample variance $\mathrm{Var}(T_j(r))$ corrects the unequal variances, and the overall sample mean $\bar{T}(r)$ and overall sample variance $\mathrm{Var}(T(r))$ are involved to keep the mean and variability of the functions at the original scale.
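The rescaling can be sketched in plain Python as below. The sketch follows the formula as stated on this slide (centring on the overall sample mean) and assumes every group has at least two curves and a non-zero pointwise sample variance. The function name `rescale` and the toy data are hypothetical and not part of GET.

```python
from math import sqrt
from statistics import fmean, variance


def rescale(groups):
    """Rescale curves so that each group has the overall pointwise variance."""
    pooled = [curve for curves in groups for curve in curves]
    overall_mean = [fmean(v) for v in zip(*pooled)]
    overall_var = [variance(v) for v in zip(*pooled)]  # sample variance
    rescaled = []
    for curves in groups:
        group_var = [variance(v) for v in zip(*curves)]
        rescaled.append([
            [(t - m) / sqrt(gv) * sqrt(ov) + m
             for t, m, gv, ov in zip(curve, overall_mean, group_var, overall_var)]
            for curve in curves
        ])
    return rescaled


# Toy data (hypothetical): two groups with clearly different variability.
groups = [
    [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]],
    [[0.0, 0.0], [10.0, 20.0], [20.0, 40.0]],
]
rescaled = rescale(groups)
# Pointwise sample variance of each group after rescaling.
group_vars_after = [[variance(v) for v in zip(*curves)] for curves in rescaled]
print(group_vars_after)
```

After the rescaling, the pointwise sample variance of every group equals the overall sample variance of the original curves, which is the property the permutation argument relies on.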


Correction for unequal variances

The test vectors are exchangeable under permutation under the corresponding null hypotheses in the case of equal variances. → exact test

The test vectors are asymptotically exchangeable under permutation under the corresponding null hypotheses in the case of unequal variances. The asymptotics are taken over $\min_{j=1,\dots,J} n_j$. → asymptotically exact test


Comparison of groups of point patterns

To describe our approach, we reanalyse the data of Diggle et al. (1991) containing three groups of pyramidal neurons in the cingulate cortex of humans: the normal (control) group, the schizoaffective group and the schizophrenic group. One representative pattern from each group can be seen here:

Figure: One representative point pattern from each group. Panels: normal, schizoaffective, schizophrenic.

Comparison of groups of point patterns

We chose L-functions as our summary functions. Since the point patterns have different numbers of points, the L-functions are estimated with different precisions. The variance of the estimated L-function behaves approximately as $1/m_{ij}$, where $m_{ij}$ is the number of points in the $ij$-th point pattern. We scale the estimated centred L-functions,

$$S_{ij}(r) = \frac{\hat{L}^c_{ij}(r) - \bar{L}^c(r)}{\sqrt{1/m_{ij}}} \, \tau + \bar{L}^c(r),$$

where $\bar{L}^c(r)$ is the sample mean of all functions and $\tau = \sqrt{\sum_{ij} 1/m_{ij} \,/\, \sum_{ij} 1}$ is the estimator of the overall standard deviation of the functions. Then all $S_{ij}(r)$ have (approximately) equal variances.

Comparison of groups of point patterns

Figure: The original (top) and scaled (bottom) estimated centred L-functions in the three groups. Panels: normal, schizoaffective, schizophrenic. Axes: r (x), L̂(r) − r (y).

Comparison of groups of point patterns

Figure: Rank envelope test (using extreme rank length): 0.068. Axes: r (x), difference of group means of L̂(r) − r (y). Legend: data function, central function.

Rank envelope test for comparison of the three groups of L-functions via difference of group weighted averages using 2500 permutations. The left subplot corresponds to the difference between groups 1 and 2, the middle subplot to the difference between groups 1 and 3, and the right subplot to the difference between groups 2 and 3.

Discussion and conclusions

• The proposed tests are exact, i.e. the type I error of the test can be chosen a priori.
• The post-hoc comparison can be made together with the test.
• The global envelope helps to interpret the result of the test: it shows the area responsible for the rejection.
• The correction for unequal variances among the groups is also available.


R library GET

The methods are implemented in the R library GET. Get GET at

https://github.com/myllym/GET

The function graph.fanova can be used for the new graphical functional ANOVA.


References

Myllymäki M., Mrkvička T., Grabarnik P., Seijo H., Hahn U. (2017) Global envelope tests for spatial processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79: 381–404. doi: 10.1111/rssb.12172

Mrkvička T., Myllymäki M., Hahn U. (2017) Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27 (5): 1239–1255. doi: 10.1007/s11222-016-9683-9

Mrkvička T., Myllymäki M., Hahn U. A one-way ANOVA test for functional data with graphical interpretation. arXiv:1612.03608 [stat.ME] (http://arxiv.org/abs/1612.03608)


R library GET: rimov example data(rimov) groups