Changepoint Detection in Multinomial Logistic Regression with Application to Sky-Cloudiness Conditions in Canada
QiQi Lu (1), Xiaolan L. Wang (2)
(1) Department of Mathematics and Statistics, Mississippi State University, [email protected]
(2) Climate Research Division, ASTD, STB, Environment Canada, [email protected]
10th International Meeting on Statistical Climatology August 20-24, 2007, Beijing China
Lu, Wang
Changepoint Detection in Cloudiness Condition
Introduction Continuation-Ratio Logit Model with Random Effects Test Statistics Examples
Outline
1 Introduction
2 Continuation-Ratio Logit Model with Random Effects
3 Test Statistics
4 Examples
Cloudiness Conditions in Canada
The cloudiness condition has 11 categories (0, 1, ..., 10). It follows a multinomial distribution. Categories are ordered. Categories are hierarchically related.
Cloudiness Condition at Hay River A
T × K contingency table (T = 25, K = 11)
Assume product-multinomial sampling

Observed Annual Average Counts

Year     0    1    2    3    4    5    6    7    8    9   10   Total
1975    18   20   15   15   12    9   10   15   20   45   63     242
1976    14   27   17   15   14   10   11   15   20   46   55     243
1977    10   25   18   12   11    9   10   13   23   54   58     244
...
1997    32   18   11   11    9    9    7    9   15   42   80     243
1998    42   17   11   12    7    8    6    8   11   34   87     243
1999    27   16   13    9    8    9    5   10   16   39   90     242

Total sample size: 6085
[Figure: Frequencies of Cloudiness Condition at Hay River A. Individual category yearly frequency series at Hay River A, one panel per category (0 to 10). x-axis: Time of Observation (1975 to 1995); y-axis: frequency in the category (0 to 100).]
Changepoints
A changepoint is a time at which the structural pattern of a time series changes. Changepoints can be caused by moving a recording station, changing observers, or redefining the categories... Assume at most one changepoint (AMOC). For cloudiness conditions, the changes could occur in all categories, or just in some categories (at least two).
Objective
Develop a test for a changepoint in a sequence of independent multinomial variables. Identify which categories have experienced significant changes.
Continuation-Ratio Logits
\[
\operatorname{logit}(\omega_{tk}) = \log\frac{\omega_{tk}}{1-\omega_{tk}} = \log\frac{\pi_{tk}}{\pi_{t,k+1} + \cdots + \pi_{tK}}, \qquad k = 1, \ldots, K-1.
\]
$Y \in \{1, \ldots, K\}$ denotes an ordered categorical response variable.
$\pi_{tk}$, $k = 1, \ldots, K$, is the probability of an outcome in category $k$ at time $t$, with $\sum_{k=1}^{K} \pi_{tk} = 1$.
$\omega_{tk} = P(Y = k \mid Y \ge k, t) = \pi_{tk} / (\pi_{tk} + \pi_{t,k+1} + \cdots + \pi_{tK})$, $k = 1, \ldots, K-1$.
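As a minimal sketch of these definitions, the Python below computes empirical continuation ratios and their logits from one row of category counts, using the 1975 row of the Hay River A table. The function names are illustrative and not from the talk, empirical proportions stand in for the probabilities, and the table's category labels 0 to 10 correspond to k = 1, ..., 11 in the model's indexing.

```python
import math

def continuation_ratios(counts):
    """Return [omega_1, ..., omega_{K-1}] with omega_k = y_k / (y_k + ... + y_K)."""
    remaining = sum(counts)          # observations with Y >= k
    omegas = []
    for y in counts[:-1]:            # omega_K is 1 by construction, so it is skipped
        omegas.append(y / remaining)
        remaining -= y
    return omegas

def logit(p):
    return math.log(p / (1.0 - p))

def probabilities_from_ratios(omegas):
    """Invert the decomposition: pi_k = omega_k * prod_{j<k} (1 - omega_j)."""
    probs, survive = [], 1.0
    for w in omegas:
        probs.append(w * survive)
        survive *= 1.0 - w
    probs.append(survive)            # the last category takes what is left
    return probs

counts_1975 = [18, 20, 15, 15, 12, 9, 10, 15, 20, 45, 63]
omegas = continuation_ratios(counts_1975)
logits = [logit(w) for w in omegas]
probs = probabilities_from_ratios(omegas)
```

The round trip through `probabilities_from_ratios` recovers the original proportions, which illustrates that the K - 1 continuation ratios carry the same information as the K category probabilities.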
Continuation-Ratio Logit Random Effects Model
\[
\operatorname{logit}(\omega_{tk}) = \alpha_k + \beta_k t + \Delta_k 1_{[t>\tau]} + u_{tk}, \qquad k = 1, \ldots, K-1, \quad t = 1, \ldots, T
\]
Fixed effects:
i. $\alpha_k$ is the intercept and $\beta_k$ is the linear time trend.
ii. $\Delta_k$ is the changepoint parameter, with unknown changepoint time $\tau$.
Random effects:
i. $u_{tk}$ is the random effect.
ii. $u_t = (u_{t1}, \ldots, u_{t(K-1)})$ has a $N_{K-1}(0, \Sigma)$ distribution.
iii. $\operatorname{logit}(\omega_t)$, where $\omega_t = (\omega_{t1}, \ldots, \omega_{t(K-1)})$, has a multivariate binomial logit-normal distribution.
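To make the model concrete, here is a hedged Python sketch that simulates a T × K table from it by sequential binomial draws, one per continuation ratio. It assumes independent random effects with standard deviations `sigma[k]` (a diagonal Σ), and every parameter value below is hypothetical, chosen only for illustration.

```python
import math
import random

def simulate_counts(n_total, alpha, beta, delta, sigma, tau, T, rng):
    """Simulate a T x K table from the continuation-ratio logit random effects model.

    alpha, beta, delta, sigma are lists of length K-1; the random effects
    u_tk are drawn independently as N(0, sigma_k^2) (an assumption of this sketch).
    """
    K = len(alpha) + 1
    table = []
    for t in range(1, T + 1):
        at_risk = n_total            # observations with Y >= k
        row = []
        for k in range(K - 1):
            u = rng.gauss(0.0, sigma[k])
            eta = alpha[k] + beta[k] * t + delta[k] * (1 if t > tau else 0) + u
            omega = 1.0 / (1.0 + math.exp(-eta))
            # Binomial(at_risk, omega) draw: how many observations stop in category k
            y = sum(1 for _ in range(at_risk) if rng.random() < omega)
            row.append(y)
            at_risk -= y
        row.append(at_risk)          # everything left falls in category K
        table.append(row)
    return table

rng = random.Random(0)
table = simulate_counts(
    n_total=240,
    alpha=[-1.0, -0.5, 0.0],   # hypothetical values
    beta=[0.0, 0.01, 0.0],     # hypothetical values
    delta=[0.8, 0.0, -0.8],    # hypothetical values
    sigma=[0.1, 0.1, 0.1],     # hypothetical values
    tau=12, T=25, rng=rng,
)
```

Each row sums to `n_total`, matching product-multinomial sampling with a fixed yearly total.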
Interpretation of parameters
Conditional on the random effects, the trend $\beta_k$ is the log odds ratio of $\omega_{tk}$ between times $t + 1$ and $t$ when no sudden change occurs between them; a changepoint at $\tau$ adds $\Delta_k$ to the log odds ratio of $\omega_{tk}$ between times $\tau + 1$ and $\tau$.
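This interpretation can be checked by differencing the model equation at consecutive times. The worked step below assumes the random effects at the two times are held at the same value, which is how "conditional on the random effects" is read here.

```latex
\begin{align*}
\operatorname{logit}(\omega_{t+1,k}) - \operatorname{logit}(\omega_{tk})
  &= \beta_k + \Delta_k\left(1_{[t+1>\tau]} - 1_{[t>\tau]}\right) + (u_{t+1,k} - u_{tk}) \\
  &= \begin{cases}
       \beta_k, & t \neq \tau, \\
       \beta_k + \Delta_k, & t = \tau,
     \end{cases}
  \qquad \text{when } u_{t+1,k} = u_{tk}.
\end{align*}
```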
Why?
The continuation-ratio logit model is derived from an assumption about the underlying stepwise response mechanism (Tutz 1991). It has the advantage of being a simple decomposition of a multinomial distribution under some assumptions. In this study, we assume that
i. $u_{tk} \sim N(0, \sigma_k^2)$, $k = 1, \ldots, K-1$;
ii. the $u_{tk}$'s are independent.
Marginal Log-Likelihood Function
\[
\ell(\theta, \Sigma) = \sum_{t=1}^{T} \sum_{k=1}^{K-1} h(\alpha_k, \beta_k, \Delta_k, \sigma_k),
\]
$\theta = (\alpha_1, \ldots, \alpha_{K-1}, \beta_1, \ldots, \beta_{K-1}, \Delta_1, \ldots, \Delta_{K-1})$; $\Sigma$ contains the parameters $\sigma_1, \ldots, \sigma_{K-1}$.
\[
h(\alpha_k, \beta_k, \Delta_k, \sigma_k) = \log \int_{-\infty}^{+\infty}
\left[ \frac{\exp(\alpha_k + \beta_k t + \Delta_k 1_{[t>\tau]} + u_{tk})}{1 + \exp(\alpha_k + \beta_k t + \Delta_k 1_{[t>\tau]} + u_{tk})} \right]^{y_{tk}}
\left[ \frac{1}{1 + \exp(\alpha_k + \beta_k t + \Delta_k 1_{[t>\tau]} + u_{tk})} \right]^{M_{tk} - y_{tk}}
\sigma_k^{-1} e^{-u_{tk}^2 / (2\sigma_k^2)} \, du_{tk}
\]
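The integral inside h has no closed form, so it has to be evaluated numerically. The Python sketch below uses a plain trapezoidal rule in log space; the talk does not say which integration method was used, so this is one possible choice. Two assumptions are made here: M_tk is taken to be the number of observations at time t in category k or higher, and the normal density includes its 1/sqrt(2π) constant, which shifts h by a constant and cancels in likelihood ratios.

```python
import math

def log1pexp(x):
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def h(alpha, beta, delta, sigma, t, tau, y, M, n_grid=2001, width=8.0):
    """Log of the integral over u of the binomial kernel times the N(0, sigma^2) density.

    Trapezoidal rule on [-width*sigma, width*sigma], accumulated with log-sum-exp.
    """
    eta0 = alpha + beta * t + delta * (1 if t > tau else 0)
    step = 2.0 * width * sigma / (n_grid - 1)
    log_terms = []
    for i in range(n_grid):
        u = -width * sigma + i * step
        eta = eta0 + u
        # y*log(p) + (M - y)*log(1 - p) with p = exp(eta) / (1 + exp(eta))
        log_kernel = y * eta - M * log1pexp(eta)
        log_density = -0.5 * (u / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0   # trapezoid end points
        log_terms.append(math.log(weight * step) + log_kernel + log_density)
    top = max(log_terms)
    return top + math.log(sum(math.exp(v - top) for v in log_terms))

# Example: category 0 in 1975 (y = 18 of M = 242), with the intercept at the empirical logit
eta_hat = math.log(18 / 224)
fixed_loglik = 18 * eta_hat - 242 * log1pexp(eta_hat)
h_small_sigma = h(eta_hat, 0.0, 0.0, 1e-3, t=1, tau=5, y=18, M=242)
h_large_sigma = h(eta_hat, 0.0, 0.0, 1.0, t=1, tau=5, y=18, M=242)
```

As the random-effect standard deviation shrinks, h approaches the ordinary binomial log-likelihood kernel, which gives a quick sanity check on the quadrature.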
Likelihood Ratio Test Statistics
Our goal is to test
$H_0: \Delta_k = 0$ for all $k$
$H_a: \Delta_k \neq 0$ for some $k = 1, \ldots, K-1$
Test statistic:
\[
L_{\max} = \max_{1 \le \tau \le T-1} L(\tau),
\]
where
\[
L(\tau) = -2 \left[ \ell(\hat\theta^{(0)}, \hat\Sigma^{(0)}) - \ell(\hat\theta^{(a)}(\tau), \hat\Sigma^{(a)}(\tau)) \right].
\]
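The scan over τ can be sketched in Python under strong simplifying assumptions that are not in the talk: no random effects and no trend, so each continuation ratio is constant within a segment and the maximized binomial log-likelihoods have closed forms. The function names and the toy table are hypothetical; the full model would instead maximize the marginal log-likelihood at every candidate τ.

```python
import math

def binom_loglik(y, m):
    """Maximized binomial kernel y*log(p) + (m - y)*log(1 - p) at p = y/m."""
    ll = 0.0
    if y > 0:
        ll += y * math.log(y / m)
    if m - y > 0:
        ll += (m - y) * math.log((m - y) / m)
    return ll

def at_risk_counts(row):
    """M_k = y_k + ... + y_K for k = 1, ..., K-1."""
    total = sum(row)
    out = []
    for y in row[:-1]:
        out.append(total)
        total -= y
    return out

def lr_components(table, tau):
    """Per-category likelihood ratio statistics for a split after row tau (1-based)."""
    K = len(table[0])
    comps = []
    for k in range(K - 1):
        ys = [row[k] for row in table]
        ms = [at_risk_counts(row)[k] for row in table]
        ll0 = binom_loglik(sum(ys), sum(ms))                    # no changepoint
        lla = (binom_loglik(sum(ys[:tau]), sum(ms[:tau]))       # before the split
               + binom_loglik(sum(ys[tau:]), sum(ms[tau:])))    # after the split
        comps.append(-2.0 * (ll0 - lla))
    return comps

def scan(table):
    """Return (L_max, tau_hat, components at tau_hat) over 1 <= tau <= T-1."""
    best = None
    for tau in range(1, len(table)):
        comps = lr_components(table, tau)
        L = sum(comps)
        if best is None or L > best[0]:
            best = (L, tau, comps)
    return best

# Toy table with a shift after the fifth row (hypothetical data)
toy = [[30, 30, 40]] * 5 + [[60, 20, 20]] * 5
L_max, tau_hat, comps = scan(toy)
```

On this noise-free toy table the scan recovers the split after row 5, and the per-category contributions add up to the overall statistic.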
Partition of Lmax
$L_{\max}$ has $K - 1$ degrees of freedom.
Partition $L_{\max}$ into $K - 1$ components $L_k(\hat\tau)$, $k = 1, \ldots, K-1$.
Each $L_k(\hat\tau)$ has 1 degree of freedom and is independent of the others.
\[
L_k(\hat\tau) = -2 \sum_{t=1}^{T} \left[ h(\hat\alpha_k^{(0)}, \hat\beta_k^{(0)}, \hat\Delta_k^{(0)}, \hat\sigma_k^{(0)}) - h(\hat\alpha_k^{(a)}, \hat\beta_k^{(a)}, \hat\Delta_k^{(a)}, \hat\sigma_k^{(a)}) \right]
\]
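That the components add up to the overall statistic follows from the additive form of the marginal log-likelihood. The short derivation below assumes $\hat\tau$ denotes the value of $\tau$ that maximizes $L(\tau)$, which the slide implies but does not state.

```latex
\begin{align*}
\sum_{k=1}^{K-1} L_k(\hat\tau)
  &= -2 \sum_{k=1}^{K-1} \sum_{t=1}^{T}
     \left[ h(\hat\alpha_k^{(0)}, \hat\beta_k^{(0)}, \hat\Delta_k^{(0)}, \hat\sigma_k^{(0)})
          - h(\hat\alpha_k^{(a)}, \hat\beta_k^{(a)}, \hat\Delta_k^{(a)}, \hat\sigma_k^{(a)}) \right] \\
  &= -2 \left[ \ell(\hat\theta^{(0)}, \hat\Sigma^{(0)})
          - \ell(\hat\theta^{(a)}(\hat\tau), \hat\Sigma^{(a)}(\hat\tau)) \right] \\
  &= L(\hat\tau) = L_{\max}.
\end{align*}
```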
[Figure: Log Likelihood Statistics. Log-likelihood statistic plotted against Time of Observation (Year), 1975 to 2000, with the 95th Lmax percentile marked.]
[Figure: Component Test Statistics. Component test statistic plotted against Category (0 to 10), with the 95th Lmax percentile marked.]
[Figure: Frequency in categories 0 to 10 against Time of Observation (1975 to 1995), one panel per category. Legend: Raw Frequencies; Estimated without changepoint; Estimated with changepoint.]
Future Work
Multiple undocumented changepoints in cloudiness conditions. Take into account the autocorrelation and periodicity in the detection method.
Thank you!
Power Study
∆k = κσk
Table: Detection Powers

τ     κ = 0.0   κ = 0.5   κ = 1.0
5     0.056     0.044     0.458
13    0.054     0.026     0.278