A Fast Approach for Multiple Change-point Detection

Biometrika (2015), xx, x, pp. 1–23

© 2007 Biometrika Trust. Printed in Great Britain.

A Fast Approach for Multiple Change-point Detection in Two-dimensional Data

BY VINCENT BRAULT, CÉLINE LÉVY-LEDUC
UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, 75005 Paris, France
[email protected] [email protected]


AND JULIEN CHIQUET
LaMME – UMR CNRS 8071, Université d'Évry-Val-d'Essonne,
23 boulevard de France, 91037 Évry, France
[email protected]

SUMMARY
We propose a novel approach for estimating the location of block boundaries (change-points) in a random matrix consisting of a blockwise constant matrix observed in white noise. Our method consists in rephrasing this task as a variable selection issue. We use a penalized least-squares criterion with an ℓ1-type penalty to deal with this problem. We first provide some theoretical results ensuring the consistency of our change-point estimators. Then, we explain how to implement our method in a very efficient way. Finally, we provide some empirical evidence to support our claims and apply our approach to HiC data coming from molecular biology, which are used for better understanding the chromatin structure.


Some key words: change-points, lasso, two-dimensional data, HiC experiments.

1. INTRODUCTION
Automatically detecting the block boundaries in a blockwise constant matrix corrupted with noise is a very important issue which may have several applications. One of the main situations in which this problem occurs is the detection of chromosomal regions having close spatial location in the nucleus. Detecting such regions will improve our understanding of the influence of the chromosomal conformation on the functioning of cells. The data provided by the most recent technology, called HiC, consist of a list of pairs of locations along the chromosome, which are often summarized as a square matrix such that each entry corresponds to the number of interactions between two positions along the chromosome; see Dixon et al. (2012). Since this matrix can be modeled as a blockwise constant matrix corrupted by some additional noise, it is of particular interest to design an efficient and fully automated method to find the block boundaries of large matrices, which may typically have several thousands of rows and columns, in order to identify the interacting chromosomal regions.
A large literature is dedicated to the change-point detection issue for one-dimensional data. This problem can be addressed from a sequential (online) point of view (Tartakovsky et al., 2014) or from a retrospective (off-line) one (Brodsky & Darkhovsky, 2000). Many off-line approaches are based on the dynamic programming algorithm, which retrieves K change-points within n observations of a one-dimensional signal with a complexity of O(Kn²) in time (Kay, 1993).



Such a complexity is however prohibitive for dealing with very large data sets. In this situation, Harchaoui & Lévy-Leduc (2010) proposed to rephrase the change-point estimation issue as a variable selection problem. This approach has also been extended by Vert & Bleakley (2010) to find shared change-points between several signals. To the best of our knowledge, no method has been proposed for addressing the case of two-dimensional data where the number of rows and columns may be very large (n × n ≈ 5000 × 5000, namely 2.5 × 10⁷ observations). The only statistical approach proposed for retrieving the change-point positions in the two-dimensional framework is the one devised by Lévy-Leduc et al. (2014), but it is limited to the case where the matrix is assumed to be blockwise constant on the diagonal and constant outside the diagonal blocks. It has first to be noticed that the classical dynamic programming algorithm cannot be applied in such a framework, since the Markov property no longer holds. Moreover, the group LARS approach of Vert & Bleakley (2010) cannot be used in this framework, since it would only provide change-points in columns and not in rows. As for the generalized lasso recently devised by Tibshirani & Taylor (2011) or the two-dimensional fused lasso of Hoefling (2010), they are very helpful for image denoising but do not give access to the change-point positions.
The paper is organized as follows. In Section 2, we first describe how to rephrase the problem of two-dimensional change-point estimation as a high-dimensional sparse linear model and give some theoretical results which prove the consistency of our change-point estimators. In Section 3, we describe how to implement our method efficiently. In Section 4, we provide experimental evidence of the relevance of our approach on synthetic data and on real data coming from molecular biology, called HiC data.

2. STATISTICAL FRAMEWORK
2·1. Statistical modeling
In this section, we explain how the two-dimensional retrospective change-point estimation issue can be seen as a variable selection problem. Our goal is to estimate t*₁ = (t*_{1,1}, ..., t*_{1,K*₁}) and t*₂ = (t*_{2,1}, ..., t*_{2,K*₂}) from the random matrix Y = (Y_{i,j})_{1≤i,j≤n} defined by

Y = U + E,  (1)

where U = (U_{i,j}) is a blockwise constant matrix such that

U_{i,j} = µ*_{k,ℓ} if t*_{1,k−1} ≤ i ≤ t*_{1,k} − 1 and t*_{2,ℓ−1} ≤ j ≤ t*_{2,ℓ} − 1,

with the convention t*_{1,0} = t*_{2,0} = 1 and t*_{1,K*₁+1} = t*_{2,K*₂+1} = n + 1. An example of such a matrix U is displayed in Figure 1. The entries E_{i,j} of the matrix E = (E_{i,j})_{1≤i,j≤n} are iid zero-mean random variables. With such a definition, the Y_{i,j} are independent random variables with a blockwise constant mean.
Let T be the n × n lower triangular matrix with nonzero elements equal to one, and let B be a sparse matrix containing null entries except for the B_{i,j} such that (i, j) ∈ {t*_{1,0}, ..., t*_{1,K*₁}} × {t*_{2,0}, ..., t*_{2,K*₂}}. Then, (1) can be rewritten as follows:

Y = TBT^⊤ + E,  (2)

where T^⊤ denotes the transpose of the matrix T. For an example of a matrix B, see Figure 1.
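To make the reparametrization (2) concrete, the following sketch (in Python with NumPy — an illustration only, not the paper's C++/R implementation) builds T, recovers B = T⁻¹UT⁻⊤ for a blockwise constant U, and checks that the nonzero entries of B sit exactly at the block corners. The change-point positions used here are read off the nonzero pattern of B in Figure 1 and are otherwise assumptions.

```python
import numpy as np

n = 16
t1 = [1, 7, 11, 13, n + 1]      # row change-points (t*_{1,0}, ..., t*_{1,K1+1})
t2 = [1, 5, 8, 10, 13, n + 1]   # column change-points
rng = np.random.default_rng(0)
mu = rng.normal(size=(len(t1) - 1, len(t2) - 1))   # block means mu*_{k,l}

U = np.zeros((n, n))            # blockwise constant matrix of Model (1)
for k in range(len(t1) - 1):
    for l in range(len(t2) - 1):
        U[t1[k] - 1:t1[k + 1] - 1, t2[l] - 1:t2[l + 1] - 1] = mu[k, l]

T = np.tril(np.ones((n, n)))    # lower triangular matrix of ones
B = np.linalg.solve(T, np.linalg.solve(T, U.T).T)  # B = T^{-1} U T^{-T}

assert np.allclose(T @ B @ T.T, U)       # reparametrization (2): U = T B T^T
rows, cols = np.nonzero(np.abs(B) > 1e-10)
print(sorted(set(rows + 1)), sorted(set(cols + 1)))  # block corners of U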

Fig. 1. Left: an example of a matrix U with n = 16, K*₁ = 3 and K*₂ = 4. Right: the matrix B associated with this matrix U.

Let Vec(X) denote the vectorization of the matrix X, formed by stacking the columns of X into a single column vector; then Vec(Y) = Vec(TBT^⊤) + Vec(E). Hence, by using that Vec(AXC) = (C^⊤ ⊗ A)Vec(X), where ⊗ denotes the Kronecker product, (2) can be rewritten

as

𝒴 = 𝒳ℬ + ℰ,  (3)

where 𝒴 = Vec(Y), 𝒳 = T ⊗ T, ℬ = Vec(B) and ℰ = Vec(E). Thanks to these transformations, Model (1) has thus been rephrased as a sparse high-dimensional linear model, where 𝒴 and ℰ are n² × 1 column vectors, 𝒳 is an n² × n² matrix and ℬ is an n² × 1 sparse column vector. The multiple change-point estimation problem (1) can thus be addressed as a variable selection problem:

ℬ̂(λn) = Argmin_{ℬ ∈ R^{n²}} { ||𝒴 − 𝒳ℬ||₂² + λn ||ℬ||₁ },  (4)

where ||u||₂² = Σ_{i=1}^{N} u_i² and ||u||₁ = Σ_{i=1}^{N} |u_i| for a vector u in R^N. Criterion (4) is related to the popular least absolute shrinkage and selection operator (lasso) in least-squares regression. Thanks to the sparsity enforcing property of the ℓ1 norm, the estimator ℬ̂ of ℬ is expected to be sparse and to have nonzero elements matching those of ℬ. Hence, retrieving the positions of the nonzero elements of ℬ̂ provides estimators of (t*_{1,k})_{1≤k≤K*₁} and of (t*_{2,k})_{1≤k≤K*₂}. More precisely, let us define the set of active variables

Â(λn) = { j ∈ {1, ..., n²} : ℬ̂_j(λn) ≠ 0 }.

For each j in Â(λn), consider the Euclidean division of (j − 1) by n, namely (j − 1) = nq_j + r_j; then

t̂₁ = (t̂_{1,k})_{1≤k≤|Â₁(λn)|} ∈ {r_j + 1 : j ∈ Â(λn)},  t̂₂ = (t̂_{2,ℓ})_{1≤ℓ≤|Â₂(λn)|} ∈ {q_j + 1 : j ∈ Â(λn)},  (5)

where t̂_{1,1} < t̂_{1,2} < ··· < t̂_{1,|Â₁(λn)|} and t̂_{2,1} < t̂_{2,2} < ··· < t̂_{2,|Â₂(λn)|}. In (5), |Â₁(λn)| and |Â₂(λn)| correspond to the number of distinct elements in {r_j : j ∈ Â(λn)} and {q_j : j ∈ Â(λn)}, respectively.
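The mapping (5) from active indices to change-point coordinates is just a Euclidean division. A minimal sketch (Python/NumPy; the helper name is ours):

```python
import numpy as np

def change_points_from_active_set(active, n):
    """Recover row/column change-point candidates from the active set,
    following (5): for j in A(lambda_n), write j - 1 = n*q_j + r_j."""
    j = np.asarray(sorted(active)) - 1      # 0-based positions in Vec(B)
    q, r = j // n, j % n                    # Euclidean division of j - 1 by n
    return np.unique(r + 1), np.unique(q + 1)  # (t1_hat, t2_hat)

# Toy check on the example of Figure 1 (n = 16): the nonzero entries of B sit
# at {1,7,11,13} x {1,5,8,10,13}, and their column-major indices are recovered.
n = 16
rows, cols = [1, 7, 11, 13], [1, 5, 8, 10, 13]
active = {n * (c - 1) + r for r in rows for c in cols}  # 1-based Vec indices
print(change_points_from_active_set(active, n))
```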

As far as we know, neither a thorough practical implementation nor a theoretical grounding has been given so far to support such an approach for change-point estimation in the two-dimensional case. In the following section, we give theoretical results supporting the use of such an approach.

2·2. Theoretical results
In order to establish the consistency of the estimators t̂₁ and t̂₂ defined in (5), we shall use assumptions (A1)–(A4). These assumptions involve the two following quantities:

I*min = min_{0≤k≤K*₁} |t*_{1,k+1} − t*_{1,k}| ∧ min_{0≤k≤K*₂} |t*_{2,k+1} − t*_{2,k}|,
J*min = min_{1≤k≤K*₁, 1≤ℓ≤K*₂+1} |µ*_{k+1,ℓ} − µ*_{k,ℓ}| ∧ min_{1≤k≤K*₁+1, 1≤ℓ≤K*₂} |µ*_{k,ℓ+1} − µ*_{k,ℓ}|,

which correspond to the smallest distance between two consecutive change-points and to the smallest jump size between two consecutive blocks, respectively.

(A1) The random variables (E_{i,j})_{1≤i,j≤n} are iid zero-mean random variables such that there exists a positive constant β such that, for all ν in R, E[exp(νE_{1,1})] ≤ exp(βν²).

(A2) The sequence (λn) appearing in (4) is such that (nδn J*min)⁻¹ λn → 0, as n tends to infinity.

(A3) The sequence (δn) is a non-increasing and positive sequence tending to zero such that nδn J*min² / log(n) → ∞, as n tends to infinity.

(A4) I*min ≥ nδn.

PROPOSITION 1. Let (Y_{i,j})_{1≤i,j≤n} be defined by (1) and let t̂_{1,k}, t̂_{2,k} be defined by (5). Assume that |Â₁(λn)| = K*₁ and that |Â₂(λn)| = K*₂, with probability tending to one. Then

P( { max_{1≤k≤K*₁} |t̂_{1,k} − t*_{1,k}| ≤ nδn } ∩ { max_{1≤k≤K*₂} |t̂_{2,k} − t*_{2,k}| ≤ nδn } ) → 1, as n → ∞.  (6)

The proof of Proposition 1 is based on the two following lemmas. The first one comes from the Karush–Kuhn–Tucker conditions of the optimization problem stated in (4). The second one allows us to control the supremum of the empirical mean of the noise.

LEMMA 1. Let (Y_{i,j})_{1≤i,j≤n} be defined by (1). Then 𝒰̂ = 𝒳ℬ̂, where 𝒳 and ℬ̂ are defined in (3) and (4) respectively, is such that

Σ_{k=r_j+1}^{n} Σ_{ℓ=q_j+1}^{n} Y_{k,ℓ} − Σ_{k=r_j+1}^{n} Σ_{ℓ=q_j+1}^{n} Û_{k,ℓ} = (λn/2) sign(ℬ̂_j), if ℬ̂_j ≠ 0,  (7)

| Σ_{k=r_j+1}^{n} Σ_{ℓ=q_j+1}^{n} Y_{k,ℓ} − Σ_{k=r_j+1}^{n} Σ_{ℓ=q_j+1}^{n} Û_{k,ℓ} | ≤ λn/2, if ℬ̂_j = 0,  (8)

where q_j and r_j are the quotient and the remainder of the Euclidean division of (j − 1) by n, respectively, that is (j − 1) = nq_j + r_j. In (7), sign denotes the function defined by sign(x) = 1 if x > 0, −1 if x < 0 and 0 if x = 0. Moreover, the matrix Û, which is such that 𝒰̂ = Vec(Û), is blockwise constant and satisfies Û_{i,j} = µ̂_{k,ℓ} if t̂_{1,k−1} ≤ i ≤ t̂_{1,k} − 1 and t̂_{2,ℓ−1} ≤ j ≤ t̂_{2,ℓ} − 1, for k ∈ {1, ..., |Â₁(λn)|} and ℓ ∈ {1, ..., |Â₂(λn)|}, where the t̂_{1,k}, t̂_{2,k}, Â₁(λn) and Â₂(λn) are defined in (5).


LEMMA 2. Let (E_{i,j})_{1≤i,j≤n} be random variables satisfying (A1). Let also (vn) and (xn) be two positive sequences such that vn xn² / log(n) → ∞. Then

P( max_{1≤rn<sn≤n, sn−rn≥vn} (sn − rn)⁻¹ | Σ_{j=rn+1}^{sn} E_{n,j} | ≥ xn ) → 0, as n → ∞.

3. EFFICIENT IMPLEMENTATION
Our implementation relies on the LARS homotopy algorithm (Efron et al., 2004; Osborne et al., 2000), whose efficiency in our setting rests on the following lemmas.

LEMMA 3. For any vector v in R^{n²}, the computation of 𝒳v or 𝒳^⊤v requires at worst 2n² operations.
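A sketch of the cumulative-sum trick behind Lemma 3, checked against the dense Kronecker product (Python/NumPy; an illustration only, not the paper's implementation):

```python
import numpy as np

def X_times(v, n):
    """Compute Xv = (T (x) T) v = Vec(T V T^T) in O(n^2) via cumulative sums,
    where Vec(V) = v and T is lower triangular with unit entries."""
    V = v.reshape(n, n, order="F")               # invert the column-major Vec
    W = np.cumsum(np.cumsum(V, axis=0), axis=1)  # T V, then (T V) T^T
    return W.reshape(-1, order="F")

def Xt_times(v, n):
    """Compute X^T v = Vec(T^T V T): reversed cumulative sums."""
    V = v.reshape(n, n, order="F")
    W = np.cumsum(np.cumsum(V[::-1, ::-1], axis=0), axis=1)[::-1, ::-1]
    return W.reshape(-1, order="F")

# Check against the dense n^2 x n^2 matrix on a small instance.
n = 5
T = np.tril(np.ones((n, n)))
X = np.kron(T, T)
v = np.random.default_rng(1).normal(size=n * n)
assert np.allclose(X_times(v, n), X @ v)
assert np.allclose(Xt_times(v, n), X.T @ v)
```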


LEMMA 4. Let A = {a₁, ..., a_K} and, for each j in A, consider the Euclidean division of j − 1 by n given by j − 1 = nq_j + r_j. Then

(𝒳^⊤𝒳)_{A,A} = ( (n − (q_{a_k} ∨ q_{a_ℓ})) × (n − (r_{a_k} ∨ r_{a_ℓ})) )_{1≤k,ℓ≤K}.  (9)

Moreover, for any non-empty subset A of distinct indices in {1, ..., n²}, the matrix 𝒳_A^⊤𝒳_A is invertible.

LEMMA 5. Assume that we have at our disposal the Cholesky factorization of 𝒳_A^⊤𝒳_A. The updated factorization on the extended set A ∪ {j} only requires solving an |A|-size triangular system, with complexity O(|A|²). Moreover, the downdated factorization on the restricted set A \ {j} requires a rotation with negligible cost to preserve the triangular form of the Cholesky factorization after a column deletion.

Remark 2. We were able to obtain a closed-form expression of the inverse (𝒳_A^⊤𝒳_A)⁻¹ for some special cases of the subset A, namely when the quotients and remainders associated with the Euclidean divisions of the elements of A are endowed with a particular ordering.


Moreover, for addressing any general problem, we rather solve systems involving 𝒳_A^⊤𝒳_A by means of a Cholesky factorization which is updated along the homotopy algorithm. These updates correspond to adding or removing one element at a time in A, and they are performed efficiently as stated in Lemma 5.

These lemmas are the building blocks for our LARS implementation given in Algorithm 1, where we detail the leading complexity associated with each part. The global complexity is O(mn² + ms²), where m is the final number of steps in the while loop. These steps include all the successive additions and deletions needed to reach s, the final targeted number of active variables. In the end, we have m blockwise predictions Ŷ associated with the series of m estimates ℬ̂(λ). Concerning the memory requirements, we only need to store the n × n data matrix Y once. Indeed, since we have at our disposal the analytic form of any submatrix extracted from 𝒳^⊤𝒳, we never need to compute nor store this large n² × n² matrix. This paves the way for quickly processing data with thousands of rows and columns.

Algorithm 1. Fast LARS for two-dimensional change-point estimation
Input: data matrix Y, maximal number of active variables s.
// Initialization: start with no change-point
A ← ∅, ℬ̂ ← 0
Compute the current correlations ĉ ← 𝒳^⊤𝒴 with Lemma 3  // O(n²)
while λ > 0 and |A| < s do
  // Update the set of active variables
  Determine the next change-point(s) by setting λ ← ||ĉ||_∞ and A ← {j : |ĉ_j| = λ}
  Update the Cholesky factorization of 𝒳_A^⊤𝒳_A with Lemma 4  // O(|A|²)
  // Compute the direction of descent
  Get the unnormalized direction w̃_A ← (𝒳_A^⊤𝒳_A)⁻¹ sign(ĉ_A)  // O(|A|²)
  Normalize w_A ← αw̃_A with α ← (w̃_A^⊤ sign(ĉ_A))^{−1/2}
  Compute the equiangular vector u_A ← 𝒳_A w_A and a ← 𝒳^⊤u_A with Lemma 3  // O(n²)
  // Compute the direction step
  Find the maximal step preserving equicorrelation: γ_in ← min⁺_{j∈A^c} { (λ − ĉ_j)/(α − a_j), (λ + ĉ_j)/(α + a_j) }
  Find the maximal step preserving the signs: γ_out ← min⁺_{j∈A} { −ℬ̂_j / (w_A)_j }
  The direction step preserving both is γ̂ ← min(γ_in, γ_out)
  Update the correlations ĉ ← ĉ − γ̂a and ℬ̂_A ← ℬ̂_A + γ̂w_A accordingly  // O(n)
  // Drop variables crossing the zero line
  if γ_out < γ_in then
    Remove the existing change-point(s): A ← A \ {j ∈ A : ℬ̂_j = 0}  // O(|A|)
    Downdate the Cholesky factorization of 𝒳_A^⊤𝒳_A
Output: sequence of triplets (A, λ, ℬ̂) recorded at each iteration.
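As an illustration of the closed form (9) used in the Cholesky updates of Algorithm 1, the following sketch (Python/NumPy, not the paper's implementation) computes (𝒳^⊤𝒳)_{A,A} without ever forming 𝒳, and checks it on a small instance:

```python
import numpy as np

def gram_active(active, n):
    """Closed form of (X^T X)_{A,A} from Lemma 4: entry (k, l) equals
    (n - max(q_k, q_l)) * (n - max(r_k, r_l)), with j - 1 = n*q_j + r_j."""
    j = np.asarray(sorted(active)) - 1
    q, r = j // n, j % n
    return (n - np.maximum.outer(q, q)) * (n - np.maximum.outer(r, r))

# Check against the explicit design matrix on a small instance.
n, active = 6, [3, 8, 17, 30]
T = np.tril(np.ones((n, n)))
XA = np.kron(T, T)[:, np.asarray(active) - 1]
assert np.allclose(gram_active(active, n), XA.T @ XA)
```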


4. NUMERICAL EXPERIMENTS
4·1. Synthetic data
The goal of this section is to assess the statistical and numerical performance of our methodology. We generated observations according to Model (1), where U is a symmetric blockwise constant matrix defined by

(µ*_{k,ℓ})_{k∈{1,...,K*₁+1}, ℓ∈{1,...,K*₂+1}} =
[1 0 1 0 1]
[0 1 0 1 0]
[1 0 1 0 1]   (10)
[0 1 0 1 0]
[1 0 1 0 1]

and the E_{i,j} are zero-mean iid Gaussian random variables of variance σ², with σ in {1, 2, 5}. Some examples of data generated from this model with n = 500 and K*₁ = K*₂ = 4 can be found in Figure 2 (top).

Statistical performance. For each value of σ in {1, 2, 5}, we generated 1000 matrices following the model described in Section 4·1 with (t*_{1,k})_{1≤k≤K*₁} = ([nk/(K*₁ + 1)] + 1)_{1≤k≤K*₁} and (t*_{2,k})_{1≤k≤K*₂} = ([nk/(K*₂ + 1)] + 1)_{1≤k≤K*₂}, where [x] denotes the integer part of x and K*₁ = K*₂ = 4. Figure 2 (middle) displays the mean square error n⁻²||ℬ − ℬ̂||₂² for the different samples (in gray) and the median of the mean square errors (in red) as a function of the number of active variables s defined in Algorithm 1. We can see from this figure that, even in the high noise level case, the mean square error is small. Moreover, the ROC curves displayed in the bottom part of Figure 2 show that the change-points in rows, (t*_{1,k})_{1≤k≤K*₁}, are properly retrieved with a very small error rate even in high noise level frameworks. The same results hold for the change-points in columns, (t*_{2,k})_{1≤k≤K*₂}, but are not displayed since they are very similar.
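A minimal simulation sketch of this design (Python/NumPy; the function name and seed handling are our own):

```python
import numpy as np

def simulate_Y(n=500, K=4, sigma=1.0, seed=0):
    """Simulate Y = U + E from (1) with the checkerboard means of (10):
    mu*_{k,l} = 1 if k + l is even, change-points at [n*k/(K+1)] + 1."""
    t = [int(n * k / (K + 1)) + 1 for k in range(K + 1)] + [n + 1]
    U = np.zeros((n, n))
    for k in range(K + 1):
        for l in range(K + 1):
            U[t[k] - 1:t[k + 1] - 1, t[l] - 1:t[l + 1] - 1] = (k + l) % 2 == 0
    rng = np.random.default_rng(seed)
    return U + sigma * rng.normal(size=(n, n))

Y = simulate_Y(n=500, K=4, sigma=2.0)
```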


Fig. 2. Top: examples of matrices Y generated from the model described in Section 4·1. Middle: mean square errors n⁻²||ℬ − ℬ̂||₂² for the different realizations (in gray) and their median (in red) as a function of the number of nonzero elements in ℬ̂, for each scenario. Bottom: ROC curves for the estimated change-points in rows.

Numerical performance. We implemented Algorithm 1 in C++ using the armadillo linear algebra library developed by Sanderson (2010), and we also provide an interface to the R platform (R Core Team, 2015) through the R package blockseg, which will be available from the Comprehensive R Archive Network (CRAN). All experiments were conducted on a Linux workstation with an Intel Xeon 2.4 GHz processor and 8 GB of memory. We generated data as in Model (10) for different values of n ∈ {100, 250, 500, 1000, 2500, 5000} and different values of the maximal number of activated variables s ∈ {50, 100, 250, 500, 750}. The median runtimes obtained from 4 replications (plus 2 for warm-up) are reported in Figure 3. The left (resp. right) panel gives the runtimes in seconds as a function of s (resp. of n). These results give experimental evidence for the theoretical complexity O(mn² + ms²) established in Section 3 and thus for the computational efficiency of our approach: applying blockseg to matrices containing 10⁷ entries takes less than 2 minutes for s = 750.

Comparison with other methodologies. Since, to the best of our knowledge, no two-dimensional method is available, we compare our approach with an adaptation of the CART procedure of Breiman et al. (1984) and with an adaptation of the approach of Harchaoui & Lévy-Leduc (2010) (HL), which is dedicated to univariate observations. We adapt the CART methodology by using the successive boundaries provided by CART as change-points for the two-dimensional data; the associated ROC curve is displayed in red in Figure 5. For adapting the HL methodology, we apply it to each row of the data matrix and, for each λ, we obtain the change-points of each row. The change-points appearing in the different rows are claimed to be change-points for the two-dimensional data either if they appear in at least one row (the associated ROC curve for this approach, called HL1, is displayed in purple in Figure 5)


or if they appear in at least ([n/2] + 1) rows (the associated ROC curve for this approach, called HL2, is displayed in blue in Figure 5). Since the procedures HL1 and HL2 are much slower than ours, the ROC curves are displayed for matrices of size 250 × 250.

Fig. 3. Left: computation time (in seconds) for various values of n as a function of the sparsity level s = |A| reached at the end of the algorithm. Right: computation time (in seconds) as a function of the sample size n.

In this comparison, four different

types of matrices U were considered, corresponding to the following matrices (µ*_{k,ℓ}):

µ*,(1) =
[1 0 1 0 1]
[0 1 0 1 0]
[1 0 1 0 1]
[0 1 0 1 0]
[1 0 1 0 1]

µ*,(2) =
[1 0 0 0 0]
[0 1 0 0 0]
[0 0 1 0 0]
[0 0 0 1 0]
[0 0 0 0 1]

µ*,(3) =
[1 0 0 0 0]
[0 1 1 1 1]
[0 1 1 0 0]
[0 1 0 1 0]
[0 1 0 0 1]

µ*,(4) =
[ 0 −1 −1 −1 −1]
[−1 −1  0 −1  0]
[−1  0  1  0  1]
[−1 −1  0 −1  0]
[−1  0  1  0  1]

with the corresponding matrices (B_{i,j})_{(i,j) ∈ {t*_{1,0},...,t*_{1,4}} × {t*_{2,0},...,t*_{2,4}}}:

B^(1) =
[ 1 −1  1 −1  1]
[−1  2 −2  2 −2]
[ 1 −2  2 −2  2]
[−1  2 −2  2 −2]
[ 1 −2  2 −2  2]

B^(2) =
[ 1 −1  0  0  0]
[−1  2 −1  0  0]
[ 0 −1  2 −1  0]
[ 0  0 −1  2 −1]
[ 0  0  0 −1  2]

B^(3) =
[ 1 −1  0  0  0]
[−1  2  0  0  0]
[ 0  0  0 −1  0]
[ 0  0 −1  2 −1]
[ 0  0  0 −1  2]

B^(4) =
[ 0 −1  0  0  0]
[−1  1  1 −1  1]
[ 0  1  0  0  0]
[ 0 −1  0  0  0]
[ 0  1  0  0  0]

The first configuration, µ*,(1), is the one used in Equation (10). The second one, µ*,(2), is close to the block diagonal model, which corresponds to the so-called cis-interactions in human HiC experiments. The third, µ*,(3), and fourth, µ*,(4), cases are chosen because some rows contain only one nonzero parameter in the associated matrix B, which makes the corresponding change-points more difficult to detect. The corresponding Y matrices are displayed in Figure 4 when the variance of the additive noise is equal to 1.

Fig. 4. From left to right: examples of simulated matrices Y for µ*,(1), µ*,(2) and µ*,(3) with σ = 1.


We can see from Figure 5 that our method outperforms the other ones and is less sensitive to the shape of the matrix U.

Model selection. In the previous experiments we did not need to explain how to choose the number of estimated change-points, since we used ROC curves for comparing the methodologies. However, in real data applications, it is necessary to propose a methodology for estimating the number of change-points, which we explain in the following. In practice, we take s = K²max, where Kmax is an upper bound for K*₁ and K*₂. For choosing the final change-points, we adapt the well-known stability selection approach devised by Meinshausen & Bühlmann (2010). More precisely, we randomly choose M times n/2 columns and n/2 rows of the matrix Y and, for each subsample, we select s = K²max active variables. Finally, after the M data resamplings, we keep the change-points which appear a number of times larger than a given threshold. By the definition of the change-points given in (5), a change-point t̂_{1,k} or t̂_{2,ℓ} may appear several times in a given set of resampled observations. Hence, the score associated with each change-point corresponds to the sum of the number of times it appears in each of the M subsamplings.

To evaluate the performance of this methodology, we generated observations according to the model defined in (1) and (10) with s = 225 and M = 100. The results are given in Figure 6, which displays the score associated with each change-point for a given matrix Y (top). We can see from the top part of Figure 6 that some spurious change-points appear near the true change-point positions. In order to identify the most representative change-point in a given neighborhood, we keep the one with the largest score among a set of contiguous candidates. The result of such a post-processing is displayed in the bottom part of Figure 6 and in Figure 7. More precisely, the boxplots associated with the estimation of K*₁ (resp. the barplots of the estimated change-points in rows) are displayed in the bottom part of Figure 6 (resp. in Figure 7) for different values of σ and different thresholds (thresh), expressed as a percentage of the largest score. We can see from these figures that when thresh is in the interval [20, 40], the number and the location of the change-points are very well estimated, even in the high noise level case.
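A sketch of this resampling scheme (Python/NumPy); the solver `fit` stands in for Algorithm 1 and its signature is a hypothetical placeholder:

```python
import numpy as np

def stability_scores(Y, Kmax, fit, M=100, seed=0):
    """Resampling scheme described above: M times, keep n/2 rows and n/2
    columns of Y at random, run the estimator with s = Kmax**2 active
    variables on the subsample, and accumulate how often each original
    row/column index is selected as a change-point. `fit(Y_sub, s)` is
    assumed to return (t1_hat, t2_hat), possibly with repeated points."""
    n = Y.shape[0]
    s = Kmax ** 2
    rng = np.random.default_rng(seed)
    score_row, score_col = np.zeros(n, int), np.zeros(n, int)
    for _ in range(M):
        rows = np.sort(rng.choice(n, size=n // 2, replace=False))
        cols = np.sort(rng.choice(n, size=n // 2, replace=False))
        t1_hat, t2_hat = fit(Y[np.ix_(rows, cols)], s)
        for t in t1_hat:                 # map subsampled indices back to
            score_row[rows[t - 1]] += 1  # the original row positions
        for t in t2_hat:
            score_col[cols[t - 1]] += 1
    return score_row, score_col  # threshold these scores, then keep the
                                 # largest-score point among contiguous candidates
```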

Fig. 5. ROC curves for the estimated change-points in rows for our method (in green), HL1 (in purple), HL2 (in blue) and CART (in red). Each row of the display is associated with one matrix: µ*,(1), µ*,(2), µ*,(3) and µ*,(4), from top to bottom.

In order to further assess our methodology, we generated observations following (1) with n = 1000 and K*₁ = K*₂ = 100, where we used for the matrix U the same pattern as the matrix µ* given in (10) except that K*₁ = K*₂ = 100. In this framework, the proportion

of change-points is thus ten times larger than in the previous case. The corresponding results are displayed in Figures 8, 9 and 10. We can see from the last figure that taking a threshold equal to 20% provides the best estimation of the number and of the positions of the change-points. This threshold corresponds to the lower bound of the threshold interval obtained in the previous configuration.


Fig. 6. Top: scores associated with each estimated change-point for different values of σ; the true change-point positions in rows and columns are located at 101, 201, 301 and 401. Bottom: boxplots of the estimation of K*₁ for different values of σ and thresh after the post-processing step.

Fig. 7. Barplots of the estimated change-points for different variances (columns) and different thresholds (rows) for the model µ*,(1).

Fig. 8. Boxplots of the estimation of K*₁ for different values of σ and thresholds (thresh) after the post-processing step.

Fig. 9. Barplots of the estimated change-points for different variances (columns) and different thresholds (rows) in the case where n = 1000 and K*₁ = K*₂ = 100.


Fig. 10. Zoom of the barplots of Figure 9.


4·2. Application to HiC data
In this section, we applied our methodology to publicly available data (http://chromosome.sdsc.edu/mouse/hi-c/download.html) already studied by Dixon et al. (2012). More precisely, we studied the interaction matrices of Chromosomes 1 and 19 of the mouse cortex at a resolution of 40 kb, and we compared the number and the location of the estimated change-points found with our approach with those obtained by Dixon et al. (2012) on the same data, since no ground truth is available. We display in Figure 11 the number of change-points in rows found by our approach as a function of the threshold thresh used in our adaptation of the stability selection approach presented in the previous section. We also display in this figure a red line corresponding to the number of change-points found by Dixon et al. (2012).





270







160



500

265



● ● ● ●

● ●

120



400

● ● ●

● ●

● ● ●



80



300

● ● ●

● ● ●

● ● ●



● ●

200



40 0.0

0.5

1.0

1.5

2.0

0

5

10

15

Fig. 11. Number of change-points in rows found by our approach as a function of the threshold (’•’) in % for the interaction matrices of Chromosome 1 (left) and Chromosome 19 (right) of the mouse cortex. The red line corresponds to the number of change-points found by Dixon et al. (2012).

We also compute the two parts of the Hausdorff distance for the change-points in rows, which is defined by

d(t̂^B, t̂) = max{ d₁(t̂^B, t̂), d₂(t̂^B, t̂) },  (11)

where t̂ and t̂^B are the change-points in rows found by our approach and by Dixon et al. (2012), respectively. In (11),

d₁(a, b) = sup_{b∈b} inf_{a∈a} |a − b|,  (12)
d₂(a, b) = d₁(b, a).  (13)
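These two quantities can be computed directly (Python/NumPy sketch; the function name is ours):

```python
import numpy as np

def hausdorff_parts(t_hat_B, t_hat):
    """d1 and d2 of (12)-(13); the Hausdorff distance (11) is their maximum."""
    a = np.asarray(t_hat_B, dtype=float)
    b = np.asarray(t_hat, dtype=float)
    gaps = np.abs(a[:, None] - b[None, :])   # |a - b| over all pairs
    d1 = gaps.min(axis=0).max()              # sup over b of inf over a
    d2 = gaps.min(axis=1).max()              # sup over a of inf over b
    return d1, d2, max(d1, d2)

# e.g. comparing two segmentations of the same chromosome
print(hausdorff_parts([101, 201, 301], [105, 198, 310, 350]))
```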

More precisely, Figure 12 displays the boxplots of the d₁ and d₂ parts of the Hausdorff distance, without taking the supremum, in orange and blue, respectively. We can observe from Figure 12 that some differences indeed exist between the segmentations produced by the two approaches, but that the boundaries of the blocks are quite close when the numbers of estimated change-points are the same, which is the case when thresh = 1.8% (left) and 10% (right). In the case where the numbers of estimated change-points are on a par with those of Dixon et al. (2012), we can see from Figure 13 that the change-points found with our strategy present a lot of similarities with those found by the HMM-based approach of Dixon et al. (2012).


















● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

















200



● ●



● ●

300

● ● ●

● ● ●

150 ● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●



● ●

200





● ●

● ●







● ●

● ●

















● ●

● ●

● ●

● ●

● ●

● ●













● ● ● ●

● ●

100

● ● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●







































● ●

● ●

● ● ●

● ● ●















● ●



● ●

100



● ●



● ●

● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ●



● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●



● ●

● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ●



● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ●



● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ●



● ●



● ● ● ● ● ● ● ● ● ● ● ● ● ●



● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●



● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●



● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●



● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ●

● ● ● ●

● ● ● ● ● ●

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9



● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ●



● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ●

● ●

● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ●

● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ●

● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ●



● ● ●

● ●

50

● ● ●



● ● ● ●

● ● ● ● ● ● ● ● ●









● ● ● ●

● ● ●



● ●

● ● ●

● ●

● ●

● ● ●

● ●

● ●

● ●







● ● ●

● ● ●







● ● ● ● ●

● ● ●

0

● ●





● ● ● ● ● ● ● ● ●



● ● ●

● ● ● ●

0 0

1

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

Fig. 12. Boxplots for the infimum parts of the Hausdorff distances d1 (orange) and d2 (blue) between the changepoints found by Dixon et al. (2012) and our approach for the Chromosome 1 (left) and the Chromosome 19 (right) of the mouse cortex for the different thresholds in %.

Fig. 13. Topological domains detected by Dixon et al. (2012) (upper triangular part of the matrix) and by our method (lower triangular part of the matrix) from the interaction matrices of Chromosome 1 (left) and Chromosome 19 (right) of the mouse cortex, with a threshold giving 232 (resp. 85) estimated change-points in rows and columns.


Our method also gives access to the estimated change-point positions for different values of the threshold. Figure 14 displays the different change-point locations obtained for these different values. However, contrary to our method, the approach of Dixon et al. (2012) can only deal with binned data at the resolution of several kilobases of nucleotides. The very low computational burden of our strategy paves the way for processing data collected at a very high resolution, namely at the nucleotide resolution, which is one of the main current challenges of molecular biology.

Fig. 14. Plots of the estimated change-points locations (xaxis) for different thresholds (y-axis) from 0.5% to 50% by 0.5%. The estimated change-point locations associated to threshold which are multiples of 5% are displayed in red.

5. P ROOFS 5·1. Proofs of statistical results 2 Proof of Lemma 1. A necessary and sufficient condition for a vector Bb in Rn to minimize the P 2 P 2 function Φ defined by: Φ(B) = ni=1 (Yi − (X B)i )2 + λn ni=1 |Bi |, is that the zero vector in 2 Rn belongs to the subdifferential of Φ at Bb that is:  λn b X > (Y − X B) = , 2 j   > b ≤ λn , X (Y − X B) 2 j 

if Bbj 6= 0, if Bbj = 0.

Using that X > Y = (T ⊗ T)> Y = (T> ⊗ T> )Y = Vec(T> YT), where (T> YT)i,j = Pn Pn b b  `=j Yk,` , and that U = X B, Lemma 1 is proved. k=i Proof of Lemma 2. Note that   sX n −1 En,j ≥ xn  ≤ P  max (sn − rn )−1 1≤rn nδn + P t2,k − t?2,k > nδn , (14) 1≤k≤K1

310

1≤k≤K2

it is enough to prove that both terms in (14) tend to zero for proving (6). We shall only prove that the second term in the rhs of (14) tends to zero, the proof being the same for the first term. Since PK2? P(max1≤k≤K2? |b P(|b t2,k − t?2,k | > nδn ), it is enough to prove that t2,k − t?2,k | > nδn ) ≤ k=1 for all k in {1, . . . , K2? }, P(An,k ) → 0, where An,k = {|b t2,k − t?2,k | > nδn }. Let Cn be defined by   ? ? Cn = max ? |b (15) t2,k − t2,k | < Imin,2 /2 . 1≤k≤K2

315

It is enough to prove that, for all k in {1, . . . , K2? }, P(An,k ∩ Cn ) and P(An,k ∩ Cn ) tend to 0, as n tends to infinity. Let us first prove that for all k in {1, . . . , K2? }, P(An,k ∩ Cn ) → 0. Observe that (15) implies t2,k ≤ t?2,k . t2,k < t?2,k+1 , for all k in {1, . . . , K2? }. For a given k, let us assume that b that t?2,k−1 < b Applying (7) and (8) with rj + 1 = n, qj + 1 = b t2,k on the one hand and rj + 1 = n, qj + 1 = t?2,k on the other hand, we get that t?2,k −1 t?2,k −1 X X Yn,j − Ubn,j ≤ λn . j=bt2,k j=b t2,k P P Hence using (1), the notation: E([a, b]; [c, d]) = bi=a dj=c Ei,j and the definition of Ub given by Lemma 1, we obtain that ? t2,k )(µ?K ? +1,k − µ bK1? +1,k+1 ) + E(n; [b t2,k , t?2,k − 1]) ≤ λn , (t2,k − b 1

which can be rewritten as follows ? t2,k )(µ?K ? +1,k − µ?K ? +1,k+1 ) + (t?2,k − b t2,k )(µ?K ? +1,k+1 − µ bK1? +1,k+1 ) (t2,k − b 1

320

1

1

+E(n; [b t2,k , t?2,k − 1]) ≤ λn .

19

A Fast Approach for Multiple Change-point Detection in Two-dimensional Data Thus, P(An,k ∩ Cn ) ≤ P(λn /(nδn ) ≥ |µ?K ? +1,k − µ?K ? +1,k+1 |/3) 1

1

+ P({|µ?K ? +1,k − µ bK1? +1,k+1 | ≥ |µ?K ? +1,k − µ?K ? +1,k+1 |/3} ∩ Cn ) 1

1

1

+ P({|E(n; [b t2,k , t?2,k − 1])|/|t?2,k − b t2,k | ≥ |µ?K ? +1,k − µ?K ? +1,k+1 |/3} ∩ An,k ). (16) 1

325

1

? /3, v = nδ and The first term in the rhs of (16) tends to 0 by (A2). By Lemma 2 with xn = Jmin n n (A3) the third term in the rhs of (16) tends to 0. Applying Lemma 1 with rj + 1 = n, qj + 1 = b t2,k on the one hand and rj + 1 = n, qj + 1 = (t?2,k + t?2,k+1 )/2 on the other hand, we get that

? ? )/2−1 (t?2,k +t?2,k+1 )/2−1 (t2,k +t2,k+1 X X b Yn,j − Un,j ≤ λn . j=t? j=t? 2,k

2,k

Since b t2,k ≤ t?2,k , Ubn,j = µ bK1? +1,k+1 within the interval [t?2,k , (t?2,k + t?2,k+1 )/2 − 1] and we get that bK1? +1,k+1 |/2 ≤ λn + |E(n, |[t?2,k , (t?2,k + t?2,k+1 )/2 − 1])|. (t?2,k+1 − t?2,k )|µ?K ? +1,k+1 − µ 1

Therefore the second term in the rhs of (16) can be bounded by   P λn ≥ (t?2,k+1 − t?2,k )|µ?K ? +1,k − µ?K ? +1,k+1 |/12 1 1   ? ? −1 ? ? + P (t2,k+1 − t2,k ) E(n, , |[t2,k , (t2,k + t?2,k+1 )/2 − 1]) ≥ |µ?K ? +1,k − µ?K ? +1,k+1 |/6 1

1

By Lemma 2 and (A2), (A3) and (A4), we get that both terms tend to zero as n tends to infinity. We thus get that P(An,k ∩ Cn ) → 0, as n tends to infinity. Let us now prove that P(An,k ∩ Cn ) tend to 0, as n tends to infinity. Observe that

330

P(An,k ∩ Cn ) = P(An,k ∩ Dn(`) ) + P(An,k ∩ Dn(m) ) + P(An,k ∩ Dn(r) ), where  Dn(`) = ∃p ∈ {1, . . . , K ? },  Dn(m) = ∀k ∈ {1, . . . , K ? },  Dn(r) = ∃p ∈ {1, . . . , K ? },

b t2,p ≤ t?2,p−1 ∩ Cn , t?2,k−1 < b t2,k < t?2,k+1 ∩ Cn , b t2,p ≥ t?2,p+1 ∩ Cn .

335

Using the same arguments as those used for proving that P(An,k ∩ Cn ) → 0, we can prove that (m) (`) P(An,k ∩ Dn ) → 0, as n tends to infinity. Let us now prove that P(An,k ∩ Dn ) → 0. Note that K2? −1

P(Dn(`) )



X

? ? P({t?2,k − b t2,k > Imin /2} ∩ {b t2,k+1 − t?2,k > Imin /2})

k=1 ? +P(t?2,K ? − b t2,K2? > Imin /2). 2

(17)

V. B RAULT, J. C HIQUET AND C. L E´ VY-L EDUC

20

Applying (7) and (8) with rj + 1 = n, qj + 1 = b t2,k on the one hand and rj + 1 = n, qj + 1 = t?2,k on the other hand, we get that t?2,k −1 t?2,k −1 X X Yn,j − Ubn,j ≤ λn . j=bt2,k j=b t2,k 340

Thus, ? ? P({t?2,k − b t2,k > Imin /2} ∩ {b t2,k+1 − t?2,k > Imin /2}) ? ? ≤ P(λn /(nδn ) ≥ |µK ? +1,k − µK ? +1,k+1 |/3) 1

1

? +P({|µ?K ? +1,k − µ bK1? +1,k+1 | ≥ |µ?K ? +1,k − µ?K ? +1,k+1 |/3} ∩ {b t2,k+1 − t?2,k > Imin /2}) 1

1

1

+P({|E(n; [b t2,k , t?2,k − 1])|/(t?2,k − b t2,k ) ≥ |µ?K ? +1,k − µ?K ? +1,k+1 |/3} 1

∩{t?2,k

1

? −b t2,k > Imin /2}).

(18)

Using the same arguments as previously we get that the first and the third term in the rhs of (18) tend to zero as n tends to infinity. Let us now focus on the second term of the rhs of (18). Applying (7) and (8) with rj + 1 = n, qj + 1 = b t2,k+1 on the one hand and rj + 1 = n, qj + 1 = t?2,k on the other hand, we get that b bt2,k+1 −1 t2,k+1 −1 X X Ubn,j ≤ λn . Yn,j − j=t? j=t? 2,k

2,k

Hence, t2,k+1 − 1])| ≤ λn . bK1? +1,k+1 )(b t2,k+1 − t?2,k ) + E(n, [t?2,k ; b |(µ?K ? +1,k − µ 1

The second term of the rhs of (18) is thus bounded by ? P({λn (b t2,k+1 − t?2,k )−1 ≥ |µ?K ? +1,k − µ?K ? +1,k+1 |/6} ∩ {b t2,k+1 − t?2,k > Imin /2}) 1

1

t2,k+1 − 1])| ≥ |µ?K ? +1,k − µ?K ? +1,k+1 |/6} +P({(b t2,k+1 − t?2,k )−1 |E(n, [t?2,k ; b 1

1

? /2}), ∩{b t2,k+1 − t?2,k > Imin

which tend to zero by Lemma 2, (A2), (A3) and (A4). It is thus proved that the first term in the rhs of (17) tends to zero as n tends to infinity. The same arguments can be used for addressing ? /2. the second term in the rhs of (17) since b t2,K2? +1 = n and hence b t2,K2? +1 − t?2,K ? > Imin 2

(r)

345

350

Using similar arguments, we can prove that P(An,k ∩ Dn ) → 0, which concludes the proof of Proposition 1.  5·2. Proofs of computational lemmas Proof of Lemma 3.. Consider X v for instance (the same reasoning applies for X > v): we have X v = (T ⊗ T)v = Vec(TVT> ) where V is the n × n matrix such that Vec(V) = v. Because of its triangular structure, T operates as a cumulative sum operator on the columns of V. Hence, the computations for the jth column is done by induction in n operations. The total cost for the n columns of TV is thus n2 . Similarly, right multiplying a matrix by T> boils down to perform cumulative sums over the rows. The final cost for X v = Vec(TVT> ) is thus 2n2 in case of a dense matrix V, and possibly less when V is sparse. 

A Fast Approach for Multiple Change-point Detection in Two-dimensional Data Proof of Lemma 4.. Let A = {a1 , . . . , aK }, then   = (T ⊗ T)> X >X •,A (T ⊗ T)•,A , A,A

21 355

(19)

where (T ⊗ T)•,A (resp. (T ⊗ T)> •,A ) denotes the columns (resp. the rows) of T ⊗ T lying in A. For j in A, let us consider the Euclidean division of j − 1 by n given by: (j − 1) = nqj + rj , then (T ⊗ T)•,j = T•,qj +1 ⊗ T•,rj +1 . Hence, (T ⊗ T)•,A is a n2 × K matrix defined by: i h (T ⊗ T)•,A = T•,qa1 +1 ⊗ T•,ra1 +1 ; T•,qa2 +1 ⊗ T•,ra2 +1 ; . . . ; T•,qaK +1 ⊗ T•,raK +1 . Thus, (T ⊗ T)•,A = T•,QA ∗ T•,RA , where QA = {qa1 + 1, . . . , qaK + 1}, RA = {ra1 + 1, . . . , raK + 1} 

and ∗ denotes the Khatri-Rao product, which is defined as follows for two n × n matrices A and B A ∗ B = [a1 ⊗ b1 ; a2 ⊗ b2 ; . . . an ⊗ bn ] , where the ai (resp. bi ) are the columns of A (resp. B). Using Liu & Trenkler (2008), we get that     > > (T ⊗ T)> •,A (T ⊗ T)•,A = T•,QA T•,QA ◦ T•,RA T•,RA , where ◦ denotes the Hadamard or entry-wise product. Observe that by definition of T, (T> T ) = n − (qak ∨ qa` ) and (T> •,RA T•,RA )k,` = n − (rak ∨ ra` ). By (19), •,QA •,QA k,` > X X A,A is a Gram matrix which is positive and definite since the vectors T•,qa1 +1 ⊗ T•,ra1 +1 , T•,qa2 +1 ⊗ T•,ra2 +1 , . . . , T•,qaK +1 ⊗ T•,raK +1 are linearly independent.

360

Proof of Lemma 5.. The operations of adding/removing a column to a Cholesky factorization are classical and well treated in books of numerical analysis, see e.g. Golub & Van Loan (2012). An advantage of our settings is that there is no additional computational cost for computing X > Xj when entering a new variable j thanks to the closed-form expression (9). 

6. C ONCLUSION In this paper, we proposed a novel approach for retrieving the boundaries of a block wise constant matrix corrupted with noise by rephrasing this problem as a variable selection issue. Our approach is implemented in the R package blockseg which will be available from the Comprehensive R Archive Network (CRAN). In the course of this study, we have shown that our method has two main features which make it very attractive. Firstly, it is very efficient both from the theoretical and practical point of view. Secondly, its very low computational burden makes its use possible on very large data sets coming from molecular biology.

ACKNOWLEDGEMENTS The authors would like to thank the French National Research Agency ANR, which partly supported this research through the ABS4NGS project (ANR-11-BINF-0001-06).

365

370

375

22

V. B RAULT, J. C HIQUET AND C. L E´ VY-L EDUC R EFERENCES

380

385

390

395

400

405

B REIMAN , L., F RIEDMAN , J. H., O LSHEN , R. A. & S TONE , C. J. (1984). Classification and Regression Trees. Statistics/Probability Series. Belmont, California, U.S.A.: Wadsworth Publishing Company. B RODSKY, B. & DARKHOVSKY, B. (2000). Non-parametric statistical diagnosis: problems and methods. Kluwer Academic Publishers. D IXON , J. R., S ELVARAJ , S., Y UE , F., K IM , A., L I , Y., S HEN , Y., H U , M., L IU , J. S. & R EN , B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380. E FRON , B., H ASTIE , T., J OHNSTONE , I., T IBSHIRANI , R. et al. (2004). Least angle regression. The Annals of statistics 32, 407–499. G OLUB , G. H. & VAN L OAN , C. F. (2012). Matrix computations. JHU Press. 3rd edition. H ARCHAOUI , Z. & L E´ VY-L EDUC , C. (2010). Multiple change-point estimation with a total variation penalty. Journal of the American Statistical Association 105, 1480–1493. H OEFLING , H. (2010). A path algorithm for the fused lasso signal approximator. J. Comput. Graph. Statist. 19, 984–1006. K AY, S. (1993). Fundamentals of statistical signal processing: detection theory. Prentice-Hall, Inc. L E´ VY-L EDUC , C., D ELATTRE , M., M ARY-H UARD , T. & ROBIN , S. (2014). Two-dimensional segmentation for analyzing hi-c data. Bioinformatics 30, i386–i392. L IU , S. & T RENKLER , G. (2008). Hadamard, khatri-rao, kronecker and other matrix products. Int. J. Inform. Syst. Sci. 4, 160–177. ¨ M EINSHAUSEN , N. & B UHLMANN , P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72, 417–473. O SBORNE , M. R., P RESNELL , B. & T URLACH , B. A. (2000). A new approach to variable selection in least squares problems. IMA journal of numerical analysis 20, 389–403. R C ORE T EAM (2015). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. S ANDERSON , C. (2010). Armadillo: An open source C++ linear algebra library for fast prototyping and computationally intensive experiments. Tech. rep., NICTA. TARTAKOVSKY, A., N IKIFOROV, I. & BASSEVILLE , M. (2014). Sequential Analysis : Hypothesis Testing and Changepoint Detection. CRC Press, Taylor & Francis Group. T IBSHIRANI , R. J. & TAYLOR , J. (2011). The solution path of the generalized lasso. Ann. Statist. 39, 1335–1371. V ERT, J.-P. & B LEAKLEY, K. (2010). Fast detection of multiple change-points shared by many signals using group lars. In Advances in Neural Information Processing Systems.
