Degree-Pruning Dynamic Programming Approaches

IEEE TRANSACTIONS ON CYBERNETICS, VOL. **, NO. 1, APRIL 2016


Degree-Pruning Dynamic Programming Approaches to Central Time Series Minimizing Dynamic Time Warping Distance Tao Sun, Hongbo Liu, Hong Yu, Chun Lung Philip Chen, Fellow, IEEE

Abstract—The central time series crystallizes the common patterns of the set it represents. In this paper, we propose a global constrained degree-pruning dynamic programming (g(dp)2) approach to obtain the central time series by minimizing the dynamic time warping (DTW) distance between two time series. The DTW matching-path theory with global constraints is proved for our degree-pruning strategy, which helps reduce the time complexity and computational cost. Our approach achieves the optimal solution between two time series. An approximate method for the central time series of multiple time series (called m g(dp)2) is presented based on DTW Barycenter Averaging (DBA) and our g(dp)2 approach, using a hierarchical merging strategy. As the experimental results illustrate, our approaches provide better WGSS (within-group sum of squares) and robustness than other relevant algorithms.

Index Terms—Time Series; Dynamic Time Warping; Barycenter Averaging; Within-Group Sum of Squares; Central Time Series

I. INTRODUCTION

Time series data are used in a wide range of applications, from biology, finance, and multimedia to image analysis [1], [2], [3], [4], [5], and draw the attention of researchers worldwide because of their importance and usefulness [6], [7], [8], [9], [10]. Time series clustering is one of the most fundamental techniques for exploring the common pattern implied in a set of time series [11], [12], [13]. Its goal is to identify structures in unlabeled time series by organizing data into homogeneous groups, where the within-group similarity is maximized while the between-group similarity is minimized [14], [15], [16]. The problem is usually transformed into measuring the distances between time series and then obtaining the central time series [17], [18]. Regarding distance measures, various methods have been introduced [14], such as Euclidean distance, Minkowski distance, Pearson's correlation distance, Kullback-Leibler distance, and dynamic time warping (DTW) distance. Among these, Euclidean distance and DTW are two of the most important. The former is simple and easy to use: the Euclidean-distance-based central time series can be obtained by simply averaging the entries at the same position of each time series. It is very effective for time series whose shapes change synchronously [19], [20]. However, since most time series shift elastically in many real-world applications [21], their shape features are lost when Euclidean distance is used. Although the elastic-shift feature can be maintained with DTW distance [22], it is difficult to obtain the optimal central time series under DTW distance with acceptable computational complexity. In this paper, we propose a degree-pruning dynamic programming approach that minimizes the within-group sum of squares of DTW distances to obtain the central time series.

The rest of this paper is organized as follows. We outline related studies and methods in Section II. We then define the relevant concepts and prove the theorems in Section III. In Section IV, our global constrained degree-pruning dynamic programming (g(dp)2) algorithms are presented to obtain the central time series with DTW distance; furthermore, we extend our g(dp)2 algorithm to the m g(dp)2 approach to approximate the central time series of multiple time series. In Section V, the experimental results and analysis are provided in detail. In Section VI we draw conclusions and discuss possible directions for future work.

Manuscript received xx, xxxx; revised xx, xxxx; accepted xx, xxxx. This work is supported partly by the Macau Science and Technology Development Fund under Grant 008/2010/A1 and Multiyear Research Grants, the National Natural Science Foundation of China (Grant No. 61173035, 61472058, 61572540), and the Program for New Century Excellent Talents in University (Grant No. NCET-11-0861). T. Sun is with the School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian 116023, China, with an affiliate appointment in the Institute of Cognitive Information Technology (ICIT) at Dalian Maritime University, Dalian 116026, China (e-mail: [email protected]). H. Liu is with the Institute of Cognitive Information Technology (ICIT) and the School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China (e-mail: [email protected]). H. Yu is with the School of Information Science & Engineering, Dalian Ocean University, Dalian 116023, China, with an affiliate appointment in the Institute of Cognitive Information Technology (ICIT) at Dalian Maritime University, Dalian, Liaoning 116026, China (e-mail: yuhong [email protected]). C. L. Philip Chen is with the Faculty of Science and Technology, University of Macau, Macau, China and UMacau Research Institute, Zhuhai 519080, China (e-mail: [email protected]).

II. RELATED WORKS

Dynamic time warping (DTW) was introduced by Itakura [23] to measure the similarity of time series and was refined by Sakoe and Chiba [24]. DTW provides non-linear alignments between two time series to preserve their elastic shifting [25], [26]. However, it has certain drawbacks: (1) the search space is too large, and (2) it may distort the shape of time series. Therefore, constraints were introduced in order to reduce the search space and to avoid distortion. The Sakoe-Chiba Band and the Itakura Parallelogram use global constraints to speed up the DTW computation [24], [27], [28]. Jeong et


al. [29] proposed weighted DTW, which penalizes points with a higher phase difference between a reference point and a testing point in order to prevent minimum-distance distortion.

The difficulty also lies in how to determine the central time series under DTW distance. Unlike "normal" distances, even if time series x is near time series y and z, y need not be near z, because the triangle inequality DTW(y, z) ≤ DTW(x, y) + DTW(x, z) is not always satisfied. This property of DTW makes querying and clustering difficult [27]. Much effort has been devoted to seeking solutions [30], [31], [32]. Salvador suggested that the distances from the central time series to the two original time series should be equal [25]. Gupta et al. proposed the NLAAF (NonLinear Alignment and Averaging Filters) algorithm, which pairwise generates the optimal central time series by backtracking along the optimal alignment path [30]. Niennattrakul et al. proposed an averaging subroutine and put forward a novel shape averaging method, called Prioritized Shape Averaging (PSA), using a hierarchical clustering approach [33]. In their studies, a new DTW averaging function, called Scaled Dynamic Time Warping (SDTW), is introduced to calculate the central time series of two time series. Petitjean et al. proposed DTW Barycenter Averaging (DBA) to obtain the central time series by averaging a set of time series, in which the initial time series is selected randomly; they iteratively calculate the DTW distance between each individual time series and the temporary central time series to be refined, and generate a new temporary central time series until it becomes stable [11]. DBA was later applied to classification [34], where nearest-centroid classifiers were successfully obtained. Petitjean et al. also proposed an averaging method using genetic algorithms (GA) called COMASA (Compact Multiple Alignment for Sequence Averaging) [35]. COMASA can obtain a better central time series among multiple samples, but it usually costs much more time. Although DBA and COMASA have become popular, they are only approximate iterative algorithms, and their central time series are suboptimal for both two time series and multiple time series [36]. It is necessary to find an optimal solution that minimizes the total distance from the central time series to all other time series [37].

In this paper, a novel approach is proposed to obtain the central time series with the minimal within-group sum of squares of DTW distances. Our method also preserves the shape characteristics of time series. The pruning strategy helps reduce the time complexity of the algorithm. In particular, our method also applies to time series of different lengths.

III. CENTRAL TIME SERIES THEORY

In order to formalize the problem, we here introduce related concepts and theorems.

A. DTW with baseline constraints

Let x[x1, x2, ..., xn] denote a time series vector in n-dimensional Euclidean space. x(i) is the i-th entry, and


x(i1 : i2) = [x_{i1}, x_{i1+1}, ..., x_{i2}] (1 ≤ i1 ≤ i2 ≤ n) is a subsequence of x. Since the matching relationship is in many cases not "one-to-one" when calculating the distance between two given time series, the shape characteristics of time series are sometimes warped. For a given m-length time series x[x1, x2, ..., xm] and n-length time series y[y1, y2, ..., yn], let Wxy denote a warping path from x to y, and let Wyx denote the contrary warping path from y to x relative to Wxy, i.e., Wyx = {(jk, ik) | (ik, jk) ∈ Wxy}. The matching operator M(Wxy, i) is defined as the subset of y's subscripts matching the i-th entry of x, as in Eq. (1); |M(Wxy, i)| is called the matching degree of the i-th entry of x.

$$M(W_{xy}, i) = \{\, j \mid (i, j) \in W_{xy} \,\}, \quad \text{for } i = 1, \dots, m. \qquad (1)$$

Let (ik, jk) be the k-th element of Wxy and K the length of Wxy (1 ≤ k ≤ K). The warping path in the DTW distance metric is subject to several constraints such as "boundary conditions", "continuity", and "monotonicity" [38], [39]. Given m-length time series x, n-length time series y, and a K-length warping path Wxy, the baseline constraints of DTW are as follows:
(1) (i1, j1) = (1, 1) and (iK, jK) = (m, n);
(2) ik+1 − ik ≤ 1 and jk+1 − jk ≤ 1, k = 1, 2, ..., K − 1;
(3) ik+1 − ik ≥ 0 and jk+1 − jk ≥ 0, k = 1, 2, ..., K − 1.

In this paper, the DTW distance adopts the form DTWp, called "monotonic DTW" [40]. Write $DTW_p(x(1:i), y(1:j)) = \sqrt[p]{q(i,j)}$ for the $L_p$ norm of the monotonic DTW distance between x(1 : i) and y(1 : j), wherein

$$q(i,j) = |x_i - y_j|^p + \min\{\, q(i-1,j-1),\; q(i-1,j),\; q(i,j-1) \,\}. \qquad (2)$$
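As a concrete illustration, the recursion in Eq. (2) can be sketched in Python. This is a minimal sketch for univariate series under the baseline constraints, not the authors' implementation; the function name is ours.

```python
import math

def dtw_p(x, y, p=2):
    """L_p monotonic DTW distance per Eq. (2); O(mn) dynamic program."""
    m, n = len(x), len(y)
    # q[i][j] holds the cumulative cost of aligning x(1:i) with y(1:j)
    q = [[math.inf] * (n + 1) for _ in range(m + 1)]
    q[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1]) ** p
            q[i][j] = cost + min(q[i - 1][j - 1], q[i - 1][j], q[i][j - 1])
    return q[m][n] ** (1.0 / p)
```

The three predecessors in the inner `min` correspond exactly to the continuity and monotonicity constraints above: each step advances i, j, or both by at most one.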

Theorem 1: Given two time series x and y and their DTW warping path Wxy, for every (i, j) ∈ Wxy, either |M(Wxy, i)| = 1 or |M(Wyx, j)| = 1.

Proof: Suppose (i, j) ∈ Wxy with |M(Wxy, i)| > 1 and |M(Wyx, j)| > 1. After removing (i, j), Wxy still satisfies the DTW baseline constraints, while the corresponding cumulative distance does not increase. Since the DTW distance is the minimum over all feasible warping paths, such pairs can be removed until the stated property holds, and the theorem follows.

B. DTW with global constraints

Suppose x and y are two time series. To obtain the central time series c from x and y, we often have to match globally; meanwhile, it is essential to maintain relatively symmetric matchings between the two warping paths Wxc and Wyc. The Sakoe-Chiba Band global constraints are:
(1) Wxy is subject to the baseline constraints of DTW;
(2) for a given positive integer e (0 ≤ e ≤ min(m, n) − 1) and every (ik, jk) ∈ Wxy, |ik − jk| ≤ e.
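The band restriction can be realized by simply limiting the inner loop of the DTW dynamic program. A sketch under our assumptions (p = 2, univariate series, and a band wide enough that (1, 1) remains connected to (m, n), e.g. equal-length series); the function name is ours:

```python
import math

def dtw_band(x, y, e, p=2):
    """Monotonic DTW restricted to the Sakoe-Chiba band |i - j| <= e."""
    m, n = len(x), len(y)
    q = [[math.inf] * (n + 1) for _ in range(m + 1)]
    q[0][0] = 0.0
    for i in range(1, m + 1):
        # only cells inside the band are ever computed; the rest stay infinite
        for j in range(max(1, i - e), min(n, i + e) + 1):
            cost = abs(x[i - 1] - y[j - 1]) ** p
            q[i][j] = cost + min(q[i - 1][j - 1], q[i - 1][j], q[i][j - 1])
    return q[m][n] ** (1.0 / p)
```

For fixed e the inner loop visits O(e) cells per row, giving O(ne) work instead of O(n^2), which is the speed-up the band is introduced for.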



C. Matching degree constraints

Definition 1 (Central time series of two time series): Let x and y be two time series. Their central time series is defined as

$$C(x, y) = \arg\min_{t}\big(DTW^2(x, t) + DTW^2(t, y)\big). \qquad (3)$$

Notice that, in order to make the calculation convenient and without loss of generality, the squared form is adopted for the minimization problem in Eq. (3), avoiding square-root calculations. Figure 1 illustrates the central time series, in which the time series c is the central time series of x and y. Note that the central time series has the same length as, and an overall similar shape to, x and y.

Fig. 1: Illustration of central time series of two time series
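Definition 1 can be probed numerically by comparing the objective of Eq. (3) across candidate central series. A small sketch, not the authors' code: the competing candidates below are arbitrary choices of ours, while [2, 11/3] is the value of C23 derived in the computational example of Section IV-B.

```python
import math

def sq_dtw(a, b):
    """Squared monotonic DTW, i.e. q(m, n) of Eq. (2) with p = 2."""
    m, n = len(a), len(b)
    q = [[math.inf] * (n + 1) for _ in range(m + 1)]
    q[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            q[i][j] = (a[i - 1] - b[j - 1]) ** 2 + min(
                q[i - 1][j - 1], q[i - 1][j], q[i][j - 1])
    return q[m][n]

def objective(t, x, y):
    """The quantity minimized in Eq. (3)."""
    return sq_dtw(x, t) + sq_dtw(t, y)

x, y = [1, 2], [3, 4, 5]
best = [2, 11 / 3]                      # C(x, y) from the Section IV-B example
for t in ([3], [2, 4], [2, 3, 4.5]):    # arbitrary competing candidates
    assert objective(best, x, y) <= objective(t, x, y)
```

Of course, the search space of Eq. (3) is continuous and of variable length, so such spot checks only illustrate the objective; the dynamic program of Section IV is what actually solves it.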

Theorem 3: For two given time series x and y, there exist a central time series c and corresponding DTW warping paths Wcx and Wcy such that, for every (t, p) ∈ Wcx and (t, q) ∈ Wcy, either |M(Wcx, t)| = |M(Wyc, q)| = 1 or |M(Wcy, t)| = |M(Wxc, p)| = 1.

Proof: Since c is the central time series of x and y, Wcx and Wcy satisfy Eq. (3). Assume (t, p) is the a-th element of Wcx and (t, q) is the b-th element of Wcy. From Theorem 1, we have |M(Wcx, t)| = 1 or |M(Wxc, p)| = 1, and |M(Wcy, t)| = 1 or |M(Wyc, q)| = 1; thus it suffices to consider the following two cases.
1) If |M(Wxc, p)| > 1 and |M(Wyc, q)| > 1, obtain c′ by removing the t-th entry from c. Analogously, W′cx and W′cy are obtained by removing the a-th element from Wcx and the b-th element from Wcy, respectively.
2) If 1 < |M(Wcx, t)| ≤ |M(Wcy, t)|, obtain c′ by replacing the t-th entry of c with the subsequence z = [λ, λ, ..., λ], where λ = c(t) and |M(Wcx, t)| is the length of z. W′cx is then obtained by replacing the matching (t, p) with the one-to-one matching from the subsequence z to the subsequence M(Wcx, t). Similarly, by replacing the matching (t, q) in Wcy with the one-to-many matching from z to M(Wcy, t), we obtain W′cy.

Given two time series x[x1, x2, ..., xm] and y[y1, y2, ..., yn], assume the 1-length time series c = [x̄] is a central time series of x and y. We define the minimal square sum of the DTW distances from c to x and y as

$$E(x, y) = \min_{\bar{x}} \Big( \sum_{i=1}^{m} (x_i - \bar{x})^2 + \sum_{j=1}^{n} (y_j - \bar{x})^2 \Big). \qquad (4)$$

In order to obtain an x̄ that minimizes the objective of Eq. (4), set

$$\frac{d}{d\bar{x}} \Big( \sum_{i=1}^{m} (x_i - \bar{x})^2 + \sum_{j=1}^{n} (y_j - \bar{x})^2 \Big) = 0. \qquad (5)$$

Define the mean of the entries of x and y as

$$A(x, y) = \frac{\sum_{i=1}^{m} x_i + \sum_{j=1}^{n} y_j}{m + n}. \qquad (6)$$

Obviously, by Eq. (5), A(x, y) is the minimizer in Eq. (4). In this paper, E(x, y) is also called the sum of squares of deviations (SSD) from the mean of all the entries of x and y; it is computed as

$$E(x, y) = \sum_{i=1}^{m} \big(x_i - A(x, y)\big)^2 + \sum_{j=1}^{n} \big(y_j - A(x, y)\big)^2. \qquad (7)$$

Theorem 2: Suppose x and y are two time series, c is a corresponding k-length central time series of x and y, and Wcx and Wcy are the warping paths from c to x and y, respectively. Then c(t) = A(X ∪ Y), where X = {x(i) | i ∈ M(Wcx, t)} and Y = {y(j) | j ∈ M(Wcy, t)}, for t = 1, 2, ..., k.

Proof: From Eqs. (4) and (6), Theorem 2 follows immediately.
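Eqs. (6) and (7) amount to pooling the matched entries and taking their mean and sum of squared deviations. A one-function sketch (names ours), checked against the worked numbers of Section IV-B, where x(3 : 5) = [3, 4, 5] and y(4 : 5) = [6, 7] give mean 5 and SSD 10:

```python
def mean_and_ssd(xs, ys):
    """A(x, y) of Eq. (6) and E(x, y) of Eq. (7) for the pooled entries."""
    pooled = list(xs) + list(ys)
    a = sum(pooled) / len(pooled)
    e = sum((v - a) ** 2 for v in pooled)
    return a, e

a, e = mean_and_ssd([3, 4, 5], [6, 7])  # worked example from Section IV-B
# a == 5.0, e == 10.0
```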



With c′, W′cx and W′cy as defined above, the square sum of DTW distances in Eq. (3) does not increase, while the matchings between the paths still satisfy the constraints. Recursively applying the above procedure, it is easy to show that |M(W′cx, t)| = |M(W′yc, q)| = 1 or |M(W′cy, t)| = |M(W′xc, p)| = 1 holds.

IV. CENTRAL TIME SERIES ALGORITHMS

Following Theorem 3, we present the degree-pruning dynamic programming algorithm with baseline constraints, which obtains the central time series of two time series by minimizing the square sum of DTW distances. We then investigate the case with global constraints.

A. Degree-pruning dynamic programming algorithm with baseline constraints (b(dp)2)

In our degree-pruning dynamic programming algorithm with baseline constraints, the problem of obtaining the central time series of two time series x(1 : m) and y(1 : n) is decomposed into sub-problems of minimizing the square sum of the DTW distances from two subsequences x(1 : i) and y(1 : j) to their central time series, where the DTW distance is subject to the baseline constraints described in Section III-A. Each sub-problem is recursively converted into a smaller sub-problem plus the sum of squares of deviations from the mean of the entries aligned to the last entry of the current central time series. From Theorem 3, we adopt a degree-pruning strategy to reduce the computational time complexity.

For given m-length time series x and n-length time series y, denote by Cij the central time series of x(1 : i) and y(1 : j).


Moreover, denote by Dij the square sum of DTW²(Cij, x(1 : i)) and DTW²(Cij, y(1 : j)), for i = 1, ..., m and j = 1, ..., n, satisfying

$$C_{ij} = \arg\min_{c}\big(DTW^2(c, x(1:i)) + DTW^2(c, y(1:j))\big),$$
$$D_{ij} = DTW^2(C_{ij}, x(1:i)) + DTW^2(C_{ij}, y(1:j)). \qquad (8)$$

We are able to derive a pruning strategy which ensures that only one variable (p or q), rather than the pair (p, q) (p ∈ {1, 2, ..., m}, q ∈ {1, 2, ..., n}), is needed in each of the four recursive relationships in Eq. (10). The time complexity is thus significantly reduced.

For i = 1, ..., m and j = 1, ..., n, Dij can be recursively converted as follows: (a) for the k-length central time series Cij, in order to reach the minimum of Eq. (10), the last entry of Cij, i.e., Cij(k), must be mapped to some of the last entries of x(1 : i) and some of the last entries of y(1 : j); (b) subsequently, Cij(1 : k − 1) should be mapped to the remaining former part of x and the remaining former part of y. We thus obtain a recursive sub-problem of size smaller than (i, j).

Given m-length time series x[x1, ..., xm] and n-length time series y[y1, ..., yn], assume two subsequences x(ia : ib) and y(ja : jb) arise while obtaining the central time series, where 1 ≤ ia ≤ ib ≤ m and 1 ≤ ja ≤ jb ≤ n. Let U be the set composed of all the entries of x(ia : ib) and y(ja : jb); A(x(ia : ib), y(ja : jb)) is the mean of the entries of U, and E(x(ia : ib), y(ja : jb)) is the sum of squares of deviations from the mean of the entries of U, from Theorem 2 and Eq. (4). For brevity, let $A^{j_a,j_b}_{i_a,i_b}$ denote A(x(ia : ib), y(ja : jb)), and let $E^{j_a,j_b}_{i_a,i_b}$ denote E(x(ia : ib), y(ja : jb)). Eq. (9) gives their formulas.

$$A^{j_a,j_b}_{i_a,i_b} = \frac{\sum_{i=i_a}^{i_b} x_i + \sum_{j=j_a}^{j_b} y_j}{i_b - i_a + j_b - j_a + 2}, \qquad
E^{j_a,j_b}_{i_a,i_b} = \sum_{i=i_a}^{i_b} \big(x_i - A^{j_a,j_b}_{i_a,i_b}\big)^2 + \sum_{j=j_a}^{j_b} \big(y_j - A^{j_a,j_b}_{i_a,i_b}\big)^2. \qquad (9)$$

From Eq. (2), considering four recursive relationships, Dij is computed as

$$D_{ij} = \min_{p,q}
\begin{cases}
D_{i-1,q-1} + E^{q,j}_{i,i} \\
D_{i,q-1} + E^{q,j}_{i,i} \\
D_{p-1,j-1} + E^{j,j}_{p,i} \\
D_{p-1,j} + E^{j,j}_{p,i}
\end{cases} \qquad (10)$$

where p ∈ {1, 2, ..., i} and q ∈ {1, 2, ..., j}. During the calculation of Dij, Cij is obtained at the same time. Considering Eq. (8) and the right-hand side of Eq. (10), the subscripts ia, ja, ib and jb record where the sum D + E in Eq. (10) reaches its minimum, i.e., $D_{ij} = D_{i_a,j_a} + E^{j_b,j}_{i_b,i}$. Accordingly, Dij and Cij are obtained by Eq. (11).

$$\begin{cases}
D_{ij} = D_{i_a,j_a} + E^{j_b,j}_{i_b,i} \\
C_{ij} = C_{i_a,j_a} \oplus A^{j_b,j}_{i_b,i}
\end{cases}
\quad \text{s.t. } 1 \le i_a \le i_b \le i,\; 1 \le j_a \le j_b \le j. \qquad (11)$$

The symbol "⊕" is a joint operator between a time series and a real number, e.g., [1, 2, 3, 4] ⊕ 5 = [1, 2, 3, 4, 5]. The recursive formula of Dij is illustrated in Figure 2. From Theorem 3, there cannot be two "one-to-many" mappings from x and y to any given entry of Cij.

Algorithm 1 degree-pruning dynamic programming algorithm with baseline constraints (b(dp)2)
Require: m-length time series x and n-length time series y.
Ensure: Central time series C and the corresponding distance D.
1: Let M denote an (m + 1) × (n + 1) matrix of tuples (ia, ja, ib, jb, error, length)
2: for i = 0 to m, j = 0 to n do
3:   M(i, j) = (0, 0, 0, 0, ∞, 0)
4: end for
5: M(0, 0) = (0, 0, 0, 0, 0, 0)
6: for i = 1 to m, j = 1 to n do
7:   for p = i downto 1 do
8:     update(i, j, p − 1, j − 1, p, j)
9:     update(i, j, p − 1, j, p, j)
10:  end for
11:  for q = j downto 1 do
12:    update(i, j, i − 1, q − 1, i, q)
13:    update(i, j, i, q − 1, i, q)
14:  end for
15: end for
16: D = M(m, n).error
17: Let (i, j) = (m, n)
18: while i > 0 and j > 0 do
19:   Let (ia, ja, ib, jb, error, k) = M(i, j)
20:   Let C(k) = A(x(ib : i), y(jb : j))
21:   Let (i, j) = (ia, ja)
22: end while
23: return C, D

24: function update(i, j, ia, ja, ib, jb)
25:   E′ = M(ia, ja).error + E(x(ib : i), y(jb : j))
26:   if E′ < M(i, j).error then
27:     M(i, j) = (ia, ja, ib, jb, E′, M(ia, ja).length + 1)
28:   end if
29: end function

Algorithm 1 outlines our baseline algorithm (b(dp)2). We define a matching matrix M in Line 1 of Algorithm 1, where each element is represented by a tuple (ia, ja, ib, jb, error, length) and length denotes the length of the central time series of x(1 : i) and y(1 : j). We have the recursive relationship M(i, j).error = M(ia, ja).error + E(x(ib : i), y(jb : j)). The values of (ia, ja, ib, jb) are obtained from four possible cases, (p − 1, j − 1, p, j), (p − 1, j, p, j), (i − 1, q − 1, i, q) and (i, q − 1, i, q), by Lines 8, 9, 12 and 13 of Algorithm 1, respectively, as illustrated in Figure 2. Dynamic programming is employed to minimize the square sum of DTW distances, and finally M(i, j).error stores the square sum of the DTW distances from the central time series to x(1 : i) and y(1 : j).
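Under our reading of Algorithm 1 (with p = 2 and the tuple fields above), a runnable Python sketch might look as follows. Function and variable names are ours, not the paper's; the two restricted loops realize the degree-pruning of the four cases in Eq. (10).

```python
import math

def _mean_ssd(vals):
    """Mean and sum of squared deviations (Eqs. (6)-(7)) of pooled entries."""
    mu = sum(vals) / len(vals)
    return mu, sum((v - mu) ** 2 for v in vals)

def b_dp2(x, y):
    """Sketch of b(dp)^2: central time series C and distance D = D_{mn}."""
    m, n = len(x), len(y)
    # M[i][j] = (ia, ja, ib, jb, error, length), as in Line 1 of Algorithm 1
    M = [[(0, 0, 0, 0, math.inf, 0) for _ in range(n + 1)] for _ in range(m + 1)]
    M[0][0] = (0, 0, 0, 0, 0.0, 0)

    def update(i, j, ia, ja, ib, jb):
        # the last central entry absorbs x(ib..i) and y(jb..j); E from Eq. (9)
        _, e = _mean_ssd(x[ib - 1:i] + y[jb - 1:j])
        err = M[ia][ja][4] + e
        if err < M[i][j][4]:
            M[i][j] = (ia, ja, ib, jb, err, M[ia][ja][5] + 1)

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            for p in range(i, 0, -1):   # cases D(p-1,j-1)+E and D(p-1,j)+E
                update(i, j, p - 1, j - 1, p, j)
                update(i, j, p - 1, j, p, j)
            for q in range(j, 0, -1):   # cases D(i-1,q-1)+E and D(i,q-1)+E
                update(i, j, i - 1, q - 1, i, q)
                update(i, j, i, q - 1, i, q)

    D = M[m][n][4]
    C = []                              # backtrack, one central entry per hop
    i, j = m, n
    while i > 0 and j > 0:
        ia, ja, ib, jb, _, _ = M[i][j]
        C.append(_mean_ssd(x[ib - 1:i] + y[jb - 1:j])[0])
        i, j = ia, ja
    C.reverse()
    return C, D
```

On the paper's own example, `b_dp2([1, 2], [3, 4, 5])` reproduces C23 = [2, 3.667] with D23 = 2 + 42/9 ≈ 6.667.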


Fig. 2: Illustration of Dij in Eq. (10). [Four panels show the cases of (a) $D_{p-1,j-1} + E^{j,j}_{p,i}$, (b) $D_{p-1,j} + E^{j,j}_{p,i}$, (c) $D_{i-1,q-1} + E^{q,j}_{i,i}$, and (d) $D_{i,q-1} + E^{q,j}_{i,i}$ reaching the minimum.]

From Lines 6, 7 and 11 in Algorithm 1, the time complexity of the b(dp)2 algorithm is O(n^3).

B. Computational Example of b(dp)2

Given x = [1, 2, 3, 4, 5] and y = [3, 4, 5, 6, 7], let [ia, ib] = [3, 5] and [ja, jb] = [4, 5]. Then x(ia : ib) = x(3 : 5) = [3, 4, 5] and y(ja : jb) = y(4 : 5) = [6, 7]. $A^{j_a,j_b}_{i_a,i_b}$ is the mean of the set {3, 4, 5, 6, 7} composed of all the entries of x(ia : ib) and y(ja : jb), i.e., (3 + 4 + 5 + 6 + 7)/5 = 5. $E^{j_a,j_b}_{i_a,i_b}$ is the sum of squares of deviations from the mean 5, i.e., (3 − 5)² + (4 − 5)² + (5 − 5)² + (6 − 5)² + (7 − 5)² = 10.

Initially, let D00 = 0 and C00 be the empty time series "[ ]". As we can directly calculate $E^{j_a,j_b}_{i_a,i_b}$ for ia, ib, ja, jb = 1, 2, ..., 5 (ia ≤ ib, ja ≤ jb) from Eq. (9), Dij for i, j = 1, 2, ..., 5 can be recursively calculated as the minimal sum of D + E with smaller subscripts, according to Eq. (10).

Take the calculation of D23 for example. C23 is the central time series of [1, 2] and [3, 4, 5], and $D_{23} = DTW_p^2(C_{23}, [1, 2]) + DTW_p^2(C_{23}, [3, 4, 5])$. The candidates on the right-hand side of Eq. (10) include six terms: $D_{12} + E^{3,3}_{2,2}$, $D_{11} + E^{2,3}_{2,2}$, $D_{22} + E^{3,3}_{2,2}$, $D_{21} + E^{2,3}_{2,2}$, $D_{12} + E^{3,3}_{2,2}$ and $D_{13} + E^{3,3}_{2,2}$. In fact, only $D_{11} + E^{2,3}_{2,2}$ reaches the minimum, i.e., $D_{23} = D_{i_a,j_a} + E^{j_b,3}_{i_b,2}$ with (ia, ja, ib, jb) = (1, 1, 2, 2). As (ia, ja, ib, jb) = (1, 1, 2, 2) is subject to the constraints in Eq. (11), using $C_{i_a,j_a}$ and $A^{j_b,j}_{i_b,i}$, i.e., C11 and $A^{2,3}_{2,2}$, we can easily deduce Cij (here C23). In fact, $A^{2,3}_{2,2} = 3.667$ is obtained directly from Eq. (9), while C11 = [2] is obtained directly by Eq. (11). Thus $C_{23} = C_{i_a,j_a} \oplus A^{j_b,j}_{i_b,i} = C_{11} \oplus A^{2,3}_{2,2} = [2] \oplus 3.667 = [2, 3.667]$ by Eq. (11).

Consider the matrix M in Algorithm 1: its (i, j)-th element M(i, j) is represented by a 6-tuple (m1, m2, m3, m4, m5, m6). For the above computational example, when (i, j) = (2, 3), we have (m1, m2, m3, m4) = (ia, ja, ib, jb) = (1, 1, 2, 2), m5 = D23, and m6 is the length of C23.

C. Degree-pruning dynamic programming algorithm with global constraints (g(dp)2)

In this section, the Sakoe-Chiba Band global constraints (described in Section III-B) are considered in calculating the central time series. Let the constraint parameter be e, which satisfies |ik − jk| ≤ e for each element (ik, jk) in the DTW warping path. Similarly to the definitions of Cij and Dij, when the length of the central time series is constrained to a given constant k, Cijk and Dijk are defined as follows. For m-length time series x and n-length time series y, let Cijk represent the k-length central time series with global constraints of x(1 : i) and y(1 : j), and let Dijk represent the square sum of the DTW distances from Cijk to x(1 : i) and y(1 : j), for i = 1, ..., m; j = 1, ..., n; k = 1, ..., max(m, n) − e. Comparing Eq. (12) with Eq. (8), the subscript k is added to constrain the length of the central time series.

$$C_{ijk} = \arg\min_{|c|=k}\big(DTW^2(c, x(1:i)) + DTW^2(c, y(1:j))\big),$$
$$D_{ijk} = DTW^2(C_{ijk}, x(1:i)) + DTW^2(C_{ijk}, y(1:j)). \qquad (12)$$

The recursive relationships of Dijk and Cijk are determined by Eqs. (13) and (14), respectively.

$$D_{ijk} = \min_{p,q}
\begin{cases}
D_{i-1,q-1,k-1} + E^{q,j}_{i,i} \\
D_{i,q-1,k-1} + E^{q,j}_{i,i} \\
D_{p-1,j-1,k-1} + E^{j,j}_{p,i} \\
D_{p-1,j,k-1} + E^{j,j}_{p,i}
\end{cases} \qquad (13)$$

where p ∈ {max(1, k − e − 1), ..., i}, q ∈ {max(1, k − e − 1), ..., j}, and |i − k| ≤ e, |j − k| ≤ e.



$$\begin{cases}
D_{ijk} = D_{i_a,j_a,k-1} + E^{j_b,j}_{i_b,i} \\
C_{ijk} = C_{i_a,j_a,k-1} \oplus A^{j_b,j}_{i_b,i}
\end{cases}
\quad \text{s.t. } 1 \le i_a \le i_b \le i,\; 1 \le j_a \le j_b \le j. \qquad (14)$$

For example, let x = (1, 2, 3, 4, 5) and y = (3, 4, 5, 6, 7). If (i, j, k) = (2, 3, 1), then Cijk = [3] is the central time series of [1, 2] and [3, 4, 5] under the length constraint k = 1. If (i, j, k) = (2, 3, 2), then Cijk = [2, 3.667] is the central time series under the length constraint k = 2.

In fact, the time complexity of calculating the central time series with exact global constraints is O(ne^4), which is too costly to be used in practice. If we instead reduce the time cost with approximate global constraints, the algorithm can be obtained by the following steps:
(a) Replace Line 6 in Algorithm 1 by "for i = 1 to n, j = max(1, i − 2e) to min(n, i + 2e) do";
(b) Replace Line 7 in Algorithm 1 by "for p = i downto max(1, i − 2e, j − 2e) do";
(c) Replace Line 11 in Algorithm 1 by "for q = j downto max(1, i − 2e, j − 2e) do".
The corresponding time complexity is then O(ne^2), which is applicable in practice. The corresponding algorithm is called global constrained degree-pruning dynamic programming (g(dp)2).

D. Central time series algorithm for multiple time series

In this subsection we present the central time series algorithm for multiple time series (m g(dp)2). For a set of multiple time series S = {s1, ..., sN}, the central time series of S is defined as

$$c = \arg\min_{x} \sum_{k=1}^{N} DTW^2(x, s_k). \qquad (15)$$
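The objective of Eq. (15), i.e., the within-group sum of squared DTW distances, can be evaluated directly for any candidate central series. A minimal sketch (names ours, p = 2, toy data of ours):

```python
import math

def sq_dtw(a, b):
    """Squared monotonic DTW, q(m, n) of Eq. (2) with p = 2."""
    m, n = len(a), len(b)
    q = [[math.inf] * (n + 1) for _ in range(m + 1)]
    q[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            q[i][j] = (a[i - 1] - b[j - 1]) ** 2 + min(
                q[i - 1][j - 1], q[i - 1][j], q[i][j - 1])
    return q[m][n]

def wgss(c, S):
    """Objective of Eq. (15): sum of squared DTW distances from c to the set S."""
    return sum(sq_dtw(c, s) for s in S)

S = [[1.0], [3.0], [5.0]]      # toy set whose obvious center is [3.0]
assert wgss([3.0], S) == 8.0   # 4 + 0 + 4
assert wgss([3.0], S) < wgss([2.0], S)
```

Such an evaluator is what WGSS comparisons in the experiments rest on; the m g(dp)2 procedure below is about *finding* a good c rather than merely scoring one.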

$$d'_i = \min_{j \ne i} d_{ij}, \qquad p = \arg\max_{i} d'_i, \qquad q = \arg\min_{j \ne p} d_{pj},$$

thus (up, uq) is the central pair selected to be merged. Each selected pair is relatively nearer than others with respect to the square sum of DTW distances. The main steps of our algorithm m g(dp)2 are as follows:
(a) Randomly select M different elements from S to compose an initial set of central time series.
(b) Call function divideDataset and divide S into M subsets.
(c) Calculate the central time series of each of these M subsets by DBA.
(d) Call function mergeCenters to pairwise merge the M central time series into M/2 central time series by g(dp)2, then let M = M/2.
(e) Repeat Step (d) until only one element is left in the set of central time series, which is the output result.

In function mergeCenters, the computational complexity of DBA at Line 12 in Algorithm 2 is O(ns nl n^2) in total, where ns (i.e., ns = |S|) is the count of all the time series samples, nl is the number of iterations of DBA, and n is the length of the time series. The computational complexity of dij at Line 13 is O(M^2 n^2), where M is the count of the input subsets of the function; in fact, M