J Syst Sci Complex (2014) 27: 181–192

A TRANSFER FORECASTING MODEL FOR CONTAINER THROUGHPUT GUIDED BY DISCRETE PSO

XIAO Jin · XIAO Yi · FU Julei · LAI Kin Keung

DOI: 10.1007/s11424-014-3296-1
Received: 1 April 2012 / Revised: 20 April 2013
© The Editorial Office of JSSC & Springer-Verlag Berlin Heidelberg 2014

Abstract Accurate forecast of future container throughput of a port is very important for its construction, upgrading, and operation management. This study proposes a transfer forecasting model guided by discrete particle swarm optimization algorithm (TF-DPSO). It firstly transfers some related time series in source domain to assist in modeling the target time series by transfer learning technique, and then constructs the forecasting model by a pattern matching method called analog complexing. Finally, the discrete particle swarm optimization algorithm is introduced to find the optimal match between the two important parameters in TF-DPSO. The container throughput time series of two important ports in China, Shanghai Port and Ningbo Port are used for empirical analysis, and the results show the effectiveness of the proposed model.

Keywords Analog complexing, container throughput forecasting, discrete particle swarm optimization, transfer forecasting model.

XIAO Jin
Business School, Sichuan University, Chengdu 610064, China. Email : [email protected].
XIAO Yi
School of Information Management, Central China Normal University, Wuhan 430079, China; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China. Email : [email protected].
FU Julei
Information System & Management, National University of Defense Technology, Changsha 410073, China. Email : [email protected].
LAI Kin Keung
Department of Management Sciences, City University of Hong Kong, Hong Kong, China.

∗ This research is partly supported by the Natural Science Foundation of China under Grant Nos. 71101100 and 70731160635, New Teachers’ Fund for Doctor Stations, Ministry of Education under Grant No. 20110181120047, Excellent Youth Fund of Sichuan University under Grant No. 2013SCU04A08, China Postdoctoral Science Foundation under Grant Nos. 2011M500418, 2012T50148, and 2013M530753, Frontier and Cross-innovation Foundation of Sichuan University under Grant No. skqy201352, Soft Science Foundation of Sichuan Province under Grant No. 2013ZR0016, Humanities and Social Sciences Youth Foundation of the Ministry of Education of China under Grant No. 11YJC870028, Self-determined Research Funds of CCNU from the Colleges’ Basic Research and Operation of MOE under Grant No. CCNU13F030.
This paper was recommended for publication by Editor WANG Shouyang.

1 Introduction

Accurate forecasting of future container throughput for a port is very important for its construction, upgrading, and operation management[1–3]. There are many studies on the analysis and forecasting of container throughput, and the existing approaches can be classified into three categories:

1) Time series models. These models establish a mathematical model from historical throughput data alone, and include the autoregressive integrated moving average model (ARIMA), exponential smoothing, the gray model (GM), the decomposition approach (X-11), etc. For example, Chou, et al.[4] adopted a seasonal ARIMA model to analyze Kaohsiung harbor’s container volume.

2) Causal analysis models. These models examine the correlation between the throughput and a series of economic indicators of the port hinterland, and build a forecasting model according to the relevant economic indicators. Such methods include regression analysis, the elasticity coefficient method, etc. For instance, Seabrooke, et al.[5] predicted the cargo growth and the development of the Hong Kong Port by means of regression analysis, and Hui, et al.[6] forecasted the cargo throughput of Hong Kong Port by estimating a cointegrated error correction model.

3) Nonlinear dynamics forecasting models. The time series models and causal analysis models can achieve satisfactory prediction performance when the container throughput time series is linear or nearly linear. However, the factors affecting container throughput are complicated, and the fluctuation of container throughput usually exhibits highly nonlinear dependencies. Thus, the performance of these linear models alone may be very poor. In recent years, some nonlinear dynamics forecasting models have been introduced to container throughput forecasting, such as the artificial neural network (ANN) and genetic programming (GP). For example, Lam, et al.[7] applied ANN to Hong Kong Port cargo throughput forecasting, while Chen and Chen[8] forecasted the container throughput of Taiwan’s major ports by GP, X-11, and seasonal ARIMA, and found that GP performed best. However, these nonlinear models have their own disadvantages, e.g., ANN often suffers from local minima and over-fitting, while GP is sensitive to parameter selection[9].

The limitations of the above models motivate the search for new methods for container throughput forecasting. In fact, Peters[10] pointed out that most economic time series are long-memory processes, and that the future development of such a time series depends not on some parts but on all of the historical data. In this case, prediction approaches based on historical pattern matching should be chosen, which can overcome the disadvantages of short-memory process forecasting methods. As a pattern matching method, analog complexing (AC)[11] provides an alternative approach for container throughput forecasting. It was first applied to meteorological forecasting[11]. Further, Lemke and Mueller[12] enhanced the AC algorithm by an inductive self-organizing approach and an advanced selection procedure to make it also applicable to evolutionary processes. In the past decades, the AC algorithm has been successfully applied to many areas, such as stock price prediction, medical data analysis, and marketing data analysis[13, 14]. However, it has not been applied to container throughput forecasting. Meanwhile, the traditional AC method only utilizes the forecasted time series itself, i.e., target domain data. In theory, if there are enough data points in the target time series, the AC


can achieve satisfactory forecast performance. Unfortunately, in reality many ports have very few data points. For example, the monthly time series of the main ports in China begin in 2001 and contain fewer than 150 data points. However, plenty of data are available outside the target domain; they often come from other time series, i.e., the source domain. Therefore, the forecasting accuracy of container throughput can be expected to improve if we make full use of the time series in the related source domain.

The transfer learning technique[15], developed in the machine learning area, offers a good way to address this issue. Its basic idea is to utilize the knowledge acquired from related tasks to assist in learning the target task for better performance. In recent years, transfer learning has received more and more attention and has been applied successfully to many areas, such as text mining, speech processing, and image recognition[15], but it is rarely used in the field of economic time series forecasting.

This study combines the transfer learning technique, AC, and the particle swarm optimization algorithm, and constructs a transfer forecasting model guided by the discrete particle swarm optimization algorithm (TF-DPSO) for container throughput. The empirical analysis results show that the forecasting performance of TF-DPSO is better than that of some existing models.

The rest of this study is organized as follows. Section 2 describes the formulation process of the proposed TF-DPSO model in detail, Section 3 conducts the empirical analysis, and Section 4 presents the conclusions.

2 Methodology Formulation

2.1 Analog Complexing

The analog complexing (AC) algorithm was developed by Lorenz[11]. It can be considered as a sequential pattern matching method. In general, the AC algorithm for forecasting is a four-step procedure[14].

1) Generation of candidate patterns

Given an $m$-dimensional time series $x_t = \{x_{1t}, x_{2t}, \cdots, x_{mt}\}$, $t = 1, 2, \cdots, N$, a pattern is defined as a table $P_k(i)$ with $k$ rows (observations) beginning from the $i$-th line (period), where $k$ is the pattern length ($i = 1, 2, \cdots, N-k+1$):

$$
P_k(i) = \begin{pmatrix}
x_{1i} & \cdots & x_{li} & \cdots & x_{mi} \\
\vdots & & \vdots & & \vdots \\
x_{1,i+j} & \cdots & x_{l,i+j} & \cdots & x_{m,i+j} \\
\vdots & & \vdots & & \vdots \\
x_{1,i+k-1} & \cdots & x_{l,i+k-1} & \cdots & x_{m,i+k-1}
\end{pmatrix}_{k \times m}. \tag{1}
$$

In general, the last pattern $P^R = P_k(N-k+1)$ just before the forecast origin is selected as the reference pattern. Then all possible candidate patterns $P_k(i)$ ($i = 1, 2, \cdots, N-k$) are compared with the reference pattern.

Example For a three-dimensional series with five observations

$$
D = \begin{pmatrix} 1 & 2 & 3 \\ 5 & 5 & 6 \\ 7 & 9 & 9 \\ 10 & 11 & 13 \\ 15 & 16 & 16 \end{pmatrix}
$$

and the pattern length $k = 3$, there are two candidate patterns,

$$
P_3(1) = \begin{pmatrix} 1 & 2 & 3 \\ 5 & 5 & 6 \\ 7 & 9 & 9 \end{pmatrix}, \quad
P_3(2) = \begin{pmatrix} 5 & 5 & 6 \\ 7 & 9 & 9 \\ 10 & 11 & 13 \end{pmatrix},
$$

and a reference pattern

$$
P^R = \begin{pmatrix} 7 & 9 & 9 \\ 10 & 11 & 13 \\ 15 & 16 & 16 \end{pmatrix}.
$$

2) Transformation of analogues

Because the system is dynamic, patterns with similar shapes may have different means and standard deviations. Thus, to compute the similarity between a candidate pattern and the reference pattern, we must look for a transformation from the candidate patterns to the reference pattern that describes these differences. It is advisable to define the transformed pattern $P_k^*(i)$ as a linear function of the pattern $P_k(i)$:

$$
P_k^*(i) = \begin{pmatrix}
x^*_{1i} & \cdots & x^*_{li} & \cdots & x^*_{mi} \\
\vdots & & \vdots & & \vdots \\
x^*_{1,i+j} & \cdots & x^*_{l,i+j} & \cdots & x^*_{m,i+j} \\
\vdots & & \vdots & & \vdots \\
x^*_{1,i+k-1} & \cdots & x^*_{l,i+k-1} & \cdots & x^*_{m,i+k-1}
\end{pmatrix}, \tag{2}
$$

where $x^*_{l,i+j} = a^i_{0l} + a^i_{1l} x_{l,i+j}$, $i = 1, 2, \cdots, N-k$; $j = 0, 1, \cdots, k-1$; $l = 1, 2, \cdots, m$. Regarding the data $x_{lz}$ ($l = 1, 2, \cdots, m$; $z = N-k+1, N-k+2, \cdots, N$) in the reference pattern $P^R$ as the datum value (i.e., the dependent variable value), and the data in the candidate pattern $P_k(i)$ as the independent variable value, the unknown weights $a^i_{0l}$ and $a^i_{1l}$ for $P_k^*(i)$ can be estimated by the least squares (LS) method. The parameter $a^i_{0l}$ can be interpreted as the difference of state between the reference pattern and the candidate pattern, and the parameter $a^i_{1l}$ is considered to reflect some uncertainties. It is worth noting that the pattern length must satisfy $k \geq 3$, because if $k = 2$, the transformed candidate patterns will be the same as the reference pattern.

Taking the candidate pattern $P_3(1)$ in Step 1) as an example, obviously the dimension $m = 3$. We need to estimate the weights $a^1_{0l}$ and $a^1_{1l}$ ($l = 1, 2, 3$) in each dimension. In the first dimension, the column vectors in $P^R$ and $P_3(1)$ are $(7, 10, 15)^T$ and $(1, 5, 7)^T$, thus we can get $a^1_{01} = 5.25$ and $a^1_{11} = 1.25$ by LS. Further, the transformed value of the first column in $P_3^*(1)$ is $(6.5, 11.5, 14)^T = 5.25 + 1.25 \times (1, 5, 7)^T$. Similarly, the weights in the second dimension and third dimension can be computed: $a^1_{02} = 6.60$, $a^1_{12} = 1.01$, $a^1_{03} = 5.67$, $a^1_{13} = 1.17$. Finally, the transformed pattern is

$$
P_3^*(1) = \begin{pmatrix} 6.5 & 8.62 & 9.18 \\ 11.5 & 11.65 & 12.69 \\ 14 & 15.69 & 16.2 \end{pmatrix}.
$$

3) Selection of most similar patterns

To measure the similarity between the transformed candidate pattern $P_3^*(1)$ in Step 2) and the reference pattern $P^R$, we need to compute the distance between them. In AC, the distance between the $i$-th ($i = 1, 2, \cdots, N-k$) candidate pattern and the reference pattern is defined as

$$
d_i = \frac{1}{k+1} \sum_{j=0}^{k-1} \sum_{r=1}^{m} \left(x^*_{r,i+j} - x_{r,N-k+j+1}\right)^2, \tag{3}
$$

where $x^*_{r,i+j}$ are the transformed values from (2). Then the pattern similarity between them is defined as:

$$
s_i = \frac{1}{d_i}. \tag{4}
$$

4) Combination forecasting

Suppose the $F$ most similar candidate patterns are selected in Step 3); then the continuation of each selected pattern is transformed to the reference pattern for forecasting. For instance, suppose the candidate pattern $P_3(1)$ in the above example is selected and the forecast interval length is 1. It can be seen from the initial observations $D$ that the continuation of $P_3(1)$ is $(10, 11, 13)$. According to the estimated weights in Step 2), the three dimensions of the continuation of $P_3(1)$ are transformed as follows: 5.25 + 10 × 1.25 = 17.75, 6.60 + 11 × 1.01 = 17.71, and 5.67 + 13 × 1.17 = 20.88. Finally, $(17.75, 17.71, 20.88)$ is regarded as the one-step forecasting result of the selected pattern $P_3(1)$. Let $R_1, R_2, \cdots, R_F$ be the forecasting results of the $F$ patterns; the combination forecasting result is obtained by

$$
R^* = \sum_{i=1}^{F} w_i R_i, \tag{5}
$$

where $w_i$ ($i = 1, 2, \cdots, F$) are the weights. In this study, they are computed according to pattern similarity.
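The four AC steps above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `fit_transform` and `ac_forecast` are hypothetical, the weights $w_i$ are assumed to be the similarities $s_i$ normalized to sum to one (the text only says they are computed from pattern similarity), and a small constant guards against division by zero when a candidate matches the reference exactly.

```python
import numpy as np

def fit_transform(cand, ref):
    """Per-dimension least-squares weights (a0, a1) so that ref ~ a0 + a1 * cand."""
    m = cand.shape[1]
    a0, a1 = np.empty(m), np.empty(m)
    for l in range(m):
        # np.polyfit with degree 1 returns (slope, intercept)
        a1[l], a0[l] = np.polyfit(cand[:, l], ref[:, l], 1)
    return a0, a1

def ac_forecast(series, k, n_select):
    """One-step AC forecast for an (N, m) array; hypothetical helper."""
    x = np.asarray(series, dtype=float)
    n_obs = x.shape[0]
    ref = x[n_obs - k:]                       # reference pattern P^R
    scored = []
    for i in range(n_obs - k):                # candidate patterns (0-based start row)
        cand = x[i:i + k]
        a0, a1 = fit_transform(cand, ref)     # Step 2: transformation
        d = np.sum((a0 + a1 * cand - ref) ** 2) / (k + 1)   # Step 3: distance (3)
        scored.append((d, a0 + a1 * x[i + k]))              # transformed continuation
    scored.sort(key=lambda item: item[0])     # smallest distance = most similar
    chosen = scored[:n_select]
    # Assumption: weights are normalized similarities s_i = 1 / d_i
    sims = np.array([1.0 / max(d, 1e-12) for d, _ in chosen])
    weights = sims / sims.sum()
    return sum(w * r for w, (_, r) in zip(weights, chosen))  # Step 4: combination (5)

D = np.array([[1, 2, 3], [5, 5, 6], [7, 9, 9], [10, 11, 13], [15, 16, 16]], dtype=float)
a0, a1 = fit_transform(D[0:3], D[2:5])        # weights for candidate P_3(1)
forecast = ac_forecast(D, k=3, n_select=1)
```

Because the sketch keeps the unrounded least-squares weights, its second and third forecast components differ slightly from the values in the worked example, which uses weights rounded to two decimals.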

2.2 Transfer Learning Technique

The concept of transfer learning originates from psychology[15]. It refers to the ability of people to use experience and skills learned in related areas to assist in learning a new task. Learning new knowledge directly is the traditional machine learning paradigm that we are familiar with. Such methods often suppose that the learning tasks are independent of each other, and they discard past learning experience and results when learning a new task. Since the 1990s, transfer learning has gained more and more attention with the development of machine learning. Its basic idea is to utilize the data or information of related source tasks to assist in modeling the target task[15, 16].

2.3 Process of Transfer Forecasting Model Guided by DPSO

The transfer forecasting model guided by DPSO contains two phases:

1) Selection of related time series in source domain. The forecasting performance may be damaged if the data are transferred improperly, that is, negative transfer may occur. Thus, how to transfer the most useful data is very important.

2) Parameter optimization based on DPSO. There are two important parameters in AC: the pattern length k and the number of most similar selected patterns F. The conventional AC model usually fixes k at a certain level first, generates all candidate patterns, selects the F patterns most similar to the reference pattern, and combines them to get the combination forecasting result; it then repeats this for other values of k. Thus, each value of k corresponds to a combination forecasting result, and finally the optimal forecasting result is selected from all combination forecasting results. However, in real forecasting problems, the optimal forecast may be a combination of the continuations of candidate patterns with different k, and the conventional AC cannot find such a forecast. At the same time, it is difficult for the conventional AC model to find the optimal match between k and F. To overcome these deficiencies of the conventional AC model, discrete particle swarm optimization is introduced to optimize the parameters in the TF-DPSO model.

2.3.1 Selection of Related Time Series in Source Domain

To describe the relevance between the source domain time series and the target time series, we calculate their Pearson correlation coefficient (PCC). However, because the number of data points often differs between time series, the PCC cannot be calculated directly. Thus, we segment each time series into a series of sub-time series of length L (e.g., L = 12 for monthly data) starting from the latest time point, and directly discard the remaining part that has fewer than L data points. Suppose that $Y$ is the target time series of a forecasting problem, there are $M$ related time series $X^i$ ($i = 1, 2, \cdots, M$) in the source domain, and $Y$ and $X^i$ are divided into $n_1$ and $n_2$ sub-time series, respectively. Then the PCC between $Y_j$ ($j = 1, 2, \cdots, n_1$) and $X^i_k$ ($k = 1, 2, \cdots, n_2$) is calculated as follows:

$$
r^i_{j,k} = \frac{\sum_{t=1}^{n} \left(X^i_{k,t} - \bar{X}^i_k\right)\left(Y_{j,t} - \bar{Y}_j\right)}{\sqrt{\sum_{t=1}^{n} \left(X^i_{k,t} - \bar{X}^i_k\right)^2 \sum_{t=1}^{n} \left(Y_{j,t} - \bar{Y}_j\right)^2}}, \tag{6}
$$

where $n$ is the number of data points in each sub-time series ($n = 12$), and $\bar{X}^i_k$ and $\bar{Y}_j$ are the mean values of the sub-time series $X^i_k$ and $Y_j$, respectively. Further, the average similarity between $Y$ and $X^i$ ($i = 1, 2, \cdots, M$) is defined as

$$
r^i = \frac{\sum_{j=1}^{n_1} \sum_{k=1}^{n_2} r^i_{j,k}}{n_1 n_2}. \tag{7}
$$

It is easy to see that $-1 \leq r^i \leq 1$. In the TF-DPSO model, we transfer to the target domain those source domain time series that have the largest similarity with the target time series.
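The segment-wise similarity in (6) and (7) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `segments`, `average_similarity`, and `select_sources`, the series names, and the synthetic data are all hypothetical. It assumes each series has at least one full segment and that no segment is constant (otherwise the PCC is undefined).

```python
import numpy as np

def segments(series, length=12):
    """Split a 1-D series into full segments counted back from the latest point."""
    s = np.asarray(series, dtype=float)
    n_seg = len(s) // length
    tail = s[len(s) - n_seg * length:]        # drop the oldest incomplete part
    return tail.reshape(n_seg, length)

def average_similarity(target, source, length=12):
    """Average PCC over all pairs of target and source segments, as in (7)."""
    y_seg = segments(target, length)
    x_seg = segments(source, length)
    total = 0.0
    for yj in y_seg:
        for xk in x_seg:
            total += np.corrcoef(xk, yj)[0, 1]   # Pearson correlation, as in (6)
    return total / (len(y_seg) * len(x_seg))

def select_sources(target, sources, n_transfer, length=12):
    """Names of the n_transfer source series most similar to the target."""
    scores = {name: average_similarity(target, s, length) for name, s in sources.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_transfer]

# Synthetic illustration only; these are not port data.
target = np.arange(24.0)
sources = {"rising": 2.0 * np.arange(30.0) + 1.0, "falling": -np.arange(24.0)}
picked = select_sources(target, sources, n_transfer=1)
```

How many source series to transfer (`n_transfer` here) is left as a free parameter in this sketch.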

2.3.2 Parameter Optimization Based on DPSO

Particle swarm optimization (PSO) is a population-based stochastic optimization algorithm[17]. It was originally proposed for continuous problems. However, in many practical problems, the values of variables are discrete, such as the two parameters in the TF-DPSO model proposed in this study. Kennedy and Eberhart[18] developed the first discrete version of PSO for binary problems (BPSO). It solves the problem of moving the particle through the problem space by changing the velocity in each vector to the probability of each bit being in one state or the other. Further, Al-kazemi and Mohan[19] proposed a multiphase discrete PSO (M-DPSO) algorithm. For the detailed modeling process of M-DPSO, please refer to [19]. In this study, we apply M-DPSO to optimize the parameters k and F in the TF-DPSO model. After repeated experiments, we find that good forecasting results can be obtained when 1 ≤ F ≤ 8 and 3 ≤ k ≤ 13. Therefore, we encode each potential solution (particle) as a string of 14 binary bits, where the front three bits denote the value of F, and the latter eleven bits denote, in turn, whether candidate patterns of length k = 3, 4, · · · , 13 are considered.
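The binary PSO idea mentioned above, where velocity is turned into the probability of a bit being 1, can be sketched as below. This shows the basic binary update of Kennedy and Eberhart only, not the multiphase M-DPSO variant that this study actually applies. The function name `bpso_step` and the values of `c1`, `c2`, and `v_max` are illustrative assumptions.

```python
import numpy as np

def bpso_step(pos, vel, pbest, gbest, rng, c1=2.0, c2=2.0, v_max=4.0):
    """One binary PSO update for a swarm of bit strings (rows = particles)."""
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    # Velocity is pulled toward each particle's best and the swarm's best position
    vel = vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    vel = np.clip(vel, -v_max, v_max)          # keep probabilities away from 0 and 1
    prob_one = 1.0 / (1.0 + np.exp(-vel))      # sigmoid maps velocity to P(bit = 1)
    pos = (rng.random(pos.shape) < prob_one).astype(int)
    return pos, vel

rng = np.random.default_rng(0)
pos = rng.integers(0, 2, size=(5, 14))         # 5 particles, 14 bits each
vel = np.zeros(pos.shape)
new_pos, new_vel = bpso_step(pos, vel, pbest=pos.copy(), gbest=pos[0], rng=rng)
```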


Taking the particle in Figure 1 as an example, the front three bits are 1, 0, and 1, respectively, and the corresponding decimal value is 1 × 2^2 + 0 × 2^1 + 1 × 2^0 = 5. Since three binary bits give decimal values from 0 to 7, the final value of F equals the decimal value plus 1, i.e., 5 + 1 = 6. As for the latter eleven bits, we can see that the 2nd, 3rd, 8th, and 11th bits are 1. To make the 1st bit denote the candidate pattern with the length of 3, we only need to add 2 to the above ordinals. Thus, the candidate patterns with the length of 4 (= 2 + 2), 5 (= 3 + 2), 10 (= 8 + 2), and 13 (= 11 + 2) will be considered simultaneously.

1 0 1 | 0 1 1 0 0 0 0 1 0 0 1
Front three bits: number of selected most similar patterns. Latter eleven bits: pattern length.

Figure 1 The encoding of a potential solution
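The decoding rule described for Figure 1 can be written as a short sketch. The function name `decode_particle` is hypothetical. How a particle whose eleven length bits are all 0 should be handled is not stated in the text, so the sketch simply returns an empty list in that case.

```python
def decode_particle(bits):
    """Decode a 14-bit particle into (F, list of pattern lengths k)."""
    if len(bits) != 14 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected 14 binary values")
    # Front three bits: binary number in 0..7, shifted to F in 1..8
    n_select = int("".join(str(b) for b in bits[:3]), 2) + 1
    # Latter eleven bits: the bit at 0-based offset p switches on length k = p + 3
    lengths = [p + 3 for p, b in enumerate(bits[3:]) if b == 1]
    return n_select, lengths

example = [1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1]   # the particle in Figure 1
f_value, k_values = decode_particle(example)
```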

To evaluate the forecasting results of each particle, we define the following fitness function:

$$
\min F = \frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2, \tag{8}
$$

where $y_t$ and $\hat{y}_t$ are the observed values and predicted values in the test set, and $N$ is the number of observations in the test set. As the function is minimized, individuals with smaller fitness values are chosen in each generation.

Let $X_1, X_2, \cdots, X_M$ be the $M$ source domain time series, $Y$ the target domain time series, Train the training set for modeling in the target domain, Test the test set used to verify the performance of each particle in the target domain ($Y$ = Train ∪ Test), $P_g$ the generation number of DPSO, and $P_n$ the population size. Figure 2 shows the modeling process of the TF-DPSO model.
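The fitness in (8) is the mean squared error over the test set. A minimal sketch follows; the function name `fitness` and the sample numbers are illustrative assumptions.

```python
import numpy as np

def fitness(observed, predicted):
    """Mean squared error between observed and predicted test values, as in (8)."""
    y = np.asarray(observed, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    if y.shape != y_hat.shape:
        raise ValueError("observed and predicted must have the same shape")
    return float(np.mean((y - y_hat) ** 2))

# Made-up values: squared errors are 0, 0, and 4, so the mean is 4/3
score = fitness([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

A particle with a smaller `score` would be preferred when updating the personal and global best positions.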

3 Empirical Analysis

3.1 Data

The monthly container throughput time series of Shanghai Port and Tianjin Port are used in our experiments. The data are downloaded from the Ministry of Transport of the People’s Republic of China (http://www.moc.gov.cn/). For both ports, we take the monthly data from January 2001 to December 2011, giving 132 observations. For convenience of modeling, the data from January 2001 to December 2010 are used as the training set (120 observations), and the remainder are used as the test set (12 observations) for evaluating the prediction performance. Finally, we forecast 12 out-of-sample data points, i.e., the container throughput in 2012.

Further, we regard the monthly container throughput time series of six other ports as the source time series: Dalian Port, Qingdao Port, Xiamen Port, Shenzhen Port, Guangzhou Port, and Ningbo Port. The former five time series run from January 2001 to December 2011, and the last one runs from January 2006 to December 2011. Together with the two target ports, these are currently the 8 largest ports in China by container throughput. Their development trends of container throughput are very similar to each other, which helps ensure the effect of transfer learning to some extent. It is worth noting that the container throughput time series of Tianjin Port is also regarded as a source time series when forecasting the container throughput of Shanghai Port, and vice versa.

Flowchart boxes of Figure 2, in order:

Inputs: source domain time series X1, · · · , XM; target domain initial training set Train.
1) Compute the similarity between the target series and each source series.
2) Transfer some source series with highest similarity to Train.
3) Generation i = 0: initialize the population, the best position Pb of each particle, and Gb of the initial population.
4) Population individual j = 1.
5) Update the velocity and position of the j-th particle.
6) Compute the values of F and k in the particle.
7) Generate all candidate patterns with length k in Train.
8) Calculate the similarity and select F most similar candidate patterns.
9) Combine the continuations of the selected patterns to get the combination forecasting in Test.
10) Calculate the fitness value of the j-th particle, and update Pb and Gb if necessary.
Loop counters: j = j + 1 (particles), i = i + 1 (generations). Branch labels: Yes, Yes.

j