RESEARCH ARTICLE
Forecasting leading industry stock prices based on a hybrid time-series forecast model Ming-Chi Tsai ID1☯, Ching-Hsue Cheng ID2*, Meei-Ing Tsai2☯, Huei-Yuan Shiu2☯ 1 Department of Business Administration, I-Shou University, Dashu District, Kaohsiung City, Taiwan, 2 Department of Information Management, National Yunlin University of Science and Technology, Douliou, Yunlin, Taiwan ☯ These authors contributed equally to this work. *
[email protected]
a1111111111 a1111111111 a1111111111 a1111111111 a1111111111
OPEN ACCESS Citation: Tsai M-C, Cheng C-H, Tsai M-I, Shiu H-Y (2018) Forecasting leading industry stock prices based on a hybrid time-series forecast model. PLoS ONE 13(12): e0209922. https://doi.org/ 10.1371/journal.pone.0209922
Abstract Many different time-series methods have been widely used in forecast stock prices for earning a profit. However, there are still some problems in the previous time series models. To overcome the problems, this paper proposes a hybrid time-series model based on a feature selection method for forecasting the leading industry stock prices. In the proposed model, stepwise regression is first adopted, and multivariate adaptive regression splines and kernel ridge regression are then used to select the key features. Second, this study constructs the forecasting model by a genetic algorithm to optimize the parameters of support vector regression. To evaluate the forecasting performance of the proposed models, this study collects five leading enterprise datasets in different industries from 2003 to 2012. The collected stock prices are employed to verify the proposed model under accuracy. The results show that proposed model is better accuracy than the other listed models, and provide persuasive investment guidance to investors.
Editor: Alessandro Spelta, Universita Cattolica del Sacro Cuore, ITALY Received: August 11, 2018 Accepted: November 27, 2018
Introduction
Published: December 31, 2018 Copyright: © 2018 Tsai et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability Statement: There is two Excel spreadsheet file with data about each experiment conducted for the present work discussed on paper as a Supporting Information ZIP file. Funding: The author(s) received no specific funding for this work. Competing interests: The authors have declared that no competing interests exist.
The prices forecast of stock is the most key issue for investors in the stock market, because the trends of stock prices are nonlinear and nonstationary time-series data, which makes forecasting stock prices a challenging and difficult task in the financial market. Conventional time series models have been used to forecast stock prices, and many researchers are still devoted to the development and improvement of time-series forecasting models. The most well-known conventional time series forecasting approach is autoregressive integrated moving average (ARIMA)[1], which is employed when the time-series data is linear and there are no missing values [2]. Statistical methods, such as traditional time series models, usually address linear forecasting models and variables must obey statistical normal distribution [3]. Therefore, conventional time series methods are not suitable for forecasting stock prices, because stock price fluctuation is usually nonlinear and nonstationary. Further, most conventional time-series models utilize one variable (the previous day’s stock price) only [4], when, there are actually many influential factors, such as market indexes, technical indicators, economics, political environments, investor psychology, and the fundamental
PLOS ONE | https://doi.org/10.1371/journal.pone.0209922 December 31, 2018
1 / 24
A time-series model for forecasting leading industry stock prices
financial analysis of companies that can influence forecasting performance [5]. In practice, researchers use many technical indicators as independent variables for forecasting stock prices. How to select the key variables from numerous technical indicators is a critical step in the forecasting process. Investors usually prefer to select technical indicators depending on their experience or feelings for forecasting stock prices despite this behavior be highly risky. However, choosing unrepresentative indicators may result in losing profits for investors. Therefore, selecting the relevant indicators to forecast stock prices is one of the important issues for investors. Financial researchers must identify the key technical indicators that have higher relevance to the stock price by indicator selection. Therefore, proposed models must incorporate indicator selection in the stock forecasting process to enhance forecasting accuracy. Recently, there have been many new forecasting techniques used to construct efficient and precise machine learning models, but forecasting stock prices is still a hot topic [6, 7, 8]. To overcome the shortcomings of traditional time series models, nonlinear approaches have been proposed, such as fuzzy neural networks [9, 10, 11, 12], and support vector regression (SVR) [13, 14, 15, 16]. SVR utilizes the minimized structural risk principle to evaluate a function by using the minimized the upper bound of the generalized error [17, 18]. The minimized structural risk principle could get better generalization from limited size datasets [19]. Further, SVR has a global optimum and exhibits better prediction accuracy due to its implementation of the structural risk minimization principle, which considers both the training error, and the capacity of the regression model [15, 20]. Although SVR has shown a great number of experimental results in many applications such as economic and financial predictions, the main problem of SVR is the determination of its parameters, which requires practitioner experience [21]. In the literature, genetic algorithms (GA) have been successfully used in a wide range of problems included machine learning, multiobjective optimization problems and multimodal function optimization [22]. GA is a search algorithm inspired by evolution and is usually used to solve optimization problems. Therefore, proposed models utilize GA to optimize the parameters of SVR and obtain better forecasting performance. From the related work mentioned above, previous studies have shown some drawbacks: (1) Many researches select key technical indicators depending on experiences and ideas [23], (2) Most statistical methods follow some assumptions in different datasets, and obey the statistical distributions [3], (3) Most previous time series models consider only one feature to forecast stock indexes [23], and (4) The parameter of SVR is difficult to determine [24, 25, 26]. This paper proposes a novel GA-SVR time series model based on indicator selection to overcome these problems, and the proposed model contributes the following: (1) In feature selection, this study applies multivariate adaptive regression spline (MARS), stepwise regression (SR), and kernel ridge regression to get the key technical indicators for investors. (2) The proposed model optimizes the parameters of SVR by genetic algorithm (GA) to increase the forecast accuracy. (3) The results could provide persuasive investment guidelines for investors. The remaining contents of this paper are organized as follows. Section 2 describes the related methodology that incorporate the technical indicator, MARS, genetic algorithm, SVR, and stepwise regression. Section 3 presents the proposed algorithm. Section 4 provides the experimental results and comparisons. Conclusion of this paper are explained in Section 5.
Related work This section introduces the related work containing the technical indicator, multivariate adaptive regression splines, genetic algorithm, support vector regression, and briefly stepwise regression.
PLOS ONE | https://doi.org/10.1371/journal.pone.0209922 December 31, 2018
2 / 24
A time-series model for forecasting leading industry stock prices
Technical indicator The technical indicator (TI) is an investment guidance for investors based on evaluating the profits of securities from analyzing trading data of marketing activities, such as past prices and volumes [5]. Stock market data have highly nonlinear, and many researches have focused on the technical indicator to increase the investment return [27, 28]. A technical indicator is a formula, which transfers trading data (open price, the lowest price, the highest price, average price, closing price and volume) into different technical indicators, and try to forecast future prices based on analyzing the past pattern of stock prices [29, 30]. Technical analysis utilizes basic market data, and assumes that the involved factors are included in the stock exchange information [31]. Based on literature review, this paper collected some technical indicators as Table 1. To consider more features, that affect the stock price and volatility, this paper incorporates the microeconomic features that affect the stock price, and these collected factors are listed in Table 2.
Multivariate adaptive regression splines Friedman [46] proposed multivariate adaptive regression splines (MARS), it is a simple nonparametric regression algorithm. The main advantages of MARS is its capacity to grasp the Table 1. Technique indicators. Indicator
Explanation
MA5
MA5ð5 days moving averageÞ ¼ pc þpc current day [32]
MA10
MA10 ð10 days moving averageÞ ¼ pc þpc trading day [32]
5BIAS
The difference between the closing price and MA5, which utilizes the stock price nature of returning back to average price for analyzing the stock trends [31]
10BIAS
The difference between the closing price and MA10, which employs the stock price nature of returning back to average price for analyzing the stock trends [31]
RSI
RSI measures the magnitude of recently gain to recently loss in an trial to determine overbought and oversold conditions of an asset [31]
12PSY
PSY12 (12 days psychological line) = (Dup12/12) � 100, Dup12 is the number of days when price is going up within 12 days [23]
10WMS%R
Williams %R is usually drawn by using negative values. For analysis and discussion, ignore the negative symbols. It is the best to wait the security’s price until change direction before placing your trading [31]
MACD
MACD presents the difference between a fast and slow exponential moving average (EMA) for closing prices. Fast is a short-period average, and slow is a long period one [31]
MO1
MO1(t) = price(t) − price(t − n), n = 1 [33]
MO2
MO2(t) = price(t) − price(t − n), n = 2 [33]
Transaction volume
Transaction volume presents a basic yet very important element of market timing strategy. Volume gives clues for the intensity of given price moving [34]
CDP value
Divide the previous price movement into five values and make the intraday trading decision based on the five value [31]
Exponential Moving Average (EMA)
EMA is defined as a linear transformation of time series to a smoother time series by P1 k xet ¼ l K¼0 ð1 lÞ xt k Where 0 < i subject to ðv � φðxi ÞÞ þ b qi � ε þ x�i > : xi ; x�i � 0; for i ¼ 1; � � � ; n
ð3Þ
The symbols ξi and x�i are two positive slack variables to calculate the error (qi − f(xi)) from 2 the boundaries of the ε–insensitivity zone. ðxi þ x�i Þ denotes the empirical risk, 12 kvk is the structural risk to prevent over-learning and the lack of applied universality and ∁ denotes the regularization constant for specifying the trade-off between the empirical risk and the regularization terms. Based on the sequentially modifying coefficient C, band area width ε, and kernel function K, the optimal parameter can be solved by the Lagrange method [19]. This study based on
PLOS ONE | https://doi.org/10.1371/journal.pone.0209922 December 31, 2018
5 / 24
A time-series model for forecasting leading industry stock prices
Vapnik [18] utilized the SVR-based regression function, and it is defined as XN f ðx; zÞ ¼ ðai a�i ÞKðx; xi Þ þ b i¼1
ð4Þ
where αi and a�i are the Lagrangian multipliers that satisfy the equality ai a�i ¼ 0, αi and a�i � 0. K(x, xi) is the kernel function which represents the inner product hφ(xi), φ(x)i. The radial basis function (RBF) has been widely used as the kernel function, and this study utilizes RBF because of its capabilities and simple implementation [50]. Kðx; xi Þ ¼ exp ð gkxi
2
xj k Þ
ð5Þ
where γ is the RBF width.
Stepwise regression SR is a simple multiple regressions, it establishes a model by adding or removing features based on the statistics of F-test, that is, SR utilized the forward and backward procedures to add or remove features based on F statistics. SR adds the feature to the model if the p-value of variable is less than the given significant level (p < .05), and removes the variable from the model if the p-value of variable is greater than the given significant level [51].
Proposed model This study based on previous studies has found some drawbacks in time series forecast: (1) Based on subjective experiences and opinions to select important technical indicators [23]. (2) Previous methods need to follow some assumptions in different datasets, and obey the statistical distributions [3]. (3) Previous time series models consider only one feature to forecast the stock index [23]. (4) The best SVR parameters are difficult to determine [24, 25, 26]. To overcome these problems, this study is based on our conference paper [52] to extend the proposed methods for solving the forecast problems. That is, this paper proposes a GA-SVR time series model based on feature selection to forecast the leading industry stock price. Hence, the proposed model contributes the following. (1) This study utilizes MARS, SR, and KRR to choose the key technical indicators for investors. (2) Use a GA to optimize the SVR parameters for enhancing the forecast accuracy. (3) The results can provide the investment guidance to investors. The proposed model included three blocks as Fig 1, it can be briefly described as follows: 1. Data preprocessing: this proposed model transforms daily basic stock data (open price, the lowest price, the highest price, average price, closing price and volume) into technical indicators. Then utilizes MARS, SR, and KRR methods to select the key indicators. 2. Modeling: Build a forecast model by using SVR and employs GA to optimize the parameters of SVR. 3. Forecasting: The optimized GA-SVR forecast model is utilized to forecast the stock price, and compare the proposed models with the listing models under the accuracy. For easy computation, this section proposed an algorithm with six steps, the detailed step is described as follows: Step 1: Transform trading data into technical indicators This step collected daily stock trading data (open, close, the highest, the lowest price and volume), and transformed these data into technical indicators [31], such as MA, PSY, RSI,
PLOS ONE | https://doi.org/10.1371/journal.pone.0209922 December 31, 2018
6 / 24
A time-series model for forecasting leading industry stock prices
Fig 1. Research processes of the proposed model. https://doi.org/10.1371/journal.pone.0209922.g001
BIAS, and WMS%R. In addition, this paper also incorporate other indicators, such as exchange rate, NT dollars to US dollars, and the momentum. These technical indicators used are listed in Tables 1 and 2, respectively. Step 2: Select key features (MARS, SR, and KRR) From step 1, the collected data has been transformed into technical indicators; this step utilized MARS, SR, and KRR to choose the key indicators. For removing collinearity, this step
PLOS ONE | https://doi.org/10.1371/journal.pone.0209922 December 31, 2018
7 / 24
A time-series model for forecasting leading industry stock prices
also run SR multi-collinearity to eliminate the high multi-collinearity indicators. For comparing the three feature selection methods, this study selected the number of features are as similar as possible. Step3: Construct the SVR forecast model To build the SVR forecast model, this step used the selected features as input features, and the RBF function is used as the kernel function, due to it can handle the nonlinear and highdimensional data. To build the forecasting model, three parameters that should be set: the loss function ε, the regularization constant C, and the RBF width σ. To obtain a better forecast model, this step utilizes a genetic algorithm to optimize these parameters. Step4: Optimize the SVR parameters by GA at minimal RMSE To get better forecasting accuracy, this step employed genetic algorithm to optimize the SVR parameters C and σ under minimal RMSE (Eq 6) for training dataset. The RMSE is defined as: rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 Xn 2 RMSE ¼ ðyt ^y t Þ t¼1 n
ð6Þ
where yt is the real stock index, y^t is the forecasted stock price, and n is the number of records. Step 4 has six sub-steps, which is described the operation of the GA processes as follows: Step4.1.1: Initialize the parameter for GA. The initial population was set 80 individual solutions, and randomly generated in this substep, the SVR parameters are encoded into a chromosome by a binary string. In addition, the maximal generations, the crossover probability, the mutation probability, and are given as 2000, 0.8, and 0.08 [53] respectively. Step4.1.2: Evaluate fitness. This sub-step uses a pre-defined fitness function (RMSE) to evaluate fitness of each chromosome for determining the goodness of fit for each solution. Step4.1.3: Check the stopping criterion This sub-step sets the stopping rule: If one of the two conditions is got, then the GA process is stopped: 1. The maximal number of generations is reached (2000). 2. The optimal solution is smaller than the given minimal RMSE, the minimal RMSE is set as 10−5. If the criterion is not achieved then repeatedly re-run a new iterative process (Step 4.1.2 to 4.1.5). Step4.1.4: Select the parents by the fitness function Selection is to screen out the fit chromosome to be copies for increasing the offspring sharing and eliminating the poorer chromosome for decreasing the offspring sharing. Roulette wheel selection is employed to select the chromosomes for reproduction. Step4.1.5: Perform crossover and mutation
PLOS ONE | https://doi.org/10.1371/journal.pone.0209922 December 31, 2018
8 / 24
A time-series model for forecasting leading industry stock prices
Re-unite the parents to produce and mutate offspring, this step one-point crossover is used. Then, it selected randomly a member of the population, and changed one randomly selected bit in its bit string representation [3]. Step5: Forecast the stock price by using the optimized models From Step 4, the SVR parameters were determined. the testing data is applied to predict the next day’s stock prices by using the optimized forecast model. Step6: Compare the performance For evaluating the forecast accuracy, the propoesed model will be compared with the listing models under the RMSE criterion. The seven comparison models are as follows: (1) integrated KRR and SR (KRR-SR model), (2) integrated KRR and MARS (KRR-MARS model), (3) integrated KRR and GA-SVR (KRR-GA-SVR model), (4) integrated SR and MARS (SR-MARS model), (5) integrated SR and KRR (SR-KRR model), (6) integrated MARS and SR (MARS-SR model), and (7) integrated MARS and KRR (MARS-KRR model). Where KRR-SR model denotes using SR to build the forecast model after selecting the features by KRR, similarly other combined models have the same meanings, and the detailed abbreviations are presented in Table 3.
Experiment and comparisons This study employs Taiwan’s stock as experimental datasets, the selected companies are different leading industries from “business today (www.businesstoday.com.tw)” which published the 1000 largest companies from Mainland China, Taiwan, and Hong Kong. The experimental datasets including Chunghwa Telecom (CHT), China Steel, Hon Hai, Cathay Financial Holdings and Taiwan Semiconductor Manufacturing Company (TSMC), were practically collected from 2003 to 2012. To compare the accuracy of a long test period forecast and short test period forecast, this study implements two experiments for each dataset, and the two experimental designs are listed in Table 4. Table 3. The meaning of model abbreviation. Abbreviation
Method
MARS
Multivariate adaptive regression splines
GA
Genetic algorithm
SR
Stepwise Regression
KRR
Kernel Ridge Regression
GA-SVR
Applying Genetic algorithm to optimize the Support vector regression parameters
KRR-SR
Employing KRR to select the key features and bulid model by SR
KRR-GA-SVR
Employing KRR to select the key features and bulid model by GA-SVR
KRR-MARS
Employing KRR to select the key features and bulid model by MARS
SR-KRR
Employing SR to select the key features and bulid model by KRR
SR-MARS
Employing SR to select the key features and bulid model by MARS
SR-GA-SVR (proposal model B)
Employing SR to select the key features and bulid model by GA-SVR
MARS-KRR
Employing MARS to choose the key features and bulid model by KRR
MARS-SR
Employing MARS to choose the key features and bulid model by SR
MARS-GA-SVR (proposal model A)
Employing MARS to choose the key features and bulid model by GA-SVR
https://doi.org/10.1371/journal.pone.0209922.t003
PLOS ONE | https://doi.org/10.1371/journal.pone.0209922 December 31, 2018
9 / 24
A time-series model for forecasting leading industry stock prices
Table 4. The experiment of the long and short test period. Experiment
Training period
Testing period
Short test period
2003 to 2011 year
2012
Long test period
2003 to 2009 year
2010 to 2012 year
https://doi.org/10.1371/journal.pone.0209922.t004
First, this study conducts an initial experiment to explore the performance of the GA-SVR model. The forecasting performance of the GA-SVR is compared with SR, KRR, and MARS and the results are shown in Table 5. From Table 5, we can see that the GA-SVR model generates the smallest RMSE by the CHT, China Steel and Hon Hai datasets. Therefore, this study combines feature selection with the GA-SVR model as the proposed forecasting model. Second, this study sets the parameters of different forecasting models for the following experiment. In the parameter settings for the MARS model, the training data is utilized to build the MARS model, the maximal number of BFs of the MARS model’s parameter is set as 2000 and the other parameters are set as default [54]. For the KRR model, the parameter lambda for Tikhonov regularization of kernel ridge regression is set as 0.001 to build the forecasting model. In the SR model, the training data is employed to build the forecasting model, the high variance inflation factors (VIF) that are higher than 10 are removed first. Based on the initial experimental result (GA-SVR model performs better than the other models), this paper combines the GA-SVR model and feature selection method as the proposed model. Then, this study proposes model A and model B based on different feature selection methods. Model A uses MARS to select features, and model B utilizes stepwise regression as the feature selection method. In comparison, this study selects the same number of features for different feature selection methods. The selected features by MARS, SR and KRR are listed in Table 6. After finding the key features, this study constructs the forecast model by SVR and optimizes the parameters of the MARS-GA-SVR and SR-GA-SVR models by GA, the optimized parameters are listed in Table 7.
Experimental results In the section, this study verifies the performance of the proposed model by using five different industry datasets including the Chunghwa Telecom datasets (CHT), China steel datasets, Hon Table 5. The initial performance comparisons for three companies in RMSE. company
model
Test period (year) 1
CHT
SR
0.0662
KRR
15.08679
MARS GA-SVR China Steel
SR KRR MARS GA-SVR
Hon Hai
SR KRR MARS GA-SVR
3 0.7664 34.3449785
0.1435
2.3455
0.001162354
0.00329521
0.25348
0.34462
3.91307785
4.16090037
0.24370
0.32520
0.000651508
0.000922785
1.89469 4.27619926
2.20621 12.5219447
1.9228
2.151
0.002353294
0.006689822
https://doi.org/10.1371/journal.pone.0209922.t005
PLOS ONE | https://doi.org/10.1371/journal.pone.0209922 December 31, 2018
10 / 24
A time-series model for forecasting leading industry stock prices
Table 6. Selected features by MARS, SR and KRR for five companies. Company
Method
Indicator1
Indicator2
Chunghwa Telecom
MARS
CDP value
MO2 MO1
China Steel
Hon Hai
Cathay Financial Holdings
TSMC
Indicator3
Indicator4
FBBQ
SR
CDP value
KRR
Sale Month
P/E
MARS
Daily price change
TAIEX index
FBAQ
SR
CDP value
MO2
MO1
-
KRR
10BIAS
FBBQ
P/E
-
MARS
FBBQ
ln(ROI)
Transaction volume ln(ROI)
SR
EMA
MACD
KRR
Sale Month
RSI6
MO1
MARS
ROI
FBBQ
CDP value
SR
5BIAS
MA5
MO2
KRR
ROI
TAIEX -Daily index change
TAIEX index
MARS
MO2
FBAQ
FBBQ
SR
CDP value
MO2
MO1
KRR
10W%R
MA15
EMA
Note: FBAQ denotes The Final Best Ask Quote; FBBQ represents The Final Best Bid Quote https://doi.org/10.1371/journal.pone.0209922.t006
Hai datasets, Cathay Financial Holdings datasets, and Taiwan Semiconductor Manufacturing Company datasets. The CHT datasets are employed in the first experiment; the computational process follows the proposed algorithm in Section 3. The predictions of the MARS-GA-SVR and SR-GA-SVR models are demonstrated in Fig 2. Fig 2 shows that the long test period results have more overlap between the forecast line and real closing price line than the results in the short test period in model A (MARS-GA-SVR). For proposed model B (SR-GA-SVR), there is more overlap between the forecast line and real close price line in the short test period results (in Fig 2). The RMSE for the listing models and proposed models, are shown in Table 8, and the results show that the performances of proposed models are better than other models. In the short test period, the proposed model B (SR-GA-SVR) generates the smallest RMSE in listing models as Table 8. Moreover, the proposed model A (MARS-GA-SVR) generates the smallest RMSE in the long test period (in Table 8). Table 9 and Fig 3 illustrate the numerical results for the China Steel datasets. From Fig 3, the result of model A shows that the real closing price line and the forecast line in the long test period have more overlap than in the short test period. Similarly, compared to the results in the long test period, the results generated by model B have more overlap between closing price line and the forecast line in short test period. From Table 9, the proposed models generate the smallest RMSE in the listing models. In the short test period, the proposed model B generates the smallest RMSE. In addition, the proposed model A generates the smallest RMSE in the long period. Experiments on the Hon Hai datasets are presented in Table 10 and Fig 4. Fig 4 shows the excellent performance of the proposed model (the forecast line almost completely overlaps the closing price line). Form Table 10, model A and model B generate the smallest RMSE in the long training period and the short training period, respectively. Table 11 and Fig 5 show the experiments for the Cathay Financial Holdings datasets. In Fig 5, the numerical results clearly show that the forecast line deviates from the closing price line. From Table 11, the KRR-GA-SVR model generates the smallest RMSE in the short test
PLOS ONE | https://doi.org/10.1371/journal.pone.0209922 December 31, 2018
11 / 24
A time-series model for forecasting leading industry stock prices
Table 7. The optimal parameters of GA searching for five companies. Company
Method
Training period(year)
ε
C
σ
Training RMSE
Chunghwa Telecom
Proposed model A
9
0.2
85.115
621.018
0.10075
7
0.2
85.115
621.018
0.08261
9
0.4
49.157
332.041
0.09963
7
0.5
95.012
525.560
0.08395
9
0.1
94.601
687.127
1.11637
7
0.2
96.738
988.762
1.10806
9
0.5
60.4713
0.0253
0.25387
7
0.2
87.2495
9.356
0.20543
9
0.4
53.6446
6.1135
0.30688
7
0.1
6.5271
0.6631
0.58843
9
0.5
51.7702
2.7261
0.504085
7
0.3
25.0826
52.8511
0.31624
9
0.1
13.2041
134.089
0.26024
7
0.4
74.9381
836.899
0.09486
9
0.3
17.8872
328.5681
0.11489
7
0.5
67.8872
782.3546
9
0.3
26.6437
44.2146
10.8103
7
0.3
45.8883
20.0447
13.0136
9
0.5
58.5778
3.428
0.97916
7
0.1
60.9315
2.9168
1.04814
9
0.5
25.0216
0.834
1.04227
7
0.2
37.1928
2.604
0.94704
9
0.2
15.4725
42.8472
1.69138
7
0.3
5.9417
38.8681
1.29369
9
0.3
12.3139
4.3282
0.99827
7
0.4
25.2229
3.4814
1.0201
9
0.3
72.1026
1.1468
0.98032
7
0.5
26.0217
1.3375
0.98209
9
0.3
97.3413
731.4384
3.55611
7
0.4
88.6254
48.0336
3.72557
Proposed model B KRR-GA-SVR China Steel
Proposed model A Proposed model B KRR-GA-SVR
Hon Hai
Proposed model A Proposed model B KRR-GA-SVR
Cathay Financial Holdings
Proposed model A Proposed model B KRR-GA-SVR
TSMC
Proposed model A Proposed model B KRR-GA-SVR
0.09896
https://doi.org/10.1371/journal.pone.0209922.t007
period, and the proposed model B in the long test period generates a smaller RMSE than other models. Next, experiments on the TSMC datasets are illustrated in Table 12 and Fig 6. The forecast results in Fig 6 show that the forecast line obviously deviates from the closing price line. From Table 12, the proposed model B generates the smallest RMSE in the short and long test periods.
Significance test To test whether proposed model is superior to the KRR-MARS, KRR-SR, KRR-GA-SVR, MARS-SR, MARS-KRR, SR-MARS and SR-KRR models in the stock price forecasting, this study applies the Wilcoxon signed-rank test. We use RMSE to test the significance between the proposed model and the listed models. Tables 13 and 14 present the Z statistic of the two-tailed Wilcoxon sign test between proposed models and the listed models. From Table 13, the proposed models have a significant difference (p