Available online at www.sciencedirect.com
ScienceDirect Procedia Computer Science 91 (2016) 284 – 286
Information Technology and Quantitative Management (ITQM 2016)
The enhanced classification for the stock index prediction

Hyeuk Kim*, Sang Tae Han

Department of Applied Statistics, Hoseo University, Asan-si, Chungcheongnam-do, 31499, Republic of Korea
Abstract

Predicting the movement of the stock price is one of the hardest challenges. We propose a modified bootstrap method for random forests to predict the direction of movement of the stock index price. The training set generated by the modified bootstrap takes the magnitude of the response variable into account and is then used in random forests. Experiments on real KOSPI data show that the proposed method performs better than the original method in most of the training and test periods considered.

© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the Organizing Committee of ITQM 2016.
Keywords: Bootstrap; Random forests; Stock price prediction
1. Introduction

In the stock market, it is essential for a trader to predict the movement of the stock price. A trader makes a profit when the direction of movement is predicted correctly and a loss when it is predicted incorrectly. Predicting stock performance is therefore a large and potentially profitable area of study, and many companies have developed stock predictors based on statistical models. However, the movement of the stock price is very hard to predict because the stock market is dynamic, nonlinear, complicated, nonparametric, and chaotic [1].

There are four approaches to predicting the stock price: technical analysis, fundamental analysis, traditional time series forecasting, and machine learning. In this paper, we use a machine learning method. The first three approaches have several disadvantages compared with machine learning. First, it is hard to develop a new prediction model with them because they have already been investigated for a long time. Second, they have so far been unsuccessful in predicting the stock price because the stock market is nonparametric and chaotic.
* Corresponding author. Tel.: +82-41-540-5905 ; fax: +82-41-540-5908 . E-mail address:
[email protected].
1877-0509 © 2016 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the Organizing Committee of ITQM 2016 doi:10.1016/j.procs.2016.07.077
We propose a modified training set for learning from stock price data and use it in random forests [2], one of the most powerful classification methods.

This paper is organized as follows. Section 2 explains the independent variables for predicting the direction of movement of the stock index price and introduces the proposed method. Section 3 compares the performance of the proposed training set with that of the ordinary training set on the real KOSPI data set. Section 4 concludes the article.

2. The proposed method

From the classification point of view, stock price prediction has two target classes, ‘Up’ and ‘Down’. Even though the target variable is categorical, it is reasonable to distinguish a big change from a small change in either direction. Predicting the direction of movement correctly is important because the ultimate purpose of stock price prediction is to make a profit. Instances with little movement around 0 are less valuable: they are very hard to predict correctly and tend to cause overfitting.

Ensemble methods usually use a bootstrapped data set as the training set. In our bootstrapping, we give more weight to the instances with big changes and less weight to the instances with small changes. The weight is determined by the degree of change of the stock index and is used as the selection probability in bootstrapping.

3. Experiments

We experiment with a financial data set to predict the movement of KOSPI (Korea Composite Stock Price Index). Among the many variables that influence KOSPI, we choose the independent variables listed in Table 1.

Table 1. A list of the independent variables

Category           Variables
KOSPI              close price, open price, high price, low price, trading volume, trading amount
Dow Jones          close price, open price, high price, low price
Foreign exchange   KRW-USD, KRW-JPY, KRW-CNY, KRW-EUR, USD-JPY, USD-CNY, USD-EUR
Commodity          oil, gold
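The weighted bootstrap of Section 2 can be illustrated with a short Python sketch. It is not the authors' implementation. The paper does not give the exact weight formula, so the sketch assumes that the selection probability of a row is proportional to the absolute log ratio of its close price. The function names `log_ratios` and `weighted_bootstrap` and the price values are hypothetical.

```python
import math
import random


def log_ratios(prices):
    """Log ratio between consecutive days: log(p_t / p_(t-1))."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def weighted_bootstrap(changes, rng):
    """Draw len(changes) row indices with replacement.

    Assumption: the selection probability of a row is proportional to the
    absolute log ratio of its close price.
    """
    weights = [abs(c) for c in changes]
    # random.choices normalizes the weights; it fails if every weight is zero.
    return rng.choices(range(len(changes)), weights=weights, k=len(changes))


# Made-up close prices, for illustration only.
close = [100.0, 101.0, 100.9, 97.0, 97.1, 99.5]
changes = log_ratios(close)
labels = ["Up" if c > 0 else "Down" for c in changes]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = weighted_bootstrap(changes, rng)
print(labels)
print(sample)
```

Under this assumption, days with large moves, such as the drop from 100.9 to 97.0, are drawn far more often than near-flat days, so each tree in the ensemble would see mostly the instances whose direction matters for profit.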
All variables are transformed into the log ratio between two consecutive days to remove the trend. The degree of change of the close price is used as the weight in the weighted bootstrap that generates the training set for the enhanced random forests. We compare the performance of the enhanced random forests with that of the ordinary random forests for various time periods. Each experiment is repeated twenty times and the methods are compared by average accuracy.

Table 2. Performance comparison in KOSPI

Training Set   Test Set   Random Forests   Enhanced Random Forests   p-value
2012~2013      2014       0.599 (0.011)    0.612 (0.009)             0.00
2011~2012      2013       0.590 (0.012)    0.596 (0.010)             0.01
2010~2011      2012       0.589 (0.016)    0.621 (0.012)             0.00
2009~2010      2011       0.550 (0.009)    0.566 (0.007)             0.00
2008~2009      2010       0.571 (0.012)    0.577 (0.010)             0.01
2011~2013      2014       0.607 (0.012)    0.616 (0.010)             0.00
2010~2012      2013       0.600 (0.010)    0.589 (0.012)             0.00
2009~2011      2012       0.625 (0.011)    0.616 (0.007)             0.00
2008~2010      2011       0.556 (0.010)    0.573 (0.011)             0.00
2007~2009      2010       0.575 (0.009)    0.594 (0.009)             0.00
2010~2013      2014       0.617 (0.011)    0.628 (0.009)             0.00
2009~2012      2013       0.580 (0.011)    0.598 (0.010)             0.00
2008~2011      2012       0.621 (0.015)    0.638 (0.011)             0.00
2007~2010      2011       0.532 (0.011)    0.563 (0.008)             0.00
2006~2009      2010       0.605 (0.014)    0.603 (0.009)             0.32
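The rows of Table 2 can be tallied with a few lines of Python. The accuracies and p-values below are copied from the table. The significance level of 0.05 is an assumption, since the paper does not state one.

```python
# (training set, test set, random forests, enhanced random forests, p-value)
results = [
    ("2012~2013", 2014, 0.599, 0.612, 0.00),
    ("2011~2012", 2013, 0.590, 0.596, 0.01),
    ("2010~2011", 2012, 0.589, 0.621, 0.00),
    ("2009~2010", 2011, 0.550, 0.566, 0.00),
    ("2008~2009", 2010, 0.571, 0.577, 0.01),
    ("2011~2013", 2014, 0.607, 0.616, 0.00),
    ("2010~2012", 2013, 0.600, 0.589, 0.00),
    ("2009~2011", 2012, 0.625, 0.616, 0.00),
    ("2008~2010", 2011, 0.556, 0.573, 0.00),
    ("2007~2009", 2010, 0.575, 0.594, 0.00),
    ("2010~2013", 2014, 0.617, 0.628, 0.00),
    ("2009~2012", 2013, 0.580, 0.598, 0.00),
    ("2008~2011", 2012, 0.621, 0.638, 0.00),
    ("2007~2010", 2011, 0.532, 0.563, 0.00),
    ("2006~2009", 2010, 0.605, 0.603, 0.32),
]

ALPHA = 0.05  # assumed significance level, not stated in the paper

better = sum(1 for _, _, rf, erf, p in results if p < ALPHA and erf > rf)
worse = sum(1 for _, _, rf, erf, p in results if p < ALPHA and erf < rf)
no_difference = sum(1 for _, _, _, _, p in results if p >= ALPHA)

print(better, worse, no_difference)  # prints: 12 2 1
```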
The proposed method performs better than the ordinary method in 12 of the 15 cases and worse in two (training sets 2010~2012 and 2009~2011). In the remaining case (training set 2006~2009) there is no significant difference between the two methods (p-value 0.32).

4. Conclusion and Future work

The impact of a stock price movement depends on the degree of the price change. We modify the bootstrap by assigning different weights to the instances based on the degree of the price change. The proposed method can be applied to any ensemble method for classification, such as bagging [3] and boosting [4, 5, 6].

The following topics are left for future research. In this work, we compare performance only by average accuracy; the proposed and original methods could also be compared separately by the degree of change. Moreover, performance may improve with additional independent variables such as other commodities and technical indicators.
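As a rough illustration of how the weighted bootstrap could carry over to bagging, the sketch below trains each base learner on a weighted bootstrap sample and combines them by majority vote. Everything in it is an assumption made for illustration: the base learner is a one-feature threshold classifier (the paper uses random forests), the data are made up, and the names `fit_stump` and `weighted_bagging` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier by exhaustive search."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def weighted_bagging(xs, ys, weights, n_models=25, seed=0):
    """Bagging in which each bootstrap sample is drawn with the given weights."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Weighted bootstrap: rows with larger weights are selected more often.
        idx = rng.choices(range(n), weights=weights, k=n)
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]  # majority vote

    return predict


# Made-up one-dimensional data: label 1 stands for 'Up', 0 for 'Down'.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]
weights = [abs(x) for x in xs]  # stand-in for the degree of change

predict = weighted_bagging(xs, ys, weights)
print(predict(-3.0), predict(3.0))
```

Only the sampling step differs from ordinary bagging, which draws the bootstrap indices with equal probability.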
References

[1] Abu-Mostafa, Y. S., Atiya, A. F. Introduction to financial forecasting. Applied Intelligence 1996;6(3):205-213.
[2] Breiman, L. Random forests. Machine Learning 2001;45:5-32.
[3] Breiman, L. Bagging predictors. Machine Learning 1996;24:123-140.
[4] Freund, Y., Schapire, R. Experiments with a new boosting algorithm. Proceedings of International Conference on Machine Learning 1996;148-156.
[5] Freund, Y., Schapire, R. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 1997;55(1):119-139.
[6] Schapire, R. E. The strength of weak learnability. Machine Learning 1990;5:197-227.