The Enhanced Classification for the Stock Index ... - Science Direct

Available online at www.sciencedirect.com

ScienceDirect Procedia Computer Science 91 (2016) 284 – 286

Information Technology and Quantitative Management (ITQM 2016)

The enhanced classification for the stock index prediction

Hyeuk Kim*, Sang Tae Han

Department of Applied Statistics, Hoseo University, Asan-si, Chungcheongnam-do, 31499, Republic of Korea

Abstract

Predicting the movement of the stock price is one of the hardest challenges. We propose a modified bootstrap method for random forests to predict the direction of movement of the stock index price. The training set generated by the modified bootstrap takes the impact of the response variable into account and is then used in random forests. Experiments on real KOSPI data show that the proposed method performs better than the original method in various situations.

© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the Organizing Committee of ITQM 2016

Keywords: Bootstrap; Random forests; Stock price prediction

1. Introduction

In the stock market, it is essential for a trader to predict the movement of the stock price. A trader makes a profit when the direction of movement is predicted correctly and a loss when it is predicted incorrectly. Predicting stock performance is therefore a large and potentially profitable area of study, and many companies have developed stock predictors based on statistical models. However, it is very hard to predict the movement of the stock price because the stock market is dynamic, nonlinear, complicated, nonparametric, and chaotic [1].

There are four approaches to predicting the stock price: technical analysis, fundamental analysis, traditional time series forecasting, and machine learning. In this paper, we take the machine learning approach. The first three approaches have several disadvantages compared with machine learning. First, it is hard to develop a new prediction model with them because they have already been investigated for a long time. Second, they have so far been unsuccessful in predicting the stock price because the stock market is nonparametric and chaotic.

* Corresponding author. Tel.: +82-41-540-5905 ; fax: +82-41-540-5908 . E-mail address: [email protected].

1877-0509 © 2016 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the Organizing Committee of ITQM 2016 doi:10.1016/j.procs.2016.07.077


We propose a modified training set for learning from stock price data and use this training set in random forests [2]. Random forests is one of the most powerful classification methods. This paper is organized as follows. In Section 2, we explain the independent variables for predicting the direction of movement of the stock index price and introduce the proposed method. In Section 3, we compare the performance of the proposed training set with that of the ordinary training set on a real KOSPI data set. We end the article with a conclusion in Section 4.

2. The proposed method

From a classification point of view, the stock price moves in one of two directions, ‘Up’ or ‘Down’. Even though the target variable is categorical, it is reasonable to distinguish a large movement from a small movement in either direction. Predicting the direction of movement matters because the ultimate purpose of stock price prediction is to make a profit. Instances with little movement, close to 0, are less valuable: they are very hard to predict correctly and tend to cause overfitting.

Ensemble methods usually use a bootstrapped data set as the training set. In the proposed bootstrap, we give more weight to instances with large changes and less weight to instances with small changes. The weight is determined by the degree of change of the stock index and is used as the selection probability in bootstrapping.

3. Experiments

We experiment with a financial data set to predict the movement of KOSPI (Korea Composite Stock Price Index). From the many variables that influence KOSPI, we choose the following as independent variables.

Table 1. A list of the independent variables

Category            Variables
KOSPI               close price, open price, high price, low price, trading volume, trading amount
Dow Jones           close price, open price, high price, low price
Foreign exchange    KRW-USD, KRW-JPY, KRW-CNY, KRW-EUR, USD-JPY, USD-CNY, USD-EUR
Commodity           oil, gold
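The weighted bootstrap of Section 2 can be illustrated with a minimal sketch. The paper does not give the exact weight formula, so the code below assumes the selection probability is proportional to the absolute change; the function name `weighted_bootstrap_indices` and the sample values are hypothetical and serve only as an illustration.

```python
import random


def weighted_bootstrap_indices(changes, n_samples=None, seed=None):
    """Sample row indices with replacement, weighted by |change|.

    Assumption: the selection probability is proportional to the absolute
    change; the paper does not state the exact weight formula.
    """
    rng = random.Random(seed)
    weights = [abs(c) for c in changes]
    if sum(weights) == 0:
        # Every change is zero: fall back to the ordinary uniform bootstrap.
        weights = [1.0] * len(changes)
    k = len(changes) if n_samples is None else n_samples
    return rng.choices(range(len(changes)), weights=weights, k=k)


# Hypothetical daily changes of the close price (log ratios).
changes = [0.021, -0.001, 0.0004, -0.018, 0.002]
sample = weighted_bootstrap_indices(changes, n_samples=10_000, seed=0)

# Count how often each instance was drawn; large moves should dominate.
counts = [sample.count(i) for i in range(len(changes))]
print(counts)
```

In a random forest, each tree would presumably be grown on one such index sample instead of a uniform bootstrap sample, so instances with large moves (here indices 0 and 3) appear far more often than near-zero ones.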

All variables are converted into the log ratio between two consecutive days to remove the trend. The degree of change of the close price is used as the weight in the weighted bootstrap that generates the training set, and this training set is used in the enhanced random forests. We compare the performance of the enhanced random forests with that of the ordinary random forests for various time periods. Each experiment is repeated twenty times, and the methods are compared by average accuracy.

Table 2. Performance comparison in KOSPI

Training Set   Test Set   Random Forests   Enhanced Random Forests   p-value
2012~2013      2014       0.599 (0.011)    0.612 (0.009)             0.00
2011~2012      2013       0.590 (0.012)    0.596 (0.010)             0.01
2010~2011      2012       0.589 (0.016)    0.621 (0.012)             0.00
2009~2010      2011       0.550 (0.009)    0.566 (0.007)             0.00
2008~2009      2010       0.571 (0.012)    0.577 (0.010)             0.01
2011~2013      2014       0.607 (0.012)    0.616 (0.010)             0.00
2010~2012      2013       0.600 (0.010)    0.589 (0.012)             0.00
2009~2011      2012       0.625 (0.011)    0.616 (0.007)             0.00
2008~2010      2011       0.556 (0.010)    0.573 (0.011)             0.00
2007~2009      2010       0.575 (0.009)    0.594 (0.009)             0.00
2010~2013      2014       0.617 (0.011)    0.628 (0.009)             0.00
2009~2012      2013       0.580 (0.011)    0.598 (0.010)             0.00
2008~2011      2012       0.621 (0.015)    0.638 (0.011)             0.00
2007~2010      2011       0.532 (0.011)    0.563 (0.008)             0.00
2006~2009      2010       0.605 (0.014)    0.603 (0.009)             0.32
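The log-ratio preprocessing used in these experiments can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the helper names, the sample prices, the rule that a zero change is labelled ‘Down’, and the use of the absolute log ratio as the "degree of change" are all assumptions not specified in the paper.

```python
import math


def log_ratios(series):
    """Log ratio between consecutive days: log(x_t / x_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(series, series[1:])]


def direction_labels(close_changes):
    """Label each day 'Up' or 'Down'.

    Assumption: a change of exactly zero is labelled 'Down'; the paper
    does not say how ties are handled.
    """
    return ["Up" if c > 0 else "Down" for c in close_changes]


# Hypothetical close prices, for illustration only.
close = [2000.0, 2010.0, 2005.0, 2005.0, 2030.0]

changes = log_ratios(close)          # detrended series, one value per day pair
labels = direction_labels(changes)   # classification target
weights = [abs(c) for c in changes]  # assumed form of the "degree of change"

for change, label, weight in zip(changes, labels, weights):
    print(f"{change:+.5f}  {label:4s}  weight={weight:.5f}")
```

The same transformation would be applied to every independent variable in Table 1, while only the close-price change supplies the bootstrap weights.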

The proposed method performs better than the ordinary method in 12 out of 15 cases. In one case (training on 2006~2009, p-value 0.32), there is no significant difference between the two methods, and in the remaining two cases the ordinary method performs better.

4. Conclusion and Future work

The impact of a stock price movement depends on the degree of change in price. We modify the bootstrap by assigning different weights based on the degree of the price change. The proposed method can be applied to any ensemble method for classification, such as bagging [3] and boosting [4, 5, 6].

The following topics will be included in future research. In this work, we compare performance only by average accuracy; we can also compare the proposed method and the original method according to the degree of change. Moreover, we may improve the performance by adding more independent variables, such as other commodities and technical indexes.

References

[1] Abu-Mostafa, Y. S., Atiya, A. F. Introduction to financial forecasting. Applied Intelligence 1996;6(3):205-213.
[2] Breiman, L. Random forests. Machine Learning 2001;45:5-32.
[3] Breiman, L. Bagging predictors. Machine Learning 1996;24:123-140.
[4] Freund, Y., Schapire, R. Experiments with a new boosting algorithm. Proceedings of International Conference on Machine Learning 1996;148-156.
[5] Freund, Y., Schapire, R. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 1997;55(1):119-139.
[6] Schapire, R. E. The strength of weak learnability. Machine Learning 1990;5:197-227.
