
Predicting the direction of Indonesian stock price movement using support vector machines and fuzzy Kernel C-Means

Z. Rustam, D. F. Vibranti, and D. Widya

Citation: AIP Conference Proceedings 2023, 020208 (2018); doi: 10.1063/1.5064205. Published by the American Institute of Physics.


Department of Mathematics, Faculty of Mathematics and Natural Sciences (FMIPA), Universitas Indonesia, Depok 16424, Indonesia

Corresponding author: Z. Rustam

Abstract. Stock price fluctuations make it challenging for investors to earn returns from stock investments. To cope with this, investors need accurate predictions that anticipate future stock price movements. However, predicting the direction of stock price movement is a complex task because many uncertain factors affect the stock price. This paper therefore studies the application of Support Vector Machines (SVM) and Fuzzy Kernel C-Means (FKCM) to predicting the direction of stock price movement in the Indonesian stock market, particularly in the banking subsector. Eight technical indicators were computed from historical stock data and used to build two types of model input: one uses the computed indicator values directly, while the other converts them into trends. The results suggest that, on average, SVM with technical indicators represented as trends outperforms the other prediction models. However, the single best model across all experiments, with 92% accuracy, is FKCM with the computed indicators as input, σ = 100, and 90% of the data used for training.

Keywords: Fuzzy Kernel C-Means, prediction, stock price movement, Support Vector Machines, technical indicators

INTRODUCTION

Everyone has to meet their basic needs, and investing in stocks has become a promising way for many people to secure their future needs. It should be emphasized, however, that stocks are known as a high-risk, high-return investment instrument [1], so stock investments do not always generate the profit investors expect. Because stock prices fluctuate, investors should be guided by accurate predictions of the direction of stock price movement in order to achieve the desired returns.

One way to predict stock price trends is technical analysis, which makes use of specific indicators. Technical indicators are key tools for monitoring stock price movement and helping investors make trading decisions [2], because they use information about past stock prices to anticipate future price movement [3]. Eight indicators are used in this study, and each is processed into two types of input, namely continuous and discrete. The computed indicator values form the continuous type of input, while the discrete type of input is obtained by transforming the indicator values into trends using criteria specific to each technical indicator [4].

Many research articles combine technical indicators with prediction methods to predict stock price movement, and most of these methods are classified as machine learning [5]. In this study, two machine learning models are applied to predict stock price movement: Support Vector Machines and Fuzzy Kernel C-Means. Both models use the kernel trick to handle stock price data that are not linearly separable.

Proceedings of the 3rd International Symposium on Current Progress in Mathematics and Sciences 2017 (ISCPMS2017) AIP Conf. Proc. 2023, 020208-1–020208-7; https://doi.org/10.1063/1.5064205 Published by AIP Publishing. 978-0-7354-1741-0/$30.00


TABLE 1. Selected Technical Indicators Used in This Paper

Technical Indicator                       Formula
Momentum                                  $C_t - C_{t-n}$
Simple Moving Average                     $\frac{1}{n}\sum_{i=0}^{n-1} C_{t-i}$
Exponential Moving Average                $EMA_t = \alpha\, C_t + (1 - \alpha)\, EMA_{t-1}$
Moving Average Convergence/Divergence     $EMA_{12} - EMA_{26}$
Relative Strength Index                   $100 - \dfrac{100}{1 + \left(\sum_{i=0}^{n-1} UP_{t-i}/n\right) \big/ \left(\sum_{i=0}^{n-1} DW_{t-i}/n\right)}$
Stochastic K%                             $\dfrac{C_t - L_x}{H_x - L_x} \times 100$
Stochastic D%                             $\frac{1}{n}\sum_{i=0}^{n-1} \%K_{t-i}$
Williams %R                               $\dfrac{C_t - H_x}{H_x - L_x} \times 100$

$C_t$ is the close price of day $t$, $n$ is the number of trading days used, $EMA_{t-1}$ is the Exponential Moving Average value of the previous day, $\alpha$ is the weight coefficient, $EMA_{12}$ and $EMA_{26}$ are the 12-day and 26-day Exponential Moving Averages, $UP_{t-i}$ is an increasing price change and $DW_{t-i}$ is a decreasing price change, $\%K_{t-i}$ is the Stochastic K% value of day $t-i$, and $H_x$ and $L_x$ are the highest and lowest prices over the last $x$ days, respectively.

The objective of this study is therefore to predict the direction of Indonesian stock market movement, specifically the stock of PT. Bank Rakyat Indonesia, Tbk (BBRI), using Support Vector Machines and Fuzzy Kernel C-Means. The study focuses on comparing the prediction performance of the two methods when each is given two different types of input. The rest of this paper is organized as follows. The next section discusses the related works used in this paper. The section after that explains the research data and how they are processed. The experimental results are then reported, and the last section presents the conclusions of the experiments.

RELATED WORKS

This section briefly explains technical indicators, the kernel function, and the methods used in this paper, namely Support Vector Machines and Fuzzy Kernel C-Means.

Technical Indicators

Technical analysis is the process of evaluating stocks in order to determine future price trends [6]. Technical indicators, mathematical formulas computed from historical stock price data, are the tools of technical analysis. Following previous studies [4, 7], the eight technical indicators used in this paper and their formulas are listed in Table 1.
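To make the formulas in Table 1 concrete, the following Python sketch computes all eight indicators from daily close, high, and low prices with NumPy. It is an illustration, not the authors' code: the window length n = 10, the use of x = n days for the highest and lowest prices, the EMA weight α = 2/(n + 1), the 12-day and 26-day EMAs for MACD, and the function names are all assumptions.

```python
import numpy as np


def ema(values, n):
    """EMA_t = a * v_t + (1 - a) * EMA_{t-1}, seeded with the first value.

    The weight a = 2 / (n + 1) is an assumption; the paper only calls it
    "the weight coefficient".
    """
    a = 2.0 / (n + 1)
    out = np.empty(len(values))
    out[0] = values[0]
    for t in range(1, len(values)):
        out[t] = a * values[t] + (1 - a) * out[t - 1]
    return out


def technical_indicators(close, high, low, n=10):
    """Return a (T, 8) array of the Table 1 indicators; warm-up rows contain NaN.

    Column order: momentum, SMA, EMA, MACD, RSI, K%, D%, Williams %R.
    The window n (also used as x for highest/lowest prices) is an assumption.
    """
    close, high, low = (np.asarray(a, dtype=float) for a in (close, high, low))
    T = len(close)
    out = np.full((T, 8), np.nan)
    out[:, 2] = ema(close, n)                        # EMA
    out[:, 3] = ema(close, 12) - ema(close, 26)      # MACD = EMA_12 - EMA_26
    change = np.diff(close, prepend=close[0])        # day-over-day price change
    up, dw = np.clip(change, 0, None), np.clip(-change, 0, None)
    for t in range(n - 1, T):
        w = slice(t - n + 1, t + 1)                  # the last n trading days
        hh, ll = high[w].max(), low[w].min()         # H_x and L_x with x = n
        span = max(hh - ll, 1e-12)                   # guard against a flat window
        if t >= n:
            out[t, 0] = close[t] - close[t - n]      # momentum
        out[t, 1] = close[w].mean()                  # SMA
        avg_up, avg_dw = up[w].mean(), dw[w].mean()
        out[t, 4] = 100.0 if avg_dw == 0 else 100 - 100 / (1 + avg_up / avg_dw)
        out[t, 5] = 100 * (close[t] - ll) / span     # Stochastic K%
        out[t, 7] = 100 * (close[t] - hh) / span     # Williams %R, in [-100, 0]
        if t >= 2 * n - 2:
            out[t, 6] = out[t - n + 1 : t + 1, 5].mean()  # D% = mean of last n K%
    return out
```

Rows with any NaN (the first 2n − 2 days here) would be dropped before the indicators are turned into model inputs.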

Kernel Method

The kernel method is a tool for applying a linear problem solver to a nonlinear problem [8]. This is done through a nonlinear mapping $\Phi: x \mapsto \Phi(x)$, which maps any data point $x$ in the original data space $\mathbb{R}^d$ into a feature space $F$. A kernel function computes inner products in $F$ directly from the original data, so the explicit form of $\Phi$ never needs to be known. The kernel function used in this paper is the Gaussian Radial Basis Function (RBF) kernel, defined as [9]:

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{\sigma^2}\right) \qquad (1)$$

where $\sigma$ is a constant parameter of the Gaussian RBF. The following values of $\sigma$ are examined in the experiments: $\sigma$ = 0.0001, 0.001, 0.05, 0.1, 1, 5, 10, 50, 100, 1000.
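The following Python sketch evaluates Equation 1 for every pair of rows in two data matrices and shows how the candidate σ values change the kernel value for one fixed pair of points. It assumes the σ² denominator used in Equation 1 (with a 2σ² denominator only the scale of σ changes); the function name `gaussian_rbf` and the two sample points are hypothetical.

```python
import numpy as np


def gaussian_rbf(X, Y, sigma):
    """K(x_i, y_j) = exp(-||x_i - y_j||^2 / sigma^2) for all row pairs (Equation 1)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, computed for all pairs at once
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)


sigmas = [0.0001, 0.001, 0.05, 0.1, 1, 5, 10, 50, 100, 1000]
x, y = [0.2, -0.5], [0.4, 0.1]                 # two points, ||x - y||^2 = 0.4
for s in sigmas:
    print(f"sigma={s:>8}: K(x, y) = {gaussian_rbf(x, y, s)[0, 0]:.6f}")
```

Very small σ drives K toward 0 for any two distinct points, and very large σ drives it toward 1, so the extremes of the grid make all points look either unrelated or nearly identical to the models.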

Support Vector Machines

Support Vector Machines (SVM), proposed by Vapnik [10], is a supervised learning method for classification problems. SVM searches for the maximum margin in order to obtain the optimal separating hyperplane, which classifies the data into two classes. Suppose we are given a set of data points $(x_i, y_i)$, $i = 1, 2, \dots, N$, where $x_i \in \mathbb{R}^d$ is a d-dimensional input vector and $y_i \in \{-1, 1\}$ is its class label. In real-life problems the data are usually not linearly separable. Hence, a kernel function $K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$ is used to map the input vectors into a higher-dimensional space, $\Phi(x_i) \in F$, called the feature space [11]; the kernel in Equation 1 is one instance of this general form. The general form of the decision boundary is $f(x) = w^* \cdot x + b^*$, where $w^*$ is the normal vector of the hyperplane. In the feature space $w^* = \sum_{i=1}^{N} \alpha_i y_i \Phi(x_i)$, so the resulting decision function is [12]:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b\right) \qquad (2)$$

To find the optimal values of $\alpha_i$, the following quadratic programming problem is solved:

$$\text{Maximize} \quad \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (3)$$

$$\text{subject to} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \qquad (4)$$

$$0 \le \alpha_i \le C, \quad i = 1, 2, \dots, N \qquad (5)$$

where C is the regularization parameter, which controls the trade-off between maximizing the margin and minimizing the training classification error. C is set to 1 in this experiment.
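As an illustration of Equations 1 and 2, the sketch below trains an RBF-kernel SVM with C = 1 using scikit-learn's `SVC` (a library choice made here; the paper does not name its implementation) and then recomputes the decision value of Equation 2 by hand from the fitted support vectors. scikit-learn writes the RBF kernel as exp(-gamma * ||x - x'||^2), so gamma = 1/σ² is assumed to correspond to Equation 1. The toy data and helper names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def fit_rbf_svm(X, y, sigma, C=1.0):
    """Train an SVM with the Gaussian RBF kernel; gamma = 1 / sigma^2 (assumed mapping)."""
    return SVC(C=C, kernel="rbf", gamma=1.0 / sigma**2).fit(X, y)


def decision_value_eq2(model, X, sigma):
    """sum_i alpha_i y_i K(x_i, x) + b from Equation 2, before taking the sign."""
    sv = model.support_vectors_
    sq_dist = ((X[:, None, :] - sv[None, :, :]) ** 2).sum(axis=2)  # (n_samples, n_sv)
    K = np.exp(-sq_dist / sigma**2)
    # scikit-learn stores alpha_i * y_i in dual_coef_ and b in intercept_.
    return K @ model.dual_coef_[0] + model.intercept_[0]


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(80, 3))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)     # XOR-like labels: not linearly separable
model = fit_rbf_svm(X, y, sigma=1.0)
f = decision_value_eq2(model, X, sigma=1.0)
pred = np.where(f > 0, 1, -1)                  # the sgn(.) in Equation 2
```

Only the support vectors (the points with α_i > 0) enter the sum, which is why the manual computation iterates over `support_vectors_` rather than all training points.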

Fuzzy Kernel C-Means

Fuzzy C-Means

Fuzzy C-Means (FCM) is a commonly used fuzzy clustering method and an unsupervised learning method. It groups an unlabeled data set into clusters based on the similarities between data points [9]. For an unlabeled data set $X = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$, the FCM algorithm partitions the data by minimizing the objective function [13]:

$$J_m(U, V) = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^m \lVert x_k - v_i \rVert^2 \qquad (6)$$

where c is the number of clusters, n is the number of data points, m > 1 is a real number that controls the fuzziness of the clustering, $u_{ik}$ is the membership degree of $x_k$ in the i-th cluster, satisfying $\sum_{i=1}^{c} u_{ik} = 1$, $v_i$ is the d-dimensional center of the i-th cluster, and $\lVert \cdot \rVert$ is any norm that measures the similarity between data and a cluster center. The optimization is done iteratively, with the membership degrees $u_{ik}$ and cluster centers $v_i$ updated as follows [13]:

$$u_{ik} = \frac{1}{\sum_{j=1}^{c}\left(\dfrac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert}\right)^{2/(m-1)}}, \qquad v_i = \frac{\sum_{k=1}^{n} u_{ik}^m x_k}{\sum_{k=1}^{n} u_{ik}^m} \qquad (7)$$
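The alternating updates of Equation 7 can be sketched in a few lines of NumPy. This is a minimal illustration, assuming Euclidean distance, m = 2, a random initial membership matrix, and a max-change stopping rule like the one given for FKCM; the function name `fcm` is hypothetical.

```python
import numpy as np


def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy C-Means: alternate the center and membership updates of Equation 7."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                       # memberships of each point sum to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)      # centers v_i, shape (c, d)
        dist = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # (c, n)
        dist = np.fmax(dist, 1e-12)                     # avoid division by zero
        # ratio[i, j, k] = (||x_k - v_i|| / ||x_k - v_j||)^(2/(m-1))
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(U_new - U)) < eps
        U = U_new
        if converged:
            break
    return U, V
```

A hard clustering, if needed, is obtained by assigning each point to the cluster with the largest membership, `U.argmax(axis=0)`.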

Fuzzy Kernel C-Means

Fuzzy Kernel C-Means (FKCM) is an extension of FCM that overcomes a weakness of FCM: its accuracy depends strongly on the type of input data. In particular, when the input data are not linearly separable, the resulting accuracy is often unsatisfactory [13]. To deal with this deficiency, the data are mapped from the original data space into a higher-dimensional feature space using the kernel method.

Define a nonlinear mapping $\Phi: x \mapsto \Phi(x)$ which maps any data point $x$ in the original data space $\mathbb{R}^d$ into the feature space $F$. Using this transformation, the Euclidean distance between a data point and a cluster center in Equation 6 can be replaced by the distance between the transformed data, defined as follows [13]:

$$\lVert \Phi(x) - \Phi(y) \rVert^2 = K(x, x) + K(y, y) - 2K(x, y) \qquad (8)$$
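Equation 8 can be checked numerically with a kernel whose feature map is small enough to write out. The sketch below uses the degree-2 homogeneous polynomial kernel K(x, y) = (x · y)² on 2-D inputs, an illustrative choice that is not used in the paper, whose explicit map is Φ(x) = (x₁², √2 x₁x₂, x₂²); the distance computed in feature space matches the kernel-only right-hand side.

```python
import numpy as np


def poly_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel K(x, y) = (x . y)^2 (illustrative choice)."""
    return float(np.dot(x, y)) ** 2


def phi(x):
    """Explicit feature map of poly_kernel for 2-D inputs: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])


x, y = np.array([1.0, 2.0]), np.array([-0.5, 3.0])
lhs = np.sum((phi(x) - phi(y)) ** 2)                                   # ||Phi(x) - Phi(y)||^2
rhs = poly_kernel(x, x) + poly_kernel(y, y) - 2 * poly_kernel(x, y)    # Equation 8
print(lhs, rhs)   # both are approximately 50.0625
```

For the Gaussian RBF the feature map is infinite-dimensional, which is why only the kernel-only form of Equation 8 can be used in practice.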

Suppose that the kernel function is the Gaussian RBF defined in Equation 1, again with several candidate values of $\sigma$. From Equation 1, $K(x, x) = \exp\left(-\lVert x - x \rVert^2/\sigma^2\right) = \exp(0) = 1$, and likewise $K(v_i, v_i) = 1$. The same simplification holds for any kernel with $K(x, x) = 1$. Substituting these two properties into Equation 8 converts Equation 6 into the following form [9]:

$$J_m(U, V) = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^m \lVert \Phi(x_k) - \Phi(v_i) \rVert^2 = \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^m \left(K(x_k, x_k) + K(v_i, v_i) - 2K(x_k, v_i)\right) = 2\sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^m \left(1 - K(x_k, v_i)\right) \qquad (9)$$

Minimizing Equation 9 under the constraint $\sum_{i=1}^{c} u_{ik} = 1$ gives the following updates of the membership degree $u_{ik}$ and the cluster center $v_i$:

$$u_{ik} = \frac{\left(1/\left(1 - K(x_k, v_i)\right)\right)^{1/(m-1)}}{\sum_{j=1}^{c}\left(1/\left(1 - K(x_k, v_j)\right)\right)^{1/(m-1)}}, \qquad v_i = \frac{\sum_{k=1}^{n} u_{ik}^m K(x_k, v_i)\, x_k}{\sum_{k=1}^{n} u_{ik}^m K(x_k, v_i)} \qquad (10)$$

The iteration stops when $\max_{i,k} \lvert u_{ik}^{(t)} - u_{ik}^{(t-1)} \rvert < \varepsilon$, $0 < \varepsilon < 1$, where $\varepsilon$ is the stopping criterion and t is the iteration step.
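A minimal NumPy sketch of the FKCM iteration in Equation 10 with the Gaussian RBF of Equation 1 follows. The farthest-point initialization of the centers, m = 2, ε = 10⁻⁵, and the function names are assumptions for illustration; the section above does not state how the centers are initialized or how clusters are turned into up/down predictions.

```python
import numpy as np


def rbf(X, V, sigma):
    """Gaussian RBF of Equation 1 between centers V (c, d) and data X (n, d) -> (c, n)."""
    sq_dist = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma**2)


def fkcm(X, c, sigma, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy Kernel C-Means: alternate the two updates of Equation 10."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialization of the centers (an assumption).
    idx = [int(rng.integers(len(X)))]
    for _ in range(c - 1):
        sq = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(sq.argmax()))
    V = X[idx].copy()
    U = np.full((c, len(X)), 1.0 / c)
    for _ in range(max_iter):
        K = rbf(X, V, sigma)
        # (1 / (1 - K(x_k, v_i)))^(1/(m-1)); clipped to avoid division by zero
        w = (1.0 / np.fmax(1.0 - K, 1e-12)) ** (1.0 / (m - 1))
        U_new = w / w.sum(axis=0, keepdims=True)
        # v_i = sum_k u_ik^m K(x_k, v_i) x_k / sum_k u_ik^m K(x_k, v_i)
        W = (U_new ** m) * K
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < eps   # max |u^(t) - u^(t-1)| < eps
        U = U_new
        if converged:
            break
    return U, V
```

With very small σ every K(x_k, v_i) is close to zero and the center update becomes numerically unstable, which is consistent with the extreme parameter levels being the least useful ones.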

RESEARCH DATA

To predict the direction of Indonesian stock price movement, daily historical stock price data of PT. Bank Rakyat Indonesia, Tbk (BBRI) are used. The experimental dataset covers one year, from 01-01-2016 to 30-12-2016, a total of 254 trading days. As discussed in the Technical Indicators subsection of Related Works, eight technical indicators are used in this study. This paper considers two different types of model input, so the daily historical price data are processed into both input types according to their respective criteria and calculations. Finally, the performance of each model generated from each method and input type is evaluated and compared. The two input types are as follows:

a. Continuous type of input. Consider the technical indicators and formulas listed in Table 1. For this type of input, each technical indicator is computed with its formula, and the values are then normalized into [-1, 1].

b. Discrete type of input. The main idea of this type of input is to turn the computed technical indicators into stock price trends by exploiting criteria specific to each indicator. It is called discrete because it takes only two values: -1 if the trend is down and 1 if the trend is up. This type of input is adopted from a previous study [4]. The properties of each technical indicator that help investors describe the future direction of stock price movement are as follows. For momentum, a positive value indicates that the trend is up, and vice versa. For the moving averages, when the stock price on a given day is greater than the corresponding simple or exponential moving average value, the trend is up, and vice versa.
For the stochastic oscillators (Stochastic K%, Stochastic D%, and Williams %R), when their plot moves upward the price tends to go up, and vice versa. For the Relative Strength Index, a value below 30 means the stock price tends to go up, a value above 70 means it tends to go down, and a value between 30 and 70 means the trend follows the stock price trend. For MACD, if the value goes up, the price also goes up, and vice versa.

The two types of input data are then fed into SVM and FKCM to generate prediction models. To determine the best parameter values, experiments are run for each prediction model over the values of $\sigma$ listed in the Kernel Method subsection. In each experiment, every input data set is divided into training data and testing data: p% of the data set is used for training and the remaining (100 − p)% for testing, with p = 10, 20, …, 90.

FIGURE 1. Accuracy of SVM and FKCM Methods with Continuous and Discrete Input Data Types using the Gaussian RBF Kernel Function over Different Values of σ (x-axis: parameter level σ; y-axis: accuracy (%); series: FKCM continuous type input, FKCM discrete type input, SVM continuous type input, SVM discrete type input)
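The two input types described in the Research Data section can be sketched as follows, starting from an indicator matrix whose columns follow the order of Table 1 (momentum, SMA, EMA, MACD, RSI, K%, D%, Williams %R). The column order, mapping ties to -1, and reading "follows the stock price trend" for an RSI between 30 and 70 as the sign of the day-over-day close price change are assumptions; the function names are hypothetical.

```python
import numpy as np

# Assumed column order (Table 1): momentum, SMA, EMA, MACD, RSI, K%, D%, Williams %R.


def to_continuous(X):
    """Continuous input: min-max scale every indicator column into [-1, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # constant columns map to -1
    return 2.0 * (X - lo) / span - 1.0


def _trend(condition):
    """Map a boolean 'trend is up' array to +1 / -1 (ties count as down)."""
    return np.where(condition, 1, -1)


def to_discrete(X, close):
    """Discrete input: turn each indicator into an up (+1) / down (-1) trend.

    Rules that compare day t with day t-1 have no value on the first day,
    so the first row is dropped and the result has len(X) - 1 rows.
    """
    X = np.asarray(X, dtype=float)
    close = np.asarray(close, dtype=float)
    cur, prev = X[1:], X[:-1]
    price = close[1:]
    price_up = price > close[:-1]
    rsi = cur[:, 4]
    rsi_trend = np.where(rsi < 30, 1, np.where(rsi > 70, -1, _trend(price_up)))
    return np.column_stack([
        _trend(cur[:, 0] > 0),            # momentum positive -> up
        _trend(price > cur[:, 1]),        # price above SMA -> up
        _trend(price > cur[:, 2]),        # price above EMA -> up
        _trend(cur[:, 3] > prev[:, 3]),   # MACD rising -> up
        rsi_trend,                        # RSI: <30 up, >70 down, else follow price
        _trend(cur[:, 5] > prev[:, 5]),   # K% rising -> up
        _trend(cur[:, 6] > prev[:, 6]),   # D% rising -> up
        _trend(cur[:, 7] > prev[:, 7]),   # Williams %R rising -> up
    ])
```

Scaling each column independently keeps indicators with very different ranges (for example RSI in [0, 100] and momentum in price units) on a comparable footing for the RBF kernel.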

EXPERIMENTAL RESULTS

For both SVM and FKCM with each input type, the best prediction model is found by comparing the models obtained with the different Gaussian RBF kernel parameter values listed in the Kernel Method subsection. Figure 1 shows, for each method and input type, the accuracy averaged over all percentages of training data used. Fig. 1 shows that the best FKCM model with the continuous type of input is obtained with σ = 100, with an average accuracy of 78.24%, while the best FKCM model with the discrete type of input is obtained with σ = 1000, with an average accuracy of 80.47%. The best SVM model with the continuous type of input is obtained with σ = 0.05, with an average accuracy of 70.72%, while the best SVM model with the discrete type of input is obtained with σ = 1, with an average accuracy of 81.65%. These results show that models using the discrete type of input always perform better than models using the continuous type of input, regardless of the parameter level used, for both SVM and FKCM. This may be because the discrete input exploits the ability of technical indicators to describe the stock price trend itself, which helps the models classify the data correctly.

Fig. 2 shows the performance of the four best models from Fig. 1 for each percentage of training data used. Each best model is treated as a fixed combination of method, input type, and parameter value: FKCM with continuous input and σ = 100, FKCM with discrete input and σ = 1000, SVM with continuous input and σ = 0.05, and SVM with discrete input and σ = 1, each evaluated at every percentage of training data. Across all prediction models in Fig. 2, the best model, with 92% accuracy, is FKCM with the continuous type of input, σ = 100, and 90% training data.
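The evaluation protocol (each σ in the grid, training percentages p = 10, …, 90, accuracy averaged over p, best σ chosen by that average) can be sketched with scikit-learn's `SVC` as the classifier. The chronological split (first p% of trading days for training), gamma = 1/σ² as the counterpart of Equation 1, and the helper names are assumptions; FKCM would need its own rule for mapping clusters to up/down labels, which this sketch does not cover.

```python
import numpy as np
from sklearn.svm import SVC

SIGMAS = [0.0001, 0.001, 0.05, 0.1, 1, 5, 10, 50, 100, 1000]
TRAIN_PERCENTAGES = range(10, 100, 10)          # p = 10, 20, ..., 90


def accuracy_for_split(X, y, sigma, p):
    """Train on the first p% of trading days, test on the rest (chronological split assumed)."""
    n_train = int(round(len(X) * p / 100))
    model = SVC(C=1.0, kernel="rbf", gamma=1.0 / sigma**2)   # gamma = 1/sigma^2 assumed
    model.fit(X[:n_train], y[:n_train])
    return 100.0 * np.mean(model.predict(X[n_train:]) == y[n_train:])


def best_sigma(X, y):
    """Average the test accuracy over all p for each sigma and return the best sigma."""
    averages = {
        s: np.mean([accuracy_for_split(X, y, s, p) for p in TRAIN_PERCENTAGES])
        for s in SIGMAS
    }
    return max(averages, key=averages.get), averages
```

The per-σ averages correspond to the quantity plotted in Figure 1; this sketch is not expected to reproduce the reported numbers exactly.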

FIGURE 2. The Accuracy of the Best Models over Percentages of Training Data Used (x-axis: percentage of training data used (%), 10 to 90; y-axis: accuracy (%); series: FKCM continuous type input, σ = 100; FKCM discrete type input, σ = 1000; SVM continuous type input, σ = 0.05; SVM discrete type input, σ = 1)

CONCLUSIONS

Based on the experimental results above, using the discrete type of input improves the accuracy of both SVM and FKCM. Specifically, in terms of average accuracy over all percentages of training data, SVM with the discrete type of input outperforms the other prediction models. However, the single best model across all experiments, with 92% accuracy, is FKCM with the continuous type of input, σ = 100, and 90% training data.

ACKNOWLEDGMENTS

This research is funded by HIBAH Publikasi Internasional Terindeks untuk Tugas Akhir (PITTA) Universitas Indonesia 2017.

REFERENCES

1. T. T. Zhao and W. Y. Chen, A Two-method Applying Support Vector Machine for Investment Decision, in Proceedings Chinese Guidance, Navigation and Control Conference, Nanjing, 2016 (IEEE, Piscataway, 2017), pp. 1150-1155.
2. R. Dash and P. K. Dash, The Journal of Finance and Data Science (JFDS) 2, 42 (2016).
3. M. Kumar and M. Thenmozhi, Forecasting Stock Index Movement: A Comparison of Support Vector Machines and Random Forest, in Proceedings 9th Capital Markets Conference, Navi Mumbai, 2005 (Indian Institute of Capital Market, Navi Mumbai, 2006). Available at: http://dx.doi.org/10.2139/ssrn.876544
4. J. Patel, S. Shah, P. Thakkar, and K. Kotecha, Expert Syst. Appl. 42, 259 (2015).
5. R. Lacomin, Feature Optimization on Stock Market Predictor, in 13th International Conference on Development and Application Systems, Suceava, 2016 (IEEE, Piscataway, 2016), pp. 243-247.
6. S. B. Achelis, Technical Analysis from A to Z, 2nd edition (McGraw-Hill, New York, 2000).
7. Y. Kara, M. A. Boyacioglu, and O. K. Baykan, Expert Syst. Appl. 38, 5311 (2011).
8. L. Liu, B. Shen, and X. Wang, Journal of Computers: Special Issue on Embedded and Multimedia Computing, 25 (1), 12 (2014).
9. D. Vanisri, International Journal of Computer Science and Mobile Computing (IJCSMC) 3 (12), 254 (2014).
10. C. Cortes and V. Vapnik, Machine Learning 20, 273 (1995).
11. E. Esme and B. Karlik, Appl. Soft Comput. 46, 452 (2016).
12. Z. Rustam and A. S. Talita, Journal of Theoretical and Applied Information Technology (JATIT) 81, 161 (2015).