Predicting the direction of stock market prices using Ensemble Learning

By Luckyson Khaidem, Snehanshu Saha, Sudeepa Roy Dey

PROBLEM STATEMENT

Investments in stock markets involve very high risk because of the markets' complexity and dynamic nature.

Many variables influence the market value on a particular day, such as economic conditions and investor sentiment. Because of this, stock markets are susceptible to quick changes, which cause seemingly random fluctuations in the stock price.

Market risk is positively correlated with forecasting error, so forecasting error needs to be minimized to keep investment risk low.

Errors in forecasting can be reduced by treating stock forecasting as a classification problem: predict the direction of the price movement rather than the price itself.

Goal: design an intelligent system, using machine learning techniques, that learns from market data and proposes an optimized trading strategy to investors.
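As an illustration of the classification framing, the sketch below turns a series of closing prices into up/down labels. The function name `direction_labels`, the `horizon` parameter, and the sample prices are assumptions made for this example; the slides do not specify how the labels are constructed.

```python
def direction_labels(closes, horizon):
    """Label day t as +1 if the close `horizon` days later is higher, else -1.

    Assumption for this sketch: a flat or falling price is labelled -1.
    The last `horizon` days get no label because their future is unknown.
    """
    labels = []
    for t in range(len(closes) - horizon):
        change = closes[t + horizon] - closes[t]
        labels.append(1 if change > 0 else -1)
    return labels


# Made-up closing prices, for illustration only.
closes = [100.0, 101.5, 99.0, 102.2, 98.5, 103.0]
print(direction_labels(closes, horizon=2))
```

A classifier trained on such labels predicts only the sign of the future move, which is the quantity a buy or sell decision depends on.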

EXISTING APPROACHES AND RESULTS

Researchers have used a wide variety of approaches.

The major methodologies are: 1) technical analysis, 2) time series forecasting, 3) machine learning, and 4) modeling and predicting the volatility of stocks using differential equations.

Machine learning algorithms that have been applied include SVM, neural networks, linear discriminant analysis, linear regression, KNN, and the naive Bayes classifier.

Some existing approaches do not account for the nonlinearity of the problem; because of that nonlinearity, linear discriminant-type algorithms are ill-suited to it.

These algorithms have achieved accuracies in the range of 60-70%.

PROPOSED APPROACH

Data Collection → Exponential Smoothing → Feature Extraction → Random Forest → Stock Market Prediction

Figure 1: Proposed Methodology
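The first two processing stages of the pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: the smoothing recursion is the standard simple exponential smoothing formula, while the smoothing factor `alpha`, the choice of rate of change as the extracted feature, and the `window` parameter are assumptions made for the example.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: S_0 = Y_0, S_t = alpha*Y_t + (1-alpha)*S_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def rate_of_change(series, window):
    """Hypothetical example feature: relative change over `window` steps."""
    return [
        (series[t] - series[t - window]) / series[t - window]
        for t in range(window, len(series))
    ]


# Made-up prices, for illustration only.
prices = [10.0, 20.0, 30.0]
smoothed = exponential_smoothing(prices, alpha=0.5)
print(smoothed)
print(rate_of_change(smoothed, window=1))
```

Larger values of `alpha` weight recent observations more heavily; smaller values smooth more aggressively. Feature rows built from the smoothed series, paired with direction labels, would then be passed to a random forest classifier.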

RESULTS ACHIEVED

Figure 2: Output from Apple Inc. Data set

Figure 3: Output from GE Data set

Figure 4: ROC curve corresponding to Apple dataset

Figure 5: ROC curve corresponding to GE Data set

Figure 6: Time Window vs Accuracy for 3M stock data

WHY RANDOM FOREST?

Figure 7: Test for Linear Separability

Stock data is inherently nonlinear.

Random Forests can learn from highly irregular data.

Random Forests can classify large amounts of data with high accuracy.

Random Forests are natural candidates for parallelization, since they consist of highly de-correlated decision trees that can be trained independently.

The generalization error of a Random Forest converges as the number of trees in the ensemble increases.
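The benefit of combining many de-correlated trees can be illustrated with an idealized calculation. The sketch below assumes every tree votes correctly with the same probability and fully independently of the others; real trees are only partially de-correlated, so a real forest levels off below these figures. The function name and the value 0.6 are chosen for this example only.

```python
from math import comb


def majority_vote_accuracy(n_trees, p_correct):
    """Probability that a strict majority of n independent voters is correct.

    Idealized assumption: each voter is right with probability p_correct,
    independently of all others (binomial model).
    """
    if n_trees % 2 == 0:
        raise ValueError("use an odd number of trees to avoid ties")
    needed = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )


for n in (1, 11, 51, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under these assumptions the ensemble accuracy rises with the number of trees and then flattens, which matches the qualitative picture of a forest whose error settles as trees are added.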

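ERROR BOUND

Random Forests have an upper bound on their generalization error.

Define the margin function

$$ mr(X, Y) = P_\theta(h(X, \theta) = Y) - \max_{j \neq Y} P_\theta(h(X, \theta) = j) \tag{1} $$

The strength of the forest is defined as the expected value of the margin:

$$ s = E_{X,Y}\, mr(X, Y) \tag{2} $$

By Chebychev's inequality, for $s > 0$ the generalization error is bounded above:

$$ \text{Error} = P_{X,Y}(mr(X, Y) \le 0) \le \frac{\operatorname{var}(mr)}{s^2} \tag{3} $$

Chebychev's inequality: let $X$ be any random variable with finite variance and let $C > 0$. Then

$$ P(|X - E(X)| \ge C) \le \frac{\operatorname{var}(X)}{C^2} \tag{4} $$

Bound (3) follows from (4) with $X = mr$ and $C = s$, because $mr \le 0$ implies $|mr - s| \ge s$.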
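To make the quantities in (1) to (3) concrete, the sketch below estimates them from a small table of per-tree votes. It is an empirical approximation under stated assumptions: the probability over θ is replaced by the vote share among a finite set of trees, the expectation over (X, Y) by an average over samples, and the vote table itself is made up for illustration.

```python
from statistics import mean, pvariance


def margin(votes, true_label):
    """Empirical margin: vote share of the true class minus the largest
    vote share of any other class (0 if no other class received votes)."""
    n = len(votes)
    share_true = votes.count(true_label) / n
    other_labels = {v for v in votes if v != true_label}
    share_other = max((votes.count(j) / n for j in other_labels), default=0.0)
    return share_true - share_other


def chebyshev_error_bound(vote_rows, labels):
    """Return (bound, strength, margins), where bound = var(mr) / s**2."""
    margins = [margin(votes, y) for votes, y in zip(vote_rows, labels)]
    strength = mean(margins)
    if strength <= 0:
        raise ValueError("the bound requires positive strength")
    return pvariance(margins) / strength**2, strength, margins


# Made-up votes of 5 trees on 4 samples (+1 = up, -1 = down).
vote_rows = [
    [1, 1, 1, 1, -1],
    [-1, -1, -1, -1, 1],
    [-1, -1, -1, -1, -1],
    [1, 1, 1, -1, -1],
]
labels = [1, -1, -1, 1]

bound, strength, margins = chebyshev_error_bound(vote_rows, labels)
empirical_error = sum(m <= 0 for m in margins) / len(margins)
print(strength, bound, empirical_error)
```

The bound shrinks when the strength is high and the margins vary little across samples. It can be loose, and for weak forests it may exceed 1, in which case it carries no information.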

OOB ERROR AND CONVERGENCE

Figure 8: OOB error rate vs Number of Estimators
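The out-of-bag (OOB) error is computed from the training samples that a given tree never saw: each tree is fit on a bootstrap sample drawn with replacement, and the samples left out of that draw serve as a held-out set for that tree. The sketch below shows only the bootstrap split; the function name, sample count, and seed are assumptions for this example.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the run is repeatable
in_bag, oob = bootstrap_split(1000, rng)

# The expected OOB fraction is (1 - 1/n)**n, which approaches 1/e (about 0.368).
print(len(oob) / 1000, (1 - 1 / 1000) ** 1000)
```

Because roughly a third of the data is out-of-bag for each tree, the forest can estimate its own generalization error without a separate validation set, and that estimate can be tracked as trees are added.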

OUTCOME

A paper on this topic, co-authored by Dr. Snehanshu Saha and Mrs. Sudeepa Roy Dey, has been submitted to the journal Applied Mathematical Finance.