Automated Stock Trading and Portfolio Optimization Using XCS Trader and Technical Analysis

Anil Chauhan

Master of Science Artificial Intelligence School of Informatics University of Edinburgh 2008

Abstract

The financial market is a highly dynamic system in which finding the underlying price patterns is highly complex. We extend previous work on automatic stock trading with the extended classifier system (XCS) by implementing the Q(1) and Q(λ) reinforcement learning algorithms. We developed 14 XCS agents using different technical indicators such as moving averages, RSI, CMF, SAR and ADX. We show that by modeling financial prediction as a single-step reinforcement learning problem, and by using delayed reward to check the correctness of the action taken, all the benchmark strategies, such as buy and hold and 'keeping money in the bank', can be beaten. We also show that a stock's price movement on one day is correlated with its movement on other days, and we reformulate financial forecasting as a multi-step process. We introduce the concept of a passive set and find that the multi-step problem formulation gives the best results: Q-learning performed 18% better than single-step, reward-only RL. Finally, we built a portfolio management and optimization system that learns online and performs monthly or quarterly rebalancing, using the best trader to trade. The results show that reacting to market dynamics does not necessarily give the best result: such a system gives average performance, between that of the best trader and that of the worst trader. We also employed different trading strategies, such as “using more than one best agent” and a “mean reversal strategy”, for portfolio optimization.

Acknowledgements

I would like to thank my supervisor, Sonia Schulenburg, for introducing me to the world of finance and classifier systems and for giving constant feedback on my project. I would also like to thank Abu ul Hassan for sharing the previous version of the XCS Java code with me. Many thanks to my friend Santosh for reviewing the initial draft of my thesis and sharing ideas on it.

Declaration

I declare that this thesis was composed by me, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Anil Chauhan)

Table of Contents

1 Introduction
  1.1 Introduction and Purpose
  1.2 Motivation
  1.3 Objective
  1.4 Outline
2 Background & Related Work
  2.1 Background
    2.1.1 Market Efficiency
      2.1.1.1 Version of Efficient Market Hypothesis (EMH)
    2.1.2 Technical Analysis
    2.1.3 The Portfolio:
      2.1.3.1 Why do we need Portfolio?
      2.1.3.2 Portfolio Management:
  2.2 Related Work
    2.2.1 Machine Learning in Finance and Portfolio Management
  2.3 XCS Introduction from Stock Trading Perspective
    2.3.1 XCS Input and Output
    2.3.2 XCS Frame Work [15]
    2.3.3 XCS Learning Cycle
      2.3.3.1 Updating XCS Parameters
      2.3.3.2 Genetic Algorithm role and rule evolution [15]
    2.3.4 Deviation from other LCS based Systems
    2.3.5 Mind of XCS System
3 Implementation
  3.1 Technical Analysis Usage in XCS
    3.1.1 Description of individual technical Indicators
    3.1.2 Combining different technical indicators and working mechanism of different Agents
      3.1.2.1 Composition of 14 Agents:
    3.1.3 Advantage and “Scope of Improvement” of current approach
  3.2 Improving the learning of eXtended Classifier System
    3.2.1 Classifiers in multi step Reinforcement learning problems
    3.2.2 Implementing Q learning in Classifier
    3.2.3 Eligibility trace and Watkins's Q(λ)
4 Experimentation
  4.1 FTSE data and Stability of the XCS System
    4.1.1 FTSE Data
    4.1.2 Stability of the XCS System
  4.2 Comparative Study of the 3 different Algorithm
    4.2.1 Setting the parameters for the experiments
      4.2.1.1 Setting Initial Exploration Rate
      4.2.1.2 Setting discount rate (gamma)
      4.2.1.3 Setting Trace Decay Parameter
  4.3 Experimental Results for 3 learning Algorithm
    4.3.1 Observations:
  4.4 Fault in the previous Reward giving strategy
    4.4.1 Experiments with improved delayed reward Strategy
      4.4.1.1 Setting Initial Exploration Rate
      4.4.1.2 Setting Discount Rate (gamma)
      4.4.1.3 Setting Trace Decay (λ)
    4.4.2 Results with new delayed reward strategy
      4.4.2.1 Observations:
    4.4.3 Experimental Results for all 3 learning algorithm with new delayed reward strategy
      4.4.3.1 Observations
    4.4.4 Fault in Q (1) learning
    4.4.5 Experimentation with passive Set
      4.4.5.1 Finding optimum parameters
      4.4.5.2 Observations for Passive set
5 Implementation & Experimentation-Portfolio Optimization
  5.1 Implementation
  5.2 Portfolio Performance:
  5.3 Results
    Portfolio Management Results
      5.3.1.1 Observations: Portfolio Management results
      5.3.1.2 Analysis and Comments on Portfolio Management System
    5.3.2 Change of Portfolio construction Strategy
      5.3.2.1 Results of Portfolio Management using more than 1 Agent
  5.4 Experimentation with Portfolio Management taking few best companies in the Portfolio
    5.4.1 Steps Followed
    5.4.2 Results
    5.4.3 Observation:
    5.4.4 Experimentation with Mean Reversal Strategy
      5.4.4.1 Results
      5.4.4.2 Observations
6 Conclusion & Future Work
  6.1 Conclusion
  6.2 Future Work
Bibliography
Appendix

List of Figures

Figure 1: XCS Frame Work [15]
Figure 2: Voting Strategy [19]
Figure 3: Q-learning: An off-policy TD control algorithm [14]
Figure 4: Backward view of eligibility trace [14]
Figure 5: Tabular version of Watkins's Q(λ) algorithm [14]
Figure 6: Mean Performance of 5 companies for different number of runs
Figure 7: DSGI Price Chart
Figure 8: DSGI, Wealth Chart of agents with old reward strategy
Figure 9: DSGI, Agent’s performance with new delayed reward giving strategy
Figure 10: LAND, Price Chart
Figure 11: LAND, Meta Agents wealth chart with old reward strategy
Figure 12: LAND, Meta Agents wealth chart with new delayed reward strategy
Figure 13: Portfolio Management Results
Figure 14: Portfolio management using either best 5 or best 10 or best 20 companies
Figure 15: Portfolio Management System using trend reversal strategy
Figure 16: Comparison mean reversal strategy with normal strategy

List of Tables

Table 1: Composition of different Agents
Table 2: Details of 10 FTSE 100 Companies
Table 3: Experimental results for 100 run on 5 FTSE100 companies
Table 4: Setting Exploration rate
Table 5: Combined Results for Different exploration rate
Table 6: Setting discount rate
Table 7: Setting trace decay parameter
Table 8: Experimental Results for 3 learning methodology
Table 9: Setting Exploration Rate
Table 10: Setting discount rate
Table 11: Setting trace decay
Table 12: Comparative results for 90 FTSE 100 companies with delayed reward strategy for single step reward only RL
Table 13: Results of 90 FTSE100 Companies for different RL with new delayed reward strategy
Table 14: Comparative Results for FTSE 100 companies using Passive Set
Table 15: Comparison with Only active set and with additional passive set approach
Table 16: Combined Results single step RL without Passive set and multi step Q Learning with passive set
Table 17: Portfolio Optimization using single best Agent
Table 18: Monthly Portfolio management Using Best 3 agents
Table 19: Monthly Portfolio Rebalancing using different Number of Best Agents
Table 20: Portfolio Management System with quarterly re-balancing taking best 5 companies

Chapter 1

1 Introduction

1.1 Introduction and Purpose

Advances in modern machine learning, such as evolutionary computation, have enabled us not only to analyze data more efficiently but also to understand underlying patterns present in the financial market. Effective exploitation of these new computational methods will help organizations make better-informed decisions, which will further improve their competitive edge [5]. Many different approaches, such as Neural Networks (NN) and Genetic Algorithms (GA), have been widely applied to predict the financial market. However, for some of these systems the input data might not be as rich in information content as technical indicators (such as various types of moving averages, breakout rules, and maximum and minimum prices in the preceding days) or fundamental indicators (such as dividends, interest rates and money supply). More recently, learning classifier systems (rule-based models) have shown promise in academic work by overcoming some of the most common drawbacks that neural models present to practitioners, such as the lack of explanatory power, high variance in results and the need to continually retrain the nets when performance starts to decrease. In addition, very few models have addressed the integration of more than one learning paradigm within a single platform.

In this project, we will first try to improve the learning of the classifier system by incorporating different reinforcement learning algorithms. In all previous versions of the eXtended Classifier System (XCS), financial forecasting was considered solely as a single-step reinforcement learning problem. However, we believe that price movement is not completely erratic and that prices do follow patterns of ups and downs: the price on any given day affects, and is in some way correlated with, the prices on the following days. We will therefore try to model financial forecasting as a multi-step process and implement the Q(1) and Q(λ) reinforcement learning algorithms.

Secondly, we will investigate methods of portfolio construction and portfolio optimization. This will be achieved with a system that evolves technical trading agents, each learning to trade stocks by modeling groups of traders that use a variety of sets of technical indicators. For portfolio construction, the main task will be to build a portfolio management system, attached to XCS, that picks the best agents, i.e. those that can give the maximum benefit in the long run. For simplicity, the equity market will be the prime focus of this investigation, although (if time permits) tests could be extended to foreign exchange to assess the scalability of the approach.
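The difference between the single-step and multi-step formulations discussed in Section 1.1 can be illustrated with a minimal sketch. It is not taken from the thesis code: the function names, the learning rate `beta` and the discount rate `gamma` are assumptions chosen for illustration, and the multi-step rule shown is the standard one-step Q-learning update.

```python
def single_step_update(prediction, reward, beta=0.2):
    """Reward-only (single-step) update: move the prediction toward the reward."""
    return prediction + beta * (reward - prediction)


def q_learning_update(prediction, reward, next_predictions, beta=0.2, gamma=0.7):
    """One-step Q-learning update: the target adds the discounted best
    prediction available in the next state to the immediate reward."""
    target = reward + gamma * max(next_predictions)
    return prediction + beta * (target - prediction)


# Hypothetical numbers: the same reward updates the prediction differently
# once the value of the following trading day is taken into account.
p = 10.0
single = single_step_update(p, reward=100.0)
multi = q_learning_update(p, reward=100.0, next_predictions=[50.0, 20.0, 0.0])
```

In the single-step case the next day's predictions play no role. In the multi-step case they enter the target through `gamma`, which is how a dependence between consecutive days' prices can be expressed.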

1.2 Motivation

The financial market is a highly dynamic system that depends on multiple factors such as bank interest rates, the underlying strength of companies and news that changes every day. The motivation behind this project is to develop learning systems that can learn in an online fashion and cope with a rapidly changing financial market environment. The main idea is to develop a set of robust agents that use technical analysis information and different reinforcement learning algorithms to perform automatic stock trading with the eXtended Classifier System (XCS). The agents should be robust in the sense that they can trade profitably (beating different benchmark strategies) on all FTSE 100 stocks. Finally, we wish to build a portfolio optimization system that interacts with this modified XCS, harnesses the strength of the best agents, and uses different strategies, such as “taking the best companies in the portfolio” and “mean reversal strategies”, to perform monthly or quarterly portfolio rebalancing.

1.3 Objective

The project has been subdivided into the following sub-tasks:

1) Optimize the problem formulation for XCS. The agents in the current system convert the information given by the technical indicators into the input (a binary string) for XCS. XCS then tries to learn the optimum decision to take when faced with a particular combination of bits. The better we define the problem for XCS (i.e. the better we combine the technical indicator information), the better it will learn the underlying price pattern, and the more robust and reliable we expect its decisions (buy, sell or hold) to be. The aim is to develop such robust agents.
2) Formulate automatic stock trading and financial forecasting as a multi-step problem instead of a single-step problem by implementing the Q-learning and Q(λ) learning algorithms.
3) Experiment with the reward mechanism of the reinforcement learning component of XCS to see whether delayed reward feedback is better than the currently employed immediate reward feedback.
4) Explore the possibility of either giving a negative reward to agents that take an incorrect action, or creating and rewarding a passive set for the incorrect decision taken by an agent.
5) Build a portfolio management system attached to the current XCS system. The portfolio management system will be responsible for:
   a) Portfolio construction using different trading strategies, such as "utilizing combinations of best agents instead of a single agent" and "mean reversal strategies".
   b) Portfolio optimization by proactive monthly, quarterly or yearly rebalancing.
6) Compare the performance of the trading agents against benchmark agents such as buy and hold and bank.
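To make objective 1 more concrete, the following sketch shows one way that indicator information could be turned into a binary string and matched against a classifier condition. The three conditions, the window lengths and all function names are hypothetical and are not the encoding used by the agents in this thesis. Only the ternary condition alphabet {0, 1, #}, where # means "don't care", is standard in classifier systems.

```python
def moving_average(prices, window):
    """Simple moving average of the last `window` prices."""
    if len(prices) < window:
        raise ValueError("not enough prices for the requested window")
    return sum(prices[-window:]) / window


def encode_state(prices):
    """Turn a price history into a binary string, one bit per condition.
    The conditions below are illustrative assumptions only."""
    conditions = [
        prices[-1] > moving_average(prices, 5),                  # price above 5-day MA
        moving_average(prices, 5) > moving_average(prices, 10),  # short MA above long MA
        prices[-1] > prices[-2],                                 # price rose since previous day
    ]
    return "".join("1" if c else "0" for c in conditions)


def matches(condition, state):
    """A condition over {0, 1, #} matches when every non-# symbol agrees."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state)
    )


# Hypothetical steadily rising price series.
prices = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
state = encode_state(prices)
```

A classifier whose condition is `1#1` would match this state regardless of the second bit, so a single rule can cover several market situations.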

1.4 Outline

The rest of this thesis is organized as follows. Chapter 2 gives background information on the subject and briefs the reader on the current state of the art in the financial forecasting domain. Chapter 3 describes how we implemented the different agents and learning algorithms proposed in this thesis. It also discusses the errors we found during implementation and how we changed our approach to tackle different problems. Chapter 4 presents the experimental results obtained after implementing the different learning algorithms, together with our critical analysis of the results. Chapter 5 presents the implementation and experimental results for the portfolio construction and optimization system, with a critical analysis of the same. Chapter 6 presents the conclusions and future work.

Chapter 2

2 Background & Related Work

2.1 Background

2.1.1 Market Efficiency

Maurice Kendall, in his random walk experiment (1953) [16], found that stock prices are completely random and have no relation to past performance. The unpredictable price movement seems to confirm the irrationality of the market. However, on deeper analysis it became apparent that random price movement indicates a well-functioning, or efficient, market and not an irrational one [17]. In its most basic form, the Efficient Market Hypothesis says that markets are informationally efficient, i.e. all the available information that could be used for making a profit is quickly absorbed into stock prices, and prices may increase or decrease only in response to new, previously unavailable and unpredictable information.

2.1.1.1 Version of Efficient Market Hypothesis (EMH)

There are three forms of the EMH, which differ in what “all available information” comprises.

1) The weak-form hypothesis states that stock prices already reflect all market trading information, such as past prices and trading volume. It means that if any form of past price or volume data could be used to generate reliable trading signals, then all investors would have used it by now, making the information fruitless [17]. It suggests that any form of technical analysis is useless.
2) The semi-strong-form hypothesis states that all publicly available information about the prospects of a firm, including fundamental data on the firm’s product line, quality of management, balance sheet composition, patents held, earnings forecasts and accounting practices, must also already be reflected in the stock price [17]. This hypothesis makes fundamental analysis useless as well.
3) The strong-form hypothesis states that stock prices also reflect information available only to company insiders. Such information also generally spreads very quickly, leaving very little room for making profits.

In summary, if there is any exploitable pattern or information, a mass of astute investors will attempt to profit from such predictability, which will ultimately move the stock price and cause the trading strategy to self-destruct.

2.1.2 Technical Analysis

Technical analysis is mainly the search for recurrent and predictable patterns in stock prices using past price or volume data. Like weather forecasting, technical analysis does not result in absolute predictions about the future, but it helps investors anticipate what is most likely to happen to prices over time. Dow Theory lies at the root of technical analysis. Two important points from Dow Theory are:

1) Prices discount everything. The current price of a stock fully reflects all the information. Technical analysis utilizes the information captured by the price to interpret what the market is saying, with the purpose of forming a view on the future [18].
2) Price movements are not totally random. Most technicians believe that periods of trending prices are interspersed between random fluctuations. The technician's aim is to identify the trend and then make use of it to trade or invest.

More detail about technical analysis and how we have used it in our XCS system is presented in Section 3.1.

2.1.3 The Portfolio:

A portfolio is a combination of different investment assets mixed and matched for the purpose of achieving an investor's goal(s). A portfolio can be viewed as a pie chart in which each slice represents an allocation of the investment [6].

2.1.3.1 Why do we need Portfolio?

The aim of portfolios is diversification. Different securities perform differently at any given point in time, so the idea is that with a mix of assets, the entire portfolio would not suffer the impact of a decline of any one security. It’s like following the simple practice of not putting all your eggs in one basket. Spreading investment across various types of assets and markets reduces the risk of catastrophic financial losses.

2.1.3.2 Portfolio Management:

Portfolio management is defined as the art and science of making decisions about investment mix and policy, matching investments to objectives, asset allocation for individuals and institutions, and balancing risk against performance [6]. It is an attempt to maximize return for a given appetite for risk. In the case of mutual and exchange-traded funds (ETFs), there are two forms of portfolio management: passive and active. Passive management simply tracks a market index and is commonly referred to as indexing or index investing. Active management usually involves a single manager, co-managers or a team of managers who attempt to beat the market return by actively managing a fund's portfolio through investment decisions based on research and decisions on individual holdings. Closed-end funds are generally actively managed.

2.2 Related Work

Fama’s Efficient Market Hypothesis [20] (Section 2.1.1) and the Martingale Model [21], [22] rule out excess market returns from any strategy, whether it is based on publicly available information, private information, return or dividend information, or technical analysis. There are proponents on both sides: some believe that prices can somehow be predicted, while others believe that prices are completely random. For example, Burton J. Malkiel in his random walk experiment [23] [24] showed that prices are completely random, whereas work published by MIT's Prof. Andrew Lo and Craig A. MacKinlay [25] points out that there is a long-term trend in prices. Lo and MacKinlay investigated weekly US stock data from 1962 to 1985 and found that the random walk hypothesis could be easily rejected. They also described several techniques for detecting predictabilities and evaluating their statistical and economic significance.

Work by Pin Chen and Mu Yen Chen [7], using an XCS-based decision support system with technical indicators, showed promise in predicting stock price fluctuations efficiently and generated good returns. Schulenburg [10], in her PhD research, developed an LCS model of artificial traders and tested it in the stock market using several groups of technical indicators. Stone [26], in his PhD work, applied ZCS to the foreign exchange market. Competitive returns were generated in most cases in these works, which suggests that LCS models can be successfully applied to modeling financial markets. More recently, Chen, Lin [8] used XCS for predicting future market price movements. Their model used moving averages of price and volume to construct the environmental message. Gershoff and Schulenburg [9] explored the collective behavior of XCS agents to achieve accuracy in prediction.

2.2.1 Machine Learning in Finance and Portfolio Management

Moody and Saffell [1] presented methods for optimizing portfolios using an adaptive algorithm, Recurrent Reinforcement Learning (RRL), for discovering investment policies. They demonstrated how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio) while accounting for the effects of transaction costs. The RRL algorithm learns profitable trading strategies in two ways:

● Maximizing the risk-adjusted return as measured by the Sharpe ratio. They used a modified form derived from the Sharpe ratio, called the differential Sharpe ratio, for online optimization of the trading system.
● Avoiding downside risk by maximizing the downside deviation ratio (DDR). The downside deviation (DD), defined as the square root of the average of the squared negative returns, is used as the measure of risk, and the DDR is used as the utility function.

The RRL trader performed far better than the Q trader; it enables a simpler problem representation, avoids Bellman’s curse of dimensionality and offers compelling advantages in efficiency.

Gao and Chan [2] presented a trading and portfolio management system called QSR, which uses Q-learning and a Sharpe ratio algorithm. They used absolute profit and relative risk-adjusted profit, respectively, as the performance functions to train the system. Experiments conducted on a trading example based on foreign exchange rates showed promising results.

Neuneier [3] formalized asset allocation as a Markovian Decision Problem and optimized it using dynamic programming and Q-learning algorithms. Neural networks were used for value function approximation. Experimental results on the German stock market showed this strategy to be better than a heuristic benchmark policy.

Schulenburg and Ross [29][30] developed an LCS model in which traders used technical indicators to predict the price of IBM stock. The system was able to beat all the benchmark agents.

Kyong and Sungky [4] used genetic algorithms to propose a portfolio optimization scheme for index fund management. Index funds are designed to copy the benchmark index with a relatively small number of stocks. The paper reported that an index fund could improve its performance greatly with the proposed GA portfolio scheme. Their scheme is based on three fundamental variables: portfolio beta, trading amount and market capitalization. They demonstrated the results for an index fund designed to track the Korean Stock Price Index.

Dempster and Jones [11] aimed to develop an adaptive trading system that trades profitably by emulating the behavior of technical traders, who adapt to the market by changing their trading strategies. Their trading system uses genetic programming to find the best combination of technical indicators to trade with. The genetic algorithm can choose the combination from an initial set of technical indicators, namely AMA, CCI, MACD, MA Crossover, Price Channel, RSI and Stochastic. They used a modified form of the Sterling ratio to gauge the performance of trading strategies:

S = Return / (1 + modified drawdown)

Alongside finding trading strategies via genetic programming, they also tried to optimize the resulting portfolio by re-optimizing it quarterly.
Their experimental results showed that a system which uses a combination of technical indicators can make a profit. The best strategy gave a return of 7% pa. However, on average they were not able to beat the buy and hold strategy. They also showed that trading in an adaptive manner, with quarterly optimization of the trading strategy, is ultimately loss making, which highlights the penalty for over-reaction to short term market behavior [11].

Schulenburg and Wong [12] experimented with portfolio allocation using an XCS system, combining input data from technical analysis, general market conditions and options market conditions. Their best performing agents performed substantially better than benchmark agents such as buy and hold, trend following, the bank agent and the random agent. However, an XCS agent’s performance varies with the initial random seed chosen, so a single best performing agent says little about the performance of the overall system.

Dempster, Payne, Romahi and Thompson [13] used technical indicators for intraday FX trading with reinforcement learning and genetic programming techniques. The technical indicators they used were price channel breakout, adaptive moving average, relative strength index, stochastic, moving average convergence divergence, moving average crossover, momentum oscillator and commodity channel index. The performance of the system was judged on the basis of the Sharpe ratio and the Sterling ratio. Their experiments generated significant in-sample and out-of-sample profits. However, none of the methods produced significant profits at realistic transaction costs.

2.3 XCS Introduction from Stock Trading Perspective
XCS stands for eXtended Classifier System. It is an accuracy based classifier system, which differs from other classifier systems in that classifier fitness is derived from the estimated accuracy of reward predictions instead of from the reward predictions themselves. It is an online learning machine which improves its behavior over time through interaction with its environment. XCS learns through reinforcement, and the aim is not only to get more immediate reward but to maximize the value (the sum of all rewards in the long run). The system is given the least possible amount of prior information, so that most of its knowledge results from adaptation to the environment. We do not tell it how to do things but let it learn from the inputs it is fed, the actions it takes and the reinforcement it gets. If it does well, we give it a positive reward; otherwise we penalize it in some form.


2.3.1 XCS Input and Output
XCS Input: The input to XCS is a binary vector, e.g. 10010110, where each bit can be thought of as indicating whether the continuous valued output of some sensor has crossed a threshold. In our XCS, every day the system receives the previous day’s data about the individual stock: price (open, high, low and close) and volume. The meta agents present in the system apply technical analysis to this information to get buy (1) and sell (0) signals. A very simple example is the moving average of price and volume. Suppose a particular agent calculates the moving average of price and volume for the past 10 days. If the closing price of the stock is greater than the 10 day moving average of price, it suggests that the price may go up, and that is a buy (1) sign. Similarly, if the volume of the stock is greater than the 10 day moving average of volume, that is also a buy (1) sign. The input string fed into XCS is then 11. In practice, the system uses more advanced technical analysis to form the input binary string, which may range from 6 to 9 bits in length. Each bit can be either 0 (sell) or 1 (buy). More detail about how the problem is defined to the XCS system using technical analysis can be found in section 3.1.

XCS Output: The XCS output is a discrete action or decision. In our case it is either 0 (sell) or 1 (buy). The final aim of the learning cycle is for XCS to learn which action it should take for a particular combination of input bits. Note that XCS learns by reinforcement rather than by supervision: at no point do we tell it what is right and what is wrong. It has to find this out through trial and error and the reward mechanism. Depending on the input string, it is sometimes easy to predict what the correct action is. For example, if the input string is 101111, i.e. 5 out of 6 bits suggest buying the stock, then the correct action should be to buy the stock.

At other times the correct action might not be so evident. For example, if the input string is 111000, buy and sell signals occur in equal proportion, and we expect the system to learn what weight is appropriate for each individual signal. Even in real life, on a given day an actual trader might face a situation in which the combination of technical indicators gives no clear buy or sell sign. He or she then has to judge from experience which technical indicator should be given more weight and decide accordingly.
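The two-bit moving average example from this section can be sketched in Java as follows. This is only an illustration under stated assumptions, not the thesis code: the class and method names are hypothetical, and the sketch assumes that the moving average window includes the current day's value.

```java
final class InputStringSketch {
    /** Simple moving average of the `period` values ending at index `end` (inclusive). */
    static double sma(double[] series, int end, int period) {
        double sum = 0.0;
        for (int i = end - period + 1; i <= end; i++) {
            sum += series[i];
        }
        return sum / period;
    }

    /**
     * Builds a two-bit input string for one day:
     * bit 1 is 1 (buy) if the close is above its moving average, else 0 (sell);
     * bit 2 is 1 (buy) if the volume is above its moving average, else 0 (sell).
     * Assumes day >= period - 1 so that the window is fully available.
     */
    static String inputBits(double[] close, double[] volume, int day, int period) {
        StringBuilder binary = new StringBuilder();
        binary.append(close[day] > sma(close, day, period) ? '1' : '0');
        binary.append(volume[day] > sma(volume, day, period) ? '1' : '0');
        return binary.toString();
    }
}
```

With steadily rising price and volume series, `inputBits` yields the string 11 described in the text; a falling volume series turns the second bit into 0.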

2.3.2 XCS Framework [15]
XCS contains a population of classifiers. Each classifier in the population is characterized by 5 main components:
1) Condition part C, which specifies the problem instances to which the classifier is applicable.
2) Action part A, which specifies the action the classifier takes when condition C is fulfilled.
3) Reward prediction P, which estimates the payoff or reward the classifier can expect on executing the action.
4) Reward prediction error ε, which estimates the mean absolute deviation of the prediction P from the actual reward R.
5) Fitness F, which estimates the scaled, relative accuracy of the classifier with respect to the other overlapping classifiers in the action sets in which it occurs.

In short, a classifier is a rule of the form
Condition : Action => Prediction
where the prediction is similar to the payoff in reinforcement learning. For example:
01#1## : 1 => 693.2 {0: sell, 1: buy, #: don’t care}
This classifier says: if the first bit is 0, the second is 1 and the fourth is 1, and I don’t care about the others, then after taking action 1 (buy) the payoff will be 693.2. The payoffs are updated as the system learns. The condition of this classifier matches the following 8 input strings, and its payoff might be the sum of all their payoffs:
010100, 010101, 010110, ... (there are 8 such cases)

XCS differs from other action based systems such as neural networks in that, in a neural network, the payoff information for any input is distributed over the whole network. Each classifier acts only on a subset of problems. It checks whether the given condition is one on which it can act. If so, it acts on it and predicts a certain payoff.
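To make the matching rule concrete, the following Java sketch checks a ternary condition against a binary input and enumerates every input that a condition matches. The class and method names are hypothetical and are used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ConditionMatchSketch {
    /** A condition matches when every non-# position equals the input bit. */
    static boolean matches(String condition, String input) {
        if (condition.length() != input.length()) {
            return false;
        }
        for (int i = 0; i < condition.length(); i++) {
            char c = condition.charAt(i);
            if (c != '#' && c != input.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    /** Enumerates all binary strings of the condition's length that it matches. */
    static List<String> matchingInputs(String condition) {
        List<String> result = new ArrayList<>();
        int n = condition.length();
        for (int v = 0; v < (1 << n); v++) {
            StringBuilder bits = new StringBuilder(Integer.toBinaryString(v));
            while (bits.length() < n) {
                bits.insert(0, '0'); // left-pad to the full input length
            }
            if (matches(condition, bits.toString())) {
                result.add(bits.toString());
            }
        }
        return result;
    }
}
```

For the condition 01#1## there are three don't care positions, so `matchingInputs` returns 2^3 = 8 strings, beginning with 010100.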

Figure 1: XCS Framework [15]. Legend: [P]: classifier population; [M]: match set; p: predicted value; ε: error in prediction; F: fitness of prediction (F ∝ 1/ε).

2.3.3 XCS Learning Cycle
At the start, the classifier population [P] is generally empty. The agents use the stock price and volume data to form the input string (for details please see section 3.1). The input is fed into the classifier population [P], which detects whether there is any match. The 4 classifiers marked with -- in fig 1 match the input 0011. They are put in a match set [M]. If no classifier matches the given input, XCS creates a classifier by the covering mechanism: a rule is created at random, given a random action and assigned a low prediction. A new rule has a certain number of don’t care signs (#) in random positions. The # signs give the classifier an initial generality, so that it can be tested on many input problem instances.


Covering is necessary only initially; the vast majority of new rules are derived from existing rules. For example, suppose the input string is 11000101 and there is no classifier which matches this input. Then the rule created is
1##0010# : 01 => 10
Continuing the process, after creation of the match set, XCS estimates the payoff for each possible action by forming a prediction array P(A). In fig 1, two classifiers in the match set advocate action 01 and two advocate action 11. We take the fitness weighted average of the predictions for each action:

$P(A) = \dfrac{\sum \text{prediction} \times \text{fitness}}{\sum \text{fitness}}$

E.g. $P(\text{action}=01) = \dfrac{43 \times 99 + 27 \times 3}{99 + 3} \approx 42.53$

Similarly, $P(\text{action}=11) = 16.6$

Hence, P(A) is the fitness weighted average of the reward prediction estimates of all classifiers in [M] that advocate action A. The system follows an ε greedy policy, i.e. it takes the best action most of the time, but with a small probability ε (the exploration probability) it chooses a random action from those in the prediction array, which may be suboptimal. All classifiers in the match set [M] that specify the chosen action A form the action set [A]. In fig 1, we have chosen the action with the maximum prediction, i.e. 01, and the 2 classifiers having this action are put in the action set. The system executes the prescribed action. The next day, the correctness of the action taken is judged by the stock price movement. For every correct action a reward of 1000 is given. For a wrong action a reward of 0 is given. This differs from the usual reinforcement learning methodology in that no negative reward is given for an incorrect action. For example, suppose the system predicts a rise in the price of the stock and buys the share. If the price goes up the next day, a reward of 1000 is given. This reward is used to update the parameters of the classifiers in the action set [A].
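The prediction array and the ε greedy choice can be sketched in Java as below. This is a minimal illustration and not the thesis implementation: the `Rule` record-like class, the method names and the use of string action labels are assumptions made for the example, and the fitness sum of each action is assumed to be positive.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

final class PredictionArraySketch {
    /** Hypothetical minimal classifier view: action label, prediction p, fitness F. */
    static final class Rule {
        final String action;
        final double prediction;
        final double fitness;

        Rule(String action, double prediction, double fitness) {
            this.action = action;
            this.prediction = prediction;
            this.fitness = fitness;
        }
    }

    /** P(A) = sum(prediction * fitness) / sum(fitness) over the rules in [M] advocating A. */
    static Map<String, Double> predictionArray(Rule[] matchSet) {
        Map<String, Double> weighted = new TreeMap<>();
        Map<String, Double> fitnessSum = new TreeMap<>();
        for (Rule r : matchSet) {
            weighted.merge(r.action, r.prediction * r.fitness, Double::sum);
            fitnessSum.merge(r.action, r.fitness, Double::sum);
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Double> e : weighted.entrySet()) {
            result.put(e.getKey(), e.getValue() / fitnessSum.get(e.getKey()));
        }
        return result;
    }

    /** With probability epsilon pick a random action, otherwise the highest P(A). */
    static String chooseAction(Map<String, Double> pa, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) {
            String[] actions = pa.keySet().toArray(new String[0]);
            return actions[rng.nextInt(actions.length)];
        }
        String best = null;
        for (Map.Entry<String, Double> e : pa.entrySet()) {
            if (best == null || e.getValue() > pa.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Feeding in two rules for action 01 with (prediction, fitness) pairs (43, 99) and (27, 3) reproduces the weighted average computed in the worked example; with epsilon set to 0 the sketch always exploits.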


2.3.3.1 Updating XCS Parameters
Initially, on creation of a classifier, it is given a very low prediction value. After the reward R for the executed action is received, the parameters of each classifier j in the action set are updated as follows:

Prediction: $P_j \leftarrow P_j + \alpha (R - P_j)$

α is the learning rate (~0.2). If R > Pj then Pj is increased, i.e. the prediction goes up. As can be seen, if this particular classifier is updated many times, Pj will tend toward R, i.e. the predicted value will tend toward the actual return from the process. Similarly, the other parameters are updated as:

Error: $E_j \leftarrow E_j + \alpha (|R - P_j| - E_j)$

Accuracy: $K_j = E_j^{-m}$ if $E_j > E_0$, else $K_j = E_0^{-n}$

Relative accuracy: $K_j' = K_j / \sum_{[A]} K_j$

Relative accuracy is the accuracy of a classifier relative to the other classifiers in the action set [A].

Fitness: $F_j \leftarrow F_j + \alpha (K_j' - F_j)$

The fitness of a classifier is an estimate of its accuracy relative to the accuracies of the other classifiers in the action sets in which it occurs.
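A hedged Java sketch of one update step over an action set is given below. It is not the thesis code. The names are hypothetical, the parameters are updated in the order in which they are listed above (prediction first, then error), and, as a simplifying assumption, the same exponent n is used in both branches of the accuracy function; the error threshold E0 must be positive.

```java
final class XcsUpdateSketch {
    /** Hypothetical mutable classifier state: prediction P, error E, fitness F. */
    static final class Rule {
        double prediction;
        double error;
        double fitness;

        Rule(double prediction, double error, double fitness) {
            this.prediction = prediction;
            this.error = error;
            this.fitness = fitness;
        }
    }

    /** Accuracy: E^-n when E > E0, otherwise E0^-n (single exponent n is an assumption). */
    static double accuracy(double error, double e0, double n) {
        return Math.pow(Math.max(error, e0), -n);
    }

    /** Applies the prediction, error and fitness updates to every rule in [A]. */
    static void update(Rule[] actionSet, double reward, double alpha, double e0, double n) {
        double[] kappa = new double[actionSet.length];
        double kappaSum = 0.0;
        for (int j = 0; j < actionSet.length; j++) {
            Rule r = actionSet[j];
            r.prediction += alpha * (reward - r.prediction);
            r.error += alpha * (Math.abs(reward - r.prediction) - r.error);
            kappa[j] = accuracy(r.error, e0, n);
            kappaSum += kappa[j];
        }
        for (int j = 0; j < actionSet.length; j++) {
            Rule r = actionSet[j];
            double relativeAccuracy = kappa[j] / kappaSum; // K_j' over [A]
            r.fitness += alpha * (relativeAccuracy - r.fitness);
        }
    }
}
```

Repeated calls with the same reward move each prediction toward that reward, which is the behavior described for Pj above.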

2.3.3.2 Genetic Algorithm role and rule evolution [15]
XCS applies a genetic algorithm for rule evolution. If the average time since the GA was last applied exceeds a certain threshold, genetic reproduction is invoked in the current action set [A]. The GA selects 2 parental classifiers based on their relative fitness in the action set [A]. Two offspring are generated by reproducing the parents and applying crossover and mutation. Parents and offspring compete in the same population [P]. Niche mutation is applied, which means that the mutated classifiers still match the current problem instance, i.e. the input binary string on which they were previously able to act. If the condition of an offspring is subsumed by some other classifier, then it is not inserted into the population and only the numerosity of the subsumer classifier is increased by 1. The classifier population size is fixed, and deletion is done if the population is over-full. Excess classifiers are deleted from [P] with probability proportional to the action set size estimate of the action sets in which the classifiers occur. If classifiers are more experienced but have lower fitness, their probability of deletion is higher [15]. For more information, readers are encouraged to read chapter 4 of Martin Butz’s book. Classifiers which are more general will more often be part of an action set, and thus undergo more reproduction events and propagate faster. The GA process is therefore expected to evolve an accurate, maximally general solution as the final outcome. For example, the parent classifiers below undergo crossover at the point marked | to give the offspring shown after them:

Parents: 10##|11:1 and #000|1#:2
Offspring: 10##1#:1 -----(1) and #00011:2 -----(2)

Note that the results of the crossing are a classifier (1) which is more general than both parents and a classifier (2) which is more specific than both. A more specific classifier can never be less accurate. This is not always the case, but on balance the process tends to search along the generality-specificity dimension, using pieces of existing higher accuracy classifiers. It is clear that the population will tend toward classifiers with greater accuracy [15].

2.3.4 Deviation from other LCS based Systems
• XCS reproduces classifiers by selecting from the current action set instead of from the whole population.
• A relative accuracy based fitness measures the performance of a classifier.
• Reproduction favors classifiers whose conditions match more often and which therefore appear more often in the action set.
• Deletion occurs from the whole population.


2.3.5 Mind of XCS System

Figure 2: Voting Strategy [19]

Our modeled XCS system in its basic form consists of 7 agents, each of which uses a different set of technical analysis information to create the input binary string. Each agent has 25 copies which simultaneously do the trading and prediction. One voting agent combines the predictions of these 25 agents and presents the result to the meta agent. The system learns in an online fashion. There are two separate phases, a learning phase and a trading phase. During the learning phase all the agents simply explore and update the parameters of their classifiers, and no actual money is invested. During the trading phase, the system randomly picks some of the 25 agents to explore (take a random, possibly sub-optimal action) while the other agents exploit (take the best possible action, which is supposed to give the maximum reward). While combining the decisions, the voting agent considers the current wealth of the 25 agents and discards the decisions of those agents which are loss making. The action of any agent which is exploring (taking a random action) is also not taken into account. The meta agent finally decides whether to buy or sell using the composite predictive power of the 25 XCS agents. For the portfolio management system, 14 different types of XCS agents were used. Because exploration and exploitation continue even during the trading phase, the learning of the system never stops. Owing to this continuous learning, we expect the system to capture changes in the dynamics of the market as well.
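The filtering done by the voting agent can be sketched in Java as follows. The text above specifies only which decisions are discarded, so several points here are assumptions rather than the thesis method: an agent is treated as loss making when its wealth is below its initial wealth, the remaining decisions are combined by simple majority, and ties or an empty vote fall back to a caller supplied action. All names are hypothetical.

```java
final class VotingSketch {
    /** Hypothetical view of one agent copy's decision on a given day. */
    static final class Vote {
        final int action;        // 1 = buy, 0 = sell
        final double wealth;     // current wealth of the agent
        final boolean exploring; // true if the action was chosen at random

        Vote(int action, double wealth, boolean exploring) {
            this.action = action;
            this.wealth = wealth;
            this.exploring = exploring;
        }
    }

    /**
     * Combines the votes of the agent copies into one decision.
     * Exploring agents and loss-making agents (wealth below initialWealth,
     * an assumed definition) are ignored; the rest vote by simple majority.
     */
    static int combine(Vote[] votes, double initialWealth, int fallback) {
        int buy = 0;
        int sell = 0;
        for (Vote v : votes) {
            if (v.exploring || v.wealth < initialWealth) {
                continue; // decision discarded
            }
            if (v.action == 1) {
                buy++;
            } else {
                sell++;
            }
        }
        if (buy == sell) {
            return fallback; // tie, or no eligible voters
        }
        return buy > sell ? 1 : 0;
    }
}
```

A wealth weighted vote would be an equally plausible reading of "composite predictive power"; the majority rule is used here only because it is the simplest option.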


Chapter 3
3 Implementation
3.1 Technical Analysis Usage in XCS
Technical analysis is, overall, more of an art than a science. There is no single kind of technical indicator which works for all the stocks in the market. In our XCS system, technical analysis information is used to make the input binary string. For our purpose we used and coded 14 individual technical indicators. There are open source libraries for technical indicators. However, we coded our own set of technical indicators so that heuristics can be applied to individual indicators to generate more robust buy or sell signals. For example, one such technical indicator, the Relative Strength Index (RSI), ranges from 0 to 100 and indicates overbought and oversold conditions for RSI greater than 70 and RSI less than 30 respectively. We used a heuristic to generate buy or sell signals in the range between 30 and 70. More details about this can be found in Section 3.1.1. This section is further divided into the following parts:
3.1.1 Description of individual technical indicators and how they are used to generate buy or sell signals.
3.1.2 Description of agents which combine different technical indicator information.
3. Advantages and “scope of improvement” of the current approach.


3.1.1 Description of individual technical Indicators
A technical indicator is defined as a series of data points derived by applying a formula to the price data of a security, which can be a combination of the open, high, low or close over a period of time [18]. These data points can be used to generate buy or sell signals, as we shall shortly see. Technical indicators can provide a unique viewpoint on the strength and direction of the underlying price action. The technical indicators employed in our XCS system are:

Moving average: A lagging indicator which simply calculates the average price of a security over a specified number of periods. A moving average filters out random noise and offers a smooth perspective of the price action. Moving averages work well when a stock develops a strong trend.
Usage of moving average: In our XCS system the moving average is used in 2 ways to generate buy and sell signals.
The location of the current price relative to the moving average (10 and 20 day moving averages are used for this purpose):
    if (MA10[index] - close >= 0) { binary += "0"; } else { binary += "1"; }
    if (MA20[index] - close >= 0) { binary += "0"; } else { binary += "1"; }
The location of the shorter moving average relative to the longer moving average:
    if (MA20[index] > MA10[index]) { binary += "0"; } else { binary += "1"; }
Note that 1 is a buy signal and 0 is a sell signal. binary is the binary string, built by appending bits, which is fed as input to the XCS system. We have deliberately used shorter moving averages (10 and 20 days) to reduce the lag in the signal and to concentrate on short term rather than long term trends.

Parabolic SAR: SAR stands for stop and reverse. It was developed by J. Welles Wilder Jr to find trends in market prices. It plots a dotted line either above or below the security price. A dotted line below the price establishes the trailing stop for a long position (and generates buy signs), and a line above establishes the trailing stop for a short position (our system does not short and only generates a sell sign).
Usage in XCS: A SAR value greater than the current day’s high gives a sell sign.
    if (sar[index] > high) { binary += "0"; } else { binary += "1"; }
* Details of the SAR calculation are given in the appendix.

Average Directional Index (ADX): It evaluates the strength of the current trend, be it up or down. ADX is derived from the positive and negative directional indicators (+DI, -DI).
Usage in XCS: The positive and negative directional indicators (+DI, -DI) are used to generate buy and sell signs.
    if (posDI[index] > negDI[index]) { binary += "1"; } else { binary += "0"; }

Commodity Channel Index (CCI): CCI is a typical price based momentum indicator which was developed by Donald Lambert to identify cyclical turns in commodities.
Usage in XCS: CCI is a band oscillator. Movement above +100 indicates an overbought stock and a sell signal is given. Similarly, movement below -100 indicates an oversold stock and a buy signal is given. Movement between -100 and +100 does not give a clear buy or sell sign. In that case we use the heuristic that if the current CCI is greater than the past 5 day moving average of CCI, a buy signal is given.
    if (CCI[index] >= 100) { // overbought
        binary += "0";
    } else if (CCI[index] <= -100) { // oversold
        binary += "1";
    } else if (CCI[index] >= CCIMA[index]) { // + divergence
        binary += "1";
    } else {
        binary += "0";
    }

Chaikin Money Flow (CMF): CMF is an oscillator based on the accumulation distribution line.
Usage in XCS: CMF is bullish when it is positive and bearish when it is negative.
    if (CMF[index] < 0) { binary += "0"; } else { binary += "1"; }
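The band oscillator heuristic described for CCI (overbought above the upper band, oversold below the lower band, otherwise compare with a short moving average of the indicator itself) recurs for several indicators in this section. A generic Java sketch is shown below. The class and method names are hypothetical, and the 5 day average is assumed here to include the current value, which the text does not specify.

```java
final class BandOscillatorSketch {
    /** Trailing average of the last `window` values ending at `index` (inclusive). */
    static double trailingAverage(double[] series, int index, int window) {
        double sum = 0.0;
        for (int i = index - window + 1; i <= index; i++) {
            sum += series[i];
        }
        return sum / window;
    }

    /**
     * Returns "0" (sell) when value is at or above the upper band,
     * "1" (buy) when it is at or below the lower band, and otherwise
     * "1" if value is at or above its moving average, else "0".
     */
    static String signalBit(double value, double movingAverage, double upper, double lower) {
        if (value >= upper) {
            return "0"; // overbought
        }
        if (value <= lower) {
            return "1"; // oversold
        }
        return value >= movingAverage ? "1" : "0";
    }
}
```

For CCI the bands would be +100 and -100, so a call such as `signalBit(cci[i], trailingAverage(cci, i, 5), 100, -100)` produces one bit of the input string.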


MACD: Moving average convergence divergence. It is a centered oscillator that is unique in having both a leading and a lagging component. It is the difference between the 12 day EMA and the 26 day EMA of a security.
Usage in XCS: A positive MACD indicates a buy sign and a negative MACD a sell sign.
    if (MACD[index] > 0) { binary += "1"; } else { binary += "0"; }
A nine day exponential moving average (EMA) of MACD acts as a trigger line to give buy and sell signs.
    if (MACD[index] > MACDSignal[index]) { binary += "1"; } else { binary += "0"; }

Money Flow Index (MFI): MFI is a momentum indicator similar to RSI. It is a good measure of money flowing into and out of the security.
Usage in XCS: MFI above 80 indicates an overbought stock and gives a sell sign, and MFI below 20 indicates an oversold stock and gives a buy sign. Between 20 and 80 there is no clear buy or sell sign, so we have used positive divergence to create the sign: if the current MFI is greater than the 5 day average of MFI, a buy sign is given.
    if (MFI[index] >= 80) {
        binary += "0";
    } else if (MFI[index] <= 20) {
        binary += "1";
    } else if (MFI[index] > MFIMA[index]) {
        binary += "1";
    } else {
        binary += "0";
    }

On Balance Volume (OBV): A volume based oscillator.
Usage in XCS: A rising, bullish OBV line indicates that smart money is flowing into the stock and shows a price uptrend. We use it as follows: if the current OBV is greater than or equal to the past 5 day average of OBV, a buy sign is given.
    if (OBV[index] >= OBVMA[index]) { binary += "1"; } else { binary += "0"; }

Percentage Price Oscillator (PPO): This oscillator is formed by taking the difference between the shorter and the longer moving average of price, in percentage form.
Usage in XCS: Used in 2 ways:
    if (PPO[index] > 0) { binary += "1"; } else { binary += "0"; }
    if (PPO[index] > PPOSignal[index]) { binary += "1"; } else { binary += "0"; }
PPOSignal is found by taking the 5 day moving average of PPO.

Percentage Volume Oscillator (PVO): Similar to PPO, except that volume is used for the calculation instead of price.
Usage in XCS:
    if (PVO[index] > 0) { binary += "1"; } else { binary += "0"; }
    if (PVO[index] > PVOSignal[index]) { binary += "1"; } else { binary += "0"; }

Relative Strength Index (RSI): RSI is a momentum oscillator which compares the magnitude of a stock’s recent gains to the magnitude of its recent losses and turns that information into a number that ranges from 0 to 100 [18].
Usage in XCS: RSI above 70 and below 30 indicates overbought and oversold conditions and gives sell and buy signals respectively. Between 30 and 70 we use the heuristic that if the current day’s RSI is greater than the past 5 day average of RSI, a buy sign is given.
    if(RSI[index] >= 80){ binary+="0"; }else if(RSI[index] = RSIMA[index]){ binary+="1"; }else{ binary+="0"; }


Stochastic Oscillator: A momentum indicator.
Usage in XCS: Readings below 20 are considered oversold and readings above 80 are considered overbought. We have used fast percent D in XCS. Between 20 and 80, a heuristic similar to that for RSI is used.
    if (fastPercentD[index] >= 80) {
        binary += "0";
    } else if (fastPercentD[index] <= 20) {
        binary += "1";
    } else if (fastPercentD[index] >= fastPercentDMA[index]) {
        binary += "1";
    } else {
        binary += "0";
    }
The crossover of fast percent K with respect to fast percent D is also used to generate buy and sell signs.
    if (fastPercentK[index] > fastPercentD[index]) { binary += "1"; } else { binary += "0"; }
* For details of the Stochastic Oscillator calculation, please see the appendix.

StochRSI: A momentum oscillator in which the Stochastic Oscillator is combined with RSI.
Usage in XCS:
    if(stochRSI[index] >= 80){ binary+="0"; }else if(stochRSI[index] = stochRSIMA[index]){ binary+="1"; }else{ binary+="0";}

ROC: Rate of change is a centered oscillator. It gives the percentage price change over the last 20 days. A buy signal is generated if ROC is zero or greater.
Usage in XCS:
    if (ROC[index] < 0) { binary += "0"; } else { binary += "1"; }


Williams % R: It’s a momentum indicator that works much like the Stochastic Oscillator. Usage in XCS: if(willPercentR[index] >= -20){//overbought binary+="0"; }else{ binary+="1"; } if(( -80
