Automated Trading with Machine Learning on Big Data

Dymitr Ruta
Etisalat BT Innovation Centre (EBTIC), Khalifa University, Abu Dhabi, UAE
Email: [email protected]

Abstract—Financial markets are now extremely efficient; nevertheless, many investment funds still generate alpha and systematically beat market return benchmarks. The emergence of big data has given professional traders new territory, leverage and evidence, and renewed opportunities for profitable exploitation by Machine Learning (ML) models, which increasingly take over the trading floor with 24/7 automated trading in response to continuously fed data streams. Rapidly increasing data sizes and the strictly real-time requirements of trading models render a large subset of ML methods intractable, overcomplex and impossible to apply in practice. In this work we demonstrate how to efficiently approach the problem of automated trading with a large portfolio strategy that continuously consumes streams of data across multiple diverse markets. We demonstrate a simple, scalable trading model that learns to generate profit from multiple intermarket price predictions and the markets' correlation structure. We also introduce a stochastic trade diffusion technique to maximise trading turnover while reducing the strategy's exposure to market impact, and construct an efficient risk-mitigating portfolio that backtests with a strong positive return.

Keywords—machine learning, logistic regression, classification, big data, algorithmic trading, mean-variance optimisation, time series prediction, trade density, market impact, backtesting

I. INTRODUCTION

Trading on financial markets has become a very technical discipline that increasingly relies on constant and incredibly fast analysis of ever larger amounts of data [1]. Designing a successful trading strategy involves beating the market's future price expectations through either more knowledgeable or faster processing of the available information. There is a large body of evidence indicating that price movements are not predictable since markets are efficient [2]. The logic behind this claim is that any new information or presumed advantage of any market participant would be immediately absorbed into a new market equilibrium that would instantly eliminate the trading opportunity [1], [2]. We claim, however, that the assumption that all market participants are equally informed is false. The emergence of big data makes this even more apparent, as it creates a limitless number of creative, real-time methods of data exploitation, opening many new windows of trading opportunity in the financial markets arena. As a result, systematic and diversified trading aided by state-of-the-art machine learning models emerges as a very powerful technology that enables harvesting, filtering and deep exploitation of massive amounts of data to deliver a competitive advantage and devise profitable trading strategies. Spectacular successes of some specialised hedge funds seem to support this claim.

What we try to promote in this work is the exogenous exploitation of data among the large number of traded assets as well as other supportive time series descriptors. Traditionally, algorithmic trading is guided by the prediction of an asset's price based on its current trend, its historical time evolution and other characteristics drawn from the same market [1]-[5]. Other methodologies focus on finding arbitrage opportunities, deriving trade opportunities from micro- or macro-economic indicators, or even on news events as the drivers of significant price movements [1], [4]. Our logic in this work assumes that all the relevant knowledge affecting price movements is directly embedded in the asset's price series, which is therefore considered the single best composite source of evidence one might want to exploit to devise a successful trading strategy. However, we go beyond the market's own price history to try to determine its future by attempting to use all other markets' histories and jointly formulate their explanatory impact on the future price action of the target market.

Machine Learning has been successful in predicting the future price direction [6], but less so in predicting future values, except over very short prediction horizons [3], [5]. There has been limited effort to predict optimal trade actions, other than in very naive setups that do not reflect a real trading environment [7]. The reality of financial trading research is that the most successful trading models are harnessed for trading and are therefore concealed by fund managers from the public and the research community. The latter, although equipped with state-of-the-art predictive modelling skills, lacks the practical experience and understanding of the many complexities of trading on financial markets. We intend to bridge this gap a little and devise a realistic, highly scalable trading strategy with a machine learning predictive engine autonomously analysing large data sets across very many markets and generating multiple trading signals that could have been traded in the real financial markets. We also demonstrate how such a monstrous, seemingly infeasible task can be successfully accomplished thanks to the support of free big data technologies, or at least a parallel high-throughput computing environment [8], [9].

The remainder of this paper is organised as follows. Section II describes the data and its exploitation for the purpose of automated trading. Section III discusses in detail how data features are extracted, selected and used to learn optimal trade actions as a supervised classification process. Finally, the simple yet realistic trade simulator and its components are discussed in Section IV, followed by the experiments in Section V describing the backtesting process, and the conclusions drawn in the closing section.

II. DATA EXPLOITATION

We assume a large number of M markets are traded at different global exchanges during fixed trading hours, usually expressed in the local time. During the trading hours each market emits trading data in the form of quotes for bid and ask volumes at different price levels (the so-called orderbook) that surround the current trade price, i.e. the price at which the most recent transaction took place. Whenever a new transaction takes place or any new order or cancellation is submitted, the state of the orderbook changes, which for highly liquid markets might happen even thousands of times per second. At that frequency, just recording the time, price and volume could result in gigabytes of data generated by a single market per day. Working with a large number of such markets over longer periods of time may prove difficult or even impossible in the real-time context.

Luckily, using this data to devise a trading strategy and trading it live imposes several constraints that simplify the data landscape. The expected trading frequency or trading horizon of our strategy directly informs what price and time resolution (quantisation level) we might operate on and what might be the reasonable depth of historical data that would still have some impact upon the current price movements: the so-called lookback. If the objective is to devise a high-frequency trading strategy with holding periods in the order of seconds or minutes, then certainly aggregating prices and volumes at the second or subsecond level might be suitable, but the useful history would be very short, possibly limited to just the current day or even the last hour, if not minutes or seconds. On the other hand, if the strategy is expected to trade infrequently, i.e. holding positions for multiple days, then it might be more suitable to quantise the data at the minute, hour or daily level and set the active lookbacks to weeks or months.

In this work we show a prototype of a simple trend breakout detection strategy, hence a reasonable data resolution has been achieved by quantisation at the minute level, combined with lookbacks varying from days to months. To simplify the problem even more, and in line with our price-centric logic described in the previous section, only traded price time series are used, i.e. the volumes and the orderbook moves are ignored. We consider M tradable markets' price series, accompanied by other non-tradable price series that could be valuable for predictions but due to their nature cannot be traded. Summarising, the data to be used for the trading strategy presented here covers 1-minute quantised data from about 100 different markets over a period of 14 years from 2000 onwards, which reduced the original data scale from terabytes to just 1GB of data.

III. EXOGENOUS TRADE SIGNAL GENERATION PROCESS

In this work we seek some predictability of the individual market price movements using evidence coming from outside the market, i.e. for learning and predicting the target market's price moves we intend to construct features utilising all the other markets' price series. Let us consider a pool of N time series x_i(t) that contains a subset of M tradable markets, M ≤ N. All of the tradable markets' time series x_i(t), i ∈ {1, .., M} are 1-minute quantised price series, while the remaining time series could be any other relevant feature, like interest rates, volatility indices etc., that shares a price-equivalent meaning. Since trading any tradable market follows the same logic, it is sufficient to describe the strategy for trading a single market and replicate the procedure for the other markets. Let us, thus, call the i-th market that we want to trade the target and all the other time series the predictors.

To make effective use of the predictors, all of them have to be time-aligned with the target, i.e. for every point of the target's time series one has to map all the predictors' latest values available at that time. To solve this problem, all the time series were first converted to the same UTC (Coordinated Universal Time) time and then a generic solution was found to align all N time series into a single matrix that we call the evidence. To construct the evidence matrix, first a set union of all 1-minute time stamps from all the tradable markets has been established: T = \cup_{i=1}^{M} t_i. Then the T-indexed evidence matrix X(T) has been generated by assigning time series values to their time-corresponding places, X_i(T = t_i) = x_i(t_i), and filling the missing data with the last available data point, i.e. using the copy-down method. Given the evidence matrix X = [X_i] and its corresponding time vector T, the objective is now to extract a set of historical features from the predictors' columns X_{t-l:t,j}, j ≠ i, and use them to predict the target's future values X_{t+1:t+h,i}, where h is the prediction horizon or lookahead.

A. Features

We define a simple and generic family of exogenous features defined upon the evidence matrix X. Namely, a j-linear predictor deviation from the i-th target over lookback l at time t, denoted shortly by f_{j→i}(t, l), is defined as the difference between the target value X_{t,i} and its predictor's perceived expectation E(X_{t,i}) = aX_{t,j} + b drawn from the simple linear regression fitted over the lookback l, i.e.

f_{j\to i}(t, l) = X_{t,i} - (a X_{t,j} + b), \qquad (a, b) = \arg\min_{a,b} [X_{t-l:t,i} - (a X_{t-l:t,j} + b)]^2   (1)
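As a concrete illustration of the alignment just described (and before the regression parameters are derived below), here is a minimal pandas-based sketch; the function name and the dict-of-series input are our own assumptions rather than the paper's implementation, and time-zone-aware indices are assumed.

```python
import pandas as pd

def build_evidence(series: dict[str, pd.Series]) -> pd.DataFrame:
    """Align 1-minute price series on the union of their time stamps (the evidence matrix)."""
    utc = {name: s.tz_convert("UTC") for name, s in series.items()}  # common UTC time base
    evidence = pd.concat(utc, axis=1)   # index = union of all time stamps, one column per series
    return evidence.ffill()             # copy-down: fill gaps with the last available value
```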

Parameters a and b at any time t and for any lookback can be easily extracted from the evidence matrix by:

a_{j\to i}(t, l) = \frac{E(X_{t-l:t,i} X_{t-l:t,j}) - E(X_{t-l:t,i})\, E(X_{t-l:t,j})}{E(X_{t-l:t,j}^2) - E^2(X_{t-l:t,j})}   (2)

b_{j\to i}(t, l) = E(X_{t-l:t,i}) - a_{j\to i}\, E(X_{t-l:t,j})   (3)

Now, by introducing the l-lookback moving average operator over a vector, \overline{X}^{\,l}, we can express the complete running parameter vectors by:

a_{j\to i}(l) = \frac{\overline{X_i X_j}^{\,l} - \overline{X_i}^{\,l}\,\overline{X_j}^{\,l}}{\overline{X_j^2}^{\,l} - [\overline{X_j}^{\,l}]^2}, \qquad b_{j\to i}(l) = \overline{X_i}^{\,l} - a_{j\to i}(l)\,\overline{X_j}^{\,l}   (4)

Thanks to the fact that the moving average operator appears everywhere in (4), the parameter calculation process is naturally suited for online update.

Please note that for a single i-th target market and a lookback l there are N − 1 time series with features running over the window l. Parameter vectors a_{j→i}(l), b_{j→i}(l) can therefore be captured by the matrices A_i(l) and B_i(l) with elements corresponding to the evidence matrix X, such that the l-lookback deviation feature matrix for the i-th target X_i can be obtained by:

F_i(l) = X_i \times \mathbf{1}(1, N) - [A_i(l) \circ X + B_i(l)]   (5)

where \mathbf{1}(n, m) stands for an n × m matrix of ones and \circ is an elementwise (Hadamard) product.

B. Feature Selection

The feature selection for the process of learning a specific target is guided by individual feature performances, as if the specific feature were itself considered a trade signal. To turn an individual feature f_{j→i}(l) into a trade signal it is simply subjected to the discretisation function L_q:

L_q(x) = \begin{cases} -1, & \text{if } x < -q \\ \;\;\,1, & \text{if } x > q \\ \;\;\,0, & \text{otherwise} \end{cases}   (6)

where q is a positive threshold, usually set to the 80th percentile of the feature's observed values. Given the trade signal L_q(f_{j→i}(l)) constructed from a single j-th feature, its trading performance is evaluated by simulating the trading process, which returns the measures of the annualised Sharpe ratio (S_ann = S) and the average profit per trade (PPT). The Sharpe ratio is defined as the ratio of the average daily return per unit of risk, annualised by an average of 252 trading days in a year [10]:

S_{ann} = S = \frac{\bar{r}_p}{\sigma(r_p)} \sqrt{252}   (7)

while PPT is the average per-trade return expressed in the number of ticks (minimum price moves) achieved per unit of trading (contracts in the case of futures).

Only features that passed the selection criteria of S ≥ 0.5 and PPT ≥ 1 are selected for the actual learning process. Note also that evaluating features as defined in the previous section by their trading performance favours features that are capable of detecting a breakout from the established relationship between the pair of target and predictor markets.

C. Target Output

Defining the output target is very important as it determines what the predictive model is trying to learn from the inputs. In the previous section we have defined a family of linear predictor deviation features along with a trading-performance-guided feature selection. To maximise the predictive power of the features, it is best if the target output is defined in terms of mechanisms similar to those introduced for the feature generation process.

Rather than predicting the future price directly, via a very noisy and inaccurate regression process, we propose to predict the future trend label of buy, sell or hold, subject to several validity conditions. We link these labels with the strength of a trend line of the target price fitted over the future lookahead time and expressed as an expected average return per unit of time.

Assuming that at time t we are looking to predict the future trend of the i-th target over the future lookahead period h, we first calculate the linear coefficients by the usual formula with indices shifted forward in time:

a_i(t, h) = \frac{E(X_{t:t+h,i}\, T_{t:t+h}) - E(X_{t:t+h,i})\, E(T_{t:t+h})}{E(T_{t:t+h}^2) - E^2(T_{t:t+h})}   (8)

b_i(t, h) = E(X_{t:t+h,i}) - a_i(t, h)\, E(T_{t:t+h})   (9)

Based on the linear coefficient a_i(t, h) we can now define the discrete target output variable y as follows:

y_i(t, h) = L_q(a_i(t, h)) \;\wedge   (10)

\forall_{\tau=t}^{t+h}\; X_{\tau,i}\, L_q(a_i(t, h)) \ge X_{t,i}\, L_q(a_i(t, h))   (11)

What this means is that the target output label of +1 (buy) at time t is set when the future trend over lookahead h is positive and above a certain minimum threshold q, while at any time over this lookahead the price does not fall below the current level X_{t,i}. Likewise, the −1 (sell) label is set when the trend is negative and below the threshold −q, while the price never exceeds the current level X_{t,i}. All other points along the target time series, i.e. those with −q ≤ a_i(t, h) ≤ q, are labelled with 0 (hold).

Note that the condition in (11) validates only robust trends where the price does not move strongly against the average trend below the current level. To enforce an even more stable trend, condition (11) can be replaced with a maximum allowed deviation of the price from the trend, i.e.:

\forall_{\tau=t}^{t+h}\; |X_{\tau,i} - (a_i(t, h)\, T_\tau + b_i(t, h))| < d   (12)
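To make the mechanics above concrete, the following sketch (our own illustration, not the authors' code) computes the l-lookback deviation feature of (1) and (4) for one predictor and the trend labels of (10)-(11) for one target; the threshold q would in practice be set from a percentile of the observed values, as described above.

```python
import numpy as np
import pandas as pd

def deviation_feature(xi: pd.Series, xj: pd.Series, l: int) -> pd.Series:
    """f_{j->i}(t, l): target minus its regression-implied expectation, via rolling means as in (4)."""
    mi, mj = xi.rolling(l).mean(), xj.rolling(l).mean()
    a = ((xi * xj).rolling(l).mean() - mi * mj) / ((xj * xj).rolling(l).mean() - mj ** 2)
    b = mi - a * mj
    return xi - (a * xj + b)

def trend_labels(x: np.ndarray, h: int, q: float) -> np.ndarray:
    """Buy/sell/hold labels of (10)-(11) from the future trend slope a_i(t, h)."""
    y = np.zeros(len(x), dtype=int)
    tau = np.arange(h + 1)
    for t in range(len(x) - h):
        window = x[t:t + h + 1]
        a = np.polyfit(tau, window, 1)[0]        # future trend slope over lookahead h
        if a > q and window.min() >= x[t]:       # up-trend that never dips below the entry level
            y[t] = 1
        elif a < -q and window.max() <= x[t]:    # down-trend that never rises above the entry level
            y[t] = -1
    return y
```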

D. Classification Process

Given the predictor deviation features generated over the l-lookback for the i-th target, F_i(l), and the target output vector Y_i(h), the objective is now to build a classification model that learns the relationship between the features and the outputs. Initial investigations indicated that the linear logistic classification model is a simple yet capable predictor that builds a logistic model of continuous features mapped into the binary output y. To fit our problem within its representation we have rescaled the logistic function by 2y − 1 to map it into the continuous target range of (−1, 1). We have also ignored the neutral hold-labelled target examples and used only the positively and negatively labelled examples, which dramatically limited the number of examples required for learning. Our modified logistic regression function can be expressed by:

\hat{y}_i(t, h) = \frac{2}{1 + e^{-(\beta_0 + \sum_{j \ne i} \beta_j F_{t,j})}} - 1, \qquad j \ne i   (13)

The learning process involves finding the optimal parameters β which minimise the squared difference between the logistic function and the actual target labels:

\arg\min_{\beta_0, \beta_j} \left[\frac{2}{1 + e^{-(\beta_0 + \sum_{j \ne i} \beta_j F_{t,j})}} - 1 - y_i(t, h)\right]^2   (14)

Clearly such a problem can be transformed into a multiple linear regression problem, for which there are many optimisation algorithms that find robust solutions. Once the β parameters are learnt, the only step required to turn the logistic function (13) into a classifier is to apply the discretisation function L_q(\hat{y}_i(t, h)) with respect to the threshold value q, which provides outputs in the action space of sell, hold or buy, or {−1, 0, 1}, respectively, in line with equation (6):

S_i(t, q) = L_q\!\left(\frac{2}{1 + e^{-(\beta_0 + \sum_{j \ne i} \beta_j F_{t,j})}} - 1\right)   (15)
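A minimal sketch of how the model of (13)-(15) could be fitted and applied; the least-squares formulation below uses Python/scipy and is our own reading of the paper (which does not prescribe an implementation), with placeholder variable names.

```python
import numpy as np
from scipy.optimize import minimize

def rescaled_logistic(F: np.ndarray, beta: np.ndarray) -> np.ndarray:
    X = np.hstack([np.ones((len(F), 1)), F])        # prepend a column of ones for beta_0
    return 2.0 / (1.0 + np.exp(-X @ beta)) - 1.0    # eq. (13), output in (-1, 1)

def fit_beta(F: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit of (14); F holds the selected features, y the +/-1 labels."""
    loss = lambda beta: np.sum((rescaled_logistic(F, beta) - y) ** 2)
    return minimize(loss, np.zeros(F.shape[1] + 1)).x

def trade_signal(F: np.ndarray, beta: np.ndarray, q: float) -> np.ndarray:
    """Discretisation L_q of (15): +1 buy, -1 sell, 0 hold."""
    z = rescaled_logistic(F, beta)
    return np.where(z > q, 1, np.where(z < -q, -1, 0))
```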

The time sequence of such labels, as produced by equation (15), directly represents a stepped trade signal that is passed on to the trading platform for execution, i.e. generating buy and sell orders as required.

IV. TRADING SIMULATOR MODEL

Trade signals extracted directly from the classification process inform about the timing of buy or sell trades of the considered asset. To manage the state of the trading system it is much more convenient to use the required position signal, which can be obtained by a simple conversion from the trade signal. The required position signal carries the cumulative state of traded market holdings resulting from a sequence of trade actions. It is also easier to express buy (sell) trades in this representation, where they can be seen as upward (downward) changes in the required position signal.

Note that trades that increase the absolute position away from the flat (0) position, the so-called entries, increase exposure to risk, while trades that decrease the absolute position towards the flat level, the so-called exits, decrease the risk exposure. To account for that risk imbalance we have applied separate classification thresholds for entries q_en and exits q_ex and enforced the constraint q_en > q_ex, such that a trade entry is executed on stronger signals, while an exit trade is executed on opposite signals of smaller magnitude. This also introduces a healthy margin between exiting from one position to the flat state before entering the opposite position. Such trading is considered smoother and more risk-averse, as opposed to always holding a non-zero position.

Given the required position of an individual trading strategy, it is now crucial to realistically assess what price we can achieve when submitting a trade order. Specifically, it is very difficult to assess the price slippage, known as the detrimental price difference compared to the trade price seen at the time of trade order submission. Slippage may be further exacerbated by high-volume orders that become noticeable in the market orderbook and cause an adverse market response called market impact. In this work, due to a lack of realistic market simulation capability and the rather illustrative low-volume trading model, market impact effects are ignored, while the price slippage caused by delayed trading is emulated by trade execution at the average price over the 2 minutes following the trade submission time. Trading transaction costs are also considered and assumed to be fixed at c = 0.2 · tick size per each unit of trading.

Summarising, the trading simulation approximates realistic prices through delayed execution and uses them to calculate the cumulative Profit/Loss (P/L) time series after transaction costs. Let us now formalise this process. Let u_i stand for the required position vector of the i-th target at times t_j, j = 1, .., n, and let p_i denote the corresponding vector of traded prices at times t_j after the 2-minute delay adjustment. The simulated strategy return after each change of the required position at time t_j can be expressed by:

r_{j,i} = (p_{j,i} - p_{j-1,i})\, u_{j-1,i} - c\,(u_{j,i} - u_{j-1,i})   (16)
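A sketch of the return accounting in (16), using the 2-minute-delayed execution prices sampled at the position-change times; charging the cost on the absolute traded quantity is our own reading of the cost term, and the function name is a placeholder.

```python
import numpy as np

def strategy_returns(u: np.ndarray, p: np.ndarray, c: float) -> np.ndarray:
    """u: required positions, p: 2-minute-delayed prices at times t_j, c: cost per traded unit."""
    r = np.zeros_like(p, dtype=float)
    r[1:] = np.diff(p) * u[:-1] - c * np.abs(np.diff(u))   # eq. (16) at each position change
    return r
```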

Accordingly, the cumulative P/L along the times t_j, j = 1, .., n is simply defined by P/L_{j,i} = \sum_{k=1}^{j} r_{k,i}.

A. Distributed Parallel Backtesting Setup

The backtesting process aims at a realistic performance evaluation of the complete trading strategy. The performance is evaluated in terms of the annualised Sharpe ratio obtained over the portfolio daily returns R_p and calculated by S_{ann} = \bar{R}_p / \sigma(R_p) \sqrt{252}, which assumes that a year contains on average 252 trading days.

Backtesting follows a walk-forward cycle of model learning and evaluation steps. Given the assumed training horizon, we have set the learning period to 12 months and the testing period to 3 months within each cycle. The cycles are repeated in a sequence, each time moving forward by 3 months, until the end of the backtesting period.

Despite the adaptive and vectorised representation for fast calculations, the amount of experimentation required to complete the backtesting significantly surpassed the limits of processing on a single machine within a reasonable amount of time. In order to address this challenge, a parallel processing environment was set up utilising the Condor high-throughput computing software, which harnesses spare processing capacity among many PCs connected on the local network [9]. Condor converts all connected processor cores into a virtual cluster of processing nodes that can execute compiled programs in parallel, exchanging input and output data via network-shared locations. In line with that infrastructure, the backtesting cycle of a single target has been implemented as an individual high-level mapping task called a trading experiment. Each such trading experiment involves collecting a scheduled chunk (12 months) of target and corresponding time-aligned predictors' training data to evaluate and select good features, then learning the classification model parameters, and finally using them to simulate trading and evaluate the traded performance over the subsequent 3 months of testing data. The results of trading experiments are then collected together for per-market aggregation, and all individual markets' profit/loss (P/L) series are then again collected together for portfolio construction and risk management. Each individual trading experiment is itself decomposed into sub-mapping processes of individual feature evaluations that are then captured by a single reducer process that checks whether their performances pass the selection criteria. Selected features are then passed on along with the target labels to build a logistic classification model.

For convenience and data security, the above architecture for the parallel backtesting process has been implemented using local network storage. The relevant data were sent from a single location to many parallel mapping processes on their request. For larger data sets, redundant pre-loading across the distributed cluster nodes' collocated storage may be more efficient and significantly reduce data loading time. The diagram summarising the backtesting process, with a close-up on an individual evaluation cycle with a single trading experiment job, is shown in Figure 1.

Figure 1. Diagram depicting an individual backtesting experiment
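The walk-forward cycle described above maps naturally onto a simple loop; the sketch below is illustrative only, with `fit_strategy` and `simulate_trading` passed in as hypothetical stand-ins for the feature selection/learning and trading simulation stages.

```python
from typing import Callable
import numpy as np
import pandas as pd

def annualised_sharpe(daily_returns: pd.Series) -> float:
    return float(daily_returns.mean() / daily_returns.std() * np.sqrt(252))

def walk_forward(data: pd.DataFrame,
                 fit_strategy: Callable,       # learns features + classifier on the training slice
                 simulate_trading: Callable,   # returns daily P/L of the learnt model on the test slice
                 train_months: int = 12, test_months: int = 3) -> float:
    """Roll a 12-month learning / 3-month testing cycle forward in 3-month steps."""
    daily_pnl = []
    t = data.index.min()
    while t + pd.DateOffset(months=train_months + test_months) <= data.index.max():
        split = t + pd.DateOffset(months=train_months)
        model = fit_strategy(data.loc[t:split])
        test = data.loc[split:split + pd.DateOffset(months=test_months)]
        daily_pnl.append(simulate_trading(model, test))
        t += pd.DateOffset(months=test_months)     # move the cycle forward by 3 months
    return annualised_sharpe(pd.concat(daily_pnl))
```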

B. Mean-Variance Portfolio Optimisation

To better control the risk of trading across many assets we introduce a variant of mean-variance portfolio optimisation [10], [11]. This optimisation controls the distribution of traded volumes across the different markets in order to maximise the portfolio return per unit of portfolio risk, associated with the standard deviation of its returns. The traditional mean-variance optimisation considers holding different assets in the proportions that maximise the portfolio return per unit of risk, given the expected asset returns and their covariance matrix. In our case, however, the assets are not held but are traded according to the required position signals u_i generated by the trading strategies. A dilemma arises as to which returns should be taken for the covariance matrix calculation: the markets' returns or the individual strategies' returns. Our initial experiments confirmed that calculating the covariance matrix over strategies' returns is very unstable and prone to numerical errors at the assumed trading horizon of hours to days. Markets' own daily returns r_i(t) have therefore been used for the covariance calculations. The covariance \Sigma_{i,j} between the i-th and j-th markets is simply the expectation of the product of the time-aligned markets' return deviations. Since the expected markets' returns are very small compared to the actual returns, one can approximate the covariance with just the expected product of the periodic assets' returns: \sigma_{ij}(t) = E[r_i(t) r_j(t)].

Given the desired individual market holdings defined by the \alpha(t) vector and the market covariance matrix \Sigma(t), the mean-variance optimisation can be defined as finding the optimal holdings vector h(t) that minimises the portfolio risk while maximising the exposure of \alpha to trading, which can be formally expressed by:

\arg\min_h \; \tfrac{1}{2}\,\lambda\, h^T \Sigma h - \alpha^T h   (17)

subject to the maximum position constraints −h_max ≤ h_i ≤ h_max, where \lambda is a risk aversion coefficient that controls the balance between minimising portfolio risk and maximising \alpha exposure. Note that such a well defined optimisation problem follows a standard quadratic programming formulation and can be solved, for example, using Matlab's quadprog() function, which uses an interior-point-convex algorithm [12]. Further improvements in portfolio performance are possible by setting additional constraints (Ah = b or Ah ≤ b) in the optimisation process.

C. Stochastic Trade Diffusion

Due to the apparent market correlations, many different trading strategies tend to trigger trade transactions at similar times, which results in a significant market impact effect and high price slippage losses that significantly erode the profitability of the strategies. Quite often, however, there is a considerable degree of flexibility as to when exactly to execute a trade. This window of accepted trade execution stretches further for longer trading horizons and can be optimised to maximise the trading turnover of multiple simultaneously traded strategies while maintaining their profitability.

The idea behind the stochastic trade diffuser is to simply limit the maximum trade density exposed to, or visible by, the market. It is parameterised by the maximum number of asset units (contracts in the case of futures trading) that can be sign-traded in a fixed moving window of time. For example, assuming the maximum trade exposure to gold is 5 contracts per 10 minutes, the controller will suspend any attempts by the competing gold trading systems to buy more contracts if in the last 10 minutes there were (buy) trades for a total of +5 contracts. These suspended trades land on a FIFO queue and wait for execution until the traded density falls below the set limit of 5 contracts per 10 minutes. As a result, the excess trading activity is delayed and diffused to the market in a lower-exposure period. A schematic overview of the proposed stochastic trade diffuser is depicted in Figure 2. The trade orders from multiple systems trading on the same market are represented as required position signals and fed to the diffuser. A counter calculates the running sum of the required buy and sell volumes in a specified time window and, if it exceeds the maximum limit, the excess orders are suspended until the required position falls below the limit to allow their release. Excess orders are placed on and released from a FIFO queue to ensure their fair and timely release.
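A minimal sketch of this density-limited release logic (our own illustration; the class interface, parameter names and signed-density bookkeeping are assumptions rather than the paper's specification):

```python
from collections import deque

class TradeDiffuser:
    """Release orders only while the signed volume traded in the last `window` minutes
    stays within `max_units`; the excess waits in a FIFO queue."""

    def __init__(self, max_units: int, window: int):
        self.max_units, self.window = max_units, window
        self.recent = deque()    # (minute, signed units) already released to the market
        self.pending = deque()   # suspended orders, released first-in first-out

    def _density(self, now: int) -> int:
        while self.recent and now - self.recent[0][0] >= self.window:
            self.recent.popleft()                      # drop releases older than the window
        return sum(units for _, units in self.recent)

    def submit(self, now: int, units: int) -> int:
        """Queue a signed order and return the total quantity released at `now`."""
        self.pending.append(units)
        released = 0
        while self.pending and abs(self._density(now) + self.pending[0]) <= self.max_units:
            order = self.pending.popleft()
            self.recent.append((now, order))
            released += order
        return released
```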

Figure 2. Schematic overview of the proposed trade diffuser.
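Returning briefly to the portfolio construction step of Section IV-B, the optimisation in (17) is a standard box-constrained quadratic programme; the sketch below uses scipy's L-BFGS-B solver as a stand-in for the Matlab quadprog() routine mentioned above (an implementation choice of ours, not the paper's).

```python
import numpy as np
from scipy.optimize import minimize

def mean_variance_holdings(alpha: np.ndarray, Sigma: np.ndarray,
                           lam: float, h_max: float) -> np.ndarray:
    """Minimise 0.5*lam*h'Sigma*h - alpha'h subject to -h_max <= h_i <= h_max, as in (17)."""
    objective = lambda h: 0.5 * lam * h @ Sigma @ h - alpha @ h
    gradient = lambda h: lam * Sigma @ h - alpha
    res = minimize(objective, np.zeros(len(alpha)), jac=gradient,
                   bounds=[(-h_max, h_max)] * len(alpha), method="L-BFGS-B")
    return res.x
```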

V. EXPERIMENTS

The experiments testing the feasibility and performance of the presented large-scale exogenous trading strategy have been carried out on 100 various time series ranging from 2000 until 2013. All data have been quantised at 1-minute intervals, except several interest rate time series which were only available at the daily level. Out of the 100 markets, 64 were considered tradable and represented the price series of the asset unit, all of which were converted to US Dollar units, while the remaining predictors were mostly interest rates and some industrial indices. All of the time series have been converted to the common UTC time and aligned as described in detail in Section II.

The experiments followed the backtesting process exactly as described in the previous sections. However, rather than trading a single model per target market, up to 10 different trading systems were generated by using different subsets of the selected exogenous features, different lookbacks, lookaheads and classification thresholds. The choices of lookbacks and lookaheads were restricted to the discrete values of 1 hour, 4 hours, 1 day, 3 days, 1 week and 1 month, constrained by lookahead ≤ lookback. The choice of classification thresholds for both entries and exits was set to the 70th, 80th or 90th percentile of the logistic regression output observed over the learning set. Finally, the choice of up to 3 different feature subsets among the selected features was determined according to the following fixed scheme:

• A subset of just the single best feature
• A subset of all selected features
• A subset obtained by greedy feature addition until no performance improvement is reported

Summarising, each trading experiment required up to 487 performance evaluations, which included 99 initial feature evaluations for feature selection and then up to 378 different trading system evaluations, out of which only the 10 best systems (with the highest overall return) are passed on for evaluation on the testing set. Across all tradable markets and time, backtesting required about 1.5 million trading performance evaluations of different systems. Using a Condor cluster with up to 100 nodes allowed this extensive backtesting to be accomplished in about 2 days. Following the initial observations, greedy feature selection appeared to be the consistently best feature selection method and hence the other two methods were dropped. The classification thresholds were also fixed at the commonly best values of the 80th percentile for entries and the 30th for exits. Moreover, the systems with a 1-month lookback had a clear tendency to underperform and hence 1 month was dropped from the choice of lookbacks and lookaheads. These changes provided more than a 10 times speedup in backtesting execution time, reducing it to just under 3 hours. Further speedups could be achieved by reducing further the level of variability distinguishing different systems. It has to be noted that once the strategy learning setup is finalised, building a model capable of live trading would only involve learning on the latest period of 12 months, which takes only minutes and needs to be recalculated every 3 months.

The performance of the proposed strategy is presented in the form of a profit/loss (P/L) curve, which is effectively a USD-denominated cumulative return, accompanied by some key portfolio statistics. The P/L curve is shown in Figure 3. The strategy turned out to be quite successful, returning healthy profits at an average Sharpe ratio of 2.08 and an average profit per traded unit equal to 3.2 ticks, which on average translates into around $30. Each trading system traded on average twice a day, with an average holding period of 6 hours. As seen in Figure 3, the strategy did not have any significant drawdowns (maximum cumulative loss periods), although the P/L was flat for the lengthy period between the end of 2010 and 2013.

Figure 3. Cumulative Profit/Loss (P/L) curve of the backtested strategy

VI. CONCLUSION

Concluding, this work illustrates how one can design and test a large-scale portfolio trading strategy with a machine learning engine utilising many tradable products and some additional descriptive time series to generate the target markets' trade signals. As demonstrated above, backtesting such an exogenous-input trading strategy involves a huge amount of data processing and millions of simulated trading evaluations. However, a highly scalable parallel design of the backtesting process makes it possible to carry out the required experiments on modest commodity big data infrastructure, or even a small parallel processing cluster, within a reasonable time of several hours. We have also presented complete portfolio risk management and some trading execution aspects by demonstrating a simple mean-variance optimisation of portfolio holdings and presenting a method for controlling the self-inflicted trade density exposed to the market in order to reduce market impact effects. The complete trading strategy has been backtested and appears to show tradable performance, indicated by a Sharpe ratio in excess of 2 and an average profit in excess of 3 ticks per contract on the traded, mostly futures, markets.

REFERENCES

[1] A. B. Schmidt. Financial Markets and Trading: An Introduction to Market Microstructure and Trading Strategies. Wiley Finance, 2011.
[2] K. Pilbeam. Finance and Financial Markets. Palgrave Macmillan, 2005.
[3] S. Shen, H. Jiang, and T. Zhang. Stock market forecasting using machine learning algorithms. Stanford University, 2012.
[4] G. Mitra, D. di Bartolomeo and A. Banerjee. Automated Analysis of News to Compute Market Sentiment: Its Impact on Liquidity and Trading - The Future of Computer Trading. Financial Markets - Foresight Driver Review DR 8, Government Office for Science, Foresight, Jul. 20, 2011.
[5] G. Bontempi, S. B. Taieb, Y.-A. Le Borgne. Machine Learning Strategies for Time Series Forecasting. Business Intelligence: LNBIP 138, pp. 62-77, 2013.
[6] W. Huang, Y. Nakamori, S.-Y. Wang. Forecasting stock market movement direction with support vector machine. Computers & Operations Research 32(10): 2513-2522, 2005.
[7] G. Creamer, Y. Freund. Automated trading with boosting and expert weighting. Quantitative Finance 10(4): 401-420, 2010.
[8] V. Mayer-Schonberger, K. Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Harcourt, New York, 2013.
[9] D. Thain, T. Tannenbaum, and M. Livny. Distributed Computing in Practice: The Condor Experience. Concurrency and Computation: Practice and Experience 17(2-4): 323-356, 2005.
[10] D. Bailey and M. López de Prado. The Sharpe Ratio Efficient Frontier. Journal of Risk 15(2): 3-44, 2012.
[11] H. M. Markowitz, G. P. Todd, and W. F. Sharpe. Mean-variance analysis in portfolio choice and capital markets. Wiley, 2000.
[12] N. Gould and P. T. Toint. Preprocessing for quadratic programming. Mathematical Programming, Series B, Vol. 100, pp. 95-132, 2004.
