Multivariate Maximum Likelihood based Predictive Algorithm Application on Currency Market Stability Estimation with R Data Analysis Smail TIGANI∗
Rachid SAADANE†
Euro-Mediterranean University Fes, Morocco
[email protected]
Hassania School of Public Labors Casablanca, Morocco
[email protected]
ABSTRACT
1
In this paper, we carried out a detailed statistical predictive model to estimate the financial market stability based on historical data. The given stability indicator allows the trader to make the decision about which part of the day a financial instrument is profitable but not risky. This model is implemented as external indicator that can be integrated in the MetaTrader platform and uses historical and real time financial data coming, by streaming, from the trading broker. Using this custom indicator, the trader will save the lost time that he/she spent in front of screens waiting for a valid signal to trade, and avoid the high volatility periods of the day during which the market is unpredictable and risky. The algorithm can be also implemented in algorithmic trading robot as a trigger to turn it on at the best and right moments.
The financial markets calendar anomalies have been deeply discussed and studied by finance experts for a long time. Many artificial intelligence research also focuses on the study of the financial market stability, the risk evaluation and the prices prediction. Let’s talk about some of the factors that influences the asset prices. As explained in [7], the factor that investors have considered in the prediction of prices is trading volume. It is a measure of the quantity of shares that change owners for a given security. For example, on the New York Stock Exchange known by "NYSE", the daily average volume for 2002 was 1.441 billion shares, contributing to many billion dollars of securities traded each day among the roughly 2800 companies listed on the NYSE. The amount of daily volume on a security can fluctuate on any given day depending on the amount of published - or known with different manner - information available about the company. This information can be a press release or a regular earnings announcement provided by the company, or it can be a third party communication, such as a court ruling, social networking like Twitter as explained in [6] or a release by a regulatory agency pertaining to the company. The abnormally large volume was due to differences in the investor’s view of the valuation after taking in consideration the new information. Because of what can be inferred from abnormal trading volume, analysis of trading volume and associated price changes corresponding to informational releases has been of much interest to researchers. It is important for an investor or trader to study the stability factor of any market and many other factors that influences the prices like the interest rates [1], jobless, political stability and more. Whatever it is, the difference between the highest and lowest asset values - that we will call a diameter in this work - will be bigger when the prices are impacted by external and perhaps unknown factors.
CCS CONCEPTS • Mathematics of computing → Multivariate statistics; • Theory of computation → Probabilistic computation;
KEYWORDS Predictive Analytic, Machine Learning, Financial Markets, Algorithmic Trading, Data Analysis ACM Reference Format: Smail TIGANI and Rachid SAADANE. 2018. Multivariate Maximum Likelihood based Predictive Algorithm: Application on Currency Market Stability Estimation with R Data Analysis. In Proceedings of The 2nd Mediterranean Conference on Pattern Recognition and Artificial Intelligence (MedPRAI’18). ACM, New York, NY, USA, Article 4, 5 pages. https://doi.org/ ∗ Smail
TIGANI is a machine learning and big data professor with the engineering faculty of the Euro-Mediterranean university of Fes. He received the telecomunication systems engineering degree in 2012 and the Ph.D in computer science specially in statistical learning algorithms in 2016. Recently he focus is interested in machine learning with applications on financial markets. † Rachid SAADANE received the B.S. degree in physic electronic from the Faculty of Science of Rabat, Morocco, in 2001 and the Ph.D. degree in computer sciences and telecommunications from Faculty of Sciences of Rabat in 2007. He currently works as a Research and an Associate Professor with the Electrical Engineering Department at EHTP, Casablanca, Morocco. His research interests include array of UWB and channel measurements characterization, digital signal processing and machine learning. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). MedPRAI’18, March 2018, Rabat, Morocco © 2018 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-5290-1. . . $15.00
INTRODUCTION
Many researchers build mathematical models and algorithms for price prediction [3] or trend classification [9]. For that, some of them uses linear discrimination algorithm [9] or regression algorithms like in [10]. The aim of the work is to use statistical learning - part of artificial intelligence - to find the best hours on the day to make profit without loosing time in front of screens. The work focuses on the currency market as a special case of financial markets that included also stocks, oils, etc. The content of this paper is organized as follows : the section intituled "Statistical Model Building" introduces the mathematical models building details. The "Historical Data Analysis" section explains how the proposed approach was tested in real financial data
MedPRAI’18, March 2018, Rabat, Morocco coming directly from the FX broker. In other words, we present the data analysis results in it, while the section "MT4 Indicator Implementation" presents the practical aspect and some results when the indicator was executed on the MetaTrader platform. In the end, we present concluding remarks followed by perspectives.
2 STATISTICAL MODEL BUILDING 2.1 Preliminaries and Definitions In this project, we consider the volume as one of the crucial parameters. The proposed machine will continuously supervise the evolution of the invested volume for each time frame (1 hour in our case). To do so, let’s call γkv (h) the kth observed value of the volume at the hour h. For example, if our data set contains the measures of one month, so we will have many values of the volume for the same hour h: the volume today at 14h is not necessarily the same yesterday at 14h. The stability of the forex market depends on the published news, their impacts, the behavior of the traders and others. Actually, we don’t have access to all those information and some of are hidden. Some technical analysis like RSI (Relative Strength Index), MACD (Moving Average Convergence Divergence) and Moving Averages like in [2] allows to detect some hidden intentions. But we believe that all those factors impact the variation and the evolution speed of the prices. In some period of the day, the same scenarios are repeated. The decision is really too hard and very fast for some hours. However, the trends are somehow predictable for those pairs but for others hours, while the trading is not really interesting in some period of the day because the evolution is very slow. The trader will not loose and also will not win in this case. This is why we have to identify those clusters then the trader will choose the medium hours to trade safely. For that, we define the prices diameter γkd (h) at the hour h and the difference between the highest price and the lowest one of the same hour. We have access to the volume, the prices and the highest and lowest prices archive. That allows our online learning algorithm building the stability indicator for each hour. Let suppose that we have n observations for the hour h stored in the learning data set: Dhn = {(γ 1d (h), γ 1v (h)), ..., (γnd (h), γnv (h))}. The stability estimator goal is mainly the study of the average behavior of the impact of the volume and diameter on the prices motion. We define the random variable X n (h) in 1 as the vectorial average of all observations given the concerned hour h. Formally: ! n n Õ Õ h −1 d −1 v Xn , n γk (h), n γk (h) (1) k =1
k=1
Since the observed parameters has the same length, we can rewrite X n (h) simply like: n Õ X nh = n−1 γkd (h), γkv (h) k =1
2.2
S. TIGANI and al. theorem establishes that, when independent random variables are added, their sum converges to a gaussian distribution even if those variables themselves do not follow a normal distribution. In other words, the arithmetic mean of a sufficiently large number of iterates of independent random variables having a well-defined finite expected value and standard deviation, will be approximately normally distributed. In fact, the definition in 1 is an arithmetic mean and that allows us to conclude that X nh ∼ N (m, C) with m ∈ R2 , C ∈ M2 (R). The probability density function of X nh is the bi-variate gaussian and |C| is the determinant of the covariance C. −1
h
e − 2 (X n −m)C (X n −m) p 2π . |C|
t
(2)
In this case, we need to find the probability density function that is able to generate data-points that are similar to our observed data arranged in the data set Dhn . In other words, we need to find the parameters m and C of the normal distribution N (m, C) whose the generated data have a likelihood with our observed data [5].Therefore, we define the likelihood criteria L(Dhn , m, C) = ln p Dhn |m, C that measures this likelihood. After simplifications we find the expression in 3: L(Dhn , m, C) =
n Õ k =1
ln p X kh |m, C
(3)
By including the density expression of the equation 2 in the probability formula mentioned in 3 we obtain the final expression of the log-likelihood to be optimized. It is: L(Dhn , m, C) = −n. ln (2π .|C|) −
n i 1Õh h (X n − m)C−1 (X nh − m)t 2 k =1
ˆ as the best values of the gaussian parameters. AcLet denote m, ˆ C cording to the maximum likelihood definition, the best parameters are the arguments that maximize the likelihood criteria L(Dhn , m, C) having the expression established in 3. This is formally given by the equation 4 below: ˆ = arg max L(D n , m, C) (m, ˆ C) (4) h m,C
That comes to find the analytical solutions of the log-likelihood function given by the equation ∇L(Dhn , m, C) = (0, 0). In other words, we must find the solutions of the equation 5. It is also possible to use the Expectation Maximisation algorithm to estimate the parameters like in [4, 8]. See the analytical way bellow: ! ∂L(Dhn , m, C) ∂L(Dhn , m, C) , = (0, 0) (5) ∂m ∂C The solutions of 5 give - with (t) is the transpose - the bivariate gaussian parameters: m ˆ = n −1
Maximum Likelihood Density Estimation
Let focus now on the probability density of the random variable X nh that we defined in 1. In probability theory, the central limit
h
1
ϕ(X nh ) =
ˆ = n −1 C
n Õ
X nh
(6)
h (X nh − m)(X ˆ ˆ T n − m)
(7)
i=1 n Õ i=1
Multivariate Maximum Likelihood based Predictive Algorithm
2.3
MedPRAI’18, March 2018, Rabat, Morocco
Hypothesis Testing
3.2
We can now move to the decision rule. In this subsection, we explain how the decision will be made: let define two ceils for the volume and the diameter parameters. We can say that the tolerable value of v d risk is 10% from the biggest value denoted γmax and γmax . Formally: v v i i γceil = 10%γmax and γceil = 10%γmax . Based on these values, we define the stability zone denoted Ωs which represents the set of the tolerable (diameter and volume) data-points: d d v v Ωs = {(u, v) ∈ R2 : γmin ≤ u ≤ γceil , γmin ≤ v ≤ γceil }
(8)
We can define the hypothesis H 0 meaning that the financial market is stable (no flash crashes and no high volatility (with a probability)). It is hold if the X n is in the stability zone: p(X n ∈ Ωs ) > 0.5. While the opposite hypothesis H 1 is hold and H 0 is rejected, which means that the trader have to avoid risky trading when p(X n ∈ Ωs ) ≤ 0.5. This probability is computed using the next formula 9: p(X n ∈ Ωs ) =
∬ Ωs
ϕ(γ d , γ v )∂γ d ∂γ v ≈
ÕÕ γd
ϕ(γ d , γ v )
Data Visualisation
In order to study the instability of the financial instrument, currency pairs in our case, we have added the next plots that represent the instability percentage given in a specified hour of the day. This data visualization allows us to observe the hours in which the trade is safe and profitable. Let’s illustrate with an example, the figure 1 shows that the trade of the Euro vs the US Dollar is recommended from 08 : 00 and still interesting until 22 : 00 because the pair is active on that period. However, it is better to avoid the trading in an hour like 15: 00 and 16 : 00 because the EURUSD market is very unstable and unpredictable. We notice that the stability is good at 14 : 00 but suddenly it jumps at 15 : 00, which explains that what happens in 15 : 00 is due to an event that occurs in 14: 00. We conclude that it is wise and mandatory to avoid trading on EURUSD at 14: 00. In fact, the European Central Bank announces very important economic news that influence the investors on the Euro around 14 : 00.
(9)
γv
3 HISTORICAL DATA ANALYSIS 3.1 Financial Data Set
Figure 1: EURUSD Stability Distribution Over 24 Hours
The data sets used in this analysis are downloaded directly from a reliable FX1 broker servers. In this work, we presented the analysis of the pairs EURUSD, EURGBP and NZDCHEF as examples. CSV files are downloaded, they contain the historical financial data since 2011, and some of them start from 2002 until august 2018. The tables contain seven columns: Date, Time, High, Low, Open and Close prices then the Volume. Let explain each column: the date represents the date of the measure, while the time is the measure hour (for hourly table). We can download data by another time frame: 1minute, 5minutes, 30minutes, 4hours, 1day, etc. The high column is the highest price reached by the financial product during this time frame and the low column is the lowest price. The product price at the beginning of the time frame is called (Open Price), while the (Close Price) is the product price at the end of the time frame. The product’s price is up when the close price become greater than the open price. Unfortunately, the prices are down when the open price is greater than the close price. The last column named (volume) refers to the total amount of money invested in the corresponding time frame. Basically, the volume - but not only the volume - impacts the prices when big investments are made. The currency market is ultimately driven by many economic factors that reflect a country’s economic strength. The economic outlook for a country is the most important determinant of its currency’s value. So, knowing the factors and indicators help any trader to keep place in the competitive and fastmoving currency’s world. The volume of invested money at a given time frame is also a good indicator to understand the investors’ sentiments. It is of crucial importance to know how other traders think because any one of them relies only on his own data source.
Let’s move to the EURGBP (Euro vs British Pound) illustrated in the figure 2. By following the same analysis, it is possible to conclude that the best hours to trade it is from 08 : 00 until 21 : 00. But we should be very careful with this pair because it is very unstable and the analytics done have proved it. The biggest stability percentage for the EURGBP is very small compared to the EURUSD.
MedPRAI’18, March 2018, Rabat, Morocco
Figure 2: EURGBP Stability Distribution Over 24 Hours
S. TIGANI and al. By analyzing all currency pairs with the same tool, we can produce a very profitable trading planning that shows the best pairs and the best hours for currency traders without losing time and money.
4 MT4 INDICATOR IMPLEMENTATION 4.1 Computation Algorithm Design The probability of stability given in the equation 9 is computed by the algorithm 1 bellow: Algorithm 1: Algorithm Stability v1.0 Output: pV alue: Stability Probability 1 2 3 4 5 6 7
v v ← γmin d d ← γmin pV alue ← 0 d d )/100 εd ← (γceil − γmin v − γ v )/1000 εv ← (γceil min v while v ≤ γceil do d while d ≤ γceil do
pV alue ← pV alue + ϕ(d, v) d ← d + εd
8 9
The figure 3 highlights the best hours to trade the NZDCHF. It is recommended to avoid the hours that minimizes the stability because the motions are so slow which means that not profit but also no loss will be made. We recommend also to avoid the hours that maximizes the stability because the markets move impredicative which means risky hours.
Figure 3: NZDCHF Stability Distribution Over 24 Hours
10
v ← v + εv
11
return pV alue
4.2
Computational Complexity Estimation
Let’s estimate the algorithmic complexity of the stability probability computer 1. We call Cϕ the algorithmic cost of the algorithm that computes the gaussian probability using the equation 2. In order to do the digitization of the continuous double integral, we have to define the two steps εv and εd . To estimate an algorithm computational complexity, the first thing to do is counting how many fundamental instructions this piece of code executes. As we analyze this piece of code, we want to break it up into simple instructions; things that can be executed by the CPU directly - or close to that. We’ll assume that our processor can execute the following operations as one instruction each: Assigning a value to a variable, Comparing, Array access, Incrementing and decrementing, Elementary arithmetic and logical operators, and Return instruction. In complexity theory, the affectation and arithmetical operations are elementary and should be computed in the cost estimation. The table 1 summaries the elementary operations of the concerned algorithm with the repetition number of each operation bloc. Table 1: Elementary Operations Table Bloc
Operation Number
Repetition’s #
1: Lines 1 to 5
9
1
2: Lines 8, 9
4 + Cϕ
v − γ v )ε −1 (γceil min v
3: Lines 7 to 10
d d )ε −1 2 + 4 + Cϕ )(γceil − γmin v 1
d d )ε −1 (γceil − γmin d 1
4: Line 11
Multivariate Maximum Likelihood based Predictive Algorithm We got 10 operations in the bloc 1 and 4 and the bloc 2 in included in the bloc 3. The total cost that summarizes the table 1 is in the expression 10 # " d d v γ − γmin γ v − γmin 10 + ceil ( ceil )(4 + Cϕ ) + 2 (10) εv εd d During the data analysis with R, the values of εd was (γceil − v v d γmin )/100 and εv was (γceil − γmin )/1000.
Let’s move to spacial algorithmic complexity. The algorithm 1 needs 20 Bytes of memory space. Based on the IEEE 754 Standard, All variables are Floats that needs 4 Bytes for each one. See the table 2: Table 2: Variables Summary Table
4.3
Variable
Description
pV alue v d εd εv
Ptobability of Stability Temporal Variables for Volume Itterations Temporal Variables for Diameter Itterations Steps of Diameter Motion Steps of Volume Motion
Results under MetaTrader
This tool gives the trader a clear visibility about the risk management and let him choose the right hour to trade. Moreover, we have to explain that the best moment to trade is when the stability estimator shows a percentage greater than 30% and less than 70%. A very small percentage reflects a high volatility which makes the market behavior unpredictable... while the high percentages reflect a high stability on which the prices do not really change significantly: That is to say, the trader will neither lose nor win. The stability estimation algorithm 1 is implemented using MQL4 programming language. It computes the probability of the market stability for the current hour. The table 3 summarizes the obtained probabilities for many currency pairs. The measures are done at 23:30, 24 December, 2016. See the table 3: Table 3: Market stability estimations for many pairs at 23h. Currency Pair
Stability Probability
AUDUSD EURUSD USDCAD NZDUSD GBPUSD GBPJPY
42.88% 92.38% 31.94% 15.02% 04.50% 31.94%
The data reported in the aforementioned table led us to conclude that the pair EURUSD is the most stable at 23h. In the trading platform, we notice that the prices motion are very slow. This is explained by the fact that the stock exchange of Europe are closed at that moment. However, the pair NZDUSD and GBPUSD seem to be
MedPRAI’18, March 2018, Rabat, Morocco active according to the result of the stability estimation algorithm. In fact, that confirms the reality in real time.
5
CONCLUSION
This paper presents a predictive model based on multivariate maximum likelihood for financial market stability estimation. The purpose of the proposed statistical model is to be implemented as a custom indicator, that can be plugged in a trading platform. This indicator allows us to detect the best hours to trade a given financial instrument. The proposed approach is mathematically proven and experimentally validated by analysing historical data going from 2001 up to 2017. Since the proposed algorithm is able to give the right profitable but not risky moments, it could be interesting to add - as a perspective - this ability to any trader robot to increase its efficiency.
ACKNOWLEDGMENTS The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. Special thanks goes to everyone that improved the language’s quality and made this paper more readable. The work is supported by the Euro-Mediterranean University of Fes.
REFERENCES [1] Armand Fouejieu. 2017. Inflation targeting and financial stability in emerging markets. Economic Modelling 60, Supplement C (Jan. 2017), 51–70. https://doi. org/10.1016/j.econmod.2016.08.020 [2] A. A. B. Branquinho, C. R. Lopes, and A. C. E. Baffa. 2016. Probabilistic Planning for Multiple Stocks of Financial Markets. In 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI). 501–508. https: //doi.org/10.1109/ICTAI.2016.0083 [3] Abir Jaafar Hussain Dhiya Al-Jumeily Martin Randles Haya Al-askar, David Lamb and Paul Fergus. [n. d.]. Predicting financial time series data using artificial immune systemâĂŞinspired neural networks. ([n. d.]). [4] Ken Kreutz-Delgado Jason A. Palmer and Scott Makeig. [n. d.]. An EM algorithm for maximum likelihood estimation of Barndorff-Nielsen’s generalized hyperbolic distribution IEEE Conference Publication. ([n. d.]). http://ieeexplore.ieee.org/ abstract/document/7939245/ [5] J. Jiao, K. Venkat, Y. Han, and T. Weissman. 2017. Maximum Likelihood Estimation of Functionals of Discrete Distributions. IEEE Transactions on Information Theory 63, 10 (Oct. 2017), 6774–6798. https://doi.org/10.1109/TIT.2017.2733537 [6] N. Kanungsukkasem and T. Leelanupab. 2015. Finding potential influences of a specific financial market in Twitter. In 2015 7th International Conference on Information Technology and Electrical Engineering (ICITEE). 414–419. https: //doi.org/10.1109/ICITEED.2015.7408982 [7] Walter Sun. 2003. Relationship between Trading Volume and Security Prices and Returns. MIT Laboratory for Information and Decision Systems P-2638. Massachusetts Institute of Technology. [8] Jesper Rindom Jensen Mads GrÃęsbÃÿll Christensen Tobias LindstrÃÿm Jensen, Jesper KjÃęr Nielsen and SÃÿren Holdt Jensen. [n. d.]. A Fast Algorithm for Maximum-Likelihood Estimation of Harmonic Chirp Parameters - IEEE Journals & Magazine. ([n. d.]). http://ieeexplore.ieee.org/abstract/document/7967871/ [9] H. H. Tung, C. C. Cheng, Y. Y. Chen, Y. F. Chen, S. H. Huang, and A. P. Chen. 2016. Binary Classification and Data Analysis for Modeling Calendar Anomalies in Financial Markets. In 2016 7th International Conference on Cloud Computing and Big Data (CCBD). 116–121. https://doi.org/10.1109/CCBD.2016.032 [10] Ciprian Chirilaa Viorica Chirilaa. 2015. Financial market stability: a quantile regression approach. 20 (2015), 125–130. https://doi.org/10.1016/S2212-5671(15) 00056-8