Applications of Wavelet Neural Networks in Financial Time Series

THESIS SUBMITTED FOR THE AWARD OF THE DEGREE OF
Doctor of Philosophy
IN
COMPUTER SCIENCE
BY
MOHD YASIN PIR
UNDER THE SUPERVISION OF PROF. MOHAMMED ASGER
&
DR. FIRDOUS AHMAD SHAH
DEPARTMENT OF COMPUTER SCIENCES
SCHOOL OF MATHEMATICAL SCIENCES AND ENGINEERING
BABA GHULAM SHAH BADSHAH UNIVERSITY
RAJOURI (J&K)-185234
2016
Abstract

A time series is a sequence of values sampled at successive points in time. Stock market indices, foreign exchange rates, trading prices, inflation rates and asset prices are simple examples of financial time series, and they typically exhibit non-stationary and nonlinear behavior. The basic idea behind time series forecasting is to develop models that estimate future values of a series from its past values. For many decades, predictions of financial time series were attempted with conventional statistical linear models, but these have seldom proved fruitful owing to the noise and non-linearity present in the series. On the other hand, the successful application of non-linear methods in other areas of research has kindled the hopes of financial researchers. Nonlinear dynamics proposes a new way of viewing financial time series and suggests new techniques for empirically measuring their nature. This discipline holds that past values of a time series help determine its future values, but not in a straightforward way: the relation between past and future values is non-linear, which implies that a change in past values can have a wide range of effects on future values.

Wavelets are mathematical functions that decompose data into different frequency components, after which each component is studied with a resolution matched to its scale. These functions are generated by translation and dilation of a single function called the mother wavelet. The main motivation for wavelets and wavelet transforms is that Fourier analysis does not capture the local information of signals and therefore cannot be used to analyze signals in a joint time-frequency domain. Wavelet analysis has shown tremendous performance in financial time series analysis: its flexibility in handling very irregular data series makes it an important tool for extracting information from financial data, with applications ranging from short-term prediction to the testing of market models. Because wavelets decompose a financial time series on a variety of time scales simultaneously, they can precisely locate regime shifts and discontinuities in time, and relationships between economic variables may well differ across time scales. Over the last decade they have been successfully used in forecasting stock market prices, crude oil
prices, GDP growth, trading prices, exchange rates, expenditure and income, money growth and inflation, volatility in foreign exchange markets, price fluctuations, sales, and more.

An artificial neural network (ANN) is a set of interconnected units called neurons that learns from a set of previous observations, without being aware of the relations between them, in order to capture the patterns in the learning set. ANNs simulate the learning and decision-making processes of the human brain, and their learning abilities have drawn considerable attention in financial engineering in recent years. Recent studies have revealed the predictive power of ANNs in approximating any discontinuous function: they can formalize unclassified information and, more importantly, forecast financial time series. Another important advantage of ANNs is that, unlike traditional forecasting methods that assume a linear relationship between inputs and outputs, they can approximate any nonlinear function without any prior assumption about the properties of the data series. In recent years, ANNs have been successfully applied to forecasting financial time series such as stock market indexes, exchange rates, crude oil prices, inflation and gold prices.

The combination of wavelets and neural networks is called a wavelet neural network. The origin of wavelet networks can be traced back to the work of Daugman in 1988, which used Gabor wavelets and a neural network for image classification; the field became popular after the pioneering work of Zhang, Benveniste and Szu in the early 1990s. Wavelet neural networks are still a new field, although some sporadic and isolated attempts have been made in recent years to build a theoretical basis and applications in various areas such as engineering, economics, medical sciences and social science. Wavelet neural networks are well suited to forecasting financial time series because wavelets can decompose the series into their time-scale components and unravel the non-linear relationships between economic variables. In recent years, economists have shown considerable interest in forecasting financial time series using hybrid models. In this study, we combine wavelet analysis and neural networks to investigate different aspects of financial time series with economically meaningful variables, in order to achieve optimal forecasting. Therefore, the goal of this
study is to show the efficiency and significance of wavelets and artificial neural networks in forecasting financial time series. We use the wavelet and neural network toolboxes of Matlab R2010 for the analysis and synthesis involved in forecasting financial time series.

One financial variable that has become particularly popular for forecasting real economic activity is the difference between long-term and short-term risk-free interest rates, usually known as the yield spread. The yield spread has an edge over simple interest rates because short-term rates do not contain all the information that can help to predict future economic activity. The yield spread has been found useful for forecasting such variables as output growth, inflation, industrial production, consumption and recessions, and its ability to predict economic activity has become something of a stylized fact among macroeconomists. Using the yields on securities with maturities ranging from three months to ten years, we analyze four different long-minus-short yield spreads (policy horizon and non-policy horizon) by decomposing them with Daubechies wavelets. We investigate the predictive power of the yield spreads for forecasting IIP growth by first decomposing the yield and IIP growth series using Daubechies wavelets. We observe that, within both the aggregate and the time-scale framework, Wavelet Neural Networks extract slightly better forecasting ability from these spreads than plain Neural Networks, and significantly better results than other conventional techniques.

The financial crises of 2007-09, during which global stock markets underwent significant losses with the US stock market as the epicenter, exposed several weaknesses. We study the interdependence between the US stock market and the markets of India, China, Germany, the UK and Japan using a hybrid Wavelet Multilayer Perceptron Neural Network (MLPNN) model over three control periods: Pre-Crises, Crises and Post-Crises. We use weekly data from the major national stock market indices of six countries: the United States, India, China, Germany, the United Kingdom and Japan. The comparison of stock exchange indexes covers the period from February 1999 to October 2013. We observe the least inter-connection between the US and China markets in the Pre-Crises period, followed by the Indian stock market, which indicates the least co-movement with regard to the US market. In the Crises period, we observe that Germany again has the
highest inter-connection with the US stock market, which also indicates that the German market was more affected by the US crises of 2007-09 than the other markets under study. In the Post-Crises period, we observe that India has the least inter-connection with the US stock market, indicating a decreasing trend in inter-connection between the US and India markets from the Crises to the Post-Crises period as well.

The exchange rate is one of the most important determinants of a country's relative level of economic health. Exchange rates play a vital role in a country's level of trade, which is critical to almost every free market economy in the world. For this reason, exchange rates are among the most watched, analyzed and governmentally manipulated economic measures. Moreover, empirical studies reveal an inverse relationship between the US dollar exchange rate and crude oil prices, a relationship that has attracted the interest of many economists. As both time series (the crude oil prices as well as the effective exchange rate of the dollar) are non-stationary, we investigate the relationship between the real effective exchange rate and crude oil prices with a hybrid wavelet network. We use a simple Multi-layer Perceptron Neural Network (MLPNN) based on wavelet decomposition to analyse the relationship between the real effective exchange rate and crude oil prices. The study for India indicates that crude oil prices affect the real effective exchange rate, and that the hybrid model untangles the relationship between the real effective exchange rate and crude oil better than other models.

Next, we give a comparative study of different wavelet based neural network models for forecasting IIP growth using five different yield spreads. These models differ in the wavelet families used: Haar, Daubechies, Symlets, Coiflets and discrete Meyer. We evaluate the effect of 16 mother wavelet functions on the performance of the hybrid wavelet Multilayer Perceptron Neural Network (MLPNN) IIP Growth–Yield Spreads model using the Discrete Wavelet Transform (DWT) technique. The wavelet based neural network models for IIP Growth–Yield Spread modeling performed better when the approximation was done with the Daubechies wavelet family than with other wavelet families. Moreover, we observe that the subclass db4 performed best within the Daubechies family in terms of approximation, i.e. data preprocessing. It was further observed that the hybrid wavelet based MLPNN model gives better results at the minimum possible decomposition level, i.e. level 1, for IIP Growth–Yield Spread modeling for
all wavelets under consideration. The hybrid model also revealed that the yield spread Sp(10,3), i.e. the non-policy horizon spread, has better forecasting ability than the other four yield spreads in IIP Growth–Yield Spread modeling. The present study successfully demonstrates that the MLPNN model with the db4 wavelet function enhances the performance of neural network models; the selection of a suitable wavelet function and of the optimum scale is very important. It is also observed that the yield spread in general is a reliable predictor of output growth such as IIP, and that the spreads provide good recession forecasts.
Contents

List of Tables ………………………………………………………………… viii
List of Figures ………………………………………………………………… ix
Acknowledgements …………………………………………………………… xi

1. Introduction to Time Series Analysis ……………………………………… 1
   1.1 Time Series ……………………………………………………………… 1
   1.2 Time Series Decomposition …………………………………………… 3
   1.3 Time Series Analysis and Forecasting ………………………………… 4
   1.4 Brief History of Advanced Time Series Forecasting Methods ………… 6
   1.5 Related Work …………………………………………………………… 6

2. Artificial Neural Networks and Wavelet Analysis ………………………… 11
   2.1 Introduction to Artificial Neural Networks …………………………… 11
   2.2 Historical Perspective of Neural Networks …………………………… 12
   2.3 Neural Networks Architectures ………………………………………… 15
   2.4 Learning Paradigms in Artificial Neural Networks …………………… 18
   2.5 Forecasting with Artificial Neural Networks ………………………… 20
   2.6 A Historical Glance on Wavelet Analysis ……………………………… 21
   2.7 An Overview of Wavelet Analysis ……………………………………… 22
   2.8 Daubechies Wavelets …………………………………………………… 26
   2.9 Introduction to Multiresolution Analysis ……………………………… 28

3. Forecasting Models for Time Series Analysis ……………………………… 30
   3.1 Statistical Methods in Time Series Forecasting ………………………… 30
       3.1.1 Exponential Smoothing Methods ………………………………… 30
       3.1.2 Regression Methods ……………………………………………… 34
       3.1.3 Autoregressive Integrated Moving Average (ARIMA) Method … 36
       3.1.4 Threshold Methods ……………………………………………… 39
       3.1.5 Generalized Autoregressive Conditionally Heteroskedastic (GARCH) Method … 40
   3.2 Neural Networks in Time Series Forecasting …………………………… 43
   3.3 Wavelets in Time Series Forecasting …………………………………… 48

4. Wavelet Neural Networks in Financial Forecasting ………………………… 52
   4.1 Design of Wavelet Neural Networks …………………………………… 52
   4.2 Brief Historical Perspective of Wavelet Neural Networks ……………… 55
   4.3 Methodology …………………………………………………………… 59

5. Experimental Results and Discussions ……………………………………… 62
   5.1 Problem 1: Using Yield Spreads to Forecast IIP Growth ……………… 62
   5.2 Problem 2: Analyzing the relationship between Real Effective Exchange Rate and Crude Oil prices with hybrid wavelet model … 73
   5.3 Problem 3: Effect of U.S. Sub-Prime Crises on five major stocks: A study with wavelet networks … 80
   5.4 Problem 4: Comparative Study of Different Wavelet based Neural Network Models for IIP Growth forecasting using different Yield Spreads … 91

6. Conclusion and Future Directions …………………………………………… 98
   6.1 Conclusion ……………………………………………………………… 98
   6.2 Future Directions ……………………………………………………… 100
   6.3 Publications out of this thesis ………………………………………… 103

Bibliography …………………………………………………………………… 104
Similarity Index ………………………………………………………………… 116
Selected Publications out of this thesis ……………………………………… 123
List of Tables

Table 5.1.1: Results of regression analysis (MSE) of Neural Network & Wavelet Neural Network Model ………… 71
Table 5.2.1: Descriptive statistics of time series data ………… 74
Table 5.2.2: Performance results of four models ………… 76
Table 5.2.3: Wavelet based study by Daubechies wavelet ………… 77
Table 5.3.1: Pre-crises period (Feb. 1999 to June 2007) ………… 81
Table 5.3.2: Crises period (July 2007 to December 2009) ………… 81
Table 5.3.3: Post-crises period (January 2010 to October 2013) ………… 81
Table 5.3.4: Pre-crises data with 431 samples (Decomposition by Daubechies Wavelet) ………… 84
Table 5.3.5: Crises data with 130 samples (Decomposition by Daubechies Wavelet) ………… 84
Table 5.3.6: Post-Crises data with 201 samples (Decomposition by Daubechies Wavelet) ………… 85
Table 5.3.7: Hybrid model results ………… 87
Table 5.3.8: ANN model results ………… 88
Table 5.4.1: Summary statistics of time series data ………… 91
Table 5.4.2: Approximation results of different wavelets ………… 92
Table 5.4.3: Wavelet decomposed Neural Network Model (for different wavelets obtained with Sp(10,3) at different lags) ………… 93
Table 5.4.4: Neural Network Model (for different spreads at different time lags) ………… 94
List of Figures

Figure 1.1: Strong Seasonality ………… 2
Figure 1.2: No Seasonality ………… 2
Figure 1.3: Strong increasing trend, strong seasonality ………… 3
Figure 1.4: No trend, seasonality or cyclic behavior ………… 3
Figure 2.1: Artificial Neuron and Multilayered Artificial Neuron Network ………… 11
Figure 2.2: Feed Forward Topology ………… 15
Figure 2.3: Recurrent Topology ………… 15
Figure 2.4: Haar Wavelet ………… 26
Figure 2.5: Daubechies (D4) Wavelet ………… 27
Figure 3.1: Linear Regression on data set ………… 35
Figure 4.1: Structure of Wavelet Neural Network ………… 53
Figure 4.2: A Wavelet Neural Network ………… 53
Figure 4.3: A Multidimensional Wavelon structure ………… 55
Figure 4.4: Framework of proposed model ………… 60
Figure 5.1.1: IIP growth ………… 64
Figure 5.1.2: Yield Spread Sp(1,3) ………… 64
Figure 5.1.3: Yield Spread Sp(10,5) ………… 64
Figure 5.1.4: Yield Spread Sp(5,3) ………… 65
Figure 5.1.5: Yield Spread Sp(10,3) ………… 65
Figure 5.1.6: Model 1 (Neural Network Model) ………… 66
Figure 5.1.7: Normalised IIP growth values ………… 66
Figure 5.1.8: Normalised Yield Spread Sp(1,3) values ………… 67
Figure 5.1.9: Normalised Yield Spread Sp(10,5) values ………… 67
Figure 5.1.10: Normalised Yield Spread Sp(10,3) values ………… 67
Figure 5.1.11: Normalised Yield Spread Sp(5,3) values ………… 68
Figure 5.1.12: Model 2 (Wavelet and Neural Network Model) ………… 68
Figure 5.1.13: SWT denoised IIP growth values ………… 69
Figure 5.1.14: SWT denoised Yield Spread Sp(1,3) values ………… 69
Figure 5.1.15: SWT denoised Yield Spread Sp(10,5) values ………… 69
Figure 5.1.16: SWT denoised Yield Spread Sp(5,3) values ………… 70
Figure 5.1.17: SWT denoised Yield Spread Sp(10,3) ………… 70
Figure 5.1.18: Neural Network forecasts ………… 72
Figure 5.1.19: Wavelet Neural Network forecasts ………… 72
Figure 5.2.1: Crude oil and Exchange rate returns ………… 74
Figure 5.2.2: Oil returns and its approximations by Db4 wavelet ………… 77
Figure 5.2.3: Exchange rate returns and its approximations by Db4 wavelet ………… 78
Figure 5.2.4: Framework of Hybrid Wavelet and Neural Network Model ………… 79
Figure 5.2.5: Comparison of performance measures ………… 79
Figure 5.3.1: Approximations by Daubechies wavelet (pre-crises data) ………… 85
Figure 5.3.2: Approximations by Daubechies wavelet (crises data) ………… 86
Figure 5.3.3: Approximations by Daubechies wavelet (post-crises data) ………… 86
Figure 5.3.4: Performance of hybrid model ………… 88
Figure 5.4.1: Approximations by different wavelets ………… 93
Figure 5.4.2: IIP growth vs. Spreads - Wavelet decomposed Neural Network Model (for different wavelets obtained with Sp(10,3) at different lags) ………… 95
Figure 5.4.3: IIP Growth vs. Spreads - Neural Network Model ………… 95
Acknowledgements

I genuinely express my gratitude to my mentor, guide and supervisor, Prof. Mohammed Asger. His dedication, keen interest and, above all, his overwhelmingly supportive attitude were chiefly responsible for the completion of my work. His timely advice, meticulous scrutiny and scientific approach have helped me to a very great extent in accomplishing this task. I am highly grateful to my co-supervisor, Dr. Firdous Ahmad Shah, for his help at every stage of my research; his timely suggestions, scholarly advice and dynamism have enabled me to complete my thesis. I generously thank my friends and ex-colleagues, Mr. Qamar Rayees Khan and Dr. Tasleem Arif, for their valuable suggestions and support throughout my study period. It is my privilege to thank my parents and my wife for their constant encouragement throughout my research period. I am highly obliged to my daughter, Taqwa, a source of inspiration for this study. Above all, I am thankful to the Great Almighty, sea of knowledge and wisdom, for His countless blessings.

Mohd Yasin Pir
Chapter 1

Introduction to Time Series Analysis

1.1. Time Series

A time series is a set of statistics (or data points) consisting of successive measurements made over a uniform time interval. Examples of time series are the daily closing value of a stock market index, daily exchange rates, the unemployment rate, share prices, air quality readings, and ECG and brain wave activity. Time series data occur in many application areas, such as finance, economics, weather forecasting, signal processing, pattern recognition, astronomy, medicine, communications engineering, and many other domains of applied science and engineering. A time series can be continuous or discrete. When observations are made continuously through time, the time series is said to be continuous, although the measured data values can be discrete in nature. When observations are taken at specific times, usually at a uniform interval, the time series is said to be discrete, although the measured data values can be continuous in nature [1].

The following types of time series patterns are usually observed in practice [2]:

1. Trend: A time series is said to follow a trend when there exists a long-term increase or decrease in the data values, which need not be linear. Sometimes we refer to a trend "changing direction" when it goes from an increasing trend to a decreasing trend.
2. Seasonal: A time series is said to exhibit a seasonal pattern when it is of fixed and known period and is influenced by seasonal factors such as the day of the week or the month or quarter of the year.
3. Cyclic: A time series follows a cyclic pattern when the data values exhibit rises and falls that are not of fixed period; the duration of these fluctuations is usually at least two or more years.
4. Irregular fluctuations: Irregular fluctuations are whatever variation is left over after trend, seasonality and any other systematic effects have been removed. Since these fluctuations are completely random in nature, they cannot be
forecast, even though short-term correlations or one-off discontinuities may be present in them [4].

Figures 1.1 to 1.4 depict examples of the above patterns. Figure 1.1 shows strong seasonality within each year, together with strong cyclic behaviour with a period of about 6-10 years; there is no apparent trend in the data over this period. Figure 1.2 shows a series with no seasonality but an obvious downward trend. Figure 1.3 shows a series with a strong increasing trend and strong seasonality, with no evidence of any cyclic behaviour. The series in Figure 1.4 has no trend, seasonality or cyclic behaviour; its random fluctuations do not appear to be very predictable, and there are no strong patterns that would help in developing a forecasting model.
Figure 1.1: Strong Seasonality
Figure 1.2: No Seasonality
Figure 1.3: Strong increasing trend, strong seasonality
Figure 1.4: No trend, seasonality or cyclic behavior
1.2. Time Series Decomposition

A time series x_t comprises three components: a seasonal component, a trend-cycle component (containing both trend and cycle), and a remainder component (also called the irregular or error component) containing the residual of the time series. An additive model can be written as

x_t = S_t + T_t + R_t,   --- (1.1)

where x_t is the data, S_t is the seasonal component, T_t is the trend-cycle component, and R_t is the remainder component, all at period t. A multiplicative model can be written as

x_t = S_t · T_t · R_t.   --- (1.2)
If the magnitude of the seasonal fluctuations, or of the variation around the trend-cycle, does not vary with the level of the time series, the additive model is most appropriate. A multiplicative model is more appropriate when the variation in the seasonal pattern, or the
variation around the trend-cycle, is proportional to the level of the time series. Multiplicative models are most commonly used for economic time series. An alternative to the multiplicative model is to transform the data (for example, by a log transformation) until the variation in the series appears stable over time, and then to use an additive model; this is equivalent to a multiplicative decomposition, since

x_t = S_t · T_t · R_t is equivalent to log x_t = log S_t + log T_t + log R_t.   --- (1.3)

The trend-cycle component is sometimes simply called the trend component, even though it may also contain cyclic behaviour [2].
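To make the decomposition concrete, the following minimal sketch (hypothetical; it uses Python with pandas and statsmodels purely for illustration, and a synthetic monthly series, none of which are the tools or data of this thesis) performs a classical additive decomposition and verifies equation (1.1):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series: trend + yearly seasonality + noise (illustrative)
rng = pd.date_range("2000-01", periods=120, freq="M")
t = np.arange(120)
noise = np.random.default_rng(0).normal(0, 0.3, 120)
series = pd.Series(0.05 * t + 2.0 * np.sin(2 * np.pi * t / 12) + noise, index=rng)

# Classical additive decomposition: x_t = S_t + T_t + R_t  (eq. 1.1)
result = seasonal_decompose(series, model="additive", period=12)
trend, seasonal, remainder = result.trend, result.seasonal, result.resid

# The components add back up to the original series wherever the
# moving-average trend estimate is defined (i.e. away from the endpoints)
mask = trend.notna()
print(np.allclose(series[mask], (trend + seasonal + remainder)[mask]))
```

A multiplicative decomposition is obtained the same way with model="multiplicative", or, per equation (1.3), by applying the additive decomposition to the log of the series.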
1.3. Time Series Analysis and Forecasting

1.3.1. Introduction

The main objective of time series analysis is to analyze time series data in order to extract meaningful statistics and other characteristics. The data points of a time series exhibit characteristics such as autocorrelation, trend or seasonality, and time series analysis seeks to determine a model that describes the pattern of the data [3].

The main objectives of time-series analysis are [4]:

1. Describing the data using summary statistics and other useful characteristics.
2. Finding a suitable model to describe the data-generating process, referred to as modeling. If a value of a given variable is based only on its own past values, the model is said to be univariate; if it is based on past values of that variable and also on present and past values of other variables, the model is said to be multivariate.
3. Estimating future values of the time series (forecasting).
4. Controlling a given process (industrial, economic or other) on the basis of good forecasts.
1.3.2. Types of Forecasting Methods

A forecasting method computes forecasts from present and past values; it may be a simple algorithmic rule and need not depend on an underlying probability model. In some literature the term forecasting model is used synonymously for a forecasting method, although the two terms are distinct and are often used rather loosely. Forecasting methods may be broadly classified into three categories:

1. Judgmental forecasts: Forecasts are based on intuition, subjective judgment, earlier knowledge or any other relevant information.
2. Univariate forecasting methods: Forecasts depend only on present and past values of the single time series being forecast, possibly augmented by a function of time (such as a linear trend).
3. Multivariate forecasting methods: Forecasts depend at least partly on values of one or more additional time series variables, usually referred to as predictor or explanatory variables. When the variables are jointly dependent, the forecasts depend on a multivariate model involving more than one equation.

A forecasting method may generally combine more than one of the above approaches. Sometimes it is rather difficult to express external information formally in a mathematical model; in this case, univariate or multivariate forecasts are adjusted subjectively [4]. In time series forecasting, past observations are collected and analyzed to develop a suitable mathematical model that captures the underlying characteristics of the series, and future events are then predicted using this model. This approach is particularly useful when a satisfactory explanatory model is lacking and little is known about the statistical pattern followed by the data. Time series forecasting has important applications in various fields, and important strategic decisions and preventive measures are often based on forecasting results. Making a good forecast by fitting an adequate model to time series data is therefore very important, and many efforts have been made by researchers over the past several decades for the development and advancement of suitable time series forecasting models [5-7].
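As a concrete illustration of a univariate method of the kind classified above, the following sketch (hypothetical; plain Python/numpy, with an autoregression chosen purely as an example, not as the method of this thesis) fits an AR(p) model by least squares and produces a one-step-ahead forecast from the p most recent values:

```python
import numpy as np

def fit_ar_least_squares(x: np.ndarray, p: int) -> np.ndarray:
    """Fit AR(p) coefficients (plus intercept) by ordinary least squares."""
    # Design matrix: row for time t holds [1, x_{t-1}, ..., x_{t-p}]
    rows = [np.r_[1.0, x[t - p:t][::-1]] for t in range(p, len(x))]
    X, y = np.array(rows), x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast_next(x: np.ndarray, coef: np.ndarray) -> float:
    """One-step-ahead forecast from the last p observed values."""
    p = len(coef) - 1
    return float(coef @ np.r_[1.0, x[-p:][::-1]])

# Example: a noisy AR(2)-like series
rng = np.random.default_rng(0)
x = np.zeros(300)
for t in range(2, 300):
    x[t] = 0.6 * x[t - 1] - 0.2 * x[t - 2] + rng.normal(0, 0.1)

coef = fit_ar_least_squares(x, p=2)
print("coefficients:", coef)                  # roughly [0, 0.6, -0.2]
print("next value forecast:", forecast_next(x, coef))
```

A multivariate method would extend the design matrix with lagged values of additional explanatory series.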
1.4. Brief History of Advanced Time Series Forecasting Methods

Before the early 1920s, time series forecasts were calculated by simply extrapolating the series, until Yule introduced auto-regressive techniques to forecast the annual number of sunspots, using a weighted sum of previous data values to compute forecasts. After this, linear systems with noise taken into account were widely used for the next five decades, until the ARIMA methodology was proposed by Box and Jenkins. Subsequent studies focused on non-stationarity and non-linear time series. During the 1980s, the computing ability of personal computers enabled the processing of long time series and thereby the development of sophisticated algorithms, which contributed immensely to machine learning techniques such as artificial neural networks; these produced very accurate forecasts for most non-linear time series data, and from that point on artificial neural networks were used for time series forecasting. A new approach, the fuzzy time series forecasting model introduced by Song and Chissom, subsequently drew a great deal of attention. After a few years of success, it was observed that artificial neural networks had some limitations in learning patterns, because stock market and other financial data carry tremendous noise and complex dimensionality; consequently the support vector machine, a novel network algorithm developed by Vapnik and others, received considerable attention and success in time series forecasting problems. In recent years, empirical findings have revealed that hybrid models combining ARIMA, fuzzy methods, artificial neural networks, wavelets and so on give more reliable forecasts than conventional forecasting methods [8, 9].

1.5. Related Work

Financial time series data such as exchange rates, stock market indices, inflation, GDP growth and employment rates are inherently non-stationary and are regarded as among the most challenging applications in time series forecasting. Considerable research effort in the past few decades has been devoted to forecasting financial time series. In this section we briefly review some of the previous work in this area. The forecasting ability of traditional neural network models has been demonstrated by many studies. Multilayer feed-forward neural networks were used in [23] to forecast the Taiwan
Dollar/US Dollar exchange rate, giving better results than ARIMA models; that study concluded that the neural network approach is a competitive and robust method for forecasting financial data. Three neural-network-based forecasting models, Standard Back Propagation (SBP), Scaled Conjugate Gradient (SCG) and Back Propagation with Bayesian Regularization (BPR), were used in [15] to forecast the exchange rates of six foreign currencies against the Australian dollar. The study revealed that neural-network-based models perform better than the traditional ARIMA model, and that the SCG-based model outperforms all the others. The study in [18] reveals that a forecasting model based on a neural network and a genetic algorithm provides significantly better performance than traditional artificial neural network and statistical time series modelling approaches. An evolutionary algorithm that automatically builds radial basis function neural networks for exchange rate forecasting is used in [19]; the outcome reveals a considerable improvement in results over conventional methods. Some studies reveal that a hybrid model can take advantage of the strong points of both of its constituent models: for example, in [6] a hybrid forecasting model combining ARIMA and neural network approaches is used to forecast exchange rates, and the study reveals that the hybrid model performs better than either of the two individual models. A study in [12] observes that traditional learning approaches have difficulty with time series prediction when the time series is non-stationary and noisy; for this reason, a hybrid model combining symbolic processing and recurrent neural networks is used to solve the problem. A wide range of different methods has been proposed for forecasting exchange rates. The study in [22] compares the forecasting ability of three different recurrent neural network architectures; a trading strategy is then devised, based on the forecasting results, to optimize the profitability of the system. The empirical study in [16] proposes a multistage neural network meta-learning method for forecasting the US Dollar/Euro exchange rate, and the results demonstrate that the proposed model outperforms the individual ARIMA, BPNN and SVM models and some hybrid models. The models in [11] examine forecasting and trading of the US Dollar/Euro exchange rate; the authors demonstrate that neural network regression gives better forecasting and trading results than ARMA models. A hybrid model of wavelets and neural networks has been widely used for forecasting
electricity demand [13], [27-28], oil prices [24], stock indices [11], and other time series, but there are not many applications in which wavelet neural networks are used for exchange rate forecasting. A hybrid model of a Wavelet Neural Network and a Genetic Algorithm is used in [21] for forecasting exchange rates; the experimental results demonstrate that the proposed hybrid method provides reasonable performance for different forecasting horizons, and the author claims that the forecasting accuracy of the hybrid model does not decline as the forecasting horizon increases. The empirical study illustrates that this hybrid model performs better for time series forecasting than other models. The comparison of forecast accuracy in [20] demonstrates that the procedure for non-stationary time series using a wavelet neural network yields the best forecast when compared to MAR and ARIMA, since it obtains the smallest RMSE on the testing data. The Wavelet Neural Network (WNN) and back-propagation neural network (BPNN) models are compared in [25] for currency crisis forecasting accuracy, using in-sample and out-of-sample tests. The study reveals that WNN can be successfully applied to currency crises, since it effectively captures the economic variables associated with them and provides a more potent tool for analyzing macroeconomic time series data. A combined methodology based on wavelet analysis and artificial intelligence is used in [10] to predict the A300 index (China) and the NASDAQ index (USA), and the results are compared with a wavelet-ARIMA model and a simple back propagation neural network. The study demonstrates that the combined wavelet and neural network model has better predictive power than the other two methodologies. A study in [14] uses a combined wavelet transform and recurrent neural network based on the artificial bee colony algorithm (called ABC-RNN) for stock price forecasting.
Simulation results for many international stock markets, including the Dow Jones Industrial Average Index (DJIA), the London FTSE-100 Index, the Tokyo Nikkei 225 Index, and the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX), demonstrate that the proposed hybrid system is highly capable and can be implemented in a real-time trading system for forecasting stock prices to maximize profits. The study in [17] combines artificial intelligence modeling techniques with wavelet multiresolution methodology for forecasting daily spot foreign exchange rates of major internationally traded currencies. The empirical study demonstrates that the combined methodology has higher
performance when compared to traditional exchange rate forecasting models. For non-linear systems and non-stationary financial time series, traditional prediction models are not able to achieve agreeable forecasting results; the combined wavelet neural network model overcomes the deficiency of traditional forecasting models, which are limited to linear systems. The study in [26], which uses Shanghai stock market returns from January 10, 2006 to July 18, 2008 to compare the simulation error of stock market returns between a back propagation network and a wavelet neural network, observes that the wavelet neural network gives more accurate simulation results, and that the wavelet neural network so constructed can forecast stock market returns more accurately. A study in [29] uses an integrated wavelet transform, recurrent neural network and artificial bee colony algorithm to forecast the Taiwan Stock Exchange Capitalization Weighted Stock Index. A wavelet transform from the Haar wavelet family is used to decompose the stock price time series, and the artificial bee colony (ABC) algorithm is used to optimize the recurrent neural network weights and biases. The empirical study demonstrates that this combined approach outperforms previous methods in the literature for predicting the Taiwan Stock Exchange Capitalization Weighted Stock Index. A study in [30] uses a discrete wavelet transform and a back propagation neural network for predicting the monthly closing price of the Shanghai Composite Index (SHI). The future values of the stock index are predicted by using the low-frequency signals as input to the neural network, and it is observed that the wavelet signals improve the accuracy of neural networks in comparison with earlier studies. The study in [31] proposes a wavelet neural network model for short-term forecasting of stock returns from high-frequency financial data. The combined model of wavelets and artificial neural networks has the inherent capability to capture the non-stationary and non-linear attributes embedded in financial time series. Accurate results are achieved with the hybrid model, even though no predefined parametric model is needed to initiate the simulation process; the proposed neuro-wavelet network shows superior modeling and forecasting ability when compared against two parametric methods. A GMDH neural network model is proposed in [32] for forecasting the price of crude oil, assuming the price series has time-varying variance. The oil price data sets are pre-processed with a wavelet transform before being fed to the GMDH neural network. Both smoothed and non-smoothed data sets are used
to calculate the moving average and the time-varying variance, and the best performance results are obtained in this case; the authors claim more than 40% improvement in prediction accuracy for the smoothed data sets using the hybrid model. A study in [33] uses a new method called an improved wavelet neural network, in which a Genetic Algorithm is used to optimize the initial weights, stretching parameters and movement parameters. The study demonstrates that this improved method is effective and has better performance in stock market prediction for the Shanghai index.
Chapter 2

Artificial Neural Networks and Wavelet Analysis

2.1. Introduction to Artificial Neural Networks (ANN)

An ANN is a network of highly interconnected processing elements (neurons) operating in parallel; it attempts to simulate the parallel nature of the human brain and is inspired by biological nervous systems. The connections between the neurons largely determine the network function. A subgroup of processing elements in the network is called a layer: the first layer is the input layer, the last layer is the output layer, and between them there may be one or more additional layers of units, called hidden layers [34].
Figure 2.1: Artificial Neuron and Multilayered Artificial Neuron Network
Artificial neurons, or simply nodes, are the basic processing elements of neural networks. In a simplified mathematical model of the neuron, the effects of the synapses are represented by connection weights that modulate the effect of the associated input signals, and the non-linear characteristic exhibited by neurons is represented by a transfer function. The neuron impulse is computed as the weighted sum of the input signals, transformed by the transfer function. The learning capability of an artificial neuron is achieved by adjusting the weights in accordance with the chosen learning algorithm. Figure 2.1 illustrates a typical artificial neuron and the modeling of a multi-layered neural network; the signal flow from the inputs X_1, X_2, ..., X_n is considered to be unidirectional. The
flow of signal is indicated by the arrows, and the neuron output signal O is represented by the relationship

O = f(net) = f(∑_{j=1}^{n} w_j x_j),   --- (2.1)

where w_j are the components of the weight vector, f(net) is the activation (or transfer) function, and

net = w^T x = w_1 x_1 + w_2 x_2 + … + w_n x_n   --- (2.2)

is the scalar product of the weight and input vectors (T denotes the transpose). For a threshold activation, the output value O is computed as

O = 1 if w^T x ≥ θ,  O = 0 if w^T x < θ,   --- (2.3)

where θ is the threshold level; this type of node is called a linear threshold unit [35].
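To illustrate equations (2.1)-(2.3), here is a minimal sketch of a single linear threshold unit (hypothetical; written in Python with numpy for illustration, and configured, as an assumed example, to compute logical AND):

```python
import numpy as np

def linear_threshold_unit(x: np.ndarray, w: np.ndarray, theta: float) -> int:
    """Single artificial neuron with a hard threshold activation.

    net = w^T x   (eq. 2.2)
    O   = f(net)  (eq. 2.1), with f the step function of eq. 2.3
    """
    net = w @ x                      # scalar product of weights and inputs
    return 1 if net >= theta else 0  # threshold activation (eq. 2.3)

# Example: a 2-input neuron computing logical AND with w = [1, 1], theta = 1.5
w, theta = np.array([1.0, 1.0]), 1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", linear_threshold_unit(np.array(x, dtype=float), w, theta))
```

Replacing the step function with a differentiable transfer function (e.g. a sigmoid) is what later allows such units to be trained by gradient-based algorithms such as back propagation.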
2.2. Historical Perspective of Neural Networks

The neurophysiologist Warren McCulloch and the mathematician Walter Pitts wrote a paper in 1943 on how neurons in the brain might work. This pioneering work is regarded as the beginning of the modern era of neural networks: they modeled a simple neural network using electrical circuits to describe how neurons in the brain work. Their model of a neuron is assumed to follow an all-or-none law. They showed that a network constructed from a sufficient number of such simple units, with synaptic connections set properly and operating synchronously, can compute any computable function. It is generally agreed that both neural networks and artificial intelligence were born with this work. The publication of Donald Hebb's book, The Organization of Behavior, in 1949 is regarded as the next major development in neural networks: Hebb suggested that neural pathways
are strengthened each time they are used. This concept is fundamentally essential to the way humans learn; Hebb argued that the connections between nerves are enhanced if they fire at the same time. Ashby's book, Design for a Brain: The Origin of Adaptive Behavior, published in 1952, was concerned with the basic notion that adaptive behavior is not inborn but learned. He argued that, through learning, the behavior of an animal or system usually changes for the better; the book emphasized the dynamic aspects of the living organism as a machine and the associated concept of stability. In the 1950s it finally became possible to simulate a hypothetical neural network, as computers had become sufficiently advanced. The first attempt in this regard was made by Nathaniel Rochester of the IBM research laboratories, but it failed. Designing a reliable network from neurons viewed as unreliable components is an issue of particular concern in the context of neural networks; in 1956, Von Neumann addressed this important problem using the concept of redundancy. In 1963, this motivated Winograd and Cowan to suggest a distributed redundant representation for neural networks, demonstrating that a large number of elements could collectively represent an individual concept, with a corresponding increase in robustness and parallelism. In 1958, a new approach to the pattern recognition problem was introduced by Rosenblatt in his work on the perceptron, which is regarded as a novel method of supervised learning. Widrow and Hoff introduced the least mean-square (LMS) algorithm in 1960 and used it to formulate the Adaline (adaptive linear element), which differed from the perceptron in its training procedure. Although the 1960s are regarded as the classical period of the perceptron, and it seemed that neural networks could do anything, this lasted only until Minsky and Papert published their book in 1969. They used mathematics to demonstrate fundamental limits on what single-layer perceptrons can compute, and argued that there was no reason to suppose that multilayer versions could overcome these limitations. The problem of assigning credit to hidden neurons in the network, referred to as the credit
assignment problem, was an important problem encountered in the design of multilayer perceptrons, and by the late 1960s most of the ideas and concepts necessary to solve it had already been formulated. The ideas underlying recurrent (attractor) networks, now referred to as Hopfield networks, were also formulated in this period. But it was not until the 1980s that solutions to these basic problems emerged; from the engineering perspective, we may look back on the 1970s as a decade of dormancy for neural networks. In 1982, John Hopfield of the California Institute of Technology (Caltech) presented his paper to the National Academy of Sciences. With this, attention in the field was renewed, as he formulated a new approach to creating more useful machines by using bi-directional connections. To devise a new way of understanding, he performed the computation using recurrent networks with symmetric synaptic connections and an energy function; before this concept, the connections between neurons had been only unidirectional. In the same year, Reilly and Cooper used a hybrid network consisting of multiple layers, each layer with a different problem-solving strategy. Kohonen published a paper on self-organizing maps in 1982, in which he proposed using a one- or two-dimensional lattice structure; this is again regarded as another important development in neural networks. The concept was different in several respects, received far more attention, and in fact became the benchmark against which other innovations in this field are evaluated. The back propagation (BP) algorithm was developed by Rumelhart, Hinton and Williams in 1986, and they demonstrated how it can be used in machine learning. In 1988, Broomhead and Lowe set out a procedure for the design of layered feed-forward networks using radial basis functions (RBF), which provided an alternative to multilayer perceptrons. In the early 1990s, Vapnik and his coworkers invented support vector machines, a computationally powerful class of supervised learning networks, and proposed their use for solving regression, pattern recognition and density estimation problems. From the early days of McCulloch and Pitts, neural networks have certainly come a long way; they have established themselves as an interdisciplinary subject with deep roots in
the neurosciences, mathematics, the physical sciences, engineering and psychology; needless to say, they are here to stay and will continue to grow in theory, design and applications [36].

2.3. Neural Network Architectures
The basic architecture of a neural network consists of three types of neuron layers: the input layer, the hidden layer(s) and the output layer. The manner in which the individual artificial neurons are interconnected is referred to as the topology or architecture of an artificial neural network. The interconnections can be made in a number of ways, resulting in numerous topologies, which are subdivided into two basic classes: feed-forward and recurrent. In a simple feed-forward topology (an acyclic graph), information flows in only one direction, from input to output. In a simple recurrent topology (a semi-cyclic graph), information flows not only from input to output but also in the opposite direction.
Figure 2.2: Feed Forward Topology
Figure 2.3: Recurrent Topology
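To make the feed-forward topology concrete, the following small sketch (hypothetical; Python/numpy, with the layer sizes, random weights and sigmoid activation being illustrative assumptions rather than anything specified in this thesis) computes one forward pass through a network with a single hidden layer:

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def forward(x: np.ndarray, W1: np.ndarray, b1: np.ndarray,
            W2: np.ndarray, b2: np.ndarray) -> np.ndarray:
    """One forward pass: input -> hidden layer -> output layer.

    Information flows in one direction only, with no back-loops,
    as in the feed-forward topology of Figure 2.2.
    """
    h = sigmoid(W1 @ x + b1)     # hidden layer activations
    return sigmoid(W2 @ h + b2)  # output layer activations

# Illustrative 3-input, 4-hidden, 1-output network with random weights
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
print(forward(np.array([0.5, -1.0, 2.0]), W1, b1, W2, b2))
```

A recurrent topology would additionally feed hidden or output activations from one time step back into the network at the next step.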
Some of the common types of artificial neural networks are discussed briefly here:

2.3.1. Feed-forward Artificial Neural Networks

In this type of artificial neural network, information flows from input to output in only one direction; there are no back-loops. There are no restrictions on the number of connections between individual artificial neurons, the number of layers, or the type of transfer function used in an individual artificial neuron. In its simplest form, it consists of a single perceptron, which is capable of learning only linearly separable problems.
2.3.2. Recurrent Artificial Neural Networks

This type of artificial neural network is analogous to a feed-forward neural network except that there are no restrictions regarding back-loops: information flow is no longer unidirectional but is transmitted backwards as well. This type of network creates its own internal state, which allows it to exhibit dynamic temporal behaviour; it also uses its internal memory to process arbitrary sequences of inputs. The fully recurrent artificial network, in which every artificial neuron is directly connected with every other artificial neuron in all directions, is the most fundamental recurrent topology; other recurrent artificial neural networks, such as the Elman, Jordan, Hopfield and bi-directional networks, can be considered special cases of it.
2.3.3. Hopfield Artificial Neural Network

A recurrent artificial neural network used to store one or more stable target vectors is referred to as a Hopfield network. A stable vector is a memory that the network recalls when presented with a similar vector, which serves as a cue to the network's memory. These networks use only two values (binary states) for their units, where a unit's state is determined by whether its input exceeds its threshold; binary units take the values 1 or -1, or 1 or 0. Training this type of network amounts to lowering the energy of the states that the artificial neural network must remember.
2.3.4. Elman and Jordan Artificial Neural Networks

An Elman network is a special case of recurrent artificial neural network, differing from a conventional two-layer network in that the first layer has a recurrent connection. In its simplest form, a three-layer Elman network contains a back-loop from the hidden layer to the input layer through a context unit. This type of network has memory, which allows it both to detect and to generate time-varying patterns. Usually this type of network contains sigmoid artificial neurons in its hidden layer and linear artificial neurons in its output layer; with enough artificial neurons in the hidden layer, this combination of transfer functions can approximate any function with arbitrary accuracy. The network is able to store information
and is therefore capable of generating temporal and spatial patterns and responding to them. Jordan networks are similar to Elman networks except that the context units are fed from the output layer instead of the hidden layer.
2.3.5. Long Short Term Memory

This is a special case of recurrent artificial neural network that learns from experience to process, classify and predict time series. These networks can handle long time lags of unknown size between important events, and they outperform several other recurrent artificial neural networks and other sequence learning methods. Built from Long Short Term Memory blocks, these networks achieve the ability to remember a value for an arbitrary length of time through gates, which regulate when an input is significant enough to remember, when to forget it, and when to output its value.
2.3.6. Bi-directional Artificial Neural Networks

These networks are designed to predict complex time series. They consist of two individual interconnected artificial neural sub-networks that perform direct and inverse transformations, bi-directional in nature. Interconnection between the sub-networks is attained using two dynamic artificial neurons capable of remembering their internal states. The interconnection between the future and past values of the processed signals increases the time series forecasting capability: these networks can predict future values of the input data and also past values. Learning proceeds in two stages: in the first stage, one artificial neural sub-network is taught to predict the future, and in the second stage, the other artificial neural sub-network is used to predict the past.
2.3.7. Self-Organizing Map (SOM)

These networks are related to feed-forward networks but differ in the arrangement of their neurons, which lie on a hexagonal or rectangular grid. SOMs use the unsupervised learning paradigm to create a low-dimensional, discrete representation of the input space of the training samples, referred to as a map. This makes SOMs very useful for producing low-dimensional views of high-dimensional data. In comparison to other artificial neural networks they are different in that they use a neighbourhood function to preserve the topological properties of the input space. These networks learn to detect correlations and regularities in their input and can also adapt their future responses accordingly.
2.3.8. Physical Artificial Neural Network

Although most artificial neural networks are software-based, it is also possible to build them from physical elements. The history of physical artificial neural networks goes back to the 1960s, when the first physical artificial neural networks were created from memory transistors, called memistors, which mimic the synapses of artificial neurons. These networks were commercialized but did not last long, owing to their lack of scalability. Several other attempts to create physical artificial neural networks followed, such as those based on phase-change materials or nanotechnology [37-41].
2.3.9. Stochastic Artificial Neural Network

These networks are built by introducing random variations into the network, either by giving the network's neurons stochastic transfer functions or by introducing stochastic weights. The random fluctuations enable such a network to escape from local minima, which makes these networks useful tools for optimization problems. Stochastic neural networks built using stochastic transfer functions are often referred to as Boltzmann machines.

2.4. Learning Paradigms in Artificial Neural Networks

The capability of a neural network to learn from its environment and to improve its performance through learning over time, in accordance with some prescribed measure, is the significant and primary property of a neural network.
A network learns about its environment through an interactive process of adjustments applied to its synaptic weights and bias levels; ideally, after each iteration of the learning process, the network becomes more knowledgeable about its environment. The three major learning paradigms are supervised, unsupervised and reinforcement learning. Each can typically be employed by any given type of artificial neural network architecture, and each learning paradigm has several training algorithms.
2.4.1. Supervised Learning

A machine learning technique that sets the parameters of an artificial neural network from training data is referred to as supervised learning, or classification. A broad variety of
classifiers is available, such as decision trees, support vector machines, multilayer perceptrons, the k-nearest neighbor algorithm, Gaussian mixture models, Naive Bayes and radial basis function networks. Each of these classifiers has its strengths and weaknesses, and choosing an appropriate classifier for a problem is still more an art than a science. In supervised learning, the following steps are usually taken to solve a given problem (a small illustrative sketch follows the list):

Step 1: Determine the type of training examples.
Step 2: Gather a training data set that satisfactorily describes the given problem.
Step 3: Describe the gathered training data set in a form understandable to the chosen artificial neural network.
Step 4: Learn and, after the learning, analyze the performance of the network with test or validation data, i.e. data that was not presented to the network during learning.
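The four steps can be illustrated with a minimal sketch (hypothetical; scikit-learn's MLPClassifier and a synthetic data set are assumed here purely for illustration and are not the tools used in this thesis):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Steps 1-2: choose the type of examples and gather a (here: synthetic) data set
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Step 3: put the data in a form the network understands; hold out test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Step 4: learn, then evaluate on data never shown to the network in training
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```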
2.4.2. Unsupervised Learning

A machine learning technique that sets the parameters of an artificial neural network from given data together with a cost function to be minimized is referred to as unsupervised learning. The cost function is any function determined by the task formulation that seeks to capture how the data is organized. For example, in clustering, the data is categorized into different clusters by similarity. This technique falls within the domain of estimation problems and is used in various applications such as statistical modeling, compression, filtering, clustering and blind source separation. This type of learning differs from supervised and reinforcement learning in that the network is given only unlabeled examples. Self-organizing maps generally employ unsupervised learning algorithms.
2.4.3. Reinforcement Learning

This is a machine learning technique that sets the parameters of an artificial neural network by generating data from interactions with the environment. It is concerned with how a network takes actions in an environment so as to maximize its long-term reward, and it is most often used as part of the network's overall learning algorithm. Several algorithms exist for finding the policy that maximizes the return function. For example, the naive brute-force algorithm first calculates the return function for each possible policy and then chooses the policy with the largest return (see the sketch after this section). The obvious weakness of this algorithm arises when the number of possible policies is extremely large or even infinite; it can be overcome by direct policy estimation or by a value-function approach. Direct policy estimation finds the optimal policy by searching directly in policy space, which in turn increases the computational cost, while a value-function approach maximizes the return by keeping a set of estimates of expected returns for one policy, usually either the current or the optimal one. Reinforcement learning is particularly suited to problems with a trade-off between long-term and short-term reward and has been found useful in telecommunications, robot control and other sequential decision-making tasks.

There can be several learning rules. The five basic learning rules are:
a. Error-correction learning,
b. Memory-based learning,
c. Hebbian learning,
d. Competitive learning, and
e. Boltzmann learning.
Error-correction learning is rooted in optimum filtering, and memory-based learning operates by memorizing the training data explicitly; Hebbian learning and competitive learning are both inspired by neurobiological considerations, while Boltzmann learning is based on ideas borrowed from statistical inference [36-42].
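A toy sketch of the naive brute-force policy search described above; the three-state, two-action problem and its rewards are hypothetical, chosen only to make the enumeration concrete:

```python
from itertools import product

# Hypothetical deterministic rewards for a 3-state, 2-action problem
states, actions = [0, 1, 2], [0, 1]
reward = {(s, a): (s + 1) * (1 if a == 1 else -1)
          for s in states for a in actions}

def episode_return(policy):
    """Return of one pass through all states under a deterministic policy."""
    return sum(reward[(s, policy[s])] for s in states)

# Brute force: evaluate every possible policy, keep the best one
best = max(product(actions, repeat=len(states)), key=episode_return)
```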
2.5. Forecasting with Artificial Neural Networks

Since the invention of the back-propagation (BP) algorithm for training feed-forward multilayered neural networks, ANNs have been widely used for many types of problems in industry, business and science. Time-series forecasting is among the major uses of ANNs, and many successful applications suggest that ANNs are a promising alternative tool for both researchers and practitioners in forecasting time series data. Linear statistical methods were used for forecasting for several decades, but the ability of ANNs to generalize to any non-linear forecasting model has made them popular. Linear models do hold several advantages in implementation and interpretation, but they also have severe limitations: since real-world problems often involve complex non-linear relationships, linear models cannot capture these relationships in the data, and their approximations to complicated nonlinear forecasting problems are often inadequate. Compared with model-based nonlinear methods, the ANN is a non-parametric, data-driven approach: ANNs can capture nonlinear relationships without prior assumptions about the underlying relationship for a particular problem. Over the past few decades there has been a remarkable surge in research activity on using ANNs for forecasting, and a lack of systematic investigation is a prime cause of the inconsistencies reported in ANN model building and forecasting. In the last decade or so, ANNs have been coupled with other advanced methods such as fuzzy logic and wavelet analysis to improve their capability for data interpretation and modeling and to avoid subjectivity in the operation of the training algorithm. Support vector machines have also arisen in recent years, in parallel with artificial neural networks, as a set of high-performance supervised generalized linear classifiers. In several cases, problems can be solved more efficiently by combining ANNs with one or two other techniques rather than implementing ANNs exclusively; these combined methods complement each other, enhancing the ability to interpret and model the data and avoiding the subjectivity of training an artificial neural network individually [42, 43].
2.6. A Historical Glance at Wavelet Analysis

From a historical point of view, wavelet analysis dates back to the work of Joseph Fourier in the nineteenth century, when he laid the foundation with his theories of frequency analysis, which later became significantly important and influential in the study of wavelets. For a long time the ideas underlying wavelet theory went unrecorded, even though they had in effect already been discovered. The first mention of wavelets appeared in 1909, in an appendix to the thesis of Alfred Haar, which became the starting point from which the focus of mathematicians turned from frequency-based analysis to a more scale-based approach.
Several groups started researching the representation of functions using scale-varying basis functions in the 1930s. The French geophysicist Jean Morlet was one such researcher. In the Fourier approach, a signal or function is represented as a sum of sine and cosine waves of certain frequencies and amplitudes, and Morlet observed that these waves work well for the global information of a signal (an average over the whole signal) but miss the local features, which were important. Unlike cosines and sines, wavelets, because of their small extension, are able to describe these local features of a signal while still possessing frequency and amplitude, like cosine and sine waves. This was the beginning of treating scale-varying basis functions as the key to understanding wavelets. Between 1960 and 1980, the two mathematicians Guido Weiss and Ronald Coifman studied the simplest elements of a function space, called atoms, with the objective of finding common functions and assembly rules that allow the reconstruction of all elements of the function space from the atoms. In 1980, Grossmann and Morlet identified wavelets in the context of quantum physics, a major step in the development of the wavelet movement. In 1985, Stéphane Mallat, through his work in digital signal processing, provided additional support for the development of wavelets and discovered some very important relationships with orthonormal wavelet bases. Motivated by this work, Y. Meyer was able to construct non-trivial wavelets; unlike the Haar wavelets, the Meyer wavelets are continuously differentiable, although they do not have compact support. A few years later, Ingrid Daubechies used Mallat's approach to construct a set of orthonormal wavelet basis functions, which is regarded as one of the foundation stones of wavelet applications even today [44-45].

2.7. An Overview of Wavelet Analysis

Conventional time series analysis has always found it difficult to tackle the issue of non-stationarity, and in economics it is not unusual to use Fourier analysis to uncover relations at different frequencies. In spite of its utility, the time information of a time series is completely lost under the Fourier transform, so it is hard to distinguish transient relations or to identify structural changes. These techniques are only appropriate for time series with stable statistical properties, i.e., stationary time series. A typical economic time series is noisy, complex and strongly non-stationary.
Denis Gabor [46] introduced the short-time Fourier transform (STFT) to overcome the problems of analyzing non-stationary data; the basic idea of this transform is to break a time series into smaller sub-samples and then apply the Fourier transform to each sub-sample. This technique has been found inefficient because the frequency resolution is the same across all frequencies. An alternative to the STFT for the analysis of non-stationary signals is the wavelet transform. Wavelet analysis shares some common features with Fourier analysis but has the advantage of capturing features in the time series that vary across both time and frequency. Wavelets are mathematical functions that decompose data into different frequency components; after decomposition, every component is studied with a resolution matched to its scale. These functions are generated by dyadic dilations and integer shifts of a single function called a mother wavelet. Time-frequency localization is the key feature of wavelets: most of the energy of a wavelet is restricted to a finite time interval, and its Fourier transform is band-limited. Compared with the STFT, the advantage of time-frequency localization is that wavelet analysis varies the time-frequency aspect ratio, producing good time resolution and poor frequency resolution at high frequencies, and good frequency resolution and poor time resolution at low frequencies. This approach is reasonable when the signal at hand has low-frequency components of long duration and high-frequency components of short duration, and the signals encountered in most economic applications are frequently of this type.

The minimum requirements imposed on a function \psi(t) to qualify as a mother wavelet are that \psi(t) \in L^{2}(\mathbb{R}) (the space of square-integrable functions) and that it fulfils a technical condition, usually referred to as the admissibility condition,

0 < C_{\psi} = \int_{0}^{\infty} \frac{|\hat{\psi}(\xi)|^{2}}{|\xi|}\, d\xi < \infty --- (2.4)

where \hat{\psi}(\xi) is the Fourier transform of \psi(t) (see [47]). To ensure that C_{\psi} < \infty, a wavelet function must satisfy the conditions \int_{-\infty}^{\infty} \psi(t)\, dt = 0 and \int_{-\infty}^{\infty} |\psi(t)|^{2}\, dt = 1. These two conditions mean that:
1. \psi(t) must be an oscillatory function with zero mean, and
2. the wavelet function has unit energy.
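The two conditions can be verified numerically; as a minimal sketch (assuming the Haar mother wavelet defined in Section 2.8), the sample averages below approximate the integrals over [0, 1) and should return roughly 0 and 1:

```python
import numpy as np

# Haar mother wavelet sampled on [0, 1)
t = np.linspace(0, 1, 100_000, endpoint=False)
psi = np.where(t < 0.5, 1.0, -1.0)

print(psi.mean())        # ~0: oscillatory, zero mean
print((psi**2).mean())   # ~1: unit energy
```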
Within a given wavelet family there are two types of wavelets, depending on the normalization rules: the father wavelet, given by (2.5), and the mother wavelet, given by (2.6), where j = 1, ..., J indexes a J-level wavelet decomposition [49]:

\phi_{j,k}(t) = 2^{-j/2}\, \phi\!\left(\frac{t - 2^{j}k}{2^{j}}\right) --- (2.5)

\psi_{j,k}(t) = 2^{-j/2}\, \psi\!\left(\frac{t - 2^{j}k}{2^{j}}\right) --- (2.6)
The two types of wavelet transform are the CWT (continuous wavelet transform) and the DWT (discrete wavelet transform). The discrete version is the one mostly used in finance and other economic applications, since time series mostly contain a finite set of values. Discrete wavelets are defined as [48]:

\phi_{j,k}(t) = 2^{j/2}\, \phi(2^{j}t - k) --- (2.7)

\psi_{j,k}(t) = 2^{j/2}\, \psi(2^{j}t - k) --- (2.8)

where \phi and \psi satisfy the following:

\phi(t) = \sum_{k} h(k)\, \phi_{1,k}(t) --- (2.9)

\psi(t) = \sqrt{2} \sum_{k} (-1)^{k}\, h(-k+1)\, \phi(2t - k) = \sqrt{2} \sum_{k} g(k)\, \phi(2t - k) --- (2.10)

In practice, the DWT is implemented through the pyramid algorithm. Each iteration of this algorithm requires three objects: the data vector (x), the wavelet filter (h_l) and the scaling filter (g_l). At the first level, j = 1, the DWT computes the wavelet coefficients w_{1,t} and the scaling coefficients v_{1,t} as follows:
v_{1,t} = \sum_{l=0}^{L-1} g_{l}\, x_{(2t+1-l) \bmod N} --- (2.11)

w_{1,t} = \sum_{l=0}^{L-1} h_{l}\, x_{(2t+1-l) \bmod N} --- (2.12)

The DFT (Discrete Fourier Transform) decomposes a signal into sinusoidal basis functions of different frequencies using Fourier analysis; the signal can be completely recovered, and no information is lost in this transformation. Unlike in Fourier analysis, in the DWT a signal is decomposed into a set of mutually orthogonal wavelet basis functions, which differ from sinusoidal basis functions in being spatially localized, i.e., non-zero over only part of the total signal length.
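A minimal sketch of the pyramid step in equations (2.11)-(2.12); the Haar filters used here are a concrete assumption, and any wavelet/scaling filter pair could be substituted:

```python
import numpy as np

def dwt_level(x, h, g):
    """One level of the DWT pyramid (eqs. 2.11-2.12): filter the
    signal circularly and downsample by two."""
    N = len(x)
    v = np.zeros(N // 2)  # scaling (smooth) coefficients
    w = np.zeros(N // 2)  # wavelet (detail) coefficients
    for t in range(N // 2):
        for l in range(len(h)):
            idx = (2 * t + 1 - l) % N
            v[t] += g[l] * x[idx]
            w[t] += h[l] * x[idx]
    return v, w

g = np.array([1.0, 1.0]) / np.sqrt(2)    # Haar scaling filter
h = np.array([1.0, -1.0]) / np.sqrt(2)   # Haar wavelet filter
v1, w1 = dwt_level(np.array([2.0, 4.0, 6.0, 8.0]), h, g)
```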
Wavelet functions are dilated and translated versions of a common function known as the mother wavelet. Since the DWT is invertible, the original signal can be completely recovered from its representation. The DWT is not a single transform, as is the case for the DFT, but rather a set of transforms, each with a different set of wavelet basis functions; two of the most common are the Haar wavelets and the Daubechies set of wavelets. Wavelet analysis has shown notable performance in analyzing financial time series, since it provides a vital tool for extracting information from financial data. Because of its flexibility, it can handle very irregular data series, with applications ranging from short-term forecasting to the testing of market models. Locating time regime shifts and discontinuities is one of the significant abilities of wavelets: they can decompose a financial time series on a variety of time scales simultaneously, which makes them very useful in establishing relationships between economic variables that may differ across time scales. Over the last decade they have been successfully used in forecasting exchange rates, GDP growth, stock market prices, crude oil prices, trading prices, expenditure and income, price fluctuations, money growth and inflation, volatility in foreign exchange markets, sales and so on [47-53].
2.8. Daubechies Wavelets

Formulated by the Belgian mathematician Ingrid Daubechies in 1988, the Daubechies wavelets are the most commonly used set of discrete wavelet transforms. They belong to the family of orthogonal wavelets and are characterized by a maximal number of vanishing moments for a given support. The Haar wavelet is the oldest and simplest member of the wavelet family; it is defined as a sequence of rescaled square-shaped functions which together form a wavelet family or basis. Haar wavelets were proposed by Alfred Haar in 1909. The disadvantage of the Haar wavelet is that it is not continuous and therefore not differentiable. The Haar wavelet is also known as Db1, a special case of the Daubechies wavelets [54].
Figure 2.4: Haar Wavelet
The mother wavelet function of the Haar wavelet is described as:

\psi(t) = \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise} \end{cases} --- (2.13)

The scaling function is described as:

\phi(t) = \begin{cases} 1, & 0 \le t < 1 \\ 0, & \text{otherwise} \end{cases} --- (2.14)
The Daubechies wavelets beyond Db1 cannot be described in closed form; they are instead defined through their filter coefficients, from which the resulting scaling and wavelet functions are obtained. Both the Daubechies wavelet transforms and the Haar wavelet transform are computed by taking running averages and differences via scalar products with scaling signals and wavelets; the only difference between them lies in how these scaling signals and wavelets are defined. The scaling signals and wavelets have slightly longer supports for the Daubechies wavelet transform, which means they produce averages and differences using a few more values from the signal, and this slight change provides a tremendous improvement in capability [55].
Figure 2.5: Daubechies (D4) Wavelet
There are many Daubechies transforms, but they are all very similar. One of the simplest is the D4 wavelet transform, which has four wavelet and four scaling function coefficients. The scaling function coefficients are defined as:

h_{0} = \frac{1+\sqrt{3}}{4\sqrt{2}}, \quad h_{1} = \frac{3+\sqrt{3}}{4\sqrt{2}}, \quad h_{2} = \frac{3-\sqrt{3}}{4\sqrt{2}}, \quad h_{3} = \frac{1-\sqrt{3}}{4\sqrt{2}} --- (2.15)
The scaling function is applied to the input data at every step of the wavelet transform. For an original data set containing N values, the scaling function computes N/2 smoothed values at each wavelet transform step; in the ordered wavelet transform, these smoothed values are stored in the lower half of the N-element input vector. The wavelet function coefficients are defined as:
g_{0} = h_{3}, \quad g_{1} = -h_{2}, \quad g_{2} = h_{1}, \quad g_{3} = -h_{0} --- (2.16)

The wavelet function is applied to the input data at each step of the wavelet transform. For an original data set with N values, the wavelet function computes N/2 differences at each wavelet transform step; in the ordered wavelet transform, these wavelet values are stored in the upper half of the N-element input vector [56]. The scaling and wavelet function values are calculated by taking the inner product of the coefficients with four data values, as described by the following equations:

D4 scaling function:
d_{i} = h_{0}\, s_{2i} + h_{1}\, s_{2i+1} + h_{2}\, s_{2i+2} + h_{3}\, s_{2i+3} --- (2.17)

D4 wavelet function:
b_{i} = g_{0}\, s_{2i} + g_{1}\, s_{2i+1} + g_{2}\, s_{2i+2} + g_{3}\, s_{2i+3} --- (2.18)

where i indexes the output values, the input window 2i advancing by two samples in each iteration to compute new scaling and wavelet function values.
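A minimal sketch of one D4 transform step built from equations (2.15)-(2.18); the circular wrap at the end of the signal and the ordered output layout are implementation assumptions:

```python
import numpy as np

# D4 scaling coefficients (eq. 2.15) and wavelet coefficients (eq. 2.16)
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
g = np.array([h[3], -h[2], h[1], -h[0]])

def d4_step(s):
    """One D4 step (eqs. 2.17-2.18): N inputs -> N/2 smoothed values
    (lower half) and N/2 differences (upper half)."""
    N = len(s)
    smooth = np.zeros(N // 2)
    detail = np.zeros(N // 2)
    for i in range(N // 2):
        window = [s[(2 * i + k) % N] for k in range(4)]  # wrap at the end
        smooth[i] = np.dot(h, window)
        detail[i] = np.dot(g, window)
    return np.concatenate([smooth, detail])  # ordered transform layout
```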
2.9. Introduction to Multiresolution Analysis

The concept of multiresolution is connected to the study of signals or images at different levels of resolution. The resolution of a signal is usually described qualitatively through its frequency content: for a low-pass signal, the lower its frequency, the coarser its resolution. In signal processing, a low-pass and subsampled version of a signal is usually a good coarse approximation for many real-world signals. In computer vision and image processing, multiresolution is particularly apparent, as coarse versions of an image are often used as a first approximation in computational algorithms. Stéphane Mallat and Yves Meyer, in 1986, were the first to formulate the idea of multiresolution analysis (MRA) in the context of wavelet analysis. MRA provides a general formalism for the construction of orthogonal bases of wavelets and is central to the construction of all wavelet bases.
Mathematically, the fundamental idea of MRA is to represent any signal or function f as a limit of successive approximations, each a finer version of f; these successive approximations correspond to different levels of resolution. MRA is thus a formal approach to constructing orthogonal wavelet bases using a definite set of procedures, and its key aspect is to describe mathematically the process of studying signals or images at different scales. The basic principle is to decompose the whole function space into nested subspaces V_{n} \subset V_{n+1}, so that the space V_{n+1} consists of all the rescaled functions in V_{n}; this fundamentally means decomposing each signal or function into components of different frequencies or scales, with each individual component of the original function f lying in a subspace [57,58]. A multiresolution analysis of L^{2}(\mathbb{R}) is defined as a sequence of closed subspaces V_{j} of L^{2}(\mathbb{R}), j \in \mathbb{Z}, with the following properties [59-60]:

1. V_{j} \subset V_{j+1},
2. v(x) \in V_{j} \iff v(2x) \in V_{j+1},
3. v(x) \in V_{0} \iff v(x+1) \in V_{0},
4. \bigcup_{j=-\infty}^{+\infty} V_{j} is dense in L^{2}(\mathbb{R}) and \bigcap_{j=-\infty}^{+\infty} V_{j} = \{0\},
5. a scaling function \phi \in V_{0} with a non-vanishing integral exists such that the collection \{\phi(x - l) : l \in \mathbb{Z}\} is a Riesz basis of V_{0}.
Chapter 3
Forecasting Models for Time-Series Forecasting

Numerous models have been used for time series forecasting over the last few decades. In this chapter, we briefly discuss time series forecasting methods that fall broadly under three categories:
1. Statistical methods
2. Neural networks
3. Wavelets

3.1. Statistical Methods in Time Series Forecasting

Broadly speaking, statistical time series forecasting methods employ a linear or non-linear functional form. The exponential smoothing, regression and ARIMA (Autoregressive Integrated Moving Average) methods fall under the linear category, while the Threshold and GARCH (Generalized Autoregressive Conditionally Heteroskedastic) methods discussed in this section fall under the non-linear category.
3.1.1. Exponential Smoothing Methods

The exponential smoothing methods arose from the original work of Brown in the 1950s and Holt in 1960, who were developing forecasting models for inventory control systems. The basic idea of a smoothing model is to construct forecasts of future values as weighted averages of past observations, with more recent observations carrying more weight than observations in the more remote past. In a single moving average all past observations are weighted equally (each weight is 1/N), whereas in exponential smoothing an observation is assigned an exponentially decreasing weight as it gets older, i.e., recent observations receive more weight in the forecast than earlier ones. In exponential smoothing there are one or more smoothing parameters to be determined, and this choice determines the weights assigned to the observations. In this section, we briefly discuss single, double and triple exponential smoothing.
3.1.1.1. Single Exponential Smoothing Method

This is an extension of the moving average method in which forecasting is done by a weighted moving average. In a simple moving average forecast, the mean of all past observations is used as the forecast, i.e., equal weight is assigned to all past points. Since the most recent observations usually provide the best guide to the future, a weighting scheme with decreasing weights for older observations is needed. Exponential smoothing procedures are a class of methods that use exponentially decreasing weights as the observations get older; all of them share the property that recent values are given more weight than older observations in the forecast. One or more smoothing parameters are determined explicitly, and this choice determines the weights assigned to the observations. A single exponential smoothing model is represented as:

F_{t} = F_{t-1} + \alpha\,(Y_{t-1} - F_{t-1}) --- (3.1)

or, equivalently,

F_{t} = \alpha\, Y_{t-1} + (1 - \alpha)\, F_{t-1}

where F_{t} is the forecast value for period t, Y_{t-1} is the actual value for period t-1, F_{t-1} is the forecast value for period t-1, and \alpha is the smoothing constant, with 0 < \alpha < 1. The closer \alpha is to 1, the more weight is given to recent observations and the more rapidly the forecast changes.
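A minimal sketch of equation (3.1); initializing the first forecast with the first observation is a common convention, assumed here rather than prescribed by the text:

```python
def ses(y, alpha):
    """Single exponential smoothing (eq. 3.1): each forecast corrects
    the previous forecast by a fraction alpha of the last error."""
    f = [y[0]]                      # assumed initialization: F_1 = Y_1
    for t in range(1, len(y) + 1):
        f.append(alpha * y[t - 1] + (1 - alpha) * f[-1])
    return f[1:]                    # one-step-ahead forecasts

print(ses([10.0, 12.0, 11.0, 13.0], alpha=0.5))
```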
3.1.1.2. Double Exponential Smoothing Method or Holt's Model

Single exponential smoothing is mostly applicable to simple data with no trend or seasonal component; when the data has either of these components, double or triple exponential smoothing models the data better. This method is also known as the Holt-Winters trend or seasonality method. Seasonality is the tendency of time series data to exhibit behaviour that repeats itself every N periods (the season length in periods). The general increasing or decreasing nature of the time series data over a period of time is called the trend, and this method is used when the data shows a trend pattern. The method works like simple smoothing except that two components, the level and the trend, must be updated at each period: the trend is a smoothed estimate of average growth, and the level is a smoothed estimate of the value of the data at the end of each period. The model consists of a trend component (T, with smoothing factor v) and an exponentially smoothed level component (E, with smoothing factor w), and is represented as:
F_{t+k} = E_{t} + k\, T_{t} --- (3.2)

E_{t} = w\, Y_{t-1} + (1 - w)(E_{t-1} + T_{t-1}) --- (3.3)

T_{t} = v\,(E_{t} - E_{t-1}) + (1 - v)\, T_{t-1} --- (3.4)

where F_{t+k} is the forecast value k periods from t, Y_{t-1} is the actual value for period t-1, E_{t-1} is the estimated level for period t-1, T_{t} is the trend for period t, and w (0 < w < 1) and v (0 < v < 1) are smoothing constants.
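A minimal sketch of equations (3.2)-(3.4); the initialization (level = first observation, trend = first difference) is an assumed convention, and the update uses the newest available observation:

```python
def holt(y, w, v):
    """Holt's double exponential smoothing (eqs. 3.2-3.4): maintain a
    level E and a trend T, then forecast k periods ahead as E + k*T."""
    E, T = y[0], y[1] - y[0]               # assumed initialization
    for t in range(1, len(y)):
        E_prev = E
        E = w * y[t] + (1 - w) * (E + T)   # level update (eq. 3.3)
        T = v * (E - E_prev) + (1 - v) * T # trend update (eq. 3.4)
    return lambda k: E + k * T             # forecast function (eq. 3.2)

f = holt([10.0, 12.0, 13.0, 15.0], w=0.5, v=0.3)
print(f(1), f(2))
```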
In the GARCH model, \alpha_{0} > 0, \alpha_{i} \ge 0 and \beta_{i} \ge 0, and \sigma_{t}^{2} is the conditional variance of y_{t} (given the information available up to time t-1). \sigma_{t}^{2} has an autoregressive structure and is positively correlated with its own recent past and with recent values of y^{2} (squared returns). It captures the idea of volatility, or conditional variance: larger (smaller) values of y_{t}^{2} are likely to be followed by larger (smaller) values. Among the various extensions of the GARCH model, EGARCH (Exponential GARCH) and IGARCH (Integrated GARCH) are the most popular [77,78]. The EGARCH model is represented as:

\log \sigma_{t}^{2} = \omega + \sum_{k=1}^{q} \beta_{k}\, g(Z_{t-k}) + \sum_{k=1}^{p} \alpha_{k} \log \sigma_{t-k}^{2} --- (3.26)

where g(Z_{t}) = \theta Z_{t} + \lambda\,(|Z_{t}| - E|Z_{t}|) allows the sign and the magnitude of Z_{t} to have separate effects on the volatility, \sigma_{t}^{2} is the conditional variance, Z_{t} is a standard normal variable (or drawn from a generalized error distribution), and \omega, \beta_{k}, \alpha_{k}, \theta and \lambda are coefficients. The IGARCH process is a GARCH process in which

\sum_{i=1}^{p} \beta_{i} + \sum_{i=1}^{q} \alpha_{i} = 1 --- (3.27)
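To make the volatility-clustering idea concrete, here is a minimal simulation sketch of a GARCH(1,1) process; the parameter values are illustrative assumptions only:

```python
import numpy as np

# GARCH(1,1): sigma2_t = omega + alpha*y_{t-1}^2 + beta*sigma2_{t-1}
omega, alpha, beta = 0.1, 0.1, 0.85   # alpha + beta < 1 (not integrated)
n = 1000
y = np.zeros(n)
sig2 = np.full(n, omega / (1 - alpha - beta))  # unconditional variance
for t in range(1, n):
    sig2[t] = omega + alpha * y[t - 1] ** 2 + beta * sig2[t - 1]
    y[t] = np.sqrt(sig2[t]) * np.random.randn()
# large |y| values tend to cluster together in the simulated path
```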
3.2. Neural Networks in Time Series Forecasting

Artificial neural networks (ANN) have received significant attention over the last two decades in the areas of forecasting and classification, where regression and other statistical techniques were traditionally used; one of the most promising applications in finance is forecasting financial time series. Box and Jenkins [80] developed the ARIMA methodology for fitting a class of linear time series models using a statistical approach; the restriction to linearity in this approach has since been addressed in a number of ways, and more robust versions of ARIMA models have been developed. In fact, a large volume of literature on non-linear time series models is available. The stochastic approach enables the specification of uncertainty in parameter estimates and forecasts when applying non-linear time series modeling to time series data [81]. Traditionally, GARCH models are used for forecasting exchange rates and stock prices, but rigidities such as linearity in the mean, together with assumptions such as non-negativity of the coefficients, prevent them from accounting for leverage effects and from fully capturing volatility clustering (volatility appearing in groups), excess kurtosis (leptokurtosis) and fat tails (in which extreme values have a larger probability than under the normal distribution). These facts motivated the use of more flexible models to capture the behaviour of financial markets in an improved manner. A few of these models come from Artificial Intelligence (AI), characterized by flexibility and the capability to integrate different methodologies that try to emulate the behaviour of biological systems. ANNs offer several potential advantages, for instance:
1. Non-linearity: the neural processor is basically non-linear.
2. Input-output mapping: the network learns from examples in supervised learning.
3. Adaptability: the network can adapt its synaptic weights, even in real time.
4. Response capacity: the network provides pattern selection and reliable decision making in the context of pattern classification.
5. Fault tolerance, due to the enormous number of interconnections.
6. Parallelism, which makes it potentially faster for certain tasks and for capturing complex behaviours.
7. Uniformity in analysis and design: the same notation is used in all areas concerned with networks.
8. Analogy with neurobiology.

In general, ANNs are data-driven, self-adaptive, non-linear methods that do not require specific assumptions about the underlying model. Numerous investigations consider stock indices and exchange rates to be indicators of the future circumstances of the economic and financial system. Since their first applications in finance in the early 1990s, ANNs have become popular because, from a statistical point of view, they are non-parametric models. This characteristic makes them quite flexible in modeling real-world phenomena where observations are generally available but no theoretical relationship or specification exists, especially for non-linear functions [79, 82-87].
3.2.1. Steps in designing a neural network forecasting model

Kaastra and Boyd [88] outlined eight steps in designing a neural network forecasting model. These steps are briefly discussed below:

Step 1: Variable selection
The input variables that matter in the application area being forecasted must be identified; reliance on neural networks is due to their powerful ability to detect complex non-linear relationships among a number of different variables. Economic theory helps in choosing variables that are likely to be important predictors. In the design process, the primary concern is the raw data, from which a variety of indicators are developed, and these indicators form the actual inputs to the neural network [86]. For example, a researcher in finance interested in forecasting market prices has to decide whether to use fundamental economic inputs, technical inputs, or both, from one or more markets. Fundamental inputs are economic variables that influence the dependent variable, while technical inputs are lagged values of the dependent variable or indicators calculated from those lagged values. The simplest neural network model uses lagged values of the dependent variable(s), or their first differences, as inputs. ANN models have outperformed traditional ARIMA-based models in price forecasting, although not in all studies [89, 90]. A more familiar approach is to calculate various technical indicators based only on past prices of the market being forecasted [92].
Step 2: Data collection
Although technical data is easily available from many sources, the data should still be checked for errors by examining day-to-day changes, logical consistency and the missing observations that often exist in data. Missing observations are either dropped or assumed to remain the same, by averaging or interpolating from nearby values. Four concerns must be kept in mind when fundamental data is used as an input to a neural network. First, choose a method of calculating the fundamental indicator that is consistent over the time series. Second, the data should not be retroactively revised after its initial publication; such revision is commonly done in databases, but the revised numbers would not have been available at actual forecasting time. Third, the data should be appropriately lagged when input to the neural network, since fundamental information is not available as rapidly as market quotations. Fourth, the investigator should be sure that the source will continue publishing the particular fundamental information, or that identical sources are available [88].

Step 3: Data preprocessing
This step is vital for achieving a good forecast when applying neural networks to financial time series forecasting: the data is rarely fed to the network in raw form for the input and output variables. First differencing and the logarithm are the two most common data transformations used in both traditional and neural network forecasting, removing a linear trend and converting a multiplicative or ratio relationship into an additive form so as to simplify and improve network training. In practice, data preprocessing involves a trial-and-error procedure.

Step 4: Data partitioning
A general practice is to divide the time series into three distinct sets: the training, testing and validation (out-of-sample) sets. The training set is the largest and is used by the neural network to learn the patterns present in the data. The testing set, ranging from 10% to 30% of the size of the training set, is used to evaluate the generalization ability of a trained network; we select the network(s) that perform best on the testing set. The validation set, used for a final check, should have a sufficient sample size to evaluate a trained network while leaving enough observations for both training and testing, and it should consist of the most recent contiguous observations. Be careful not to turn the validation set into a testing set by repetitively performing a series of train-test-validate steps or by adjusting the input variables based on the network's performance on the validation set. The testing set is either selected randomly from the training set or taken from the observations immediately following the training set; a randomly selected testing set avoids the testing set being characterized by only one type of market. Repeated retraining is more time-consuming, although it allows the network to adapt more quickly to changing market conditions. The variation, or consistency, of results across out-of-sample sets is a vital performance measure [88].

Step 5: Neural network design
The number of neurons in each layer and the type of interconnections between them define the architecture of a neural network. Once the independent variables are preprocessed, selecting the number of input neurons is easy, since each independent variable is represented by an input neuron of its own. Selecting the number of hidden layers, the number of neurons in the hidden layers, the number of output neurons and the transfer function is not easy. The hidden layer(s) provide the network with its ability to generalize; in practice, neural networks with one or two hidden layers are commonly used and perform very well. Despite this, there is no magic formula for selecting the optimum number of hidden neurons, so much relies on experimentation. Some rules of thumb do exist for selecting the range of hidden neurons, and a general rule is to select the network that performs best on the testing data set with the least number of hidden neurons. While testing a range of hidden neurons, it is imperative to keep all other parameters constant: changing any parameter creates a new neural network with a potentially different error surface, which would needlessly complicate the selection of the optimum number of hidden neurons [28]. Selecting the number of output neurons is more straightforward, since there are convincing reasons to always use a single output neuron: for widely spaced outputs, networks with multiple outputs usually produce less significant results than networks with a single output [91]. The sigmoid transfer function is most commonly used in neural network models, but the arctangent, hyperbolic tangent and linear transfer functions are also used frequently. The data input to the network is usually scaled between -1 and +1, or 0 and 1, to be consistent with the type of transfer function [88].

Step 6: Evaluation of the system
The most common error function minimized in neural networks is the SSE (sum of squared errors), while other error functions such as least absolute deviations, asymmetric least squares and least fourth powers of percentage differences can also be used. These error functions are in fact not the final evaluation criterion, as other common forecasting evaluation measures, such as the MAPE (mean absolute percentage error), can also be minimized in neural networks [88].

Step 7: Training the ANN
Training finds the set of weights between the neurons that establishes the global minimum of the error function; this set of weights provides good generalization unless the model is overfitted. Training learns patterns in the data by iteratively presenting examples with correct known answers; training is stopped after a fixed number of iterations, the network's capability to generalize to the testing set is evaluated, and training is then resumed. The network for which the testing-set error bottoms out is taken, as it is assumed to generalize best. The number of iterations needed to achieve negligible improvement for a particular problem should be determined, and various randomly selected starting weights should be tested as far as computational constraints allow. Two other considerations are the learning rate and the momentum; many neural network programs automatically increase the momentum and decrease the learning rate as convergence is approached [88].

Step 8: Implementation
The environment in which the neural network is deployed determines the evaluation criteria, data availability and training times. Most vendors of neural network software allow trained networks to be implemented within the neural network program itself or provide an executable file. All data transformations, scaling and other parameters should remain the same from testing to real use. With regard to retraining frequency, a good model ought to be robust and should improve when retrained more often [88]. A minimal sketch covering steps 3, 5 and 7 follows.
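As a hedged illustration of steps 3, 5 and 7 above, the sketch below scales lagged inputs to [0, 1] for a logistic (sigmoid) transfer function and trains a single-hidden-layer network on a synthetic series; the series, lag count and layer size are assumptions for illustration:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPRegressor

series = np.cumsum(np.random.randn(300))        # synthetic random-walk series
X = np.column_stack([series[i:-4 + i] for i in range(4)])  # 4 lagged inputs
y = series[4:]                                  # one-step-ahead target
Xs = MinMaxScaler().fit_transform(X)            # step 3: scale to [0, 1]
net = MLPRegressor(hidden_layer_sizes=(8,),     # step 5: one hidden layer
                   activation="logistic",       # sigmoid transfer function
                   max_iter=2000).fit(Xs, y)    # step 7: minimize SSE
```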
An artificial neural network suffers from the following disadvantages [93, 94]:
1. It needs a large amount of sample data, since there is a large number of parameters (weights).
2. Problems such as overfitting and capture in local minima often exist.
3. ANNs do not necessarily include the linear case in a trivial way.
Because of the first problem, neural network approaches are often ruled out in real-world applications despite their capability to handle non-linear data efficiently, since training data is not available at the required scale. The second point depends on the choice of learning algorithm; the back-propagation algorithm is often not the most appropriate option for obtaining optimal results. Moreover:
- ANNs are not a universal problem-solving tool, since there is no definite methodology for choosing, training and verifying them.
- They can require excessive training times.
- They depend on the quality and accuracy of the given data set.
- They can learn the input data set very efficiently and respond well to it, yet still exhibit poor generalization ability.
3.3. Wavelets in Time Series Forecasting

Wavelets are defined over a finite domain and, unlike the trigonometric functions underlying the Fourier transform, they are localized both in time and in scale. This behaviour makes them ideal for analyzing non-stationary signals and signals with transients or singularities. Wavelets are mathematical basis functions that split the data into different scales of resolution. For example, if we look at a signal through a large 'window', we notice only its gross features; looking through a smaller 'window', we obtain information about the small features and details of the same data. At high scales the wavelet has small time support, enabling it to zoom in on details such as spikes, cusps and short-lived phenomena; at low scales, wavelets capture long-run phenomena. Wavelets are thus intrinsically connected to the notion of multiresolution analysis: data can be examined and analyzed at widely varying levels of focus. It is this feature of wavelet decomposition that is useful for econometric analysis. In analyzing macroeconomic variables, such as extracting the inter-relationship between money, output and prices, we mainly concentrate on the long-run equilibrium relationship and the effect of shocks on the equilibrium state. In terms of wavelet decompositions, we can focus our attention on the low-frequency content of the data series, ignoring the high-frequency fluctuations that might distort the series. Such low-frequency content, which retains the basic characteristics of the original series, is capable of capturing the correct long-run dynamic relationship among the macroeconomic time series considered [95, 96]. Wavelets fractionate the original time series into two sub-series, one concerning the high frequencies and the other the low frequencies, aiming to reduce the noise effect in the forecasts [97]; this filtering process has brought significant improvement to forecasting models. The usefulness of wavelet analysis lies in its flexibility in handling a variety of non-stationary signals: wavelets are constructed over finite intervals of time, are not necessarily homogeneous over time, and are localized in both time and scale. Two interesting features of wavelet time-scale analysis for economic variables are thus worth mentioning. First, since the base scale includes any non-stationary components, the data does not need to be detrended or differenced. Second, the nonparametric nature of wavelets takes care of potential nonlinear relationships without losing detail [98, 99]. There are several variations on how wavelets can be used to facilitate time series forecasting, but the general forecasting framework consists of the following steps [100] (a minimal sketch follows the list):
1. An input time series is decomposed into several components, each associated with a different frequency band. The decomposition is additive and based on some type of wavelet transform.
2. The dynamics of each component are modelled separately, and the parameters of each model are (usually) estimated separately. The components are then extrapolated (forecasted).
3. Finally, the forecasts of the individual components are summed to construct the forecast of the input time series.
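A minimal sketch of this three-step framework under stated assumptions: PyWavelets supplies the additive decomposition (by reconstructing each coefficient band separately), and a simple least-squares AR model stands in for whatever component model a study might actually use:

```python
import numpy as np
import pywt

def wavelet_forecast(x, wavelet="db4", level=3, p=2):
    """Steps 1-3: additive wavelet decomposition, one AR(p) model per
    component, sum of the component forecasts."""
    coeffs = pywt.wavedec(x, wavelet, level=level)          # step 1
    forecast = 0.0
    for i in range(len(coeffs)):
        kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        comp = pywt.waverec(kept, wavelet)[: len(x)]        # additive component
        # step 2: least-squares AR(p) fit, then one step ahead
        X = np.column_stack([comp[k: len(comp) - p + k] for k in range(p)])
        phi, *_ = np.linalg.lstsq(X, comp[p:], rcond=None)
        forecast += float(np.dot(phi, comp[-p:]))           # step 3: sum
    return forecast

x = np.cumsum(np.random.randn(128))   # synthetic series of length 2^7
print(wavelet_forecast(x))
```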
There are several types of wavelet transform. Depending on whether the input signal is continuous or discrete, there are two kinds: the CWT (continuous wavelet transform) and the DWT (discrete wavelet transform). The CWT takes a continuous input signal with respect to the time and scale parameters, while the DWT is used for discrete-time signals. In a wavelet transform, the time domain is the original domain, and the transformation from the time domain to the time-scale domain is known as signal decomposition, since the input signal is decomposed into several other signals at different levels of resolution. This decomposition allows the original time-domain signal to be recovered without losing any information; the reverse process is called the inverse wavelet transform, or signal reconstruction [101]. A sequence of projections onto a basis of father (\phi) and mother (\psi) wavelets represents a time series or function as follows [102]:
f(t) = \sum_{k} s_{J,k}\,\phi_{J,k}(t) + \sum_{k} d_{J,k}\,\psi_{J,k}(t) + \sum_{k} d_{J-1,k}\,\psi_{J-1,k}(t) + \cdots + \sum_{k} d_{1,k}\,\psi_{1,k}(t) --- (3.28)

where j = 1, 2, ..., J. Increasing j stretches the mother wavelet in proportion to 2^{j}, while a unit decrease in j halves its width and doubles its frequency; the father wavelet is not affected by changes in the value of j. A unit increase in k shifts the location of both the father and the mother wavelets. For a time series f(t) of length N = 2^{J}, the wavelet multiresolution approximation can be expressed as

f(t) \approx \sum_{k} s_{J,k}\,\phi_{J,k}(t) + \sum_{j=1}^{J} \sum_{k} d_{j,k}\,\psi_{j,k}(t) --- (3.29)

The father and mother wavelet coefficients are expressed as

s_{J,k} \approx \int \phi_{J,k}(t)\, f(t)\, dt, \qquad d_{j,k} \approx \int \psi_{j,k}(t)\, f(t)\, dt, \quad j = 1, \ldots, J --- (3.30)

The father wavelet coefficients capture the smooth, low-frequency trend behaviour in the data, while the mother wavelet coefficients capture all high-frequency, short-term deviations from the trend. The multiresolution approximation can also be written as

f(t) = S_{J}(t) + D_{J}(t) + D_{J-1}(t) + \cdots + D_{1}(t) --- (3.31)

The time-scale components are expressed as

S_{J}(t) = \sum_{k} s_{J,k}\,\phi_{J,k}(t), \qquad D_{j}(t) = \sum_{k} d_{j,k}\,\psi_{j,k}(t), \quad j = 1, \ldots, J --- (3.32)

In the wavelet multiresolution analysis the data is decomposed into J time scales: the long-term cycles in the data are captured by the time-scale components with higher values of j, the smooth trend behaviour is captured by the time-scale component S_{J} built from the father wavelets, and the high-frequency, short-term cyclical movements are captured by the time scales with small values of j [102-103].
Chapter 4
Wavelet Neural Networks in Financial Forecasting

4.1. Design of Wavelet Neural Networks

The combination of wavelets with neural networks has led to the development of wavelet neural networks. Typically, a wavelet neural network consists of a feed-forward neural network with one hidden layer, whose activation functions are drawn from an orthonormal wavelet family. Its construction is similar to that of a (1 + 1/2)-layer neural network, i.e., a feed-forward neural network having one or more inputs and a hidden layer consisting of neurons with activation functions drawn from a wavelet basis. The output layer consists of one or more summers, or linear combiners, and the wavelet neurons in the single hidden layer are usually referred to as wavelons. The structure of a wavelet neural network is depicted in Figure 4.1. There are two main approaches to creating wavelet neural networks.
In the first approach, the wavelet processing and the neural network processing are performed separately: the input signal is first decomposed over some wavelet basis by the neurons in the hidden layer, and the wavelet coefficients are then output to one or more summers whose input weights are modified in accordance with some learning algorithm. Here only dyadic dilations and translations of the mother wavelet form the wavelet basis. This kind of wavelet neural network is usually referred to as a wavenet.
The second approach combines the two theories: the translations and dilations of the wavelets, as well as the summer weights, are modified according to some learning algorithm. This is usually referred to as a wavelet network [104].
Figure 4.1: Structure of wavelet neural network
The simplest form of wavelet neural network has a single input, an output layer, and a hidden layer consisting of neurons called wavelons, whose parameters comprise the wavelet dilation and translation coefficients. A wavelon generates a non-zero output when the input lies within a small area of the input space. The output of this wavelet neural network is a linear weighted combination of the wavelet activation functions, defined as

\psi_{\lambda,t}(u) = \psi\!\left(\frac{u - t}{\lambda}\right) --- (4.1)

where \psi(\cdot), t and \lambda are the mother wavelet function, the translation parameter and the dilation parameter, respectively. The structure of a single-input, single-output wavelet network is shown in Figure 4.2, where the hidden layer consists of M wavelons and the output neuron is expressed as a weighted sum of the wavelon outputs [104].
Figure 4.2: A Wavelet Neural Network
y(u) = \sum_{i=1}^{M} w_{i}\, \psi_{\lambda_{i}, t_{i}}(u) + \bar{y} --- (4.2)

Here, w_{i} is an output weight and \bar{y} is a threshold parameter initialized to the mean of the desired output samples. The inclusion of \bar{y} deals with functions whose mean is non-zero, since the wavelet function \psi(u) has zero mean; in wavelet multiresolution analysis, \bar{y} substitutes for the scaling function \phi(u) at the largest scale.
All the parameters of a wavelet network, i.e., w_{i}, \bar{y}, t_{i} and \lambda_{i}, can be adjusted by some learning procedure. The wavenet architecture is similar, except that the parameters t_{i} and \lambda_{i} are fixed at initialization and are not changed by any learning procedure. The main motivation for this restriction comes from wavelet analysis: a function f(\cdot) can be approximated to any arbitrary level of detail by selecting a sufficiently large L such that

f(u) \approx \sum_{k} \langle f, \phi_{L,k} \rangle\, \phi_{L,k}(u) --- (4.3)

where \phi_{L,k}(u) = 2^{L/2}\, \phi(2^{L}u - k) is the scaling function dilated by 2^{L} and translated over dyadic intervals of width 2^{-L}. Thus, the output of a wavenet is expressed as

y(u) = \sum_{i=1}^{M} w_{i}\, \phi_{i, t_{i}}(u) --- (4.4)

Here, M is sufficiently large to cover the domain of the function under analysis, and no adjustment of \bar{y} is required since the mean value of a scaling function is non-zero [104]. In a multidimensional wavelet neural network the input is a multidimensional vector, and the wavelons consist of multidimensional wavelet activation functions that produce a non-zero output when the input vector lies within a small area of the multidimensional input space. The output is expressed as one or more linear
combinations of the multidimensional wavelets. Figure 4.3 shows this form of wavelon, which is in effect equivalent to a multidimensional wavelet. Its output is expressed as

\Psi(u_{1}, u_{2}, \ldots, u_{N}) = \prod_{n=1}^{N} \psi_{\lambda_{n}, t_{n}}(u_{n}) --- (4.5)

The structure of a multidimensional wavelet neural network is shown in Figure 4.1, where the hidden layer consists of M wavelons and the output layer consists of K summers:

y_{j} = \sum_{i=1}^{M} w_{ij}\, \Psi_{i}(u_{1}, u_{2}, \ldots, u_{N}) + \bar{y}_{j}, \quad j = 1, 2, \ldots, K --- (4.6)

where \bar{y}_{j} is required to deal with functions of non-zero mean [104].
Figure 4.3: A Multidimensional Wavelon Structure
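A minimal sketch of the forward pass in equations (4.1)-(4.2); the Mexican-hat mother wavelet and the parameter values are assumptions for illustration (any admissible wavelet and learned parameters would do):

```python
import numpy as np

def mexican_hat(u):
    """An assumed zero-mean mother wavelet: (1 - u^2) e^{-u^2/2}."""
    return (1 - u**2) * np.exp(-(u**2) / 2)

def wnn_output(u, w, t, lam, y_bar):
    """Wavelet network output (eqs. 4.1-4.2):
    y(u) = sum_i w_i * psi((u - t_i)/lambda_i) + y_bar."""
    return float(np.sum(w * mexican_hat((u - t) / lam)) + y_bar)

# Hypothetical parameters for M = 3 wavelons
w = np.array([0.5, -0.2, 0.8])
t = np.array([0.0, 1.0, 2.0])
lam = np.array([1.0, 0.5, 2.0])
print(wnn_output(1.5, w, t, lam, y_bar=0.1))
```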
4.2. Brief Historical Perspective of Wavelet Neural Networks

The construction of wavelet networks can be traced back to the study by Daugman [116], who used Gabor wavelets for image classification. The studies by Pati [105,106], Zhang and Benveniste [107] and Szu [108] made wavelet networks more familiar by introducing them as a special kind of feed-forward neural network. Zhang used wavelet networks to study the problem of controlling a robot arm and introduced the following mother wavelet:
\psi(x) = \left(x^{T}x - \dim(x)\right) e^{-\frac{1}{2} x^{T} x} --- (4.7)
A different function was introduced by Szu, who used \psi(t) = \cos(1.75\, t)\, \exp(-t^{2}/2) as the mother wavelet for the classification of phonemes and for speaker recognition. Pati [106] used simple combinations of sigmoids as the mother wavelet; this approach was in fact generalized to polynomial functions of the sigmoid by Fernando Marar [109]. The output may be taken as \sigma(f(x)) for classification, where \sigma is the sigmoid function [110]. An inspiration for wavelet networks lies in the fact that, as universal function estimators, they can represent a function to some precision very efficiently, following the studies by Hornik [8] and Kreinovich [9]. From a practical point of view, a foremost problem with wavelet networks is determining the number of wavelets and initializing them: good initialization of a wavelet neural network is extremely significant for attaining fast convergence of the algorithm. A number of techniques have been developed in this regard. For instance, Zhang [107] initialized the coefficients by means of an orthogonal least-squares procedure; as an alternative, the dyadic wavelet decomposition can also be used to initialize the network. Echauz [113] applied a clustering method to position the wavelets, the distribution of points about a cluster allowing the required dilation of a wavelet to be approximated. In another study, Echauz [117] proposed an efficient method using trigonometric wavelets, expressing the functions in the following form:

\operatorname{costrap}(x) = \cos\!\left(\frac{3\pi x}{2}\right) \cdot \min\!\left\{ \max\!\left\{ \frac{3}{2}\left(1 - |x|\right),\, 0 \right\},\, 1 \right\} --- (4.8)
Trigonometric wavelets can be approximated by polynomials, and fitting a polynomial is a linear problem that can be solved more easily; the fitted polynomial parameters were used to approximate the initializing parameters of the corresponding wavelets in the study by Boubez [115], in which the network was initialized by positioning and approximating low-resolution wavelets first. To minimize the score, new higher-resolution wavelets were then introduced and initialized subsequently. The principle of the cascade-correlation learning architecture was used by Rao [117] to train the network: new wavelets were added stepwise, one by one, and the network was trained until convergence was reached. An opposite approach was used by Yi-Yu [118], in which the
wavelet network initially uses a large number of functions and the network is subsequently made as robust as possible by using a shrinkage technique to delete the nodes that are not too important. The network can be trained using a stochastic gradient algorithm [107], the conjugate gradient method [110], genetic algorithms [119] or the more common back-propagation algorithm. The multiresolution properties of wavelet networks have made them useful in a number of interesting applications. For instance, the manufacturing process monitoring systems in [120] use wavelet-based methods for detecting and classifying vibrations, representing a good alternative to Fourier analysis for detecting abnormal vibrations. Wavelet networks have been used with enormous success for the identification and classification of rapidly varying signals; a study identifying high-risk patients in cardiology [121] and echo cancellation [139] are two instances. Szu et al. [110, 118] made foremost efforts in the fields of speech segmentation and speaker recognition, and wavelet networks have also been tested on a number of classical control problems, from recognizing small variations in a plant to controlling robotic arms [122]. An exciting option is to use a lexicon of dyadic wavelets and to optimize only the weights w_{i}; such dyadic wavelet networks, or wavenets, were first proposed by Bakshi et al. [123]. In its simplest form, a wavenet corresponds to a feed-forward neural network with wavelets used as activation functions:
u(x) = \sum_{n,m} d_{m,n}\, f_{n,m}(x) + \bar{u} --- (4.9)

where d_{m,n} are the coefficients of the neural network, f_{n,m} are the wavelets and \bar{u} is the average of u. In 1993, Bakshi and Stephanopoulos [124] proposed the mathematical framework for the construction of wavenets, discussed various aspects of their practical implementation, and illustrated that the training and adaptation efficiency of these wavenets is at least an order of magnitude superior to that of other networks. Two types of neuron models were proposed by Yamakawa et al. [125]: the wavelet synapse neuron and the wavelet activation function
neuron, obtained by modifying a conventional neuron model with non-orthogonal wavelet bases. In another study, Zhang and Walter [107] described a wavelet-based neural network similar to the radial basis function network, except that the radial basis functions are replaced by orthonormal scaling functions; this study observed that the wavelet network has universal and L^{2} approximation properties and can be used as a consistent function estimator. Kollias [126] proposed a hierarchical neural network as an efficient architecture for classification and for the retrieval of multiresolution invariant image representations, observing that multiresolution image analysis can reduce the size of these representations in the best possible way based on auto-associative linear networks. Motivated by the multiresolution analysis of wavelets and by artificial neural networks, Ying and Zhidong [127] proposed a multiresolution neural network for approximating arbitrary nonlinear functions. This network comprises a scaling-function neural network and a set of sub-wavelet neural networks, each sub-network capturing the specific approximation behaviour of the approximated function at a different resolution; they designed a hierarchical construction algorithm that gradually approximates the unknown, complex nonlinear relationship between input and output data from coarser to finer resolution, and additionally proposed a new training algorithm based on immune particle swarm optimization. In 2006, Vicente and Javier [128] proposed a multiresolution FIR (finite-impulse-response) neural-network-based learning algorithm that uses the MODWT (maximal overlap discrete wavelet transform). This multiresolution learning algorithm uses the analysis framework of wavelet theory, which decomposes a signal into wavelet coefficients and scaling coefficients; the MODWT has the translation-invariance property, which allows events in a multiresolution analysis to be aligned with the original time series and thereby preserves the integrity of transient events. A learning algorithm is also derived in this study that adapts the activation functions at each level of resolution.
Zakeri et. al. [129] in 2007 studied the real time on-line learning of
a thermal process based that was based on on-line wave-nets with localized and hierarchical multi-resolution learning. Aadaleesan et. al. [130] in 2008 observed that Laguerre basis filters integrated with a wavelet network in Wiener type model structure are competent enough in 58
Chapter 4: Wavelet Neural Networks in Financial Forecasting
modeling highly nonlinear systems with reasonable accuracy compared with piece-wise linear models; wavelet basis functions are localized in both time and frequency and can approximate even severe non-linearities with reasonable accuracy using fewer model terms. A study by Mahmood et al. [131] used MRA and neural networks to detect and classify transients in a power system; the outcome demonstrates that the proposed method is simple, accurate and reliable for the classification of power-system transients. Turgay and Kerem [132] forecast daily precipitation from meteorological data with a wavelet-neural network method that combines the DWT (discrete wavelet transform) with artificial neural networks; their results indicate that the wavelet-ANN model performs significantly better than a traditional ANN model or a multi-linear regression model.

4.3. Methodology

Wavelets can decompose any given function or continuous-time signal into components at different scales. Wavelet transforms can represent time series that have discontinuities and sharp peaks, and can accurately deconstruct and reconstruct series that are finite, non-stationary, non-periodic or both. There are two types of wavelet transform: the Continuous Wavelet Transform, in which the analyzing wavelet is shifted smoothly over the full domain of the analyzed function, and the Discrete Wavelet Transform. Since it is difficult to analyse data when wavelet coefficients are calculated at every possible scale, it is more accurate and faster to choose the scales as powers of two; only the discrete wavelet transform is therefore used in this study. Several types of wavelets are available for de-noising time series. In our study, we observe that the Daubechies (db) wavelets perform better than the others; they are used to decompose the time series, and the reconstructed series is afterwards fed to the neural network as input for training. We study the effect of the de-noising wavelets and of their reconstructions when fed to the artificial neural network (ANN). The framework of our model is depicted in Figure 4.4.
Figure 4.4: Framework of proposed model
The de-noising procedure is performed using the Wavelet GUI toolbox of Matlab, which includes automatic thresholding, and proceeds in three steps:

1. Decomposition.
2. Thresholding of the detail coefficients.
3. Reconstruction.

Any wavelet, such as Haar, Daubechies, Symlets or Coiflets, can be used for decomposition. The candidate wavelets are analysed at different levels and scales; the fixed-form threshold is chosen, soft thresholding is applied to the detail coefficients, and the wavelet, level and scale with the best performance are selected to obtain the reconstructed series.
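The thesis performs these steps with Matlab's Wavelet GUI toolbox; purely as an illustration, an equivalent three-step pipeline in Python with the PyWavelets library might look as follows. The fixed-form (universal) threshold and the db4/level-2 defaults are assumptions made for the sketch, not the thesis settings in every experiment.

```python
import numpy as np
import pywt

def wavelet_denoise(series, wavelet='db4', level=2):
    """Three-step de-noising: decompose, soft-threshold the detail
    coefficients with the fixed-form (universal) threshold, reconstruct."""
    coeffs = pywt.wavedec(series, wavelet, level=level)      # 1. decomposition
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745           # noise estimate
    thr = sigma * np.sqrt(2.0 * np.log(len(series)))         # fixed-form threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode='soft')        # 2. thresholding
                  for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(series)]       # 3. reconstruction
```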
This wavelet-decomposed and reconstructed series is fed to a simple Multi-layer Perceptron Neural Network (MLPNN) model, implemented with the Neural Network GUI toolbox of Matlab, and the forecasted values are recorded. The model is trained by supervised learning, in which input and target output are repeatedly presented to the network; the model output is compared with the given target, an error is calculated for each presentation of input data, and the error is back-propagated through the network to adjust the weights, minimizing the error and driving the simulation closer and closer to the desired output. The Levenberg–Marquardt algorithm (LMA), an iterative algorithm that minimizes an objective expressed as a sum of squares of nonlinear functions, is used to train the network because of its simplicity. The optimum number of neurons in the single hidden layer is determined by trial and error, and transfer functions such as sigmoid and tansig are used for the neurons of the hidden and output layers. Training is considered complete when the network performance determined by validation and testing is satisfactory; if the network performs unsatisfactorily in the testing phase, it is re-trained. A series of comparative analyses is conducted to evaluate the performance of the hybrid model against other conventional models.
Chapter 5

Experimental Results and Discussions

Economists have shown significant interest in recent years in forecasting financial time series with integrated approaches, in which two or more models are combined. In this study, we use a hybrid model of wavelets and neural networks to investigate different aspects of financial time series with economically meaningful variables in order to achieve optimal forecasting. The objective of this study is to demonstrate the efficiency and significance of wavelets and artificial neural networks in financial time series forecasting. In this chapter, we discuss the experimental results obtained with the hybrid wavelet neural network model in the context of the following four problems:

1. Problem 1: Using yield spreads for forecasting IIP growth.
2. Problem 2: Analyzing the relationship between the real effective exchange rate and crude oil prices with a hybrid wavelet network.
3. Problem 3: Effect of the U.S. sub-prime crisis on 5 major stock markets: a study with wavelet networks.
4. Problem 4: Comparative study of different wavelet-based neural network models for IIP growth forecasting using different yield spreads.
5.1. Problem 1: Using yield spreads to forecast IIP growth

The yield spread, often expressed as the difference between long-term and short-term risk-free interest rates, is one financial variable that has become particularly popular for forecasting real economic activity. It is preferred over simple interest rates because short-term rates do not contain all the information that can help predict future economic activity. The yield spread has been observed to be a reliable parameter for forecasting variables such as inflation, output growth, consumption, industrial production and recessions, and its capability to forecast economic activity has become a well-accepted fact among macroeconomists. In this study, we analyze four different long-minus-short yield spreads, at both policy and non-policy horizons, constructed from yields on securities with maturities that
range from 3 months to 10 years. The spreads are first decomposed with Daubechies wavelets and then fed to a neural network as input. We observe that the wavelet neural networks give somewhat better forecasting results than other conventional techniques. The forecasting ability of each of the four spreads for economic activity, within a combined and time-scale framework, is enhanced by the hybrid wavelet and neural network modeling approach.

5.1.1. Framework of the Forecasting Model

1. Decompose the original time series by Daubechies wavelets.
2. For every decomposed component of the time series, propose a neural network predictive model.
3. Feed every decomposed component, comprising the technical indicators, to the neural network model(s) as input.
4. Use this combined model to arrive at the results.

[Diagram: Original Time Series → Wavelet Analysis (extract characteristics that will be the input vectors of the neural network) → Neural Network Forecasting Model]
5.1.2. Data and Methodology

In this study, we use monthly data from October 1996 to June 2012, a window of 189 observations. Reserve Bank of India (RBI) data are used to test the forecasting ability of yield spreads for output growth. The Index of Industrial Production (IIP) data, obtained from the official RBI website (http://www.rbi.org.in), is used to derive the output growth. The yield spreads are constructed as the differences between long-term yields to maturity (YTM) on Government of India (GOI) securities and short-term Treasury bill rates, at the shorter end, the longer end and the policy-relevant area of the yield curve. The policy-horizon spreads comprise the 1-year yield minus the 3-month Treasury bill rate, Sp(1,3), and the 10-year yield minus the 5-year yield, Sp(10,5); the non-policy-horizon spreads comprise the 5-year yield minus the 3-month Treasury bill rate, Sp(5,3), and the 10-year yield minus the 3-month Treasury bill rate, Sp(10,3). These spreads are depicted graphically in Fig. 5.1.1 to Fig. 5.1.5.
Figure 5.1.1: IIP Growth
Figure 5.1.2: Yield Spread Sp(1,3)
Figure 5.1.3: Yield Spread Sp(10,5)
Figure 5.1.4: Yield Spread Sp(5,3)
Figure 5.1.5: Yield Spread Sp(10,3)
The original time series values of IIP growth and the four constructed spreads are normalized to minimize the effect of outliers; the rescaling reduces each series to zero mean and unit standard deviation. Let $x_1, x_2, \ldots, x_d$ be the original input variables. The following transformation converts them to the normalized variables $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_d$, for $i = 1, \ldots, d$:

$$\bar{x}_i = \frac{1}{N}\sum_{n=1}^{N} x_i^n, \qquad \sigma_i^2 = \frac{1}{N-1}\sum_{n=1}^{N}\left(x_i^n - \bar{x}_i\right)^2, \qquad \tilde{x}_i^n = \frac{x_i^n - \bar{x}_i}{\sigma_i} \qquad \text{--- (5.1.1)}$$
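A minimal sketch of the transformation in Eq. (5.1.1), assuming the series are stored column-wise in a NumPy array (the toy data below are only for demonstration):

```python
import numpy as np

def normalize(X):
    """Z-score normalisation of Eq. (5.1.1): subtract the sample mean and
    divide by the sample standard deviation (ddof=1 gives the 1/(N-1) form)."""
    x_bar = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    return (X - x_bar) / sigma

# Example: jointly normalise IIP growth and one spread (synthetic data).
X = np.random.default_rng(0).normal(size=(189, 2))
X_tilde = normalize(X)   # zero mean, unit standard deviation per column
```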
Two models are used in this study: a neural network model and a wavelet neural network model. In the neural network model (Figure 5.1.6), we use normalized and lagged values of IIP growth together with the four normalized constructed spreads Sp(1,3), Sp(10,5), Sp(5,3) and Sp(10,3). These normalized IIP growth values and spreads are fed to the neural network model
in the following combinations:

1. Lagged IIP growth and spread Sp(1,3);
2. Lagged IIP growth and spread Sp(10,5);
3. Lagged IIP growth and spread Sp(5,3);
4. Lagged IIP growth and spread Sp(10,3).

The IIP growth values are lagged by 1, 2, 3 and 4 steps. These lagged IIP growth values and the four constructed spreads are fed to the network independently, and the forecasted IIP growth values obtained with each spread are recorded. The normalized values are depicted graphically in Figures 5.1.7 to 5.1.11. The Neural Network GUI toolbox of Matlab (R2010a) is used in this study.

Figure 5.1.6: Model 1 (Neural Network Model) [Diagram: Original Time Series → Normalized Values (pre-processing) → Neural Network Model → Forecasted Values]
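One plausible reading of this input construction is sketched below, with hypothetical array names (iip, spread) standing in for the normalized series:

```python
import numpy as np

def make_lagged_pairs(iip, spread, lag):
    # Inputs at time t - lag (IIP growth and the spread),
    # target IIP growth at time t.
    X = np.column_stack([iip[:-lag], spread[:-lag]])
    y = iip[lag:]
    return X, y

# e.g. the lag-2 design matrix for one spread series:
# X2, y2 = make_lagged_pairs(iip_norm, sp_10_5_norm, lag=2)
```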
Figure 5.1.7: Normalised IIP growth values
Figure 5.1.8: Normalised Yield Spread Sp(1,3) values
Figure 5.1.9: Normalised Yield Spread Sp(10,5) values
Figure 5.1.10: Normalised Yield Spread Sp(10,3) values
Figure 5.1.11: Normalised Yield Spread Sp(5,3) values
In the wavelet neural network model (Figure 5.1.12), the normalized and lagged values of IIP growth and the four constructed spreads are first de-noised with a Daubechies wavelet and then fed to the neural network model; the same procedure as in the first approach is followed to forecast the IIP growth values.

Figure 5.1.12: Model 2 (Wavelet and Neural Network Model) [Diagram: Original Time Series → Normalized Values → Wavelet De-noising (pre-processing) → Neural Network Model → Forecasted Values]
De-noising is performed by means of the one-dimensional (1-D) Discrete Stationary Wavelet Transform (SWT), which rejects only the portion of the detail coefficients that exceeds a certain threshold. The Daubechies wavelet identifies which component(s) contain the noise and reconstructs the signal without those components, eliminating the noise without losing information. The results are depicted graphically in Figures 5.1.13 to 5.1.17.
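As an illustration of the SWT-based procedure (again a sketch; the thesis uses Matlab's toolbox), PyWavelets provides a stationary wavelet transform whose translation invariance mirrors the description above. The db4/level-2 choice and the fixed-form threshold are assumptions for the example:

```python
import numpy as np
import pywt

def swt_denoise(series, wavelet='db4', level=2):
    """1-D stationary wavelet transform de-noising: the translation-invariant
    counterpart of the DWT procedure."""
    n = len(series)
    pad = (-n) % (2 ** level)                  # SWT needs length divisible by 2^level
    x = np.pad(series, (0, pad), mode='edge')
    coeffs = pywt.swt(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1][1])) / 0.6745   # finest-detail noise estimate
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))
    coeffs = [(cA, pywt.threshold(cD, thr, mode='soft')) for cA, cD in coeffs]
    return pywt.iswt(coeffs, wavelet)[:n]
```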
Figure 5.1.13: SWT denoised IIP growth values
Figure 5.1.14: SWT denoised Yield Spread Sp(1,3) values
Figure 5.1.15: SWT denoised Yield Spread Sp(10,5) values
Figure 5.1.16: SWT denoised Yield Spread Sp(5,3) values
Figure 5.1.17: SWT denoised Yield Spread Sp(10,3)
The de-noising procedure in our study involves three steps: (1) decomposition at level 2 by a Daubechies wavelet; (2) thresholding of the detail coefficients (fixed-form threshold with soft thresholding); (3) reconstruction of the IIP growth values and the four constructed spreads. For both approaches, a simple MLPNN (Multi-layer Perceptron Neural Network) model is used, whose input and output layers contain one neuron each to process the values of IIP growth and spread respectively. The model is trained by supervised learning, in which input and target output are repeatedly presented to the network; the model output is compared with the given target, an error is calculated for each presentation of input data, and the error is back-propagated through the network to adjust the weights, minimizing the error and driving the simulation closer and closer to the desired output. The Levenberg–Marquardt algorithm (LMA) is used to train the network because of its simplicity. The optimal number of neurons in the single hidden layer is determined by trial and error; a two-layer feed-forward back-propagation network with ten neurons in the hidden layer gives the optimum results in this case. The model is trained for a maximum of 1000 epochs, and a sigmoid function is used for the neurons of the hidden and output layers. The performance function used is the Mean Squared Error (MSE) in both approaches. The regression analysis results obtained for the two approaches are shown in Table 5.1.1.
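For illustration, a comparable single-hidden-layer MLP can be set up in a few lines with scikit-learn. This is only a stand-in sketch: scikit-learn offers no Levenberg-Marquardt solver, so the default Adam optimiser is used instead, and the data below are synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the denoised, lagged spread -> IIP growth regression.
rng = np.random.default_rng(1)
X = rng.normal(size=(189, 1))
y = 0.5 * np.tanh(X[:, 0]) + 0.05 * rng.normal(size=189)

# One hidden layer of 10 logistic (sigmoid) neurons, up to 1000 epochs.
# Note: no Levenberg-Marquardt solver in scikit-learn; 'adam' is the default.
model = MLPRegressor(hidden_layer_sizes=(10,), activation='logistic',
                     max_iter=1000, random_state=0)
model.fit(X, y)
mse = mean_squared_error(y, model.predict(X))   # performance function: MSE
```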
Neural Network Model

Lag Step | Spread Sp(1,3) | Spread Sp(10,5) | Spread Sp(5,3) | Spread Sp(10,3)
1 | 3.91e-12 | 1.31e-12 | 3.68e-12 | 4.22e-12
2 | 2.31e-12 | 7.89e-13 | 6.01e-12 | 4.06e-12
3 | 7.78e-12 | 2.98e-12 | 3.79e-12 | 3.91e-12
4 | 3.97e-12 | 3.12e-12 | 3.34e-12 | 3.84e-12

Wavelet Neural Network Model

Lag Step | Spread Sp(1,3) | Spread Sp(10,5) | Spread Sp(5,3) | Spread Sp(10,3)
1 | 3.52e-12 | 5.77e-13 | 3.22e-12 | 4.72e-12
2 | 2.54e-12 | 4.29e-13 | 1.25e-12 | 3.49e-12
3 | 3.78e-12 | 2.68e-12 | 3.35e-12 | 3.83e-12
4 | 3.11e-12 | 1.22e-12 | 2.03e-12 | 3.09e-12

Table 5.1.1: Results of Regression Analysis (MSE) of the Neural Network and Wavelet Neural Network Models
5.1.3. Outcomes of Problem 1

1. The study demonstrates that both approaches, i.e., the individual ANN and the WNN, forecast IIP growth more effectively than other conventional models, as improved results are obtained in both cases.
2. The spread provides superior recession forecasts, especially up to 1 year ahead, and is a dependable predictor of output growth.
3. The 10-year minus 5-year yield spread Sp(10,5), i.e., the policy horizon, remains a significant variable for forecasting IIP growth in comparison with the other spreads, as improved results are obtained with it in both approaches.
4. The best forecasting efficiency is achieved when the lagged IIP growth values are taken at lag 2 in both cases for this spread, as depicted graphically in Fig. 5.1.18 and Fig. 5.1.19.
5. The wavelet neural network model gives slightly superior forecasting results to the individual neural network model.
Figure 5.1.18: Neural Network forecasts
Figure 5.1.19: Wavelet Neural Network forecasts
5.2. Problem 2: Analyzing the relationship between the Real Effective Exchange Rate and Crude Oil prices with a hybrid wavelet network

The real effective exchange rate of the dollar and crude oil prices are inherently non-stationary time series. In this study, a hybrid wavelet network comprising a Daubechies wavelet and a simple MLPNN (Multi-layer Perceptron Neural Network) model is used to analyze the relationship between the real effective exchange rate and crude oil prices. The study reveals that the hybrid model unfolds the relationship between the real effective exchange rate and crude oil better than other conventional models. The study is based on the Indian financial market and indicates that crude oil prices do have an effect on the real effective exchange rate.

5.2.1. Data and Forecasting Criterion

In this study, we use monthly time series data from April 1993 to February 2015 for the real effective exchange rate and crude oil prices. The real effective exchange rate (REXR) values are obtained from the Reserve Bank of India website (http://www.rbi.org.in), and spot crude oil prices are obtained from the website of the U.S. Energy Information Administration (EIA) for West Texas Intermediate (WTI), Cushing, Oklahoma. Descriptive statistics for both time series are presented in Table 5.2.1. The values are taken both in natural logarithms (levels) and in log first differences (returns). Figure 5.2.1 depicts the returns of both series, and a preliminary examination indicates that the sample means of both time series are positive in both forms (levels and returns). The skewness of the oil returns and of the level form of the real effective exchange rate indicates that they are negatively skewed. The excess kurtosis of the level data for the real effective exchange rate indicates that its distribution is leptokurtic relative to the normal distribution. Apart from the level series of the real effective exchange rate, the data series fail the Jarque-Bera normality test at the 1% level of significance. Trend stationarity is measured by the KPSS (Kwiatkowski, Phillips, Schmidt and Shin) test, which rejects the stationarity of the oil and real effective exchange rate levels and does not reject the stationarity of the oil and real effective exchange rate returns.
Descriptive | OIL (Log) | OIL (Returns) | REXR (Log) | REXR (Returns)
Mean | 1.605 | 0.002 | 1.641 | 0.001
Median | 1.560 | 0.006 | 1.651 | 0.000
Maximum | 2.127 | 0.089 | 1.823 | 0.037
Minimum | 1.055 | -0.144 | 1.496 | -0.030
Std. Dev. | 0.297 | 0.036 | 0.077 | 0.009
Skewness | 0.027 | -0.780 | -0.197 | 0.524
Kurtosis | -1.467 | 1.801 | -0.158 | 3.760
Jarque-Bera | 23.295 | 59.507 | 2.073 | 158.671
KPSS | 1.3617 | 0.1216 | 2.8063 | 0.1054

Table 5.2.1: Descriptive statistics of time series data
Figure 5.2.1: Crude oil and exchange rate returns
The results obtained with the hybrid wavelet and neural network model are compared with an artificial neural network (ANN) model and a wavelet-based model to gauge its forecasting performance. Two performance measures are used in this study: RMSE (root mean squared error) and MAPE (mean absolute percentage error). They are defined as

$$R = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_{\mathrm{obs},i} - X_{\mathrm{model},i}\right)^2} \qquad \text{--- (5.2.1)}$$

where $X_{\mathrm{obs},i}$ is the observed value and $X_{\mathrm{model},i}$ the forecasted value at time (or place) $i$, and

$$M = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right| \qquad \text{--- (5.2.2)}$$

where $A_t$ is the actual value and $F_t$ the forecasted value. The forecasted time series values are closer to the actual values when the RMSE and MAPE values are smaller.
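Both measures are straightforward to compute; a small sketch following Eqs. (5.2.1) and (5.2.2):

```python
import numpy as np

def rmse(observed, modeled):
    # Eq. (5.2.1): root of the mean squared deviation.
    return np.sqrt(np.mean((np.asarray(observed) - np.asarray(modeled)) ** 2))

def mape(actual, forecast):
    # Eq. (5.2.2): mean absolute percentage error (actual values nonzero).
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.mean(np.abs((actual - forecast) / actual))
```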
5.2.2. Methodology and Results

In this study, we compare the forecasting performance of the proposed hybrid wavelet and neural network model with three other models: a linear model, an ANN model and a wavelet-based approach. The results are shown in Table 5.2.2. A simple MLPNN is used for the ANN model; its input contains the values of the real effective exchange rate returns, and a single neuron in the output layer processes the observed crude oil price returns. The model is trained by supervised learning, in which input and target output are repeatedly presented to the network; the model output is compared with the given target, an error is calculated for each presentation of input data, and the error is propagated back through the network, adjusting the weights to minimize the error and drive the simulation closer and closer to the desired output. The LMA (Levenberg–Marquardt) algorithm, an iterative algorithm that minimizes an objective expressed as a sum of squares of non-linear functions, is used to train the network because of its simplicity. The optimal number of neurons in the single hidden layer is determined by trial and error. A sigmoid function is used for the neurons of the hidden and output layers, and the model is trained for a maximum of 1000 epochs. When the network performance
determined by the validation (or testing) is satisfactory, the training of the network is considered complete. If the network performs unsatisfactorily in the testing phase, it is re-trained. A randomly distributed sample of 263 monthly values (April 1993 to February 2015) is used to train the model, and it is observed that the model performs optimally when the samples are divided into training, validation and testing sets of 65%, 10% and 25% respectively. Matlab (R2010a) with the ANN GUI toolbox is used for this purpose.
Model | R | R² | Adjusted R² | Standardized Coefficient Beta | RMSE | MAPE
1. Linear | .127 | .016 | .012 | -0.127 | .03547 | 2.75
2. ANN | .278 | .077 | .074 | -0.278 | .01036 | 1.48
3. Wavelet | .659 | .434 | .432 | -0.659 | .00800 | 1.24
4. Hybrid | .775 | .601 | .599 | -0.775 | .00172 | 1.19

Table 5.2.2: Performance results of four models
The DWT (Discrete Wavelet Transform) is used for the wavelet-based approach; it removes noise and is a well-established de-noising technique that has been widely used in financial time series analysis over the last two decades. Eliminating noise from a signal with wavelets requires identifying which component(s) contain the noise and reconstructing the signal without them. Matlab (R2010a) with the wavelet GUI toolbox, which includes automatic thresholding, is used. The de-noising procedure has three steps:

1. Decomposition by a Daubechies wavelet (levels 1 to 4, scale 4).
2. Soft thresholding of the detail coefficients.
3. Reconstruction by the Daubechies wavelet.

The best results are obtained at level 4 and scale 4 with the Daubechies (db4) wavelet, as depicted in Table 5.2.3. The reconstructed series contain the returns of the real effective exchange rate
and crude oil prices. Figures 5.2.2 and 5.2.3 graphically depict the original series and their approximations.

Db Model | R | R² | Adjusted R² | Standardized Coefficient Beta | RMSE
Db1 | .422 | .178 | .175 | -0.422 | .00978
Db2 | .222 | .049 | .046 | -0.222 | .0096
Db3 | .199 | .040 | .036 | -0.199 | .00911
Db4 | .659 | .434 | .432 | -0.659 | .00800

Table 5.2.3: Wavelet-based study with Daubechies wavelets
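A loop of the following shape could be used to rank the Daubechies subclasses, as in Table 5.2.3. This is a hedged sketch: it ranks wavelets by the RMSE between the raw series and its de-noised reconstruction, a simpler stand-in criterion than the regression RMSE actually reported in the table, and the Matlab thresholding settings are not reproduced exactly.

```python
import numpy as np
import pywt

def reconstruction_rmse(series, wavelet, level=4):
    """Soft-threshold the detail coefficients at the given level and report
    the RMSE between the series and its de-noised reconstruction."""
    coeffs = pywt.wavedec(series, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(series)))
    coeffs[1:] = [pywt.threshold(c, thr, mode='soft') for c in coeffs[1:]]
    rec = pywt.waverec(coeffs, wavelet)[:len(series)]
    return np.sqrt(np.mean((series - rec) ** 2))

returns = np.random.default_rng(2).normal(scale=0.01, size=263)  # toy returns
for w in ['db1', 'db2', 'db3', 'db4']:
    print(w, reconstruction_rmse(returns, w))
```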
Figure 5.2.2: Oil returns and its approximations by Db4 wavelet.
Figure 5.2.3: Exchange rate returns and its approximations by Db4 wavelet.
The wavelet-decomposed (db4 at scale 4) signal is used for the hybrid wavelet neural network model: the reconstructed series of real effective exchange rate and crude oil price returns are fed to the simple MLPNN model. The observations recorded for the forecasted values of the real effective exchange rate returns for this hybrid model are given in Table 5.2.2, and the framework of the model is depicted in Fig. 5.2.4. The smaller RMSE and MAPE values observed with the hybrid model mean that it outperforms the other three models. The study establishes that the real effective exchange rate is strongly, and negatively, associated with crude oil prices, as negative values are obtained for the standardized coefficient (Beta). The increase in the correlation coefficient (R) from the linear to the hybrid model suggests that the hybrid model better untangles the relationship between the real effective exchange rate and crude oil prices. Figure 5.2.5 gives a graphical depiction of the results.
Figure 5.2.4: Framework of hybrid wavelet and neural network model
Figure 5.2.5: Comparison of performance measures
5.2.3. Outcomes of Problem 2

The main findings of this study for India are that crude oil prices do influence the real effective exchange rate and that the hybrid model untangles the relationship between the real effective exchange rate and crude oil prices better than the individual ANN model or the wavelet-based approach. The crude oil price is a dependable predictor of the real effective exchange rate and provides superior forecasts.

5.3. Problem 3: Effect of the U.S. sub-prime crisis on 5 major stock markets: a study with wavelet networks

The U.S. sub-prime crisis of 2007 manifested itself initially in U.S. financial institutions, but its effects were subsequently observed in other global markets. To evaluate spillover effects across stock markets, the correlation of stock returns across different markets of the world has been widely studied. The sub-prime crisis of 2007 had severe effects on financial markets and reeled economies all over the world, and it became an important issue in the academic literature during and shortly after the crisis period.
The degree of correlation or co-movement of five major stock markets of the world is investigated in this study: NIFTY from India, SHCOMP from China, DAX from Germany, FTSE-100 from the United Kingdom and NKY from Japan. Each of these markets is studied independently in relation to NASDAQ from the U.S. We use a hybrid wavelet and neural network model, consisting of a simple MLPNN and wavelet-based decomposition, to analyse the relationships between these stock markets. The study reveals that the hybrid model of wavelet and neural network better unravels the relationships between financial institutions and can provide a valuable alternative to existing conventional methods for testing financial contagion.

5.3.1. Data

In this study, weekly data from February 1999 to October 2013 from six major national stock market indices of six countries are used: NIFTY, SHCOMP, DAX, FTSE-100, NKY and NASDAQ. For our analysis, the entire data set is divided into three periods: the pre-crisis period (February 1999 to June 2007), the crisis period (July 2007 to December 2009) and the post-crisis period (January 2010 to October 2013). The weekly stock price returns are calculated
as the natural logarithmic differences of the weekly stock prices:

$$r_t = \left(\log(p_t) - \log(p_{t-1})\right) \times 100 \qquad \text{--- (5.3.1)}$$

where $p_t$ and $p_{t-1}$ represent the stock price index at times $t$ and $t-1$ respectively. Tables 5.3.1, 5.3.2 and 5.3.3 summarize the descriptive statistics of the weekly stock price returns over the three periods for these six countries.
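Eq. (5.3.1) in code form (a trivial sketch):

```python
import numpy as np

def weekly_log_returns(prices):
    # Eq. (5.3.1): r_t = (log p_t - log p_{t-1}) * 100
    p = np.asarray(prices, dtype=float)
    return (np.log(p[1:]) - np.log(p[:-1])) * 100.0
```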
Index | NASDAQ | NIFTY | SHCOMP | DAX | FTSE 100 | NKY
Max | 4.15 | 3.23 | 5.24 | 5.11 | 3.84 | 4.12
Min | -4.99 | -4.96 | -3.83 | -5.90 | -3.30 | -2.73
Mean | 0.02 | 0.15 | 0.09 | 0.03 | 0.01 | 0.04
S.D. | 1.17 | 1.19 | 1.17 | 1.23 | 0.78 | 1.04
Skewness | 5.25 | 4.69 | 4.60 | 6.49 | 7.13 | 3.46
Kurtosis | -0.16 | -0.91 | 0.33 | -0.53 | -0.21 | -0.09
Jarque-Bera | 85.93 | 67.57 | 89.84 | 169.88 | 184.50 | 12.67

Table 5.3.1: Pre-crisis period (February 1999 to June 2007)
Index | NASDAQ | NIFTY | SHCOMP | DAX | FTSE 100 | NKY
Max | 2.71 | 7.11 | 5.72 | 2.48 | 3.16 | 4.61
Min | -6.58 | -6.43 | -5.58 | -6.02 | -5.11 | -7.72
Mean | -0.05 | 0.06 | -0.08 | -0.10 | -0.07 | -0.18
S.D. | 1.40 | 1.96 | 2.01 | 1.46 | 1.24 | 1.55
Skewness | 6.18 | 4.42 | 2.86 | 4.91 | 4.50 | 6.69
Kurtosis | -1.13 | -0.13 | -0.20 | -0.97 | -0.68 | -0.57
Jarque-Bera | 82.33* | 11.26* | 0.99* | 40.27* | 22.22* | 80.82*

Table 5.3.2: Crisis period (July 2007 to December 2009)
Index | NASDAQ | NIFTY | SHCOMP | DAX | FTSE 100 | NKY
Max | 2.94 | 2.71 | 2.28 | 3.38 | 2.29 | 3.20
Min | -3.33 | -2.36 | -3.46 | -5.30 | -3.24 | -6.07
Mean | 0.12 | 0.04 | -0.08 | 0.09 | 0.05 | 0.07
S.D. | 0.86 | 0.91 | 0.95 | 1.05 | 0.81 | 1.11
Skewness | 4.49 | 3.19 | 3.64 | 7.75 | 5.10 | 7.55
Kurtosis | -0.61 | -0.20 | -0.37 | -1.02 | -0.70 | -1.03
Jarque-Bera | 31.22* | 1.67 | 7.95 | 208.5* | 53.30* | 208.4*

Table 5.3.3: Post-crisis period (January 2010 to October 2013)
Note: * indicates that the series fails the normality test at the 1% significance level, i.e., the null hypothesis is rejected because the test statistic is larger than the critical value. The normality test is performed using the Jarque-Bera test.
5.3.2. Methodology

Wavelets can decompose a given function and can accurately reconstruct time series that are finite, non-stationary and non-periodic. The Discrete Wavelet Transform (DWT) analyses the data more accurately and faster because the wavelet coefficients are calculated at scales chosen as powers of two, unlike the Continuous Wavelet Transform (CWT), where the analyzing wavelet is shifted smoothly over the full domain of the analyzed function and the coefficients are calculated at every possible scale, making the analysis difficult. Only the DWT is used in this study. Among the numerous wavelets available for de-noising time series, we observe that the Daubechies wavelets perform better than the others. We study the effect of wavelet de-noising; the reconstructed de-noised signal is then fed to the artificial neural network (ANN) for training. The framework of our model is the same as depicted in Figure 5.2.4, and the de-noising procedure is the same as discussed in Section 5.2.2. The wavelet-decomposed and reconstructed series is fed to a simple Multi-layer Perceptron Neural Network (MLPNN) model, and supervised learning is used to train the network: input and target output are repeatedly presented, the model output is compared with the target, an error is calculated with each presentation of input data, and the error is propagated back through the network. The objective is to minimize the error and drive the simulation closer and closer to the desired output by adjusting the weights. The Levenberg–Marquardt algorithm (LMA), an iterative algorithm that minimizes an objective usually expressed as a sum of squares of nonlinear functions, is used to train the network because of its simplicity, and the optimal number of neurons in the single hidden layer is determined by trial and error.

5.3.3. Results and Discussion

The descriptive statistics of the weekly stock price returns for the pre-crisis, crisis and post-crisis periods for the six stock markets are shown in Tables 5.3.1, 5.3.2 and
Table 5.3.3 respectively. A preliminary investigation suggests higher variability during the crisis period, since the standard deviations increase compared with the pre-crisis and post-crisis periods. The SHCOMP market shows the highest variability during the crisis period, as measured by the standard deviation of the weekly stock price returns. The weekly stock price returns fail the normality test during the crisis period, since the Jarque-Bera test rejects normality of all the series at the 1% level of significance, indicating volatility in these markets; apart from NIFTY and SHCOMP, volatility is present in all the other stock markets. In the NASDAQ and NKY markets, the skewness of the returns increases, indicating a greater lack of symmetry in the crisis period; the data are skewed right in these two markets, i.e., the right tail is long relative to the left tail. The NKY market has the highest skewness during the crisis and post-crisis periods; high skewness indicates frequent small negative outcomes, while extremely bad scenarios are less likely. We also observe the lowest kurtosis value for the NASDAQ market in the crisis period, i.e., more fluctuations than the other markets and a larger degree of variance, truly indicative of a crisis period in this market. NIFTY has the highest kurtosis value, i.e., fewer fluctuations in the crisis period and a smaller degree of variance than the other stock markets. In this study, an attempt is made to correlate the five major stock markets NIFTY, SHCOMP, DAX, FTSE-100 and NKY with the NASDAQ market. The main objective is to observe the effect of the U.S. sub-prime crisis of 2007 on the five markets using the hybrid wavelet Multilayer Perceptron Neural Network (MLPNN) technique; to gauge the performance of the hybrid model, the results are also compared with an artificial neural network model. The pre-processing of the data is performed by first approximating the weekly stock price returns for the pre-crisis, crisis and post-crisis periods. Five different discrete wavelet families are used: Haar, Daubechies, Coiflets, Symlets and Discrete Meyer. Different wavelet functions (subclasses) are used in each case, and the data are analyzed at different levels of decomposition. Tables 5.3.4, 5.3.5 and 5.3.6 depict the results for the three periods; the hybrid model performs better with the Daubechies
wavelet function db4 at different levels of decomposition, since better approximation results are obtained with it. The root mean squared error (RMSE) is the performance measure used. The wavelet-approximated weekly stock price returns of the NASDAQ market are taken as input to the ANN for each of the three periods, as depicted in Figures 5.3.1, 5.3.2 and 5.3.3 (the actual values are 100 times the values shown in the figures). The wavelet-approximated weekly returns of the five markets NIFTY, SHCOMP, DAX, FTSE-100 and NKY are taken as the target (output) separately for the three periods. We observe that the best decomposition is obtained with the Daubechies wavelet, subclass db4.
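A compressed sketch of this evaluation loop is given below. The names and the linear correlation stage are assumptions for illustration: keeping only the approximation coefficients is one way to realise the "wavelet approximated" pre-processing described above, and np.corrcoef stands in for the MLPNN stage purely to expose R and R².

```python
import numpy as np
import pywt

def db4_approximation(series, level):
    """Keep only the level-`level` approximation coefficients of a db4
    decomposition and reconstruct (the pre-processing step of this study)."""
    coeffs = pywt.wavedec(series, 'db4', level=level)
    coeffs[1:] = [np.zeros_like(c) for c in coeffs[1:]]   # drop the details
    return pywt.waverec(coeffs, 'db4')[:len(series)]

def correlation_summary(nasdaq_returns, market_returns, level=1):
    a = db4_approximation(nasdaq_returns, level)
    b = db4_approximation(market_returns, level)
    r = np.corrcoef(a, b)[0, 1]
    return r, r ** 2    # R and the coefficient of determination
```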
(Performance measure: RMSE) Decomposition by Daubechies wavelet, subclass db4, at different levels

Country | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
NASDAQ | 0.008 | 0.010 | 0.011 | 0.011 | 0.012 | 0.012 | 0.012 | 0.012
NIFTY | 0.007 | 0.009 | 0.011 | 0.011 | 0.011 | 0.012 | 0.012 | 0.012
SHCOMP | 0.007 | 0.009 | 0.010 | 0.011 | 0.011 | 0.011 | 0.011 | 0.012
DAX | 0.008 | 0.010 | 0.011 | 0.012 | 0.012 | 0.012 | 0.012 | 0.012
FTSE-100 | 0.005 | 0.007 | 0.007 | 0.008 | 0.008 | 0.008 | 0.008 | 0.008
NKY | 0.006 | 0.009 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010

Table 5.3.4: Pre-crisis data with 431 samples
Country | 1 | 2 | 3 | 4 | 5 | 6 | 7
NASDAQ | 0.49E-05 | 1.79E-05 | 7.90E-05 | 7.92E-05 | 11.7E-05 | 8.8E-05 | 37.3E-05
NIFTY | 0.69E-04 | 0.96E-04 | 2.61E-04 | 1.67E-04 | 1.13E-04 | 5.6E-04 | 4.35E-04
SHCOMP | 0.074E-04 | 0.41E-04 | 1.14E-04 | 1.49E-04 | 1.93E-04 | 4.88E-04 | 4.6E-04
DAX | 0.356E-05 | 2.48E-05 | 6.75E-05 | 5.95E-05 | 7.8E-05 | 16.4E-05 | 9.0E-05
FTSE-100 | 0.67E-18 | 1.0E-18 | 1.56E-18 | 1.07E-18 | 1.6E-18 | 1.7E-18 | 1.27E-18
NKY | 0.35E-05 | 3.45E-05 | 2.59E-05 | 0.393E-05 | 12.8E-05 | 6.3E-05 | 0.46E-05

Table 5.3.5: Crisis data with 130 samples
Country | 1 | 2 | 3 | 4 | 5 | 6 | 7
NASDAQ | 0.46E-05 | 1.74E-05 | 2.02E-05 | 3.56E-05 | 0.74E-05 | 2.08E-05 | 1.56E-05
NIFTY | 0.01E-05 | 0.66E-05 | 0.08E-05 | 1.84E-05 | 7.86E-05 | 29.38E-05 | 55.09E-05
SHCOMP | 0.02E-05 | 2.29E-05 | 4.9E-05 | 14.83E-05 | 20.16E-05 | 25.39E-05 | 46.86E-05
DAX | 0.36E-05 | 1.84E-05 | 1.8E-05 | 1.78E-05 | 0.25E-05 | 4.21E-05 | 1.945E-05
FTSE-100 | 0.43E-05 | 2.03E-05 | 2.87E-05 | 3.19E-05 | 4.94E-05 | 11.41E-05 | 18.67E-05
NKY | 0.68E-05 | 1.33E-05 | 2.02E-05 | 3.69E-05 | 1.99E-05 | 9.62E-05 | 21.61E-05

Table 5.3.6: Post-crisis data with 201 samples
Figure 5.3.1: Approximations by Daubechies wavelet (pre-crisis data)
Figure 5.3.2: Approximations by Daubechies wavelet (crisis data)

Figure 5.3.3: Approximations by Daubechies wavelet (post-crisis data)
The artificial neural network model is trained with between 5 and 20 neurons in the single hidden layer; the number of hidden neurons is chosen by trial and error, based on the least RMSE value. The MLPNN model is trained for a maximum of 1000 epochs, and a sigmoid function is used for the neurons in the hidden and output layers. Training is considered complete when the network performance determined by validation or testing is satisfactory; if the network performs unsatisfactorily in the testing phase, it is re-trained. In the pre-crisis period 431 sample values are used, in the crisis period 130, and in the post-crisis period 201, all randomly distributed. All the models performed optimally when the samples were divided into training, validation and testing sets of 65%, 15% and 20% respectively. Table 5.3.7 shows the results obtained with the hybrid model, depicted graphically in Figure 5.3.4. The objective of this study is to evaluate the effect of the NASDAQ market on the five other stock markets using the wavelet-approximated neural network model. All calculations were performed using the Neural Network Financial toolbox and Wavelet toolbox of MATLAB (R2010a) software.
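The random partitioning can be sketched as follows (index-based, with the 65/15/20 split used in this problem):

```python
import numpy as np

def random_split(n_samples, train=0.65, val=0.15, seed=0):
    """Randomly partition sample indices into training, validation and
    test sets (65/15/20 in the Problem-3 experiments)."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_tr = int(train * n_samples)
    n_va = int(val * n_samples)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

train_idx, val_idx, test_idx = random_split(431)   # pre-crisis sample
```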
Wavelet-decomposed regression analysis results obtained with the ANN, NASDAQ vs. the other stock markets (Hybrid Model)

Country | Pre-Crisis RMSE | R² | R | Crisis RMSE | R² | R | Post-Crisis RMSE | R² | R
NIFTY | 7.56E-03 | 0.1782 | 0.422 | 4.33E-03 | 0.2310 | 0.481 | 3.51E-03 | 0.2151 | 0.464
SHCOMP | 6.93E-03 | 0.1943 | 0.441 | 11.9E-03 | 0.3506 | 0.592 | 3.42E-03 | 0.2508 | 0.501
DAX | 5.10E-03 | 0.7864 | 0.887 | 4.40E-03 | 0.8717 | 0.934 | 4.22E-03 | 0.8324 | 0.912
FTSE-100 | 3.96E-03 | 0.6804 | 0.825 | 0.50E-03 | 0.8644 | 0.930 | 3.10E-03 | 0.8582 | 0.926
NKY | 6.43E-03 | 0.5704 | 0.755 | 6.43E-03 | 0.8280 | 0.910 | 6.08E-03 | 0.6408 | 0.800

Table 5.3.7: Hybrid Model Results
Figure 5.3.4: Performance of hybrid model
The Root Mean Squared Error (RMSE), Pearson's correlation coefficient (R) and the coefficient of determination (R²) are the performance measures used in this study, and the two models can be compared through the values in Table 5.3.7 and Table 5.3.8. The lower RMSE values obtained with the hybrid model indicate a better fit, and the higher R-squared values of the hybrid model (model 1) relative to the ANN model (model 2) likewise indicate a better fit.

Regression analysis results obtained with the ANN, NASDAQ vs. the other five stock markets (ANN Model)

Country | Pre-Crisis RMSE | R² | R | Crisis RMSE | R² | R | Post-Crisis RMSE | R² | R
NIFTY | 8.87E-03 | 0.13585 | 0.369 | 13.2E-03 | 0.2017 | 0.449 | 8.43E-03 | 0.2014 | 0.449
SHCOMP | 7.62E-03 | 0.12695 | 0.356 | 13.5E-03 | 0.3283 | 0.573 | 7.62E-03 | 0.2353 | 0.485
DAX | 6.90E-03 | 0.74436 | 0.863 | 5.66E-03 | 0.8474 | 0.921 | 4.80E-03 | 0.8179 | 0.904
FTSE-100 | 4.24E-03 | 0.65885 | 0.812 | 1.56E-03 | 0.8447 | 0.919 | 4.24E-03 | 0.8392 | 0.916
NKY | 8.66E-03 | 0.52356 | 0.724 | 7.94E-03 | 0.7932 | 0.891 | 9.00E-03 | 0.6199 | 0.787

Table 5.3.8: ANN Model Results
5.3.4. Outcomes of Problem 3

The U.S. sub-prime crisis of 2007, with the U.S. as its epicenter, not only affected the U.S. financial market but also had a global effect on the major
financial institutions of the world. In this study, the hybrid wavelet Multilayer Perceptron Neural Network is used to investigate the interdependence between the U.S. stock market and five other major stock markets. The results of the hybrid model are shown in Table 5.3.7 and depicted graphically in Figure 5.3.4. The following conclusions can be drawn from this study:
The lowest R value and the highest RMSE values between NASDAQ and SHCOMP in the pre-crisis period imply the least interconnection between these two markets, followed by NIFTY, which shows the least co-movement with respect to the NASDAQ market. The highest R value for the DAX market suggests the highest interconnection with, and dependence on, the NASDAQ market during the pre-crisis period, followed by the FTSE-100 market. The lowest RMSE value, obtained with the FTSE-100 market, suggests the closest co-movement with respect to the NASDAQ market.
The highest R value for the DAX market in the crisis period suggests the highest interconnection between the NASDAQ and DAX markets and indicates that DAX was most affected by the U.S. crisis among the five markets. NKY has the second highest R value in this period, indicating a strong effect on this market as well; this reveals significant contagion effects on the DAX and NKY markets. NIFTY and SHCOMP were less affected in this period, while the FTSE-100 market remained the most robust, as evidenced by the minimum R value, although its low RMSE values indicate close co-movement between the FTSE-100 and NASDAQ markets. The decreasing trend in the R and RMSE values for NASDAQ and FTSE-100 from the pre-crisis to the crisis period indicates a decrease in the interconnection of these markets during the transition, while the increase in co-movement between the two markets indicates more sustainability during the crisis period. The maximum RMSE of SHCOMP indicates the least co-movement with respect to the NASDAQ market.
NIFTY has the lowest R value in the post-crisis period, indicating that the decreasing trend in the interconnection between NASDAQ and NIFTY continues through the crisis into the post-crisis period. The decrease in its RMSE value indicates an increase in the co-movement of the NIFTY market with the NASDAQ market, suggesting less dependency on NASDAQ, followed by SHCOMP, after the crisis period. The NKY market has the highest RMSE, indicating a decrease in co-movement between the two markets and hence less dependency after the crisis period. The highest R value, for the FTSE-100, indicates the highest interconnection, although the increase in its RMSE value after the crisis period suggests a decrease in co-movement. This study has successfully demonstrated that the hybrid model of wavelet and neural network provides a valuable alternative to existing conventional methods for testing financial contagion.
5.4. Problem 4: Comparative study of different wavelet-based neural network models for IIP growth forecasting using different yield spreads

This study evaluates the effect of 16 mother wavelet functions on the performance of the hybrid wavelet Multilayer Perceptron Neural Network (MLPNN) model for forecasting IIP growth with yield spreads using the DWT technique. It further investigates the selection of a suitable decomposition level for the DWT. The results of the hybrid model are compared with an artificial neural network (ANN) model to gauge its performance.

5.4.1. Data

The data are collected from the Reserve Bank of India website (http://www.rbi.org.in). Monthly observations between October 1996 and June 2012, comprising a total of 189 sample values, are used to test the predictability of yield spreads for forecasting output growth. The Index of Industrial Production (IIP) is used to derive the output growth. The yield spreads are constructed as the differences between long-term yields to maturity (YTM) on Government of India (GOI) securities and short-term Treasury bill rates, at the shorter end, the longer end and the policy-relevant areas of the yield curve. The spreads include the 1-year yield minus the 3-month Treasury bill rate Sp(1,3), the 10-year yield minus the 5-year yield Sp(10,5) and the 10-year yield minus the 8-year yield Sp(10,8), known as policy-horizon spreads, whereas the 5-year yield minus the 3-month Treasury bill rate Sp(5,3) and the 10-year yield minus the 3-month Treasury bill rate Sp(10,3) are known as non-policy-horizon spreads. The summary statistics for monthly IIP growth and the five yield spreads are shown in Table 5.4.1.
Series | Max | Min | Mean | Mode | S.D.
IIP | 7.909 | -3.265 | 2.817 | 1.763 | 1.942
Sp(1,3) | 6.171 | -1.22 | 0.7825 | 0.3813 | 0.9467
Sp(10,5) | 10.09 | -1.7101 | 1.709 | 0.55 | 1.824
Sp(5,3) | 9.171 | -0.8152 | 1.969 | 1.348 | 1.637
Sp(10,3) | 1.5 | -2.67 | 0.2605 | 0.3185 | 0.5322
Sp(10,8) | 23.8 | -9.50 | 2.48 | 2.482 | 2.926

Table 5.4.1: Summary statistics of time series data
5.4.2. Results and Discussion

First, data pre-processing is performed by approximating the monthly IIP growth values with 16 different discrete wavelet functions at decomposition levels 1 to 7. The best results are obtained at decomposition level 1 for every wavelet function under consideration, and the wavelet function db4 gives the lowest RMSE, so the best approximation is achieved with db4, as shown in Table 5.4.2. The results for all the best wavelet subclasses obtained at decomposition level 1 are depicted graphically in Fig. 5.4.1.
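The wavelet-and-level grid search described here can be sketched as follows (an illustration with toy data; note that PyWavelets issues a warning when the requested level exceeds the useful maximum for a short series):

```python
import numpy as np
import pywt

WAVELETS = ['haar', 'db2', 'db3', 'db4', 'db5', 'db6', 'db7',
            'sym2', 'sym3', 'sym4',
            'coif1', 'coif2', 'coif3', 'coif4', 'coif5', 'dmey']

def approximation_rmse(series, wavelet, level):
    """RMSE between the series and its wavelet approximation (details zeroed),
    the criterion used to rank the 16 wavelets at levels 1-7."""
    coeffs = pywt.wavedec(series, wavelet, level=level)
    coeffs[1:] = [np.zeros_like(c) for c in coeffs[1:]]
    rec = pywt.waverec(coeffs, wavelet)[:len(series)]
    return np.sqrt(np.mean((series - rec) ** 2))

iip = np.random.default_rng(3).normal(loc=2.8, scale=1.9, size=189)  # toy IIP
scores = {(w, lv): approximation_rmse(iip, w, lv)
          for w in WAVELETS for lv in range(1, 8)}
best = min(scores, key=scores.get)   # the thesis finds ('db4', 1)
```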
Performance measure: RMSE. Decomposition at levels 1 to 7.

Mother Wavelet | Subclass | 1 | 2 | 3 | 4 | 5 | 6 | 7
Haar | Haar | 0.6774 | 0.8435 | 1.0023 | 1.2541 | 1.7372 | 1.804 | 1.9319
Daubechies | db2 | 0.6187 | 0.7156 | 0.8818 | 1.3016 | 1.5234 | 1.557 | 1.7482
Daubechies | db3 | 0.5742 | 0.6979 | 0.8744 | 1.016 | 1.6045 | 1.6404 | 1.8196
Daubechies | db4 | 0.5608 | 0.748 | 0.8285 | 1.2503 | 1.5977 | 1.7154 | 2.0864
Daubechies | db5 | 0.575 | 0.7182 | 0.8763 | 1.1831 | 1.5148 | 1.6667 | 2.2136
Daubechies | db6 | 0.5956 | 0.6985 | 0.8013 | 1.0149 | 1.5921 | 1.631 | 1.9039
Daubechies | db7 | 0.6074 | 0.7468 | 0.8755 | 1.2356 | 1.5967 | 1.6676 | 1.7276
Symlets | sym2 | 0.6187 | 0.7156 | 0.8818 | 1.3016 | 1.5234 | 1.557 | 1.7482
Symlets | sym3 | 0.5742 | 0.6979 | 0.8744 | 1.016 | 1.6045 | 1.6404 | 1.8196
Symlets | sym4 | 0.6024 | 0.7451 | 0.8618 | 1.0137 | 1.5928 | 1.6594 | 1.7888
Coiflets | coif1 | 0.5953 | 0.777 | 0.9053 | 1.0553 | 1.5396 | 1.5842 | 1.6634
Coiflets | coif2 | 0.5912 | 0.7574 | 0.8141 | 1.1058 | 1.6033 | 1.6507 | 1.747
Coiflets | coif3 | 0.5897 | 0.754 | 0.8757 | 1.2489 | 1.595 | 1.6818 | 1.7783
Coiflets | coif4 | 0.5889 | 0.7527 | 0.7973 | 1.1763 | 1.5713 | 1.6942 | 1.8072
Coiflets | coif5 | 0.5886 | 0.7526 | 0.8708 | 1.014 | 1.5551 | 1.6996 | 1.8232
Discrete Meyer | dmey | 0.5898 | 0.7518 | 0.8596 | 1.0263 | 1.5882 | 1.6507 | 1.8824

Table 5.4.2: Approximation results of different wavelets
Figure 5.4.1: Approximations by different wavelets

The wavelet-approximated monthly IIP growth values are taken as input to the ANN for each best subclass of each mother wavelet, at the best approximation level (level 1). The observed monthly yield spread for the IIP growth values is taken as the target (output). In our study, the results for five different yield spreads, Sp(1,3), Sp(10,5), Sp(5,3), Sp(10,3) and Sp(10,8), are compared at four different time lags: lag 1, lag 2, lag 3 and lag 4. It is observed that the yield spread Sp(10,3), a non-policy-horizon spread, has better forecasting ability than the other four spreads Sp(1,3), Sp(10,5), Sp(5,3) and Sp(10,8). The better results are obtained with time lag 2 for the yield spreads and at decomposition level 1 for all mother wavelets, with the optimum results obtained when the Daubechies mother wavelet with subclass db4 is used for approximation. The results obtained with yield spread Sp(10,3) using the different mother wavelets, at the optimum subclass of each wavelet family, are
shown in Table 5.4.3.
Model 1. Performance measure: RMSE. Results obtained at decomposition level 1.

Mother Wavelet | Subclass | Lag 1 | Lag 2 | Lag 3 | Lag 4
Haar | Haar | 0.381 | 0.332 | 0.394 | 0.430
Daubechies | db4 | 0.314 | 0.305 | 0.367 | 0.423
Symlets | sym3 | 0.391 | 0.354 | 0.392 | 0.431
Coiflets | coif5 | 0.378 | 0.371 | 0.391 | 0.428
Discrete Meyer | dmey | 0.385 | 0.378 | 0.402 | 0.425

Table 5.4.3: Wavelet-decomposed neural network model (for different wavelets, obtained with Sp(10,3) at different lags)
Model 2. Performance measure: RMSE.

Yield Spread | Lag 1 | Lag 2 | Lag 3 | Lag 4
Sp(1,3) | 0.526 | 0.463 | 0.546 | 0.598
Sp(10,5) | 0.564 | 0.496 | 0.586 | 0.640
Sp(5,3) | 0.548 | 0.483 | 0.569 | 0.623
Sp(10,3) | 0.495 | 0.434 | 0.512 | 0.560
Sp(10,8) | 0.664 | 0.585 | 0.690 | 0.754

Table 5.4.4: Neural network model (for different spreads at different time lags)
In our study, we use a simple MLPNN model in which the input and output layers contain one neuron each, to process the wavelet-approximated IIP growth values and the observed yield spread respectively. The model is trained by supervised learning, in which input and target output are repeatedly presented to the network. With each presentation of input data, the model output is compared with the given target output and an error is calculated; the error is propagated back through the network to adjust the weights, minimizing the error and driving the simulation closer and closer to the desired output. The LMA, an iterative algorithm that minimizes an objective usually expressed as a sum of squares of nonlinear functions, is used to train the network because of its simplicity. The optimal number of neurons in the single hidden layer is determined by trial and error, the model is trained until a maximum of 1000 epochs is reached, and a sigmoid function is used for the neurons of the hidden and output layers. The performance measure used in this case is the Root Mean Square Error (RMSE). Training is considered complete when the network performance determined by validation or testing is satisfactory; the network is re-trained when satisfactory performance is not achieved during the testing phase. All models are trained on the 189 monthly IIP growth samples from October 1996 to June 2012. The sample values are distributed randomly, and all models performed optimally when the samples were divided into training, validation and testing sets of 65%, 10% and 25% respectively. The objective of this study is limited to evaluating the effects of different mother wavelet functions with the Discrete Wavelet Transform. The results for yield spread Sp(10,3) at different time lags are depicted graphically in Fig. 5.4.2. The results are compared with an artificial neural network (ANN) model to gauge the performance of the hybrid model: in the ANN model, the monthly IIP growth values are taken directly as input to the network and the observed monthly yield spread is taken as the target (output), separately for each spread and each lag, and the network is trained in the same way as in the hybrid model. The results of the ANN model are given in Table 5.4.4 and depicted graphically in Figure 5.4.3. We observe that the hybrid model performs better than the ANN model.
Figure 5.4.2: IIP Growth vs. Spreads - Wavelet-decomposed neural network model (for different wavelets obtained with Sp(10,3) at different lags)
Figure 5.4.3: IIP Growth vs. Spreads - Neural network model
5.4.3. Outcomes of Problem 4

The following conclusions can be drawn from the study:

1. The wavelet-based neural network models for IIP growth-yield spread modeling performed better when the approximation was done with Daubechies wavelets than with the other wavelets; within the Daubechies family, the subclass db4 performed best for approximation (data pre-processing).
2. The hybrid wavelet-based MLPNN model gives better results at the minimum possible decomposition level, i.e., level 1, for all wavelets under consideration, with the best results obtained with the db4 subclass of the Daubechies wavelet.
3. The hybrid model performs better when time-lagged yield spreads are used, producing better IIP growth forecasts. The model gives better results for all wavelets under consideration at lag 2, with the best results again obtained with the db4 subclass when used for approximation.
4. The hybrid model also reveals that the yield spread Sp(10,3), a non-policy-horizon spread, has better forecasting ability than the other four spreads Sp(1,3), Sp(10,5), Sp(5,3) and Sp(10,8) in IIP growth-yield spread modeling.
5. The study successfully demonstrates that the MLPNN model with the db4 wavelet function enhances the performance of the neural network models, and that the selection of a suitable wavelet function and optimum scale is very important: the DWT-based MLPNN model gives the best results with the db4 mother wavelet rather than db2, db3, db5, db6 or db7.
6. The yield spread in general is a dependable predictor of output growth such as IIP growth, and the spreads provide better recession forecasts than other conventional models.
Chapter 6

Conclusion and Future Directions

6.1. Conclusion

For many decades, financial time series forecasting has been studied using conventional statistical linear models, but these have seldom proved successful owing to the presence of noise and non-linearity in the data. The successful application of non-linear methods in numerous areas of research has kindled the hopes of financial researchers. It has been observed that past values of a time series help to determine its future values, but the relation between them is non-linear, and this non-linearity implies that a change in past values can have wide-ranging effects on future values. Recent studies have revealed that wavelet analysis performs very well in financial time series analysis owing to its flexibility in handling very irregular data. Artificial neural networks, on the other hand, can approximate any discontinuous function by formalizing unclassified information without requiring any prior assumption about the characteristics of the data series, in contrast to traditional forecasting methods, which assume a linear relationship between inputs and outputs, and they too have shown significant efficiency in forecasting time series data. In recent years, researchers have shown considerable interest in forecasting financial time series using hybrid models, and there is no reason to doubt that hybrid models provide reasonable performance across different forecasting horizons. In this study, we use an integrated approach of wavelet analysis and neural networks to investigate different aspects of financial time series with economically meaningful variables. We observe that better forecasting is achieved with the hybrid model of wavelets and neural networks than with conventional models, and the study further reveals the efficiency and significance of wavelets and artificial neural networks in forecasting financial time series. The combined model of wavelet and neural network overcomes the deficiency of traditional forecasting models, which are limited to linear systems. The main findings of this study can be summarized as follows:
1. We analyze the ability of yield spreads to forecast IIP (Index of Industrial Production) growth using a hybrid model of wavelets and neural networks. The study demonstrates that both approaches, i.e., the individual ANN and the WNN, forecast IIP growth more effectively than other conventional models, as improved results are obtained in both cases. The spread provides superior recession forecasts, especially up to 1 year ahead, and is a dependable predictor of output growth. The best forecasting efficiency is achieved when lagged IIP growth values are used, and the wavelet neural network model gives slightly superior forecasting results to the individual artificial neural network model.

2. A hybrid wavelet network comprising a Daubechies wavelet and a simple MLPNN (Multi-layer Perceptron Neural Network) model is used to analyze the relationship between the real effective exchange rate and crude oil prices. The study, for India, shows that crude oil prices do influence the real effective exchange rate and that the hybrid model untangles the relationship between the two better than the individual ANN model or the wavelet-based approach. The crude oil price is a dependable predictor of the real effective exchange rate and provides superior forecasts.

3. We propose a hybrid wavelet and neural network model, consisting of a simple MLPNN and wavelet-based decomposition, to analyse the relationships between five stock markets: NIFTY from India, SHCOMP from China, DAX from Germany, FTSE-100 from the United Kingdom and NKY from Japan, each studied independently in relation to NASDAQ from the U.S. The study reveals that the hybrid model better unravels the relationships between financial institutions and can provide a valuable alternative to existing conventional methods for testing financial contagion.

4. A further study evaluates the effect of 16 mother wavelet functions on the performance of the hybrid wavelet Multilayer Perceptron Neural Network (MLPNN) model for forecasting IIP growth with yield spreads using the DWT technique, and investigates the selection of a suitable decomposition level for the DWT. The results of this hybrid model are compared with an artificial neural network (ANN) model to gauge its performance. The wavelet-based neural network models for IIP growth-yield spread modeling performed better
when the approximation is done with a Daubechies wavelet rather than the other wavelets. The hybrid model also performs better when time-lagged yield spreads are used in the modeling, producing better IIP growth forecasts. The yield spread in general is a dependable predictor of output growth such as IIP growth, and spreads provide better recession forecasts than other conventional models. A minimal sketch of the hybrid pipeline shared by these findings follows this list.
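The sketch below illustrates the general hybrid idea summarized above: decompose the series with the DWT, forecast each component with a small MLP, and sum the component forecasts. It assumes Python with PyWavelets and scikit-learn; the wavelet ('db4'), decomposition level, lag order p and network size are illustrative placeholders, not the settings tuned in this thesis.

```python
# A minimal sketch of a hybrid wavelet-MLP forecaster, assuming PyWavelets
# and scikit-learn. All hyperparameters here are illustrative placeholders.
import numpy as np
import pywt
from sklearn.neural_network import MLPRegressor

def mra_components(series, wavelet="db4", level=3):
    """Split a series into additive approximation/detail components via the
    DWT (multiresolution analysis); their sum reconstructs the series."""
    # The decomposition level must respect the admissible maximum.
    assert level <= pywt.dwt_max_level(len(series), pywt.Wavelet(wavelet).dec_len)
    coeffs = pywt.wavedec(series, wavelet, level=level)
    comps = []
    for i in range(len(coeffs)):
        kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        comps.append(pywt.waverec(kept, wavelet)[: len(series)])
    return comps

def lagged(x, p):
    """Build (X, y) pairs: y[t] is predicted from the p previous values."""
    X = np.array([x[t - p : t] for t in range(p, len(x))])
    return X, x[p:]

def hybrid_forecast(series, p=4, wavelet="db4", level=3):
    """One-step forecast: one small MLP per wavelet component, sum forecasts."""
    pred = 0.0
    for comp in mra_components(series, wavelet, level):
        X, y = lagged(comp, p)
        net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
        net.fit(X, y)
        pred += net.predict(comp[-p:].reshape(1, -1))[0]
    return pred

# Toy usage on a synthetic non-stationary series.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(0.0, 1.0, 256))
print(hybrid_forecast(series))
```

In this framing, the comparison of 16 mother wavelets and of decomposition levels in finding 4 amounts to looping such a pipeline over candidate (wavelet, level) pairs and keeping the pair with the lowest out-of-sample error.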
6.2 Future Directions
The highly effective hybrid model based on wavelets and artificial neural networks presented in this thesis can also be applied to other real-world forecasting problems, such as foreign exchange rates, inflation rates, spillover effects among different countries, gold prices, bankruptcy prediction, hydrology and water resources, renewable and wind energy, electric load forecasting, software reliability, software cost estimation, pattern recognition and weather forecasting. In future work we therefore plan to investigate applications of hybrid wavelet neural networks to these problems, and in particular to the following, for the reasons given below:

1. Forecasting foreign exchange rates. In recent years, a great deal of effort has gone into gaining an advantage in foreign exchange (FX) rate prediction; however, most existing techniques seldom beat the simple random walk model in practical applications (a short sketch of this benchmark follows the list). The model described in this thesis, formed by coupling wavelets and neural networks, may be a good candidate for describing and modeling such non-stationary, nonlinear time series.

2. Gold price forecasting. Gold price forecasting has become a hot issue in economics. In ancient times, gold was recognized as a symbol of wealth and a frontier-less currency that could easily be exchanged among different monetary systems, but in recent times it has gradually become a popular non-monetary instrument in the financial market, characterized by high yield and high risk. The gold price partly reflects investors' expectations and the world's economic trends. Thus, gold price forecasting is a vital
issue in economics, and we believe the algorithm prescribed here can provide better and more accurate results.

3. Prediction of bankruptcy. For financial firms, and especially banks, bankruptcy prediction has been an extensively researched area for many decades. Creditors, auditors, stockholders and senior management are all interested in bankruptcy prediction because it affects all of them. Bankruptcy is very difficult to forecast because it depends on many variables, such as capital adequacy, asset quality, management expertise, earnings strength, liquidity, and sensitivity to market risk. We expect better forecasting results from the hybrid approach of wavelets and neural networks.
4. Software reliability. Software reliability concerns the behaviour of a software system and is defined as the probability of the system working without failure for a specified period of time. Software reliability prediction is the task of predicting future failures and their cost from the software's past failure data. We believe the approach used in this thesis can provide better and more accurate results here as well.
5. Electric load forecasting. Accurate electric load forecasting can be a very useful tool for all participants in electricity markets: it not only helps power producers and consumers make their plans but also lets them maximize their profits. The electric load is influenced by many factors, including weather conditions, social and economic environments and dynamic electricity prices, so in the power system the load is difficult to forecast and remains an enormous problem. The goal of electric load forecasting is to take advantage of every model used and find a balance between production and consumption. In the electricity market, precise demand forecasting is often needed and is fundamental to many applications. With an accurate, quick, simple and robust forecasting method, essential operating functions such as load dispatch, unit commitment, reliability analysis and unit maintenance can be run more effectively. Developing a method that improves the accuracy of electric load forecasting is therefore essential, and we expect our approach to be
suitable in this case.

6. Forecasting of hydrology variables. Accurate prediction of hydrology and water resources provides important information for city planning, land use, the design of civil projects and water resource management. A hydrological system is influenced by many factors, such as weather, vegetal land cover, infiltration, evaporation and transpiration, so it contains a good deal of stochastic dependence, multiple time scales and highly nonlinear characteristics. Future studies should explore the wavelet-ANN method in groundwater level forecasting for other watersheds in different geographical regions and for other lead times (daily, weekly or yearly); compare the wavelet-based noise-removal method with other filtering methods; compare different continuous (Morlet, Mexican Hat) and discrete (Daubechies) mother wavelets in the decomposition phase; compare the wavelet neural network method with newer methods such as support vector regression with localized multiple kernel learning; and pursue ensemble forecasting via the bootstrap method to develop wavelet-bootstrap-neural-network models. Better forecasting results can be expected from the hybrid approach presented in this thesis.
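As a small illustration of the benchmark mentioned under future direction (1), the sketch below computes the rolling one-step RMSE of the naive random-walk forecaster that any FX model, including the hybrid one sketched earlier, would have to beat. The series and window sizes are hypothetical toy values, not data used in this thesis.

```python
# A hypothetical random-walk benchmark for FX forecasting: any candidate
# model is only worthwhile if its rolling RMSE falls below this baseline.
import numpy as np

rng = np.random.default_rng(1)
fx = 70.0 + np.cumsum(rng.normal(0.0, 0.3, 300))  # toy "exchange-rate" path

def rolling_rmse(series, predictor, start=250):
    """One-step-ahead RMSE of `predictor` over the evaluation window."""
    errs = [predictor(series[:t]) - series[t] for t in range(start, len(series))]
    return float(np.sqrt(np.mean(np.square(errs))))

random_walk = lambda history: history[-1]  # naive no-change benchmark
print("random-walk RMSE:", rolling_rmse(fx, random_walk))
# Slotting in a wavelet-network forecaster (e.g. hybrid_forecast above) in
# place of random_walk gives a like-for-like comparison on the same window.
```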
6.3 Publications out of this thesis

Journal Papers:

1. Pir M.Y., Shah F.A., and Asger M., "Analyzing on Effect of U.S. Sub-Prime Crises on Five Major Stock Markets of Different Countries Using Hybrid Wavelet and Neural Network Model", International Journal of Computer Science and Technology (IJCST), Vol. 7, Issue 1, ISSN: 0976-8491 (Online), ISSN: 2229-4333 (Print), pp. 60-69 (2016).

2. Pir M.Y., Shah F.A., and Asger M., "Analysing the Dependency of Exchange Rate on Crude oil Price with Wavelet Networks: Evidence from India", International Journal of Computer Science Trends and Technology (IJCST), Vol. 3, Issue 5, ISSN: 2347-8578, pp. 202-209 (2015).

3. Pir M.Y., Shah F.A., and Asger M., "Using Wavelet Neural Networks to Forecast IIP Growth with Yield Spreads", IPASJ International Journal of Computer Science (IIJCS), Vol. 2, Issue 5, ISSN: 2321-5992, pp. 31-36 (2014).

National Conference Papers:

4. Pir M.Y., Shah F.A., and Asger M., "Applications of Wavelet Neural Networks in Financial Time Series Forecasting: A Review", 8th JK Science Congress, University of Kashmir, paper id: CS-07 (Computer Sciences) (2012).

5. Shah F.A. and Pir M.Y., "An Overview of Wavelet Based Neural Networks", 6th JK Science Congress, University of Kashmir, paper id: CS-15 (Physical Sciences) (2010).
Similarity Index

Similarity index of the various chapters as per Urkund analysis (https://secure.urkund.com):

S. No.   Chapter No. & Title                                             Similarity Index (%)
1        Chapter-1: Introduction to Time Series Analysis                 10
2        Chapter-2: Artificial Neural Networks and Wavelet Analysis      8
3        Chapter-3: Forecasting Models for Time Series Analysis          5
4        Chapter-4: Wavelet Neural Networks in Financial Forecasting     2
5        Chapter-5: Experimental Results & Discussions                   1
6        Chapter-6: Conclusion and Future Directions                     5
Selected Publications out of this Ph.D. Thesis