18 Methods of Denoising Financial Data

Thomas Meinl and Edward W. Sun
Contents
18.1 Introduction
18.2 Denoising (Trend Extraction) of Financial Data
18.3 Linear Filters
  18.3.1 General Formulation
  18.3.2 Transfer Functions: Time Versus Frequency Domain
18.4 Nonlinear Filters
  18.4.1 General Perception
18.5 Specific Jump Detection Models and Related Methods
  18.5.1 Algorithms for Jump Detection and Modeling
  18.5.2 Further Related Methods
18.6 Summary
References
Abstract
Denoising analysis poses new challenges for financial data mining due to the irregularities and roughness observed in financial data, particularly in the massive amounts of tick-by-tick data collected instantaneously from financial markets for information analysis and knowledge extraction. An inefficient decomposition of financial data into its systematic pattern (the trend) and noise will lead to erroneous conclusions, since the irregularities and roughness of financial data make the application of traditional methods difficult. In this chapter, we provide a review of some methods applied for denoising analysis of financial data.
T. Meinl (*)
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
e-mail: [email protected]

E.W. Sun
KEDGE Business School and BEM Management School, Bordeaux, France
e-mail: [email protected]

C.-F. Lee, J. Lee (eds.), Handbook of Financial Econometrics and Statistics, DOI 10.1007/978-1-4614-7750-1_18, © Springer Science+Business Media New York 2015
Keywords
Jump detection • Linear filters • Nonlinear filters • Time series analysis • Trend extraction • Wavelet
18.1 Introduction
Technical developments allow corporations and organizations to instantaneously collect massive amounts of data, particularly tick-by-tick data from financial markets. Mining financial data has turned out to be the foundation of financial informatics and stimulates research interest in analyzing the information conveyed at different frequencies in these data. Financial data have a complex structure of irregularities and roughness caused by a large number of instantaneous market changes and trading noises. The noise conveyed in financial data usually exhibits heavy tailedness, that is, the underlying time series exhibits a large number of occasional jumps. Ignoring these irregularities can easily lead to erroneous conclusions in data mining and statistical modeling. As a consequence, statistical data mining methods (or models) require a denoising algorithm to clean the data in order to obtain more significant results (see Sun and Meinl (2012)). Most data cleaning methods focus only on a known type of irregularity. In financial data, the irregularity is manifold, that is, it varies along with time and changes with different measuring scales (see Fan and Yao (2003) and Sun et al. (2008)). Finding an effective denoising algorithm therefore turns out to be the initial task of financial data mining (see Au et al. (2010) and Meinl and Sun (2012)). In this chapter we outline classic and newly established methods for the denoising analysis (trend extraction) of financial data. We discuss different approaches and a respective classification of them. Based on this classification, we focus on nonparametric methods (i.e., linear and nonlinear filters) and examine their pros and cons.
18.2 Denoising (Trend Extraction) of Financial Data
There is no universal definition of trend that applies to all applications in different fields, but it is generally accepted that a trend is a slowly evolving component that is the main driving force of the long-term development beneath the system. Pollock (2001) characterizes the trend as being limited to certain low frequencies of the data. This notion excludes any noisy influences and fluctuations from higher frequency levels. However, it is not satisfactory for many financial data we encounter in practice. Some theoretical models (like the Black-Scholes model) do not incorporate aspects like seasonalities or even jumps; they are still widely used in practice today, assuming a perfect division between the trend and stochastic fluctuations. This is insufficient for many financial data we measure today; especially when considering trends over longer time horizons, we perceive significant jumps or steep slopes
which cannot be attributed to the persistent stochastic noise. Another example can be observed in electricity markets, where jumps occur on such a regular basis that it has been found reasonable to model them by specific stochastic processes; see Seifert and Uhrig-Homburg (2007). These patterns contradict the slowly evolving characteristic and the low-frequency notion generally associated with trends, and they have been considered an inherent part of them. A basic model (among others proposed by Fan and Yao (2003)) is stated as follows. Given a time series $X_t$, $t \ge 0$, this series can be decomposed into

$$X_t = \vartheta_t + s_t + Y_t, \qquad (18.1)$$
where $\vartheta_t$ represents a slowly varying function known as the trend component, $s_t$ a single or a combination of periodic functions (i.e., seasonal components), and $Y_t$ the stochastic component, assumed to be stationary. A trend is a mostly slowly (with respect to the noise) evolving pattern in a time series, with its driving force not being attributed to any noise present in the signal. Trends may also exhibit edged frontiers (i.e., jumps and sudden regime changes) as well as steep slopes, roofs, and valleys (see Joo and Qiu (2009)), as long as these patterns can be attributed to the long-term dynamics of the time series and do not stem from the noise component responsible for short-term variations. We note that a trend must always be interpreted with respect to the time series at hand (i.e., the period covered by the financial data) and the goal of the data analysis, that is, on which scale the trend and the noise are relevant. We ignore any distinction between long-, intermediate-, and short-term trends and/or seasonalities, as these usually depend on the context of the financial data. The question that may now arise is: if we consider a trend $\vartheta$ to be specified according to Eq. 18.1, how does this agree with the notion that the trend may also exhibit jumps, which clearly contradict this definition? Without going further into this, we only note that this kind of trend would require a different (and more accurate) definition of how a trend in a time series is to be defined than the generally accepted notion in today's literature. While we will not propose a new definition either and leave this task to others, we can point out how jumps enter the trend. As pointed out by, for example, Tsay (2005), jumps in financial time series data, and particularly in high-frequency data, are attributed to external events, like the increase or drop in interest rates by some governmental financial institution. These events can be considered to happen only occasionally and are very sparse in relation to the frequency at which the data is measured, that is, relative to the amount of data exhibiting no jumps at all. In the field of high-frequency financial data analysis, jumps are thus assumed to be extreme events that happen with low probability but nevertheless form part of the stochastic distribution and must be considered to be modeled there. Thus, $Y$ will be modeled either by stochastic processes including jump components (see, e.g., Seifert and Uhrig-Homburg (2007)) or by a distribution itself, depending on the model. Such a distribution for high-frequency financial data has been found to be heavy tailed, that is, jumps happen with enough regularity that they cannot simply be discarded as nonrecurring events; see, for example, Rachev (2003).
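To make the decomposition in Eq. 18.1 concrete, the following minimal Python sketch generates a synthetic series with a slowly varying trend (including one jump), a seasonal component, and heavy-tailed stationary noise. All numerical values here are illustrative assumptions, not estimates from real market data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
t = np.arange(n)

# Slowly varying trend component (vartheta_t), including one jump
trend = 0.002 * t + np.where(t > 600, 1.5, 0.0)

# Seasonal component (s_t): a single periodic function
seasonal = 0.3 * np.sin(2 * np.pi * t / 100)

# Stationary stochastic component (Y_t) with heavy tails:
# Student's t noise with few degrees of freedom produces occasional jumps
noise = 0.2 * rng.standard_t(df=3, size=n)

x = trend + seasonal + noise  # X_t = vartheta_t + s_t + Y_t  (Eq. 18.1)
```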
However, as these extreme events can have an enormous impact on the stochastic variance analysis and its succeeding usage, and furthermore could lead to misleading results in the regions of the signal without any jumps, it is generally preferred to include them in the trend component $\vartheta$ rather than in the stochastic component $Y$. In, for example, Wang (1995), a definition of an $\alpha$-cusp of a continuous function $f$ at $x_0$ is given: $f$ has an $\alpha$-cusp at $x_0$ if there exists an $\varepsilon > 0$ so that

$$|f(x_0 + h) - f(x_0)| \ge K |h|^{\alpha} \qquad (18.2)$$

holds for all $h \in [x_0 - \varepsilon, x_0 + \varepsilon]$. For $\alpha = 0$, $f$ is said to have a jump at $x_0$. It is commonly agreed that the jump should significantly differ from the other fluctuations (i.e., the noise) in the signal. As said above, jumps are just one particular pattern of extreme events we are interested in. Others are steep slopes, roofs, and valleys, which in Joo and Qiu (2009) are defined by the regression curve having a jump in its first-order derivative at this point. Other extreme events frequently occurring in many practical applications are spikes and outliers. However, these are usually undesirable features that should not be included in the trend or affect it by any means. This is due to the following reasons. First, in many cases these outliers or spikes consist of only one or very few points, often caused by measurement errors, and it is obvious that they were caused by some factor that plays no vital role in the ongoing time series analysis (unless the focus is on what caused these outliers). Second, while jumps imply a permanent change in the whole time series, outliers do not contribute to this. While we are aware that the distinction between a few (adjacent) outliers and roofs/valleys is not precise, in most cases it is evident from the context of the time series whether an occurrence should be considered an outlier that is to be neglected or a significant feature to be included in the trend. In this work we only consider homogeneous financial time series data. However, in many applications, and particularly for high-frequency financial time series data, the data is initially irregularly spaced. Homogeneity in this context means that for a given series $X_{t_i}$, $i \in \mathbb{N}$, it holds that $t_{i+1} - t_i = c$ with a constant $c > 0$, that is, all time steps are equally spaced. As we will see, this is not always the case for empirical time series, especially in the area of financial high-frequency data. In this case, it is necessary to preprocess the inhomogeneous (i.e., irregularly spaced) time series by interpolation methods in order to regularize the raw data. Though there exist models which can handle inhomogeneous time series directly (see Dacorogna (2001), who also remarks that most of today's models are suited for regularly spaced time series only), regarding the methods we discuss in the following sections, we restrict ourselves to homogeneous ones.
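As a concrete illustration of this preprocessing step, the following sketch resamples hypothetical irregularly spaced ticks onto a homogeneous one-second grid with pandas. The previous-tick (forward-fill) scheme used here is one common choice; linear interpolation in time is an alternative. All data are simulated.

```python
import numpy as np
import pandas as pd

# Hypothetical irregularly spaced tick data (times in seconds, prices)
rng = np.random.default_rng(0)
times = pd.to_datetime("2015-01-02 09:30:00") + pd.to_timedelta(
    np.sort(rng.uniform(0, 60, size=200)), unit="s")
ticks = pd.Series(100 + rng.standard_normal(200).cumsum() * 0.01, index=times)

# Resample onto a homogeneous 1-second grid: take the last tick in each
# interval and forward-fill intervals without any tick
homogeneous = ticks.resample("1s").last().ffill()
```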
In this chapter we focus on nonparametric methods for trend extraction. This is because, for most time series analyzed in this work, we cannot reasonably assume any model for the underlying trend. Yet, as noted in the framework above, in case such assumptions hold, we can expect those models to perform better than nonparametric models, since they exploit information that nonparametric approaches cannot. Furthermore, the commitment to certain parametric time series or trend models can be seen as a restriction when considering the general case, which may even lead to misleading results in case the trend does not match the model, as certain patterns might not be captured or considered by the model itself. This can easily be seen in a most basic example: a linear trend model will in most cases be only a poor estimator for any nonlinear trend curve. In this case nonparametric approaches are less restrictive and can be applied more generally, while of course not delivering the same accuracy as parametric models which exactly match the underlying trend, with only their parameters to be calibrated. If, for example, the trend follows a sinusoidal curve, a sinusoidal curve with its parameters estimated by, for example, the least-squares method will almost surely provide better accuracy than any nonparametric approach. On the other hand, if the underlying trend is linear or deviates even only slightly from a perfect sinusoidal curve, it has been shown in simple examples that fitting a parametric sinusoidal model to this trend can lead to confusing results and conclusions. We further require that the method used for trend extraction be robust, that is, the results are reliable and the error can be estimated or is at least bounded in some way. In many cases (see, e.g., Donoho and Johnstone (1994)), the robustness of a method is shown by proving its asymptotic consistency, that is, its convergence towards a certain value for certain parameters tending towards infinity. It should be remarked that the robustness should be independent of the time series itself and/or any specific algorithm parameter sets, in order to be applicable in practice. Of course this does not exclude specific assumptions on the time series that must be met or parameter ranges for which the algorithm is defined. Therefore, as we cannot reasonably assume any model for an arbitrary time series in general, in this work we focus on nonparametric approaches only. Within these approaches we focus on two main branches for trend extraction: linear and nonlinear filters. This is because linear filters are known and have proven to deliver a very smooth trend (given the filtering window size is large enough), while nonlinear filters excel at preserving characteristic patterns in a time series, especially jumps. Both methods in general require only very little (or no) information about the underlying data, besides their configuration of weights and calibration parameters, and are thus applicable to a wide range of time series, independent of the field the data was measured in. Although there exists a variety of other nonparametric methods, most of these already rely on specific assumptions or choices of parameters which in general cannot easily be derived for arbitrary time series data or different analysis goals. Nevertheless, for the sake of completeness, later in this chapter we list some alternative methods, also including parametric approaches, which have been applied to economic, financial, or related time series data.
18.3 Linear Filters
Linear filters are probably the most common and well-known filters used for trend extraction and additive noise removal. We first provide the most general notion of this filter class and then depict the two viewpoints of linear filters and how they can be characterized. While this characterization is on the one hand one of the most distinguishing advantages of linear filters, on the other hand it leads to the exact problem we are facing in this work, that is, the representation of sharp edges in otherwise smooth trends.
18.3.1 General Formulation

The filtered output depends linearly on the time series input. Using the notation of Fan and Yao (2003), a linear filter of length $2h + 1$ can be defined as

$$\hat{X}_t = \sum_{i=-h}^{h} w_i X_{t+i}, \qquad (18.3)$$

with $\hat{X}_t$ the filtered output and $w_i$ the filter weights. These kinds of filters are also known as (discrete) convolution filters, as the outcome is the convolution of the input signal with a discrete weight function, where often the following notation is used:

$$(w * X)_t := \sum_{i=-h}^{h} w_i X_{t-i} = \sum_{i=t-h}^{t+h} w_{t-i} X_i. \qquad (18.4)$$
Thus, for every data point $X_t$, the filtered output $\hat{X}_t$ is the result of a weighted summation of the data points around $t$. Applied to the whole time series, this results in a weighted average window of size $L = 2h + 1$ which is moved throughout the whole series. The size of this window is also called the bandwidth of the filter. It is also common notation to write the sum with infinite bounds (i.e., $h = \infty$) even if the filter window size is finite, with $w_i = 0$ for all $i$ beyond the window. Probably the best-known linear filter is the mean filter, with $w_i = 1/(2h+1)$, that is, all filter weights are uniformly distributed. A more general viewpoint is given by the notion of kernel filters. Given a kernel function, w.l.o.g. with support $[-1, 1]$, this function assigns the weights according to

$$w_i = \frac{K(i/h)}{\sum_{j=-h}^{h} K(j/h)}. \qquad (18.5)$$
Commonly used examples are the Epanechnikov kernel

$$K^E(u) = \frac{3}{4}\left(1 - u^2\right)_+, \qquad (18.6)$$
the Gaussian kernel

$$K^G(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right), \qquad (18.7)$$
and the symmetric beta family

$$K^{B\gamma}(u) = \frac{1}{B(1/2,\, \gamma + 1)}\left(1 - u^2\right)^{\gamma} I_{\{|u| \le 1\}}. \qquad (18.8)$$
For the values $\gamma \in \{0, 1, 2, 3\}$, the kernel $K^{B\gamma}(u)$ corresponds to the uniform, Epanechnikov, biweight, and triweight kernel functions, respectively.
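A minimal Python sketch of Eqs. 18.3, 18.5, and 18.6 follows; the window size h = 20 and the test series are illustrative assumptions.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * np.maximum(1.0 - u**2, 0.0)   # K^E, Eq. 18.6

def kernel_weights(h, kernel):
    # w_i = K(i/h) / sum_j K(j/h), Eq. 18.5
    i = np.arange(-h, h + 1)
    k = kernel(i / h)
    return k / k.sum()

def linear_filter(x, w):
    # Weighted moving average, Eq. 18.3; the first and last h points
    # are left unfiltered here for simplicity
    h = (len(w) - 1) // 2
    out = np.asarray(x, dtype=float).copy()
    out[h:len(x) - h] = np.convolve(x, w[::-1], mode="valid")
    return out

rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(500))               # a noisy random walk
trend = linear_filter(x, kernel_weights(20, epanechnikov))
```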
18.3.2 Transfer Functions: Time Versus Frequency Domain

The previous section depicts the linear filtering method in the time domain, i.e., we look at the time series $X_t$ and its respective filtered output $\hat{X}_t$ and how they evolve over time $t$. Another perception can be gained by taking the frequency domain into account. For all linear filters, we can give their definition not only as depicted earlier but also with respect to the frequencies the filters let pass. This notion can be derived as follows. While the sequence of filter weights $w_i$, also called the impulse response sequence, determines the filtered output in the time domain (or, equivalently, is the linear filter's time domain representation), via the discrete Fourier transform (DFT) we can derive the transfer function

$$W(f) = \sum_{j=-\infty}^{\infty} w_j e^{-i 2\pi f j}, \qquad (18.9)$$
its counterpart in the frequency domain, also called the frequency response function. Alternatively, if this formulation is given in the beginning, we can also derive the weights via the inverse transform:

$$w_j = \int_{-1/2}^{1/2} W(f)\, e^{i 2\pi f j} \, df. \qquad (18.10)$$
Obviously, these two formulations are equivalent, as one can be derived from the other, and vice versa. The transfer function's polar representation is

$$W(f) = |W(f)|\, e^{i\theta(f)}, \qquad (18.11)$$
with $|W(f)|$ the gain function. The gain magnitude $|W(f)|$ describes the linear filter's behavior in the frequency domain, that is, which frequencies, and in which respective proportions, will be passed or blocked. For our needs, it suffices to distinguish between high- and low-pass filters, that is, filters that either pass the high frequencies and block the lower ones, or vice versa. Of course, other filter types exist; for example, by combining high- and low-pass filters (e.g., in a filter cascade), one can derive band-pass and band-stop filters, so that the frequency domain output will be located only in a certain frequency range. In this work we are specifically interested in low-pass filters, as they block the high-frequency noise and the output consists of the generally low-frequency trend. As we assume that the weights $w_j$ are real valued, one can show (see, e.g., Percival and Walden (2006)) that $W(-f) = W^*(f)$, and, with $|W^*(f)| = |W(f)|$, it follows that $|W(-f)| = |W(f)|$. Therefore, the transfer functions are symmetric around zero. Because of its periodicity, we need to consider $W(f)$ only on an interval of unit length. For convenience, this interval is often taken to be $[-1/2, 1/2]$, i.e., $|f| \le 1/2$. Therefore, with the above-depicted symmetry, it suffices to consider $f \in [0, 1/2]$ in order to fully specify the transfer function. Saying that certain frequencies are blocked and others are passed holds only approximately true, since the design of such exact frequency filters is not possible; there is always a transition between the blocked and passed frequencies. The goal of many linear filter designs is either to minimize these transitions (i.e., the range of frequencies affected by them), which on the other hand inevitably causes ripples in the other frequencies, that is, they are no longer blocked or passed completely (see Bianchi and Sorrentino (2007) and the references therein for further details about this topic). As we have seen in the above examples, linear filters can be designed either from a time or from a frequency perspective. The time domain usually focuses on putting more weight on the surrounding events (i.e., events that occurred shortly before or after) and thus allows an (economic) interpretation similar to, for example, ARMA and GARCH models. On the contrary, the frequency domain is based on the point of view that certain disturbances are located (almost) exclusively in a certain frequency range and are thus isolated from the rest of the signal. Also, the slowly evolving trend can be seen to occupy only the lower frequency ranges. Thus, the (economic) meaning lies here in the frequency of events; see, for example, Dacorogna (2001). Although linear filters can be designed to block or pass certain frequencies nearly optimally, at the same time this poses a serious problem when facing trends that exhibit jumps or slopes. As these events are also located in the same (or, in the case of slopes, the adjacent) frequency range as the high-frequency noise, jumps and edged frontiers are blurred out, while steep slopes are mostly captured with poor precision only. Hence, from a frequency perspective, a smooth trend and edge preservation are two conflicting goals. This is because linear filters are not capable of distinguishing between the persistent noise and single events which,
while located in the same frequency range, usually have a higher energy and thus a significantly larger amplitude. Thus, the same filtering rule is applied throughout the whole signal, without any adaptation. Note, however, that linear filters still give some weight to undesirable events like outliers, due to their moving average nature. Thus, while some significant features like jumps are not displayed in enough detail, other unwanted patterns, like spikes, still partially carry over to the filtered output. To overcome all these drawbacks for such trends or signals, besides the class of linear filters, the class of nonlinear filters has been developed.
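The gain function of a mean filter can be evaluated directly from Eq. 18.9, as in the following sketch, which confirms its low-pass character; the bandwidth and frequency grid are arbitrary choices for illustration.

```python
import numpy as np

h = 10
w = np.full(2 * h + 1, 1.0 / (2 * h + 1))   # mean filter weights

# Transfer function W(f) = sum_j w_j exp(-i 2 pi f j)  (Eq. 18.9),
# evaluated on f in [0, 1/2]; by the symmetry discussed above this
# fully specifies it
f = np.linspace(0.0, 0.5, 512)
j = np.arange(-h, h + 1)
W = np.exp(-1j * 2 * np.pi * np.outer(f, j)) @ w
gain = np.abs(W)   # |W(f)|: close to 1 at low f, small at high f (low-pass)
```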
18.4 Nonlinear Filters

As we have seen, linear filters tend to blur out edges and other details even though these form an elementary part of the time series' trend. In order to avoid this, a wide range of nonlinear filters has been developed which on the one hand preserve those details while on the other hand trying to smooth out as much of the noise as possible. Nonlinear filters are not found only in time series analysis; in many cases they were developed specifically for two-dimensional signals, in particular images, where the original image, possibly corrupted by noise during data transmission, consists mainly of the edges which form it.
18.4.1 General Perception

While linear filters generally provide a very smooth trend achieved through averaging, two characteristics pose a problem for this class of filters:
• Outliers and spikes: Single, extreme outliers and spikes can cause the whole long-term trend to deviate in the same direction, though they obviously play no part in it.
• Jumps, slopes, and regime changes: Whenever a sudden external event occurs in the underlying main driving force, it causes the trend to jump, that is, contrary to spikes, it changes permanently onto another plane. While slopes are not that extreme, they show a similar behavior as they decay or rise with a degree unusual for the trend.
The reason for the sensitivity to these events lies in one of the linear filters' most favorable characteristics itself: it follows directly from their characterizability in terms of frequency passbands, explained earlier, that all frequencies are treated the same (i.e., filtered according to the same rule) throughout the whole signal. This means that no distinction is made (and even can be made) between noise and those patterns, as they are located in approximately the same frequency range. Technically, as long as an outlier or a jump is contained in the weighted moving average filtering window, a weight is also assigned to these outlier data points or to the points on the other plane, i.e., before
and after the jump. Nonlinear filtering procedures try to avoid this by using a different approach, for example, by considering only a single value (instead of multiple weighted ones) that is selected from an ordered or ranked permutation of the original values in the filtering window. Although we cannot characterize nonlinear filters in the same way as we do with linear filters (i.e., by transfer functions), according to Peltonen et al. (2001) we can divide the whole class of these filters into several subclasses that share the same or similar approaches. Among them are stack filters, weighted median filters, polynomial filters, and order statistic filters. Astola and Kuosmanen (1997) provide two different taxonomies for further classification, though they remark that these divisions are not unique. In their work, they extensively show how those different filters behave (i.e., their characteristics) when applied to different benchmark signals with respect to the mean absolute error (MAE) and mean squared error (MSE) measures. The behavior of nonlinear filters is generally characterized by their impulse and step response, i.e., the filtered output when the input consists of a single impulse or step only. These are generally given by the sequences $[\ldots, 0, 0, 0, a, 0, 0, 0, \ldots]$ and $[\ldots, 0, 0, 0, a, a, a, \ldots]$, respectively, with $a \neq 0$. Though in most cases no analytical result can be given, these characteristics help us to understand how a nonlinear filter behaves with respect to those patterns for which linear filters generally fail to deliver adequate results. Despite their undoubtedly good ability to preserve outstanding features in a time series while extracting the trend, nonlinear filters also suffer from some drawbacks:
1. Insufficient smoothness: Though most nonlinear filters try to deliver a smooth trend and a good resolution of edged frontiers, beyond the jumps' surrounding regions they usually fail to deliver a trend as smooth as even a simple linear filter (e.g., the mean filter) provides. Applying further smoothing procedures (e.g., recursive filters, or some kind of linear filtering of the nonlinear output with a smaller bandwidth, as most of the noise is already smoothed out) comes at the price that the previously preserved details of jumps or slopes tend to get lost. This effect is even aggravated when the high-frequency noise present in the signal is extremely strong, i.e., the trend, which besides the jumps evolves quite slowly, is dominated by noise with extremely high amplitudes. Some filters try to counter this effect, but they either:
• Provide only a tradeoff between an overall smooth signal with poor jump resolution and a trend still exhibiting ripples but with preserved edges, or
• Rely on further information about the time series itself, that is, the noise component and its structure, the jumps, or the trend itself.
This is due to the problem that most filtering rules are applied throughout the whole signal, that is, they do not adapt themselves sufficiently when the filtering window approaches a jump. An overview of these and other nonlinear filters' performance is given by Astola and Kuosmanen (1997), proving again the well-known insight that there cannot exist one solution that performs optimally in all cases. Though the authors also report some approaches that try to incorporate such adaptive behavior, these filters provide only
dissatisfactory results (see the examples below).
2. Lack of frequency control: Another feature nonlinear filters lack is the ability to regulate the filtered output in terms of frequency passbands, as linear filters do. Since Pollock (2001) defines the trend in terms of frequency bands and Ramsey (1999, 2002) points out that frequency analysis is an important aspect in financial time series, so is frequency control. Though not necessary for all applications, the ability to a priori control and regulate the filter output (in contrast to an only a posteriori frequency analysis of the filtered result) may come in handy when one wants to ensure that certain frequencies are not contained in the output; that is, when certain information about the noise frequencies is at hand, the analyst can decide before the actual filtering process (and thus without any trial-and-error procedures) which frequency parts should be filtered out. Although a nonlinear filter can also provide the same or a similar result, no theoretical results or statements are available before the filtering procedure has been carried out completely. This incapacity follows directly from the fact that nonlinear filters do not rely on frequency passbands, as they must be able to filter events that occupy nearly the same frequency range (i.e., noise and jumps) according to different rules. As Astola and Kuosmanen (1997) classify linear filters as a subclass of the class of nonlinear filters, the above statement does not hold exactly; i.e., it would mean that a subclass of nonlinear filters can be characterized by their transfer function and frequency output. We, however, separate the classes of linear and nonlinear filters by whether or not a filter can be described by a transfer function, which marks a strict division of these two classes.
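The contrast between the two filter classes can be seen in a few lines. The following sketch compares a mean (linear) and a median (nonlinear) filter on a synthetic jump signal; window size and noise level are chosen arbitrarily for illustration.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter1d

rng = np.random.default_rng(2)
t = np.arange(500)
signal = np.where(t < 250, 0.0, 2.0)           # trend with a single jump
x = signal + 0.3 * rng.standard_normal(500)    # additive noise

mean_out = uniform_filter1d(x, size=41)   # linear: smooth, but blurs the jump
median_out = median_filter(x, size=41)    # nonlinear: keeps the edge sharp
```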
18.4.1.1 Examples
To illustrate the different courses of action of nonlinear filters and give the reader an idea of their general procedure, in this section we outline several examples of the above-named subclasses. As this list cannot be exhaustive by any means, we note that we do not take into account filters that already rely on specific assumptions about the systems beneath the time series themselves.

Trimmed Mean Filter
This filter works essentially like the mean filter, with the difference that the extreme values of the ordered series $X_{(i)}$ are trimmed. The index $t$ is omitted here as the order is no longer in concordance with time. An $(r, s)$-fold trimmed mean filter is therefore given by

$$\frac{1}{N - r - s} \sum_{i=r+1}^{N-s} X_{(i)}. \qquad (18.12)$$
A special case is the choice $r = s$. A further modification of the trimmed mean filter is not to discard the ordered values below $X_{(r+1)}$ and above $X_{(N-s)}$, but instead to replace them by $X_{(r+1)}$ and $X_{(N-s)}$ themselves. This is the Winsorized mean filter:
$$\frac{1}{N}\left( r\, X_{(r+1)} + \sum_{i=r+1}^{N-s} X_{(i)} + s\, X_{(N-s)} \right). \qquad (18.13)$$
In these methods, the $(r, s)$ tuple is dependent on the data itself. Other filters consider making these values independent of the data or dependent on the central sample itself, i.e., nearest neighbor techniques. All those filters have in common that they discard all samples from the ordered series that are too far away according to some respective measure.

L-Filters and Weighted Median
L-filters (also called order statistic filters) make a compromise between the weighted moving averages of linear filters and the nonlinear ordering operation. The idea is that the filtered output is generated by weighted averages over the ordered samples, that is,

$$\sum_{i=1}^{N} w_i X_{(i)}, \qquad (18.14)$$
with $w_i$ the filter weights. A similar notion is given by weighted median filters, where the weights are assigned to the time-ordered samples $X_t$ and denote a duplication operation, i.e., $w_i \circ X_t = X_{t,1}, \ldots, X_{t,w_i}$. The output is then given by

$$\mathrm{median}\{w_1 \circ X_1, \ldots, w_N \circ X_N\}. \qquad (18.15)$$
Ranked and Weighted Order Statistic Filters
An $r$th-ranked-order statistic filter is simply given by taking $X_{(r)}$ as the filter output. Examples are the median, the maximum ($r = N$), and the minimum ($r = 1$) operations. This can also be combined with weights as depicted above, that is,

$$r\text{th order statistic}\{w_1 \circ X_1, \ldots, w_N \circ X_N\}. \qquad (18.16)$$
Hybrid Filters
Another approach is the design of nonlinear filters consisting of filter cascades, that is, the repeated application of different filters on the respective outputs. A general formulation is, for example, given by $r$th-order statistic$\{F_1(X_1, \ldots, X_n), \ldots, F_M(X_1, \ldots, X_n)\}$, where $F_1, \ldots, F_M$ can denote any other filtering procedure. A concrete example is the median hybrid filter, which combines a prior linear filtering procedure with a succeeding median ordering operation, i.e.,

$$\mathrm{median}\left\{ \frac{1}{k}\sum_{i=1}^{k} X_i,\; X_{k+1},\; \frac{1}{k}\sum_{i=k+2}^{N} X_i \right\}. \qquad (18.17)$$
Selective Filters
An interesting approach is given by the principle of switching between different output rules depending on some selection rule, for example, based on the fact that the mean filter delivers a larger (smaller) output than the median filter when the filtering window approaches an upward (downward) jump. Thus, a selection rule would be given by comparing

$$\mathrm{mean}\{X_1, \ldots, X_N\} \gtrless \mathrm{median}\{X_1, \ldots, X_N\}. \qquad (18.18)$$
A certain drawback of this selection rule is that it is one-sided, that is, it considers only the first half of the region around the jump. This is due to the fact that the mean for the second half, after the jump has occurred, is generally smaller (larger) than the median. Other rules can include thresholds and aim at deciding whether a jump has actually happened or whether there was an impulse in the signal that is not caused by the noise distribution but by some other explanatory effect. The above-depicted examples of nonlinear filters should give the reader an overview of the most common methods applied in practice. For detailed information about each filter's characteristics, advantages, and drawbacks, we refer to Astola and Kuosmanen (1997), where the references to the original works are also to be found. We see that basically most of these filters rely on some ordered statistics, with their input or output modified beforehand or afterwards, respectively. Since this basic principle applies to most filters that do not directly depend on some specific characteristic or assume a certain structure of the original series to be filtered, the different methods pointed out above can be combined in numerous ways. In many cases, however, even though we portray only the very basic methods, we see that almost all of them already incorporate an implicit or explicit choice of additional parameters besides the filter bandwidth, either by weights, rules, or thresholds. These choices introduce further biases into the filtering process. Though some of these parameters can be chosen to be optimal in some sense (i.e., to minimize a certain distance measure, e.g., the MSE for the L-filters), they lack the concrete meaning that the weights of linear filters have.
18.5 Specific Jump Detection Models and Related Methods

In this section we list further related methods that are concerned with the estimation of long-term trends exhibiting edged frontiers, i.e., jumps and/or steep slopes. We first review methods that were explicitly developed either solely for the detection of jumps in a signal corrupted by noise, or approaches that also include capturing (i.e., modeling) those very jumps. We show the advantages and the limits of application of these methods, highlighting in which aspects further research is still necessary. We conclude this chapter by listing some of the methods best known in practice.
18.5.1 Algorithms for Jump Detection and Modeling

The issue of detecting and modeling jumps in time series has been recognized as an essential task in time series analysis and has therefore already been considered extensively in the literature. Though we introduce wavelet methods only in the next chapter, we list the works based on these methods here as well, without going into details. We only note that wavelets, based on their characteristics, make excellent tools for jump and spike detection, as this is what they were developed for in the first place (see Morlet (1983)). Generally, the most appreciated procedures in the recent literature follow two different general approaches. One is via wavelets, and the other uses local (linear) estimators and derivatives. One of the first approaches using wavelets for jump detection in time series, besides the classical wavelet literature, for example, Mallat and Hwang (1992), was given by Wang (1995, 1998). He uses wavelets together with certain data-dependent thresholds in order to determine where in the signal jumps have happened and whether they differ significantly from short-varying fluctuations, and provides several benchmark signals. Assumptions about the noise structure were made according to Donoho and Johnstone (1994), that is, the approach is applicable to white (i.e., uncorrelated) Gaussian noise only. This work was extended by Raimondo (1998) to include even more general cusp definitions. More recent contributions extend these works to stationary noise (Yuan and Zhongjie (2000)) and other distributions (Raimondo and Tajvidi (2004)) and also provide theoretical results about asymptotic consistency. A further application, specifically to high-frequency data, is given by Fan and Wang (2007). The other line is given by Qiu and Yandell (1998), who estimate jumps using local polynomial estimators. This work is continued in Qiu (2003), where jumps are not only detected but also represented in the extracted time series. Gijbels et al. (2007) further refine the results by establishing a compromise between a smooth estimation for the continuous parts of a curve and a good resolution of jumps. Again, this work is limited to pure jumps only and, since it uses local linear estimators as the main method, has no frequency interpretation available. Sun and Qiu (2007) and Joo and Qiu (2009) use derivatives to test the signal for jumps and to represent them. Finally, we note a completely different approach presented in Kim and Marron (2006): a purely graphical tool for the recognition of probable jumps (and also of areas where almost certainly no jumps occurred). Yet, the authors concede that their work is only to be seen as a complementary approach and refer to Carlstein et al. (1994) for further thorough investigation. We note that there already exists a good body of work about how to detect jumps (i.e., estimate their location) and even how to model them (i.e., estimate their height), though in most works bound to strict requirements on the noise or the trend model. However, although most models will also automatically include the detection of steep slopes, they fail at modeling the slope itself. While a jump
can easily be represented (either by indicator functions or by any other methods used in the works cited above), matters are different with slopes: since no general formulation or model of the exact shape of a slope can be given, any parametric approach will fail or deliver only a poor approximation if the model does not fit the occurring slope. Examples of such different kinds of slopes are innumerable: sine, exponential, and logarithmic decays are only the most basic forms to approximate such events, which in practical examples rarely follow such idealized curves. Naturally, only nonparametric approaches will adapt themselves to the true shape of the underlying trend, but they generally suffer the same drawbacks as all linear and nonlinear methods pointed out above, i.e., they always have to strike a tradeoff between bias and signal fidelity.
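In the spirit of the wavelet-thresholding line of work (Wang (1995), Donoho and Johnstone (1994)), the following sketch flags jump candidates via a shift-invariant Haar transform using the PyWavelets package; the signal, the noise level, and the universal-style threshold rule are all illustrative assumptions.

```python
import numpy as np
import pywt

rng = np.random.default_rng(3)
n = 1024
t = np.arange(n)
x = np.where(t < 512, 0.0, 1.0) + 0.1 * rng.standard_normal(n)

# Shift-invariant (stationary) Haar transform: detail coefficients
# react strongly wherever the level of the series changes
(_, detail), = pywt.swt(x, "haar", level=1)

sigma = np.median(np.abs(detail)) / 0.6745    # robust noise-level estimate
threshold = sigma * np.sqrt(2 * np.log(n))    # universal-style threshold
jump_candidates = np.flatnonzero(np.abs(detail) > threshold)
```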
18.5.2 Further Related Methods

We conclude this section by outlining the most popular and established filtering methods.
18.5.2.1 General Least-Squares Approaches
The most general approach consists of setting up a parameterized model and calibrating it afterwards, generally using some minimization procedure with respect to some error measure. This can be seen as a straightforward approach, requiring only the initialization of an appropriate model and the choice of an error measure. An example from the energy market is given by Seifert and Uhrig-Homburg (2007) and Lucia and Schwartz (2002), who set up a trigonometric model in conjunction with indicator functions and minimize the squared error for each time step in order to estimate the deterministic trend as well as values for different seasons and days. Though they find that their model works well in practice, in general cases it might be difficult to always find an appropriate model, especially when there is no information about the trend, its seasonal cycles, and other (deterministic or stochastic) influences. This is especially the case when the data set covers only a short period of time. When analyzing financial time series data, one could suggest a trigonometric model and the order of the sinusoidal functions, but in general it may be difficult to set up such a model or to justify why this model and its estimated trend are appropriate for the respective time series. This requires either a rigorous a priori analysis of the series itself or further information about the external factors (i.e., the system the time series is derived from) and their interaction. In addition, the estimation can never be better than the model and the accuracy to which it approximates the true trend. We note another probably critical issue when using indicator functions in combination with least-squares estimation. First, using indicator functions confines the model to jumps only; that is, slopes or similar phenomena cannot be captured by that approach, as the indicator functions automatically introduce jumps in the trend
component. Thus, indicator functions excel at modeling jumps but perform poorly with other types of sudden changes. Second, for this approach it is extremely important to determine the location of the jump as exactly as possible, as otherwise the estimated trend in this area may be highly inaccurate. It is therefore dubious whether such parametric approaches are appropriate for dealing with economic and financial time series, though they will perform very well if their requirements are met.
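A minimal least-squares sketch of a trigonometric trend model with an indicator function follows; note that the jump location is assumed known here, which sidesteps exactly the localization problem discussed above, and all model terms and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 365
t = np.arange(n, dtype=float)
x = (10 + 0.01 * t + 2 * np.sin(2 * np.pi * t / 365)
     + np.where(t >= 200, 3.0, 0.0) + rng.standard_normal(n))

# Parametric trend: intercept, linear term, one annual sine/cosine pair,
# and an indicator function for a jump whose location is assumed known
design = np.column_stack([
    np.ones(n), t,
    np.sin(2 * np.pi * t / 365), np.cos(2 * np.pi * t / 365),
    (t >= 200).astype(float),    # indicator: models the jump only
])
beta, *_ = np.linalg.lstsq(design, x, rcond=None)
trend = design @ beta
```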
18.5.2.2 Smoothing Splines
Smoothing splines, though also utilizing the least-squares method, do not, by contrast, rely on a specific model assumption. Instead, they penalize the regression spline with respect to its roughness, i.e.,

$$\min_{m} \sum_{t=1}^{N} \left(X_t - m(t)\right)^2 + \omega \int \left(m''(x)\right)^2 \, dx. \qquad (18.19)$$
It follows directly from this definition that for $\omega = 0$ this yields an interpolation, while for $\omega \to \infty$, $m$ will approximate a linear regression. In practice, another difficulty emerges: in many cases, the (optimal) choice of $\omega$ remains unclear. Although several works have established some data-dependent rules for this, in many cases, when the assumptions about the noise do not hold or the time series incorporates additional deterministic (e.g., cycles) or stochastic components (e.g., outliers that are part of the system and not due to measurement or other errors), the choice of the penalizing smoothing parameter is a difficult task that has been and still is undergoing extensive research (see Morton et al. (2009), Lee (2003), Hurvich et al. (1998), Cantoni and Ronchetti (2001), Irizarry (2004)). We mention in particular the cross-validation method, which is used to determine the optimal smoothing parameter (see Green and Silverman (1994)). Furthermore, though $\omega$ is eventually responsible for the degree of smoothness (i.e., on which scale or level the trend shall be estimated), one can hardly impose on, or derive from, this parameter any additional meaning.
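In Python, such a penalized fit is available, for example, through SciPy's UnivariateSpline, whose parameter s plays a role analogous to ω; the rule-of-thumb value below is an assumption, not an optimal choice.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(5)
t = np.linspace(0, 10, 200)
x = np.sin(t) + 0.3 * rng.standard_normal(200)

# s controls the residual budget (and thus the smoothness): s = 0
# interpolates, while a very large s approaches a low-order polynomial fit
spline = UnivariateSpline(t, x, k=3, s=len(t) * 0.3**2)
trend = spline(t)
```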
18.5.2.3 The Hodrick-Prescott Filter
The Hodrick-Prescott (HP) filter was first introduced by Leser (1961) and later became popular due to the advanced works of Hodrick and Prescott (1997). In order to extract the trend $\tau = [\tau_0, \ldots, \tau_{N+1}]$ from a given time series $X$, this trend is derived by solving

$$\min_{\tau} \sum_{t=1}^{N} \left(X_t - \tau_t\right)^2 + \omega \big( (\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1}) \big)^2. \qquad (18.20)$$
We note that (aside from not using a spline basis for approximation) this approach can be seen as a discretized formulation of the smoothing spline.
The smoothing parameter $\omega$ plays the same role, while the penalized smoothness measure is the discretized version of the second derivative. Thus, though several authors propose explicit rules (of thumb) for choosing $\omega$ (see, e.g., Ravn and Uhlig (2002), Dermoune et al. (2008), Schlicht (2005)), some researchers, like Harvey and Trimbur (2008), also recognize that in some way this choice still remains somewhat arbitrary or problematic for many time series that are not measured on the same time scale.
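Eq. 18.20 leads to a linear system that can be solved directly, as in the following sketch; the value ω = 1600 is the conventional quarterly-data rule of thumb (see Ravn and Uhlig (2002)) and is used here purely for illustration.

```python
import numpy as np

def hp_filter(x, omega):
    """Hodrick-Prescott trend (Eq. 18.20): solve (I + omega * D'D) tau = x,
    with D the second-difference operator."""
    n = len(x)
    # D has shape (n-2, n); row t encodes tau_{t-1} - 2 tau_t + tau_{t+1}
    D = np.zeros((n - 2, n))
    for t in range(n - 2):
        D[t, t:t + 3] = [1.0, -2.0, 1.0]
    return np.linalg.solve(np.eye(n) + omega * D.T @ D, x)

x = np.cumsum(np.random.default_rng(6).standard_normal(300))
trend = hp_filter(x, omega=1600)
```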
18.5.2.4 The Kalman Filter
Another sophisticated filter was developed by Kalman (1960). It is a state-space system specifically designed to handle unobserved components models and can be used either to filter past events from noise or to forecast. A good introduction to the Kalman filter can be found in Welch and Bishop (1995) and a thorough discussion in Harvey (1989). The basic Kalman filter assumes that the state $x \in \mathbb{R}^n$ of the underlying process in a time series can be described by a linear stochastic difference equation:

$$x_t = A x_{t-1} + B u_{t-1} + w_{t-1}, \qquad (18.21)$$

with $A$ the state transition model that relates, in conjunction with the (optional) control input $B$, the respective input $u_t$, and the noise component $w_t$, the previous state to the next. In the above equation, $A$ and $B$ are assumed to be constant but may also change over time. Of this model (i.e., the true state $x_t$), only

$$z_t = H x_t + v_t \qquad (18.22)$$

can be observed. Both noise components are assumed to be independent of one another and to be distributed according to

$$w \sim N(0, Q) \quad \text{and} \quad v \sim N(0, R). \qquad (18.23)$$
With $A$, $B$, $Q$, and $R$ assumed to be known, the filter predicts the next state $x_t$ based on $x_{t-1}$ and also provides an estimate of the accuracy of the actual prediction. Since its first development, the Kalman filter has become popular in many areas; see, for example, Grewal and Andrews (2001). However, a serious drawback of the Kalman procedure is that many real-world models do not fit the assumptions of the model; for example, the above requirement of a linear underlying system is often not met. Though there exist extensions for nonlinear systems (see, e.g., Julier and Uhlmann (1997)), the problem remains that one or more of the required parameters are unknown. While for many technical systems (e.g., car or missile tracking systems) based on physical laws the state transition model $A$ is exactly known, this becomes a difficult issue in many other application areas, including finance. Additionally, as, e.g., Mohr (2005) notes, the performance of the Kalman filter can be very sensitive to the initial conditions of the unobserved components and their variances, while at the same time it
requires an elaborate procedure of model selection. He also notes that for macroeconomic time series, the Kalman filter does not work with annual data. Therefore, we see that while the Kalman filter unquestionably delivers excellent results in many areas (and has also been applied to financial time series, though not without critique), its usage is not convenient for the general cases we treat in this work, as it requires more assumptions and knowledge about the underlying model.
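A bare-bones implementation of the recursions implied by Eqs. 18.21-18.23 is sketched below for a scalar local-level model; all matrices, noise variances, and the simulated observations are illustrative assumptions.

```python
import numpy as np

def kalman_filter(z, A, H, Q, R, x0, P0, B=None, u=None):
    """Basic Kalman recursion for Eqs. 18.21-18.22 (no control input by default)."""
    x, P = x0, P0
    estimates = []
    for t, zt in enumerate(z):
        # Predict: propagate state and covariance through the transition model
        x = A @ x + (B @ u[t] if B is not None else 0.0)
        P = A @ P @ A.T + Q
        # Update: correct the prediction with the new observation z_t
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
        x = x + K @ (zt - H @ x)
        P = (np.eye(len(x0)) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)

# Local-level model: a scalar random-walk state observed in noise
A = H = np.eye(1)
Q, R = np.array([[1e-4]]), np.array([[0.09]])
rng = np.random.default_rng(7)
z = (0.01 * np.cumsum(rng.standard_normal(200))
     + 0.3 * rng.standard_normal(200)).reshape(-1, 1)
est = kalman_filter(z, A, H, Q, R, x0=np.zeros(1), P0=np.eye(1))
```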
18.6 Summary

In this chapter we provided an overview of the different tasks of time series analysis in general and the specific challenges of time series trend extraction. We pointed out several methods and outlined their advantages and disadvantages, together with their requirements. The reader should keep in mind the following key points:
• Linear filters provide a very smooth trend but fail to capture sudden changes.
• Nonlinear filters capture jumps extremely well but are very sensitive to higher levels of noise.
• Advanced methods may provide good results depending on the scenario but often rely on very specific assumptions that can yield misleading results in case they are not completely met.
• The specific task of jump detection and modeling is well understood but lacks appropriate methods for more general changes like steep slopes.
• And most important: no method universally performs best for all tasks!
References

Astola, J., & Kuosmanen, P. (1997). Fundamentals of nonlinear filtering. Boca Raton: CRC Press.
Au, S., Duan, R., Hesar, S., & Jiang, W. (2010). A framework of irregularity enlightenment for data pre-processing in data mining. Annals of Operations Research, 174(1), 47–66.
Bianchi, G., & Sorrentino, R. (2007). Electronic filter simulation & design. New York: McGraw-Hill. ISBN 0071494677.
Cantoni, E., & Ronchetti, E. (2001). Resistant selection of the smoothing parameter for smoothing splines. Statistics and Computing, 11(2), 141–146.
Carlstein, E., Mueller, H. G., & Siegmund, D. (1994). Change-point problems. Hayward, CA: Institute of Mathematical Statistics.
Dacorogna, M. M. (2001). An introduction to high-frequency finance. San Diego, CA: Academic Press.
Dermoune, A., Djehiche, B., & Rahmania, N. (2008). Consistent estimators of the smoothing parameter in the Hodrick-Prescott filter. Journal of the Japan Statistical Society, 38, 225–241.
Donoho, D. L., & Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3), 425–455.
Fan, J., & Wang, Y. (2007). Multi-scale jump and volatility analysis for high-frequency financial data. Journal of the American Statistical Association, 102(480), 1349–1362.
Fan, J., & Yao, Q. (2003). Nonlinear time series: Nonparametric and parametric methods. New York: Springer.
Gijbels, I., Lambert, A., & Qiu, P. (2007). Jump-preserving regression and smoothing using local linear fitting: A compromise. Annals of the Institute of Statistical Mathematics, 59(2), 235–272.
Green, P. J., & Silverman, B. W. (1994). Nonparametric regression and generalized linear models: A roughness penalty approach. New York: Chapman & Hall/CRC. ISBN 0412300400.
Grewal, M. S., & Andrews, A. P. (2001). Kalman filtering: Theory and practice using MATLAB. New York: Wiley. ISBN 0471392545.
Harvey, A. C. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge: Cambridge University Press.
Harvey, A., & Trimbur, T. (2008). Trend estimation and the Hodrick-Prescott filter. Journal of the Japan Statistical Society, 38(1), 41–49.
Hodrick, R. J., & Prescott, E. C. (1997). Postwar US business cycles: An empirical investigation. Journal of Money, Credit and Banking, 29(1), 1–16.
Hurvich, C. M., Simonoff, J. S., & Tsai, C. L. (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 60(2), 271–293.
Irizarry, R. A. (2004). Choosing smoothness parameters for smoothing splines by minimizing an estimate of risk (p. 30). Johns Hopkins University, Department of Biostatistics Working Papers.
Joo, J. H., & Qiu, P. (2009). Jump detection in a regression curve and its derivative. Technometrics, 51(3), 289–305.
Julier, S. J., & Uhlmann, J. K. (1997). A new extension of the Kalman filter to nonlinear systems. In International symposium on aerospace/defense sensing, simulation and controls (Vol. 3, p. 26). Citeseer.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35–45.
Kim, C. S., & Marron, J. S. (2006). SiZer for jump detection. Journal of Nonparametric Statistics, 18(1), 13–20.
Lee, T. (2003). Smoothing parameter selection for smoothing splines: A simulation study. Computational Statistics & Data Analysis, 42(1–2), 139–148.
Leser, C. E. V. (1961). A simple method of trend construction. Journal of the Royal Statistical Society, Series B: Methodological, 23(1), 91–107.
Lucia, J. J., & Schwartz, E. S. (2002). Electricity prices and power derivatives: Evidence from the Nordic Power Exchange. Review of Derivatives Research, 5(1), 5–50.
Mallat, S. G., & Hwang, W. L. (1992). Singularity detection and processing with wavelets. IEEE Transactions on Information Theory, 38(2), 617–643.
Meinl, T., & Sun, E. W. (2012). A nonlinear filtering algorithm based on wavelet transforms for high-frequency financial data analysis. Studies in Nonlinear Dynamics and Econometrics, 16(3), 1–24.
Mohr, M. F. (2005). A trend-cycle(-season) filter. ECB Working Paper Series, No. 499.
Morlet, J. (1983). Sampling theory and wave propagation. Issues in Acoustic Signal/Image Processing and Recognition, 1, 233–261.
Morton, R., Kang, E. L., & Henderson, B. L. (2009). Smoothing splines for trend estimation and prediction in time series. Environmetrics, 20(3), 249–259.
Peltonen, S., Gabbouj, M., & Astola, J. (2001). Nonlinear filter design: Methodologies and challenges. In Proceedings of the 2nd International Symposium on Image and Signal Processing and Analysis, 2001 (ISPA 2001) (pp. 102–107).
Percival, D. B., & Walden, A. T. (2006). Wavelet methods for time series analysis. Cambridge: Cambridge University Press.
Pollock, D. S. G. (2001). Methodology for trend estimation. Economic Modelling, 18(1), 75–96.
Qiu, P. (2003). A jump-preserving curve fitting procedure based on local piecewise-linear kernel estimation. Journal of Nonparametric Statistics, 15(4), 437–453.
Qiu, P., & Yandell, B. (1998). A local polynomial jump-detection algorithm in nonparametric regression. Technometrics, 40(2), 141–152.
Rachev, S. T. (2003). Handbook of heavy tailed distributions in finance. Amsterdam: North-Holland.
Raimondo, M. (1998). Minimax estimation of sharp change points. Annals of Statistics, 26(4), 1379–1397.
Raimondo, M., & Tajvidi, N. (2004). A peaks over threshold model for change-point detection by wavelets. Statistica Sinica, 14(2), 395–412.
Ramsey, J. B. (1999). The contribution of wavelets to the analysis of economic and financial data. Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 357(1760), 2593–2606.
Ramsey, J. B. (2002). Wavelets in economics and finance: Past and future. Studies in Nonlinear Dynamics & Econometrics, 6(3), 1–27.
Ravn, M. O., & Uhlig, H. (2002). On adjusting the Hodrick-Prescott filter for the frequency of observations. Review of Economics and Statistics, 84(2), 371–376.
Schlicht, E. (2005). Estimating the smoothing parameter in the so-called Hodrick-Prescott filter. Journal of the Japan Statistical Society, 35(1), 99–119.
Seifert, J., & Uhrig-Homburg, M. (2007). Modelling jumps in electricity prices: Theory and empirical evidence. Review of Derivatives Research, 10(1), 59–85.
Sun, E. W., & Meinl, T. (2012). A new wavelet-based denoising algorithm for high-frequency financial data mining. European Journal of Operational Research, 217(3), 589–599.
Sun, J., & Qiu, P. (2007). Jump detection in regression surfaces using both first-order and second-order derivatives. Journal of Computational and Graphical Statistics, 16(2), 289–311.
Sun, W., Rachev, S. T., Fabozzi, F., & Stoyan, S. (2008). Multivariate skewed Student's t copula in analysis of nonlinear and asymmetric dependence in German equity market. Studies in Nonlinear Dynamics and Econometrics, 12(2).
Tsay, R. S. (2005). Analysis of financial time series. Hoboken: Wiley.
Wang, Y. (1995). Jump and sharp cusp detection by wavelets. Biometrika, 82(2), 385–397.
Wang, Y. (1998). Change curve estimation via wavelets. Journal of the American Statistical Association, 93(441), 163–172.
Welch, G., & Bishop, G. (1995). An introduction to the Kalman filter. Chapel Hill: University of North Carolina at Chapel Hill.
Yuan, L., & Zhongjie, X. (2000). The wavelet detection of the jump and cusp points of a regression function. Acta Mathematicae Applicatae Sinica (English Series), 16(3), 283–291.