A time-frequency search for stock market anomalies

Sudeshna Adak
Department of Biostatistics, Harvard School of Public Health
677 Huntington Avenue, Boston, MA 02115

Abhinanda Sarkar
Department of Mathematics, M.I.T.
77 Massachusetts Avenue, Cambridge, MA 02139

We carry out a time-frequency analysis of daily values of the Dow Jones Industrial Average and the S&P 500 Index of stock prices based on 62 years of data. The algorithm used is based on pruning dyadic trees to obtain optimal stationary segmentations of non-stationary time series. The resulting time-frequency representations yield insights into the frequency-domain evolution of the US stock market. Details concerning computational speed and recombination are briefly addressed.

1 Introduction

Stock market returns are supposed to adhere to the so-called "efficient market hypothesis". Loosely speaking, this implies that if there is any systematic trend or pattern in the returns, someone will take advantage of it to come out richer and the market will adjust to remove the pattern. How true is this in practice? This has been an area of debate for many years. For a slightly dated review of the issues and findings, see the papers in Dimson (1988).

Regularities in stock price data are often termed anomalies, as one expects pure noise to be the norm. Daily data has been found to demonstrate a weekend effect (see, for example, French (1980)) as well as lower-frequency annual effects (as studied in Keim (1983)). More specifically, one can ask questions like "Is the difference between Monday and Friday prices following a different pattern than the difference between any two consecutive weekdays?". In order to answer such questions, frequency-domain characteristics are very useful. We propose to hunt for periodic patterns by looking at spectra.

Spectral analysis demands stationarity: all segments of the data should be realizations of the same random process. Over a sixty-year period, this is clearly not true. For example, the composition of the S&P 500 can change many times in that span of time. Thus we need to be able to isolate periods of approximate stationarity and do spectral analysis within these segments. We can then find time-varying periodicities. In addition, we can also automatically determine when the stock market changed its frequency-domain characteristics.

In this paper, we use an extension of an algorithm developed by one of us in Adak (1996) to carry out a time-frequency analysis of the Dow Jones Industrial Average (DJIA) and

the S&P 500 Index daily closing values to study some of the anomalies present since the great depression of the late 1920s. We thank Eduardo Ley and the StatLib database for making the data publicly available.

We look at two time series, each of length 16,385. The first consists of the daily closing values (for working days) of the DJIA and the second of the corresponding figures for the S&P 500, both from the period October 14, 1931 to June 11, 1993. If $P_t$ is the price at time $t$, the series we look at is $X_t = \log(P_t / P_{t-1})$. For completeness, here are three reasons for doing so. (1) Many theoretical models assume that prices follow geometric Brownian motion, which implies that $X_t$ is stationary. (2) $X_t$ looks more stationary than $P_t$ for the data at hand; the latter shows an exponentially increasing trend. (3) If $P_t - P_{t-1}$ is small relative to $P_t$, a one-term Taylor series expansion shows that $X_t$ is approximately $(P_t - P_{t-1})/P_t$. This is the return at time $t$, a quantity often more important than the price itself.

However, we observe that $X_t$ is not quite stationary, the variance being non-constant. This effect has been tested thoroughly using volatility tests for the efficient markets hypothesis. We are interested more in changes in cyclical behavior than in changes in magnitude, and our algorithm is designed to adjust for this. It should also be noted that we have not used information not available in the time series itself. Investors obviously look at a multitude of trends, and thus our analysis is unlikely to be very helpful in predicting the behavior of the entire market.

On rare occasions, markets systematically misbehave, as in the crash of October 1987. Data from these periods do not conform to familiar trends. As a bonus, our algorithm can isolate such periods if they unduly influence the time-frequency representation. This serves both as an outlier-detection mechanism and as a safeguard against aberrant observations corrupting entire spectral analyses.
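As a minimal sketch of this preprocessing step (the function and array names are ours; the paper specifies no implementation):

    import numpy as np

    def log_returns(prices):
        # X_t = log(P_t / P_{t-1}); approximately the daily return
        # (P_t - P_{t-1}) / P_t when price changes are small.
        prices = np.asarray(prices, dtype=float)
        return np.diff(np.log(prices))

    # 16,385 daily closes yield 16,384 = 2^14 log-differences,
    # a dyadic length convenient for the FFT-based algorithm of Section 2.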

2 Time-frequency trees

Time-frequency trees are introduced here as a method of determining time-dependent frequency artifacts in time series data, such as changing periodicities and varying patterns of autocorrelation. Tree algorithms have been used extensively in the classification and regression setting (Breiman et al. (1984)), popularly known as CART. In general, tree algorithms consist primarily of the following steps:

• Growing the tree: An initial binary decision tree, $T_0$, is constructed according to a splitting rule. That is, the data is divided into two segments based on optimizing a splitting criterion. Then, each part is further divided into two, and so on.

• Pruning the tree: The initial tree $T_0$ is pruned by recombining some of the branches of the tree according to a recombination rule. Starting at the bottom of the tree, the two branches of every node are compared to determine their potential for recombination.

• Optimal tree selection: A nested sequence of pruned subtrees is obtained and the "best pruned subtree" is selected on the basis of evaluating the quality of the resultant data segmentation.

2.1 Adaptive Segmentation via time-frequency trees

The segmentation algorithm described here uses a tree-based method in order to detect changes in the spectral characteristics over time and to partition the data accordingly into segments that are stationary, or at least approximately so. It is different from usual tree-based algorithms in its execution, as explained below, in order to suit our objective of finding stationary segments.

• Initially, a complete dyadic tree, $T_0$, is grown by recursively halving each data segment to a maximum depth of $D$ in the tree, as shown in Figure 1. So the splitting rule is the trivial one of always dividing the data segment into two halves. This recursive splitting of the data is equivalent to a multilevel windowing operation: the window width is the same as the length of the data at the top level and is halved at every successive level. This multilevel windowing operation can be related to multigrid methods as described in Section 3.2 of McCormick (1992).

[Figure 1: Successive partitioning of a dyadic tree. Horizontal axis: time (0 to 1); vertical axis: depth of tree (0 to -5).]

• The spectrum in each block is estimated.

• Then an optimal pruning algorithm is used to recombine adjacent segments for which the spectra are the same.


This rule of recursively splitting the data into halves produces windowed spectra whose window widths vary with depth in the tree. For example, at depth $d$ in the tree, $2^d$ equi-sized blocks are made and the spectrum is estimated in each block, as shown in the diagram below.

[Diagram: at depth $d$ in the tree, the series $x(t)$ is divided into Block 0, Block 1, ..., Block $2^d - 1$; an FFT of each block yields squared amplitudes, giving a spectrum estimate $\hat{f}$ for each block.]

The algorithm described in this section is used in the segmentation of locally stationary time series of dyadic length, i.e. $N = 2^J$ for some $J$. It is different from the tree-growing algorithms of CART, which find the best possible subdivision of each data segment by a search procedure that evaluates a splitting criterion at all points in the data segment. If such a search procedure were to be used here, it would necessarily involve estimating the spectra for all possible subdivisions of a data segment. We have eliminated this very time-consuming search by choosing to always split the data in half. This makes the algorithm computationally very efficient for two reasons. (1) At depth $d$ ($d = 0, 1, \ldots, D-1$) in the tree, spectra have to be estimated for only $2^d$ segments, rather than the $2N$ spectra that would be needed at every depth of the tree to execute a search for the best subdivisions. (2) Each segment in the tree is of dyadic length, which makes the estimation of spectra using the fast Fourier transform (FFT) work much faster. In fact, the dyadic nature of our algorithm provides a procedure that requires on the order of $N (\log_2 N)^2$ operations. (Recall that each FFT requires on the order of $N \log_2 N$ operations.)


Fast Segmentation algorithm for series of length N = 2^J

Step 1:  Set D = floor(log2 N) - 2     [# Maximum default depth to which tree is grown.
                                          Can be reset to a given depth.]
Step 2:  Set m = N / 2^(D+1)           [# Maximum overlap of segments]
Step 3:  For d = 0 : D,
             Divide the data into 2^d blocks, block(b, d): b = 0, ..., 2^d - 1,
             each block overlapping the next one over 2m - 1 data points.
             block(b, d) corresponds to a node, node(b, d), of the tree.
Step 4:      For b = 0 : (2^d - 1),
                 Compute an estimate, f̂_{b,d}(λ), of the spectrum in block(b, d)
             end for
         end for
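To make the tree-growing step concrete, here is a Python sketch under simplifying assumptions of our own: raw periodograms stand in for the paper's spectrum estimates, and the 2m - 1 point overlap between blocks is omitted.

    import numpy as np

    def grow_tree(x, D):
        # Estimate a periodogram in every block(b, d) of the complete
        # dyadic tree: d = 0..D, b = 0..2^d - 1. Each depth costs one
        # pass of FFTs over the full series, so the total work is on
        # the order of N (log2 N)^2 operations.
        x = np.asarray(x, dtype=float)
        N = len(x)
        spectra = {}
        for d in range(D + 1):
            width = N // 2**d              # window width halves with depth
            for b in range(2**d):
                seg = x[b*width:(b+1)*width]
                seg = seg - seg.mean()     # demean before spectrum estimation
                spectra[(b, d)] = np.abs(np.fft.rfft(seg))**2 / len(seg)
        return spectra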

The optimal pruning algorithm is used to find the "most stationary segmentation". This is the segmentation that minimizes

    \sum_{\text{segments} \in T_0} \text{distance}(\hat{f}_{\text{left}}, \hat{f}_{\text{right}})    (1)

where $\hat{f}_{\text{left}}, \hat{f}_{\text{right}}$ are the estimated spectra from the left and right halves of the segment respectively, and distance is a measure of discrepancy between the two spectra. This measure of nonstationarity as defined by Equation 1 quantifies the discrepancy between the segmentation of the tree $T_0$ and a stationary segmentation, in which all intervals in the partition are stationary or at least approximately so. To avoid over-segmentation, one may opt to penalize the number of segments, and choose instead the criterion of minimizing

    \sum_{\text{segments} \in T_0} \text{distance}(\hat{f}_{\text{left}}, \hat{f}_{\text{right}}) + \beta \cdot (\text{Number of segments})    (2)

where $\beta$ is the penalization imposed on the number of segments. The following facts about trees were proved in Breiman et al. (1984):

• Given a penalty parameter $\beta$, there is a unique optimally pruned subtree $T^*(\beta)$ that minimizes Equation 2.

• As the penalty parameter $\beta$ increases, the size of the tree decreases. In fact, if $\beta < \beta'$, then $T^*(\beta')$ is a subtree of $T^*(\beta)$.

• Since there can be only a finite number of nested subtrees of the complete binary tree $T_0$, it follows that there is only a finite set of penalty values which yield distinct subtrees.

• $T^*(-\infty) = T_0$; $T^*(\infty)$ = the entire time series.

It follows that, starting with a complete binary tree, we can obtain a finite number of nested subtrees by increasing the penalty parameter $\beta$. The following algorithm describes how this nested sequence of subtrees is obtained and also how the corresponding penalization parameters are determined.

Algorithm: Optimal Pruning with penalties

Step 0:  Initialize β = -∞             [# Can be initialized to an arbitrarily small value]
Step 1:  Set β at the current value.
         For d = (D - 1) : -1 : 0,
             For b = 0 : 2^d - 1,
Step 2:          Compute R_{b,d} = distance(f̂_{2b,d+1}, f̂_{2b+1,d+1})
                 Set Value(b, d) = R_{b,d} + β
Step 3:          If d == D - 1,
                     Mark block(b, d) as terminal
                 If d < D - 1,
                     If Value(b, d) <= Value(2b, d+1) + Value(2b+1, d+1),
                         Mark block(b, d) as terminal
                         Set Rsum_{b,d} = R_{b,d}
                         Set N_{b,d} = 1
                         Set g(b, d) = ∞
                     Otherwise, leave block(b, d) unmarked and
                         Set Value(b, d) = Value(2b, d+1) + Value(2b+1, d+1)
                         Set Rsum_{b,d} = Rsum_{2b,d+1} + Rsum_{2b+1,d+1}
                         Set N_{b,d} = N_{2b,d+1} + N_{2b+1,d+1}
                         Set g(b, d) = (R_{b,d} - Rsum_{b,d}) / (N_{b,d} - 1)
             end for
         end for
         Final Segmentation, for penalty β
             = Set of highest marked blocks
             = {block(b, d): block(b, d) is marked and its ancestors are unmarked}
             [This is the optimal pruned subtree T*(β)]
         Value of the best tree, R(T*(β)) = Value(0, 0)
             = minimized value of Equation 2
Step 4:  If block(0, 0) is marked, STOP;
         else set β_new = min{g(b, d): block(b, d) ∉ Final Segmentation}
         and return to Step 1.
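A minimal Python sketch of a single pruning pass for a fixed penalty β, under our own simplifications: the child distances dist[(b, d)] are assumed precomputed from the block spectra, and the g(b, d) bookkeeping that steps β to its next critical value is omitted.

    def prune(dist, D, beta):
        # dist[(b, d)]: distance between the spectra of the two halves of
        # block(b, d), i.e. between blocks (2b, d+1) and (2b+1, d+1).
        # Returns the highest marked blocks: the segmentation T*(beta).
        value, marked = {}, {}
        for d in range(D - 1, -1, -1):                 # bottom-up
            for b in range(2**d):
                v = dist[(b, d)] + beta                # cost of keeping block whole
                if d == D - 1:
                    value[(b, d)], marked[(b, d)] = v, True
                else:
                    split = value[(2*b, d+1)] + value[(2*b+1, d+1)]
                    if v <= split:                     # whole block is cheaper
                        value[(b, d)], marked[(b, d)] = v, True
                    else:                              # keep the split
                        value[(b, d)], marked[(b, d)] = split, False
        segments = []                                  # collect marked blocks
        stack = [(0, 0)]                               # with no marked ancestor
        while stack:
            b, d = stack.pop()
            if marked[(b, d)]:
                segments.append((b, d))
            else:
                stack += [(2*b, d+1), (2*b+1, d+1)]
        return segments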

• Any appropriate distance measure between spectra, such as the Kolmogorov-Smirnov distance or the Cramér-von Mises distance, can be used. In the examples in this paper, we have used the Kolmogorov-Smirnov distance measure, where the estimated spectra have been normalized to behave as densities and the data in each segment was demeaned prior to spectrum estimation. So, distance between spectra will refer to the Kolmogorov-Smirnov distance measure

    \text{distance}\left(\hat{F}_{\text{left}}, \hat{F}_{\text{right}}\right) = \max_{\lambda} \left| \hat{F}_{\text{left}}(\lambda) - \hat{F}_{\text{right}}(\lambda) \right|    (3)

where $\hat{F}_{\text{left}}, \hat{F}_{\text{right}}$ refer to the (normalized) spectral distribution functions of the left and right segments being compared.

• Finally, the "best pruned subtree" is selected from the nested sequence of pruned subtrees by cross-validation, which is explained at length in Adak (1996). More details are provided there on the actual method of spectral estimation that is used in this algorithm.
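A short sketch of Equation 3, assuming both estimated spectra are given on a common frequency grid:

    import numpy as np

    def ks_distance(f_left, f_right):
        # Normalize each spectrum to a density, form the spectral
        # distribution functions, and take the maximum absolute gap.
        F_left = np.cumsum(f_left) / np.sum(f_left)
        F_right = np.cumsum(f_right) / np.sum(f_right)
        return np.max(np.abs(F_left - F_right))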

2.2 Recombination to a non-dyadic split

It is clear that the algorithm described above can only make dyadic segmentations of the data, as the splits can occur only at dyadic points such as $N/2$, $3N/4$, etc. As described, it is an inherent drawback of the procedure that even if the frequency characteristics of the time series change at a non-dyadic point, the algorithm will try to estimate the dyadic segmentation that best approximates the true partitioning of the data. For instance, if the time series had a single change point at $N/3$, then the best dyadic approximation to 1/3 that we can hope to achieve in our segmentation algorithm is equivalent to the binary expansion of 1/3 to $(D-1)$ terms, i.e.

    \frac{1}{3} = 0.010101010101\ldots \text{ (binary), to } (D-1) \text{ terms}    (4)

This can be translated into the best dyadic subtree that one should ideally obtain from the segmentation algorithm as alternately dividing the left followed by the right branch of the dyadic tree. In order to compensate for this feature, we used a recombination algorithm to determine the validity of the splits generated by the time-frequency tree. It is hoped that this will improve the quality of the final segmentation of the data in terms of characterizing the true time-frequency representation. The recombination algorithm used for this purpose is outlined below, with an illustrative sketch after the list:

(a) Segments generated by the best-pruned time-frequency tree were compared and evaluated for potential recombination.

(b) Two segments were compared at a time.

(c) If the segments being compared were of different lengths, only the part of the longer segment that was adjacent and equal in length to the shorter segment was used.

(d) The distance between the spectra of the two segments being compared was computed. If this distance was less than the average distance between segments of the same length in the final time-frequency representation, the two segments were recombined.
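A minimal sketch of steps (a)-(d), under assumptions of our own: segments are (start, length) pairs in time order, spectrum(start, n) is a hypothetical helper returning an estimated, normalized spectrum for the n observations beginning at start, and threshold is the precomputed average same-length distance.

    def recombine(segments, spectrum, threshold):
        # Merge adjacent segments whose spectra are closer than threshold
        # in the Kolmogorov-Smirnov metric (ks_distance defined above).
        out = [segments[0]]
        for start, length in segments[1:]:
            prev_start, prev_len = out[-1]
            n = min(prev_len, length)                  # equal-length pieces:
            f_prev = spectrum(prev_start + prev_len - n, n)  # end of left segment
            f_next = spectrum(start, n)                      # start of right segment
            if ks_distance(f_prev, f_next) < threshold:
                out[-1] = (prev_start, prev_len + length)    # recombine
            else:
                out.append((start, length))
        return out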

3 Analysis of the log-differenced DJIA and S&P 500 data

The log-differences of the daily records from the DJIA and the S&P 500 index were analyzed using time-frequency trees. We used the Kolmogorov-Smirnov distance measure to compare the spectra of adjacent segments in the tree. It is important to normalize the spectral estimates to a density. Initially, the algorithm was used with the Kolmogorov-Smirnov distance between non-normalized spectral distribution function estimates. However, it was found that this produced almost the same segmentation as a distance measure that calculates the change in the integrated spectrum (which can be regarded as a measure of the change in the variance), i.e.

    \text{distance}\left(\hat{f}_{\text{left}}, \hat{f}_{\text{right}}\right) = \left| \int_0^{1/2} \hat{f}_{\text{left}}(\lambda)\, d\lambda - \int_0^{1/2} \hat{f}_{\text{right}}(\lambda)\, d\lambda \right|    (5)
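A sketch of the metric of Equation 5; the trapezoidal rule is our choice, as the paper does not specify a quadrature.

    import numpy as np

    def variance_change(f_left, f_right):
        # Absolute difference of the integrated (non-normalized) spectra
        # over [0, 1/2]; essentially a change-in-variance measure.
        freqs = np.linspace(0.0, 0.5, len(f_left))
        return abs(np.trapz(f_left, freqs) - np.trapz(f_right, freqs))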

For example, when applied to the DJIA, the segmentation algorithm (with the maximum depth of the tree set at 10) resulted in 75 segments using the non-normalized Kolmogorov-Smirnov metric and 121 segments using the variance-change metric of Equation 5. When applied to the S&P 500 data, the segmentation algorithm (again with maximum depth 10) resulted in 76 segments using the non-normalized Kolmogorov-Smirnov metric and 101 segments using the variance-change metric of Equation 5. However, a comparison of the segmentation points in both data sets, as shown in Figure 2, shows that, qualitatively speaking, the segmentations are the same. In the case of the DJIA, the largest segment was of length 2048 and the smallest of length 32, irrespective of the metric used. For the S&P 500 data, the largest segment was of length 8192 and the smallest of length 32, irrespective of the metric used.

[Figure 2: Comparison of the non-normalized Kolmogorov-Smirnov and the variance-change metrics. Four panels of log-differenced series with segmentation points marked: segmentation using KS (non-normalized) and using the variance test, for the DJIA (top) and the S&P 500 (bottom).]

So, it seems that the Kolmogorov-Smirnov distance between non-normalized spectra is capturing essentially the change in the variance of the data. As explained in the introduction, we wish to focus on how the periodicities and other frequency-domain characteristics in stock prices change over time. To measure changes in spectral phenomena beyond just changes in variance, we used the Kolmogorov-Smirnov distance between normalized spectral distribution function estimates, as defined in Equation 3. A summary of the results for the two data sets is presented in Table 1 below.

Table 1: Results of segmentation of the log-differenced DJIA and S&P 500

                                        DJIA                   S&P 500
Depth of complete tree, D               8                      8
Number of segments from the             21                     14
best-pruned time-frequency tree
Number of segments                      9                      11
(after recombination)
Segmentation time points                33/01/25, 44/03/08,    45/06/20, 52/09/30,
(after recombination)                   45/01/16, 52/09/30,    62/12/04, 64/12/16,
                                        62/12/04, 64/12/16,    66/12/28, 69/02/18,
                                        71/03/02, 85/05/07     77/03/30, 87/05/18,
                                                               88/05/20, 88/11/21

From the algorithm outlined in the paper, the salient features of the resultant segmentation can be described as follows: (a) each segment is stationary within itself; that is, spectra of further subdivisions of the segment are quite similar to each other in the Kolmogorov-Smirnov metric; (b) adjacent segments are highly dissimilar in their spectral properties; that is, spectra of adjacent segments are markedly different from each other in the Kolmogorov-Smirnov metric.

The stationary segments of the DJIA data produced by our algorithm and the spectra of the resulting stationary segments are shown in Figure 3 and Figure 4 respectively.


[Figure 3: Segmentation of the log-differenced DJIA.]

[Figure 4: Smoothed spectra of the nine segments of the log-differenced DJIA, each panel showing the spectrum in dB against frequency (0 to 0.5).]

The stationary segments of the S&P 500 data produced by our algorithm and the spectra of the resulting stationary segments are shown in Figure 5 and Figure 6 respectively.

[Figure 5: Segmentation of the log-differenced S&P 500 data.]

[Figure 6: Smoothed spectra of the eleven segments of the log-differenced S&P 500 data, each panel showing the spectrum in dB against frequency (0 to 0.5).]

As is well known, the DJIA and the S&P 500 index measure different aspects of corporate behavior. The former is based on thirty blue-chip companies, while the latter is presumably more reflective of the general state of affairs. Given this, one should not expect similar time-frequency trees.

In fact, the two segmentations highlight distinct features. Relative changes in the S&P 500 were more pronounced before World War II. As explained earlier in this section, this difference in variability will not be picked up by normalized spectra. The fact that this region is not heavily segmented reflects the fact that, while the index showed variable volatility, the cyclical patterns stayed relatively constant.

The stock market crash of October 1987 caused major fluctuations in all stock indices. Three segmentations within an 18-month period were required to account for the crash and subsequent recovery. This was a period of anomalous activity, and the algorithm detected the resultant non-stationarity. This form of non-stationarity needs to be distinguished from that due to changes in the nature of periodic behavior. Outliers can change spectra by their presence, and the resulting corrupted spectrum (as in segment 9 of Figure 6) is not a good tool for identifying dominant periodicities. The S&P 500 was subjected to a change in definition in 1957. This, however, does not seem to have affected frequency characteristics.

The DJIA is more prone to outliers, presumably because it is more susceptible to changes in the constituent companies. Thus the algorithm did not treat the period of the October 1987 crash as an aberration. On the other hand, it did isolate a 10-month period in 1944-45. The spectrum for this third segment in Figure 4 shows small high-frequency components. This indicates that low-frequency trading was more the norm, pointing to a sluggish market. This is in marked contrast to the nature of periodic activity before 1944. Similar differences can be read off from the other spectra.

Is there evidence for a weekend effect? The data is daily, based on a five-day working week. Thus a weekend effect will manifest itself as spectral peaks in the neighborhood of frequency 0.2. Segment 9 for the DJIA (1985-93) and segment 2 for the S&P 500 (1945-52) show such a peak. However, there are many long segments where a weekend effect is not indicated. The reader can look for his or her favorite characteristics and observe how they change over time.
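As an illustration of such a check (a hypothetical helper, not part of the paper's procedure), one could measure the share of a segment's spectral mass near frequency 0.2 cycles per day, i.e. one cycle per five-day trading week:

    import numpy as np

    def weekend_effect_mass(f_hat, half_width=0.02):
        # Fraction of spectral mass within half_width of frequency 0.2,
        # where a weekly (five-trading-day) cycle would appear.
        freqs = np.linspace(0.0, 0.5, len(f_hat))
        band = np.abs(freqs - 0.2) <= half_width
        return f_hat[band].sum() / f_hat.sum()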

References

Adak, S. (1996). Time-Dependent Spectral Analysis of Nonstationary Time Series. Ph.D. thesis, Stanford University.

Breiman, L., J. H. Friedman, R. Olshen, and C. Stone (1984). Classification and Regression Trees. Wadsworth, CA.

Dimson, E. (Ed.) (1988). Stock Market Anomalies. Cambridge University Press.

French, K. (1980). Stock returns and the weekend effect. J. Financial Econ. 8, 55-69.

Granger, C. and O. Morgenstern (1970). Predictability of Stock Market Prices. Heath Lexington Books.

Keim, D. (1983). Size-related anomalies and stock return seasonality: Further empirical evidence. J. Financial Econ. 12, 13-32.

McCormick, S. (1992). Multilevel Projection Methods for Partial Differential Equations. SIAM.