
ISSN 0742-0463, Journal of Volcanology and Seismology, 2009, Vol. 3, No. 6, pp. 432–434. © Pleiades Publishing, Ltd., 2009. Original Russian Text © E.A. Boyarskii, and A.V. Deshcherevskii, 2009, published in Vulkanologiya i Seismologiya, 2009, No. 6, pp. 74–77.

On the Benefits of Normalization

E. A. Boyarskii and A. V. Deshcherevskii

Institute of Physics of the Earth, Russian Academy of Sciences, Moscow, 123995 Russia

Received September 29, 2008

Abstract: We discuss the methodology for searching for statistical relationships between catastrophic events, such as earthquakes and volcanic eruptions, on the one hand and astronomical or geographic data on the other. We point out that the data must be normalized before any statistical or probabilistic analysis. We give examples of studies whose authors arrive at demonstrably false inferences because a correct normalization is absent.

DOI: 10.1134/S0742046309060062

One of the “eternal” lines of research in geophysics is the search for statistical relationships between different phenomena. This is especially true of earthquakes and volcanic eruptions, where it is important to detect a statistically significant, even if weak, dependence of these events on known or measurable quantities. The pioneering study is due to Perrey [4], who noticed a relationship between the frequency of earthquakes and lunar phases as far back as 150 years ago.

Unfortunately, not all studies hold to a sufficiently rigorous methodology. Here we would like to discuss a single mistake that constantly recurs, namely, the absence of proper normalization of the data before examining how their distribution depends on some factor. Examples are the works of N.P. Bulatova [1] and V.M. Fedorov [2, 3]. Both authors examine nonnormalized data, thereby distorting the meaning of their work and jeopardizing the validity of the results.

Fedorov [3] studied how the frequency of volcanic eruptions depends on the ecliptic latitude of the Moon. He made tables and plots to conclude that eruptions occur more frequently when the Moon’s ecliptic latitude ϕ is close to its extreme values; this is claimed to point to a cause-and-effect relationship. We know that the Moon’s ecliptic latitude does not vary linearly, but follows a sinusoidal law with a slight amplitude modulation due to perturbations of the Moon’s orbit (Fig. 1). The Moon passes comparatively quickly through the region near zero ecliptic latitude and, in contrast, is “delayed” near the extreme values of ϕ. This is clearly seen in the histogram (Fig. 2), which is based on the actual values of ϕ observed in 2001–2008. For this reason, the probability of ϕ falling in an interval is not the same for intervals of equal width, but is described to a first approximation by the well-known cosine distribution. Generally speaking, any event that is more or less uniformly distributed over time will occur much more frequently at the extreme values of any variable that varies according to the harmonic law.

In order to avoid erroneous inferences one should convert the number of eruptions occurring at definite values of ϕ to the number of eruptions per unit time. To do this it is sufficient to divide the number of eruptions in each range of ϕ by the time during which ϕ remains within that range. This normalization should also have been made in [3], since the times the Moon remains in the ranges chosen by the author differ by factors of more than three. During a single draconic month ϕ remains in the range [4.56, 5.32] for approximately 2.5 days, but for only 0.7 days in a range of the same width, [0.00, 0.76].
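The dwell-time effect described above can be illustrated with a small numerical sketch. It is not the authors' calculation: it assumes an idealized, unmodulated sinusoid whose amplitude equals the outer edge of the latitude bins (5.32°), and the names `dwell_fraction`, `bin_index`, and `n_events` are hypothetical. Events spread evenly in time are binned by ϕ and then divided by the fraction of time spent in each bin.

```python
import math

AMPLITUDE = 5.32  # degrees; assumed equal to the outer edge of the latitude bins
N_BINS = 14       # bins of width 0.76 degrees
WIDTH = 2 * AMPLITUDE / N_BINS


def dwell_fraction(lo, hi, amplitude=AMPLITUDE):
    """Fraction of a period that amplitude*sin(t) spends between lo and hi."""
    lo = max(-amplitude, min(amplitude, lo))
    hi = max(-amplitude, min(amplitude, hi))
    return (math.asin(hi / amplitude) - math.asin(lo / amplitude)) / math.pi


def bin_index(phi):
    """Index of the latitude bin containing phi, clamped to the valid range."""
    return min(N_BINS - 1, max(0, int((phi + AMPLITUDE) // WIDTH)))


# Hypothetical events spread evenly over one period, i.e. unrelated to phi.
n_events = 200_000
counts = [0] * N_BINS
for k in range(n_events):
    t = (k + 0.5) / n_events
    counts[bin_index(AMPLITUDE * math.sin(2 * math.pi * t))] += 1

# Fraction of time the idealized sinusoid spends in each bin.
fractions = [
    dwell_fraction(-AMPLITUDE + i * WIDTH, -AMPLITUDE + (i + 1) * WIDTH)
    for i in range(N_BINS)
]

# Events per unit of dwell time, relative to the overall mean rate.
rates = [c / (f * n_events) for c, f in zip(counts, fractions)]

for i in range(N_BINS):
    lo = -AMPLITUDE + i * WIDTH
    print(f"[{lo:+.2f}, {lo + WIDTH:+.2f}]  raw={counts[i]:6d}  rate={rates[i]:.3f}")
```

Under these assumptions the raw counts pile up in the outermost bins although the events are unrelated to ϕ, while the normalized rates are flat. The real lunar latitude is amplitude-modulated, so the idealized fractions are only a first approximation to the dwell times quoted in the text.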

Fig. 1. Variation of lunar ecliptic latitude. Open circles mark the Moon’s positions at intervals of 24 hours. Stippled bands show the intervals from Table 1. Horizontal axis: date, January–June 2007; vertical axis: lunar ecliptic latitude, degrees.

Instead of normalizing the number of eruptions, the author could well have used a theoretical distribution of eruptions that incorporates the time the Moon remains in each of the ranges. This would reduce the probabilistic analysis to testing the null hypothesis that there is no relationship between latitude ϕ and the occurrence of eruptions. If the probability of the actually observed data under the null hypothesis is below some significance level (5 or 1%, say), the null hypothesis is rejected and one infers that there is a relationship between the phenomena under study. The long-established procedure for testing the goodness of fit between a theoretical and an actual distribution is to use the chi-square statistic.

The first two columns in Table 1 are from [3]. According to V.M. Fedorov, the null hypothesis implies equal numbers of eruptions in equal ranges of ϕ, i.e., a uniform distribution (column 3). This is exactly the mistake in his treatment. Unless the mistake is detected, the null hypothesis is certain to be rejected, as the probability of the actually observed distribution is very small compared to 0.01%. If, however, we take into account that the Moon remains for different times in equal ranges, then the null hypothesis implies the numbers of eruptions in column 4. In that case χ² = 8.5, which corresponds to a probability of about 20%; that is, the agreement between the observed data and the null hypothesis could be still better in only about 20% of all cases. The available evidence thus by no means supports the inference that there is a statistical relationship between the lunar ecliptic latitude and the probability of eruption. This is well illustrated by Fig. 3, where the actual data are in very good agreement with the theoretical distribution following from the null hypothesis.
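The two chi-square values can be recomputed from the numbers listed in Table 1. The following is a minimal sketch and not the authors' code: it assumes 13 degrees of freedom (14 ranges minus one, no fitted parameters), which the text does not state, and it uses a hypothetical helper `chi2_cdf` built on the standard series for the regularized lower incomplete gamma function so that only the standard library is needed.

```python
import math

# Observed eruptions per latitude range (Table 1, column 2).
observed = [170, 80, 61, 62, 50, 51, 55, 58, 47, 60, 60, 62, 76, 138]
# Expected counts when dwell times are taken into account (Table 1, column 4).
expected_true = [170.48, 92.40, 69.39, 59.88, 54.13, 51.19, 49.25,
                 48.73, 48.96, 50.37, 53.60, 60.29, 77.84, 143.48]
# Expected counts under the (incorrect) uniform null hypothesis (column 3).
expected_uniform = [sum(observed) / len(observed)] * len(observed)


def chi_square(obs, exp):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all ranges."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def chi2_cdf(x, dof):
    """P(X <= x) for a chi-square variable, via the incomplete-gamma series."""
    if x <= 0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    prefactor = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    return min(1.0, total * prefactor)


DOF = len(observed) - 1  # assumption: 14 ranges, no parameters fitted

chi2_uniform = chi_square(observed, expected_uniform)
chi2_true = chi_square(observed, expected_true)

print(f"uniform null: chi2 = {chi2_uniform:.1f}, "
      f"upper tail = {1.0 - chi2_cdf(chi2_uniform, DOF):.2e}")
print(f"dwell-time null: chi2 = {chi2_true:.1f}, "
      f"P(chi2 <= observed) = {chi2_cdf(chi2_true, DOF):.2f}")
```

The lower-tail probability is the quantity the text describes as the share of cases in which the agreement could be still better; the upper-tail value for the uniform hypothesis is only accurate to floating-point precision, which is enough to show that it is far below 0.01%.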

Fig. 2. Histogram of the Moon’s positions at different ecliptic latitudes for 8 years. Stippled columns show the same intervals as in Fig. 1. Horizontal axis: lunar ecliptic latitude ϕ, degrees; vertical axis: relative duration of the Moon remaining at latitude ϕ.

Table 1. Observed and expected numbers of eruptions

Lunar ecliptic     Number of    Expected number of eruptions (the null hypothesis)
latitude ϕ,        eruptions    uniform distribution    true distribution
from ... to ...
(1)                (2)          (3)                     (4)
–5.32 … –4.56       170          73.5714                 170.48
–4.56 … –3.80        80          73.5714                  92.40
–3.80 … –3.04        61          73.5714                  69.39
–3.04 … –2.28        62          73.5714                  59.88
–2.28 … –1.52        50          73.5714                  54.13
–1.52 … –0.76        51          73.5714                  51.19
–0.76 … 0.00         55          73.5714                  49.25
0.00 … 0.76          58          73.5714                  48.73
0.76 … 1.52          47          73.5714                  48.96
1.52 … 2.28          60          73.5714                  50.37
2.28 … 3.04          60          73.5714                  53.60
3.04 … 3.80          62          73.5714                  60.29
3.80 … 4.56          76          73.5714                  77.84
4.56 … 5.32         138          73.5714                 143.48
Total              1030        1030.00                  1030.00
χ² test                         226.3                      8.5

Fig. 3. Number of volcanic eruptions for different values of the lunar ecliptic latitude: (1) actual number of eruptions, (2) uniform distribution, (3) theoretical distribution. Horizontal axis: lunar ecliptic latitude; vertical axis: number of eruptions in ranges.

The real situation in [3] is also complicated by the fact that the author’s phrase “daily probability” does not conform to the accepted meaning of the term “probability” and distorts the sense of the resulting calculated quantity. The label “days” on the abscissa in Fig. 1 of [3] is misleading as well, since the index along that axis is merely the number of the interval concerned, not the number of days. A similar normalization is also missing in Fedorov’s treatment of the other three astronomical quantities (the Earth–Sun distance and the differences between the geocentric longitudes of the Sun and Venus and of the Sun and Mars). All these quantities vary approximately according to the harmonic law, hence the null hypothesis will yield different variants of the cosine distribution in all four plots [3, Fig. 1]. Consequently, all subsequent inferences drawn by Fedorov are physically meaningless, based as they are on an incorrect interpretation of this figure. The mistake discussed above is repeated several times in Fedorov’s monograph as well [2]. Both these studies by Fedorov mislead an inexperienced reader by drawing attention to spurious effects and thereby distracting from the really interesting issues in the possible influence of tidal waves on volcanic activity. Such an analysis can quite legitimately be carried out based on Fedorov’s data.

A similar mistake is made by N.P. Bulatova [1], who examined the earthquake rates at different latitudes on the Earth. Having divided the Earth’s surface into zones of 10° latitude, she “forgot” to divide the rates by the areas of the respective zones, although the area of the ten-degree zone near the pole is more than 11 times smaller than that of the equatorial zone (0°–10°). The author then proceeds to refine her earthquake distributions by taking latitude zones 1° wide. Here again, the area of the polar zone is smaller than that of the equatorial zone, this time by a factor of 114! Not in the least disturbed by this circumstance, the author calculates the statistical confidence and studies the effects of both external bodies (the Moon and Sun) and the internal structure of the Earth on seismicity.

Table 2. Numbers of earthquakes in different latitude zones based on the catalog [5]

Zone, deg     Number of      Relative area    Normalized number
              earthquakes    of zone, %       of earthquakes
–90 … –80          0           0.7596              0.00
–80 … –70          0           2.2558              0.00
–70 … –60          0           3.6834              0.00
–60 … –50          4           4.9990              5.51
–50 … –40         25           6.1628             27.91
–40 … –30        908           7.1394            875.09
–30 … –20        518           7.8990            451.21
–20 … –10        584           8.4186            477.31
–10 … 0          735           8.6824            582.47
0 … 10           317           8.6824            251.21
10 … 20         1426           8.4186           1165.48
20 … 30          954           7.8990            831.00
30 … 40         4971           7.1394           4790.80
40 … 50         4173           6.1628           4659.00
50 … 60         1293           4.9990           1779.66
60 … 70            6           3.6834             11.21
70 … 80            3           2.2558              9.15
80 … 90            0           0.7596              0.00
Total          15917         100.0000          15917.00

We have rescaled the normalized values (Table 2) proportionally so as to preserve their sum (15917) and thus to be entitled to use the χ² test. Naturally enough, there has been some rearrangement of the data in favor of
the higher latitudes. The author was lucky in that the seismicity of the polar latitudes is so low that the nonnormalized earthquake histograms did not lead her to preposterous inferences. All the same, this makes one skeptical about the correctness of the calculations and the validity of the inferences, whatever these are.

Sadly enough, publications such as the above discredit the very idea of searching for statistical relationships between extraterrestrial and terrestrial processes, yet the detection of such a relationship often serves as the basis for a physical model. Our planet is constantly under the powerful influence of the Moon, Sun, and other celestial bodies through gravitational and electromagnetic forces. There is ample evidence that this influence does manifest itself in very diverse natural processes. However, in order to draw reliable, statistically significant inferences one should handle experimental data and time series with the utmost care, since this field contains many pitfalls that are not readily recognizable.

REFERENCES

1. Bulatova, N.P., The Latitude Distribution of Terrestrial Seismicity in Relation to the Positions of the Sun and Moon, Vulkanol. Seismol., 2005, no. 2, pp. 57–78.
2. Fedorov, V.M., Gravitatsionnye factory i astronomicheskaya khronologiya geosfernykh protsessov (Gravitational Factors and the Astronomical Chronology of Geosphere Processes), Moscow: MGU, 2000.
3. Fedorov, V.M., The Chronologic Structure and the Probability of Volcanic Activity in Relation to the Tidal Strains in the Lithosphere, Vulkanol. Seismol., 2005, no. 1, pp. 44–50.
4. Perrey, A., Mémoire sur les rapports qui peuvent exister entre la fréquence des tremblements de Terre et l’age de la Lune, Compt. Rend. Acad. Sci., 1853, vol. XXXVI, no. 12, pp. 534–540.
5. U.S. Geological Survey National Earthquake Information Center, http://neic.usgs.gov.
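As a supplementary sketch, the zone-area normalization behind Table 2 can be reproduced as follows. This is an illustration under stated assumptions, not the authors' code: the Earth is treated as a sphere, so the share of the surface between two latitudes is half the difference of their sines, and the names `band_area_fraction` and `normalize_by_area` are hypothetical. The earthquake counts are those listed in Table 2.

```python
import math


def band_area_fraction(lat_lo_deg, lat_hi_deg):
    """Share of a sphere's surface lying between two latitudes (degrees)."""
    return (math.sin(math.radians(lat_hi_deg))
            - math.sin(math.radians(lat_lo_deg))) / 2.0


def normalize_by_area(counts, areas):
    """Divide counts by zone area, then rescale so the total is preserved."""
    densities = [c / a for c, a in zip(counts, areas)]
    scale = sum(counts) / sum(densities)
    return [d * scale for d in densities]


# Ten-degree zones from -90 to +90 degrees and the counts from Table 2.
zones = [(lat, lat + 10) for lat in range(-90, 90, 10)]
counts = [0, 0, 0, 4, 25, 908, 518, 584, 735,
          317, 1426, 954, 4971, 4173, 1293, 6, 3, 0]

areas = [band_area_fraction(lo, hi) for lo, hi in zones]
normalized = normalize_by_area(counts, areas)

for (lo, hi), c, a, n in zip(zones, counts, areas, normalized):
    print(f"{lo:+4d} ... {hi:+4d}  {c:6d}  {100 * a:7.4f} %  {n:9.2f}")

# Equatorial-to-polar area ratios for 10-degree and 1-degree zones.
ratio_10deg = band_area_fraction(0, 10) / band_area_fraction(80, 90)
ratio_1deg = band_area_fraction(0, 1) / band_area_fraction(89, 90)
print(f"area ratio, 10-degree zones: {ratio_10deg:.1f}")
print(f"area ratio, 1-degree zones: {ratio_1deg:.1f}")
```

Rescaling by a single common factor keeps the total at 15917, which is the condition the text gives for applying the χ² test to the normalized values.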
