SPACE WEATHER, VOL. 8, S06D09, doi:10.1029/2009SW000532, 2010
Survey and prediction of the ionospheric scintillation using data mining techniques
L. F. C. Rezende,1 E. R. de Paula,1 S. Stephany,1 I. J. Kantor,1 M. T. A. H. Muella,1 P. M. de Siqueira,1 and K. S. Correa1
Received 19 October 2009; revised 11 December 2009; accepted 9 January 2010; published 5 June 2010.
[1] Irregularly structured ionospheric regions may cause amplitude and phase fluctuations of radio signals. Such distortion is called ionospheric scintillation. These ionospheric irregularities occur as part of depleted plasma density regions that are generated at the magnetic equator after sunset by equatorial ionospheric plasma instability mechanisms. Also known as ionospheric bubbles, these regions drift upward to high altitudes at the equator and extend to low latitudes along the Earth's magnetic field lines. Ionospheric irregularities are relevant to space weather since they present large variations with the solar cycle and during solar flares and coronal mass ejections. In general, navigation systems such as the Global Positioning System and telecommunications systems are also affected by the scintillation. The aim of this work is to apply data mining to the prediction of ionospheric scintillation. Data mining can be divided into two categories: descriptive and predictive. The first describes a data set in a concise and summarized way, while the second, used in this work, analyzes the data to build a model and tries to predict the behavior of a new data set. In this study we employed data series of ionospheric scintillation and other parameters such as the level of solar activity, the vertical drift velocity of the plasma at the magnetic equator, and magnetic activity. The results show that prediction of the ionospheric scintillation occurrence during the analyzed period was possible despite the high variability of the ionospheric parameters that affect the generation of such irregularities.
Citation: Rezende, L. F. C., E. R. de Paula, S. Stephany, I. J. Kantor, M. T. A. H. Muella, P. M. de Siqueira, and K. S. Correa (2010), Survey and prediction of the ionospheric scintillation using data mining techniques, Space Weather, 8, S06D09, doi:10.1029/2009SW000532.
1 National Institute for Space Research, São José dos Campos, Brazil.
Copyright 2010 by the American Geophysical Union

1. Introduction
[2] Equatorial ionospheric scintillations are caused by ionospheric irregularities embedded in background plasma of large density. Scintillation occurs when a radio wave crosses an irregularity layer in the ionosphere and suffers distortions in phase and amplitude. Ionospheric scintillation may cause serious degradation in the performance of telecommunications systems and of applications based on Global Navigation Satellite Systems (GNSS), such as the Global Positioning System (GPS). For example, rapid signal phase variations and amplitude fades may cause loss of receiver lock (loss of synchronism between satellite and receiver) and, consequently, the number of GNSS satellites available for a good positioning geometry can be drastically reduced. Several studies have demonstrated that equatorial ionospheric scintillations affect the performance and reliability of GPS receivers [e.g., Bandyopadhayay et al., 1997; Kintner et al., 2001; Rezende et al., 2007]. Therefore, adequate knowledge of their spatial-temporal characteristics is essential to improve current positioning/navigation applications and ionospheric scintillation prediction models.
[3] Global and regional irregularity models have recently been developed to forecast the scintillation activity caused by ionospheric plasma density irregularities. For example, during the past two decades researchers at NorthWest Research Associates (NWRA), USA, have developed the WBMOD (Wideband Model) ionospheric scintillation model. This model consists of coupled ionospheric electron density irregularity and propagation models [Fremouw and Secan, 1984; Secan et al., 1995], which compute the amplitude and phase scintillation indices under prespecified geophysical conditions and as a function of location (latitude, longitude), date, and time.
[4] Another well-known model is the Global Ionospheric Scintillation Model (GISM). This model, developed by scientists from Informatique Electromagnétisme Electronique Analyse (IEEA), France, allows one to obtain both mean errors and radio wave scintillations. The GISM model computes intensity and phase scintillation indices by
treating the ionosphere as a multilayer turbulent medium, with each layer acting as a phase screen. The electron density inside the medium is provided by the NeQuick empirical ionospheric electron density model developed by the Universities of Graz and Trieste [Radicella and Leitinger, 2001]. The main inputs to the GISM model are the geophysical parameters, date, local time, and measured plasma inhomogeneity data [Béniguel, 2002].
[5] Costa and Basu [2002] developed a theoretical model for the calculation of phase and amplitude scintillations at equatorial regions. The algorithm assumes that the phase fluctuations of the wavefront emerging from the bottom of the irregularity layer are proportional to electron density fluctuations directly obtained from satellite in situ measurements. This model assumes that the same phase fluctuations can be obtained from their power spectral densities and phase spectra, represented by analytical functions with parameters provided by physics-based or morphological models of ionospheric irregularities.
[6] More recently, Anderson et al. [2004] described a technique for forecasting the occurrence frequency of ionospheric scintillation activity in the equatorial ionosphere on a night-to-night basis. Their methodology establishes a relationship between the magnitude of the vertical E × B drift velocity enhancement, just after sunset, and the magnitude of the S4 index subsequently observed by scintillation receivers about 1–2 h later.
[7] However, none of these models used a data mining approach. Data mining can be defined as the process of extracting hidden, previously unknown, and potentially useful high-level information from large databases [Witten and Frank, 2000]. Such information may be given by patterns, models, classifications, or clusterings that help to explain a given phenomenon. Patterns may appear in the data and are associated with specific behaviors. Models may help to predict such behaviors from previous data, similarly to a regression. Classification and clustering allow one to divide the data into known or unknown classes, respectively.
[8] Data mining can be classified into two categories: descriptive and predictive data mining. Descriptive data mining describes the data set in a concise and summarized manner and presents interesting general properties of the data. Predictive data mining analyzes the data in order to construct one model or a set of models, and attempts to predict the behavior of new data sets [Han and Kamber, 2001].
[9] In this work, data mining techniques are used for the first time to analyze and predict ionospheric scintillation over the Brazilian Equatorial Ionization Anomaly (EIA) region. The EIA is primarily a daytime feature that starts to develop during the morning hours and reaches its maximum development in the afternoon hours. However, during the evening hours a large enhancement of the vertical drift and the rapid uplift of the equatorial ionosphere may cause a resurgence of the anomaly. The EIA is characterized by two peaks in the F region ionospheric plasma density, one on each side at ±15°–20° of magnetic dip latitude, and a minimum ionization density at the magnetic
equator [Hanson and Moffett, 1966]. The EIA crests are the result of the E × B drift that moves the F region plasma upward at the dip equator ("fountain effect") and of its consequent downward diffusion along the magnetic field lines. The increased background electron density at the anomaly regions is known to cause significant delay on tracked satellite signals. Since the strength of the scintillations depends essentially on the integrated electron density deviation (ΔN), which is obtained from the product of the irregularity amplitude (ΔN/N) and the background plasma density (N), strong scintillations on radio wave signals can be expected to occur more frequently at off-equatorial latitudes in the region of the EIA crests.
[10] The severe effects of scintillations at the equatorial region are known to be the principal source of receiver failure and degradation for the global navigation satellite systems (GNSS), and thus are a concern for life-critical navigation. Therefore, adequate knowledge of the spatial-temporal distribution of ionospheric scintillations and of their maximum expected conditions during geomagnetic disturbances is important to mitigate positioning errors in GNSS-based applications. Hence, the prediction of ionospheric scintillation is crucial for the technical and scientific communities involved in Space Weather monitoring activities.
[11] This paper is organized as follows. In section 2 we describe the instrumentation, the database, and the methodology used for the prediction of ionospheric scintillation. Next, in section 3, we show the results for the temporal variation of the scintillation at the anomaly peak, the prediction of scintillation temporal profiles with antecedence of one or more hours, and an exceptional case in which no scintillation occurred. Finally, in section 4, we present the main remarks on the data mining technique (bagging-CART) implemented in this work.
2. Methodology
2.1. Input Parameters/Attributes
[12] The main parameters (attributes) used in this work were the ionospheric amplitude scintillation at the magnetic equator and under the southern crest of the equatorial ionization anomaly (EIA), the ionospheric vertical drift velocity over the equator, the geomagnetic index (Kp), and the 10.7 cm solar radio flux (F10.7). The data used in the present analysis were obtained during the solar maximum years 2000–2002 (see Table 1). The scintillation data considered here are those recorded from 2000 to 2400 (local time), and the E × B vertical drift velocity data were obtained during the hours of its prereversal peak, between 1700 and 1900 (local time).

Table 1. Data Days of Sample for Training and Test of Model
Year | Month | Days
2000 | January | 16, 18
2000 | March | 3, 5, 13, 14, 15, 16, 17, 21, 30
2001 | February | 2, 3, 4, 8, 9, 10, 11, 12, 17, 18, 22, 23
2001 | March | 1, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 21, 25, 26, 29
2001 | November | 2, 3, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 24, 25, 26, 27, 28
2001 | December | 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 19, 22, 23, 25, 26, 27, 28, 29

2.1.1. Ionospheric Scintillation
[13] Amplitude scintillation data were obtained from two ground-based GPS receivers (Figure 1a). One is located close to the magnetic equator at São Luís (2.3°S, 44.2°W and −1.5° magnetic inclination), at latitudes where the plasma bubble irregularities are generated, and the other is located at São José dos Campos (23.1°S, 45.8°W and −32.0° magnetic inclination) under the southern anomaly peak, at latitudes where the ionospheric scintillations are very intense and reach large amplitudes (Figure 1b). Figure 1a shows the locations of the GPS observation sites used in this work. GPS amplitude scintillations are always associated with the presence of large-scale plasma depletions or plasma bubble irregularities. However, the irregularity structures that cause fluctuations on GPS signals are those of the order of ∼360–400 m that are embedded in the bubbles. Once the plasma bubbles are generated, they rise to high altitudes over the equator and extend to off-equatorial latitudes as far as the region of the equatorial anomaly crests. It takes approximately 1 h for the plasma bubbles to reach the magnetic latitudes of the anomaly crests.
[14] The GPS scintillation monitors (receivers) used in this work are based on a GEC-Plessey development system, whose data are acquired at a sampling rate of 50 Hz and at the L1 frequency (1575.42 MHz) [Beach and Kintner, 2001]. The ionospheric amplitude scintillation computed by the receivers is represented through an index, the so-called S4
scintillation index. The S4 index is the most widely used parameter to measure amplitude scintillation and is defined as the normalized standard deviation of the received signal power intensity. It is calculated as follows:

$$S_4^2 = \frac{\langle I^2 \rangle - \langle I \rangle^2}{\langle I \rangle^2}, \tag{1}$$

where I is the signal power intensity and the brackets ⟨ ⟩ denote time average values.
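Equation (1) can be illustrated with a minimal sketch. The function names and the 60 s averaging block (3000 samples at the 50 Hz sampling rate) are assumptions made for illustration; the text does not state the averaging interval used by the receivers.

```python
from math import sqrt


def s4_index(intensity):
    """Return S4 for one block of signal power intensity samples, following equation (1)."""
    n = len(intensity)
    mean_i = sum(intensity) / n
    mean_i2 = sum(i * i for i in intensity) / n
    # Guard against a tiny negative value caused by floating-point round-off.
    variance = max(mean_i2 - mean_i ** 2, 0.0)
    return sqrt(variance) / mean_i


def s4_series(intensity, block=3000):
    """Return one S4 value per consecutive, non-overlapping block of samples.

    block=3000 corresponds to 60 s at 50 Hz (assumed block length).
    """
    return [
        s4_index(intensity[k:k + block])
        for k in range(0, len(intensity) - block + 1, block)
    ]
```

A constant signal gives S4 = 0, and the index grows with the relative spread of the received power.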
2.1.2. Ionospheric Vertical Drift Velocity
[15] Data on the plasma vertical drift velocity were obtained from a Digisonde Portable Sounder (DPS) installed at the equatorial station of São Luís (see Figure 1a). The parameter provided by the Digisonde and used in this work to infer the vertical drift velocity was the ionospheric base height h′F, which represents the virtual height of the bottom of the ionospheric F layer. The vertical drift velocity (VD) is determined by computing the time derivative of h′F data scaled from ionograms recorded at 15 min intervals [Bittencourt and Abdu, 1981; Reinisch, 1986; Bertoni, 1998]:

$$V_D = \frac{\Delta h'F}{\Delta t}. \tag{2}$$

[16] There is a dependence between scintillation and the vertical drift velocity of the plasma. In our observations we have seen, for example on 24 November 2001 (a day listed in Table 1), that the drift velocity was 11.12 m/s (slow) and the S4 index at the anomaly peak reached 0.26 (weak scintillation level). On the other hand, on 14 December 2001, the vertical drift velocity of the plasma reached 58.23 m/s (high) and the S4 index was 1.17 (strong scintillation level).

Figure 1. (a) A map showing the location of the GPS and digital ionosonde stations. (b) S4 isolines and colors of the scale bar.

Figure 2. Attributes used in this work.

2.1.3. Kp Index
[17] The 3-hourly magnetic Kp index is used in this work to classify days into magnetically quiet and disturbed conditions. A day may be considered magnetically quiet when the daily ΣKp < 24. The Kp data were downloaded from the World Data Center for Geomagnetism, Kyoto (http://wdc.kugi.kyoto-u.ac.jp/kp/index.html). The mean Kp value is calculated over the 24 h preceding the prereversal peak time, which occurs between 1700 and 1900 (local time) (B. Fejer, personal communication, 2008).
[18] The study of irregularity characteristics during magnetic storms (disturbed days) gives insight into the role of electric fields of magnetospheric origin [de Paula et al., 2004] in the irregularity process and is of interest because of the impact on global VHF/UHF communication systems [Basu et al., 2001]. An important parameter responsible for the growth of ionospheric plasma instabilities after sunset is the equatorial upward vertical plasma drift [Fejer et al., 1999], which is driven by the F layer dynamo zonal (eastward) electric field, known as the prereversal electric field.
[19] During magnetic storms, direct penetration of eastward magnetospheric electric fields, occurring during the postsunset hours, can intensify the prereversal electric field, thereby enhancing the irregularity process and triggering irregularities, even during epochs outside of the irregularity season in Brazil. In this case, ionospheric scintillation can be triggered.
[20] Disturbance dynamo westward zonal electric fields during some magnetic storms [Fejer and Scherliess, 1995; Scherliess and Fejer, 1997] can reach low latitudes with a delay of 9 to 30 h after the storm onset and reduce the equatorial plasma upward drift during the day and the downward drift during the night. In this way the prereversal vertical drift peak is inhibited or reduced in amplitude and, as a result, the ionospheric irregularity generation is weakened or inhibited. In this case, the ionospheric scintillation can also be weakened or inhibited [Muella et al., 2009].
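The two derived attributes of sections 2.1.2 and 2.1.3 can be sketched as follows: a finite-difference form of equation (2) applied to successive h′F values, and the quiet-day criterion based on the daily sum of Kp. The function names, the units (h′F in km, output in m/s), and the sample values in the usage note are illustrative assumptions, not the authors' implementation.

```python
def vertical_drift(hpf_km, dt_min=15.0):
    """Finite-difference form of equation (2).

    hpf_km: successive h'F values (km) scaled from ionograms taken dt_min apart.
    Returns the drift velocity between consecutive ionograms in m/s.
    """
    dt_s = dt_min * 60.0
    return [(h1 - h0) * 1000.0 / dt_s for h0, h1 in zip(hpf_km, hpf_km[1:])]


def is_quiet_day(kp_3h):
    """Quiet-day criterion of section 2.1.3: daily sum of the eight 3-hourly Kp values < 24."""
    if len(kp_3h) != 8:
        raise ValueError("expected eight 3-hourly Kp values for one day")
    return sum(kp_3h) < 24


def mean_kp(kp_3h):
    """Mean of the 3-hourly Kp values covering the 24 h before the prereversal peak."""
    return sum(kp_3h) / len(kp_3h)
```

For instance, a rise of h′F from 250 km to 260 km between two ionograms 15 min apart corresponds to about 11.1 m/s, and the prereversal peak would be the largest value of the drift series between 1700 and 1900 local time.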
2.1.4. Solar Flux
[21] It is well known that scintillation occurrence increases with increasing solar activity [de Paula et al., 2007]. Hence, intense scintillation activity is expected during solar maximum years. The solar activity can be measured using the 10.7 cm solar radio flux (F10.7 index) as a proxy. The F10.7 solar radio flux data can be obtained from the National Geophysical Data Center (http://www.ngdc.noaa.gov/stp/SOLAR/FLUX/flux.html). The F10.7 solar flux, with a resolution of 1 day, is a parametric input for the model.
2.2. Preprocessing of the Data
[22] The data described in section 2.1 correspond to parameters that are considered attributes for the data mining process. These data were imported into a MySQL relational database and can be retrieved using a database language (Structured Query Language) with a temporal resolution of 5 min. Figure 2 shows the attributes used as predictors and the attribute to be predicted (answer, in red), with 1–4 h of antecedence in relation to the other attributes. The time of the scintillation occurrence is introduced in the model. This is represented in Figure 2 as Hm_AP (which stands for hour_minute at the anomaly peak). Other attributes used in the model are also shown, such as the S4 index at the equator (S4 Eq), the S4 index at the anomaly peak (S4 AP), the solar flux (F10.7), the drift velocity (drift vel.), and the index of magnetic activity (Kp). The numbers in the bottom line of Figure 2 are examples of possible values that can be assumed by the input parameters. More details are given in section 2.4. The ionospheric scintillation prediction is a nonlinear problem, and therefore some common methods may not yield good results. Initially we tried Multiple Linear Regression (see Table 2) and also a Multilayer Perceptron Neural Network (MLP), commonly used for nonlinear problems. However, both failed to provide good
results, and we adopted an ensemble method (bagging), since such methods tend to be used for problems of this kind. Ensemble methods are discussed in section 2.3, and bagging specifically in section 2.3.1. In this work, bagging was successfully employed with a particular decision tree algorithm (CART), described in section 2.3.2.

Table 2. Training With Bagging: Comparison Between Multiple Regression and CART
Iteration | RMS Regression | RMS CART
1 | 0.67607 | 0.67045
2 | 0.61718 | 0.59551
3 | 0.53507 | 0.49535
4 | 0.45699 | 0.39397
5 | 0.39231 | 0.31067
6 | 0.34626 | 0.24251
7 | 0.3167 | 0.19361
25 | 0.26023 | 0.05622
200 | 0.25859 | 0.047041

[23] Due to the large variability of the S4 index, its data were smoothed using a moving average in order to obtain a continuous and smooth curve of the S4 index as a function of time. This large variability of the S4 index is caused by signal noise, the acquisition process, and even numerical noise. A 15-point window for the moving average seems to yield a curve that better describes the daily variation of the scintillation. A 30-point window smooths the curve excessively, and the associated peaks do not match the actual variation of the scintillation in time (Figure 3).

Figure 3. Data smoothing of the S4 index using (top) a 15-point or (bottom) a 30-point width moving window (data from São José dos Campos station).
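The smoothing step can be sketched as a simple moving average. The text gives only the window width (15 or 30 points); the centered window and the shortened window at the two ends of the series are assumptions made here for illustration.

```python
def moving_average(values, width=15):
    """Smooth a series with a centered moving average of the given width.

    Near the ends of the series the window is truncated to the available
    points (an assumption; the source does not describe edge handling).
    """
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A wider window (for example width=30) averages over more points and therefore flattens the peaks more strongly, which is the excessive smoothing described for Figure 3.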
2.3. Ensemble Methods [24] Ensemble methods combine several models for prediction or classification to get a single model or result. Ensemble methods usually work with unstable learning algorithms or weak classifiers. Unstable learning algorithms are susceptible to small changes in the training set that cause bigger changes in the results [Duda et al., 2001]. Weak classifiers are those that perform only slightly better
than a random choice. Ensemble methods are typically employed in the case of complex models with small data sets, as in the current work.
Figure 4. S4 index probability distribution function for some time intervals.

2.3.1. Bagging
[25] Bagging is an ensemble method that can be applied for prediction. Introduced by Breiman [1994], bagging stands for "bootstrap aggregating." It uses the bootstrap to randomly generate several samples from an original sample. In our case, we adopted a scheme from Witten and Frank [2000], in which multiple samples are generated and multiple models are trained separately. The answers of these models are averaged (in the case of a continuous answer) or defined by polling (discrete answer) for each instance of the original sample, and an aggregate model is then trained and applied to test data.
2.3.2. Decision Tree
[26] A decision tree is a supervised classification algorithm whose result is a set of chained rules with the if-then-else construct, leading to a hierarchical structure similar to a tree. Basically, the components of this structure are the nodes, points where the classification rules are applied to test the predictors (attributes of the instances), and the subsequent branches that result from those tests. Each terminal node (leaf) of the tree expresses a possible classification, i.e., the answer attribute corresponding to a given set of predictors. Decision trees have been successfully employed in machine learning and data mining since they are fast and effective and do not require setting any parameters.
[27] There are many decision tree algorithms. In this work, we chose the Classification and Regression Tree (CART) [Breiman et al., 1984]. This nonparametric decision tree is based on binary recursive partitioning. CART is considered a greedy method, since it grows the tree based on immediate decisions at the nodes and does not perform any optimization considering the entire tree. It can be used for classification or regression [Sutton, 2005], and is employed here to perform a nonlinear regression in order to predict the scintillation.
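The combination of bagging (section 2.3.1) with a regression tree (section 2.3.2) can be illustrated with a deliberately reduced sketch. It uses standard bagging, in which the predictions of the bootstrap models are averaged, and a depth-1 regression tree (a single CART-style binary split on one predictor) as the base learner. Both are simplifications chosen for brevity and are not the authors' configuration; all names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one binary split on a single predictor.

    The threshold is chosen greedily to minimise the summed squared error,
    as CART does at each node. Returns (threshold, left_mean, right_mean).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # equal predictor values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # all predictor values identical: predict the overall mean
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate a continuous answer by averaging the individual predictions."""
    return mean(predict_stump(m, x) for m in models)
```

With a toy sample in which low drift velocities are paired with low S4 values and high drift velocities with high S4 values, the averaged prediction for a slow drift is lower than for a fast drift. A full CART tree would repeat the same greedy split recursively on every attribute.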
2.4. Prediction With Antecedence of Hours
[28] Ionospheric irregularities are highly dynamic and have large day-to-day variability. They form after sunset over the magnetic equator and, depending on the ionospheric conditions, mainly on the vertical plasma drift velocity, they may or may not evolve. We have selected data from GPS satellites with elevation angles higher than 30°. This angle mask minimizes the effects of geometric factors in the S4 index calculation due to tropospheric scattering and multipath. We limited our measurements to S4 index data recorded between 1900 and 2400 (local time). The scintillation prediction considered here is based on 80 nights of observation during the last solar maximum years (2000–2002) and throughout the months of large occurrence of plasma bubbles (November–March). We used data obtained in January and March 2000 and in February, March, November, and December 2001. Table 1 shows the list of days used in the present analysis, when GPS and Digisonde data were available.
[29] This scheme performs predictions with one or more hours of antecedence. Besides employing scintillation data at the equator, the earlier scintillation observed at the anomaly peak is also used for the prediction. These observed values of the scintillation at the anomaly peak improve further predictions, since we assume a continuous and smooth curve for the scintillation at the equator and at the anomaly peak (AP). For example, in order to perform the prediction of the AP scintillation at 2000 (local time) with 1 h of antecedence, we employ the observed AP scintillation at 1900. The prediction with antecedence of hours was performed with a bagging method using the CART decision tree.
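The way training instances pair earlier observations with a later answer can be sketched as follows. The field names, the dictionary layout, and the exact alignment of the attributes are assumptions made for illustration; the source only specifies the attributes of Figure 2 and the 5 min resolution, so 1 h of antecedence corresponds here to 12 steps.

```python
def build_instances(times, s4_eq, s4_ap, lead_steps, daily):
    """Pair predictors observed at step i with the anomaly-peak S4 at step i + lead_steps.

    times:      local times (e.g. "1900", "1905", ...) at 5 min resolution
    s4_eq:      smoothed S4 at the equator, one value per time step
    s4_ap:      smoothed S4 at the anomaly peak, one value per time step
    lead_steps: antecedence in steps (12 steps = 1 h at 5 min resolution)
    daily:      attributes with one value per day (F10.7, drift velocity, Kp)
    Returns a list of (predictors, answer) tuples.
    """
    instances = []
    for i in range(len(times) - lead_steps):
        predictors = {
            "hm_ap": times[i + lead_steps],  # time for which the answer is predicted
            "s4_eq": s4_eq[i],
            "s4_ap": s4_ap[i],               # earlier observation at the anomaly peak
        }
        predictors.update(daily)
        instances.append((predictors, s4_ap[i + lead_steps]))
    return instances
```

With lead_steps=12, the instance whose answer is the AP scintillation at 2000 uses the AP scintillation observed at 1900, as in the example of paragraph [29].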
Figure 5. Prediction of the S4 daily temporal profile for different days: (a) 29 December 2001, (b) 14 December 2001, (c) 15 December 2001, (d) 12 November 2001, (e) 25 December 2001, and (f) 23 February 2001.

3. Results
3.1. Performance of the Training Using CART
[30] In order to evaluate the performance of CART in the training considering data for prediction with hours of antecedence, we compared its training root mean square error (RMS error) to that of a standard multiple regression algorithm. We employed bagging to generate a number of new samples from the original sample for both algorithms.
[31] Table 2 shows that both RMS errors decrease as the number of samples increases but do not improve meaningfully beyond 25 samples. For that reason we adopted 25 samples. Table 2 also shows that CART yielded the smaller RMS errors.
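The two figures of merit used in this section and in section 3.3, the RMS error and the correlation between observed and predicted values, can be written as a short sketch. The use of the Pearson coefficient is an assumption, since the text refers only to "correlation"; the function names are hypothetical.

```python
from math import sqrt


def rms_error(observed, predicted):
    """Root mean square error between observed and predicted S4 values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

A perfect prediction gives an RMS error of 0 and a correlation of 1; the RMS error is expressed in the same units as the S4 index itself.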
Figure 6. A 26 day sequence of daily predictions versus observed values.

Table 3. Prediction of Scintillation With Antecedence of Hours: RMS Error
Date (a) | 1 h of Antecedence | 2 h of Antecedence | 3 h of Antecedence | 4 h of Antecedence
25-12-2001 | 0.12748 | 0.1459 | 0.14963 | 0.17746
27-12-2001 | 0.05432 | 0.10158 | 0.1386 | 0.1675
28-12-2001 | 0.07830 | 0.14492 | 0.16335 | 0.17968
29-12-2001 | 0.053608 | 0.13657 | 0.16317 | 0.17857
(a) Date format is day-month-year.

3.2. Analysis of the Temporal Variation of the Scintillation at the Anomaly Peak
[32] In order to check whether the scintillation data match their expected behavior, we calculated the probability distribution function (pdf) of the observed S4 index at the anomaly peak, using all data of the sample. The data were divided into four temporal ranges: blue, from 2000 to 2100; green, from 2101 to 2200; magenta, from 2201 to 2300; and red, from 2301 to 2400. The resulting pdfs are shown in Figure 4. It can be observed that the highest values of S4 (above 0.6) have higher probability in the intervals 2101–2200 (green) and 2201–2300 (magenta). This observation matches the expected scintillation behavior at the anomaly peak, since scintillation usually starts to increase at 2030, decreases after 2300, and comes to an end at about 2400.
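A reduced version of this check can be sketched by grouping the S4 values into the four local-time ranges and computing, for each range, the fraction of values above 0.6. This is a simpler statistic than the full pdf of Figure 4, and the function names and the integer HHMM time encoding are assumptions for illustration.

```python
from collections import defaultdict

# The four local-time ranges of section 3.2, encoded as integer HHMM bounds.
RANGES = [(2000, 2100), (2101, 2200), (2201, 2300), (2301, 2400)]


def time_range(hhmm):
    """Return the (start, end) range containing the local time, or None if outside."""
    for lo, hi in RANGES:
        if lo <= hhmm <= hi:
            return (lo, hi)
    return None


def fraction_above(samples, threshold=0.6):
    """samples: iterable of (hhmm, s4) pairs.

    Returns {range: fraction of S4 values above threshold} for every range
    that contains at least one sample.
    """
    total = defaultdict(int)
    above = defaultdict(int)
    for hhmm, s4 in samples:
        key = time_range(hhmm)
        if key is None:
            continue
        total[key] += 1
        if s4 > threshold:
            above[key] += 1
    return {key: above[key] / total[key] for key in total}
```

If the data follow the behavior described above, the ranges 2101–2200 and 2201–2300 should show the largest fractions.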
3.3. Prediction of Scintillation Temporal Profiles With Antecedence of Hours [33] The proposed methodology, based on the bagging‐ CART scheme, was employed for the prediction of the temporal profile of the S4 index at the anomaly peak (AP) with antecedence of one or more hours (of the same night). 3.3.1. Prediction of Scintillation Profile With Antecedence of 1 h [34] In general, the predictions of the scintillation daily profile with antecedence of 1 h were good. For instance, we got a very good prediction for 29 December 2001 (Figure 5a) since the correlation between predicted and observed data was 0.986 and the RMS error was 0.053608.
[35] The temporal profile of the S4 scintillation index at AP (during the night) may present different patterns, according to the behavior of scintillation. The variability of the daily temporal profile of the scintillation from one day to another can be high. Different profiles can be observed in Figures 5a–5f. In particular, the last profile, from 23 February 2001, shows no occurrence of scintillation and was correctly predicted. That was the sole day of the data with no scintillation, despite being in the solar maximum period and in the scintillation seasonal period (November–March) of the Brazilian longitudinal sector. In addition, the predictors were favorable to the occurrence of scintillation: high solar flux (F10.7 = 142) and quiet magnetic conditions (Kp = 1.782). Only the velocity of the plasma vertical drift was not very high (34.44 m/s) for this uncommon day.
[36] To illustrate the robustness of the proposed methodology, we plotted a sequence of 26 daily predicted temporal profiles of scintillation. The prediction of each daily scintillation profile was performed separately and covers the time interval 2000–2400 (local time). Figure 6 shows the sequence of 26 daily predictions compared to the actual observed values of scintillation at AP (the horizontal axis corresponds to the number of instances). The RMS error calculated for all instances of these 26 days was 0.10765, and the correlation between observed and predicted values was 0.922. A depression can be noted after instance 200, corresponding to a date with absence of scintillation (23 February 2001).

Figure 7. Prediction with antecedence of more hours on 27 December 2001.

Table 4. Discretization of the S4 Index and Definition of Nominal Classes
Site | Values | Nominal
Equator Scintillation (a) | S4 ≤ 0.1 | no
Equator Scintillation (a) | 0.1 < S4 ≤ 0.15 | weak
Equator Scintillation (a) | 0.15 < S4 ≤ 0.3 | moderate
Equator Scintillation (a) | S4 > 0.3 | strong
Anomaly Peak Scintillation (b) | S4 ≤ 0.1 | no
Anomaly Peak Scintillation (b) | 0.1 < S4 ≤ 0.3 | weak
Anomaly Peak Scintillation (b) | 0.3 < S4 ≤ 0.5 | moderate
Anomaly Peak Scintillation (b) | S4 > 0.5 | strong
(a) S4 index between 1700 and 2000 LT (time of prereversal peak plasma bubble generation).
(b) S4 index between 2000 and 2400 LT (time that ionospheric scintillation occurs at the anomaly peak).
3.3.2. Prediction of Scintillation With Antecedence of More Hours
[37] The same methodology (bagging-CART) allows the prediction of scintillation with antecedence of a few hours (up to 4 h). As expected, the RMS error was higher than for the 1 h antecedence prediction, and the error increases with the antecedence. Table 3 shows the RMS error that corresponds to predictions with different hours of antecedence for 4 days of December 2001. A prediction example with 2, 3, and 4 h of antecedence is also shown in Figure 7 for 27 December 2001. We can see that for larger antecedence the RMS error increases and the correlation between the measured data and the prediction decreases.
3.4. Prediction of Scintillation With 1 Day of Antecedence
[38] In order to predict the scintillation with 1 day of antecedence, a single value of S4 is considered for each day, corresponding to the maximum observed value of S4 at the equator or at the anomaly peak. This value of the S4 index is discretized and assigned a nominal class according to Table 4. In the same way, the result of the prediction of the scintillation for a given day is a class (absence, weak, moderate, or strong scintillation).
[39] In the prediction with antecedence of 1 day, we employed a sample of 134 days obtained by the inclusion of scintillation data corresponding to the months of November and December 1999. This kind of prediction requires the scintillation data to be arranged as a temporal series, since, for example, the prediction for one specific day requires data from the previous day. This prediction was performed with the free software Weka, version 3.6.0 (www.cs.waikato.ac.nz/ml/weka/), which requires nominal classes. The 134 day sample was divided in a ratio of 66% for training and 34% for tests. We selected the bagging algorithm with the J48 decision tree, also known as C4.5 [Quinlan, 1993]. The algorithm classified correctly 44 instances (95.65%) out of 46 test instances (RMS of 0.1576). The algorithm misclassified the only 2 instances of moderate scintillation as being of strong scintillation. Another trial was attempted using the cross validation scheme that divides the original sample into N subsamples and randomly chooses one subsample for testing and the remaining (N-1) subsamples for training. This procedure is repeated alternating the choice of the test subsample. As a result, the bagging-J48 classified correctly 119 strong-scintillation instances (88.81%) out of the 134 instances (RMS of 0.2363). The algorithm misclassified 3 no-scintillation, 4 weak-scintillation, and 8 moderate-scintillation instances as being strong. Therefore, the failure to classify nonstrong scintillation instances shows that our strategy for prediction with 1 day of antecedence must be revised.
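The discretization of Table 4 and the percentages quoted above can be reproduced with a short sketch. The thresholds are those of Table 4; the function names and the site keys are hypothetical, and the sketch does not reproduce the Weka workflow itself.

```python
# Upper bounds of the "no", "weak" and "moderate" classes, taken from Table 4.
THRESHOLDS = {
    "equator": (0.1, 0.15, 0.3),
    "anomaly_peak": (0.1, 0.3, 0.5),
}
LABELS = ("no", "weak", "moderate", "strong")


def nominal_class(s4_max, site):
    """Map the daily maximum S4 at a site to its nominal class of Table 4."""
    for bound, label in zip(THRESHOLDS[site], LABELS):
        if s4_max <= bound:
            return label
    return LABELS[-1]


def percent_correct(correct, total):
    """Percentage of correctly classified instances, rounded to two decimals."""
    return round(100.0 * correct / total, 2)
```

The two days discussed in section 2.1.2 fall into the expected classes at the anomaly peak: S4 = 0.26 is weak and S4 = 1.17 is strong. The quoted scores follow from 44/46 = 95.65% and 119/134 = 88.81%.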
4. Remarks
[40] Ionospheric scintillation studies and scintillation prediction are important since scintillation affects the Global Navigation Satellite Systems (GNSS) and telecommunication systems. This work shows many test cases for which the ionospheric scintillation prediction (temporal profile of the S4 index) with antecedence of hours was obtained with good accuracy. The implemented method (bagging-CART) was able to predict the scintillation with one or more hours of antecedence, following the dynamics of the ionospheric irregularity process. It was also able to predict an exceptional case when no scintillation occurred (23 February 2001). It was also noted that bagging coupled with CART generated better predictions than bagging with multiple regression. The prediction of discrete values of the S4 index with an antecedence of 1 day can also be performed. Although the bagging algorithm classified correctly 95.65% of the test instances, it was unable to predict the nonoccurrence of scintillation with 1 day of antecedence, even when the conditions were favorable. Provided that the input parameters used as attributes are available, and since the methodology presented in this work is based on the evolution of the plasma bubble from the magnetic equator to low latitudes, it could be applied to any longitude sector. However, the validation of this technique for other longitudes was not part of the scope of the present study and could be further investigated. We believe that other important parameters may affect the occurrence of ionospheric scintillation [Abdu, 2001; Muella et al., 2010] and, hence, could be included in the training sample in order to improve the predictions.
[41] Acknowledgments. The authors acknowledge M. A. Abdu and Inez Staciarini Batista for providing the DGS256 digital portable sounder data and Maria Goreti Santos Aquino for data treatment.
References
Abdu, M. A. (2001), Outstanding problems in the equatorial ionosphere‐thermosphere electrodynamics relevant to spread F, J. Atmos. Sol. Terr. Phys., 63, 869–884, doi:10.1016/S1364-6826(00)00201-7.
Anderson, D., B. Reinisch, C. Valladares, J. Chau, and O. Veliz (2004), Forecast the occurrence of ionospheric scintillation activity in the equatorial ionosphere on a day‐to‐day basis, J. Atmos. Sol. Terr. Phys., 66, 1567–1572, doi:10.1016/j.jastp.2004.07.010.
Bandyopadhayay, T., A. Guha, A. DasGupta, P. Banerjee, and A. Bose (1997), Degradation of navigational accuracy with Global Positioning System during periods of scintillation at equatorial latitudes, Electron. Lett., 33(12), 1010–1011, doi:10.1049/el:19970692.
Basu, S., et al. (2001), Ionospheric effects of major magnetic storms during the international space weather period of September and October 1999: GPS observations, VHF/UHF scintillations, and in situ density structures at middle and equatorial latitudes, J. Geophys. Res., 106, 30,389–30,413, doi:10.1029/2001JA001116.
Beach, T. L., and P. M. Kintner (2001), Development and use of a GPS ionospheric scintillation monitor, IEEE Trans. Geosci. Remote Sens., 39, 918–928, doi:10.1109/36.921409.
Béniguel, Y. (2002), Global Ionospheric Propagation Model (GIM): A propagation model for scintillations of transmitted signals, Radio Sci., 37(3), 1032, doi:10.1029/2000RS002393.
Bertoni, F. C. P. (1998), Ionospheric drift studies by digital ionosondes, M.S. dissertation, Inst. Nac. de Pesqui. Espaciais, São Paulo, Brazil.
Bittencourt, J. A., and M. A. Abdu (1981), A theoretical comparison between apparent and real vertical ionization drift velocities in the equatorial F region, J. Geophys. Res., 86, 2451–2454, doi:10.1029/JA086iA04p02451.
Breiman, L. (1994), Bagging predictors, Tech. Rep. 421, Dep. of Stat., Univ. of Calif., Berkeley.
Breiman, L., et al. (1984), Classification and Regression Trees, 358 pp., Wadsworth Int. Group, Belmont, Calif.
Costa, E., and S. Basu (2002), A radio wave scattering algorithm and irregularity model for scintillation predictions, Radio Sci., 37(3), 1046, doi:10.1029/2001RS002498.
de Paula, E. R., K. N. Iyer, D. L. Hysell, F. S. Rodrigues, E. A. Kherani, A. C. Jardim, L. F. C. Rezende, S. G. Dutra, and N. B. Trivedi (2004), Multi‐technique investigations of storm‐time ionospheric irregularities over the São Luís equatorial station in Brazil, Ann. Geophys., 22, 3513–3522.
de Paula, E. R., et al. (2007), Characteristics of the ionospheric irregularities over Brazilian longitudinal sector, Indian J. Radio Space Phys., 36, 268–277.
Duda, R. O., P. Hart, and D. G. Stork (2001), Pattern Classification, 654 pp., John Wiley, New York.
Fejer, B. G., and L. Scherliess (1995), Time dependent response of equatorial electric fields to magnetospheric disturbances, Geophys. Res. Lett., 22, 851–854, doi:10.1029/95GL00390.
Fejer, B. G., L. Scherliess, and E. R. de Paula (1999), Effects of the vertical plasma drift velocity on the generation and evolution of equatorial spread F, J. Geophys. Res., 104, 19,859–19,870, doi:10.1029/1999JA900271.
Fremouw, E. J., and J. A. Secan (1984), Modeling and scientific application of scintillation results, Radio Sci., 19, 687–694, doi:10.1029/RS019i003p00687.
Han, J., and M. Kamber (2001), Data Mining Concepts and Techniques, 548 pp., Academic, San Francisco, Calif.
Hanson, W. B., and R. J. Moffett (1966), Ionization transport effects in the equatorial F region, J. Geophys. Res., 71, 5559–5572.
Kintner, P. M., H. Kil, T. L. Beach, and E. R. de Paula (2001), Fading timescales associated with GPS signals and potential consequences, Radio Sci., 36, 731–743, doi:10.1029/1999RS002310.
Muella, M. T. A. H., E. R. de Paula, I. J. Kantor, L. F. C. Rezende, and P. F. Smorigo (2009), Occurrence and zonal drifts of small‐scale ionospheric irregularities over an equatorial station during solar maximum—Magnetic quiet and disturbed conditions, Adv. Space Res., 43, 1957–1973, doi:10.1016/j.asr.2009.03.017.
Muella, M. T. A. H., E. A. Kherani, E. R. de Paula, A. P. Cerruti, P. M. Kintner, I. J. Kantor, C. N. Mitchell, I. S. Batista, and M. A. Abdu (2010), Scintillation‐producing Fresnel‐scale irregularities associated with the regions of steepest TEC gradients adjacent to the equatorial ionization anomaly, J. Geophys. Res., 115, A03301, doi:10.1029/2009JA014788.
Quinlan, J. R. (1993), C4.5: Programs for Machine Learning, doi:10.1007/BF00993309, Morgan Kaufmann, San Mateo, Calif.
Radicella, S. M., and R. Leitinger (2001), The evolution of the DGR approach to model electron density profiles, Adv. Space Res., 27, 35–40, doi:10.1016/S0273-1177(00)00138-1.
Reinisch, B. W. (1986), New techniques in ground‐based ionospheric sounding and studies, Radio Sci., 21, 331–341, doi:10.1029/RS021i003p00331.
Rezende, L. F. C., E. R. de Paula, I. J. Kantor, and P. M. Kintner (2007), Mapping and survey of plasma bubbles over Brazilian Territory, J. Navig., 60, 69–81, doi:10.1017/S0373463307004006.
Scherliess, L., and B. G. Fejer (1997), Storm time dependence of equatorial disturbance dynamo zonal electric fields, J. Geophys. Res., 102, 24,037–24,046, doi:10.1029/97JA02165.
Secan, J. A., R. M. Bussey, E. J. Fremouw, and S. Basu (1995), An improved model of equatorial scintillation, Radio Sci., 30, 607–617, doi:10.1029/94RS03172.
Sutton, C. D. (2005), Classification and regression trees, bagging, and boosting, in Handbook of Statistics, vol. 24, edited by C. R. Rao, E. J. Wegman, and J. L. Solka, pp. 303–329, doi:10.1016/S0169-7161(04)24011-1, Elsevier, Amsterdam.
Witten, I. H., and E. Frank (2000), Data Mining: Practical Machine Learning Tools and Techniques With Java Implementations, 371 pp., Morgan Kaufmann, San Francisco, Calif.
K. S. Correa, E. R. de Paula, P. M. de Siqueira, I. J. Kantor, M. T. A. H. Muella, L. F. C. Rezende, and S. Stephany, National Institute for Space Research, Avenida dos Astronautas, 1758 Jardim da Granja, 12227‐010, São José dos Campos, SP, Brazil.