Non-Linear Interpolation of Missing Points within the ...

NONLINEAR INTERPOLATION OF MISSING POINTS WITHIN THE HOURLY FOF2 TIME SERIES

N M Francis1, A G Brown2, P S Cannon1, D S Broomhead3

1 Radio Science and Propagation Group, 2 Signal and Information Processing Group, Defence Evaluation and Research Agency, Malvern, Worcestershire, WR14 3PS, UK. Email: [email protected]

3 Department of Mathematics, UMIST, P.O. Box 88, Manchester, M60 1QD, UK.

ABSTRACT

This paper proposes a novel technique for the prediction of solar-terrestrial data sets that contain a significant proportion of missing data points. Radial Basis Function (RBF) Neural Networks (NN) are adopted for this purpose and their advantages over their Multi-Layer Perceptron (MLP) counterparts are outlined.

1.0 INTRODUCTION

Non-contiguous data sets are a common problem in geophysical time series. Simple linear interpolation is undesirable because it can disrupt the delicate nonlinear dynamics within the data, thereby degrading the performance of the modelling process. A nonlinear interpolative scheme is presented that minimises the effect of interpolation upon any given modelling process.

An interpolated model has been implemented for the prediction of hourly foF2 values from 1971 to 1973 for Slough station, UK. The time series as a whole contains 6.6% missing data points. The first 23,000 points were used to train the model, while the remaining 3304 points were used to test the ability of the model to generalise to unseen data. The effectiveness of the model has been assessed relative to the performance of a non-interpolated model, as well as persistence and recurrence interpolated models, as shown in Figure 1.

2.0 ANALYSIS TECHNIQUE

An RBF NN has been employed to provide the core modelling capability for the interpolation process [Francis et al., 1998]. The most important advantage of this network over the more commonly used MLP architecture is the ability of the RBF to find a globally optimum solution to a time series prediction problem in a single-pass process. MLPs can only produce locally optimum solutions through an iterative process.
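The single-pass property can be illustrated with a minimal sketch. It assumes Gaussian basis functions with fixed centres and a fixed common width, neither of which is specified in this paper; the function names `rbf_design`, `fit_rbf` and `predict_rbf` are hypothetical. Under those assumptions the output weights enter the model linearly, so one least-squares solve gives the globally optimal weights for the chosen centres.

```python
import numpy as np

def rbf_design(X, centres, width):
    """Gaussian basis activations: one row per input vector, one column per centre.
    The Gaussian form and the shared width are assumptions made for illustration."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbf(X, y, centres, width):
    """Solve for the output weights in a single linear least-squares step."""
    Phi = rbf_design(X, centres, width)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict_rbf(X, centres, width, w):
    """Evaluate the fitted network on new input vectors."""
    return rbf_design(X, centres, width) @ w
```

With the centres and width held fixed, the sum of squared errors is a quadratic function of the weights, so no perturbation of the fitted weights can reduce the training error. An MLP has no equivalent closed-form step because its hidden-layer weights enter the error nonlinearly.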


Figure 2 details the basic, non-interpolated modelling process, which acts as a bootstrap for the subsequent interpolation of missing data points. Figure 3 details the interpolation process. Two iterations of the interpolation process were sufficient to achieve the optimal results. Simple nonlinear techniques, such as interpolating missing points using the prediction output from the RBF NN, were not found to give stable solutions.

Three simple reference interpolation schemes were adopted for comparison with the nonlinear gap-filling method. The first, the zero interpolation scheme, sets all missing points in the de-meaned and normalised time series to zero. The second uses persistence to fill in missing data points. The third uses the 24-hour recurrent structure of the hourly foF2 time series to replace missing points. In addition, the complete vector only RBF model is included. However, the complete vector test set is much smaller than the interpolated vector test set, due to incomplete vector rejection, so its results are not directly comparable with those of the other techniques.

3.0 RESULTS

Figure 1 shows the root mean square (RMS) error on the interpolated test vector set, plotted against the number of hours ahead of the prediction, for each of the prediction schemes. Using any of the proposed interpolation schemes, the vector rejection rate falls from 60% to 6.6%. Residual rejection occurs because training vectors are still rejected if there is a gap at the corresponding point in the required output time series. The output time series is identical to the input time series for a self-prediction problem.

The graph clearly shows that the nonlinear interpolation technique consistently produces the best results in terms of relative predictive accuracy. It shows a 2.3% improvement over the recurrence model, which characterises the dominant diurnal variation of foF2, for each of the one to thirty hour ahead predictions. In turn, the recurrence model shows a similar improvement relative to the persistence model. The zero interpolation model falls significantly behind these reference models, indicating the necessity of adopting an approach to deal with the problem of missing data points in geophysical time series.
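The three reference gap-filling schemes described in Section 2, and the RMS error used to compare them, can be sketched as follows. This is an illustrative reading, not the authors' code: gaps are assumed to be marked as NaN, persistence is assumed to mean carrying the most recent earlier value forward, recurrence is assumed to mean copying the value from 24 hours earlier, and all function names are hypothetical.

```python
import numpy as np

def fill_zero(x):
    """Zero interpolation: gaps in a de-meaned, normalised series are set to 0."""
    return np.where(np.isnan(x), 0.0, x)

def fill_persistence(x):
    """Persistence (assumed reading): each gap takes the most recent earlier value.
    A gap at the very start of the series has no earlier value and stays NaN."""
    y = x.copy()
    for i in range(1, len(y)):
        if np.isnan(y[i]):
            y[i] = y[i - 1]
    return y

def fill_recurrence(x, period=24):
    """Recurrence (assumed reading): each gap takes the value one period earlier.
    Gaps within the first period stay NaN."""
    y = x.copy()
    for i in range(period, len(y)):
        if np.isnan(y[i]):
            y[i] = y[i - period]
    return y

def rms_error(pred, obs):
    """RMS error over the points where both prediction and observation exist."""
    ok = ~np.isnan(pred) & ~np.isnan(obs)
    return float(np.sqrt(np.mean((pred[ok] - obs[ok]) ** 2)))
```

Because each filled value is written back before the next point is visited, a run of consecutive gaps is filled from the last observed value (persistence) or from the same hour on the last available day (recurrence).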

Currently, the interpolated model parameters are the optimal parameters for the complete vector only model, due to processing constraints. By re-optimising the model parameters at every step, it is likely that further significant gains in predictive accuracy can be made.

It is interesting to note that the interpolated model produced lower prediction errors than the complete vector only model when tested on the complete vector only test set instead of the interpolated vector test set. This shows the benefits that can be accrued, in terms of improved network training, through interpolation of the input time series. This point is further strengthened by the fact that, on the same test set, the zero interpolation model performed less accurately than the complete vector only model. It would appear that rejecting incomplete data vectors might be preferable to including them in a model, if incomplete vectors provide the network with poor training material.

Finally, in instances where simple climatological models cannot be used to provide sensible interpolations of missing points, this technique will find the best interpolation solution for that particular problem. In such a case, the resultant improvement may be greater than for the prediction of foF2, where the dominant diurnal variation of this ionospheric parameter is closely modelled by a simple recurrence model.

© British Crown Copyright 1998/DERA. Published with the permission of the Controller of Her Britannic Majesty’s Stationery Office.

4.0 CONCLUSIONS

In summary, this paper describes a novel and general scheme to deal with the problem of non-contiguity in geophysical time series. This scheme can be used to improve the viability and accuracy of the underlying non-interpolated predictive model, which is currently being used to provide real-time predictions. This baseline nonlinear scheme provides a significant improvement upon simple interpolation schemes that rely upon persistence or recurrence properties of the observed ionospheric parameter.

5.0 REFERENCES

Brown, A. G., N. M. Francis, D. S. Broomhead, P. S. Cannon, A. Akram, ‘Is there Evidence for the Existence of Nonlinear Behaviour within the Interplanetary Solar Sector Structure?’, Accepted by JGR, 1998.

Francis, N. M., A. G. Brown, P. S. Cannon, D. S. Broomhead, ‘Prediction of the Ionospheric Parameter, foF2, on hourly, daily and monthly time-scales’, Submitted to JGR, 1998.

[Plot: "Prediction Accuracy for 1973 foF2 hourly values". Vertical axis: RMS error (MHz), 0.5 to 1.0. Horizontal axis: Hours ahead. Curves: Complete vectors only, Nonlinear interpolation, Zero interpolation, Persistence interpolation, Recurrence interpolation.]

Figure 1. Relative Performance of Interpolation Techniques

(1) Slough foF2 time series de-meaned and reduced to unit variance.

(2) Data vectors constructed using optimal length sliding window.

(3) Data vectors that contain missing data points are removed.

(4) Data vectors projected on to principal axes using SVD.

(5) Remove SVD components that only characterise noise.

(6) Fixed number of functions required to optimise prediction accuracy.

(7) Predictive RBF model constructed from time series and functions.

(8) Models constructed to predict from 1 to 30 steps ahead.

(9) Results compared with persistence / recurrence model, using RMS error.

Figure 2. Flowchart Representing Non-interpolated Modelling Process
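The data preparation stages of Figure 2 (normalisation, sliding-window vector construction, rejection of incomplete vectors and SVD projection) can be sketched as below. This is a rough illustration under stated assumptions: gaps are marked as NaN, and the window length, the number of retained SVD components and all function names are hypothetical, since the paper does not give the optimal values.

```python
import numpy as np

def standardise(x):
    """De-mean and scale to unit variance, ignoring gaps (NaN)."""
    return (x - np.nanmean(x)) / np.nanstd(x)

def delay_vectors(x, window, horizon=1):
    """Sliding-window input vectors, each paired with the value
    `horizon` steps after the end of its window."""
    n = len(x) - window - horizon + 1
    X = np.stack([x[i:i + window] for i in range(n)])
    y = x[window + horizon - 1: window + horizon - 1 + n]
    return X, y

def drop_incomplete(X, y):
    """Reject any vector with a gap in its inputs or in its target."""
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[keep], y[keep]

def svd_project(X, n_keep):
    """Project vectors onto the n_keep leading principal axes from the SVD;
    the discarded axes are assumed to carry mostly noise."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_keep].T, s
```

A single missing hour removes every window that overlaps it, plus the vector that has it as a target, which is why a modest fraction of missing points can reject a much larger fraction of vectors when no gap filling is applied.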

(1) Create RBF with same parameters as basic uninterpolated model.

(2) Perform one step ahead prediction.

(3) Use iterative method to minimise RBF error due to interpolation of a gap.

(4) Create 2nd RBF model using interpolated time series as new input.

(5) Output time series retains gaps, to prevent feedback loop.

(6) Fill gaps in original input time series using second gap filling model.

(7) Create 3rd RBF model using second filled time series.

(8) Predict from 1 to 30 steps ahead.

(9) Results compared with reference interpolations, using RMS error.

Figure 3. Flowchart Representing Interpolation Modelling
