Polynomial Regression Coefficients Estimation in Finite Differences Space

Anatolii V. Omelchenko, Oleksii V. Fedorov
National University of Radio Electronics, Kharkov, Ukraine
Email: [email protected], [email protected]

Abstract—Estimators for polynomial regression coefficients in the case when the correlation function of the observation noise is not completely known are considered. It is shown that the sufficient statistics for estimating the i-th order regression coefficient from equidistantly spaced samples are the finite differences of the same order. Particular attention is paid to the influence of roundoff noise on the precision of polynomial regression coefficient computation.
I. INTRODUCTION
Many applied regression analysis problems require estimating polynomial regression coefficients so that they deviate minimally from the true values. Such problems appear, for example, in trajectory measurement systems, in gravimetry, and in some other applications. If the disturbance correlation function is known a priori, optimal estimates of the regression coefficients can be constructed. In the vast majority of cases, however, the disturbance characteristics are not completely known, which makes it difficult to apply the existing approaches. The aim of this work is 1) to construct efficient estimators of polynomial regression coefficients in cases when the correlation function of the observation noise is not completely known, and 2) to investigate the influence of roundoff noise on the precision of the resulting estimates.

II. PROBLEM STATEMENT AND ESTIMATORS
The problem statement supposes that, in accordance with the data

z[k] = \sum_{i=0}^{n} a_i k^i + \zeta[k], \quad k = 0, \dots, K-1,   (1)

where \zeta[k] is a sequence of correlated random variables, it is necessary to find unbiased linear estimators of the regression coefficients a_i, i = 0, \dots, n, having minimal variances. Concerning the characteristics of the observation noise \zeta[k], we assume that we are given a frequency range within which the major part of the noise power is concentrated and, possibly, the power spectral density of this noise.

As is known, when the disturbance correlation matrix is completely defined, the strict solution of the formulated problem can be obtained by the generalized least squares method [1], [2]. Note that implementation of this method, in addition to the a priori difficulties, requires significant computational resources. In the present work we develop another approach, based on weight functions specified in the space of finite differences of the observations.

It is known that linear estimators of polynomial regression parameters are always representable as a linear combination of the observations [3]:

\hat{a}_i = \sum_{k=0}^{K-1} w_{i,n}[k] \cdot z[k],   (2)

where w_{i,n}[k] are processing weights. Employing the properties of finite differences, one can show that estimator (2) is representable in the form [4], [5]

\hat{a}_i = \sum_{k=0}^{K-i-1} \Delta^i z[k] \cdot F_{i,n}[k+i],   (3)

where the functions F_{i,n}[k], which satisfy the condition \Delta^i F_{i,n}[k] = (-1)^i w_{i,n}[k] and appear in (3) as the weights of the finite differences of the observation sequence, are called weight functions of regression analysis. Analysis of expressions (2) and (3) allows us to conclude that the sufficient statistics for estimating the regression coefficient a_i from equidistantly spaced samples are the finite differences of the same order. The form of estimator (3) also allows us to represent the impulse response of the corresponding filter as the convolution of the impulse responses of a difference filter and a low-pass filter. This makes it possible to implement a multistage design of a linear filter that produces the estimate of the i-th coefficient of the polynomial.
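The sufficiency claim can be checked directly on noise-free data. The following short sketch (our own illustration, not code from the paper) uses NumPy's `diff` to show that the n-th finite difference of samples of a degree-n polynomial is the constant n!·a_n, so only the n-th differences carry information about the leading coefficient:

```python
import numpy as np

# Noise-free samples of a degree-2 polynomial z[k] = a0 + a1*k + a2*k^2.
K = 10
k = np.arange(K)
a0, a1, a2 = 2.0, -1.5, 0.5
z = a0 + a1 * k + a2 * k**2

# Second finite differences: lower-order terms vanish, leaving 2! * a2.
d2 = np.diff(z, n=2)          # length K - 2
print(d2)                     # every entry equals 2! * a2 = 1.0
```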
An important advantage of estimator (3) over (2) consists in the simpler restrictions on the linear combination coefficients. Instead of requiring the coefficients w_{i,n}[k] to be orthogonal to the functions k^r, r < i, estimator (3) only requires the weight function F_{i,n}[k] to equal zero at the points k = 0, \dots, i-1. This fact allows us to develop efficient methods for the synthesis of estimators of polynomial regression coefficients. For the particular case i = n (the estimator of the leading coefficient of the polynomial), these estimators take, respectively, the forms

\hat{a}_n = \sum_{k=0}^{K-1} w_{n,n}[k] \cdot z[k]   (4)

and

\hat{a}_n = \sum_{k=0}^{K-n-1} \Delta^n z[k] \cdot F_{n,n}[k+n],   (5)

where \Delta^n F_{n,n}[j] = (-1)^n w_{n,n}[j]. A useful property of estimator (5), which allows decreasing the roundoff errors, is the ease of compensating an a priori known part \breve{a}_n of the leading coefficient of the polynomial regression by the following procedure:

\hat{a}_n = \sum_{k=0}^{K-n-1} (\Delta^n z[k] - n!\,\breve{a}_n) \cdot F_{n,n}[k+n] + \breve{a}_n.   (6)

III. OPTIMIZATION OF THE WEIGHT FUNCTION F_{n,n}[k] OF REGRESSION ANALYSIS

Within the statement of the optimization problem we seek the weight function of regression analysis in the form

F_{n,n}[k] = P[k] \cdot (\gamma_0 + \gamma_1 \tilde{k}^2 + \dots + \gamma_p \tilde{k}^{2p}),   (7)

where \tilde{k} = k - (K-1+n)/2 and P[k] is a window function possessing the properties

P[k] > 0 \ \text{for} \ k = n, \dots, K-1; \qquad P[k] = 0 \ \text{otherwise},   (8)

together with symmetry with respect to the point (K-1+n)/2. As the window P[k] we can use the generating function of the n-th order Chebyshev polynomials:

P[k] = k^n (K-1+n-k)^n \ \text{for} \ k = n, \dots, K-1; \qquad P[k] = 0 \ \text{otherwise}.   (9)

The weight function F_{n,n}[k] is also required to satisfy the normalization condition

\sum_{k=n}^{K-1} F_{n,n}[k] = \frac{1}{n!},   (10)

which, subject to (7), means that

\sum_{r=0}^{p} \gamma_r \rho_r = \frac{1}{n!},   (11)

where the values \rho_r = \sum_{k=0}^{K-1} \tilde{k}^{2r} P[k] are the moments of the window function P[k]. It is necessary to find the coefficients \{\gamma_0, \dots, \gamma_p\} that minimize the criterion

D(\gamma_0, \dots, \gamma_p) = \int_0^{\pi} |F_{n,n}(\omega)|^2 S(\omega) \sin^{2n}\frac{\omega}{2} \, d\omega,   (12)

where

F_{n,n}(\omega) = \sum_{k=0}^{K-1} F_{n,n}[k] e^{-i\omega k}   (13)

is the frequency characteristic of the weight function. The nonnegative function S(\omega) is a penalty function that depends on the disturbance properties. The multiplier \sin^{2n}(\omega/2) in (12) takes into account the transformation of the disturbance spectral density when the disturbance samples are converted to their finite differences of order n [4].

For the subsequent optimization it is convenient to represent criterion (12) in the form

D(\gamma_0, \dots, \gamma_p) = \int_0^{\pi} \Big[ \sum_{r=0}^{p} \gamma_r V_r(\omega) \Big]^2 S(\omega) \sin^{2n}\frac{\omega}{2} \, d\omega,   (14)

where the function

V_r(\omega) = \sum_{k=0}^{K-1} P[k] \, \tilde{k}^{2r} \cos \omega\tilde{k}   (15)

and the sought-for coefficients \{\gamma_0, \dots, \gamma_p\} satisfy condition (11).

Applying simple transformations, one can show that the objective function of criterion (12) is representable in the form

D(\gamma_0, \dots, \gamma_p) = \sum_{r=0}^{p} \sum_{j=0}^{p} \gamma_r \gamma_j W_{r,j},   (16)

where the values

W_{r,j} = \int_0^{\pi} V_r(\omega) V_j(\omega) S(\omega) \sin^{2n}\frac{\omega}{2} \, d\omega.   (17)

Thus, the optimization problem becomes the minimization of the quadratic form (16) subject to constraint (11). This minimization can be performed numerically in any programming language; however, the most convenient way is to reduce the problem to solving a system of linear equations. From the normalization condition (11) we have

\gamma_0 = \frac{1}{\rho_0} \Big( \frac{1}{n!} - \sum_{r=1}^{p} \gamma_r \rho_r \Big).   (18)

Substituting (18) into (14), we get

D(\gamma_1, \dots, \gamma_p) = \frac{1}{\rho_0^2} \int_0^{\pi} \Big[ \frac{1}{n!} V_0(\omega) + \sum_{r=1}^{p} \gamma_r \big( \rho_0 V_r(\omega) - \rho_r V_0(\omega) \big) \Big]^2 S(\omega) \sin^{2n}\frac{\omega}{2} \, d\omega.   (19)

Equating to zero the partial derivatives of expression (19) with respect to the sought-for coefficients \gamma_1, \dots, \gamma_p results in a system of p linear equations

\sum_{r=1}^{p} \gamma_r \big\{ \rho_0^2 W_{r,j} - \rho_0 \rho_j W_{r,0} + \rho_r \rho_j W_{0,0} - \rho_r \rho_0 W_{j,0} \big\} = \frac{1}{n!} \big\{ \rho_j W_{0,0} - \rho_0 W_{j,0} \big\}, \quad j = 1, \dots, p,   (20)

where the values W_{r,j} are specified by (17).
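The derivation of this section can be turned into a short numerical procedure. The sketch below is our own illustration, not the authors' code; it assumes S(ω) is the indicator function of the band [ω_L, ω_H] used in the example that follows, and the frequency-grid density and Riemann-sum integration are arbitrary choices:

```python
import numpy as np
from math import factorial

K, n, p = 20, 2, 2
w_lo, w_hi = 1.0, 1.5                       # band where S(omega) = 1

k = np.arange(K)
kt = k - (K - 1 + n) / 2.0                  # k-tilde
P = np.where(k >= n, (k**n) * ((K - 1 + n - k)**n), 0).astype(float)   # (9)
rho = np.array([np.sum(kt**(2 * r) * P) for r in range(p + 1)])  # moments

# V_r(omega) from (15), sampled where S(omega) is nonzero.
omega = np.linspace(w_lo, w_hi, 2001)
V = [np.sum(P * kt**(2 * r) * np.cos(np.outer(omega, kt)), axis=1)
     for r in range(p + 1)]

# W_{r,j} from (17) by a simple Riemann sum over the band.
s2n = np.sin(omega / 2.0)**(2 * n)
dw = omega[1] - omega[0]
W = np.array([[np.sum(V[r] * V[j] * s2n) * dw for j in range(p + 1)]
              for r in range(p + 1)])

# System (20) for gamma_1..gamma_p, then gamma_0 from (18).
nf = factorial(n)
A = np.empty((p, p))
b = np.empty(p)
for j in range(1, p + 1):
    for r in range(1, p + 1):
        A[j - 1, r - 1] = (rho[0]**2 * W[r, j] - rho[0] * rho[j] * W[r, 0]
                           + rho[r] * rho[j] * W[0, 0]
                           - rho[r] * rho[0] * W[j, 0])
    b[j - 1] = (rho[j] * W[0, 0] - rho[0] * W[j, 0]) / nf
g = np.linalg.solve(A, b)
gamma = np.concatenate(([(1.0 / nf - g @ rho[1:]) / rho[0]], g))

# Weight function (7); its sum reproduces the normalization (10): 1/n! = 0.5.
F = P * sum(gamma[r] * kt**(2 * r) for r in range(p + 1))
print(F.sum())
```

By construction, γ_0 from (18) enforces constraint (11), so the printed sum equals 1/n! up to floating point error regardless of the integration accuracy.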
Since the quadratic form (16) is positive definite, the set of coefficients \gamma_1, \dots, \gamma_p found by solving system (20) is the minimizer of (12).

Let us illustrate the proposed approach by an example with K = 20, n = 2, p = 2. The disturbance spectral density is supposed to be uniformly distributed over the frequency band [\omega_L, \omega_H], where \omega_L = 1 and \omega_H = 1.5. As the window function P[k] we used function (9). The samples of the weight function F_{2,2}[k], constructed in accordance with the optimization procedure presented in this section, are shown in Fig. 1 by points. In the same figure, the dashed line shows the window function P[k] normalized to the value 2\rho_0. The magnitude response |F_{2,2}(\omega)| of regression analysis, computed in accordance with (13), is shown in Fig. 2 by the solid line. The dashed line represents the magnitude response obtained when the window function P[k] itself is used as the weight function:

|P_2(\omega)| = \frac{1}{2\rho_0} \Big| \sum_{k=0}^{K-1} P[k] e^{-i\omega k} \Big|.

Fig. 1. Weight functions for the case \omega_L = 1 and \omega_H = 1.5.

Fig. 2. The magnitude response of regression analysis.

In the considered example the benefit

\beta = \int_{\omega_L}^{\omega_H} |P_2(\omega)|^2 \sin^4\frac{\omega}{2} \, d\omega \Big/ \int_{\omega_L}^{\omega_H} |F_{2,2}(\omega)|^2 \sin^4\frac{\omega}{2} \, d\omega,

which characterizes the ratio of the variances of the estimates of the coefficient a_2 for the estimator with weight function (9) and for the estimator with the optimized weight function F_{2,2}[k], came to \beta = 83.7.

IV. ANALYSIS OF THE ROUNDOFF NOISES INFLUENCE ON THE PRECISION OF REGRESSION COEFFICIENTS ESTIMATION

Representing estimators (2)–(6) in a generalized form, we get

S = \sum_{k=0}^{K-1} b[k] \cdot x[k],   (21)

where S = \hat{a}_i is the final result of the calculations; x[k] = z[k] or x[k] = \Delta^i z[k] are the results of observations; and b[k] = w_{i,n}[k] or b[k] = F_{i,n}[k] are the coefficients of the weighted summation.

There are different ways to compute such sums: the naive algorithm; the algorithm with presorting; the multiply-accumulate operation; and the Kahan summation algorithm, also known as compensated summation [6], [7].

Fig. 3. The graph model of the roundoff noises: each product b[k] \cdot x[k] is scaled by the factor (1 + \delta_k) and each accumulated partial sum by (1 + \varepsilon_k).
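Of the summation variants listed above, compensated summation is the most involved; a minimal sketch (ours, not the paper's code) is:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: the running compensation c captures
    the low-order bits lost when each addend is folded into the total."""
    total = 0.0
    c = 0.0                      # running compensation
    for v in values:
        y = v - c                # apply the correction to the next addend
        t = total + y            # low-order digits of y may be lost here
        c = (t - total) - y      # recover exactly what was lost
        total = t
    return total

# Example: 10**6 copies of 0.1, which is not exactly representable in binary.
vals = [0.1] * 10**6
print(abs(kahan_sum(vals) - 10**5), abs(sum(vals) - 10**5))
# the compensated error is typically orders of magnitude smaller
```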
In the present paper we investigated the first variant as the most appropriate for data processing problems, especially in real-time mode. When estimating the overall influence of roundoff errors we used the following hypothesis about the standardized roundoff errors, which belong to the half-open interval (-1/2, 1/2] [7].

Hypothesis. All standardized roundoff errors of floating point calculations are pairwise independent random values with distributions that do not depend on the input data or on intermediate results. The mathematical expectation of a standardized roundoff error equals zero, and its variance does not exceed 1/12.

A graph that accounts for the roundoff noises arising when statistic (21) is calculated is given in Fig. 3. This graph is a special kind of linear system graph described in [8]. The following notation is used in the graph: \delta_k denotes the relative roundoff error of the product b[k] \cdot x[k]; \varepsilon_k denotes the relative error of the k-th summation result during the accumulation of sum (21). With the aid of this graph model of the roundoff noises we showed that the component of the variance of sum (21) caused by roundoff errors equals

\tilde{D} \approx \sum_{i=1}^{K-1} U^2[i] \cdot E[\varepsilon_i^2],   (22)
where \varepsilon_k is the relative rounding error of the k-th summation result while accumulating the sum in (21), and

U[i] = \sum_{k=0}^{i} y[k], \quad y[k] = b[k] \cdot x[k].   (23)
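Formulas (22)–(23) translate directly into code. In this sketch (our own), the second moment E[\varepsilon_k^2] is taken as a single constant supplied by the caller:

```python
from itertools import accumulate

def roundoff_variance(b, x, e_eps2):
    """Roundoff-induced variance component (22) of the sum (21).

    b, x   -- weights b[k] and data x[k];
    e_eps2 -- assumed second moment E[eps_k^2] of the relative rounding
              error of a summation step (a single constant for simplicity).
    """
    y = [bk * xk for bk, xk in zip(b, x)]       # y[k] = b[k] * x[k]
    U = list(accumulate(y))                     # partial sums U[i], eq. (23)
    return sum(u * u for u in U[1:]) * e_eps2   # eq. (22), i = 1..K-1

# The estimate grows with the magnitudes of the *partial* sums, not of the
# final result, so the order of accumulation matters:
print(roundoff_variance([1, 1, 1, 1], [1e3, -1e3, 1.0, 2.0], 1e-16))
print(roundoff_variance([1, 1, 1, 1], [1.0, 2.0, 1e3, -1e3], 1e-16))
```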
The influence of roundoff noises on the component \tilde{D} of the variance of the coefficient estimate \hat{a}_2 for the second-degree polynomial regression model has been estimated by simulation in CAS Mathematica 8.0. In Mathematica, calculations are performed with an arbitrary precision controlled by the value of the variable MachinePrecision, by default equal to |\log_{10} 2^{-53}| = 15.9546, which corresponds to IEEE 754 double precision [9]. We carried out our simulations with this value of the machine precision. Three estimators are considered, given by the following expressions:

\hat{a}_2 = \sum_{k=0}^{K-1} w_{2,2}[k] \cdot z[k],   (24)

\hat{a}_2 = \sum_{k=0}^{K-3} \Delta^2 z[k] \cdot F_{2,2}[k+2],   (25)

and

\hat{a}_2 = \sum_{k=0}^{K-3} (\Delta^2 z[k] - 2\breve{a}_2) \cdot F_{2,2}[k+2] + \breve{a}_2,   (26)

where \breve{a}_2 is an expected value of a_2. While performing the simulation we dealt with numbers in base 10, and the mantissas (the fractional parts) of the intermediate results in formulas (24)–(26) were rounded off with the relative precision 10^{-8}; this roundoff process approximately reproduces IEEE 754 single precision. When modeling the observation sequence in accordance with (1), we assumed the external disturbance to be zero, \zeta[k] = 0, and, in order to provide independence of the roundoff noises in individual experiments, the coefficient a_2 was simulated as a random variable uniformly distributed over the interval [0.99, 1.01]. The value of \breve{a}_2 in estimator (26) was taken equal to 0.99. In the modeling, M = 1000 data realizations of the form (1) were generated, each realization containing K = 100 observations.
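The reduced-precision decimal arithmetic described above can be mimicked in a sketch (ours, not the authors' Mathematica code) with Python's decimal contexts, rounding the running sum to 8 significant digits after every addition:

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

CTX = Context(prec=8, rounding=ROUND_HALF_EVEN)   # 8 significant digits

def rsum(values):
    """Accumulate a sum as in (21), rounding the running total to 8
    significant decimal digits after every addition."""
    total = Decimal(0)
    for v in values:
        total = CTX.add(total, Decimal(repr(v)))
    return total

vals = [1 / 3] * 10000
print(rsum(vals))    # drifts from the exact 3333.333... as the rounded
                     # partial sums accumulate error
```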
From the analysis of the modeling results we concluded the following:

1) The values of the roundoff noise variance \tilde{D} in the case of estimator (24) depend on the true values of the coefficients a_0 and a_1, while there is no such dependency for estimators (25) and (26).
2) Estimator (26) provides a significant advantage in the roundoff error variance \tilde{D} when compared to estimators (24) and (25).
3) Estimator (25) is preferable to estimator (24) when the regression function has a broad dynamic range.
4) Analytical expression (22) for the variance of the roundoff noise in the case of estimator (24) gives quite precise results, while for estimators (25) and (26) it provides underestimated values of the variance. This behavior can be explained by cross-correlation of the roundoff errors, which appears during sum accumulation.
5) The main contribution to the variance \tilde{D} comes from the accumulation disturbances that appear during sum calculation. If K ≫ 1, then the quantization and multiplication disturbances can be neglected (at the same roundoff precision).

The function U[i], given by formula (23), is an important auxiliary function that describes the roundoff noise. In order to minimize this noise, characterized by the variance \tilde{D}, we have to seek to minimize \sum_{i=1}^{K-1} U^2[i]. Analysis of the behavior of U^2[i] showed that for estimator (24) this function can vary nonmonotonically and, as a result, the sum \sum_{i=1}^{K-1} U^2[i] can take huge values. In turn, estimators (25) and (26) possess monotonically varying functions U^2[i] and hence smaller values of the sum. In addition, the function associated with estimator (26) differs from that of estimator (25) only in scale (it is considerably smaller), which leads to a still smaller value of \sum_{i=1}^{K-1} U^2[i]. The facts mentioned above support the conclusion that estimator (26) is preferable to (24) and (25), while (25) is better than (24).

V. CONCLUSIONS

Linear estimators for the parameters of polynomial regression can be represented as linear combinations of finite differences of the observations. This allows us to create efficient methods for synthesizing estimators of polynomial regression coefficients in the presence of correlated disturbances. It was shown that estimating polynomial regression coefficients in the space of finite differences allows us to decrease the influence of roundoff errors on the precision of regression coefficient computation.

REFERENCES
[1] G. A. F. Seber and A. J. Lee, Linear Regression Analysis, ser. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., 2003.
[2] W. Palma, Long-Memory Time Series: Theory and Methods, ser. Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley-Interscience, 2007.
[3] A. Omelchenko and A. Fedorov, "Weight functions of polynomial regression analysis," Bulletin of V. Karazin Kharkiv National University. Series "Mathematical Modelling. Information Technology. Automated Control Systems", vol. 10, no. 833, pp. 193–205, 2008 (in Russian).
[4] I. I. Sharapudinov, Polynomials Orthogonal on Grids: Theory and Applications. Makhachkala: Dagestan State Teachers' Training University, 1997 (in Russian).
[5] R. Haggarty, Discrete Mathematics for Computing, 2nd ed., ser. Pearson Education. Addison-Wesley, 2010.
[6] N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2002.
[7] V. V. Voevodin, Computational Foundations of Linear Algebra. Moscow: Nauka, 1977 (in Russian).
[8] L. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, ser. Prentice-Hall Signal Processing Series. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[9] A. S. Tanenbaum, Structured Computer Organization, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2006.