Multiple linear regression with correlated explanatory variables and responses

B. Li1*, M. Wang1 and Y. Yang2

1 College of Surveying and Geo-Informatics, Tongji University, Shanghai 200092, China
2 State Key Laboratory of Geo-information Engineering, Xi'an Research Institute of Surveying and Mapping, Xi'an 710054, China
*Corresponding author, email [email protected]

© 2015 Survey Review Ltd. Received 11 April 2013; accepted 1 August 2013. DOI 10.1179/1752270615Y.0000000006

Different from the traditional linear regression model, which captures only the errors of the dependent variables (responses), this contribution presents a new multiple linear regression model where, besides the errors of the responses, the errors of the explanatory variables and their correlations with the response errors are rigorously taken into account. The new regression model is typically a non-linear errors-in-variables (EIV) model, which is referred to as error-affected and correlated linear regression (ECLR) in this paper. Considering the fact that only part of the elements in the design matrix $A$ of the regression model are random, the authors express the error matrix $E_A$ of $A$ as a function of $E_X$, which consists of all non-zero random errors. The stochastic model can then be easily formulated without the effect of the non-random elements in $A$. An iterative solution is derived based on the Euler–Lagrange minimisation problem for ECLR. The authors further show that ECLR is very general and that some of the existing linear regression methods, the ordinary least squares (OLS), the total least squares (TLS) and the weighted total least squares (WTLS), are special cases of it. The experiments show that the ECLR method generally performs better than the OLS, TLS and WTLS methods in terms of the difference between the solution and the true values when the explanatory variables and responses are significantly correlated.

Keywords: Multiple linear regression, Cross-correlation, Errors-in-variables (EIV) model, Weighted total least squares (WTLS), Regression coefficients

Introduction
Two classes of models are frequently involved in geodetic data processing. One is parameter estimation with an exact physical model, for instance the distance from satellite to receiver, which is usually treated as a linear or non-linear parameter estimation problem; the other is to find the relationship (model) between the observed explanatory variables and their responses, which is well known as a regression model (Xu, 2013b). Among regressions, the simplest and most widely used model is the linear regression. Given $n$ explanatory variables $x_i$ ($i = 1, \ldots, n$) and their corresponding response $y$, the linear regression model is conceptually formulated as

$E(y\,|\,x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n \quad (1)$

where $\beta_i$ ($i = 0, \ldots, n$) are the regression parameters that need to be determined.

Since one of the purposes of regression is to predict the response when the explanatory variables are sampled, it is therefore very important to estimate the regression parameters precisely so as to establish a reasonable linear regression model.

Before continuing, the authors define some notation. $y = [y_1, \cdots, y_m]^T$ denotes the $m$ samples of responses, with $y_i$ its $i$th sample. $X = [x_1^T, \cdots, x_m^T]^T$ consists of $m$ samples of the $n$ explanatory variables, with the $j$th sample $x_j = [x_{j,1}, \cdots, x_{j,n}]$. $\varepsilon_y = [\varepsilon_{y_1}, \cdots, \varepsilon_{y_m}]^T$, $E_X = [\varepsilon_{x_1}^T, \cdots, \varepsilon_{x_m}^T]^T$ and $\varepsilon_{x_i} = [\varepsilon_{x_{i,1}}, \cdots, \varepsilon_{x_{i,n}}]^T$ are the random noises of $y$, $X$ and $x_i$, respectively. The regression parameter vector is $\beta = [\beta_0, \beta_1, \cdots, \beta_n]^T$. $I_m$ denotes the $m$-identity matrix and $e_m$ the $m$-column vector with all elements equal to 1. $E(\cdot)$ and $D(\cdot)$ denote the expectation and dispersion operators of a random variable, respectively. $\mathrm{vec}(\cdot)$ is the mathematical operator converting a matrix to a column vector by stacking each column underneath the previous one. $\otimes$ is the Kronecker product, with the properties $(AB) \otimes (CD) = (A \otimes C)(B \otimes D)$ and $\mathrm{vec}(ABC) = (C^T \otimes A)\,\mathrm{vec}(B)$. For more properties of these operators, one may refer to Koch (1999). Collecting $m$ observations, the linear regression model follows


$E(y\,|\,X) = [e_m, X]\beta \quad (2)$
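The vec and Kronecker identities defined above carry most of the derivations in this paper. As a quick sanity check, the following numpy sketch (an illustration added here, not code from the paper) verifies both properties on random matrices:

```python
import numpy as np

rng = np.random.default_rng(42)
A, B, C, D = (rng.standard_normal((3, 3)) for _ in range(4))

# mixed-product property: (AB) kron (CD) = (A kron C)(B kron D)
assert np.allclose(np.kron(A @ B, C @ D), np.kron(A, C) @ np.kron(B, D))

# vec identity: vec(ABC) = (C^T kron A) vec(B), with vec stacking columns
vec = lambda M: M.reshape(-1, order='F')
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
```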

A major task of regression analysis is to produce the estimates of the unknown model parameters based on the information contained in the given datasets and some assumptions about the stochastic characteristics of the data (Sykes, 1993). In this case, the target is to estimate the $n+1$ linear regression parameters $\beta$. For a long time, the linear regression model was established taking into account only the errors $\varepsilon_y$ of the responses and completely ignoring the errors of the explanatory variables. This yields the linear regression model

$y - \varepsilon_y = [e_m, X]\beta \quad (3)$

Under the common assumption that the random noises $\varepsilon_y$ are normally distributed (Greybill and Lyer, 1994), the regression model (3) is a Gauss–Markov model (Christensen, 2011), and the ordinary least squares (OLS) method can be applied to compute the best linear unbiased estimates of $\beta$ (Teunissen, 2000). However, in most real situations, not only the responses $y$ but also the explanatory variables $X$ come from real observations, so they are all inevitably error-affected. As a result, model (3) is no longer suitable and the regression model is revised as

$y - \varepsilon_y = [e_m, X - E_X]\beta \quad (4)$

to properly assimilate the errors of $X$ and then compute reasonable regression parameters. This is a typical errors-in-variables (EIV) model. In recent years, the so-called total least squares (TLS) and weighted TLS (WTLS) methods that generalise the OLS method were intensively studied to solve this EIV model (Schaffrin and Wieser, 2008; Neitzel, 2010; Shen et al., 2011; Li et al., 2012, 2013; Xu et al., 2012; Shi et al., 2015). Unfortunately, in these studies, the correlations between $E_X$ and $\varepsilon_y$ are still ignored. That would make sense in coordinate transformation, where $X$ and $y$ are usually measured from different coordinate data at different times, probably with different instruments. In regression, this is no longer the case: the multiple sets of explanatory variables are usually correlated with their responses to a certain extent, which is the very basis of regression. If this correlation information is ignored, one cannot compute the optimal regression parameters. Although nothing has been done to process this specific regression model, some algorithms have been developed by Fang (2011, 2013) and Snow (2012) to process the general EIV model with correlations between the observations $y$ and the design matrix $A = [e_m, X]$. In other words, the stochastic correlation matrix $Q_{Ay}$ is used to capture the correlations between $A$ and $y$. However, in our case, as in most EIV models, only part of the elements of $A$ (namely $X$) are random and the others ($e_m$) are constant. As a consequence, it is somewhat troublesome to derive the expression of $Q_{Ay}$ based on the random variables $X$ and $y$, and many of its rows and columns are zero, probably affecting the computational efficiency. In this paper, the authors capture this correlation directly by using the correlation matrix $Q_{xy}$ instead of $Q_{Ay}$ (Li et al., 2012, 2013), which is essentially equivalent to the formulation of the partial EIV model in Xu et al. (2012); see also Shi et al. (2015). Nevertheless, the correlation between $X$ and $y$ is not taken into account by Xu et al. (2012). Therefore, this paper can be understood as an extension within the frame of the partial EIV model proposed by Xu et al. (2012).
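For reference in the comparisons that follow, here is a minimal sketch (assuming numpy; not code from the paper) of the weighted OLS estimate under model (3), which reappears later as the reduced solution (26):

```python
import numpy as np

def ols(A: np.ndarray, y: np.ndarray, Qyy: np.ndarray) -> np.ndarray:
    """Weighted OLS: beta = (A^T Qyy^-1 A)^-1 A^T Qyy^-1 y (cf. eq. (26))."""
    AtW = A.T @ np.linalg.inv(Qyy)            # A^T Qyy^-1
    return np.linalg.solve(AtW @ A, AtW @ y)  # solve the normal equations
```

With $A = [e_m, X]$ this returns $[\hat\beta_0, \ldots, \hat\beta_n]^T$; the point of the paper is that this estimate ignores $E_X$ and its correlation with $\varepsilon_y$ entirely.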


To intuitively get some insight into how the real observations are error-affected and how they are approximately processed in the existing methods, the authors sample 11 data points on the given straight line $y = 2x + 5$. The error-affected data are then generated with the variances of the explanatory variable $x$ and response $y$ set to $\sigma_x^2 = 1$ and $\sigma_y^2 = 9$ and their correlation to $\rho = -0.75$. As visualised in Fig. 1, the grey points are the simulated error-affected points (playing the observation roles in the regression) with 2-$\sigma$ error confidence ellipses in solid grey. In principle, one should consider exactly this error property in order to compute precise regression parameters. In the WTLS method, however, the correlation between $x$ and $y$ is ignored, i.e. the assumption $\rho = 0$ is taken; the errors are then deemed to have the error property shown by the dashed red ellipses, whose axes are parallel to the two coordinate axes. In OLS linear regression, besides the correlations ignored in WTLS, the errors of $x$ are further ignored, so the errors of the observed points are deemed to be distributed in the error intervals (dotted blue lines) along the $y$-axis. This is obviously not a reasonable way to compute the regression parameters.

One of the most important goals of regression is to obtain a reasonable and accurate estimate of the regression parameters. In this paper, the authors investigate a novel linear regression model that not only takes the random errors of the explanatory variables $x$ and responses $y$ into account but also considers the correlations between them. The method developed to solve this model is referred to as error-affected and correlated linear regression, abbreviated as 'ECLR'.

Fig. 1 Real observation points sampled from the straight line $y = 2x + 5$ with stochastic characteristics $\sigma_x^2 = 1$, $\sigma_y^2 = 9$ and $\rho = -0.75$, and three error processing strategies. The solid grey ellipses stand for the exact error ellipses of the observed points, the dashed red ellipses for the error ellipses with the correlations between explanatory variables and responses ignored, and the dotted blue lines for the error intervals with only error-affected responses. These three error processing strategies correspond to the error-affected and correlated linear regression (ECLR), (W)TLS and OLS methods, respectively. The size of the ellipses is based on 2-$\sigma$ confidence
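How such correlated error-affected points arise can be sketched as follows (assuming numpy; the sampling code is ours, written to mirror the stated values $\sigma_x^2 = 1$, $\sigma_y^2 = 9$, $\rho = -0.75$):

```python
import numpy as np

rng = np.random.default_rng(0)
sx, sy, rho = 1.0, 3.0, -0.75                 # sigma_x, sigma_y = sqrt(9), rho
C = np.array([[sx**2, rho * sx * sy],
              [rho * sx * sy, sy**2]])        # 2x2 error covariance per point

x_true = np.linspace(-15, 15, 11)             # 11 abscissas on the line
y_true = 2.0 * x_true + 5.0                   # true line y = 2x + 5
err = rng.multivariate_normal([0.0, 0.0], C, size=x_true.size)
x_obs, y_obs = x_true + err[:, 0], y_true + err[:, 1]

# the 2-sigma error ellipse has semi-axes 2*sqrt(eigenvalues of C),
# oriented along the eigenvectors of C
evals, evecs = np.linalg.eigh(C)
print("2-sigma semi-axes:", 2.0 * np.sqrt(evals))
```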

The paper is organised as follows. In the Error-affected and correlated linear regression model section, the new multiple linear regression model is formulated taking into account the errors of both the explanatory variables and their responses as well as their stochastic correlations, followed by its iterative solution. In the Existing methods reduced from ECLR method section, the OLS, TLS and WTLS methods are derived from the ECLR method as special cases. In the Experimental analyses section, simulation experiments are carried out to demonstrate the performance of the ECLR method compared with the OLS, TLS and WTLS methods. Finally, some concluding remarks are summarised in the Concluding remarks section.

Error-affected and correlated linear regression model

Model formulation
As explained in the Introduction section, some methods have been proposed by Snow (2012) and Fang (2011, 2013) to process the general EIV model with correlated observations $y$ and design matrix $A$. The stochastic correlation matrix they used is $Q_{Ay}$, which includes many zero rows and columns owing to the constant term $e_m$ in the design matrix $A$. In this paper, the authors directly make use of the correlation matrix $Q_{xy}$ instead of $Q_{Ay}$, following the idea of Xu et al. (2012) and Li et al. (2012, 2013).

To properly capture the stochastic characteristics of both the explanatory variables and the responses, their errors must be incorporated in the linear regression model. The EIV model (4) is alternatively formulated as

$y - \varepsilon_y = (A - E_A)\beta \quad (5)$

where $A = [e_m, X]$ and $E_A = [0, E_X]$. To specify the general random characteristics of $X$ and $y$, the authors generalise the stochastic model as (Li et al., 2013; Snow, 2012; Fang, 2013)

$\begin{bmatrix} \varepsilon_y \\ \mathrm{vec}(E_A) \end{bmatrix} = \begin{bmatrix} \varepsilon_y \\ \varepsilon_A \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \; \sigma_0^2 \begin{bmatrix} Q_{yy} & Q_{yA} \\ Q_{Ay} & Q_{AA} \end{bmatrix} \right) \quad (6)$

where $\sigma_0^2$ is the variance scalar of unit weight. $Q_{yy}$ is a positive-definite symmetric cofactor matrix of dimension $m$, while $Q_{AA}$ is a non-negative-definite symmetric cofactor matrix of dimension $m(n+1)$ because of the zero rows and columns it contains. $Q_{yA} = Q_{Ay}^T$ are the matrices capturing the correlations between $X$ and $y$. It is emphasised that, unfortunately, the existing literature ignores either the correlations between $X$ and $y$ (i.e. $Q_{yA} = Q_{Ay}^T = 0$) or additionally the errors $\varepsilon_A$ completely (i.e. $Q_{AA} = 0$ as well). These two schemes of error processing correspond to the (W)TLS-based and OLS-based regression, respectively. Apparently, they are not reasonable and, if the correlations indeed exist, will lead to inaccurate estimates of the regression parameters.

Since some columns of $A$ are constant, $Q_{AA}$ contains zero rows and columns. Following Li et al. (2013), the authors reformulate the error matrix as $E_A = E_X H$ with $H = [0_{n \times 1}, I_n]$. Consequently, all elements of the error matrix $E_X$ are non-zero random errors. Such a formulation is essentially equivalent to the partial EIV model of Xu et al. (2012). Collecting the functional model and the stochastic model together yields the new multiple linear regression model

$y - \varepsilon_y = (A - E_X H)\beta, \qquad \begin{bmatrix} \varepsilon_y \\ \mathrm{vec}(E_X) \end{bmatrix} = \begin{bmatrix} \varepsilon_y \\ \varepsilon_x \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \; \sigma_0^2 \begin{bmatrix} Q_{yy} & Q_{yx} \\ Q_{xy} & Q_{xx} \end{bmatrix} \right) \quad (7)$

where $Q_{xx}$ is a positive-definite symmetric cofactor matrix since $\varepsilon_x$ consists of all random variables. The following two equations will be frequently applied in further derivations

$\varepsilon_A = \mathrm{vec}(E_A) = \mathrm{vec}(E_X H) = (H^T \otimes I_m)\varepsilon_x \quad (8)$

$E_A \beta = (\beta^T \otimes I_m)\varepsilon_A = (\beta^T H^T \otimes I_m)\varepsilon_x \quad (9)$
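A quick numerical check (a sketch assuming numpy, not from the paper) that the reformulation $E_A = E_X H$ and identity (8) behave as claimed:

```python
import numpy as np

m, n = 5, 3
rng = np.random.default_rng(1)
EX = rng.standard_normal((m, n))                 # all-random error matrix E_X
H = np.hstack([np.zeros((n, 1)), np.eye(n)])     # H = [0_{n x 1}, I_n]

EA = EX @ H                                      # E_A = [0, E_X], first column zero
assert np.allclose(EA[:, 0], 0) and np.allclose(EA[:, 1:], EX)

vec = lambda M: M.reshape(-1, order='F')         # column-stacking vec
# eq. (8): vec(E_X H) = (H^T kron I_m) vec(E_X)
assert np.allclose(vec(EA), np.kron(H.T, np.eye(m)) @ vec(EX))
```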

ECLR method to solve the new multiple linear regression model
Applying the generalised least squares criterion to solve model (7), the associated objective function reads

$\Omega = \begin{bmatrix} \varepsilon_y \\ \varepsilon_x \end{bmatrix}^T \begin{bmatrix} Q_{yy} & Q_{yx} \\ Q_{xy} & Q_{xx} \end{bmatrix}^{-1} \begin{bmatrix} \varepsilon_y \\ \varepsilon_x \end{bmatrix} = \varepsilon_y^T Q_{yy}^{-1} \varepsilon_y + \varepsilon_{x|y}^T Q_{x|y}^{-1} \varepsilon_{x|y} \quad (10)$

with $\varepsilon_{x|y} = \varepsilon_x - Q_{xy} Q_{yy}^{-1} \varepsilon_y$ and $Q_{x|y} = Q_{xx} - Q_{xy} Q_{yy}^{-1} Q_{yx}$.
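The split in (10) is the standard block (conditional) decomposition of the joint quadratic form via the Schur complement; here is a small numpy sketch (illustrative only) verifying it numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 4, 6                                  # k plays the role of m*n
M = rng.standard_normal((m + k, m + k))
Q = M @ M.T + (m + k) * np.eye(m + k)        # positive-definite joint cofactor
Qyy, Qyx = Q[:m, :m], Q[:m, m:]
Qxy, Qxx = Q[m:, :m], Q[m:, m:]

ey, ex = rng.standard_normal(m), rng.standard_normal(k)
T = Qxy @ np.linalg.inv(Qyy)
exy = ex - T @ ey                            # eps_{x|y}
Qx_y = Qxx - T @ Qyx                         # Q_{x|y}, the Schur complement

e = np.concatenate([ey, ex])
lhs = e @ np.linalg.solve(Q, e)              # eps^T Q^{-1} eps
rhs = ey @ np.linalg.solve(Qyy, ey) + exy @ np.linalg.solve(Qx_y, exy)
assert np.allclose(lhs, rhs)
```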

Following the Euler–Lagrange minimisation problem, the authors form the target function

$\Phi = \Omega + 2\lambda^T \left( y - \varepsilon_y - A\beta + (\beta^T H^T \otimes I_m)\varepsilon_x \right) \quad (11)$

where $\lambda$ is an $m$-vector of Lagrange multipliers. The solution of this target function is derived through the Euler–Lagrange necessary conditions as follows

$\frac{1}{2}\left.\frac{\partial \Phi}{\partial \varepsilon_y}\right|_{\hat\beta, \tilde\varepsilon_y, \tilde\varepsilon_x, \hat\lambda} = \tilde\varepsilon_y^T Q_{yy}^{-1} - \tilde\varepsilon_{x|y}^T Q_{x|y}^{-1} Q_{xy} Q_{yy}^{-1} - \hat\lambda^T = 0 \quad (12a)$

$\frac{1}{2}\left.\frac{\partial \Phi}{\partial \varepsilon_x}\right|_{\hat\beta, \tilde\varepsilon_y, \tilde\varepsilon_x, \hat\lambda} = \tilde\varepsilon_{x|y}^T Q_{x|y}^{-1} + \hat\lambda^T (\hat\beta^T H^T \otimes I_m) = 0 \quad (12b)$

$\frac{1}{2}\left.\frac{\partial \Phi}{\partial \beta}\right|_{\hat\beta, \tilde\varepsilon_y, \tilde\varepsilon_x, \hat\lambda} = -\hat\lambda^T (A - \tilde E_X H) = 0 \quad (12c)$

$\frac{1}{2}\left.\frac{\partial \Phi}{\partial \lambda}\right|_{\hat\beta, \tilde\varepsilon_y, \tilde\varepsilon_x, \hat\lambda} = \left( y - \tilde\varepsilon_y - A\hat\beta + (\hat\beta^T H^T \otimes I_m)\tilde\varepsilon_x \right)^T = 0 \quad (12d)$

The terms with a 'hat' denote the estimates of deterministic unknowns, while the terms with a 'tilde' denote the predictions of stochastic unknowns. The Hessian matrix of second-order derivatives of the objective function (10) with respect to $\varepsilon_y$ and $\varepsilon_x$ is

$\frac{1}{2}\frac{\partial^2 \Phi}{\partial \begin{bmatrix} \varepsilon_y \\ \varepsilon_x \end{bmatrix} \partial \begin{bmatrix} \varepsilon_y \\ \varepsilon_x \end{bmatrix}^T} = \begin{bmatrix} Q_{yy}^{-1} + Q_{yy}^{-1} Q_{yx} Q_{x|y}^{-1} Q_{xy} Q_{yy}^{-1} & -Q_{yy}^{-1} Q_{yx} Q_{x|y}^{-1} \\ -Q_{x|y}^{-1} Q_{xy} Q_{yy}^{-1} & Q_{x|y}^{-1} \end{bmatrix} \quad (13)$

Since its determinant is the product of the determinants of the matrices $Q_{x|y}^{-1}$ and $Q_{yy}^{-1}$, both of which are positive, the Hessian matrix is positive-definite. Therefore, the solutions of (12a)–(12d) will make the target function a minimum. Fang (2014) pointed out that if a rigorous minimum is required, the Hessian matrix with respect to the parameter vector and the Lagrange multiplier vector must also be positive-definite. However, in principle, the condition that the Hessian matrix is positive-definite can at most guarantee a local optimal solution rather than the global optimal solution for a non-linear model (Xu, 2002).

The authors are now in a position to solve the Euler–Lagrange necessary conditions. Multiplying (12b) by $Q_{xy} Q_{yy}^{-1}$ and adding it to (12a), they readily obtain

$\tilde\varepsilon_y = \left[ Q_{yy} - Q_{yx} (H\hat\beta \otimes I_m) \right] \hat\lambda \quad (14)$

Substituting (14) into (12b) and conducting some derivations yields

$\tilde\varepsilon_x = \left[ Q_{xy} - Q_{xx} (H\hat\beta \otimes I_m) \right] \hat\lambda \quad (15)$

Inserting both (14) and (15) into (12d) leads to

$\hat\lambda = \tilde Q_{ll}^{-1} (y - A\hat\beta) \quad (16)$

where a very important intermediate matrix is defined as

$\tilde Q_{ll} = (\hat\beta^T H^T \otimes I_m) Q_{xx} (H\hat\beta \otimes I_m) - (\hat\beta^T H^T \otimes I_m) Q_{xy} - Q_{yx} (H\hat\beta \otimes I_m) + Q_{yy} \quad (17)$

which can be understood as the cofactor matrix of the lumped errors

$\varepsilon_l = \varepsilon_y - (\hat\beta^T H^T \otimes I_m) \varepsilon_x \quad (18)$

Finally, inserting (16) into (12c) gives

$(A - \tilde E_A)^T \tilde Q_{ll}^{-1} A \hat\beta = (A - \tilde E_A)^T \tilde Q_{ll}^{-1} y \quad (19)$

Obviously, (19) is a non-linear system, because $\tilde E_A$ and $\tilde Q_{ll}^{-1}$ are also functions of $\beta$. Thus, it has to be solved iteratively, and the authors design an iterative algorithm for the non-linear system (19). Letting $\bar A = A - \tilde E_A$ and subtracting $(A - \tilde E_A)^T \tilde Q_{ll}^{-1} \tilde E_A \hat\beta$ from both sides of (19) yields

$\bar A^T \tilde Q_{ll}^{-1} \bar A \hat\beta = \bar A^T \tilde Q_{ll}^{-1} (y - \tilde E_A \hat\beta) \quad (20)$

Then, the estimates of the regression parameters $\beta$ are solved as

$\hat\beta = \left( \bar A^T \tilde Q_{ll}^{-1} \bar A \right)^{-1} \bar A^T \tilde Q_{ll}^{-1} (y - \tilde E_A \hat\beta) \quad (21)$

Replacing $\tilde E_A = A - \bar A$ in (21) and ignoring the randomness of $\hat\beta$ and $\bar A$, the variance matrix of $\hat\beta$ works out as

$Q_{\hat\beta\hat\beta} = \sigma_0^2 \left( \bar A^T \tilde Q_{ll}^{-1} \bar A \right)^{-1} \quad (22)$

If the variance scalar is unknown, it can be estimated as

$\hat\sigma_0^2 = \frac{\tilde\varepsilon_y^T Q_{yy}^{-1} \tilde\varepsilon_y + \tilde\varepsilon_{x|y}^T Q_{x|y}^{-1} \tilde\varepsilon_{x|y}}{m - n - 1} = \frac{\hat\lambda^T (y - A\hat\beta)}{m - n - 1} \quad (23)$

Iterative procedure
Based on the above derivations, the iteration procedure is designed as follows, where the bracketed subscripts denote the iteration number. The OLS solution is used to initialise the iterative computation.

The first step: initialisation with the OLS solution

$\hat\beta_{(1)} = \left( A^T Q_{yy}^{-1} A \right)^{-1} A^T Q_{yy}^{-1} y$

The second step: iteration from the $i$th to the $(i+1)$th step

$\tilde Q_{AA,(i)} = (\hat\beta_{(i)}^T H^T \otimes I_m) Q_{xx} (H\hat\beta_{(i)} \otimes I_m)$

$\tilde Q_{ll,(i)} = \tilde Q_{AA,(i)} - (\hat\beta_{(i)}^T H^T \otimes I_m) Q_{xy} - Q_{yx} (H\hat\beta_{(i)} \otimes I_m) + Q_{yy}$

$\tilde\varepsilon_{x,(i)} = \left( Q_{xy} - Q_{xx} (H\hat\beta_{(i)} \otimes I_m) \right) \tilde Q_{ll,(i)}^{-1} (y - A\hat\beta_{(i)})$

$\tilde\varepsilon_{A,(i)} = (H^T \otimes I_m)\tilde\varepsilon_{x,(i)} \;\Rightarrow\; \tilde E_{A,(i)}, \qquad \bar A_{(i)} = A - \tilde E_{A,(i)}$

$\hat\beta_{(i+1)} = \left( \bar A_{(i)}^T \tilde Q_{ll,(i)}^{-1} \bar A_{(i)} \right)^{-1} \bar A_{(i)}^T \tilde Q_{ll,(i)}^{-1} (y - \tilde E_{A,(i)} \hat\beta_{(i)})$

$Q_{\hat\beta\hat\beta,(i+1)} = \left( \bar A_{(i)}^T \tilde Q_{ll,(i)}^{-1} \bar A_{(i)} \right)^{-1}$

The third step: iteration stops after $j$ steps if

$(\hat\beta_{(j)} - \hat\beta_{(j-1)})^T Q_{\hat\beta\hat\beta,(j)}^{-1} (\hat\beta_{(j)} - \hat\beta_{(j-1)}) < \delta$

with a small value $\delta$. After the iteration, the variance scalar of unit weight is computed as

$\hat\sigma_0^2 = \frac{\hat\lambda_{(j)}^T (y - A\hat\beta_{(j)})}{m - n - 1} \quad \text{with} \quad \hat\lambda_{(j)} = \tilde Q_{ll,(j)}^{-1} (y - A\hat\beta_{(j)})$
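To make the procedure concrete, here is a compact Python/numpy sketch of the three-step iteration (an illustration written for this text under the paper's notation; the function and variable names are ours, and no efficiency tricks such as exploiting the Kronecker structure are applied):

```python
import numpy as np

def eclr(A, y, Qyy, Qxx, Qxy, delta=1e-12, max_iter=100):
    """Iterative ECLR solution of y - e_y = (A - E_X H) beta, eqs. (14)-(23).

    A   : m x (n+1) design matrix [e_m, X]
    Qyy : m x m cofactor of eps_y
    Qxx : mn x mn cofactor of eps_x = vec(E_X)
    Qxy : mn x m cross-cofactor between eps_x and eps_y
    """
    m, n1 = A.shape
    n = n1 - 1
    H = np.hstack([np.zeros((n, 1)), np.eye(n)])      # H = [0_{n x 1}, I_n]
    Im = np.eye(m)

    # first step: OLS initialisation
    AtW = A.T @ np.linalg.inv(Qyy)
    beta = np.linalg.solve(AtW @ A, AtW @ y)

    for _ in range(max_iter):
        Hb = np.kron(H @ beta, Im)                    # (H beta) kron I_m, mn x m
        # eq. (17): cofactor of the lumped errors
        Qll = Hb.T @ Qxx @ Hb - Hb.T @ Qxy - Qxy.T @ Hb + Qyy
        r = np.linalg.solve(Qll, y - A @ beta)        # Qll^-1 (y - A beta)
        ex = (Qxy - Qxx @ Hb) @ r                     # eqs. (15)-(16): eps_x
        EX = ex.reshape(m, n, order='F')              # un-vec eps_x -> E_X
        EA = EX @ H                                   # E_A = [0, E_X]
        Abar = A - EA
        # eq. (21) update
        N = Abar.T @ np.linalg.solve(Qll, Abar)
        beta_new = np.linalg.solve(N, Abar.T @ np.linalg.solve(Qll, y - EA @ beta))
        d = beta_new - beta
        beta = beta_new
        # third step: stopping criterion with metric Q_bb^-1 = N
        if d @ N @ d < delta:
            break

    lam = np.linalg.solve(Qll, y - A @ beta)          # eq. (16)
    s02 = lam @ (y - A @ beta) / (m - n - 1)          # eq. (23)
    Qbb = s02 * np.linalg.inv(N)                      # eq. (22), estimated scalar
    return beta, s02, Qbb
```

The `order='F'` reshape matches the column-stacking $\mathrm{vec}(\cdot)$ convention used throughout; with $Q_{xy} = 0$ the same loop collapses to the WTLS iteration, which is the first reduction discussed next.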

Existing methods reduced from ECLR method
The proposed ECLR method is very general. The authors show in this section how some existing methods, the OLS, TLS and WTLS methods, can be reduced from it, i.e. these existing methods are special cases of the ECLR method with different treatments of the stochastic model.

When $Q_{xy} = Q_{yx}^T = 0$ in (7), the ECLR method reduces to the WTLS method (Schaffrin and Wieser, 2008; Neitzel, 2010; Shen et al., 2011; Xu et al., 2012; Li et al., 2013). The full-rank matrix $\tilde Q_{ll}$ in (17) becomes

$\tilde Q_{ll} = (\hat\beta^T H^T \otimes I_m) Q_{xx} (H\hat\beta \otimes I_m) + Q_{yy} \quad (24)$

and the error predictions become $\tilde\varepsilon_y = Q_{yy}\hat\lambda$ and $\tilde\varepsilon_x = -Q_{xx}(H\hat\beta \otimes I_m)\hat\lambda$. For more information about this WTLS method, one can refer to Li et al. (2013).

Besides $Q_{xy} = Q_{yx}^T = 0$ in (7), when the matrices $Q_{xx}$ and $Q_{yy}$ are further assumed to be unit matrices (or, more generally, scaled unit matrices, though it can then somehow still be referred to as WTLS), the ECLR method reduces to the standard TLS. The matrix in (17) becomes

$\tilde Q_{ll} = (\hat\beta^T H^T \otimes I_m)(H\hat\beta \otimes I_m) + I_m = (\hat\beta^T H^T H \hat\beta + 1) I_m \quad (25)$

and $\tilde\varepsilon_y$, $\tilde\varepsilon_x$ further reduce to $\tilde\varepsilon_y = \hat\lambda$ and $\tilde\varepsilon_x = -(H\hat\beta \otimes I_m)\hat\lambda$.

If the assumptions $Q_{xy} = Q_{yx}^T = 0$ and $Q_{xx} = 0$ hold true, then $\tilde Q_{ll} = Q_{yy}$. In this case, it follows that $\tilde\varepsilon_x = 0$ (i.e. $\tilde E_A = 0$) and $\tilde\varepsilon_y = y - A\hat\beta$. The solution of the regression parameters reads

$\hat\beta = \left( A^T Q_{yy}^{-1} A \right)^{-1} A^T Q_{yy}^{-1} y, \qquad Q_{\hat\beta\hat\beta} = \left( A^T Q_{yy}^{-1} A \right)^{-1} \quad (26)$

This is the classic OLS solution with the well-known property of being the best linear unbiased estimate (Teunissen, 2000).

In summary, the authors have put forward a very general multiple regression model that captures the errors of both the explanatory variables and the responses as well as the correlations between them. The ECLR method developed to solve this new regression model is of course equally general. Therefore, the existing methods, OLS, TLS and WTLS, are special cases of the ECLR method, and these existing methods are usually employed to solve the reduced models obtained by ignoring either the errors of the explanatory variables or the correlations between the explanatory variables and the responses in the general linear regression model.

To intuitively understand how the ECLR method works, based on the example presented in Fig. 1, the authors illustrate in Fig. 2 the data points corrected by the errors computed with the OLS, TLS, WTLS and ECLR methods, respectively, together with their regressed lines. Obviously, for the OLS method, the points are corrected along the $y$-axis because only the response errors are considered; the corrected points all lie exactly on its regressed line. For the other methods, the points are corrected approximately along the direction perpendicular to the major axis of the real error ellipses (see, e.g., the grey ellipses in Fig. 1), although there are very small direction differences between these methods. The magnitudes of the corrected errors differ significantly between the methods; in this example they are ordered from the largest to the smallest as TLS, ECLR and WTLS. Compared with the ECLR method, the TLS method probably over-corrects and the WTLS method under-corrects the errors. Moreover, in contrast to the OLS method, the corrected points do not lie on their regressed lines and still show discrepancies. Further analysis indicates, although the complicated formula derivations are not shown here, that the TLS, WTLS and ECLR methods are similar to the OLS method applied to the correspondingly corrected points instead of the raw points.


Fig. 2 Data points after error correction and the regressed lines for the ordinary least squares (OLS), total least squares (TLS), weighted total least squares (WTLS) and error-affected and correlated linear regression (ECLR) methods, corresponding to the example in Fig. 1. True line: $y = 2x + 5$; fitted lines: OLS $y = 1.9273x + 4.3732$, TLS $y = 2.0939x + 4.3578$, WTLS $y = 1.9894x + 4.3674$, ECLR $y = 2.0058x + 4.3659$. The grey points stand for the original/observed points and the grey solid line for the true line. The black squares and the black dash-dotted line are the corrected points and the regressed line of the OLS method, while the green diamonds and green dashed line, the red triangles and red dotted line, and the blue pentagrams and blue solid line are the corresponding results of the TLS, WTLS and ECLR methods, respectively
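In code, assuming the `eclr` sketch given after the iterative procedure, these reductions amount to nothing more than different cofactor settings (illustrative; `A`, `y` and the cofactor matrices are whatever dataset is at hand, with $n$ explanatory variables):

```python
import numpy as np

m, n = A.shape[0], A.shape[1] - 1
# ECLR: full stochastic model of (7)
beta_eclr, *_ = eclr(A, y, Qyy, Qxx, Qxy)
# WTLS: drop only the cross-correlations, Qxy = 0
beta_wtls, *_ = eclr(A, y, Qyy, Qxx, np.zeros_like(Qxy))
# TLS: additionally unit cofactor matrices
beta_tls, *_ = eclr(A, y, np.eye(m), np.eye(m * n), np.zeros((m * n, m)))
# OLS: explanatory variables treated as error-free, Qxx = 0
beta_ols, *_ = eclr(A, y, Qyy, np.zeros((m * n, m * n)), np.zeros((m * n, m)))
```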

Experimental analyses
The authors demonstrate the performance of the ECLR method by a straight-line fitting, compared with the OLS, TLS and WTLS methods. To compare these methods extensively, the authors set up various stochastic models specified by changing the precisions $\sigma_x$ and $\sigma_y$ of the explanatory variables and responses and the correlation coefficient $\rho$ between $x$ and $y$.

Simulation set-up
Given the straight-line equation $y = 6 + 3x$, i.e. $\beta_0 = 6$ and $\beta_1 = 3$, the simulations are implemented as follows. For an integer $i$,

$\begin{bmatrix} x \\ y \end{bmatrix} \sim N\left( \begin{bmatrix} i \\ \beta_0 + i\beta_1 \end{bmatrix}, \; \begin{bmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{bmatrix} \right)$

The authors can then change $\sigma_x$, $\sigma_y$ and $-1 < \rho < 1$ to specify the precision of $x$ and $y$ and their correlation. Given the integer interval $[-5, \ldots, 5]$ for $i$, they simulate $N$ points for each $i$; in total, they obtain $m = 11N$ observation points. In this case, the variables of (7) are as follows: $y = [y_1, \cdots, y_m]^T$, $A = [e_m, x]$ with $x = [x_1, \cdots, x_m]^T$, $Q_{yy} = \sigma_y^2 I_m$, $Q_{xx} = \sigma_x^2 I_m$, $Q_{xy} = Q_{yx}^T = \rho\sigma_x\sigma_y I_m$ and $\sigma_0^2 = 1$.
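A sketch of this set-up (assuming numpy; the helper name and structure are ours) that generates one dataset together with the cofactor matrices of (7):

```python
import numpy as np

def simulate(sx=1.0, sy=4.0, rho=-0.9, N=10, b0=6.0, b1=3.0, seed=0):
    """Simulate m = 11*N points on y = b0 + b1*x with correlated x/y errors."""
    rng = np.random.default_rng(seed)
    i = np.repeat(np.arange(-5, 6), N)                # 11 integer abscissas
    m = i.size
    C = np.array([[sx**2, rho * sx * sy],
                  [rho * sx * sy, sy**2]])            # per-point covariance
    err = rng.multivariate_normal([0.0, 0.0], C, size=m)
    x = i + err[:, 0]
    y = b0 + b1 * i + err[:, 1]
    A = np.column_stack([np.ones(m), x])              # A = [e_m, x]
    Qyy = sy**2 * np.eye(m)
    Qxx = sx**2 * np.eye(m)
    Qxy = rho * sx * sy * np.eye(m)                   # mn x m, with n = 1 here
    return A, y, Qyy, Qxx, Qxy
```

Feeding the returned arrays into the `eclr` sketch above (with $n = 1$, so $Q_{xx}$ and $Q_{xy}$ are $m \times m$) reproduces the kind of comparison reported below.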

Results and analysis
By changing the values of $\sigma_x$, $\sigma_y$ and $\rho$, the authors can specify different linear regression models. For each given model, the authors simulate data according to the above procedure and then compare the solutions of $\beta_0$, $\beta_1$ and $\sigma_0^2$ from the different methods. Four sets of experiments are conducted, specified by $\sigma_x = \sigma_y = 1$; $\sigma_x = 1$ and $\sigma_y = 4$; $\sigma_x = 0$ and $\sigma_y = 1$; and $\sigma_x = 0$ and $\sigma_y = 4$, respectively. The results are shown in Figs. 3–6.



Fig. 3 The regression results with $\sigma_x = 1$, $\sigma_y = 1$. Blue with squares – error-affected and correlated linear regression (ECLR), red with stars – weighted total least squares (WTLS), green with triangles – total least squares (TLS) and black with crosses – ordinary least squares (OLS)


Fig. 4 The regression results with $\sigma_x = 1$, $\sigma_y = 4$. Blue with squares – error-affected and correlated linear regression (ECLR), red with stars – weighted total least squares (WTLS), green with triangles – total least squares (TLS) and black with crosses – ordinary least squares (OLS)


Fig. 5 The regression results with $\sigma_x = 0$, $\sigma_y = 1$. Blue with squares – error-affected and correlated linear regression (ECLR), red with stars – weighted total least squares (WTLS), green with triangles – total least squares (TLS) and black with crosses – ordinary least squares (OLS)

For all sets of experiments, the authors let the correlation coefficient $\rho$ vary from $-0.9$ to $0.9$ with an interval of $0.1$. Here, $\rho = \pm 1$ is not implemented, since in those cases the variance matrix would be singular.


Overall, the intercept ($\beta_0$) is much less sensitive than the slope ($\beta_1$) to the error processing of the different methods. For $\beta_1$, the WTLS generally produces results much closer to those of ECLR than the TLS and OLS methods.


Fig. 6 The regression results with $\sigma_x = 0$, $\sigma_y = 4$. Blue with squares – error-affected and correlated linear regression (ECLR), red with stars – weighted total least squares (WTLS), green with triangles – total least squares (TLS) and black with crosses – ordinary least squares (OLS)

The estimation of the variance scalar of unit weight is very sensitive to the different methods, but it can always be recovered very precisely (in this case, the true value is 1); one can refer to Xu (2013a) for more information about the estimation of the variance of unit weight. This is a very important benefit of ECLR, because in most real applications this quantity is unknown and needs to be estimated. Considering the very small differences in $\beta_0$, the comparison below focuses on $\beta_1$ and $\sigma_0$.

In the first set of experiments, since the same precision is taken for both $x$ and $y$, i.e. $\sigma_x = \sigma_y = 1$, the WTLS is equivalent to the TLS. Both ignore the correlations between $x$ and $y$; thus, their performance should be worse than that of ECLR. Since they take the errors of the explanatory variables $x$ into account to some extent, their performance should be better than that of OLS. Therefore, in these cases, the expected performance ordered by method is

$\text{ECLR} \geq \text{WTLS} = \text{TLS} > \text{OLS} \quad (27)$

where the first equality holds true when $\rho = 0$. This theoretical performance is indeed observed in Fig. 3. Both $\beta_1$ and $\sigma_0$ from OLS are significantly worse than those from the other methods. The ECLR method performs better than the other methods, although they all coincide for $\rho = 0$.

The second set of experiments has a general setting of the stochastic model, $\sigma_x \neq \sigma_y$. It therefore has the performance order

$\text{ECLR} \geq \text{WTLS} > \text{TLS} > \text{OLS} \quad (28)$

Again, the first equality holds true when $\rho = 0$, but now the WTLS should be better than the TLS because of the consideration of different weights in WTLS. The OLS method is still the worst because it ignores the errors of the explanatory variables. The corresponding results are shown in Fig. 4. The TLS performs much worse than the WTLS, although it is slightly better than OLS, which means that the weights play important roles in the regression and should be adequately considered. The WTLS has comparable performance in estimating $\beta_1$ but a significant degradation in estimating $\sigma_0$.

The authors now analyse the remaining two special sets of experiments, where the errors of the explanatory variables are indeed non-existent, i.e. $\sigma_x = 0$. In such cases, the performance ordered by method is

$\text{ECLR} = \text{WTLS} = \text{OLS} > \text{TLS} \quad (29)$

The ECLR is equivalent to the WTLS because no correlation is available anymore in the case of $\varepsilon_x = 0$, while the WTLS is equivalent to OLS by their definitions. But now TLS is the worst in principle, since in TLS the explanatory variables $x$ are blindly deemed error-affected (i.e. $\sigma_x \neq 0$), whereas this error-free property can be captured by setting proper weights in the WTLS method. The results in Figs. 5 and 6 clearly confirm this theoretical analysis, especially for $\sigma_0$. A simple comparison between Figs. 5 and 6 shows that the estimates of $\beta_1$ are noisier in Fig. 6 than in Fig. 5 because of the larger $\sigma_y$ assigned to the simulated data in Fig. 6; in other words, the simulated data are noisier in Fig. 6.

Finally, to statistically compare the regression performance of the different methods with $\sigma_x = 1$, $\sigma_y = 4$ and $\rho = -0.9, -0.5, 0, 0.5$ and $0.9$, respectively, the authors run the simulation 500 times for each given $\rho$. The means of the resulting $\beta_0$, $\beta_1$ and $\sigma_0^2$ over the 500 runs are then computed and shown in Table 1.

Table 1 The estimated regression parameters and variance scalars of unit weight for the OLS, TLS, WTLS and ECLR methods

rho     quantity    OLS      TLS      WTLS     ECLR
-0.9    beta_0      5.9997   5.9987   5.9994   5.9993
        beta_1      2.4029   3.8530   2.8891   3.0001
        sigma_0^2   2.6563   4.1449   1.8524   0.9964
-0.5    beta_0      6.0123   6.0135   6.0127   6.0127
        beta_1      2.5434   3.6643   2.9385   2.9979
        sigma_0^2   2.1741   3.3722   1.4819   1.0023
 0      beta_0      5.9849   5.9836   5.9844   5.9844
        beta_1      2.7281   3.4570   3.0010   3.0010
        sigma_0^2   1.5142   2.3228   1.0016   1.0016
 0.5    beta_0      6.0056   6.0064   6.0059   6.0058
        beta_1      2.9076   3.2723   3.0513   2.9982
        sigma_0^2   0.8083   1.2300   0.5199   1.0020
 0.9    beta_0      5.9983   5.9981   5.9982   5.9984
        beta_1      3.0556   3.1465   3.0928   3.0009
        sigma_0^2   0.2113   0.3185   0.1328   1.0040

OLS: ordinary least squares; TLS: total least squares; WTLS: weighted total least squares; ECLR: error-affected and correlated linear regression.


The differences among the methods are very marginal for the intercept $\beta_0$ but very remarkable for the slope $\beta_1$. The proposed ECLR method indeed performs better than the other existing methods in estimating the regression coefficients and the variance scalar of unit weight. Generally, the performance of the WTLS method is much closer to that of ECLR than the OLS and TLS methods. The results of WTLS are exactly the same as those of ECLR when $\rho = 0$, which reassures the theory that the WTLS is equivalent to ECLR in the case of $\rho = 0$. More promisingly, the variance estimates of unit weight from the ECLR method are much better (closest to the true value of 1) than those of all the other methods, no matter what value of $\rho$ is taken. The results of the other methods are either too optimistic or too conservative, such that they are not applicable in real situations. Before closing this section, it is worth pointing out that, for a general EIV model, one cannot always expect to obtain a practically better solution with the WTLS method than with OLS; this depends on the signal-to-noise ratio of the involved EIV model. However, the WTLS can in general achieve a better variance estimate of unit weight than OLS (Xu et al., 2014).
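The 500-run experiment can be sketched as follows, reusing the hypothetical `simulate` and `eclr` helpers from the earlier sketches (structure and names are ours):

```python
import numpy as np

def monte_carlo(rho, runs=500, sx=1.0, sy=4.0, N=10):
    """Mean estimates of beta_0, beta_1, sigma_0^2 over repeated simulations."""
    est = []
    for seed in range(runs):
        A, y, Qyy, Qxx, Qxy = simulate(sx=sx, sy=sy, rho=rho, N=N, seed=seed)
        beta, s02, _ = eclr(A, y, Qyy, Qxx, Qxy)
        est.append([beta[0], beta[1], s02])
    return np.mean(est, axis=0)

for rho in (-0.9, -0.5, 0.0, 0.5, 0.9):
    b0, b1, s02 = monte_carlo(rho)
    print(f"rho = {rho:+.1f}: beta0 = {b0:.4f}, beta1 = {b1:.4f}, s02 = {s02:.4f}")
```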

Concluding remarks
Linear regression is frequently applied to find the linear relationship (model) between observed explanatory variables and their responses, and the key is to obtain reasonable and accurate regression parameters. However, the linear regression model was traditionally established ignoring either the errors of the explanatory variables completely or the correlations between $x$ and $y$. In this paper, the authors have presented a novel multiple linear regression model that not only takes the random errors of the explanatory variables $x$ and responses $y$ into account but also considers the correlations between them. For this formulated non-linear regression model, the authors developed a new method, called ECLR, to solve it. Based on the theoretical and experimental analyses, the following conclusions are summarised.

The presented multiple linear regression model is very general: some of the existing regression models can be reduced from it by ignoring the correlations between $x$ and $y$ and/or the errors of the explanatory variables completely. Therefore, the proposed ECLR method is general as well, and the methods in current use, OLS, TLS and WTLS, which correspond to the reduced regression models, are all special cases with different treatments of the stochastic model.

The ECLR method generally shows better performance than the OLS, TLS and WTLS methods, especially in the case of strong correlations between the errors of the explanatory variables and the responses. When the errors of the explanatory variables are independent of those of the responses, i.e. $\rho = 0$, the presented ECLR method is equivalent to the WTLS method; if the same precision further holds for $x$ and $y$, they are equivalent to the TLS. If the explanatory variables are indeed error-free, i.e. $\sigma_x = 0$, the ECLR, WTLS and OLS methods are equivalent, and their performance is better than that of the TLS method, since the TLS blindly deems $x$ to be error-affected.


Besides the benefit of obtaining more precise regression parameters, even more promisingly, the ECLR method can obtain a very accurate estimate $\hat\sigma_0^2$ of the variance scalar of unit weight. This is a very important result, since in most real applications $\sigma_0^2$ is unknown and needs to be estimated.

Acknowledgements This work is supported by the National Natural Science Funds of China (41374031), the State Key Laboratory of Geo-information Engineering (SKLGIE2013-M-2-2), the Key Laboratory of Geo-informatics of State Bureau of Surveying and Mapping (201306), and the China Special Fund for Surveying, Mapping and Geo-information Research in the Public Interest (HY14122136).

References
Christensen, R. 2011. General Gauss–Markov models. In: Plane answers to complex questions. Springer Texts in Statistics. New York: Springer, pp. 237–266.
Fang, X. 2011. Weighted total least squares solution for applications in geodesy. PhD Dissertation, No. 294, Leibniz University Hanover, Hanover.
Fang, X. 2013. Weighted total least squares: necessary and sufficient conditions, fixed and random parameters. Journal of Geodesy, 87(8), pp. 733–749.
Fang, X. 2014. On non-combinatorial weighted total least squares with inequality constraints. Journal of Geodesy, 88(8), pp. 805–816.
Greybill, F. and Lyer, H. 1994. Regression analysis: concepts and applications. 1st ed. Belmont, CA: Duxbury Press.
Koch, K. 1999. Parameter estimation and hypothesis testing in linear models. 2nd ed. Berlin: Springer.
Li, B., Shen, Y. and Li, W. 2012. The seamless model for three-dimensional datum transformation. Science China: Earth Sciences, 55(12), pp. 2099–2108.
Li, B., Shen, Y., Zhang, X., Li, C. and Lou, L. 2013. Seamless multivariate affine error-in-variables transformation and its application to map rectification. International Journal of Geographical Information Science, 27(8), pp. 1572–1592.
Neitzel, F. 2010. Generalization of total least-squares on example of unweighted and weighted 2D similarity transformation. Journal of Geodesy, 84(12), pp. 751–762.
Schaffrin, B. and Wieser, A. 2008. On weighted total least-squares adjustment for linear regression. Journal of Geodesy, 82(7), pp. 415–421.
Shen, Y., Li, B. and Chen, Y. 2011. An iterative solution of weighted total least-squares adjustment. Journal of Geodesy, 85(4), pp. 229–238.
Shi, Y., Xu, P., Liu, J. and Shi, C. 2015. Alternative formulae for parameter estimation in partial errors-in-variables models. Journal of Geodesy, 89(1), pp. 13–16.
Snow, K. 2012. Topics in total least-squares adjustment within the errors-in-variables model: singular cofactor matrices and priori information. PhD Dissertation, Report No. 502, Geodetic Science Program, School of Earth Sciences, The Ohio State University, Columbus.
Sykes, A. 1993. An introduction to regression analysis. The Inaugural Coase Lecture. Law School, University of Chicago, Chicago.
Teunissen, P. J. G. 2000. Adjustment theory: an introduction. Series on Mathematical Geodesy and Positioning. Delft: Delft University Press. Available at: <http://www.library.tudelft.nl/dup>.
Xu, P. 2002. A hybrid global optimization method: the one-dimensional case. Journal of Computational and Applied Mathematics, 147, pp. 301–314.
Xu, P. 2013a. The effect of incorrect weights on estimating the variance of unit weight. Studia Geophysica et Geodaetica, 57(3), pp. 339–352.
Xu, P. 2013b. Nonlinear models, nonlinear estimation, nonlinear filtering and nonlinear optimization. Lecture notes, Tongji University, Shanghai.
Xu, P., Liu, J. and Shi, C. 2012. Total least squares adjustment in partial errors-in-variables models: algorithm and statistical analysis. Journal of Geodesy, 86(8), pp. 661–675.
Xu, P., Liu, J., Zeng, W. and Shen, Y. 2014. Effects of errors-in-variables on weighted least squares estimation. Journal of Geodesy, 88(7), pp. 705–716.