Robust Methods of Building Regression Models-An Application to the

1 downloads 0 Views 1MB Size Report
advantages of an internal analysis of the robustness of least squares for a given sample are ... belongs to the vast literature on hedonic price functions ..... Use a Bayesian approach that involves building a ... Figure 1 may have a great influence on the regression ...... DET, and ACC the coefficients measure the elasticity.
~oumal ?Joural

of Business Business & &Economic EconomicStatistics, Vol.2, No. No. 1, January Statistics,Vol. January1984

Robust Methods of Robust of Building Methods Building Regression Regression Models-An Application Models-An to the the Application to Housing Sector Sector Housing Daniel Peña Penia Daniel Escuelade de Iingenieros Universidad Politecnica de Madrid. Escuela Industriales, Universidad Politécnica de Madrid, Spain Industriales, Madrid, Madrid, lingenieros Spain Javier Ruiz-Castillo Javier Ruiz-Castillo de Teoría TeoriaEconómica, Universidad Departmento de Universidad Complutense de Madrid, Spain de Madrid, Economica, Madrid, Madrid, Departmento Complutense Spain Thisarticle articlestudies This studies robustification strategies robustification for the the linear in the linearmodel modelin the presence outliers.The The strategiesfor presence of outliers. an intemal internalanalysis the robustness advantages of an robustness of least least squares for a given are advantages analysisof the squares for sample are given sample out. The The application pointed tnethodology is iIIustrated this methodology illustratedby an explicit modelof the the pointedout. applicationof this by building buildingan explicitmodel Area. determinants rentalhousing in the determinants of rental values in the Madrid MadridMetropolitan Area. housingvalues Metropolitan KEYWORDS: WORDS:Outliers; KEY Outliers; Influential Influentialobservations; Robust regression; Cook distance; Hedonic observations;Robust distance; Hedonic regression;Cook price market. function;Housing pricefunction; Housingmarket.

illustrated in Section where we we present model of illustrated Section 4, where present aa model the housing rental rental values Madthe determinants determinantsof housing for the the Madvalues for rid rid Metropolitan Metropolitan Area. Section 5, Area. The final section, The final Section section, containssorne some coneluding contains comments. comments. concluding

1. INTRODUCTION INTRODUCTION The introducing sorne The question some degree questionof introducing degreeof objectivobjectivbeen ity the rejection observationshas has been ity in the rejection of out1ying outlying observations the statistics the subject the considerableresearch research in the statistics subject of considerable literature.This This is aa fundamental literature. fundamental problem problem with with crosscrosssection samples section where one typically large body body of has aa large sampleswhere typicallyhas data numerous variables. data on numerous variables.A few few atypical observaatypical observations may make the nonnormal, detions the data data distribution distributionnonnormal, demay make the least least squares stroying the optimality estimation squaresestimation stroyingthe optimalityof the procedure, which which could become very could become inefficient. procedure, very inefficient. In this point In this artiele articlewe considerthe the problem we consider fromthe the point problemfrom of view view of building building an model of market an explanatory market explanatorymodel the observed rental values rental values in terms traits of each terms of the observed traits each this exercise exercise housing unit unit in an an urban area. Hence, urban area. Hence, this housing belongs to the the vast vast literature hedonic price literatureon hedonic functions pricefunctions belongs in urban urban economics, which has has been been reviewed reviewedby GrilGrileconomics, which Ball (1973), iches (1971), and Quigley iches The (1979). The (1971), Ball (1973), and Quigley (1979). microeconomic underpinnings the empirical microeconomic work empirical work underpinningsof the in this this area areacan can be found provides foundin Rosen Rosen (1974), who provides (1974), who price determination determinationof aa differentiated differentiated and aa model model of price and indivisible indivisible product product under under competitive conditions. competitiveconditions. The The rest restof the the artiele articleis organized as follows. follows.Section Section organizedas 2 summarizes summarizesthe the effects effects of outliers outliers in the the context context of maximum likelihood maximum the linear linear model. model. likelihood estimation estimation of the Section briefly surveys possible solutions Section 3 briefly the possible solutions and and surveys the shows the model's model's showsthe the advantages of a robustification robustificationof the advantagesofa construction consisting of an construction methodology, an internal internal methodology, consisting a sensitivity analysis ofa model estimated by least squares of model estimated least sensitivityanalysis by squares with aa particular particular sample. Its empirical application Its with sample. empirical application is

2. THE THEEFFECTS 2. EFFECTSOF OF OUTLlERS OUTLIERS We begin We begin by by briefly briefly reviewing for later later reference the referencethe reviewingfor maximum maximum likelihood likelihoodestimation of the linear estimationofthe linearmodel model Y= X + U Y=x,B+U

(1) (1)

whereY is a (n where vectorofresponses, of responses,X is a (n (n x 1) 1) vector (n x k) k) matrix of predetermined , is aa matrix predetermined variables rank k, k, P variableswith with rank vector ofparameters, of parameters,and (k and U is aa (n vector (k x 1) 1) vector (n x 1) 1) vector of disturbances. disturbances. Let be the the density function of U, and Let ffbe and assume assume that that density function 0 and E[UU'] E[U] E[UU'] = a2 u 2 I. The maximum maximum likelihood E[U] = Oand estimation of (1) leads to estimation (1) leads n

max

L In f(e¡) Inf(ei)

¡=I i=1

n

= min =

L~ -g(e¡), -g(ej),

¡=I i=l

(2) (2)

In f and e, e¡ = Y¡ where -g x/ Pare # are the sample -g = In yi - xí sample residuals. residuals. If maximum likelihood likelihood estithe maximum estiIfff is differentiable, differentiable,the 6 is the mator of P unique) to the the solution solution (assumed mator (assumed unique) the system system n

L 1/;(e¡)xí O', Z I(ei)x; = O',

¡=I i=l

(3) (3)

where derivative of g, g, and X¡ is the row the first firstderivative the ith where1/;/ is the and xi ith row 10

and Ruiz-Castillo: Pefa and Peña Ruiz-Castillo: Robust RobustMethods Methodsof Building Models BuildingRegression RegressionModels

11 11

y

ofX. Another way of X. Another of writing(3) way ofwriting (3) is n

¿~

e¡x[ ex W¡ Wi = O', 0',

(4) (4)

¡=I i=

whereW¡ where wi = 1/;(e¡)/e¡. e(ei)/ei. Thus, the maximum maximum likelihood likelihood estimation estimation of the the Thus, the linear model minimization linear model can as (a) the minimization can be interpreted interpretedas (a) the of aa eertain funetion g certain function the sample residuals;(b) sample residuals; g of the (b) the choice choice of aa funetion function 1/;k of sample residuals, whose whose eomcomsample residuals, ponents are are orthogonal the linear linear spaee by ponents spacegenerated orthogonalto the generatedby the the eolumns columns ofX; of X; and and (e) as weighted with leastsquares (c) as squareswith weightedleast weights determinediteratively. iteratively. widetermined weightsWi If! assume that that it belongs Iff is symmetrie, symmetric,we may may assume belongs to the potential the potential exponential general form form s~ exponentialfamily-a family-a general sugand Box Box (1953), gested Diananda (1949) and studied studied gestedby by Diananda (1949) and (1953), and Box and In this and Tiao Tiao (1973). this case case by Box (1973). In

Y -.- _.- - _.-.~--~ YP p

p

--- --- .. --_

_-

MP

_

.".0w

o

• • • 1 :.,._.,,_ ........

__ .::. -::¡t· .: ..*: . -----:-..... :: .. : " :-.

_

_

-

,e .• : •••• :." e.. • •• '.

.. : ... .:•

....

"... .• '.'. •". ". e. . . ' .. ..~ : '" .

"

"

.

"

•I I

!(u) f(u)

2 = kl(a)a-expl-k2(a) k l (a)u- 1expl-k2 (a) IIu/u /(I+a)l, u/a 112/(l+a)3,

xx

Xp

-1 l. 1. Therefore, distributionwill will be leptokurwith Therefore,the distribution leptokur-

12

&Economic Joumal EconomicStatistics, Joural of Business Business & Statistics,January January1984

tic. In In this least squares longer optimal. tic. this case, case, least Also, squaresis no longer optimal.AIso, estimatorsdepend since parameter estimators depend the variances variancesof the the parameter since the directly,on than a2, u 2, which is greater errorvariance, the error variance,which directly.onthe greaterthan will be unreliable sueb,estimates unreliable and unstable in and very such estimateswill very unstable difTerent differentsamples. samples. there are are two note that that there two Finally, it is worthwhile worthwhileto note Finally, types of possible outliers. If we consider sample points consider outliers. possible sample points types find an an anomalous anomalousvalue (Yi, xI), one may may find value of Yi for the the y, for ( i, x/), corresponding Figure 1 (A). residual for as in Figure The residual for xi, as (A). The correspondingXi, point P will and its efTect will be a vertical vertical will be large, effect will large, and point displacement regression line. lineo Alternatively, Altematively, we we the regression displacementof the may have vector of explanatory explanatory an atypical of the vector have an value ofthe atypicalvalue may variables that may an atypical not be associated associatedwith with an variablesthat atypical may not response, as Here, point point P alone alone essenessenas in Figure (B). Here, response, Figure 1 (B). tially that of the regression determinesthe slope line, so that slope ofthe regressionline, tiallydetermines of the anomalous nature of situation the in spite nature the situation the the anomalous spite or zero. be very small or even zero. small even residual may residual very may 3. DIFFERENT APPROACHES TO SOLVE SOLVE DIFFERENT APPROACHES THE PROBLEM THE PROBLEM

The practical approaches approaches to deal problems deal with with the problems The practical posed by outliers follows: as follows: summarizedas outlierscan can be summarized posed

the l. justify the centrallimit theorem theorem to justify 1. Appeal the centrallimit Appealto the use least Once normality hypothesis in order to use least squares. Once order squares. normalityhypothesis residualplots the model has been estimated, use residual plots against has been the model estimated,use against the variables to the explanatory or the estimated values values or the estimated explanatoryvariables outliers. detect possible outliers. detect possible 2. building a that involves involves building 2. Use a Bayesian approachthat Bayesianapproach formal incorporates the the a priori priori expected model which which incorporates formalmodel expected model by the standard standardlinear linear model deviation respect to the with respect deviationwith by parameters in an model. an extended extendedmodel. means means of parameters 3. Reject Reject least least squares of a robust estimation favorofa robustestimation squaresin favor procedure by selecting a function g that yields reasonthat function yields reasonprocedureby selecting ably efficient estimates under the normality assumption under the estimates efficient normalityassumption ably without sufTering the instability instability of least least squares the without squaresin the sufferingthe of outliers. presence outliers. presence estimation criterion, 4. Robustify, rather than than the the estimation criterion, Robustify, rather the the methodology followed constructionof the the construction followed in the the methodology linear model. model. linear are that decisions decisionsare This requires checking at each each stage This requires stage that checkingat obseranomalous obsernot small group determinedby not determined group of anomalous by a small but vations. not abandoned, least squares vations. Hence, abandoned, but Hence, least squares is not instead by a the estimation estimation process instead the supplementedby process is supplemented detection of battery of diagnostic that permit checks that permit detection diagnostic checks battery potentially influential measurements of influential observations, observations,measurements potentially tests of and tests their efTects estimated coefficients, effects on estimated their coefficients, and whether they they are are significantly whether atypical. significantlyatypical. In the next briefly review review these these apwe briefly In the next section section we approaches. proaches. 3.1 The Use of Residual Residual Plots

vast majority This the altemative the vast alternativesuggested This is the majority by the suggestedby Its main of statistics textbooks. Its main limilimieconometrictextbooks. statisticsand and econometric can only serve to tation that, at best, residual residual plots plots can serve at best, tation is that, only the A in 1. detect Figure l. However, in the outliersof type detect outliers Figure However, type

large sample numerous variables, variables, context contextofa of a large of dataon numerous sampleofdata residual plots pl01s by by themselves themselves are very helpful helpful for for are not not very residual with several detecting multivariant values values with severalcodetectingatypical atypicalmultivariant ordinates far from from the mean values values of the ordinatesfar the mean the explanatory explanatory variables. Unfortunately, these these outliers variables.Unfortunately, outliers of type type B in Figure may have great influence influence on the the regression have a great regression Figure 1 may results and particularly damaging. results and are, therefore,particularly are, therefore, damaging. 3.2 The Bayesian 3.2 Approach Bayesian Approach

This been used used by by Jeffreys JefTreys (1961), Box and and Tiao Tiao This has has been (1961), Box (1968), Chen and Box (1979 a, b, c), Box (1979, 1980) Box Box Chen and (1979, 1980) (1979 a, b, c), (1968), possibly the most most general and thorough thorough and others. It is possibly and others.It generaland have unable approach the problem, but we have been unable to the but been to approach problem, implement because of its computational complicait its because computationalcomplicaimplement tions adequate software for its its the requirements softwarefor tions and and the requirementsof adequate we abstain here from further efficient Thus, from abstain here further efficientapplication. application.Thus, it. comments comments on it. 3.3 Robust Robust Regression Regression Estimates

The least squares approach althe least alThe shortcomings squaresapproach shortcomingsof the ready mentioned mentioned have led in the the last last 20 years an have led years to an ready extensive literature that overcome these these diffidiffithat aims aims to overcome extensive literature culties. Books by by Mosteller and Tukey Huber Mostellerand culties. Books (1977), Huber Tukey (1977), (1981), and Lewis the proband Bamett Barnettand Lewis(1978) (1978) present presentthe prob(1981), and lem numerous references. references. lem and and contain contain numerous The instability of least presence of the presence least squares The instability squares in the and V outliers \f; in the functions functionsgg and outliersis due due to the the form form of the 2 In this expressions and (3). this case, {(u) == case, g(u) == uu2,, 1/I(u) expressions(2) (3). In (2) and 1. Therefore, since all all obserobseru, and Wi(U) Therefore,since u, and i/(u)/u == l. wi(u) == 1/I(u)/u vations those data large data with with a large are given vations are equal weight, weight, those given equal the least least squares residual absolutevalue value carry residualin absolute squaresequaequacarrythe It effect. It tion towards them-an obviously undesirable efTect. towardsthem-an tion obviouslyundesirable that grows is intuitively that a function more function gg that clear that grows more intuitively clear will give such slowly smallerweight when u is large weightto such largewill give a smaller slowlywhen atypical leading, consequently, more observations,leading, consequently,to more atypical observations, robust by has been been advocated advocatedby This solution solution has robustestimates. estimates.This historical Huber (1964) for historical 1973 for and others others(see Huber (see Stigler (1964) and Stigler1973 comments). Hogg (1979) Huber (1981) and Huber presenta (1981) present (1979) and comments). Hogg good of this approach. See also alsoJefTreys (1961, approach.See Jeffreys(1961, summaryofthis good summary p. ff.). p. 214 fT.). These procedures are threetypes robustprocedures are subject These robust types of subjectto three criticisms. nature of the the functions functionsgg the heuristic heuristicnature criticisms.First, First,the the formulation. formulation. or certainarbitrariness arbitrariness in the or 1/1 introducea certain 4 introduce Second, properties of the the estimates estimates the small-sample Second, the small-sampleproperties are these methods are useful useful in dealdealare unknown. unknown.Third, methodsare Third,these A ing with outliers of type in Figure 1, but they do not but with outliers Figure 1, they not type ing small values with solve problem posed posed by by atypical values with small the problem solve the atypical residuals. residuals. and Box Box Chen and With respect to the the first first criticism, With respect criticism, Chen and (1979a) have established that the functions g and 1/1 the functions that A have established (1979a) for suggested in the literature are optimal for particular are the literature particular optimal suggested function Huber'sfunction types of contamination. For instance, contamination.For instance,Huber's types with distribution gis optimal for a normal distribution with Laplace tails, normal for is g optimal Laplacetails, contamiwhich approximated by the contamithe can be be closely which can closely approximatedby can nated normal model model presented presented in (6). nated normal Therefore,it can (6). Therefore, use should that the methodology we use should depend we that the be argued depend methodology argued

Pehaand andRuiz-Castillo: Ruiz-Castillo: RobustMethods Peña Robust Methodsof of Building Models BuildingRegression RegressionModels

on the the specific structureof each each particular on The specificstructure particularsample. sample.The third criticism criticism leads leads to to generalized in M-estimates in third generalized M-estimates in (4) which the the weights which not only on the the wi in (4) depend depend not weights W¡ only on residualbut but also also on on the the observation's observation'sinfluence influence measresidual measuredby its distance distanceto to the the center centerof the the scatter scatterof points ured by its points as in in Krasker Kraskerand and Welsch Welsch (1982). this apas (1982). Although apAlthough this solves the the problem, the solution solution rereproach proach partiaIly partiallysolves problem, the mains heuristic heuristicand and obtaining mains obtaining sampling propertiesof sampling properties estimatesis is difficult. difficult. estimates Robustification of the Methodology 3.4 Robustification Methodology

The main main reason reasonfor for constructing robustestimation estimation The constructingrobust methods is is to to guarantee that our will not our results results will not be methods be guaranteethat few anomalous anomalous obserfundamentaIly obserdependenton aa few fundamentallydependent vations. However, the fact fact that an estimate that an estimate might be vations. However, the might be sensitiveto aa smaIl small set set of outliers outliersdoes does not not mean mean very very sensitive that it is inefficient inefficient in every conceivablecase. case. Before Before that every conceivable an estimation estimation procedure, it is reasonable reasonableto rejecting procedure,it rejectingan its good whetherits are preserved in investigate investigatewhether propertiesare preservedin good properties each particular eaeh particularsample. sample. dataset set susceptible treated Therefore, Therefore,given givenaa data susceptibleto being beingtreated means of aa linear linear model, ask the the by model, it is pertinent pertinentto ask by means this sample contain obfoIlowing obquestions:(a) (a) Does this followingquestions: sample eontain servationswhose whoseaa priori influenceis mueh muchgreater than servations prioriinfluence greaterthan the rest rest in the the construction construction of the the model? model? (b) Is it the (b) Is actual influence influence that that eaeh each inpossible measure the the actual possible to measure dividualobservation observationhas the parameter dividual has a posteriori posteriorion the parameter estimates? (e) there exist exist a test test to determine determine estimates? (c) Does there an observation whetheran observationconstitutes constitutesan outlier? whether an outlier? We now answersthat been given We now review review the the answers that have have been given to these questions. The first first issue issue has been approached these has been questions. The approached with the of the "hat" "hat"matrix, whose properties with the help help ofthe properties have have matrix,whose been discussedby and Welsch been discussed by Huber Huber (1975), Welseh (1975), Hoaglin Hoaglin and Cook (1977, (1978), Belsley, Kuh, and Welsch Welseh (1978), Cook (1977, 1979), Kuh, and 1979), Belsley, and Weisberg (1980), (1980), and (1980). Weisberg(1980). The "hat" matrix The "hat" matrix V projects projects the vector vector Y on the the linear columns of X: linear spaee generated by the eolumns X: space generated Y V = V Y, Y,

V = X(X'X)-'X'. X(X'XtIX'.

(8) (8)

The The matrix matrix V is symmetric symmetrie and and idempotent. idempotent. Its Its imporimportanee for our purpose lies in the fact that e = (1 V)U tance (I - V)U = (1 (I - V)Y, from which one obtains 0"2(1 -- Vii) var(e¡) = a2(1 var(ei) vii)

(9) (9)

with vii Vii = (x¡ - x), x), where X is the (xi -- x)'(X'Xtl(x¡ )'(X'X)')(x centered observation, and and l/n l/n (X'X) is centered matrix matrix of the observation,

the variance varianee and and covariance covariance matrix matrix for for the explanatory explanatory variables. variables. Therefore, except except for for a constant constant term, term, vii Vii represents represents Therefore, the Mahalanobis Mahalanobis distance distanee of an observation observation xi Xi to the center gravity of of the scatter seatter of of points, points, X. X. If a point xi Xi center of gravity is very very far far from from X, its vii Vii will will be large large and and the variance variance of of the corresponding eorresponding residual residual will will be small, smaIl, as (9) indiindi= In cates. the if the variance cates. limit, vii Vi¡ 1, varianee will will be zero, zero, 1, limit, which which means means that that the point's point's position position relative relative to the rest rest forces forees the regression regression equation equation to go through through it, of the observed value for irrespective ofthe observed value for Yi. irrespective yi.

13 13

It can can be be concluded concludedthat It that sample with high samplepoints points with vii high Vii are, influential. Since Since V is is aa projection are, potentially, potentially, influential. projection < l.1. Moreover, 0 < Vii matrix, since the the trace trace of an an vii~ matrix, O Moreover,since matrix is is equal idempotent to its its rank, r7= 1 Vii = k, k I E= idempotent matrix vii equal to rank, where kk is is the where the average the rank rank of X. X. Consequently, the Consequently, average value of the value the v¡¡'s and vii'sis kln. kin. FoIlowing Kuh, and Following Belsley, Belsley, Kuh, Welsch(1980), in praetice Welsch an observation observationis is eonsidered considered (1980), in practicean potentiaIly influentialif Vii> 2k/n. vii> 2kln. potentiallyinfluential Huber Huber(1981) hassuggested anotherinteresting inter(1981) has suggestedanother interestinginterfor the pretation the Vii terms. Sinee Since V is is idempotent, viiterms. pretationfor vii== idempotent,Vii into r1=1 V7j. Thus, taking (8) into aceount account v2. Thus, taking(8) j= nn

var(y¡)= v0 var(yj) = 2vii. Vii • S v'f;var(Yj)=0"2 var( i) = L j=l j=1

Therefore, that the the sample mean of hh indeindeTherefore,recalling sample mean recallingthat pendent observationswith with common common varianee variance 0"2 a2 has has pendent observations varianceO"a2/h, that 11/vii 21 h, it is clear IV¡i can variance can be clearthat be interpreted as interpretedas the the number numberofequivalent of equivalentobservations observationsused usedto compute compute Yi. Yi. If Vii vii= 1, 1, then Yi y, is computed with a single single observation, its residual residualis zero zero (see tion, its Equation9). (see Equation 9). In an In an alternative alternative approach determine aa priori approach to determine priori influentialobservations, influential Andrewsand and Pregibon observations,Andrews Pregibon(1978) (1978) use use the the change "volume" of the the scatter scatter of points change of "volume" points whenone when one eliminates eliminatesaa subset subsetofobservations. of observations.However, However, andJohn Draper John(1981) haveestablished establishedthat thataa measure measure Draperand (1981) have ofa influenee in this of a single this approaeh point'sinfluence singlepoint's approachis precisely precisely 1 -- Vii. 1 vii.

The The second second issue issue is how how to determine determine the the actual actual influenceon the influence the model model of eaeh each observation observationin a given given sample. There are are several based on severalways this based sample. There ways of doing doing this the empirical influence function lEA 6, where IEA = IJA SA- {J, IJA the estimate AAis the estimateobtained obtainedafter aftereliminating the subset subset eliminatingthe A A of observations, and {Jf is the the estimate estimate with with the the fuIl full observations,and sample Cook and and Weisberg sample(see (see Cook Weisberg1980). 1980). A simple A's scalar measure measure of A's simple way way of obtaining obtaining a sealar influenee the distance between IJAA A and influence is to eonsider considerthe distancebetween and {JA in a metric metric with with statistical statistieal meaning. meaning. Cook introCook (1977) introduced such a measure measure by duced such 2 DA = (A (IJA - {J)'(X'X)(IJA DA - {J)/ks )(X'X)( )/ks2 ,

where where s2 S2 is the regression regression residual residual variance varianee and and

xt

(X' 1S2 is an an estimate variance covariance eovariance estimate of the variance (X'X)-'s2 matrix for,.{J.

Using the subindex subindex (i) indicate that that a referred-to referred-to Using (i) to indicate eharacteristic has has been been calculated the ith ith obsercharacteristic calculatedwithout withoutthe observation, the Cook distanee can easily easily be obtained obtained from from Cook distance vation, 2 D¡ = ((i) (IJU) - A)' {J)' (X'X)()(i (X' X)(IJ(i) - {J)/ks 6)/ks2 2 = = e eh;;/ks (1 - vii)2. Vii)2. vii/ks2(1

It is interesting interesting to note that that Di D¡ can also be written written as D¡ = (Y(i (VU) -- Y)' Y)' (Y(, (V(/) - Y)/ks2. Y)/ks 2 • Di

indicating that that Di D¡ measures measures the Euclidean Euclidean distance distanee in indicating whieh the prediction prediction vector vector Y is translated which translated after after eliminating the ith observation observation from from the regression. regression. nating Finally, the construction construetion of of tests tests to determine determine the Finally,

14

Joumal EconomicStatistics, &Economic Business & Joural of Business Statistics,January January1984

atypical models has has used used numerous numerous data in regression regressionmodels atypicaldata approaches Barnett and and Lewis Lewis 1978 1978 for for a survey surveyof (see Barnett approaches(see this topic). When the the problem problem is considered within aa consideredwithin this topic). When likelihood testing approach, the resulting resulting test test stastaratio testing likelihoodratio approach,the tistic is a monotonic residthe Studentized Studentizedresidmonotonic function function of the tistic uals uals - vii, r¡=e¡/s~, ri= eils

(10) (10)

dividedby its where residual has has been been divided wherethe least least squares by its squaresresidual estimated deviation. standarddeviation. estimatedstandard A shortcoming this approach that the distribudistribuapproachis that shortcomingof this tion r¡ under under the Student the normality tion of ri assumptionis not Student normalityassumption not indeindet,t, because because numerator numerator and and denominator denominatorare are not for ss in (10) pendent. However, However, the the substitution substitutionof s(i) (10) pendent. s(i)for yields n - k - 1 degrees degrees yields a Student t distribution with n-k of freedom. Weisberg reasons (see For computational freedom. For (see Weisberg computationalreasons 1980), as convenientto express expressit as 1980), it is convenient t¡ V(n - k - l)/(n ti == r¡ ri .J(n l)/(n - kk-- r7), r),

The relevant where r¡ri is given by (10). relevant distribution for where distributionfor (10). The given by obtaining test's significance level is that that of the the test's obtaining the significance level maximum value with n-k maximum value of a sample statisticswith n- ksampleof t statistics 1 degrees unknown. HowHowwhich value value is unknown. freedom,which degreesof freedom, ever, have been been tabulated critical values values have tabulated ever, approximate approximatecritical using the Bonferroni Miller 1977, and Bonferroniinequality 1977, and (see Miller using inequality(see Cook Prescott 1981). Cook and and Prescott 1981). In conelusion, the statistics In and t¡t, constitute constitute statisticsVii, vii, D¡, Di, and conclusion, the the basis basis for methodological robustification robustification of the the for the methodological the linear predemodel. The The V¡¡ terms depend linear model. viiterms depend only only on the predetermined potential influence the potential influence terminedvariables variablesand and measure measurethe of each account its its position each observation observationtaking position taking into account relative to the rest of the the sample. would have have a the rest We would relative sample. We robust points had had analogous all points The robustdesign values.The viivalues. designif aH analogousV¡¡ Cook Di statistic the actual influence of each Cook Di statisticcaptures actual influence each capturesthe observation the estimated estimatedparameters the predicobservationon the predicparametersof the tion interesting because because it tion vector vector Y. The The statistic statistic is interesting indicates the the practical practical irrelevance irrelevance of worrying about indicates worrying about sample anomalous, have observationsthat, have that, although sample observations although anomalous, model. FinaHy, the model. statistic little influence on the little influence Finally, the t statistic summarizes both features test featuresand and is used used as as a formal formal test summarizesboth of whether whether a single next an outlier. observationis an outlier. The The next single observation section way of attacking attacking containsan an application this way section contains applicationof this the problems posed posed by by outliers. outliers. the problems THE DETERMINANTS 4. A MODEL MODEL FOR FOR THE DETERMINANTSOF RENTAL HOUSING IN THE RENTAL VALUES IN THE HOUSING VALUES AREA MADRID METROPOLITAN MADRID METROPOLITANAREA

4.1 The Problem the Data Problem and the In Spain, the rental housIn rentalhousinterventionin the Spain,government governmentintervention ing instiseveral public two forms. forms. First, sector takes takes two First, several public instiing sector tutions promote-directly or indirectly-the construcconstrucor indirectly-the tutions tion below the the market market level. level. at rents rents below tion of public public housing housingat enforcedcomcomSecond, the government has enforced since 1920 1920 the Second,since governmenthas controls in the pulsory lease lease renewal renewal and the private private and rent rent controls pulsory In 1964 sector. were liberalized 1964 rents liberalizedon new new contracts. contracts. sector.In rentswere

Therefore, the rental renta! market market sector ineludes only prisector includes Therefore,the only private housing units occupied occupied after after 1964. 1964. vate housingunits Our this section sectionis to build build an an explanatory Ourproblem problemin this explanatory model of market market rental rental values terms of the the observed observed model valuesin terms traits ofdwelling units in the the Madrid Area of dwellingunits MadridMetropolitan traits MetropolitanArea (MMA behavioral relationship relationship This is not a behavioral hereafter).This (MMA hereafter). function that from the the but a function the rent rent resulting resulting from but that gives gives the and demand demand for for each each variety interaction interactionof supply variety of supply and The partial the the differentiated product. The partial derivatives derivativesof differentiatedproduct. that implicit or functionare as the the implicit or hedonic hedonic that function are interpreted interpretedas prices of the the corresponding characteristics. prices correspondingcharacteristics. The final model to assess final aim use the estimated estimatedmodel assess The aim is to use the the distributional and the consethe economic economic advantages distributionalconseadvantagesand Table 1. Structural Table1. StructuralHousing Traits HousingTraits Name Name

Description Description

a. a. Continuous Variables ContinuousVariables AGE AGE OCUP OCUP

M2 M2 ROOM" ROOMI NFL NFL DET DET

in years Building since age in years since Buildingage its construction its construction Years Years of occupancy the occupancy of the housing unit housingunit in squared Space meters Space in squaredmeters Space innumber numberof rooms rooms Space in Number Numberof f100rs floors Deterioration Deteriorationstate of the the building building

Mean Mean 21.9

b. Dummy Variables b. DummyVariables Age of building Age building in XIX XIXcentury AXIX Builtin AXIX Built century Builtin Builtin 1900-1940 in 1941-1964 A4164 Built A4164 Builtin 1941-1964 in 1965-1974 A6574 Built A6574 Builtin Type Typeof building building MAGL "Marginal" housing MAGL bad concon(in bad "Marginal" housing(in dition) dition) CHTW Chalet CHTW Chaletor or townhouse townhouse APT APT Detached Detachedapartment apartmentbuilding building Otherapartment Other apartmentbuilding building Type promotion Typeof promotion PRla Private firm Privatefirm PRla inUser's inUser'scooperative, cooperative,particular particular dividual, selfconstruction dividual,selfconstruction a Unknown UNK UNKa Unknown

services Hygienic Hygienicservices LESS Less than than a full fullbathroom LESS bathroom full bathroom -One 009 full TWOM Two or Two morebathrooms TWOM or more bathrooms Payrllents of utilities utilities Payments a Hot HOTW in other Hotwater waterbill billincluded includedin other HOTWa concept concept in other HEAT" billincluded HEA ra Heating includedin other Heatingbill concept concept BEXpa Building included BEXP8 expendituresincluded Buildingexpenditures in other in otherconcept concept Othervariables variables Other Withtelephone TELPH With TELPH telephone CHEAT Withcentral centralheating CHEAT With heating Withgarage GAR With GAR garage a Witha janitor With janitor JAN JANa Withfurniture FURN With FURN furniture FTENa Firsttenancy FTEW First tenancy •' Nonsignificant in the exploratory analysis. exploratory analysis. Nonsignificant variables in

Standard Standard Deviation Deviation 23.0

3.6

•' 2.3

68.0 3.6 5.1 5.1 4.6

42.8 1.2 2.8 17.5

Percent Percent

7 19 21 21 53 100 100 3

4 38 55 100 100 27 45

28 100 19 70 11 11 100 100 47 38 19 36 20 7 34 15 18

Models RobustMethods Methodsof Building Peña Robust Ruiz-Castillo: and Ruiz-Castillo: Peia and BuildingRegression RegressionModels

quences public housing housing and rent control control policies policies in and rent quences of public rean assessment assessment wiIl will be resuch an Spain. results of such The results Spain. The ported elsewhere Peinaand and Ruiz-Castillo Ruiz-Castillo(1983). elsewherein Peña (1983). ported Our from aa 1974 1974 survey data come from Our data 4,067 housing housing surveyof 4,067 that for that of the total total number number for units MMA (or .4%ofthe units in the MMA (or .4% private area). The used here here consists consists of 460 private The sample sample used area). Such 1964and and 1974. 1974. Such rental between1964 rentaldwellings occupiedbetween dwellingsoccupied wiIl be made made available by the the authors upon request. request. data authorsupon availableby datawill areas usually urban areas Hedonic price functions for urban usuaIly ininfunctions for Hedonic price variables:a set set of traits traits volve two two types volve explanatoryvariables: types of explanatory which characterizing units of the buildings buildingsto which dwellingunits characterizingdwelling characteristics. locational characteristics. they belong, belong, and and a set set of locational they eithertype variablesof either that Tables and 2 describe the variables describethe Tables 1 and type that could measure. we could measure. are in Sorne measurement problems problems are comments on measurement Some comments order: order:

for a building's l. many cases, value for 1. In In many age, cases, to get building'sage, get a value construcof the known we had had to consider midpoint ofthe known constructhe midpoint considerthe discontinuitiesfor certaindiscontinuities tion interval. This for this this This led led to certain tion interval. dated the knew that knew that the building dated variable. When When we only variable. building only was assigned variablewas from the AGE variable 19th century, from the the 19th assigned century,the constructiondate date that the the construction the which implies implies that value 85, which the value 1880. was was assumed assumedto be 1880. number recordeda number 2. For interviewersrecorded Foreach eachbuilding, building,interviewers defects. observed of points points for for each types of observed defects. each of eight eight types value the state deterioration of the state of deterioration Thus, the greater value the the Thus, greater condition of the correcorreworse the the condition variable variable(DET), (DET), the worse sponding building. spondingbuilding. 3. The variable is a weighted weighted index index of The accessibility accessibilityvariable that from the 85 zones each transportation times from each of the 85 zones that times transportation make up the MMA to six centraIly located neighborMMA six the located make neighborup centrally all employment concentrated. hoods where where 25% 25% of aIl hoods employment is concentrated. The weights reflect relative importance importance of employThe weights reflectthe the relative employrelative to total total ment in each each of those those neighborhoods ment neighborhoodsrelative in is measured measured employment in the six. Accessibility the six. Accessibility employment and public private and public transportation, weighted minutes of private minutes transportation,weighted by the rate of both transporboth modes all transpormodes for for aIl the utilization utilization rate by tation in the metropolitan area. area. tation purposes the metropolitan purposes firstprincipal 4. The the first principal component The index index HIGH HIGH is the component 8 Table2. Locational Variablesa Table LocationalVariables

Name Name ACC ACC POPD POPO RENr' RENTb

HIGH HIGH OLD OLO b INFRAb INFRA SCHOOLb SCHOOLb

Description Description

Mean Mean

in 41.2 Accessibility indexin Accessibilityindex minutes minutesof transportation transportation time time in 19,950 Population Populationdensity densityin inhabitantsper inhabitants per squared squared kilometer kilometer Average monthly 18,826 Average monthlyfamily family in pesetas incomein income pesetas Socioeconomic index .06 Socioeconomicindex Buildings index .13 Buildingsage age index -.07 in bad bad -.07 Indexof housing Index housingin conditions conditions 7,351 and secondary Primary 7,351 Primaryand secondary school school enrollments enrollments

Standard Standard Deviatio Deviation 17.6 20,683 4,764 .88 1.14 .78

4,345

in!he thatmake the Madrid Area. •*All AII variables take values Area. the 85 zones zones !hat makeup MadridMetropolitan variablestake valuesin Metropolitan up !he b Nonsignilicant in !he b the explanatory explanatory analysis. analysis. Nonsignificant variables in

15

12 variables set of 12 explaining variables varianceof a set 34%ofthe of the variance explaining34% 85 the 85 representing differentsocioeconomic socioeconomic aspects aspectsof the representingdifferent zones the MMA. MMA. zones in the 5. principal components components analysis 5. We We also also applied analysisto appliedprincipal characteristics 5 variables buildings' characteristics severalbuildings' variablesdescribing describingseveral firstprincipal The first principal component zones. The in each the 85 85 zones. each of the component dominated (OLD), variance, was was dominated of the variance, 38%ofthe explaining38% (OLD), explaining The second second by the the average each zone. zone. The of buildingsin each averageage ageofbuildings by % of component additional21% explainingan additional21 (INFRA), explaining component (INFRA), the imporas an an index index of the was interpreted importhe variance, interpretedas variance,was zone. housing in bad bad condition each zone. tance condition in each tance of housing variables that repre6. Of aIl that might all the variables conceivablyrepremight conceivably could only measmeassent local public sector activities, sent local activities,we could public sector school enroIlment enrollment ure primary and secondary ure secondary school primary and (SCHOOL). (SCHOOL). 7. It is always always difficult ascertainwhether whetherthe rental rental difficultto ascertain 7. It of this type amount type amount in monthly expendituresin surveys surveysofthis monthlyexpenditures In our our case, we only knew reflects rent. In or net net rento reflectsgross case, we only knew gross or utilitieswere wereincluded included whether payments for certain utilities for certain whetherthe payments did not know whether or but did know whether other measures, or not in other measures,but We tried tried to such measure was was the rent bill bill itself. itself. We the rent such aa measure account three dummy for these these effects effectsby account for by constructing constructingthree dummy variables person interviewed interviewed that take the value value 1 if the person variablesthat take the for hot hot water, declared, respectively, that that payments water, declared, respectively, payments for included in the building expenditures were included heating, or building expenditureswere heating,or other other measure. measure. 4.2 The Selection Form 4.2 the Functional Functional Forrn Selection of the

To decide best functional decide on the the best functional form, followed form, we foIlowed an process that that began began with with an an exploratory an iterative iterative process exploratory reasonable first repreanalysis the data data to obtain obtain a reasonable first repreanalysisof the identification sentation. Next, we concentrated sentation.Next, concentratedon the identification of possible FinaIly, we we carried carried out the maxioutliers. Finally, the maxipossible outliers. mum likelihood likelihood estimation variable's mum estimationofthe of the dependent variable's dependent best and performed checks to best transformation transformation and performed various various checks find an for metric for the predetermined an adequate metric the variafind adequate predeterminedvariables. bles. For the initial we used used three three For the initial exploratory analysis, we exploratoryanalysis, of tools: tools: bivariate plots of the the response variable types bivariateplots response variable types with each explanatory with respect respect to each the empirical variable,the empirical explanatoryvariable, of each variable, and residual distribution distribution each variable,and residualplots from plots from with several preliminary regressions with severalsets sets of explanatory preliminaryregressions explanatory variables. the foIlowing: variables.The The results resultswere were the following: l. The The rent rent variable required transformation, 1. variablerequired transformation,possipossibly logarithmic transformation. Plots of e¡ transformation.Plots for ei == F( y¡) bly a logarithmic yi) for the the untransformed untransformedy variable variable showed showed curvature curvatureand and heteroskedasticity. both the distribution distributionof both Moreover,the heteroskedasticity.Moreover, the rent the rent variable variable and and the the regression regression residuals residuals had had a strong positive asymmetry. logarithm has has a the logarithm strongpositive asymmetry.FinaIly, Finally,the clear interpretation, indicating indicating that that each clear economic economic interpretation, each effect depends trait'seffect reachedby other trait's level reached by the the other dependson the level housing attributes. attributes. housing obtain linearity for the response, 2. To obtain linearity for response, once rents rents in logs, were were expressed the logarithmic logarithmic transformation transformation expressedin logs, the also applied was also the continuous continuous variables variablesOCUP, was applied to the OCUP,

Joumal of Business Business & & Economic Economic Statistics, Statistics, January January 1984 Joural

16

were were that that both 19th-century 19th-century and very very moder modem housing housing showed rents rents significantly significantly higher, higher, while while housing housing from from showed 1940 was was the least least expensive. expensive. In a first first approximation, approximation, 1940 we represented represented this effect effect by a second second degree degree polynopolynomial. To avoid avoid the expected expected multicollinearity, multicollinearity, the folfolmial. lowing variables were defined: AGDM AGDM = AGE AGE - AGE, lowing and AGDM2 AGDM2 = AGDM2, AGDM2 , where where AGE is the mean mean age age and for all housing. housing. for With these these decisions decisions made, made, the resulting resulting model model apapWith pears in column column (1) of Table Table 3. The residuals' residuals' distridistripears bution is asymmetric, asymmetric, with with asymmetry asymmetry and and kurtosis kurtosis bution coefficients equal equal to -1.95 -1.95 and and 7.5. The KolmogorovKolmogorovcoefficients Smimov test test leads leads to the rejection rejection of the residuals' residuals' Smirnov

NFL, DET, ACC, ACC, and POPD. POPD. Since the case case for for M2, NFL, OCUP and NFL was was not clear, clear, the decision decision to transtransOCUP form them them was was maintained maintained only only provisionally. provisionally. form Variables denoted denoted by by a in Tables Tables 1 and 2 were were 3. Variables initially rejected rejected because because they they did not supply supply additional additional initially information. information. building age age variable variable showed showed a complex complex and and 4. The building highly nonlinear nonlinear influence, influence, probably probably because because it captures captures highly very different different effects effects and and acts acts as a proxy proxy for for other other very variables. Moreover, Moreover, as already already indicated, indicated, its construcconstrucvariables. was not free free of difficulties. difficulties. To identify identify nonlinear nonlinear tion was effects, its range range was was broken broken down down into several several intervals intervals effects, represented by by a set of dummy dummy variables. variables. The results results represented

Table 3. Regression Regression Results Table Coefficients standard errors) errors) Coefficients(and (andstandard Variables Variables

CONSTANT CONSTANT AGDM AGDM AGDM2 AGDM2 OCUpa OCUPa a M2 M2a a NFL NFLa a DET DET8 MAGL MAGL CHTW CHTW APT APT LESS LESS TWOM TWOM TELPH TELPH CHEAT CHEAT GAR GAR FURN FURN BEXP BEXP a ACC ACC8 POPDa POPD8 HIGH HIGH OLD OLD 2 R R2 Standard Standard error error Numberof Numberof observations observations

•aIn InIogarilhms. logarithms.

(2) (2)

(3) (3)

(.00005) (.00005) -.25 -.25 (.04) (.04) .46 .46 (.07) (.07) .26 (.06) (.06) -.06 -.06 (.03) (.03) .11 .11 (.15) (.15) .66 (.15) (.15) -.OS -.08 (.06) (.06) -.19 -.19 (.OS) (.08) .06 .06 (.09) (.09) .05 .05 (.06) (.06) .11 .11 (.07) (.07) .2S .28 (.10) (.10) .33 .33 (.07) (.07)

7.14 (.60) (.60) -.010 -.010 (.002) (.002) .0002 (.00004) (.00004) -.25 -.25 (.03) (.03) .42 (.05) (.05) .20 (.05) (.05) -.06 -.06 (.02) (.02) -.OOOS -.0008 (.11 (.11)) .53 .53 (.12) (.12) .02 .02 (.05) (.05) -.23 -.23 (.07) (.07) .07 .07 (.07) (.07) .14 .14 (.05) (.05) .09 .09 (.06) (.06) .29 .29 (.OS) (.08) .29 .29 (.05) (.05)

7.57 (.35) (.35) -.012 -.012 (.002) (.002) .0002 (.00004) (.00004) -.25 -.25 (.03) (.03) .42 (.05) (.05) .20 (.04) (.04) -.06 -.06 (.02) (.02)

-.15 -.15 (.13) (.13) .04 .04 (.02) (.02) .16 .16 (.04) (.04) -.06 -.06 (.03) (.03)

-.33 -.33 (.10) (.10) .0009 .0009 (.017) (.017) .10 .10 (.03) (.03) -.05 -.05 (.02) (.02)

.71 .71 .4S .48

(1) (1)

5.77 (.77) (.77) -.012 -.012 (.002) (.002)

.0002 .0002

460

(4) (4) 7.S1 7.81 (.31) (.31) -.008 -.OOS (.002) (.002) .0001 .0001 (.0003) (.0003) -.25 -.25 (.02) (.02) .40 (.05) (.05) .18 .1S (.04) (.04) -.07 -.07 (.02) (.02)

(5) (5)

8.13 S.13 (.33) (.33) -.09 -.09 (.03) (.03) .12 (.OS) (.08) -.OS -.08 (.007) (.007) .39 (.05) (.05) .19 (.04) (.04) -.08 -.OS (.02) (.02)

.54 (.11) (.11)

.48 .4S (.10) (.10)

.49 (.10) (.10)

-.23 -.23 (.07) (.07) .08 .OS (.07) (.07) .14 .14 (.05) (.05) .09 .09 (.06) (.06) .30 .30 (.OS) (.08) .29 .29 (.05) (.05) .06 .06 (.05) (.05) -.3S -.38 (.OS) (.08)

-.23 -.23 (.06) (.06) .18 .1S (.06) (.06) .14 .14 (.04) (.04) .11 .11 (.05) (.05) .26 .26 (.07) (.07) .24 .24 (.05) (.05) .11 .11 (.05) (.05) -.41 -.41 (.07) (.07)

-.25 -.25 (.06) (.06) .19 .19 (.06) (.06) .15 .15 (.04) (.04) .13 .13 (.05) (.05) .25 .25 (.07) (.07) .26 .26 (.05) (.05) .12 .12 (.04) (.04) .41 .41 (.07) (.07)

.10 .10 (.03) (.03) -0.7 -0.7 (.03) (.03)

.09 .09 (.03) (.03) -.06 -.06 (.02) (.02)

.08 .OS (.03) (.03) -.07 -.07 (.02) (.02)

.79 .79

.79 .79

.82 .S2

.82 .S2

.37 .37

.37 .37

.33 .33

.32 .32

451 451

451 451

443 443

443 443

Other Other Alternative Alternative Variables Variables

AGEa AGEa AXIX AXIX OCUP OCUP

Peña Robust Methodsof Building Models RobustMethods Ruiz-Castillo: Peha and and Ruiz-Castillo: RegressionModels BuildingRegression

Outliers Table Possible Outliers Table4. Possible Observatían Observation number numberDit

1 2 3 4 5 6 7 8 9 10 11 11 12 13 14 15 16 17 18 19

v,

DI

ti

.03 .04 .06 .03 .04 .05 .06 .03 .05 .05 .04 .11 .11 .03 .03 .03 .04 .12 .15 .15

.06 .05 .08 .02 .03 .03 .04 .02 .03 .02 .02 .04 .01 .01 .01 .01 .00 .01 .01 .04 .00 .00

-6.6 -6.6 -5.3 -5.3 -5.4 -5.4 -4.1 -4.1 -3.7 -3.7 -3.7 -3.7 -3.7 -3.7 -3.6 -3.6 -3.3 -3.3 -3.0 -3.0 -3.0 -3.0 -2.8 -2.8 -2.8 -2.8 -2.5 -2.5 2.4 -2.4 -2.4 2.4 -0.7 -0.7 .07

= .0 normality with a = l. The distributionappears .01. The distribution appearsto be normalitywith negative numberof negative aa normal contaminated by by aa small small number normal contaminated the median values, since is symmetric around median (whose around it since values, (whose symmetric + and reasonably normal in range the .07 ± value normal and value is .07) range.07 reasonably .07) 1.5 q, where q is the residuals standard deviation. a a, the model's model's robustness robustness The internal The internal analysis analysis of the which are are 19 observations worthy of attention, 19 observations yielded worthy attention,which yielded 18 two (numbers and last two 18 and 19) The last included includedin Table Table 4. The (numbers 19) with influential the are more influential with largest the potentially more are the vi largestVii potentially acvalues, their actual influence is negligible actual influence values, although althoughtheir negligibleac17 observations Di statistic. the Di statistic. The The first first 17 observations cording cording to the ti values were carefully reviewed, with with the the largest values were with carefullyreviewed,with largestti from to suffer from data data the result that that the first 9 appeared suffer the result the first appeared of a zero in the rent transcription errors (omission a zero the rent errors (omission transcription of the next 8 observations were next figure). Since several the observations were Since several figure). provisiondecidedto maintain maintainthem them provisionopen doubt, we decided open to doubt, we new we estimated a new regression estimated with ally. Consequently, regressionwith ally. Consequently, with results 451 451 data data points points with results summarized summarizedin column column (2) of Table the elimination the first Table 3. 3. As can can be seen, elimination of the first seen, the 9 observations the results without changing the results without observationsimproves improves changing The coefficients variables them them substantially. coefficients of most most variables substantially.The with remain constant with the following remain essentially constant the essentially excepfollowing excepthe of MAGL MAGL and APT become become tions: the coefficients tions: (a) and coefficients (a) that they should be elimielimipractically zero, zero, suggesting practically suggestingthat they should POPD the influence of POPD nated from (b) the influence from the the model; nated model; (b) appears now by the the accessibility index; accessibilityindex; capturednow appearsto be captured TELPH and the coefficients of TELPH and and ACC ACC increase, the and (c) coefficients increase, (c) making them them significant. making significant.

information, we estimated estimated a new new In view view of this this information, and POPD, APT, and POPD, model without without the variables variablesMAGL, model MAGL,APT, but introducing introducing the variables previously rejected for the variablespreviously the but rejectedfor BEXP was was model with points. As a result, with 460 data data points. model result, BEXP provisionally included included in the model because, although model because, provisionally although not significant (it has a t value of 1.2), it appears to be has t value (it appears 1.2), significant of Table 3 summaColumn Table summapotentially important. Column (3) important. potentially final model fitted with with 451 451 data data points. rizes model fitted points. rizesthe the final the robustness robustness analysis this for this Next, we repeated Next, repeated the analysis for We model new anomalous anomalous data. data. We model in order order to detect detect new nature the of the 8 observations the atypical observations confirmed nature confirmed the atypical previously commented commented upon, their actual actual inupon, although previously althoughtheir to be generally small judging their fluence appeared fluence be small appeared generally by their judgingby rate, we reestimated model withwe reestimated the model withvalues. At any Di values. any rate, out these presented the results resultspresented these 8 observations, observations,obtaining obtainingthe in column remarkthat Table 3. We We should should remark that (a) column (4) of Table (a) CHEAT, and BEXP, which were and which were the variables TWOM, the variables TWOM,CHEAT, BEXP, become signifinot with a == .05, not formally .05, become formallysignificant significantwith signifiof the coefficients cant doubt; (b) the rest without any the rest the coefficients cant without any doubt; (b) Kolmogorovare not significantly are and (c) the Kolmogorovaltered;and (c) the significantlyaltered; as well well as as tests and Smirnov test, as tests on the the asymmetry Smirov test, asymmetryand kurtosis, lead to the acceptance of the hypothesis of the lead the the kurtosis, acceptance hypothesis the residuals' residuals' normality normality with a = .10. In initial In conclusion, if we compare the latter latterwith with the initial conclusion,ifwe comparethe 18 can model, it can observed that after eliminating the be observed that after the 18 model, eliminating observations ofthe as outliers outliers(3.7% of the observationsthat that we considered consideredas (3.7% total), the residual residualvariance variancehas hasdiminished diminishedby 55%, the 55%,the total),the of explained variability has increased by has proportion increased proportion explained variability and we can reasonably accept hypothesis that 17%, and can the that 17%, reasonablyaccept hypothesis the the residuals residualsare arenormally distributed.Most Most coefficients coefficients normallydistributed. and when this is not the have very slightly, have changed and this when the changed very slightly, model becomes becomes more with a case and and the the model with case more compatible compatible the center economic the to priori economic information: the distance information: distance the center priori of Madrid measured by by the logarithm of the accessibilMadridmeasured the logarithm the accessibiland the that and the fact that a housing unit has two fact unit has two or ity index, ity index, housing more more bathrooms, bathrooms, telephone, telephone, central and buildcentralheating, and buildheating, included another ing expenditures included in another measure, become become expenditures measure, the model significant from variablesin the model presumably freefrom significantvariables presumablyfree values. atypical values. atypical To test test the we have performed the former formerspecification have performed specificationwe maximum likelihood the Box-Cox Box-Cox the estimation of the the maximum likelihood estimation X for transformation rent variable. variable. This for the the rent transformation parameter parameter A This has and 443 done for the models models with with 460,451, has been been done for the and 460, 451, results, which insensitive to The results, which are data. data. The are practically insensitive practically different of the continuous explanatory differentspecifications the continuous specifications explanatory variables, presented in Table are presented Table 5. 5. variables,are we keep As we the keep eliminating observations,the eliminating atypical atypical observations, maximum function for A gradually the likelihood likelihood function maximum of the for X gradually

Maximum Values Table Table 5. Maximum Values of the Likelihood Likelihood Function Function for for Different Different Specifications Specifications of the Box-Cox Parameter and the Sample Size Sample Size x n

-.2 -.2

460 451 451 443

-4866 -4866 -4635 -4635 -4493 -4493

-.1 -.1 -4816 -4816 -4605 -4605 -4469 -4469

.0 -4775 -4775 -4585 -4585 -4454 -4454

.1 -4745 -4745 -4574 -4574 -4447 -4447

17

.2 -4727 -4727 -4574 -4574 -4450 -4450

.3 -4721 -4721 -4584 -4584 -4464 -4464

.4 -4729 -4729 -4605 -4605 -4489 -4489

.5 -4750 -4750 -4637 -4637 -4524 -4524

18

Joumal Joural of Business Business & &Economic EconomicStatistics. January1984 Statistics,January

approaches zero. The The maximum maximum is reached X == .3 reachedfor for A approacheszero. the full full sample, with with 451 .1 with with the 451 data, and .1 with 443 data, and sample, .2 with data. 95%confidence In the latter latter case, data. In confidenceinterval intervaldoes does case, a 95% not inelude the logarithmic transformation not include the logarithmic transformation(A (X == O). 0). Although might still this suggests thatthe the model model might stillcontain contain Althoughthis suggeststhat furtheroutliers, did accept the logarithmic further transforoutliers,we did acceptthe logarithmictransformation because it is reasonable an as adequate mation as reasonablefrom from an adequate because economic of view and and is not not dramatically economic point point ofview dramaticallyrejected rejected the empirical evidence (see Atkinson 1982 1982 for for a empirical evidence (see Atkinson by the recent analysis the transformation transformation parameters' recent senparameters'senanalysisof the sitivity outliers). sitivityto outliers). As regards regards the have already already we have the explanatory variables,we explanatoryvariables, the first first exploratory did not not pointed models did that the exploratorymodels pointed out that NFL variathe OCUP whetherto express OCUP and and NFL variaindicate whether indicate expressthe bles with logarithmic transformation. bles with or without without a logarithmic transformation.To we performed the following decide this issue, issue, we performed the decide this following 2 x 2 factorial the sums which the factorial experiment sums of squared squared experiment in which residuals are are presented presented for for each residuals each possible possiblespecification: specification: In OCUP OCUP In OCUP NFL 46.25 NFL 45.34 46.25 45.14 In NFL 44.22 44.22 InNFL The that the numberof floors floorsshould should be The results resultssuggest the number suggestthat while the logs, while the years in logs, should not not be occupancy should years of occupancy transformed. transformed. The whose specification The last was open last variable variable whose specification was open to doubt building age. We searched searchedfor for the the best best was the the building doubt was age. We nonlinear specification using the procedure suggested the procedure nonlinear specificationusing suggested by Box and Tidwell but unfortunately Tidwell(1962), the comcomBox and (1962), but unfortunatelythe by putation algorithm Finally, we did not converge. we apapputation converge. Finally, algorithmdid plied the First, among plausible the following the plausible criterion.First, plied followingcriterion. among the transformations, choose the one generating the smallest smallest choose the transformations, generatingthe sum Next, study possibility of the possibility residuals.Next, sum of square squareresiduals. study the with one or or more that specification supplementing more of specificationwith supplementingthat the dummy the variables AXIX, AXIX, A4164, A4164, or A6574. A6574. dummy variables transformaThis procedure leads leads to the logarithmic transformaThis procedure the logarithmic coeffition, the dummy Since the the coeffiby the AXIX. Since correctedby tion, corrected dummy AXIX. that for forAXIX cient the log of AGE was negative and that AGE was AXIX cient of the negativeand with our our consistent with was positive, positive, this was this formulation formulation is consistent information relationship's pattern: pattern: ceteris pariceteris parithe relationship's informationon the the smaller smallerthe the housing housing bus, the the greater building age, the building bus, age,the greaterthe whose solid solid rent buildings, whose the 19th-century for the rent except except for 19th-centurybuildings, reconstruction other unobserved unobservedcharacteristics) construction(or (or other characteristics)require an an upward correction. upwardcorrection. quire the Final Model 4.3 The Selection 4.3 Selection of the

rents for for housing Since predict market we want market rents Since we want to predict housing units whose government controlled, relevant aregovernment whose rents rentsare units controlled,a relevant mean criterion the number numberof regressors the mean choose the criterionto choose regressorsis the this error squared prediction error. error. An estimate estimate of this error squared prediction serving Mallows different models models is the the Mallows compare different serving to compare 2 + 2p - n, where SSR statistic: statistic: C p = (SSR p /u ) + Cp (SSRp/a2) SSRpp is the a22 is an sum of of squared residuals with p regressors, regressors, u model unbiased estimator estimator ofthe variancein the the model of the residual residualvariance unbiased with the largest number and is the of variables, and n the sample withthe largestnumber variables, sample subset the selection selectionof the the subset size. The C permits the size. The statisticpermits p statistic Cp

that maximizes the model's that maximizes the predictive capacity model'spredictive mincapacity(or (or minthe mean imizes imizes the mean squared squarederror). error). This criterion criteriondid did not not lead lead to the This the inelusion inclusion of new new variables our previous variables to our previous list. best model list. The The best model is preprecolumn (5) sented sented in column has a C Table 3, and and has 12.6 p of 12.6 (5) of Table Cp 17 explanatory with with 17 variables. Once Once this this model was model was explanatoryvariables. we repeated the internal internalanalysis each obobanalysis of each selected, selected,we repeatedthe servationand and searched servation searchedfor for other othersources sourcesofspecification of specification errors. The results errors.The as follows: resultswere wereas follows: 1. The The maximum maximum value for the value for the t statistic for the the 1. statistic for Studentized Studentizedresiduals residualswas was 3.5. The two two next next values valueswere 3.5. The were 3.1, the rest while the rest of the the data data presented 3.1, while presentedno problems. problems. These the explanatory These three three observations observationsare are elose close to the explanatory variables' variables'center center of gravity, influence on that their their influence gravity,so that estimatesis small. At any rate, there parameter therewas was no small. At parameterestimates rate, any observationwith with a high value. we convalue. Therefore, we conobservation Di Therefore, high that the eluded that cluded the final final model model is robust robust to outliers. outliers. 2. Residual did not Residual plots plots did not show show any evidenceof specany evidence specification ification errors. errors. The The residual residual distribution distribution is normal normal according the Kolmogorov-Smirnov test with with a == accordingto the Kolmogorov-Smirov test .05. .05. 3. Finally, estimationsituation 3. Finally, the estimation adequate withsituationis adequate without multicollinearity problems: the condition index out multicollinearity of the condition index problems: the X'X matrix matrix was was only the 8.8 (see Belsley, Kuh, and 8.8 and (see Belsley, Kuh, only Welsch 1980). Welsch 1980).

4.4 The 4.4 The Economic EconomicInterpretation Interpretation In the In the first place, the aboye that first place, above analysis indicates that analysis indicates the MMA 82%of rent differencesfor for market markethousing rent differences housing in the MMA 82% 17 characteristics were can can be explained the 17 characteristicsthat that were explained by by the empirically relevant. While the information on strucWhile information strucrelevant. the empirically tural traits traits is rather ratherrich, data on attributes tural attributesreferring rich, data referringto MMA 85 zones of the MMA were very housing location in the location the 85 the zones werevery housing poor. Thus, it is not surprising that the latter, that the the latter, the poor. Thus, surprising accessibility index A Ce, the socioeconomic index index the socioeconomic index ACC, accessibility ALTA, and buildings age index OLD and the the buildings OLD explained ALTA, explained age index 14 struconly 4% the while 4% ofthe observed variability, while the of observed the 14 struconly variability, tural the remaining 78%. Ifwe 78%. If we turalcharacteristics characteristicsexplained the explained remaining had public goods levels, pollution local public had data data on local goods levels, pollution of differenttypes, and the the distribution distributionof nonresidential nonresidential different types, and land we should expect that the location variables' we that the location variables' land uses, should uses, expect relative importance would have been greater. would have been relative importance greater. In variablesappear with the In the all variables the second second place, the appear with place, all for expected algebraic As the coefficients' intersigno for the coefficients' interexpected algebraicsign. pretation, the comments are are in order: order: the following pretation, followingcomments 1. For l. For the the variables variables in logarithms logarithms AGE, AGE, M2, M2, NFL, NFL, the elasticity DET, and ACC the measure the and ACC the coefficients coefficientsmeasure DET, elasticity size measured measured directly. increasein housing 10%increase Thus,a 10% housingsize directly.Thus, in squared meters leads leads to a 4% 4%increase which increase in rents, rents,which squaredmeters are decreasing scale in indicates returns to scale that there there are indicates that decreasingreturns this variable. The the accessibility -.4 elasticity The -.4 for the this variable. elasticity for accessibility two dwellings identicalin every index somewhatlow; index is somewhat low;two every dwellingsidentical respect, fora difference differenceof 50% 50%on transportation transportation exceptfor respect,except would have 20% time the Central Business District, District, would have a 20% CentralBusiness time to the The .19 for number difference for number of .19 elasticity difference in rento rent. The elasticity

Pehaand and Ruiz-Castillo: Ruiz-Castillo: RobustMethods Peña Robust Methodsof Building Models BuildingRegression RegressionModefs

floors have an an immediate perfloorsdoes does not have immediateinterpretation; interpretation;perhaps taller buildings are taller buildings are more more desirable desirable on average haps average some characteristic characteristicnot reflected because they they possess possess sorne reflected because in our for housing housing the -.08 -.08 elasticity our survey. Finally, the elasticity for survey. Finally, deterioration reasonable. deteriorationstate state seems seems reasonable. untransformed variables 2. For continuous untransformed variables For the the continuous OCUP, and ANTIG the coefficients, ALTA, and ANTIG the coefficients,multiplied OCUP,ALTA, multiplied the percentage rent increase increaseattribattribpercentage in rent 100, represent representthe by 100, utable to a unit the corresponding characunit increase increasein the utable correspondingcharacteristic. can be interpreted for occupancy The 8% 8%for teristic.The occupancyyears interpreted yearscan as annual rate rent inflation inflation during the 1965as the the annual rate of rent during the 1974 premiums for for location The market market premiums location in 1974 periodo period. The zones or for for better better socioeconomic socioeconomic conmore more modern moder zones ditions 9%. 7% to 9%. ditions are, are, respectively, respectively,7% 3. the dependent variableappears the 3. When When the dependentvariable appearsin logs, logs, the expression where (jj coefficient 1)-100, 100, where (exp (jj expression(exp 1fjis the coefficient 13j-- 1). as the the percentage of a dummy percentage variable,is interpreted interpretedas dummy variable, change rents due presence of the the attribute attributein due to the presence change in rents question. For such For the the 9 significant variables,such question. dummy variables, significantdummy as follows: effects, percent, are are as follows: effects,expressed expressedin percent,

TWOM CHTW LESS TWOM AXIX CHTW FURN AXIX 20.5 62.9 -21.8 -21.8 12.7 62.9 12.7 29.0 TELF CHEAT GAR BEXP TELF BEXP CHEAT GAR 13.3 16.0 14.0 28.3 13.3 16.0 14.0 of fit is very In very satisfactory, In conclusion, the goodness conclusion,the satisfactory, goodnessoffit and rent differences differences in the economic and the economic explanation explanation of rent 17 significant terms variables is, of the final final model's model's 17 termsofthe is, on significantvariables balance, reasonable. balance,quite quite reasonable. 5. CONCLUSIONS CONCLUSIONS

To prevent least squares' outliers, squares'great sensitivityto outliers, preventleast greatsensitivity was recrobustnesswas the model's model's robustness recan internal study an internal study of the ommended the potentially Section 3 highlighting ommended in Section potentially highlightingthe influential as well those that, well as as those influentialobservations, fact, that, in fact, observations,as The diagonal terms clearly the estimation estimationresults. results.The affectthe diagonalterms clearlyaffect the indication of the of the matrix V is a good the projection good indication projectionmatrix measformer, while statisticis adequate adequate to measwhile the the Cook Cook Di Di statistic former, ure the latter. Moreover, servesto determine determine statisticserves the latter. ure Moreover,a t statistic which observations can consideredatypical. can be considered which observations atypical. could Our this methodology Ourempirical methodologycould empiricalapplication applicationof this placed in the Belsley, Kuh, the context context of Chapter be placed Kuh, Chapter4 of Belsley, and the Harrison Harrison These authors authorsanalyze Welsch(1980). and Welsch analyzethe (1980). These charand marketvalues and charvalues and data on market Rubinfeld(1978) and Rubinfeld (1978) data acteristics units in the the acteristicsof 506 owner-occupied owner-occupieddwelling dwellingunits firstplace, detect Boston Metropolitan Area. Area. In In the place, they the first BostonMetropolitan they detect the such a regression. residualsin such of OLS residuals the nonnormality regression. nonnormalityofOLS Then that sorne some observe that Then they M-estimators,observe compute M-estimators, they compute and verify coefficients modified, and verify that that are considerably coefficientsare considerablymodified, follow a normal the normal residuals follow Studentized residuals the weighted weighted Studentized distribution. hand, using some statistics statistics the other other hand, On the distribution.On using sorne 10%of the the sample different from ours, that 10% detect that differentfrom ours, they sample they detect consists and informally anainformally anaof influentialobservations, consistsofinfluential observations,and lyze deleting 5 OLS after afterdeleting the consequences consequencesof applying applyingOLS lyze the data points. In In neither they investigate investigate the the effect effect neithercase casedo they datapoints.

19

of outliers outliers on the the functional functionalform. form. In Section Section 4, we we also also found In found nonnormality OLS nonnormalityof OLS residuals an exploratory residuals in an However, when model. However, when we we exploratorymodel. the robustification apply robustification strategy we recommend, recommend, we we apply the strategywe find that (a) regarding the find that decisionsregarding the model's model's functional functional (a) decisions form variables to include form and and the the variables include in it can can be considconsiderably by a few few outliers; the residuals' residuals' lack affectedby lack outliers;(b) erablyaffected (b) the can be attributed attributedto sorne of normality some identified identifieddata data normalitycan coding errorsand and other other anomalous anomalous observations; and observations;and coding errors (c) the elimination elimination of outliers outliers improves the statistical statistical (c) the improves the model model of rent rent housing values in the the MMA, MMA, enhancing housing values enhancing its its economic economic meaning. meaning. ACKNOWLEDGMENTS ACKNOWLEDGMENTS

This work work is part This part of a larger project financed financedby the the largerproject Spanish Ministeriode de Economía Economia y Comercio. Comercio.The The auauSpanishMinisterio thors wish to acknowledge thors wish the cooperation Jose acknowledge the cooperation of José Antonio who was was responsible Antonio Quintero, for the the comcomQuintero, who responsiblefor putational aspects this work, work, as as helpful as well well as computational aspectsof this helpfulcomments this journal's ments by by this journal's referees. referees. 1982. Revised 1983.] [Received Revised July July 1983.] September1982. [ReceivedSeptember

REFERENCES REFERENCES P. F., ANDREWS, P. and PREGIBON, PREGIBON, D. (1978), the Outliers Outliers ANDREWS, F., and (1978),"Finding "Findingthe that Matter," Journal01 the Royal that Royal Statistical StatisticalSociety. Ser. B, Matter,"Journal B, 40, 40, of the Society, Ser. 85-93. C. (1982), Transformations A. C. Diagnostics, Transfonnations ATKINSON, A. ATKINSON, (1982),"Regression "RegressionDiagnostics, and and Constructed Royal Statistical SaciConstructedVariables," Journal01 the Royal StatisticalSociVariables,"Journal of the ety, Ser.B, 1-36. 44, 1-36. B, 44, ety, Ser. J. (1973), "RecentEmpirical Workon on the the Determinants M. J. Empirical Work Detenninants BALL, M. BALL, (1973), "Recent of RelativeHouse House Prices," ofRelative Prices," Urban UrbanStudies, Studies, 10,213-231. 10, 213-231. and LEWIS, T. (1978), Outliersin in Statistical BARNETT, V., and Data, StatisticalData, BARNETT,V., LEWIS,T. (1978), Outliers John Wiley. John Wiley. R. E. BELSLEY, D. D. A., KUH, E., and WELSCH, E. (1980), Regression BELSLEY, A., KUH, E., and WELSCH,R. (1980), Regression Diagnostics, John John Wiley. Diagnostics, Wiley. Tests on Variances," E. P. P. (1953), and Tests BOX, G. E. Variances," BOX, G. (1953), "Non-nonnality "Non-normalityand 318-335. Biometrika. 40, 318-335. Biometrika, and Modeling," in Robustness in Statis"Robustnessand Statis- - (1979), Robustness in (1979), "Robustness Modeling,"in L. Launer eds. R. R. L. and G. G. N. Wilkinson, AcademicPress. tics, Launer and Wilkinson, Academic Press. tics, eds. in Scientific Inferencein ScientificModelModel- - (1980), and Bayes' (1980), "Sampling Bayes'Inference "Samplingand the Royal Staand Robustness" Journal 01 Royal StaIing Robustness"(with (with discussion), discussion),Journal of the ling and Ser.A, 383-430. tisticalSociety, tistical A, 383-430. Society,Ser. "A Bayesian G. E. and TIAO, C. G. G. (1968), E. P., Bayesian Approach BOX, G. TIAO,C. BOX, P., and (1968), "A Approachto Sorne Biometrika, 55, Some Outlier OutlierProblems," 119-129. Problems,"Biometrika, 55, 119-129. - - (1973), Bayesian Inference in Statistical StatisticalAnalysis, Inferencein (1973), Bayesian Analysis, Reading, Reading, Mass.: Mass.:Addison-Wesley. Addison-Wesley. G. E. and TIDWELL, "Transformation BOX, G. E. P., P., and P. W. W. (1962), of BOX, TIDWELL,P. (1962), "Transfonnation the Independent Variables," 4,531-550. the Independent Variables,"Technometrics, Technometrics, 4, 531-550. E. P. P. (l979a), CHEN, Assumptions G. E. and BOX, G. G., "ImpliedAssumptions BOX, G. (1979a), "Implied G., and CHEN, G. TechnicalReport RobustEstimators," for Estimators," Technical Some Proposed forSorne 568, ReportNo. 568, ProposedRobust Statistics. University of Wisconsin, Wisconsin, Madison, Dept. of Statistics. Madison,Dept. University - - (1 979b), "A Technical Report "A Study Real Data," Data,"Technical 569, ReportNo. 569, (1979b), Studyof Real of Statistics. University of Wisconsin,Madison, Madison,Dept. Dept. ofStatistics. UniversityofWisconsin, Via aa Bayesian RobustificationVia "FurtherStudy - - (l979c), Bayesian (1979c), "Further Study of Robustification Approach," Technical Report No. 570, 570, University University of Wisconsin, Wisconsin, TechnicalReport Approach," Statistics. Madison, Dept. Madison, Dept. of Statistics. in Linear Observationsin COOK, Linear of InfluentialObservations R. D. (1977), "DetectionofInfluential COOK,R. (1977),"Detection 15-18. Regression," Technometrics, Technometrics,19, 19, 15-18. Regression," in Linear - - (1979), Linear Regression," JourObservationsin "InfluentialObservations Regression,"Jour(1979),"In!luential 169-17. StatisticalAssociation, American Statistical Association, 74, nal olthe 74, 169-17. of the American the Accuracy COOK, R. D., and P. (1981), "On the and PRESCOTT, Accuracyof PRESCOTT,P. (1981), "On COOK, R.

20 20

of Business &Economic EconomicStatistics, Joural of Business& 1984 Joumal Statistics,January January1984

in Linear for Detecting Outliersin BonferroniSignificance Levels for Linear Bonferroni SignificanceLevels DetectingOutliers 59-63. 23, Models," Models,"Technometrics, 23, 59-63. Technometrics, R. D., and WEISBERG, an S. (1980), "Characterization of an COOK, COOK,R. D., and WEISBERG,S. (1980),"Characterization in Functionfor for Detecting InfluenceFunction InfluentialCases Cases in Empirical EmpiricalInlluence DetectingInlluential 495-508. Regression," 22, Technometrics, 22, 495-508. Regression,"Technometrics, P. H. H. (1949), "Noteon on Sorne Some Properties of Maximum DlANANDA, DIANANDA,P. (1949),"Note PropertiesofMaximum LikelihoodEstimates," Likelihood Estimates,"Proceedings Philosophical Proceedingsolthe of theCambridge CambridgePhilosophical 536-544. Saciety, 45, 536-544. Society,45, andJOHN, N. R., J. A. A. (1981), "InfluentialObservations Observations DRAPER, R., and JOHN,J. DRAPER,N. (1981),"Influential in Regression," and Outliers Outliersin and 23, 23, 21-26. Technometrics, Regression,"Technometrics, Z. (oo.) PriceIndexes Indexesand and Quality GRILLlCHES, GRILLICHES,Z. (ed.) (1971), (1971), Price QualityChange, Change, HarvardUniversity Press. Harvard UniversityPress. "Premiumand and Protection Protectionof Several SeveralProceGUTTMAN, ProceGUTTMAN,1.I. (1973), (1973),"Premium WithOutliers duresfor forDealing OutliersWhen WhenSample Sizesare areModerate Moderate dures SampleSizes DealingWith to Large," to 15,385-404. Technometrics, 15, 385-404. Large,"Technometrics, andRUBINFELD, D. L. L.(1978), "HedonicHousing HARRISON, HARRISON,D., D., and RUBINFELD,D. (1978),"HOOonic Housing Pricesand and the the Demand for C1ean Demandfor CleanAir," Journal01Environmental Prices Air,"Journal of Environmental and Management, Economicsand Economics 5, 81-102. Management,5, and WELSCH, R. E. E. (1978), in "TheHat Hat Matrix Matrixin HOAGLlN, C., and HOAGLIN,D. c., WELSCH,R. (1978),"The and ANOVA," TheAmerican 17-22. American Statistician, Regression ANOVA,"The Statistician,32, 32, 17-22. Regressionand R. V. "An Introduction V. (1979), Introductionto Robust Robust Estimation," in HOGG, HOGG, R. (1979), "An Estimation,"in L. Launer Robustnessin in Statistics, eds. R. R. L. Launerand and G. G. N. Wilkinson, Robustness Statistics,eds. Wilkinson, Press. AcademicPress. Academic J. (1964), P. J. "RobustEstimation Estimationofa of a Location LocationParameter," HUBER, HUBER,P. (1964),"Robust Parameter," Annals Annals 01Mathematical Statistics,135, 135,73-101. of MathematicalStatistics, "Robustnessand and Designs," in A Survey - - (1975), (1975), "Robustness Designs,"in SurveyolStatistical of Statistical and Linear LinearModels, J. N. Srirastarra, ed. J. North-Holland. Design North-HoIland. Models,ed. Srirastarra, Designand RobustStatistics, John Wiley. - - (1981), Statistics,John (1981),Robust Wiley.

JEFFREYS, H. (1961), OxfordClarendon Clarendon JEFFREYS,H. (1961), Theory Theory01 of Probability, Probability,Oxford Press. Press. KRASKER, W.S., andWELSCH, E.(1982), R. E. "EfficientBoundOOBoundedKRASKER,W. S.,and WELSCH,R. (1982),"Efficient influenceRegression influence Journalolthe AmericanStatisStatisof theAmerican Estimation,"Journal RegressionEstimation," ticalAssociation, tical 595-604. Association,77, 77, 595-604. MILLER, R. G. G. (1977), in MuItiple MILLER,R. (1977), "Developments "Developmentsin MultipleComparisons Comparisons 1966-1976," Journal01 the American AmericanStatistical StatisticalAssociation, of the 1966-1976,"Journal Association,72, 72,

779-788. 779-788. MOSTELLER, and TUKEY, J. W. W. (1977), Data Analysis and MOSTELLER,F., F., and TUKEY, J. (1977), Data Analysisand Regression, Regression,Addison-Weslsy. Addison-Weslsy. PEÑA, J. Aspects andRUIZ-CASTlLLO, J. (1983), "Distributional PENA,D., D., and RUIZ-CASTILLO, (1983),"Distributional Aspects of Public Public Rental Rental Housing and Rent Rent Control in Spain," Control Policies Policies in Housingand Spain," Journal Journal01Urban Economics,forthcoming. of UrbanEconomics, forthcoming. QUlGLEY, J. M. M. (1979), Have We We LearnOO about Urban "WhatHave Learnedabout Urban (1979), "What QUIGLEY,J. in Current Housing CurrentIssues Issues in in Urban UrbanEconomics, eds. Markets?,"in Economics,eds. HousingMarkets?," P. P. Mieszkowski Mieszkowskiand andM. M. Straszheim, TheJohns JohnsHopkins Straszheim,The HopkinsUniversity University Press. Press. ROSEN, S. (1974), Pricesand "HedonicPrices and Implicit Markets:Product Product ROSEN,S. (1974), "Hedonic ImplicitMarkets: in Pure DitTerentiation Differentiationin PureCompetition," Journalof PoliticalEconomEconomCompetition,"JournalolPolitical ics, ics, 82, 82, 34-55. STIGLER, S. M. M. (1973), "SimonNewcomb, and the the STIGLER,S. Newcomb,Perey (1973), "Simon Daniel, and PereyDaniel, of RobustEstimation History Estimation1855-1920," Journalolthe American, oftheAmerican, 1855-1920,"Journal HistoryofRobust StatisticalAssociation, Statistical Association,63, 63, 872-879. TUKEY, W. (1960), J. W. "A Survey of Samplingfrom fromContaminated Contaminated TUKEY,J. (1960), "A SurveyofSampling in Contributions Distributions, Contributionsto to Probability and Statistics, ed. Distributions,in Statistics,oo. Probabilityand J. Olkin, StanfordUniversity J. Press. Olkin,Stanford UniversityPress. WElSBERG, Applied Linear Linear Regression, S. (1980), John Wilay. WEISBERG,S. (1980),Applied Regression,John Wiley.

Suggest Documents