Single-index model selections

BY PRASAD A. NAIK AND CHIH-LING TSAI

Graduate School of Management, University of California, Davis, California 95616, U.S.A.
[email protected]  [email protected]

SUMMARY
We derive a new model selection criterion for single-index models, AICC, by minimizing the expected Kullback–Leibler distance between the true and candidate models. The proposed criterion selects not only the relevant variables but also the smoothing parameter for an unknown link function. Thus, it is a general selection criterion that provides a unified approach to model selection across both parametric and nonparametric functions. Monte Carlo studies demonstrate that AICC performs satisfactorily in most situations. We illustrate the practical use of AICC with an empirical example, modelling the hedonic price function for automobiles. In addition, we extend the applicability of AICC to partially linear and additive single-index models.

Some key words: Hedonic price function; Local polynomial regression; Sliced inverse regression; Smoothing parameter estimator; Variable selections.

1. INTRODUCTION

Regression analysis is commonly used to understand the relationship between a response variable y and a vector of regressors x. In many situations, a linear regression model E(y) = x′β is used to assess the impact of the regressors on the expected response E(y). To make this analysis more flexible, single-index models of the type E(y) = g(x′β) can be used, in which the link function g is unknown. One advantage of single-index models is that they mitigate the risk of misspecifying the link function: Horowitz & Härdle (1996) have shown that misleading results are obtained if a binary probit model is estimated by specifying the cumulative normal distribution as the link function rather than estimating g by nonparametric methods. Other advantages of single-index models are listed in Horowitz (1998, §2.2), including the ability to overcome the curse of dimensionality and the capability to extrapolate beyond the support of x.

Because of these advantages, single-index models have been studied extensively in both the statistical and economic literatures (Powell, Stock & Stoker, 1989; Duan & Li, 1991; Härdle, Hall & Ichimura, 1993; Ichimura, 1993; Horowitz & Härdle, 1996; Carroll, Fan, Gijbels & Wand, 1997). For example, Duan & Li (1991) provide a non-iterative approach, called sliced inverse regression, for estimating the direction of β even when the link function g is not known. These studies assume that the set of regressors x contains useful information for predicting the response variable. If this set contains irrelevant regressors, a scenario that is quite likely in high-dimensional environments with hundreds of variables (Naik, Hagerty & Tsai, 2000), then the precision of the parameter estimates, as well as the accuracy of forecasts of the response variable, will deteriorate (Altham, 1984). Consequently, the exclusion of irrelevant variables from the set of regressors used in a single-index model becomes crucial. Hence, the objective of this paper is to contribute to the single-index modelling literature by deriving an appropriate model selection criterion.

Previous research has investigated model selection in the context of parametric and nonparametric regression models. For parametric regression models, Hurvich & Tsai (1989) derived a bias-corrected version of Akaike's (1973) information criterion, AICC, that leads to proper model choices, especially when the sample size is small or the number of variables is large.

For nonparametric regression models, Hurvich, Simonoff & Tsai (1998) (hereafter referred to as HST) obtained an improved version of AICC for linear smoothers, and showed that the resulting estimated regression functions are not undersmoothed relative to those obtained using generalized cross-validation (Craven & Wahba, 1979) or Akaike's (1973) information criterion. Simonoff (1998) extended the application of AICC to categorical data smoothing as well as density estimation; and recently, Simonoff & Tsai (1999) considered the problem of selecting variables and smoothing parameters for semiparametric and additive models. Although their work examined several models, they assumed that the link function was known. When the link function is not known, a natural approach is to combine the above selection criteria by incorporating an additive penalty term for the unknown link function. However, we find that such a strategy leads to underfitting. Hence, we derive an appropriate model selection criterion for the class of single-index models by minimizing the expected Kullback–Leibler distance. The resulting criterion simultaneously chooses relevant regressors and a smoothing parameter for an unknown link function. Thus, this criterion provides a unified approach to model selection across both parametric and nonparametric functions.

The rest of the paper is organized as follows. §2 describes the derivation of the proposed criterion, AICC. §3 presents Monte Carlo results showing that AICC performs well when either the sample size or the signal-to-noise ratio is not small; when both are small, the use of AICC may result in underfitting. Hence, in §4, we show how to apply AICC cautiously to mitigate the risk of underfitting, via an example in which we estimate hedonic price functions using a small sample and a weak signal. In addition, we show how single-index models yield empirical insights that are not available from linear regression models. Finally, in §5, we generalize the applicability of AICC to partially linear as well as additive single-index models, and conclude by suggesting possible avenues for future work.

2. DERIVATION OF AICC

We first describe the single-index model and its estimation, and then derive the model selection criterion AICC.

2.1 Model structures
Suppose that data Y = (y1, ..., yn)′ are generated from the true model

    Y = g0(X0β0) + ε,    (1)

where X0 = (x10, ..., xn0)′ is an n × p0 matrix of random regressor values, xi0 and β0 are p0 × 1 vectors, g0(X0β0) is an unknown n × 1 vector with ith component g0(xi0′β0) (i = 1, ..., n), ε for given X0 = x0 is distributed as N(0, σ0²In×n), and σ0 is an unknown scalar. In addition, we assume that g0 is a differentiable function and that ||β0|| = 1 for identification; see Carroll et al. (1997). Two well-known models are special cases of equation (1): (i) the linear regression model (g0 is the identity function); and (ii) the nonparametric regression model (p0 = 1). Hurvich & Tsai (1989) and HST (1998) have obtained the AICC criterion for models (i) and (ii), respectively.

Let the candidate model be

    Y = g(Xβ) + u,    (2)

where X = (x1, ..., xn)′ is an n × p matrix of random regressor values, xi and β are p × 1 vectors, g(Xβ) is an n × 1 vector with ith component g(xi′β) (i = 1, ..., n), u for given X = x is distributed as N(0, σ²In×n), and σ is an unknown scalar. In addition, we assume that g is an unknown differentiable function and that β has unit norm. To assess the distance between the true and candidate models, we next describe the estimation of single-index models.

2.2 Model estimation
Single-index models can be estimated by iterative or direct methods (Horowitz, 1998, Ch. 2). In the iterative case, we apply nonparametric regression to obtain a consistent estimate ĝ, and solve a nonlinear optimization problem to obtain a consistent estimate β̂; e.g. the maximum quasi-likelihood estimate β̂mql (Carroll et al., 1997). The iterative methods are computationally intensive because they require an estimate of the nonparametric mean regression at each data point to compute an objective function, which may be non-convex or multimodal (Horowitz, 1998, p. 35), and whose iterative optimization yields β̂. By contrast, the direct methods are not iterative, and provide a consistent estimate of β without requiring an estimate of g; e.g. the sliced inverse regression estimate β̂sir (Duan & Li, 1991). Hence, direct methods are appealing in high-dimensional data analysis (Naik, Hagerty & Tsai, 2000). After obtaining β̂sir, we apply local polynomial regression (Fan & Gijbels, 1996, p. 19; Simonoff, 1996, p. 139) with a Gaussian kernel to estimate the unknown link function by ĝ(t), where t = Xβ̂sir. Thus, we can estimate ĝ and β̂ by either the iterative or the direct approach. Next, we compute

    σ̂² = {Y − ĝ(Xβ̂)}′{Y − ĝ(Xβ̂)}/n

and use (ĝ, β̂, σ̂²) to select the appropriate model from a broad class of candidate models via the model selection criterion, AICC, derived below.

2.3 AICC criterion
A useful measure of the discrepancy between the true and candidate models is the

Kullback–Leibler information. Omitting terms that are not functions of the candidate model (Linhart & Zucchini, 1986, p. 18), the resulting Kullback–Leibler information is

    d(g, β, σ²) = E0{−2 log f(Y)}
                = n log(2πσ²) + E0[{g0(X0β0) + ε − g(Xβ)}′{g0(X0β0) + ε − g(Xβ)}/σ²]
                = n log(2πσ²) + nσ0²/σ² + {g0(X0β0) − g(Xβ)}′{g0(X0β0) − g(Xβ)}/σ²,    (3)

where f(Y) denotes the likelihood for the candidate model (2), and E0 denotes expectation under the true model; the cross-product terms vanish in the last step because E0(ε) = 0. Replacing (g, β, σ²) in (3) with the corresponding estimators from §2.2, we obtain the discrepancy measure

    d(ĝ, β̂, σ̂²) = n log(2πσ̂²) + nσ0²/σ̂² + {g0(X0β0) − ĝ(Xβ̂)}′{g0(X0β0) − ĝ(Xβ̂)}/σ̂².

To judge the quality of the estimator ĝ(Xβ̂) with respect to the data, we compute ∆ = E0{d(ĝ, β̂, σ̂²)}. Ignoring the constant n log(2π), we have

    ∆ = E0(n log σ̂²) + nσ0²E0(1/σ̂²) + E0[{g0(X0β0) − ĝ(Xβ̂)}′{g0(X0β0) − ĝ(Xβ̂)}/σ̂²].    (4)

Given the collection of competing candidate models, we select the model that results in the smallest ∆ (Hurvich & Tsai, 1989). In practice, ∆ is usually not computable since it depends on the unknown function g0(X0β0). Hence, to facilitate the computation of ∆, we make the following assumptions.

(A.1) The parametric component of a candidate model includes the parametric component of the true model; that is, the columns of X can be rearranged so that X0β0 = Xβ*, where β* = (β0′, β1′)′ and β1 is a (p − p0) × 1 vector of zeros.

(A.2) There exists a smoother matrix Hnp such that g̃(Xβ*) ≡ HnpY; that is, g̃ is the projection of Y through the hat matrix Hnp.

(A.3) E0{g̃(Xβ*)} ≈ g0(Xβ*).

(A.4) ĝ(Xβ̂) − g̃(Xβ*) ≈ Ṽ(β̂ − β*) ≈ Hp{Y − g̃(Xβ*)}, where Hp = Ṽ(Ṽ′Ṽ)⁻¹Ṽ′, Ṽ = ∂g̃(Xβ)/∂β|β=β* = g̃·(Xβ*)X, and g̃· is the derivative of g̃.

In deriving AICC for parametric models, Hurvich & Tsai (1989) made Assumption (A.1). In the derivation of AICC for nonparametric models, HST (1998) assumed (A.2) and (A.3). For semiparametric and additive model selection, Simonoff & Tsai (1999) made Assumptions (A.1), (A.2) and (A.3). Here, in order to derive the AICC criterion for single-index models, we add Assumption (A.4). The first approximate equality in this assumption is based on the following reasoning: we apply a linear Taylor expansion to get g0(Xβ̂) ≈ g0(Xβ*) + V0(β̂ − β*), where V0 = ∂g0(Xβ*)/∂β*. Then, using local polynomial regression, we replace g0(Xβ̂) by its estimate ĝ(Xβ̂), and, using Assumption (A.3), we replace g0(Xβ*) by g̃(Xβ*) and V0 by Ṽ, respectively. The second approximate equality in Assumption (A.4) is motivated by nonlinear regression models; see Seber & Wild (1988, equation 2.16). Following HST (1998) and Simonoff & Tsai (1999), we note that the above assumptions are made only to facilitate the derivation of a selection criterion whose performance is satisfactory in finite samples (see §3).

g (Xβ ∗ ) = −Hnp ε and Under the assumptions (A.1) through (A.4), we have g0 (Xβ ∗ )−˜ ˆ ≈ −Hp {ε + g0 (Xβ ∗ ) − g˜(Xβ ∗ )} ≈ −(Hp − Hp Hnp )ε. Hence, g0 (Xβ ∗ ) − g˜(Xβ ∗ ) − gˆ(X β) ˆ ≈ (I − Hp − Hnp + Hp Hnp )ε. Thus, ∆ ˆ ≈ −(Hp + Hnp − Hp Hnp )ε and Y − gˆ(X β) gˆ(X β) in equation (4) can be approximated by

5

˜ = E0 (n log σ ∆ ˆ2) ½ ¾ 1 2 2 + n σ0 E0 ε0 (I − Hp − Hnp + Hp Hnp )0 (I − Hp − Hnp + Hp Hnp )ε ½ ¾ ε0 (Hp + Hnp − Hp Hnp )0 (Hp + Hnp − Hp Hnp )ε . + nE0 ε0 (I − Hp − Hnp + Hp Hnp )0 (I − Hp − Hnp + Hp Hnp )ε

(5)

In the context of nonparametric regression, HST derived three approximations for ∆̃. Since equation (5) has the same form as equation (2.2) in HST, we can obtain the analogous three approximations for single-index models. The simplest of these approximations results in the criterion

    AICC = log σ̂² + {1 + tr(Ĥp + Ĥnp − ĤpĤnp)/n} / [1 − {tr(Ĥp + Ĥnp − ĤpĤnp) + 2}/n],    (6)

where Ĥp = V̂(V̂′V̂)⁻¹V̂′, V̂ is obtained by replacing β* and g̃· in Ṽ with their corresponding estimators, β̂ and ĝ·, respectively, and Ĥnp is Hnp evaluated at Xβ = Xβ̂.

In parametric regression models, since g is known, we can omit the Ĥnp component in equation (6), resulting in a criterion that is equivalent to the AICC of Hurvich & Tsai (1989). By contrast, we can drop the component Ĥp from equation (6) when we consider model selection for nonparametric regression, yielding a criterion identical to HST's equation (2.5). A natural extension of these criteria to single-index models would include both Hp and Hnp additively. However, such a criterion,

    AICC* = log σ̂² + {1 + tr(Ĥp + Ĥnp)/n} / [1 − {tr(Ĥp + Ĥnp) + 2}/n],

leads to underfitting. This is because AICC* estimates ∆̃ with a greater bias than does AICC, which is approximately unbiased. Thus,

AICC is an appropriate model selection criterion for single-index models, generalizing and unifying model selection approaches across both parametric and nonparametric functions. Next we study the performance of AICC for simultaneously selecting variables and a smoothing parameter in finite samples.
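As a concrete illustration, criterion (6) can be computed directly from the residuals and the two hat matrices. The sketch below assumes numpy and that Ĥp and Ĥnp have already been formed as described above; the function name is ours, not the paper's.

```python
import numpy as np

def aicc_single_index(residuals, H_p, H_np):
    """Criterion (6): AICC = log(sigma_hat^2) + {1 + tr(H)/n} / [1 - {tr(H) + 2}/n],
    where H = H_p + H_np - H_p H_np combines the parametric and nonparametric
    hat matrices of the fitted single-index model."""
    n = len(residuals)
    sigma2 = np.sum(residuals ** 2) / n           # sigma_hat^2 of Section 2.2
    tr_H = np.trace(H_p + H_np - H_p @ H_np)      # effective number of parameters
    return np.log(sigma2) + (1 + tr_H / n) / (1 - (tr_H + 2) / n)
```

Among a collection of candidate fits, one then retains the fit with the smallest returned value.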

3. SIMULATIONS

In this section, we examine the performance of AICC as a function of sample size, signal-to-noise ratio, and shape of the link function. Although we have done extensive simulation studies, for the sake of brevity we report the results for the following settings. (i) Sample sizes n = 25 and 100. (ii) True link functions g0(X0β0) = exp(−X0β0) and sin(πX0β0/10), where β0 = (1/√5)(1, 1, 1, 1, 1)′, X0 is an n × 5 matrix, and the ith row of X0, (xi1, ..., xi5), contains five independent uniform random variables, all on [0, 1]. (iii) Candidate models: the explanatory variables of the candidate single-index models are stored in an n × 10 matrix X containing independent uniform random variables (all on [0, 1]) in a nested fashion. In other words, columns 1 to p, p = 1, ..., 10, define the matrix of explanatory variables for the candidate single-index model with p regressors. The true single-index model contains the explanatory variables in the first five columns of X. (iv) Signal-to-noise ratio (SNR): ε ∼ N(0, σ0²), and SNR = Ry/σ0² = 5 and 10, where Ry is the range of g0(X0β0). We perform 1000 replications for each setting described above. In each realization, we apply sliced inverse regression, SIR, and local polynomial regression to estimate β and g, respectively.

Fig. 1 presents the frequency of model selection by AICC when the true link function is exp(−X0β0). Panel (a) shows that AICC performs well when the sample size is 100, and its model selection performance does not change much as SNR decreases from 10 to 5. Even when the sample size is small (n = 25), Panel (b) clearly indicates that the performance of AICC is quite good for SNR = 10. Comparing Panels (a) and (b), we observe that AICC performs better as the sample size increases. However, AICC tends to underfit when both the sample size and the signal-to-noise ratio are small. A similar finding has been noticed in parametric regression model selection (McQuarrie & Tsai, 1998, Ch. 2).

Table 1 presents the averages of the normed SIR estimates β̂sir and their standard deviations, as well as the average smoothing parameter estimate ĥ, when the correct model is chosen. We find that β̂sir is a good estimate of β, even though the regressors are uniform variates and not distributed elliptically as required by inverse regression theory. This finding is consistent with Li's (1991, p. 337) comments. As n or SNR increases, the accuracy and precision of β̂sir increase. In addition, ĥ becomes larger as n or SNR gets smaller. A similar pattern has been found in nonparametric regression smoothing parameter selection (HST, 1998). In summary, Fig. 1 together with Table 1 shows that AICC tends to underfit and oversmooth as sample size or SNR decreases. In other words, AICC finds a simple parametric component and a nonparametric function with less structure in single-index models with small samples or weak signals.

The previous link function exhibits a decreasing trend, so we next consider a non-monotonic link function: g0(X0β0) = sin(πX0β0/10). The rest of the simulation settings are unchanged. The general pattern of AICC's performance is similar to that displayed in Fig. 1, and hence is not presented here. The main difference is that the correct model is chosen more frequently here than in the previous case for small samples and weak signals; see Table 2.

To summarize, our simulation studies indicate that the proposed AICC criterion for single-index models performs well in finite samples. Specifically, it can be used to select both relevant regressors and a smoothing parameter when the sample size is large and/or the signal is strong. If the sample size is small and the signal is weak, then AICC leads to underfitting, resulting in model choices that may exclude some relevant regressors. To avoid such outcomes in practice, users need to apply AICC with caution, as we illustrate in the following empirical example.
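To make the estimation step concrete, the following sketch implements a sliced-inverse-regression direction estimate of the kind used in these simulations, under illustrative choices (numpy, five equal-sized slices, a Cholesky standardization); it is a sketch of the general technique, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_direction(X, y, n_slices=5):
    """Sliced inverse regression (Duan & Li, 1991): standardize X, average the
    standardized regressors within slices of the ordered response, and take the
    leading eigenvector of the weighted covariance of the slice means."""
    n, p = X.shape
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    Linv = np.linalg.inv(L)
    Z = (X - X.mean(axis=0)) @ Linv.T              # standardized regressors
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)       # weighted slice-mean covariance
    _, vecs = np.linalg.eigh(M)
    b = Linv.T @ vecs[:, -1]                       # transform back to the X scale
    return b / np.linalg.norm(b)                   # unit norm, as in Section 2.1

# Recover the direction for the first true model of this section: g0(t) = exp(-t)
n, p0 = 100, 5
beta0 = np.ones(p0) / np.sqrt(p0)
X = rng.uniform(size=(n, p0))
y = np.exp(-X @ beta0) + rng.normal(scale=0.1, size=n)
b = sir_direction(X, y)
b = b if b @ beta0 > 0 else -b                     # SIR fixes beta only up to sign
```

With n = 100 and a strong signal, the recovered direction is close to β0, in line with Table 1.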

4. EMPIRICAL EXAMPLE

We apply single-index modelling to estimate a hedonic price function for automobiles. Because the economic literature on hedonic price theory is extensive (Rosen, 1974; Palmquist, 1991), we describe hedonic price functions only briefly. Then we present the data, discuss the model selection and estimation results, and investigate further insights available from single-index models.

4.1 Hedonic price function
Automobile manufacturers produce differentiated brands of cars, such as Camry and Maxima, which consist of different levels of attributes, e.g. miles per gallon and horsepower. The manufacturers would like to charge the highest price that consumers are willing to pay. On the other hand, consumers search across different brands of cars, negotiate prices with car dealers, and eventually pay the least possible price for the set of attributes they prefer. The processes of consumer search and competition across manufacturers result in an equilibrium in which different market prices prevail for the various brands of cars offering different levels of attributes. A relationship between market prices and a set of attributes is called the hedonic price function (Rosen, 1974). Economic theory does not specify the shape of the hedonic price function (Palmquist, 1991, p. 87) because it is likely to differ across markets. Hence, single-index models offer the desired flexibility for estimating hedonic price functions.

4.2 Car data
Our data consist of 25 brands of family sedans. These brands of cars differ on nine attributes measured by Consumers Union; see the Annual Auto Issue of Consumer Reports,

April, 1999. The attributes are mileage (X1), horsepower (X2), length (X3), width (X4), weight (X5), height (X6), satisfaction (X7), reliability (X8), and overall evaluation (X9). The response variable is price (Y), obtained from the Internet company at the website www.carsdirect.com, which quotes non-negotiable transaction prices at which the company sells these brands of cars. The entire data set will be made available at the website www.gsm.ucdavis.edu.

4.3 Selection and estimation results
Using the above data set, we estimate the single-index model given by equation (2), in which the response variable is price and the regressors are the nine attributes. Without specifying the unknown link function g(·), we obtain the SIR estimates β̂sir = (0.0172, 0.0185, −0.1046, 0.2112, 0.0001, 0.3190, 0.4530, −0.0313, −0.0126)′. Applying Chen & Li's (1998) results, we obtain the t-ratios (0.11, 3.23, −2.74, 1.19, 0.08, 1.38, 2.76, −0.22, −0.74)′, which are calculated by dividing the SIR estimates by their respective standard deviations. Based on the SIR analysis, we find that horsepower (X2), length (X3), and satisfaction (X7) have significant effects on price. However, the t-ratios alone may not be an adequate guide for selecting relevant variables, for two reasons. First, the standard errors of the SIR estimates are not exact (Chen & Li, 1998, p. 219). Second, the standard errors of SIR estimates are likely to inflate when the model contains irrelevant variables (Altham, 1984). Therefore, we apply the AICC criterion to select a parsimonious set of attributes.

Using the absolute t-ratios, we sort the attributes in the order (X2, X7, X3, X6, X4, X9, X8, X1, X5). This ordering allows us to consider only nine nested candidate models instead of 2⁹ − 1 models. For each of the nested models, we obtain first the SIR estimates and then the link function ĝk (k = 1, ..., 9) by applying local polynomial regression. Next, we determine the AICC value from equation (6). Across the nine candidate models, the smallest AICC value is 15.79, and the corresponding single-index model is

    price = ĝ2(0.020X2 + 0.446X7),    (7)

which can also be presented as price = ĝ(0.045X2 + 0.999X7) upon dividing the coefficients by their norm. The estimate of the smoothing parameter is 0.7. The adjusted R² is 0.75, and the residual plots (not presented here) do not exhibit any clear pattern. In addition, the score test, proposed by Simonoff and Tsai in a Technical Report of the University of California at Davis, does not indicate heteroscedasticity. Hence, we conclude that

this model fits the data reasonably well. Therefore, based on AICC, only two variables influence market price: horsepower and satisfaction.

From the above analysis, we see that AICC prevents overfitting in small samples. However, since the estimated signal-to-noise ratio is 5.65 and the sample size is 25, our simulation studies suggest that we may be underfitting by excluding some relevant attributes. Hence, we exercise caution by considering the next best model, price = ĝ3(0.023X2 − 0.035X3 + 0.397X7), whose AICC value is 15.88. To ascertain whether this AICC value is only marginally larger due to chance, we test the hypothesis H0: price = ĝ2(0.020X2 + 0.446X7) versus H1: price = ĝ3(0.023X2 − 0.035X3 + 0.397X7). Applying the procedure described by Simonoff & Tsai (1999, p. 28), we obtain the tail probability of the statistic A = AICC(under H1) − AICC(under H0) by using 1000 bootstrap simulations. We find that the p-value is 0.20, and hence cannot reject the null model.
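The nested search described above — order the attributes by absolute t-ratio, fit the k-variable candidates, and keep the fit with the smallest criterion value — can be sketched as follows. To keep the sketch self-contained we use an ordinary least squares fit and the parametric form of the criterion as stand-ins for the SIR/local-polynomial fit and equation (6); all function names and the toy data are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def ols_fit(X, y):
    """Stand-in fit returning residuals and the hat matrix; the paper instead
    uses a SIR estimate of beta plus a local-polynomial link estimate."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return y - H @ y, H

def aicc(residuals, H):
    """Parametric form of the criterion: log s^2 + {1 + tr(H)/n}/[1 - {tr(H)+2}/n]."""
    n = len(residuals)
    tr = np.trace(H)
    return np.log(np.mean(residuals ** 2)) + (1 + tr / n) / (1 - (tr + 2) / n)

def nested_search(X, y, order, fit=ols_fit, criterion=aicc):
    """Fit the nested candidates X[:, order[:k]], k = 1..len(order), and keep
    the size with the smallest criterion value."""
    scores = [criterion(*fit(X[:, order[:k]], y)) for k in range(1, len(order) + 1)]
    return int(np.argmin(scores)) + 1, scores

# Toy data: six candidate regressors, only the first two relevant
n = 80
X = rng.uniform(size=(n, 6))
y = X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=n)
best, scores = nested_search(X, y, order=[0, 1, 2, 3, 4, 5])
```

Sorting once by |t| reduces the search from 2⁹ − 1 subsets to nine nested fits, which is what makes the procedure practical with a small sample.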

In summary, model (7) adequately describes the hedonic price function for this automobile market. Interestingly, we observe that engine horsepower and consumer satisfaction alone predict market price as well as these variables plus styling features and other performance attributes do. Next, we provide an insight that is not available from a linear regression model, which is typically used in this area (Boulding & Purohit, 1996).

4.4 Brand-specific implicit prices
Having identified the relevant attributes in the hedonic price function, we next estimate the implicit prices for these attributes. Specifically, the implicit price for the jth attribute is ηj = ∂E(Y)/∂Xj. In linear regression models, the slope estimate for an attribute is its implicit price, which is constant across all brands. In contrast, single-index models provide the implicit price for the jth attribute of the ith brand, ηij = ∂E(yi)/∂xij = ∂g(xi′β)/∂xij = g·(xi′β)βj, where g· is the derivative of g. Table 3 displays the estimates of the brand-specific implicit prices for horsepower and satisfaction. For example, car manufacturers can determine the implicit price for horsepower of their brand, say Maxima, relative to another specific brand, e.g. Camry. This information has strategic value because it is more relevant to each manufacturer than the average implicit price across all brands. For instance, although the average implicit price for horsepower is $46.26 per unit, Table 3 shows that consumers attach a smaller value, about $35, to brands with high-powered V6 engines (Passat V6, Camry V6 and Accord V6). Thus, we find that excessive power in family sedans might be undesirable, possibly owing to a concern for safety. We conclude the empirical example by noting that this insight into consumers' behaviour is not available from a linear regression model, illustrating the practical value of single-index models.
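The brand-specific implicit price formula η̂ij = ĝ·(xi′β̂)β̂j is simple to evaluate once a link estimate is available. The sketch below takes any callable link estimate and approximates its derivative by a centred finite difference; the function and the toy link are illustrative, not the paper's code.

```python
import numpy as np

def implicit_prices(g_hat, x_i, beta_hat, eps=1e-4):
    """eta_ij = g'(x_i' beta) * beta_j for all attributes j of brand i,
    with the link derivative taken by a centred finite difference."""
    t = float(x_i @ beta_hat)
    g_prime = (g_hat(t + eps) - g_hat(t - eps)) / (2 * eps)
    return g_prime * beta_hat

# Check against a known link g(t) = t^2, where g'(t) = 2t exactly
prices = implicit_prices(lambda t: t ** 2, np.array([1.0, 2.0]), np.array([0.6, 0.8]))
```

Applied to each row of the car data with the fitted link ĝ2, this yields a full column of Table 3 at once.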

5. CONCLUDING REMARKS

In this paper, we derive the AICC criterion for single-index models, which selects not only the relevant variables but also a smoothing parameter for the unknown link function. Our simulation studies indicate that AICC works satisfactorily in most situations. Thus, AICC generalizes and unifies model selection approaches across parametric and nonparametric functions. Here, we extend its applicability to two important model structures: the partially linear single-index model and the additive single-index model. The partially linear single-index model is

    Y = g(Xβ) + Zγ + u,    (8)

where g, X and β are defined as in equation (2), Z is an n × q matrix of random regressors not overlapping with X, γ is a q × 1 vector, and u for given X = x and Z = z is distributed as N(0, σ²In×n). The additive single-index model is

    Y = g(Xβ) + h(Zγ) + u,    (9)

where h is an unknown differentiable function and g, X, Z, and u are defined as in equation (8). For both models (8) and (9), AICC in equation (6) serves as the model selection criterion. The necessary formulae for the quantities σ̂², Ĥp and Ĥnp are given in the Appendix, and detailed derivations of the selection criteria can be requested from the second author.

Finally, we identify three research areas for further study. First, derive AICC for generalized partially linear single-index models (Carroll et al., 1997). Second, generalize other model selection criteria, such as FPE (Akaike, 1970), Cp (Mallows, 1973) and BIC (Schwarz, 1978), so that they are applicable to single-index models. One straightforward generalization is to replace the term for the number of parameters in the penalty functions of FPE, Cp and BIC by tr(Hp + Hnp − HpHnp); however, this generalization lacks theoretical justification. Third, study the efficacy of AICC under alternative parameter estimators, for example the ordinary least squares estimator (Brillinger, 1983) or the quasi-maximum likelihood estimator (Carroll et al., 1997). We believe that these efforts would lead to better methods for analyzing high-dimensional data (Naik, Hagerty & Tsai, 2000).
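The straightforward generalization just mentioned — replacing the parameter count in FPE-, Cp- or BIC-type penalties by tr(Hp + Hnp − HpHnp) — can be sketched as follows; as the text notes, this substitution lacks theoretical justification, and the function names are ours.

```python
import numpy as np

def effective_df(H_p, H_np):
    """Trace-based stand-in for the number of parameters of a single-index fit."""
    return np.trace(H_p + H_np - H_p @ H_np)

def bic_like(residuals, H_p, H_np):
    """Schwarz-type criterion with the parameter count replaced by the trace term;
    a heuristic sketch, not a justified criterion."""
    n = len(residuals)
    return n * np.log(np.mean(residuals ** 2)) + np.log(n) * effective_df(H_p, H_np)
```

When Hnp = 0 and Hp is an ordinary projection matrix, effective_df reduces to the usual parameter count, so the sketch nests the parametric BIC as a special case.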

ACKNOWLEDGEMENT
We thank the referee and Editor for their valuable comments, which led to significant improvements in this paper. Chih-Ling Tsai's research was supported in part by the National Science Foundation and the National Institutes of Health.

APPENDIX
Formulae for σ̂², Ĥp and Ĥnp

For model (8), we have

    σ̂² = {Y − ĝ(Xβ̂) − Zγ̂}′{Y − ĝ(Xβ̂) − Zγ̂}/n;

    Ĥp = Û(Û′Û)⁻¹Û′, where Û = (V̂, Z) and V̂ is defined as in equation (6); and

    Ĥnp = Ĥ* + Ŝ,

where Ĥ* = (I − Ŝ)Z{Z′(I − Ŝ)Z}⁻¹Z′(I − Ŝ) and Ŝ is the n × n smoother matrix for obtaining ĝ.

For model (9), we have

    σ̂² = {Y − ĝ(Xβ̂) − ĥ(Zγ̂)}′{Y − ĝ(Xβ̂) − ĥ(Zγ̂)}/n;

    Ĥp = Ŵ(Ŵ′Ŵ)⁻¹Ŵ′, where Ŵ = (V̂, T̂), T̂ = ∂ĥ(Zγ)/∂γ evaluated at γ̂ and ĥ·, and ĥ· is the estimate of the derivative of h obtained from local polynomial regression; and

    Ĥnp = I − (I − Ŝ2)(I − Ŝ1Ŝ2)⁻¹(I − Ŝ1),

where Ŝ1 and Ŝ2 are the n × n smoother matrices for obtaining ĝ and ĥ, respectively. Similar descriptions of Ĥnp can be found in Simonoff & Tsai (1999) and Hastie & Tibshirani (1990, p. 120).
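Given the smoother matrices, the Appendix formulae for Ĥnp translate directly into code. This numpy sketch is ours and assumes Ŝ, Ŝ1, Ŝ2 and Z have already been computed.

```python
import numpy as np

def hnp_partially_linear(S, Z):
    """H_np = H* + S for model (8), with
    H* = (I - S) Z {Z'(I - S) Z}^{-1} Z'(I - S)."""
    n = S.shape[0]
    R = np.eye(n) - S
    return R @ Z @ np.linalg.solve(Z.T @ R @ Z, Z.T @ R) + S

def hnp_additive(S1, S2):
    """H_np = I - (I - S2)(I - S1 S2)^{-1}(I - S1) for model (9)."""
    n = S1.shape[0]
    I = np.eye(n)
    return I - (I - S2) @ np.linalg.solve(I - S1 @ S2, I - S1)
```

Note that with S = 0 the partially linear form reduces to the ordinary projection onto the columns of Z, as expected.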

REFERENCES

AKAIKE, H. (1970). Statistical predictor identification. Ann. Inst. Statist. Math. 22, 203-17.
AKAIKE, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory, Ed. B. N. Petrov & F. Csaki, pp. 267-81. Budapest: Akademiai Kiado.
ALTHAM, P. M. E. (1984). Improving the precision of estimation by fitting a model. J. R. Statist. Soc. B 46, 118-9.
BOULDING, W. & PUROHIT, D. (1996). The price of safety. J. Cons. Res. 23, 12-25.
BRILLINGER, D. R. (1983). A generalized linear model with "Gaussian" regressor variables. In A Festschrift for Erich L. Lehmann in Honor of His Sixty-Fifth Birthday, Ed. P. J. Bickel, K. A. Doksum & J. L. Hodges, pp. 97-114. California: Wadsworth.
CARROLL, R. J., FAN, J., GIJBELS, I. & WAND, M. P. (1997). Generalized partially linear single-index models. J. Am. Statist. Assoc. 92, 477-89.
CHEN, C. H. & LI, K. C. (1998). Can SIR be as popular as multiple linear regression? Statist. Sin. 8, 289-316.
CRAVEN, P. & WAHBA, G. (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377-403.
DUAN, N. & LI, K. C. (1991). Slicing regression: a link-free regression method. Ann. Statist. 19, 505-30.
FAN, J. & GIJBELS, I. (1996). Local Polynomial Modelling and Its Applications. New York: Chapman & Hall.
HÄRDLE, W., HALL, P. & ICHIMURA, H. (1993). Optimal smoothing in single-index models. Ann. Statist. 21, 157-78.
HASTIE, T. J. & TIBSHIRANI, R. J. (1990). Generalized Additive Models. New York: Chapman & Hall.
HOROWITZ, J. L. (1998). Semiparametric Methods in Econometrics. New York: Springer.
HOROWITZ, J. L. & HÄRDLE, W. (1996). Direct semiparametric estimation of single-index models with discrete covariates. J. Am. Statist. Assoc. 91, 1632-9.
HURVICH, C. M. & TSAI, C. L. (1989). Regression and time series model selection in small samples. Biometrika 76, 297-307.
HURVICH, C. M., SIMONOFF, J. S. & TSAI, C. L. (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. R. Statist. Soc. B 60, 271-93.
ICHIMURA, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J. Econ. 58, 71-120.
LI, K. C. (1991). Sliced inverse regression for dimension reduction (with Discussion). J. Am. Statist. Assoc. 86, 316-42.
LINHART, H. & ZUCCHINI, W. (1986). Model Selection. New York: Wiley.
MALLOWS, C. L. (1973). Some comments on Cp. Technometrics 15, 661-75.
MCQUARRIE, A. D. R. & TSAI, C. L. (1998). Regression and Time Series Model Selection. Singapore: World Scientific Publishing.
NAIK, P. A., HAGERTY, M. R. & TSAI, C. L. (2000). A new dimension reduction approach for data-rich marketing environments: sliced inverse regression. J. Mark. Res. 37, 113-24.
PALMQUIST, R. B. (1991). Hedonic methods. In Measuring the Demand for Environmental Quality, Ed. J. B. Braden & C. D. Kolstad, pp. 77-120. Amsterdam: North-Holland.
POWELL, J. L., STOCK, J. H. & STOKER, T. M. (1989). Semiparametric estimation of index coefficients. Econometrica 57, 1403-30.
ROSEN, S. (1974). Hedonic prices and implicit markets: product differentiation in pure competition. J. Polit. Econ. 82, 34-55.
SCHWARZ, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461-4.
SEBER, G. A. F. & WILD, C. J. (1988). Nonlinear Regression. New York: Wiley.
SIMONOFF, J. S. (1996). Smoothing Methods in Statistics. New York: Springer.
SIMONOFF, J. S. (1998). Three sides of smoothing: categorical data smoothing, nonparametric regression, and density estimation. Inter. Statist. Rev. 66, 137-56.
SIMONOFF, J. S. & TSAI, C. L. (1999). Semiparametric and additive model selection using an improved Akaike information criterion. J. Comput. Graph. Statist. 8, 22-40.

[Figure 1 appears here: two bar charts of selection frequency (0-1000) against number of variables (1-10), with bars for SNR = 5 and SNR = 10.]

Figure 1. Frequency of the number of variables in the model chosen by AICC when the true model is exp(−X0β0). (a) n = 100. (b) n = 25.

Table 1. Average estimates and standard deviations of β̂sir, and average estimates of ĥ, for the true model exp(−X0β0)

                      n = 25     n = 25     n = 100    n = 100
                      SNR = 5    SNR = 10   SNR = 5    SNR = 10
SIR estimates         0.4051     0.4297     0.4405     0.4415
                      0.3814     0.4225     0.4342     0.4444
                      0.4073     0.4344     0.4323     0.4447
                      0.4052     0.4314     0.4285     0.4427
                      0.4688     0.4316     0.4340     0.4423
Std. deviations       0.1664     0.1227     0.1073     0.0613
                      0.1675     0.1268     0.1108     0.0607
                      0.1763     0.1275     0.1076     0.0603
                      0.1801     0.1207     0.1127     0.0581
                      0.1470     0.1180     0.1028     0.0614
Bandwidth, ĥ          0.9608     0.9206     0.8649     0.7372

Table 2. Frequency of the correct model selected by AICC

                exp(−X0β0)           sin(πX0β0/10)
n               25        100        25        100
SNR = 5         535       917        560       910
SNR = 10        932       919        928       907

Table 3. Brand-specific implicit prices

Brand i                       η̂i2 for horsepower    η̂i7 for satisfaction
Volkswagen Passat GLS 4              46.07                 1026.90
Volkswagen Passat V6                 37.06                  825.98
Toyota Camry LE 4                    46.99                 1047.37
Toyota Camry LE V6                   35.68                  795.20
Honda Accord LX V6                   33.45                  745.60
Mercury Mystique LS 4                49.28                 1098.47
Honda Accord EX 4                    46.07                 1026.90
Ford Contour SE 4                    49.28                 1098.47
Subaru Legacy L                      47.23                 1052.59
Oldsmobile Cutlass GLS               48.71                 1085.59
Chevrolet Malibu 4                   48.71                 1085.59
Oldsmobile Intrigue GL               42.12                  938.89
Nissan Maxima GXE                    46.45                 1035.37
Pontiac Grand Prix SE                46.85                 1044.26
Ford Taurus SE                       48.96                 1091.32
Mercury Sable GS                     48.96                 1091.32
Chrysler Cirrus LXi                  48.93                 1090.55
Dodge Stratus ES                     48.35                 1077.68
Plymouth Breeze                      48.35                 1077.68
Nissan Altima GXE                    49.34                 1099.61
Chevrolet Lumina LS                  48.16                 1073.29
Mazda 626 LX 4                       47.60                 1060.82
Mazda 626 LX V6                      48.83                 1088.30
Buick Regal LS                       46.99                 1047.30
Buick Century Limited                48.16                 1073.29
Average                              46.26                 1031.13