Discrete/continuous econometric models and their application to transport analysis
F. MANNERING, Department of Civil Engineering, The Pennsylvania State University, University Park, PA 16802, U.S.A.
and DAVID A. HENSHER, School of Economic and Financial Studies, Macquarie University, NSW 2109, Australia
A wide range of transport-related decisions involve the linking of discrete choices (e.g. vehicle choice) and continuous choices (e.g. vehicle use). In recent years econometricians have developed procedures for integrating such choices into a framework that is both economically and statistically sound. The literature is, however, somewhat technical. The objective of this paper is to provide a general overview of the basic elements of discrete/continuous econometric modelling with an emphasis on transport applications. It is hoped that such an introduction will demonstrate that the essence of the approach for the practitioner is quite straightforward and can be implemented with widely available computer software.
§ 1. INTRODUCTION

Historically, transport-related problems have served as both a motivating force in the development of new econometric methods and a forum for the application of such methods. This bond between the econometric and transport fields has provided for a steady evolution of analytic techniques that have in turn improved our understanding of complex transport phenomena. A classic example is the relatively recent development and implementation of discrete choice econometric methods in the transport sector. The use of such methods has allowed us to explore and expand our knowledge of the many important discrete decisions that define the demand for transport services, such as the choice of mode, route, and type and quantity of vehicles to own. Discrete analysis techniques, along with the more traditional continuous methods (e.g., least-squares regression), encompass the range of econometric methods currently enjoying widespread use in transport analysis.

Recently, however, researchers have come to recognize that many transport decisions involve both discrete and continuous components. For example, a freight shipper decides which mode to use and the quantity of goods to ship; a consumer decides whether to purchase a new or used car and the extent of utilization (i.e., accumulated kilometres); a traveller chooses a route and the speed of travel; a shopper decides where to shop and how much to spend; a freight transport company chooses whether to rent or purchase vehicles as well as the number of vehicles needed. All of the above examples involve discrete and continuous choices that are interrelated, as the outcome of one clearly affects the other. The idea of interrelated decisions brings to mind classical simultaneous equation methods that have been used by econometricians for years; however, the application of such methods to the
F . Mannering and D. A. Hensher
problem of mixed discrete and continuous choices is not conceptually straightforward. Subsequently, a number of econometricians have in the past few years developed theoretically consistent and empirically estimable discrete/continuous models. The objective of this paper is to provide a general introduction to discrete/continuous econometric modelling with an emphasis on transport applications. The paper begins by presenting an intuitive overview of the discrete/continuous problem. This is followed by a discussion of alternative discrete/continuous model structures. Next, the evolution of corrective econometric techniques is summarized. Finally, applications of discrete/continuous modelling systems in the transport sector are reviewed, and a summary and directions for future research are presented.
§ 2. DISCRETE/CONTINUOUS MODELS: AN INTUITIVE OVERVIEW

Interrelated discrete/continuous choices give rise to a challenging econometric problem. Essentially, this problem can be viewed as one of 'sample or self-selection'. It arises frequently in practice, either where only individuals (or items) with certain characteristics are selected into the sample (i.e., sample selection), or where the revealed empirical setting is generated by individuals making choices (i.e., self-selection), or both. Depending on the issue under study, failure to allow for these sources of selectivity may result in statistically biased estimates of model parameters. The literature in econometrics on the joint modelling of discrete and continuous components of choice provides a suitable framework for selectivity correction.

To illustrate the selectivity problem that evolves from interrelated discrete/continuous choices, consider the choice of route and operating speed. In this case, we will assume that the motorist has decided to make a trip and now must select among alternative routes (discrete) and operating speeds (continuous). For the selection of route, we expect factors such as travel time (which results from the choice of operating speed), variance of travel time (i.e., related to the number of signalized intersections, and so on), conditions of road surface, and terrain to be important concerns. On the choice of operating speeds, determining factors may include surface conditions, horizontal and vertical alignments, traffic conditions (volumes), and the extent of speed limit enforcement. Clearly, then, the two decisions are interrelated since route selection is dependent on operating speed (i.e., travel time) and operating speed is dependent on route selection (i.e., physical and operational characteristics of the route).

If the interest is in the operating speed of actual users of the route (i.e., inference from the conditional distribution), rather than the speed of potential users (i.e., inference from the marginal distribution), then it is sufficient to study operating speed conditional on those actually using the route. The issue of selectivity bias arises when we are concerned with questions of the form 'if travellers were to use ...', in contrast with questions of the form 'if travellers do use ...'. It evolves from the fact that the speeds of individuals observed using a particular route are unlikely to constitute a random sample. More specifically, consider an origin-destination pair that is connected by two routes, one a high-speed freeway and the other an urban arterial. One is led to expect that, all else being equal, motorists who tend to drive faster may be attracted to the freeway, while slower motorists may be more likely to take the arterial. If we were to estimate a model of freeway speeds based on the data collected from observed freeway users, our results would be biased because the sample of freeway users is not random and is in fact censored by operating speed. Viewed from another perspective, we have
Econometric models and transport analysis
'missing data' in the sense that we do not know how fast motorists selecting the arterial would have driven had they selected the freeway. To provide a visual demonstration of the sample selection problem, consider a simple regression model of freeway speed ($S_i$) as a function of a driver's income ($I_i$):

$$S_i = \beta_0 + \beta_1 I_i \qquad (1)$$
The speed data collected from all freeway users are represented by the 'O' values in the figure. The speed data for non-freeway users, had they taken the freeway, are represented by the 'X' values. However, these values are not available for estimation since we do not observe arterial users on the freeway. If we estimate the regression equation using only the observed freeway data the result is line a-a. When the interest is in the speed of any user, actual or potential, a-a is clearly a biased result relative to the true line b-b. The latter is an estimate of freeway speeds based on all drivers.

The consequences of using the biased estimates of line a-a for analysis can be severe. For example, if there is road construction that causes a major shift in motorists' route choice, our predictions of operating speed will be inaccurate as they will reflect the bias of the line a-a parameter estimates. From an econometric standpoint, this bias arises because the disturbances (which are capturing unobserved effects) of the discrete model and the continuous model are correlated. For example, the unobserved effects that tend to increase operating speeds, such as the use of an expensive, high-performance vehicle, would also tend to increase the likelihood of the motorist selecting the freeway, where there are generally lower accident rates than on urban arterials. This correlation of unobserved effects has served as the basis for the development of corrective econometric procedures.

Typically, data are available on the endogenous and exogenous variables for all individuals and all relevant alternatives for the discrete choice, but only for the endogenous variable for the continuous choice which is associated with the chosen discrete alternative. This data perspective is known as a
Illustration of bias in a model of freeway operating speeds, based on actual users.
censored specification of the choice process, and provides enough data to allow for selectivity correction. If we had no information on the exogenous variables for the non-chosen alternatives in the discrete choice model, we would have a truncated specification, which for all practical purposes provides inadequate data for selectivity correction.
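The bias described in this section can be reproduced with a small simulation. The sketch below is not taken from the paper: the coefficients, the income scale, the route-choice rule and the function names are all hypothetical, chosen only so that the disturbances of the route choice and of the speed equation are positively correlated. It fits equation (1) once on every driver (line b-b) and once on observed freeway users only (line a-a).

```python
import random
import statistics


def ols_line(x, y):
    """Return (intercept, slope) of a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n=20000, rho=0.8, seed=1):
    """Fit equation (1) on all drivers (b-b) and on freeway users only (a-a)."""
    rng = random.Random(seed)
    b0, b1 = 80.0, 0.5  # hypothetical true parameters of equation (1)
    income, speed, uses_freeway = [], [], []
    for _ in range(n):
        inc = rng.uniform(20.0, 100.0)  # hypothetical income scale
        taste = rng.gauss(0.0, 1.0)     # unobserved taste for speed
        eps = 10.0 * taste              # disturbance of the speed equation
        # Route-choice disturbance, correlated with eps (correlation rho).
        nu = rho * taste + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        income.append(inc)
        speed.append(b0 + b1 * inc + eps)  # freeway speed, observed or not
        uses_freeway.append(-1.5 + 0.025 * inc + nu > 0.0)
    fit_all = ols_line(income, speed)
    inc_obs = [i for i, f in zip(income, uses_freeway) if f]
    spd_obs = [s for s, f in zip(speed, uses_freeway) if f]
    fit_users = ols_line(inc_obs, spd_obs)
    return fit_all, fit_users


if __name__ == "__main__":
    (a_all, b_all), (a_obs, b_obs) = simulate()
    print(f"all drivers   (b-b): S = {a_all:.1f} + {b_all:.3f} I")
    print(f"freeway users (a-a): S = {a_obs:.1f} + {b_obs:.3f} I")
```

Under these assumptions, low-income drivers appear on the freeway only when their unobserved taste for speed is high, so the line fitted to observed users tends to lie above the population line at low incomes and to be flatter.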
§ 3. APPLICATIONS AND ALTERNATIVE MODEL STRUCTURES

Discrete/continuous choices have been identified in many disciplines. Examples include workforce participation and wages (Heckman 1976, Hill 1983), choice of union participation and earnings (Schmidt and Strauss 1976), choice of whether or not to own a house and how much to spend (Lee and Trost 1978), occupational choice and income (Hay 1980), a firm's choice of geographic location and production level (Duncan 1980), technical training and earnings (Trost and Lee 1984), residential appliance choice and utilization (Dubin and McFadden 1984), regional location and wage (Falaris 1983), and choice of workshift and reservation wage (Terza 1980). In the transport field, researchers have explored the choice of mode and distance between the auto parking location and final trip destination (Westin and Gillen 1978), activity choice and time allocation (Damm and Lerman 1981, Kitamura 1984), discounted bus coupon choice and trip frequency (Jacobson 1983), choice of freight mode and quantity of freight to be shipped (McFadden et al. 1986), choice of shopping destination and amount of expenditure (Barnard 1987), choice of trip timing and time allocation to various activities (Chin 1986) and choice of automobile type and the extent of utilization (kilometres) (Mannering and Winston 1985, Hensher and Milthorpe 1986, Train 1986).

These applications have produced a wide range of model structures, some of which are based on an underlying economic theory and others which rely more heavily on standard statistical structures. For a large percentage of transport applications, two fundamental model structures can be identified: (1) a reduced form structure, and (2) a structure that ensures economic consistency. Structure (1) can also be consistent with economic assumptions. The reduced form structure is, intuitively, the more appealing of the two.
For illustration purposes, consider a consumer's choice to buy either a new or used automobile (discrete) and the related decision of determining the extent to which the vehicle is to be utilized. In this case, we expect new vehicles to be used more than used vehicles due to their lower per kilometre operating costs (reflecting greater reliability, improved comfort and ride, and so on). As in the earlier example of route choice, the auto choice is a classic discrete/continuous problem since, for instance, bias in estimated utilization equations will result where we have no information on the amount of driving a used-car owner would have undertaken had he chosen a new car (i.e., self-selection bias). Formalizing this choice problem, we expect the utility provided by a car to be a function of consumer characteristics, vehicle characteristics, and a disturbance term. If we use a reduced form structure, the utility will also be dependent on the continuous variable, utilization. Therefore, for the discrete choice of vehicle type,

$$U_i = \beta_i Z_i + \phi_i y_i + \varepsilon_i \qquad (2)$$

with
$U_i$ the utility provided by vehicle i
$Z_i$ vector of consumer and vehicle attributes
$y_i$ utilization of vehicle i (i.e., kilometres per year)
$\varepsilon_i$ disturbance term accounting for unobserved effects
$\beta_i$ vector of estimable parameters
$\phi_i$ an estimable parameter

and for the continuous choice of utilization,

$$y_i = \alpha_i X_i + \eta_i \qquad (3)$$

with
$X_i$ a vector of household characteristics
$\eta_i$ unobserved household characteristics
$\alpha_i$ vector of estimable parameters

The estimation of this modelling system requires the substitution of equation (3) into equation (2), giving the 'reduced form' utility,

$$U_i = \beta_i Z_i + \phi_i \alpha_i X_i + \phi_i \eta_i + \varepsilon_i \qquad (4)$$
The discrete choice model based on this utility function can be readily formulated given a distributional assumption for the disturbance terms (e.g., the assumption of a generalized extreme value distribution will result in a logit formulation, a normal distribution will give a probit formulation, and so on) (see McFadden 1981).

The alternative to a reduced form structure is one that ensures economic consistency. Such structures are based on the knowledge that utility functions typically used in random utility models are 'indirect' utility functions. That is, they include prices and incomes as arguments instead of as part of an income constraint. With an indirect utility function, we can recover the underlying demand function which, for the automobile example, would be vehicle utilization. The relation between an indirect utility function and its implied demand functions is defined by the well known Roy's Identity in (5):

$$y_i = -\frac{\partial U / \partial p_i}{\partial U / \partial I} \qquad (5)$$

with
$y_i$ the demand for commodity i (in our case, a continuous variable)
$U$ the indirect utility function
$p_i$ unit price of commodity i
$I$ income

In defining an economically consistent discrete/continuous structure, one generally starts by specifying either a continuous demand function or an indirect utility function (for the discrete choice) and then derives the other via Roy's Identity.

Both reduced form and economically consistent models have enjoyed widespread use. Hay (1980) applied the reduced form to occupational choice and income and Mannering (1986 a) later applied it to the choice of vehicle type and utilization. Economically consistent structures have been used by Dubin and McFadden (1984) for appliance choice and utilization and by Train (1986), Mannering and Winston (1985), and Hensher and Milthorpe (1986) for vehicle type choice and utilization. In choosing between the two, one must trade off theory against estimability. The
economically consistent models are often preferred on theoretical grounds, but they can involve highly non-linear functional forms of either utility or demand equations or both, making estimation difficult. The reduced form models are more easily estimated but the relation between discrete and continuous components is relatively arbitrary.

§ 4. CORRECTION TECHNIQUES

Once the model structure is specified, attention must be given to the correction of the estimation bias. The problem of potential bias illustrated earlier in the figure is addressed as a correlation of disturbance terms (e.g. $\varepsilon_i$ in equation (2) and $\eta_i$ in equation (3)). Over the years, a wide range of correction procedures has been developed and applied. These procedures vary considerably in their complexity and their application potential. Below we categorize and briefly describe the econometric procedures that have been used most frequently in the estimation of discrete/continuous models.
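Before turning to the individual procedures, the economically consistent structure of § 3 can be made concrete with a numerical check of Roy's Identity. The sketch below is an illustration only: the indirect utility function, the parameter values and the function names are assumptions, not taken from the paper. The functional form was chosen so that the implied demand is linear in price and income, and the derivatives in Roy's Identity are approximated by central differences.

```python
import math

# Hypothetical parameters, chosen only for illustration.
A0, A1, B = 5.0, -2.0, 0.1


def indirect_utility(price, income):
    """Assumed form: V(p, I) = (A0 + A1/B + A1*p + B*I) * exp(-B*p)."""
    return (A0 + A1 / B + A1 * price + B * income) * math.exp(-B * price)


def roy_demand(price, income, h=1e-5):
    """Demand from Roy's Identity, y = -(dV/dp) / (dV/dI), by central differences."""
    dv_dp = (indirect_utility(price + h, income)
             - indirect_utility(price - h, income)) / (2.0 * h)
    dv_di = (indirect_utility(price, income + h)
             - indirect_utility(price, income - h)) / (2.0 * h)
    return -dv_dp / dv_di


def linear_demand(price, income):
    """Closed-form demand implied by the assumed V: y = A0 + A1*p + B*I."""
    return A0 + A1 * price + B * income


if __name__ == "__main__":
    p, inc = 1.5, 40.0  # hypothetical unit price and income
    print(roy_demand(p, inc), linear_demand(p, inc))
```

Differentiating the assumed form by hand gives dV/dp = -B exp(-B p)(A0 + A1 p + B I) and dV/dI = B exp(-B p), so the ratio in Roy's Identity reduces to the linear demand that the numerical check reproduces.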
4.1. Indirect methods

Indirect methods provide an obvious solution to the selectivity bias encountered in discrete/continuous models. To illustrate the use of such methods, recall the previous example with the choice of a new or used automobile and the extent of utilization. Define a simple utilization equation of the following form:

$$y_j = \beta_0 + \beta_1\,\mathrm{OPCOST} + \beta_2\,\mathrm{INC}_j \qquad (6)$$

with
$y_j$ kilometres driven by individual j
OPCOST vehicle operating cost
$\mathrm{INC}_j$ income of individual j.

In this case the vehicle specific variable, OPCOST, is endogenous because it depends on the chosen vehicle, which is a related discrete choice. We can 'indirectly' correct this endogeneity problem by using econometric methods that were developed for traditional interrelated or simultaneous continuous equation systems. One of the most popular indirect methods is the instrumental variables approach. If we were to implement this approach, OPCOST would be regressed on a series of exogenous variables such as income, household size, and type of housing unit. The values of OPCOST predicted by the resulting regression equation would then be used in the estimation of equation (6). The instrumental variables approach has been successfully applied to discrete/continuous analysis by Dubin and McFadden (1984), Train and Lohrer (1983) and Train (1986). It has the strong advantage of being simple and easy to estimate, although it becomes cumbersome when the utilization equation contains many vehicle characteristics. However, improvements in the precision of parameter estimates can be achieved via more elaborate correction techniques.

4.2. Direct methods

Direct methods of bias correction are defined herein as those techniques that provide for an explicit interaction between the discrete and continuous components of the modelling system. Three distinct types of direct methods can be identified: (1) bias correction term, (2) expected value, and (3) full information. The basic concepts underlying these corrective procedures are summarized below.
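As an illustration of the indirect approach of § 4.1, the two regressions of the instrumental variables procedure can be sketched on simulated data. Everything in the sketch is hypothetical: the data-generating coefficients, the use of household size as the only excluded exogenous variable, and the helper names `ols`, `simulate` and `estimate`. It assumes that household size affects the vehicle chosen (and hence OPCOST) but not utilization directly.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate(n=5000, seed=7):
    """Hypothetical households: (income, household size, OPCOST, kilometres)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        inc = rng.uniform(20.0, 100.0)
        size = rng.choice([1, 2, 3, 4, 5])
        taste = rng.gauss(0.0, 1.0)  # unobserved taste for driving
        # Keen drivers pick vehicles that are cheaper to run: OPCOST is endogenous.
        opcost = 10.0 - 0.03 * inc + 0.8 * size - taste + rng.gauss(0.0, 0.5)
        km = 20.0 - 1.0 * opcost + 0.1 * inc + 2.0 * taste  # true beta_1 = -1
        rows.append((inc, size, opcost, km))
    return rows


def estimate(rows):
    """Return (naive OLS, instrumented) coefficient vectors for equation (6)."""
    km = [r[3] for r in rows]
    naive = ols([[1.0, r[2], r[0]] for r in rows], km)
    # Stage 1: regress OPCOST on the exogenous variables.
    g = ols([[1.0, r[0], r[1]] for r in rows], [r[2] for r in rows])
    opcost_hat = [g[0] + g[1] * r[0] + g[2] * r[1] for r in rows]
    # Stage 2: use predicted OPCOST in place of observed OPCOST.
    instrumented = ols([[1.0, h, r[0]] for h, r in zip(opcost_hat, rows)], km)
    return naive, instrumented


if __name__ == "__main__":
    naive, instrumented = estimate(simulate())
    print("naive beta_1:", naive[1], "instrumented beta_1:", instrumented[1])
```

In this simulated setting the instrumented coefficient on OPCOST should lie close to the value used to generate the data, while the naive estimate is pulled away from it by the unobserved taste for driving. Standard errors taken directly from the second regression are not valid and would need adjusting in a real application.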
4.2.1. Bias correction term

Work on the bias correction term approach began with the early contributions of Heckman (1976, 1978, 1979), who showed that selectivity is a specification error associated with unobserved (missing) explanatory variables. The approach seeks to develop a single term that can be used to correct for the selectivity bias in continuous equations. We can write the vehicle utilization equation of our previous example as,

$$y_j = \beta_0 + \beta_1\,\mathrm{OPCOST} + \beta_2\,\mathrm{INC}_j + \beta_3 \psi \qquad (7)$$

where $\psi$ is the bias correction term and all other variables are as previously defined. Values of the $\psi$ term are obtained from the functional form of the discrete choice model. Early works (Heckman 1976, 1978, 1979, Westin and Gillen 1978) derived the correction term on the assumption that the discrete choice was defined by a probit probabilistic choice model. Unfortunately, this method proved difficult to extend beyond a trinary discrete choice (see Terza 1980). As a result, the probit discrete choice assumption has limited applicability in many fields, particularly transport, where most decisions involve selection among many alternatives.

Hay (1980) and Dubin and McFadden (1984) extended the bias correction term to problems where multiple discrete choices were encountered. They based their correction terms on the assumption that the discrete choice is defined by a logit probabilistic choice model. The results of their studies provided the first applications of direct methods to multiple discrete choices. Unfortunately, the Hay and Dubin-McFadden bias correction term approach is somewhat cumbersome for choice situations that involve more than five or six alternatives, since in the most general case it requires the use of N - 1 correction terms ($\psi$'s), where N represents the total number of choices. Consequently, applications of the method to situations where many discrete alternatives are involved often require that additional econometric restrictions be applied (see Mannering 1986 a), in particular the commonality of error correlation between the chosen alternative and each of the non-chosen alternatives. Lee (1983) presented an alternative formulation of the selectivity bias correction term for multiple discrete choice models.
Based on assumptions that differ from those made by Hay and Dubin and McFadden, the Lee correction procedure tends to be even more computationally intense, but it has been successfully applied in a number of contexts (Hensher and Milthorpe 1986, and Barnard 1987). The forms of two computationally tractable correction terms are given in equations (8) and (9).

$$\psi\ (\text{Hay, Dubin-McFadden}) = \sum_{j \neq i}^{J} \left[ \frac{\mathrm{Prob}_j \log \mathrm{Prob}_j}{1 - \mathrm{Prob}_j} + \log \mathrm{Prob}_i \right] \qquad (8)$$

$$\psi\ (\text{Lee}) = -\sigma \rho_i\, \phi\!\left[\Phi^{-1}(\mathrm{Prob}_i)\right] / \mathrm{Prob}_i \qquad (9)$$

where
$\mathrm{Prob}_i$ is the predicted probability of selecting the chosen alternative
$\mathrm{Prob}_j$ is the predicted probability of selecting a non-chosen alternative out of the set of J alternatives
$\rho_i$ is the correlation between the error terms of the discrete and continuous choice
$\sigma$ is the standard error of the estimate for the continuous choice model
$\phi$ is the standard normal density function
$\Phi^{-1}$ is the inverse of the distribution function of the standard normal.
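The two correction terms can be computed directly from the predicted probabilities of an estimated logit model. The sketch below is a hedged illustration, not the authors' code: the function names are hypothetical, the Hay/Dubin-McFadden kernel is written under the common-correlation restriction as the sum over non-chosen alternatives of Prob_j log Prob_j / (1 - Prob_j) + log Prob_i, and the Lee kernel omits the factor sigma * rho_i. Both simplifications assume that any constant scaling is absorbed by the coefficient estimated on the correction term in the utilization regression.

```python
import math
from statistics import NormalDist


def logit_probs(utilities):
    """Multinomial logit selection probabilities for one decision maker."""
    m = max(utilities)  # subtract the maximum for numerical stability
    w = [math.exp(u - m) for u in utilities]
    total = sum(w)
    return [x / total for x in w]


def hay_dm_term(probs, chosen):
    """Assumed common-correlation kernel of the Hay / Dubin-McFadden term."""
    log_p_chosen = math.log(probs[chosen])
    return sum(p * math.log(p) / (1.0 - p) + log_p_chosen
               for j, p in enumerate(probs) if j != chosen)


def lee_term(probs, chosen):
    """Kernel of the Lee term: -phi(Phi^{-1}(Prob_i)) / Prob_i (sigma*rho_i omitted)."""
    std_normal = NormalDist()
    p_chosen = probs[chosen]
    return -std_normal.pdf(std_normal.inv_cdf(p_chosen)) / p_chosen


if __name__ == "__main__":
    probs = logit_probs([0.6, 0.1, -0.3])  # hypothetical systematic utilities
    print(hay_dm_term(probs, chosen=0), lee_term(probs, chosen=0))
```

In a sequential procedure, one of these values would be computed for every observation and added as a regressor to the utilization equation.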
4.2.2. Expected value

The expected value approach is the most versatile of the direct methods and can be readily applied to the problem of multiple discrete choices. The approach replaces all right-hand side endogenous variables with their expected values, which are derived from the discrete choice model's selection probabilities. Thus we write equation (6) as

$$y_j = \beta_0 + \beta_1\,\mathrm{EVO} + \beta_2\,\mathrm{INC}_j \qquad (10)$$

where EVO is the expected value of operating costs defined as

$$\mathrm{EVO} = \sum_{i} P_i\, \mathrm{OPCOST}_i \qquad (11)$$

and $P_i$ is the selection probability estimated from the discrete choice model; all other variables are as previously defined. The expected value method has been applied by Dubin and McFadden (1984) and Mannering and Winston (1985) with considerable success.

4.2.3. Full information

Both bias correction term and expected value methods typically involve sequential estimation procedures. That is, we must first estimate the discrete choice model, and then use the predicted probability estimates from this model to construct the bias correction term or the expected values to be used in continuous equation estimation. From an econometric perspective, we can gain precision by estimating both continuous and discrete components of the modelling system jointly instead of sequentially. Such a full information approach has been applied by McFadden et al. (1986) to the choice of freight mode and shipment size. Unfortunately, full information modelling systems are computationally cumbersome and extensions beyond simple binary discrete choices do not appear promising. Indeed, binary choice applications are relatively straightforward only if one uses a standard probit model with a linear specification of the indirect utility expression.

Before turning our attention to specific transport applications of discrete/continuous models, we present in table 1 a summary classification, by model structure and correction method, of the previous research presented briefly in § 3.

§ 5. TRANSPORT APPLICATIONS

To date, transport applications of discrete/continuous modelling systems have largely been directed towards the study of (a) automobile type choice and extent of utilization and (b) activity type choice and time allocation. These decisions have long been recognized as classical discrete/continuous problems.
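Returning briefly to the expected value method of § 4.2.2, the construction of the expected operating cost is simple enough to sketch in a few lines. The utilities, operating costs and function names below are hypothetical, and a logit form is assumed for the selection probabilities; the sketch only shows how the probability-weighted regressor is built for one household.

```python
import math


def logit_probs(utilities):
    """Multinomial logit selection probabilities for one decision maker."""
    m = max(utilities)  # subtract the maximum for numerical stability
    w = [math.exp(u - m) for u in utilities]
    total = sum(w)
    return [x / total for x in w]


def expected_opcost(probs, opcosts):
    """Expected operating cost: sum over alternatives of P_i * OPCOST_i."""
    return sum(p * c for p, c in zip(probs, opcosts))


if __name__ == "__main__":
    utilities = [0.4, 0.0]  # hypothetical systematic utilities: new car, used car
    opcosts = [6.0, 9.0]    # hypothetical operating costs per kilometre
    probs = logit_probs(utilities)
    print(expected_opcost(probs, opcosts))
```

Because the expected value depends only on the estimated probabilities and the attributes of every alternative, it can be computed for each household regardless of which vehicle was actually chosen.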
Historically, work in the automobile type choice area was concentrated on developing discrete choice models of either the quantity of vehicles to own or the type of vehicles (i.e., make, model and vintage) to own. In the early 1980s, a number of researchers began to develop methods for modelling the relationship between the choices of vehicle quantity, type, and the extent to which vehicles are used (e.g. kilometres per year). Berkovec
Table 1. Classification of selected previous research, grouped on the basis of indirect or direct correction procedure (B, binary choice; P, polychotomous choice).

Indirect
  Instrumental variables: Dubin and McFadden (1984); Train and Lohrer (1983); Train (1986)

Direct
  Bias correction term: Heckman (1976, 1978, 1979) B; Westin and Gillen (1978) B; Hay (1980) P; Trost and Lee (1984) P; Mannering (1986 a) P; Dubin and McFadden (1984) B; Hensher and Milthorpe (1986) P; Barnard (1987) P; Chin (1986) P; Kitamura (1984) B; Kitamura and Bovy (1986) P; Hensher (1986) B
  Expected value: Dubin and McFadden (1984); Mannering and Winston (1985); Mannering (1986 b); Jacobson (1983)
  Full information: McFadden et al. (1986)
and Rust (1985) estimated a vehicle type choice model for one-vehicle households and developed an economically consistent theoretical basis for linking the type choice decision to vehicle utilization. Their empirical estimates included a regression equation of household vehicle utilization, but no selectivity or bias correction technique was used. Mannering (1983) also recognized the interrelations of the type and utilization decisions and presented a number of discussions on the subject. However, his empirical estimates did not incorporate any correction procedures.

Perhaps the first true empirical application of discrete/continuous procedures to automobile type choice and utilization was the work of Train and Lohrer (1983) as reported in Train (1986). Train and Lohrer developed a complete and economically consistent model of vehicle quantity, type, and utilization and estimated their models using a sample of 1095 American households. Their type and quantity models assumed a standard (sequentially estimated) nested logit structure, and the utilization model was estimated by ordinary least squares. To account for potential bias in utilization equation estimates, Train and Lohrer used an indirect method (see § 4.1). Their approach was to replace potentially endogenous variables (i.e., those that are dependent on the discrete choice, such as vehicle operating costs) with an instrument. Thus, they regressed observed operating costs on a series of exogenous variables (e.g., income, household size, etc.) and used the values of operating costs predicted by the resulting regression. This procedure is computationally one of the easiest methods of correcting for potential bias in discrete/continuous modelling systems, particularly when multiple discrete choices are involved. In the Train and Lohrer case, the extent of potential bias is not explored since they did not perform a comparison between parameter estimates that were corrected for the bias and those that were not.
Mannering and Winston (1985) analysed the quantity, type, and utilization decisions using a sample of 962 American households. They derived an economically consistent model and corrected for potential bias using the expected value approach described earlier in § 4.2.2. To assess the extent of bias, Mannering (1986 b) performed a series of numerical experiments to compare utilization equations corrected for bias and those that were not. He found that for households owning a single vehicle, fuel price/distance travelled elasticity was underestimated in uncorrected regressions by 27% and for households owning two vehicles the elasticity was underestimated by 20%. Mannering went on to forecast the impacts of a doubling of fuel prices for both corrected and uncorrected models (see table 2). The forecasts strongly underscore the importance of bias correction, as uncorrected equations could seriously understate fuel-related impacts.

In another study of vehicle type and utilization, Mannering (1986 a) estimated a reduced form model (see § 3). To correct the parameter estimates of the utilization equations Mannering used the bias correction term procedure developed by Hay and Dubin and McFadden. The modelling system was estimated with a sample of 364 American households, all of which owned a single vehicle. Both bias corrected and uncorrected utilization equations were estimated. He went on to compare the utilization rates predicted by the uncorrected and corrected utilization equations. The comparisons revealed that, on average, the difference between the two values was 10.8%. However, for some households this difference was as high as 82%.

In a very extensive study on automobile type choice and utilization, Hensher and Milthorpe (1986) performed a series of numerical evaluations. They developed an economically consistent model and corrected for utilization equation bias using the
Table 2. Forecasted impacts of a doubling of fuel price, percentage reduction in vehicle-kilometres travelled (Mannering 1986 b). Columns: uncorrected forecasts; corrected forecasts. Rows: households owning one vehicle; households owning two vehicles. (Cell values not recoverable.)
bias correction term procedure developed by Hay and Dubin and McFadden as well as the procedure presented by Lee (1983). Fleet size-specific type mix choice models were estimated for a sample of Sydney households drawn from the first wave (1981) of a four-wave panel. The probabilities calculated from these models for the chosen and non-chosen alternatives were used in the empirical determination of the selectivity correction variables (8) and (9) (see § 4.2.1). Since the probabilities were derived from type mix choice models estimated on a randomly generated subset of the 4137 individual vehicles in the universal choice set, they investigated the empirical implications of varying the choice set size in estimation (11 and 30) and in application to calculate the bias correction (40 and 80). The justification for these sizes is given in Hensher and Milthorpe (1986). This gave eight empirical measures of the bias correction variable (two estimation sizes by two application sizes by two correction formulas) for each of the fleet size-specific vehicle utilization models (1, 2, 3, 4 or more vehicles). Ordinary least squares (OLS) was used to obtain estimates of parameters for one-vehicle households, and three-stage least squares (3SLS) for multiple-vehicle households.

Hensher and Milthorpe found that although differences do exist in the parameter estimates, elasticities and prediction of vehicle utilization obtained from the use of alternative bias correction procedures, the differences are in general marginal both within the alternative choice set contexts and relative to the absence of bias correction. The exception to this conclusion is in the parameter estimates of two variables: fuel cost and residential location. For example, if the mean point elasticities of vehicle use with respect to fuel cost (table 3) are considered, the relative differences are substantial, with a variation of up to 30% for two-vehicle households, up to 10% for three-vehicle households and up to 35% for households with more than three vehicles.
In the absence of bias correction the fuel cost elasticity is significantly higher for two-vehicle households, and significantly lower for households with three or more vehicles. The non-significance of fuel cost for the one-vehicle model prevented any meaningful comparison. This provides sufficient support for the argument that even in the presence of a non-significant bias correction variable, its inclusion is necessary to detect and account for the magnitude of selection bias on individual parameters.

The application by Chin (1986) illustrates the way in which the bias correction method is used to model the relationship between travel choices and budget constraints. He estimated a logit model of trip timing for a sample of Singapore commuters and used the resulting probabilities in formula (8) above to study time allocation among four activities: travel, home, work and 'other'. He found that the timing of the commuter trip had a significant influence on the amount of time available for home activities, but very little impact on time allocated to work (as expected) or other activities. The timing of commuter trips is strongly influenced by the Area Licensing Scheme in downtown Singapore, which has tended to reduce the
45
a A
80111
'f Not statistically significant. $40111 =40 alternatives in application, 11 alternatives in type choice estimation equation, etc. §A, Formula (8); B, formula (9).
Uncorrected
40/11$
B
A
Corrected 40130
B
Sensitivity of elasticity of vehicle utilization with respect to fuel cost (Hensher and Milthorpe 1986).
One-vehicle households Two-vehicle households Three-vehicle households Four or more vehicle households
Table 3.
A
80130
B
Econometric models and transport analysis
239
amount of time individuals can spend at home with the entire family, since the commuter tends to leave either much earlier or much later with less commonality of family time in the home. Another area in which discrete/continuous models have been applied to transport problems is the choice of freight mode and the related decisions of shipment size and frequency of shipment. T h e proper specification of such freight demand models is vital for the analysis of the welfare effects of deregulation, railroad mergers, and new freight modes. McFadden et al. (1986) develop an economically consistent model of freight transport decisions based on inventory theory. T o correct for potential bias they used a full information maximum likelihood estimation procedure. They estimated their modelling system using the United States Department of Transportation's 1977 study on produce transportation. Consideration was given to the choice between two modes: railroad and motor carrier. T h e results of their study, while illustrating the importance of accounting for interrelated discrete/continuous choices, also underscored the computational problems associated with full information correction procedures. In principle the scope for application of these new methods is unlimited; in practice however we suspect that suitable data may not be sufficiently widely accessible to take full advantage of these methods. One area ?here recent extensions of application have been made in transport is in investigation of the nature and extent of attrition bias in panel data sets.
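The bias correction variables used in the applications above are functions of the choice probabilities from the discrete model. Formulas (8) and (9) are not reproduced in this section, so the following is only a minimal sketch that assumes the textbook forms of a Dubin and McFadden-type term and a Lee-type term; which of these corresponds to (8) or (9) is not asserted here. The function names, the utilities and the three-alternative choice set are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def logit_probabilities(utilities):
    """Multinomial logit choice probabilities from systematic utilities."""
    peak = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - peak) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def dubin_mcfadden_terms(probs, chosen):
    """One correction variable per non-chosen alternative j (assumed form):
    P_j * ln(P_j) / (1 - P_j) + ln(P_chosen)."""
    log_p_chosen = math.log(probs[chosen])
    return [p * math.log(p) / (1.0 - p) + log_p_chosen
            for j, p in enumerate(probs) if j != chosen]


def lee_term(prob_chosen):
    """Single correction variable (assumed form): phi(Phi^-1(P)) / P."""
    z = STD_NORMAL.inv_cdf(prob_chosen)
    return STD_NORMAL.pdf(z) / prob_chosen


# Hypothetical household: three vehicle types, the first one chosen.
probs = logit_probabilities([1.0, 0.2, -0.5])
dm_vars = dubin_mcfadden_terms(probs, chosen=0)  # two regressors
lee_var = lee_term(probs[0])                     # one regressor
```

In a two-step application these variables would be appended to the regressors of the continuous equation (for example vehicle utilization), which is then estimated by OLS or a system method; their estimated coefficients absorb the correlation between the discrete and continuous error terms.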
5.1. Attrition bias
Two recent studies have recognized the usefulness of discrete/continuous choice modelling in the study of attrition in transport panel data sets (Hensher 1985, 1986, Hensher 1986, Kitamura and Bovy 1986). In answering the question: does it make any significant difference to the results if units with certain characteristics are lost from the sample at each reinterview?, Hensher and Bodkin have estimated binary probit models on participation/non-participation in pairs of adjacent waves of a four-wave panel of household vehicle choice and utilisation. The explanatory variables include socioeconomic descriptions of the household and respondent, and background data on the interview (interviewer, duration and month of interview, and level of cooperation). The resulting attrition bias correction term is then included in continuous choice models of vehicle utilization and number of vehicles owned for the subsequent wave of continuing respondents. Providing one can assume that, given particular values of the observed variables (e.g., socioeconomics), the values missing on other variables (e.g., vehicle use) are missing at random (that is, participants and non-participants with the same characteristics on the observed variables do not differ systematically on other variables), then one can use the attrition-bias corrected continuous choice model to predict the levels of the endogenous variables for non-participants (i.e., imputation) and hence assess the likely extent of any bias due to attrition. If bias is present, one can use the inverse of the participation probabilities for the continuing respondents as sample weights for the next wave participants. One can also use responses on one wave as a powerful imputation predictor of a missing response for that same item on another wave.
Weighting cannot use this information; however, by weighting up participating responders the weighting approach (inverse probabilities) retains the full respondent covariance structure for all items in the longitudinal record. Thus in summary, the inclusion of participation probabilities (via the bias correction term) in the continuous choice (imputation)
model provides protection against non-participation bias introduced by misspecification of the relationship between variables known for all units and those only known for continuing participants. Further details are given in Hensher (1986).

Kitamura and Bovy (1986) used a similar approach to study the nature of reporting errors and attrition biases in the first two waves of a Dutch household panel survey of weekly trip diaries. They investigated the relationship between reporting errors in wave one and the subsequent decision to participate in wave two, and the relationship between the same wave one errors and wave two errors for participating households. There are trip equations for each wave and a binary probit attrition probability model, with chronological dependencies among the error terms enabling the relationship between mobility, trip reporting errors and attrition behaviour to be assessed. The strong correlations suggest that those who are less mobile and/or under-reported their trips in wave one tended to drop out of wave two, and those who accurately reported their trips in wave one tended to do so in wave two. They found that households with older children, with more cars and with low income tend to drop out of the panel, as well as those who under-reported their trips in wave one. A comparison of the wave two trip equations with and without the attrition bias correction terms shows no differences in the estimated coefficients and the overall fit of the model. This is due to the statistical non-significance of the correction variable (t-value = -0.94). Kitamura and Bovy conclude that, given a household chose to participate in the wave two survey, its unobserved propensity to do so does not influence the number of trips in the second wave. However, given that households of certain characteristics chose to leave the panel, the wave two respondents must be weighted.
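The two uses of the probit participation model described in this subsection, a correction term for the continuous equation and inverse-probability sample weights, can be illustrated with a small sketch. It assumes the textbook inverse Mills ratio as the correction term for continuing respondents; the exact specifications used in the cited studies are not given here, and all function names and numbers below are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def participation_probability(index):
    """Probit probability of staying in the panel, given the index x'b."""
    return STD_NORMAL.cdf(index)


def attrition_correction_term(index):
    """Inverse Mills ratio phi(x'b) / Phi(x'b) for a continuing respondent
    (assumed textbook form of the attrition bias correction term)."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


def weighted_mean(values, participation_probs):
    """Mean of `values` with each respondent weighted by 1 / P(participate)."""
    weights = [1.0 / p for p in participation_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical wave-two sample: mobile households (10 trips) stay with
# probability 0.8, less mobile households (4 trips) with probability 0.4,
# so the continuing sample over-represents the mobile group 8 to 4.
trips = [10] * 8 + [4] * 4
stay_probs = [0.8] * 8 + [0.4] * 4

unweighted = sum(trips) / len(trips)
weighted = weighted_mean(trips, stay_probs)
```

In this constructed example the unweighted mean is 8 trips, while the weighted mean is approximately 7, the value for a population split evenly between the two groups. This is the sense in which weighting by the reciprocal of the participation probability moves wave-two means back towards the wave-one sample.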
Using the reciprocal of the probability to participate in wave two resulted in sample means of key mobility indicators (i.e., trips in total and by mode) which were significantly closer to the wave one means than were the unweighted wave two means. Thus sample weighting has removed the effect of participants tending to make a more-than-expected number of trips and/or to report their trips more accurately.

§ 6. SUMMARY AND DIRECTIONS FOR FUTURE WORK
The development of new econometric methods has historically led to empirical analyses that have improved our understanding of transport phenomena. The discrete/continuous econometric methods discussed in this paper are yet another example of how advances in analytic procedures have added to our knowledge of transport-related behaviour. This is particularly evident in the area of automobile choice and utilization, where a number of studies have demonstrated that discrete/continuous econometric procedures have improved the precision of our parameter estimates. However, there are many other transport problems that encompass interrelated discrete and continuous decisions. In this sense, researchers have only begun to scratch the surface with regard to transport applications of discrete/continuous models.

In terms of model structures and analytic procedures, there is a wide range of alternatives from which to choose. The goal of economic consistency in model structure is always preferred on theoretical grounds, but it is often necessary to compromise the theoretical structure of such models to obtain estimable functional forms. As a result, the more simplistic reduced form method is often a justifiable option. The corrective analytic procedures include instrumental variables, bias correction terms, expected values and full information system methods. Although these procedures undoubtedly have different econometric properties regarding bias
correction and parameter estimation efficiency, there has been little theoretical or empirical work to suggest a superior method. Consequently, researchers have often selected a correction method on the basis of convenience rather than on some justifiable empirical concern.

Given the evolution of discrete/continuous econometric modelling methods and their transport applications to date, we are able to isolate a number of research areas that hold the most promise for future work. First and foremost there is an urgent need to apply discrete/continuous modelling techniques to the many transport problems that involve interrelated discrete and continuous decisions. In the past, analyses have taken the simplistic approach and ignored the interrelation of such decisions and treated them in isolation, thus producing flawed estimates and forecasts. Another promising direction for future work is the theoretical and empirical evaluation of alternative model structures and correction techniques. Such evaluation would provide a valuable basis for directing applications of method. Finally, there is the whole issue of the dynamics of decision-making. It is well known that most transport decisions are made in a dynamic environment with current decisions being affected by both past behaviour and future expectations. The implementation of dynamics in discrete choice models has recently received considerable attention but work on dynamics in discrete/continuous choice applications has been minimal. Although incorporating dynamics in such models is likely to produce complex structural forms, the analytic benefits would be substantial.

ACKNOWLEDGMENT
The comments of Peter Barnard and referees on earlier drafts are appreciated.
French abstract: A very wide range of transport decisions involve either discrete decisions (the choice of a vehicle, for example) or continuous decisions (such as the use of that vehicle). In recent years econometrics has acquired procedures that allow such choices to be integrated into a framework that makes sense both economically and statistically. In general, the existing publications on this subject are highly technical. This article aims to present to a wider audience what can be done in econometric modelling that is both discrete and continuous, with particular emphasis on areas of application in transport. One conclusion that emerges is that the use of such models poses no particularly difficult problem for practitioners of transport economics, and that they can be implemented using widely available software.

German abstract: A large number of decisions in the transport sector rest on a combination of choices between discrete quantities (e.g. the choice of vehicle) and continuous quantities (e.g. use of the vehicle). In recent years econometricians have developed procedures for placing such different choice situations in a framework that is sound in both economic and statistical terms. The aim of this paper is to give a general overview of the essential elements of discrete/continuous econometric modelling and to recommend their use in the transport field. It is hoped that this introduction will also show that the essential core of the approach is helpful to the practitioner and can easily be implemented on computers with widely available software.

Spanish abstract: A wide range of transport-related decisions involve the link between discrete choices (e.g. vehicle choice) and continuous choices (e.g. vehicle use). In recent years econometricians have developed procedures for integrating such choices into a framework that is rigorous from both the economic and the statistical point of view. However, the literature is perhaps too technical. The objective of this paper is to provide a general overview of the basic elements of discrete/continuous econometric modelling, with an emphasis on transport applications. We hope that this introduction serves to demonstrate that the essence of the approach is in fact very simple and that it can be implemented with widely available computer software.
REFERENCES
BARNARD, P. O., 1987, Modelling shopping destination choice behaviour using the basic multinomial logit model and some of its extensions, Transport Reviews, 7.
BERKOVEC, J., and RUST, J., 1985, A nested logit model of automobile holdings for one vehicle households, Transportation Research B, 19, 275-286.
CHIN, A., 1986, Trip timing and time budget allocation, draft of Ph.D. thesis in preparation, School of Economic and Financial Studies, Macquarie University.
DAMM, D., and LERMAN, S. R., 1981, A theory of activity scheduling behaviour, Environment and Planning A, 13, 703-718.
DUBIN, J., and MCFADDEN, D., 1984, An econometric analysis of residential electric appliance holdings and consumption, Econometrica, 52, 345-362.
DUNCAN, G., 1980, Formulation and statistical analysis of mixed, continuous/discrete dependent variable models in classical production theory, Econometrica, 48, 839-852.
FALARIS, E. M., 1983, A nested logit migration model with selectivity, Department of Economics, Ohio State University (mimeo).
HANEMANN, W. M., 1984, Discrete/continuous models of consumer demand, Econometrica, 52, 541-561.
HAY, J., 1980, Occupational choice and occupational earnings, unpublished Ph.D. dissertation, Yale University.
HECKMAN, J., 1976, The common structure of statistical models for truncation, sample selection, and limited dependent variables and a simple estimator for such models, Annals of Economic and Social Measurement, 5, 475-492.
HECKMAN, J., 1978, Dummy endogenous variables in a simultaneous equation system, Econometrica, 46, 931-960.
HECKMAN, J., 1979, Sample selection bias as a specification error, Econometrica, 47, 153-162.
HENSHER, D. A., 1985, Longitudinal surveys in transport: an assessment, New Survey Methods in Transport, edited by E. Ampt, A. J. Richardson and W. Brog (Utrecht: VNU Science Press BV), pp. 77-98.
HENSHER, D. A., 1986, Issues in the pre-analysis of panel data, Transportation Research (Special Issue on Longitudinal Data Methods, Guest Edited by D. Hensher) (forthcoming).
HENSHER, D. A., and MILTHORPE, F., 1986, Selectivity correction in discrete-continuous choice analysis: with empirical evidence for vehicle choice and use, Regional Science and Urban Economics (Special Issue on Spatial Choice Analysis, Edited by P. Nijkamp) (forthcoming).
HENSHER, D. A., 1986, An assessment of attrition in a household panel data set, Theoretical and Quantitative Geography, edited by F. Hauser, H. Timmermans and N. Wrigley (Utrecht: H. Reidel Publishers BV) (in the press).
HILL, M. A., 1983, Female labour force participation in developing and developed countries: consideration of the informal sector, The Review of Economics and Statistics, 65, 459-468.
JACOBSON, J., 1983, Bus, taxi and walk frequency models that account for sample selectivity and simultaneous equation bias, Transportation Research Record (944), 57-60.
KITAMURA, R., 1984, A model of daily time-allocation to discretionary out-of-home activities and trips, Transportation Research B, 18, 255-266.
KITAMURA, R., and BOVY, P. H. L., 1986, Analysis of attrition biases and trip reporting errors for panel data, Transportation Research B, Special Issue on Longitudinal Data Methods (forthcoming).
LEE, L. F., 1983, Generalized econometric models with selectivity, Econometrica, 51, 507-512.
LEE, L. F., and TROST, R., 1978, Estimation of some limited dependent variable models with application to housing demand, Journal of Econometrics, 8, 357-382.
MANNERING, F., 1983, An econometric analysis of vehicle use in multivehicle households, Transportation Research, 17, 183-189.
MANNERING, F., 1986 a, Selectivity bias in models of discrete/continuous choice: an empirical analysis, Transportation Research Record (forthcoming).
MANNERING, F., 1986 b, A note on endogenous variables in household vehicle utilization equations, Transportation Research B, 20, 1-6.
MANNERING, F., and WINSTON, C., 1985, A dynamic empirical analysis of household vehicle ownership and utilization, Rand Journal of Economics, 16, 215-236.
MCFADDEN, D., 1981, Econometric models of probabilistic choice, Structural Analysis of Discrete Data with Econometric Applications, edited by C. F. Manski and D. McFadden (MIT Press).
MCFADDEN, D., WINSTON, C., and BOERSCH-SUPAN, A., 1986, Joint estimation of freight transportation decisions under non-random sampling, Analytic Studies in Transport Economics, edited by A. Daugherty (Cambridge University Press).
SCHMIDT, P., and STRAUSS, R., 1976, The effect of unions on earnings and earnings on unions: a mixed logit approach, International Economic Review, 17, 204-212.
TERZA, J., 1980, Heckman's method extended to polychotomous choice models, Proceedings of American Statistical Association Business and Economy Section, 466-470.
TRAIN, K., 1986, Qualitative Choice Analysis: Theory, Econometrics and an Application to Automobile Demand (MIT Press).
TRAIN, K., and LOHRER, M., 1983, Vehicle ownership and usage: an integrated system of disaggregate demand models, presented at the 1983 meeting of the Transportation Research Board (mimeo).
TROST, R., and LEE, L. F., 1984, Technical training and earnings: a polychotomous choice model with selectivity, Review of Economics and Statistics, 66, 151-156.
WESTIN, R., and GILLEN, D., 1978, Parking location and transit demand: a case study of endogenous attributes in disaggregate mode choice models, Journal of Econometrics, 8, 75-101.
EDITORIAL SUGGESTIONS FOR FURTHER READING
BERKOVEC, J., and RUST, J., 1985, A nested logit model of automobile holdings for one vehicle households, Transportation Research B, 19 (4), pp. 275-85.
This paper presents a model of automobile choice by single vehicle households. This effort is distinguished from previous disaggregate automobile holdings models primarily by the use of the nested logit model rather than the more restrictive multinomial logit model. We present a two-step estimation technique that provides consistent and asymptotically efficient parameter estimates, yet is tractable for very large choice sets. Using disaggregate data on 237 one-vehicle households we estimate the unknown parameters on an automobile choice model containing 785 individual makes, models and vintages of passenger vehicles. (Authors)
DOXSEY, L. B., 1984, Demand for unlimited use transit passes, Journal of Transport Economics and Policy, 18 (1), pp. 7-22.
This paper attempts to identify the factors which lead an individual user to buy a pass, so as to provide information for operators assessing advantages and disadvantages of introducing or continuing a transit pass. The first part of the paper develops some of the microeconomic reasons for individual demand for transit passes. The model developed not only provides the basis for a subsequent empirical analysis, but in itself suggests that a transit property is likely to increase ridership but reduce
revenue through issuance of transit passes. The second part of the paper presents an econometric model of individual choice on purchase of a pass. (Author)

HENSHER, D. A., 1985, An econometric model of vehicle use in the household sector, Transportation Research B, 19 (4), pp. 303-13.
Vehicle-use modelling at the household level has taken on new importance with the pressures on governments to encourage more efficient utilisation of increasingly scarce nonreplenishable liquid fuels. The fundamental energy equation recognises two direct influences on consumption: the fuel efficiency of the vehicle and the amount of use. Until recently, the interrelationship between vehicle choice and vehicle utilisation at the household level was acknowledged but ignored. The availability of reliable vehicle-use data at the household level now enables a more serious effort at amending the imbalance of research effort where the reliance has been predominantly on vehicle choice modelling and gross (exogenous) assumptions on utilisation as a basis for predicting fuel consumption. This paper proposes an econometric method for identifying the influences on household vehicle use. It differs from previous empirical work in that vehicle kilometres, fuel cost per kilometre and vehicle fuel efficiency are endogenous, with utilisation of each vehicle endogenously dependent on the utilisation of each and every household vehicle. The data are drawn from wave 1 of a four-wave panel of 1436 households in the Sydney metropolitan area. The empirical findings expose a set of influences on use hitherto not considered. The model specification provides an appropriate module for integration with household-based discrete choice models of vehicle choice. (Author)
MANNERING, F. L., 1986, A note on endogenous variables in household vehicle utilization equations, Transportation Research B, 20 (1), pp. 1-6.
This paper presents an empirical analysis of the statistical bias resulting from the endogeneity of vehicle specific attributes in econometric models of household vehicle utilization. Using data from a recent U.S. household survey, both corrected and uncorrected vehicle utilization models are estimated. A comparison of these estimation results reflects the substantial bias in model coefficients, price elasticities and income elasticities that can result if the endogeneity of vehicle specific attributes is not considered. (Author)
See also HENSHER, D. A., and SMITH, N. C., 1984, Automobile classification for choice and demand modelling, Transport Reviews, 4 (3), pp. 245-71.