MODELLING DURATION OF WORK/SCHOOL EPISODES USING ACTIVITY DIARY DATA FOR THE SPECIFICATION OF ACTIVITY-TRAVEL SCHEDULER

K. M. Nurul Habib, University of Toronto, Toronto, Canada
Eric J. Miller, University of Toronto, Toronto, Canada

PROCESSUS Second International Colloquium on the Behavioural Foundations of Integrated Land-use and Transportation Models: Frameworks, Models and Applications June 12-15, 2005, University of Toronto


INTRODUCTION

Determining the duration of activity episodes is one of the most important elements of the activity scheduling process. The concept of activity scheduling is centered upon the accommodation of different durations (corresponding to activities) within a modelling time frame (daily or weekly). The main philosophy of activity-based travel demand models is to model activities, not travel/trips per se, where travel/trips are emergent activities resulting from the activity participation process (Hägerstrand, 1970). Activity participation generally contributes towards the achievement of one or more objectives, either during the process of participation or upon completion of the activity or both. Participation in different activities may have correlations or prerequisite requirements according to the person’s objectives or goals. Such inter-activity relationships shape the pattern of activity engagement. For this reason, life is often described as being composed of a number of projects, where projects refer to collections of different types of activities with common goal(s) (Axhausen, 1998). Different projects spawn different activity episodes, forming the collections of activities called the activity agenda for the time range of a day, a week or so on. Within this time range, the scheduling process determines the episodes of the agenda to be executed and their sequences.

Agent-based modelling techniques can mimic this behavioural process. So, one way of modelling may be to think of different projects as individual agents and to consider a separate agent, often called the ‘scheduler’, that decides which activity episode from which project will be performed along with all other episode attributes (start time, duration, sequences etc.). Such an agent-based modelling process is proposed by Miller (2005) and identified as project-based activity scheduling.
Within this framework, the durations of activity episodes proposed by different projects to the scheduler are considered to be the most important episode attributes because these durations work as the “container” within which episodes will occur, and the scheduler needs at least some knowledge about these durations to “get started” with the scheduling process. These desired durations might later be modified by the scheduler in response to time-space constraints, among other factors. Miller (2002) divides projects into a number of major types. Out of all possible projects, the ones necessary to meet the fundamental needs of the individual and corresponding household are defined as primary projects. Work/school projects are among the most important primary projects for most people. Practically, work/school activity episodes dominate and shape the activity patterns of daily life (Scott, 2000). Again, the work/school activities of a person not only influence his/her non-work/school activities but also influence the overall activity patterns of other household members (Picado, 1999). Scott (2000) also argues that work/school activity episodes act as pivotal points of household based activity allocation and scheduling.


For this reason, researchers often investigate work/school activities from a variety of perspectives. Kitamura et al (1988, 1992) investigated the relationship between work duration and corresponding travel time. They found that for the case of Osaka, travel time and distance do not affect commuters’ work durations. On the other hand, for The Netherlands and California, they found that work duration and work travel time are negatively related to the time allocated to other daily activities. It is clear that urban transportation policy sensitivities are influenced and shaped by the duration and timing of workers’ work episodes. Care, therefore, must be taken to model work/school episode durations so that various policy variables can be housed at this very root level of activity-based travel demand analysis.

Often activity schedulers take the duration of work/school episodes from the observed distributions derived from travel diary or activity diary data (e.g., TASHA (Miller and Roorda, 2003), FAMOS (Pendyala et al, 2005), etc.). Use of such empirical distributions reduces the policy sensitivity of the overall travel demand model because the policy variables that are actually at play at that level are absent.

Thus, the objective of this research is to develop activity duration models for work/school episodes from activity diary data. The models should fit within a project-based scheduling model, in which episode durations are initially generated within a project (and may be subsequently altered within the scheduling process). It is desirable that such models include policy-sensitive variables (e.g., travel time) and not be just statistical representations of observed behaviour. A second objective of this research is to investigate the relative merits of alternative methods for modelling episode durations.

LITERATURE REVIEW ON WORK DURATION MODELS FOR ACTIVITY SCHEDULERS

Utility-based activity schedulers consider the activity patterns (a series of activities for the whole planning period, aligned in proper sequence) as a whole, where work/school duration is embedded inside the competing patterns. Adler and Ben-Akiva (1979) and Kawakami and Isobe (1989) provide early examples of this modelling approach. The concept of considering an entire activity pattern at a specific point in time is often unrealistic, especially given the presence of within-day and day-to-day dynamics within the transportation system. Roorda and Miller (2004) found that changes of duration or start time of an activity within a planned pattern do not create a ripple effect throughout the entire schedule. Rather, people “locally” rearrange their plans. Recker et al (1986a, 1986b) also proposed a utility-based framework, but explicitly divided the activity program and the activity participation process so as to be able to consider activity rescheduling. On the other hand, rule-based scheduling models recognize the dynamics of the activity generation, scheduling and rescheduling process explicitly (Gärling et al, 1994; Axhausen and Gärling, 1992).

The following is a discussion of the work/school duration models used in some of the operational activity-travel scheduling models. The rule-based operational model ALBATROSS (Arentze and Timmermans, 2000) develops preliminary schedules and revised schedules for the planning stage and adjusted schedules for execution. The sequential scheduling engine of ALBATROSS generates a range of minimum, maximum and average durations for the execution or rescheduling engine. Work/school episodes are considered to be within the fixed component (i.e., part of the “skeleton”) of the schedule. A CHAID algorithm is used to determine work/school episode durations. CHAID is a decision tree induction method and is a purely data-driven approach. It defines an exhaustive set of possible condition variables, selects the variables that are relevant for homogeneous condition states and specifies the corresponding decision tree. Based on these selected condition variables and decision tree, the decision variable (e.g. duration) is predicted. The model uses socio-economic variables to decide the work/school episode duration: household type, age group, child index and socio-economic class; weekly work engagement time information; household vehicle availability information. The main problems of such a decision tree approach are data hungriness and data stickiness. If condition variables are not selected properly, decision trees can produce inaccurate relationships. The selection of accurate condition variables is always a crucial and often uncertain job (Spybroeck et al, 2004). Zorman et al (1997) discuss the limitations of decision tree approaches, especially for modelling human decision-making processes. In addition, the diary data used for ALBATROSS includes only activities that were actually executed during the survey time period.
They do not ensure that all reported activities were planned or all planned activities were executed. The assumptions that “executed activities directly correspond to the planned activities (information about the modifications of activity attributes are missing and thereby ignored)” and that “all planned activities are executed” (page 252, Arentze and Timmermans, 2000) make the resulting decision tree doubtful because the relationship among activity attributes may be different in the planning and execution stages (Habib and Miller, 2005).

The econometric model CEMDAP proposes a comprehensive activity generation-allocation-scheduling model (Bhat et al, 2004). It uses a sequential approach to simulate the activity travel patterns of people and divides the simulation process into three levels: Pattern-Level, Tour-Level and Stop-Level. It considers ‘work/school’ as the primary activity of the activity patterns and starts the simulation with work/school start time, duration and so forth. For prediction of the work/school duration it uses an econometric hazard model with sociodemographics and work characteristics as covariates. Dallas-Fort Worth household activity survey data for 1996 are used to estimate the model (Bhat et al, 2002). This survey is a one-day activity diary, where the participants listed only the activities in which they participated, as in the case of ALBATROSS.

FAMOS, the Florida Activity Mobility Simulator (Pendyala et al, 2005), proposes a prism-constrained simulation approach where work/school episodes are considered to be fixed activities within the daily activity schedule and part of the schedule skeleton. The work/school duration that forms the skeleton of the daily schedule is randomly generated from the observed distribution in the base year travel survey data. The rule-based activity scheduler TASHA (Roorda and Miller, 2003) also takes the same approach for generating work/school duration. Thus, both TASHA and FAMOS lack policy sensitivity because the observed distributions used to generate work/school duration are purely empirical and do not accommodate any policy variables to reflect the day-to-day dynamics.

With a view to making the next generation activity scheduler TASHA-II more dynamic and policy sensitive, CHASE data have been used to develop duration models of work/school episodes. This paper summarizes detailed comparisons of different approaches to duration modelling for work/school episodes in terms of their ability to predict the observed duration distributions. The main objective is to incorporate different policy variables to reflect land-use transportation policy impacts at this very bottom level of skeleton development (Habib and Miller, 2005) for activity scheduling, and at the same time determine theoretically robust and empirically supportable duration modelling techniques compatible with practical activity scheduling models.

DISCUSSIONS ON THE DURATION MODELLING TECHNIQUES

This section describes the basic theories of several econometric duration modelling techniques. Methods discussed are:

• Hazard models.
• Ordinal probability and flexible parameter hazard models.
• Limited dependent variable models.

Statistical issues of dealing with heteroscedasticity, statistical significance of parameters and model goodness of fit are also discussed.

Hazard Model


Hazard models are the main analytical tool used by researchers to investigate the duration of events (Schjerning, 2004) and especially activity episodes (Scott, 2000). Kiefer (1988) describes the basic theories of hazard models; Hensher and Mannering (1994) and Bhat (2000) describe the application of hazard models to transportation analysis. The hazard-based approach recognizes the dynamics of activity duration by considering the conditional probability of termination of the episode. This method is powerful because it can handle censored data and incorporate time-varying variables.

If the duration of an episode is considered as $T$, which is a positive, continuous random variable for the time to terminate the episode, then the probability that the episode terminates within a time interval $dt$, conditional upon survival up to the time $t$, is $\Pr(t \le T \le t+dt \mid T \ge t)$. The average probability per unit of time of terminating the episode is then called the hazard rate. Considering that the time interval $dt$ is very small, the hazard function of the duration $t$ becomes

$$\lambda(t) = \lim_{dt \to 0} \frac{\Pr(t \le T \le t+dt \mid T \ge t)}{dt} \tag{1}$$

If the cumulative distribution function of duration is $F(t) = \Pr(T \le t)$, the survival function is $S(t) = \Pr(T \ge t) = 1 - F(t)$. The probability density function of duration can then be described as

$$f(t) = \frac{\partial F(t)}{\partial t} = -\frac{\partial S(t)}{\partial t}$$

Now the hazard function can be described in terms of a cumulative distribution and a probability density function using the rule of conditional probability:

$$\Pr(t \le T \le t+dt \mid T \ge t) = \frac{\Pr(t \le T \le t+dt \,\cap\, T \ge t)}{\Pr(T \ge t)} = \frac{\Pr(t \le T \le t+dt)\,\Pr(T \ge t \mid t \le T \le t+dt)}{\Pr(T \ge t)}$$

Considering the fact that $\Pr(T \ge t \mid t \le T \le t+dt) = 1$,

$$\Pr(t \le T \le t+dt \mid T \ge t) = \frac{\Pr(t \le T \le t+dt)}{\Pr(T \ge t)} = \frac{F(t+dt) - F(t)}{1 - F(t)}$$

the hazard function becomes:

$$\lambda(t) = \lim_{dt \to 0} \frac{\Pr(t \le T \le t+dt \mid T \ge t)}{dt} = \lim_{dt \to 0} \frac{F(t+dt) - F(t)}{dt}\,\frac{1}{1 - F(t)} = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)}$$

$$\lambda(t) = \frac{-\partial\,(1 - F(t))}{\partial t}\,\frac{1}{1 - F(t)} = \frac{-\partial\,[\log(1 - F(t))]}{\partial t}$$

Integrating both sides of the above equation we get

$$\left[-\log(1 - F(D))\right]_0^t = \int_0^t \lambda(D)\,dD = -\log(S(t))$$

$$S(t) = \exp\left(-\int_0^t \lambda(D)\,dD\right) = \exp(-\Lambda_t)$$

where $\Lambda_t$ is the cumulative hazard rate. Based on these principal relationships, three general types of hazard model can be developed: Non-Parametric, Semi-Parametric and Parametric hazard models.

In the Non-Parametric hazard model only the duration is considered to be a variable, so the hazard or survival distribution of the observed data is calculated using an actuarial method. The commonly used method for the Non-Parametric hazard model is known as the Cutler and Ederer method (Greene, 2002), where the range of duration is divided into a number of equal intervals. The risk set at each interval is calculated by deducting the censored observations from the total observations having duration greater than or equal to the interval of concern. The proportion of observations in the risk set that exited is the ratio of the number of observations exited at that interval to the total risk set. Hence, the survival rate is the difference between one and the proportion of observations in the risk set that exited. The cumulative survival rate is the current survival rate multiplied by the cumulative survival rate of the previous interval, with the starting value taken to be one. LIMDEP’s “Survival” function and STATA’s “ltable” can calculate such a life table.

Another widely used Non-Parametric method is the Kaplan-Meier estimation technique. In order to overcome the problems of different observational biases and censoring, this method defines the survival rate as the proportion of observations in the population whose value exceeds the interval of concern, without making any assumption about the form of the survival function. It arranges the observed durations sequentially and uses the product-limit estimate, which gives the observed distribution, unrestricted as to form, and maximizes the likelihood of the observations (Kaplan and Meier, 1958). The Non-Parametric method does not recognize any variables that may influence the termination of an event.
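As a minimal sketch of the product-limit idea described above, the following Python function computes a Kaplan-Meier survival curve from a list of durations and censoring flags. The function name, argument names and the sample durations (in minutes) are illustrative assumptions and are not taken from the CHASE data. Observations censored at a given time are assumed to remain in the risk set at that time, which is the usual convention.

```python
def kaplan_meier(durations, observed):
    """Product-limit estimate of S(t) at each distinct event time.

    durations: episode lengths (e.g. minutes).
    observed:  True if the end of the episode was observed,
               False if the record is right-censored.
    Hypothetical helper, for illustration only.
    """
    records = sorted(zip(durations, observed))
    at_risk = len(records)        # observations with duration >= current t
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        exits = 0                 # observed terminations at time t
        leaving = 0               # all records leaving the risk set at t
        while i < len(records) and records[i][0] == t:
            exits += 1 if records[i][1] else 0
            leaving += 1
            i += 1
        if exits:
            survival *= 1.0 - exits / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up sample: five episodes, one of them censored at 510 minutes.
sample_curve = kaplan_meier([480, 510, 510, 540, 600],
                            [True, True, False, True, True])
for t, s in sample_curve:
    print(f"t = {t:>3} min, S(t) = {s:.3f}")
```

The curve only steps down at times where a termination was actually observed; censored records reduce the size of the risk set without contributing an exit.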
One way to consider different policy variables as covariates of hazard rates is to parameterize the functions. The parameterization of the hazard or survival function is usually done by two methods: the Proportional Hazard method and the Accelerated Lifetime method.

The proportional hazard method specifies the effects of covariates to be multiplicative on the underlying baseline hazard rate. The general formula is

$$h(t) = h_0(t)\, f(x, \beta)$$

where $h(t)$ is the hazard rate at any time $t$, $h_0(t)$ is the baseline hazard rate at time $t$, and $x$ and $\beta$ are the covariates and corresponding parameters. Here, the assumption is that the covariates have only scaling effects on the hazard rate; the shape of the hazard function depends on the baseline hazard rate. This proportional assumption makes it easier to estimate the parameters of the covariates by the partial likelihood estimation method (Cox, 1975). This type of model is well known as the Cox proportional hazard model or semiparametric hazard model. In this type of model the covariate function is not a function of time and is assumed to be exponential, $\exp(\beta x)$, to ensure the nonnegativity of the duration modelled. However, methods may vary based on the estimation process of the baseline rates. The partial likelihood method throws the baseline rates out of the estimation process, so the baseline hazard rates are estimated separately. LIMDEP’s “survival” function and STATA’s “stcox” function estimate this type of model. It should be mentioned that the semiparametric model does not contain a constant term because it is homogeneous of degree zero in the number of covariates (see Greene, 2002). It is worth mentioning that the covariate effects are direct to the hazard rates and opposite to the duration. For example, a negative covariate parameter decreases the hazard rate and increases the duration.

Unlike the semiparametric model, the Accelerated Lifetime method considers the hazard rate as a function of time so that the covariates influence not only the scale but also the shape of the hazard function. In general this type of model is also known as a parametric hazard model. The general equation of the survival rate of duration in parametric hazard models is

$$S(t, x, \beta) = S(t, \phi(x, \beta)) = S(t, \lambda)$$

To ensure the nonnegativity of the duration modelled, $\lambda = \exp(-\beta x)$ is commonly assumed. Unlike in a semiparametric model, the negative sign with the parameter is often used to reflect the fact that negative parameter covariates increase the hazard rate and decrease duration. The prerequisite of this model is the assumption of the functional form of the distribution, $S(t, \lambda)$.
Given the assumed distribution, the covariates rescale the modelled distribution and affect the acceleration or deceleration of the termination of the event. These models in general contain at least three major types of parameters: shape ($\sigma$, sigma), rate ($\lambda$, lambda) and scale ($P$) parameters. Commonly used distributions in these models include: exponential, Weibull, Log-Logistic, Lognormal, generalized Gamma, Gompertz, Generalized F etc. Detailed descriptions of these formulations are available in Greene (2002). With the specification of $\lambda = \exp(-\beta x)$, such models can be viewed as log-linear regressions of duration on the covariates, and parameters can be estimated by full information maximum likelihood (Bhat, 2000).

The semiparametric and parametric models described so far assume a homogeneous distribution of the population. The presence of unobserved heterogeneity across the population is very likely and demands careful consideration. Heckman and Singer (1984) and Kiefer (1988) discuss in detail the effects of heterogeneity on duration modelling. Flinn and Heckman (1982) describe the possible specifications for incorporating unobserved heterogeneity. The semiparametric hazard model cannot consider heterogeneity directly, but in the case of parametric hazard models the random effect estimator can be used. To use a random effect estimator, heterogeneity is considered to be conditional to the survival or hazard function and in multiplicative form (Meyer, 1987):

$$S(t, x, \beta \mid \varpi) = S(t, \lambda)^{\varpi}$$

where $\varpi$ represents unobserved heterogeneity. Two general approaches for the specification of $\varpi$ are a parametric distribution or a nonparametric distribution. Heckman and Singer (1982, 1984 and 1986) discuss in detail the importance of, and the way of incorporating, a nonparametric specification of unobserved heterogeneity. For the parametric specification, the most commonly used forms are the Gamma and inverse-Gaussian distributions (STATA, 2005). Parametric hazard models with parametric heterogeneity are also known as frailty models. One example of a parametric hazard model with gamma heterogeneity is shown below. Let the random variable $\varpi$ have the Gamma distribution with parameters $k$ and $R$:

$$f(\varpi) = \frac{k^{R}}{\Gamma(R)}\, e^{-k\varpi}\, \varpi^{R-1}$$

Now if the baseline duration is assumed to be Weibull distributed, the final formulation becomes:

$$S(t) = \left[1 + \theta(\lambda t)^{P}\right]^{-1/\theta}, \qquad h(t) = S(t)^{\theta}\, \lambda P (\lambda t)^{P-1} \tag{2}$$

where $\theta = 1/k$ is the variance of $\varpi$. The same approach can be followed for other baseline duration distributions of the parametric hazard model.

The above descriptions cover Nonparametric, Semiparametric and Parametric hazard models. However, as discussed, the Nonparametric method cannot consider covariates; the Semiparametric method makes the proportional hazard assumption, which may not be true for the types of duration of concern, and cannot consider unobserved heterogeneity; and the Parametric methods require an assumption about the distribution of the duration that may lead to wrong conclusions if the actual distribution is significantly different from the assumed distribution. The same criticism applies to the distribution of unobserved heterogeneity. Considering all of these issues, Meyer (1987) proposes a hazard model in which both the baseline hazard rate and the unobserved heterogeneity are estimated nonparametrically and the covariates are considered according to a specific functional form. An implementation of such a method is available in Bhat (1996).
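The Weibull model with gamma heterogeneity in equation (2) can be checked numerically. The sketch below, written under assumed parameter values that are purely illustrative (a rate of 1/480 per minute, P = 2, theta = 0.5), evaluates the survival and hazard functions, compares the hazard with a finite-difference estimate of -d log S(t)/dt, and shows that the survival function approaches the plain Weibull form as theta tends to zero. All function names are hypothetical.

```python
import math


def weibull_survival(t, lam, p):
    """Weibull survival function S(t) = exp(-(lam * t) ** p)."""
    return math.exp(-((lam * t) ** p))


def frailty_survival(t, lam, p, theta):
    """Equation (2): S(t) = [1 + theta * (lam * t) ** p] ** (-1 / theta)."""
    return (1.0 + theta * (lam * t) ** p) ** (-1.0 / theta)


def frailty_hazard(t, lam, p, theta):
    """Equation (2): h(t) = S(t) ** theta * lam * p * (lam * t) ** (p - 1)."""
    s = frailty_survival(t, lam, p, theta)
    return s ** theta * lam * p * (lam * t) ** (p - 1.0)


def numerical_hazard(t, lam, p, theta, eps=1e-3):
    """Central-difference estimate of -d log S(t) / dt."""
    upper = math.log(frailty_survival(t + eps, lam, p, theta))
    lower = math.log(frailty_survival(t - eps, lam, p, theta))
    return -(upper - lower) / (2.0 * eps)


# Assumed, illustrative values only (not estimates from the paper).
LAM, P, THETA, T = 1.0 / 480.0, 2.0, 0.5, 480.0

# The closed-form hazard should agree with the numerical derivative.
print(frailty_hazard(T, LAM, P, THETA), numerical_hazard(T, LAM, P, THETA))

# With a very small heterogeneity variance the model is close to Weibull.
print(frailty_survival(T, LAM, P, 1e-8), weibull_survival(T, LAM, P))
```

The second comparison illustrates why theta can be read as a measure of unobserved heterogeneity: the smaller it is, the closer the model is to the homogeneous Weibull specification.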


Ordered Probability Model and Flexible Parameter Hazard Model

Ordered probability models (McKelvey and Zavoina, 1975; Prentice, 1976) can also be applied to duration modelling. The idea of this type of model is that the dependent variable of concern occurs in terms of some arbitrary orders. In the case of activity duration, one can think that people terminate episodes in terms of time orders like 5 minutes, 10 minutes, 15 minutes and so on. This is also a type of latent variable model, where the dependent variable is actually latent and the observed dependent variable is the outcome after a certain threshold value of the underlying latent variable occurs. Hence, the model structure is:

$$Y_i^* = X_i\beta + \varepsilon_i$$

$$Y_i = j \quad \text{if } \tau_j \le Y_i^* \le \tau_{j+1}, \quad j \in \{0, 1, \ldots, J-1\}$$

where $X$ represents the covariates, $\beta$ the corresponding parameters, $\varepsilon$ the error term, $j$ the observed dependent variable and $\tau$ the threshold values of the latent $Y_i^*$.

The observed dependent variable should be arranged into $J$ ordered outcomes, and for $J$ orders the model will have $J-1$ “cut points” of threshold value $\tau$. By definition the end point categories correspond to $\tau_0 = -\infty$ and $\tau_J = +\infty$, so the corresponding cumulative probabilities are 0 and 1 respectively. For example, if $J = 3$, then

$$Y_i = \begin{cases} \text{Category 0} & \text{if } -\infty \le Y_i^* \le \tau_1 \\ \text{Category 1} & \text{if } \tau_1 \le Y_i^* \le \tau_2 \\ \text{Category 2} & \text{if } \tau_2 \le Y_i^* \le +\infty \end{cases}$$

In general, the outcome probabilities are

$$P(Y_i = 0) = F(\tau_1 - X\beta) - 0$$
$$P(Y_i = 1) = F(\tau_2 - X\beta) - F(\tau_1 - X\beta)$$
$$P(Y_i = 2) = F(\tau_3 - X\beta) - F(\tau_2 - X\beta)$$
$$\vdots$$
$$P(Y_i = J-1) = 1 - F(\tau_{J-1} - X\beta)$$

This formulation leads to the likelihood function

$$L(Y \mid X, \beta, \tau) = \prod_{i=1}^{N} \prod_{j=0}^{J-1} \left[F(\tau_{j+1} - X_i\beta) - F(\tau_j - X_i\beta)\right]^{z_{ij}}$$

where $z_{ij} = 1$ if $Y_i = j$ and 0 otherwise. The log likelihood is then

$$\ln L(Y \mid X, \beta, \tau) = \sum_{i=1}^{N} \sum_{j=0}^{J-1} z_{ij} \ln\left[F(\tau_{j+1} - X_i\beta) - F(\tau_j - X_i\beta)\right]$$

Here $F(\cdot)$ represents the cumulative probability function. For the normal error distribution, the Ordered Probit model will have $F(\cdot) = \Phi(\cdot)$, and for the logistic error distribution, the Ordered Logit model will have $F(\cdot) = \exp(\cdot)/(1 + \exp(\cdot))$. The threshold parameters $\tau$ of the ordered probability model can be thought of as a series of intercepts. The intercept is basically the baseline probability when all independent variables are zero. So, the constant term of the right hand side expression of the covariate function can be dropped because the threshold parameters $\tau$ serve the same purpose. Moreover, keeping both the intercept and the “cut points”, $\tau$, may make the model unidentified. STATA’s “ologit” and “oprobit” functions drop the constant term but LIMDEP’s “ordered” function keeps both.

Again, the log likelihood function of the ordered logit model can be rewritten as

$$\ln L(Y \mid X, \beta, \tau) = \sum_{i=1}^{N} \sum_{j=0}^{J-1} z_{ij} \ln \int_{\tau_j - X_i\beta}^{\tau_{j+1} - X_i\beta} f(\varepsilon)\, d\varepsilon$$

This likelihood function indicates the probability of failure of an event at the ordered duration of $j$, where $\tau$ indicates the logarithm of the integrated baseline hazard rates. The formula can be expressed as:

$$\log \int_0^{j} h_0(t)\, dt = \tau_j, \quad j = 0, 1, 2, \ldots, (J-1)$$

Going back to the proportional hazard formulation shown in the previous section, $h(t) = h_0(t) f(x, \beta) = h_0(t)\exp(-X_i\beta)$ can be expressed in the log form of the integrated hazard:

$$\log \int_0^{t} h_0(t)\, dt = X_i\beta + \varepsilon_i \tag{3}$$

The ordered logit model thus can be transformed into a proportional hazard model formulation. Han and Hausman (1990) describe this type of proportional hazard model as the “Flexible parameter” hazard model because the baseline hazard (corresponding to the threshold parameters of the ordered logit model) is nonparametric while the effects of covariates can take a particular functional form (the above description is for a linear-in-parameters functional form). It is also worth mentioning that the true parameters of the covariates are invariant to the scale of orders chosen for duration. It is also easier to consider heterogeneity within the underlying hazard model specification. An application of this type of model is available in Bhat et al (2004).
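To make the ordered logit probabilities and log likelihood of this section concrete, the following is a small Python sketch. It assumes categories indexed from 0 to J-1, a scalar index standing in for X_i β, and made-up cut points; none of these values come from the paper, and the function names are hypothetical.

```python
import math


def logistic_cdf(z):
    """F(z) = exp(z) / (1 + exp(z)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(xb, cutpoints):
    """P(Y = j) for j = 0..J-1, given the index X*beta and J-1 cut points."""
    # The end points tau_0 = -inf and tau_J = +inf contribute the
    # cumulative probabilities 0 and 1.
    cumulative = [0.0] + [logistic_cdf(tau - xb) for tau in cutpoints] + [1.0]
    return [cumulative[j + 1] - cumulative[j]
            for j in range(len(cumulative) - 1)]


def log_likelihood(indices, outcomes, cutpoints):
    """Sum over observations of ln P(Y_i = observed category)."""
    return sum(math.log(category_probabilities(xb, cutpoints)[y])
               for xb, y in zip(indices, outcomes))


# Made-up example: J = 3 ordered duration classes, cut points at -1 and 1.
CUTS = [-1.0, 1.0]
print(category_probabilities(0.0, CUTS))
print(log_likelihood([0.0, 0.5, -0.5], [1, 2, 0], CUTS))
```

In an estimation routine the cut points and β would be the free parameters of the maximization; here they are fixed only to show how each observation contributes one difference of cumulative probabilities.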

Limited Dependent Variable Model

The limited dependent variable model ensures that the value of the dependent variable is contained within specified limits. Lognormal regression and the Tobit model are widely used examples of such models; this section discusses the Tobit model only. In the case of duration modelling, the non-negativity of duration can be ensured by the Tobit model. Tobit models can also be considered as latent variable models because they assume that the dependent variable is basically a latent variable and that the observed choice occurs if the corresponding latent variable falls within the specified range. The general formulation is:

$$d_i^* = x_i\beta + \varepsilon$$

$$d_i = \begin{cases} d_i^* & \text{if } d_i^* > 0 \\ 0 & \text{if } d_i^* \le 0 \end{cases}$$

Here $d_i^*$ is the latent variable that takes the value of the observed variable $d_i$ if it falls within the range specified; $x_i$ represents a set of covariates and $\beta$ represents the corresponding set of parameters. If the error term $\varepsilon$ is considered to be i.i.d. Normal, $N(0, \sigma^2)$, the likelihood function becomes:

$$L = \prod_{0} \left[1 - \Phi(x_i\beta/\sigma)\right] \prod_{1} \sigma^{-1}\phi\left[(d_i - x_i\beta)/\sigma\right]$$

where the first product runs over the observations with $d_i = 0$ and the second over those with $d_i > 0$. $\Phi$ and $\phi$ are the distribution and density functions, respectively, of the standard normal variable. For a detailed review of Tobit models see Tobin (1958) and Amemiya (1984).
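The Tobit likelihood above can be sketched in a few lines of Python. The example below evaluates the log of that likelihood for given values of the index x_i β and σ; the function names and the numbers in the usage lines are assumptions made for illustration, and the standard normal functions are built from the standard library rather than from a statistics package.

```python
import math


def normal_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_log_likelihood(durations, indices, sigma):
    """Log of the Tobit likelihood with censoring at zero.

    durations: observed d_i; indices: the values x_i * beta; sigma: std. dev.
    """
    total = 0.0
    for d, xb in zip(durations, indices):
        if d > 0:
            # Uncensored part: sigma^-1 * phi((d_i - x_i*beta) / sigma)
            total += math.log(normal_pdf((d - xb) / sigma) / sigma)
        else:
            # Censored part: 1 - Phi(x_i*beta / sigma)
            total += math.log(1.0 - normal_cdf(xb / sigma))
    return total


# Made-up values: two positive durations (hours) and one zero observation.
print(tobit_log_likelihood([8.0, 7.5, 0.0], [8.2, 7.0, 0.5], sigma=1.5))
```

For work/school episodes the observed durations are typically all positive, in which case only the uncensored part of the likelihood contributes.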

Considering Heteroscedasticity

For all of the above-mentioned models, the heterogeneity of the population can be incorporated in terms of heteroskedastic variables. Heteroskedastic variables are the set of variables used in a specific functional form that can capture the individual or group specific variations within the observed data. Heteroskedasticity is incorporated by multiplying the heteroskedastic function with the variance term. The commonly used functional form is linear in parameters, $\sum x'_i \beta'$ (Greene, 2002). For duration models, the value of the variance is then multiplied by $\sum x'_i \beta'$. This has the effect of weighting the variance value of the hazard or survival distribution.

Statistical Significance and Goodness of Fit

For individual parameters of covariates/independent variables, the statistical significance is measured by the ‘t’ statistics. In the case of parametric and flexible parameter hazard models, the signs of the covariate parameters have the same effect on expected duration and the opposite effect on the hazard rate. For example, a negative sign parameter indicates that an increasing value of the corresponding variable increases the hazard rate and thereby reduces the expected duration. In the case of Tobit, the variable sign is also straightforward, since in this case duration is modelled directly. In the case of semiparametric models the covariate parameter signs have the opposite effect of those in parametric models, as has already been discussed.


All models described in this paper are estimated using maximum likelihood estimation (Gould and Sribney, 1999; Greene, 2002). Thus, the statistical significances of the individual models are measured by the likelihood ratio test. The likelihood ratio statistic, $-2[L(c) - L(\beta)]$, is asymptotically chi-square distributed with degrees of freedom equal to the number of free parameters in the model. Here, $L(c)$ is the loglikelihood value of a constant-only model and $L(\beta)$ is the loglikelihood of the full model. The pseudo $R^2$ value, $[1 - L(\beta)/L(c)]$, can also be used to measure goodness of fit. The closer the value of pseudo $R^2$ is to 1, the better is the model fit. However, for this paper, to make it clearer, we have also used graphical comparisons of predicted versus observed distributions, which provide useful information concerning model fit, over and above the aggregate fit statistics.

Comparing two different types of model is always a difficult task. The value of the likelihood ratio test between two models, $-2[L(\text{model 1}) - L(\text{model 2})]$, can be used to decide whether two models are significantly different or not. But this comparison is not applicable if the likelihood functions of the models of concern are totally different. For example, the likelihood function of an ordered probability model is totally different from that of a Weibull parametric hazard model. So in this paper, to compare two different types of model, the pseudo $R^2$ value and the corresponding graphical comparison are used.
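The two aggregate fit measures described above are simple functions of the loglikelihood values. The short Python sketch below computes them; the loglikelihood values used are invented for illustration and do not correspond to any model estimated in this paper.

```python
def likelihood_ratio(ll_constant, ll_full):
    """Likelihood ratio statistic, -2 * [L(c) - L(beta)]."""
    return -2.0 * (ll_constant - ll_full)


def pseudo_r2(ll_constant, ll_full):
    """Pseudo R-squared, 1 - L(beta) / L(c)."""
    return 1.0 - ll_full / ll_constant


# Hypothetical loglikelihoods of a constant-only model and a full model.
LL_CONSTANT, LL_FULL = -1200.0, -1080.0
print("LR statistic:", likelihood_ratio(LL_CONSTANT, LL_FULL))
print("pseudo R2:   ", pseudo_r2(LL_CONSTANT, LL_FULL))
```

The likelihood ratio statistic would then be compared with the chi-square critical value for the relevant degrees of freedom, while the pseudo R-squared can be compared across models whose likelihood functions differ in form.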

DATA

The data set used in these analyses is derived from CHASE (Computerized Household Activity Scheduling Elicitor), which is the first wave of a three-wave panel survey for Toronto. A detailed description of CHASE and the panel survey is available in Doherty and Miller (2000) and Roorda and Miller (2004). The first wave was conducted in Toronto in 2002-2003 with 271 households, including 426 adults. It focused on multi-day information on observed activity travel while tracing the underlying activity scheduling process. It divides all activities into several types. “Work/School”, the concern of this paper, is the second major activity category in CHASE. For all types of activities, the self-reporting software CHASE traced ‘when’ and ‘why’ scheduling decisions were made over the course of a 7-day period. Participants added individual activity episodes with all attributes of start time, end time, mode, participants, location etc. They modified the attributes if necessary and sometimes deleted some of them. So the observations can easily be sorted according to added episodes, modified episodes and deleted episodes. Not all activities that are planned are eventually executed. Various time-space constraints compel us to delete or modify some of the planned episodes. CHASE traced these phenomena very efficiently. So within the CHASE data set we observe some activities that were planned and executed according to the originally planned attributes, others that were subsequently modified before being executed, and others that were deleted from the provisional schedule and never executed.


These special features of the CHASE data set give us the opportunity to assume that the work/school activity episodes that are first ‘added’ to the provisional schedule (i.e., prior to any subsequent modification or deletion) are generated directly within the “Work/School Project”. The data used for this research thus represent the output of the project and the input to the scheduler. CHASE data give detailed information about personal, household and activity attributes. The attributes in general are household socio-economic and individual-specific attributes; attributes of the transportation system in terms of travel time, mode selected, ride sharing etc.; and episode attributes such as starting time, duration, participants, locations etc. To get a more disaggregate picture, the data are divided into three categories: Full Time Workers, Part Time Workers and Students. To capture job-specific influences on the duration of work episodes, the job types are divided into 17 job industry categories according to Job Industry Census 1991. These job categories are: Goods producing industry-Job Type 01, Service producing industries-Job Type 02, Agriculture-Job Type 03, Other primary industries-Job Type 04, Manufacturing-Job Type 05, Construction and Maintenance-Job Type 06, Transportation_Communications_Utilities-Job Type 07, Retail trade-Job Type 08, Finance_Insurance_Real Estate-Job Type 09, Business Services_Administrations-Job Type 10, Education_Research-Job Type 11, Health service-Job Type 12, Social service-Job Type 13, Accommodation service-Job Type 14, Food service-Job Type 15, Public administrations-Job Type 16 and Retired-Job Type 17. Figure 01 compares the duration distributions of full-time work, part-time work and schoolwork derived from the TTS (Transportation Tomorrow Survey) travel diary data, collected by the Joint Program in Transportation, University of Toronto, with those derived from the CHASE data used in this research.
It is clear from Figure 01 that the distributions of full-time, part-time and schoolwork durations are significantly different for the two data sources. The TTS distributions overestimate the durations in comparison to the CHASE distributions. The smoother curves from the TTS data indicate that fine-grained variations in the duration distributions are missing. This is a shortcoming of typical trip diary data, where the durations of activity episodes have to be estimated from reported trip-end times, unlike activity diary data, where the participants report both the start and end times of activity episodes, including travel episodes. Moreover, the durations estimated from TTS data are all for executed episodes. The TTS distributions therefore include scheduling effects and do not represent the episodes that are generated by projects as input to the scheduling process. Capturing these duration dynamics is the major motivation for the duration modelling approaches reported in this paper. The next sections discuss the lessons learnt from the different types of modelling approaches considered for modelling work/school duration.


After cleaning for missing attributes and suspicious information, the data used for the analyses include:

• 1598 individual episodes for full-time workers (192 households, 238 individual persons), with a mean duration of 373.6 minutes, a standard deviation of 200 minutes, a minimum duration of 5 minutes and a maximum duration of 989 minutes.

• 248 individual episodes for part-time workers (44 households, 49 individual persons), with a mean duration of 297 minutes, a standard deviation of 184 minutes, a minimum duration of 10 minutes and a maximum duration of 720 minutes.

• 632 individual episodes for students (72 households, 85 individual persons), with a mean duration of 78.5 minutes, a standard deviation of 151 minutes, a minimum duration of 1 minute and a maximum duration of 810 minutes.

DESCRIPTION OF THE MODELS Nonparametric life tables for duration of work episode of full-time workers and part-time workers are presented in Table 01 and Table 02. Table 03 presents the life table for the duration of school for students. Since the data set does not contain any censored observations, the life tables give the observed pattern of duration distribution of work/school episodes. So, these observed distributions are used for comparison with the predicted distributions by different models.
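Because there are no censored observations, a nonparametric life table reduces to simple counting. The following Python sketch shows one way to build such a table; the function name, the interval width and the durations are hypothetical and do not reproduce Tables 01 to 03.

```python
def life_table(durations, width):
    """Build a life table for uncensored durations.

    Each row is (interval start, interval end, episodes entering the interval,
    episodes ending within it, surviving fraction at the interval end).
    """
    n = len(durations)
    n_intervals = int(max(durations) // width) + 1
    rows = []
    at_risk = n
    for k in range(n_intervals):
        start, end = k * width, (k + 1) * width
        ended = sum(1 for d in durations if start <= d < end)
        remaining = at_risk - ended
        rows.append((start, end, at_risk, ended, remaining / n))
        at_risk = remaining
    return rows


# Hypothetical episode durations in minutes, for illustration only.
durations = [45, 120, 240, 250, 480, 485, 490, 540]
table = life_table(durations, 120)
for row in table:
    print(row)
```

The last column is the observed survival distribution against which model predictions can be compared.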

Semiparametric Hazard Model The semiparametric hazard models for full-time workers, part-time workers and students are shown in Table 04. Figure 02, Figure 03 and Figure 04 show the fit of the predicted survival distributions against the corresponding observed distributions, respectively. The likelihood ratio of the full-time workers’ model is higher than those of the part-time worker and school models. This may be due to the higher number of observations for full-time workers. The start time dummies of these three models indicate that people starting work in the morning typically work longer hours than those starting in the afternoon. Although start times are often dictated by job type and other variables, these start time dummies capture the general expected trend. The dummy variables for ‘weekday/weekend’ indicate that full-time workers, part-time workers and students work shorter hours on weekends than on weekdays. But these weekday-weekend variables are not very significant statistically (‘t’ statistics below the 5% critical value of 1.64). Travel time appears to have negative


effects on the duration for full-time workers but has the opposite effect for part-time workers and students. The ‘t’ statistic of the travel time variable is statistically insignificant (less than 1.64) for full-time workers, whereas for part-time workers and school it is very significant. The travel mode variables ‘Auto’ and ‘Transit’ reflect the relative impact in comparison to the other modes, such as biking and walking. For full-time work, although travel time has a negative effect on duration, the travel modes ‘Auto’ and ‘Transit’ have positive effects and are statistically significant. The combined travel time and travel mode variables reflect the actual transportation system influence on work duration. Because the travel modes are dummy variables (taking a value of either 0 or 1), travel time actually dictates the transportation system influence. For example, if a full-time worker uses auto and needs to travel (0.23055/9.13E-4)=253 minutes, then the total effect of transportation system performance on work duration is zero. Similarly, if he/she uses transit the threshold time is (0.29184/9.13E-4)=320 minutes. It also indicates that people using auto are accustomed to travelling longer distances to the workplace than people using transit. Similarly, it can be shown that for part-time workers using transit the threshold travel time is 132 minutes, but for people using auto the threshold travel time is zero (as the variable ‘Auto’ is not in the model). For school the threshold travel time for auto users is very small, (3.43E-2/1.47E-02)=2.3 minutes, but the transit users are fully sensitive to travel time (both travel time and transit have the same sign). Another key variable, ‘Driving license’, becomes significant only for full-time workers and has a positive effect on full-time work duration. The locational influence on work duration is reflected by the dummy variable “Home-based work”, which indicates that people working at home typically work shorter hours. 
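The threshold travel times quoted above follow from dividing the mode dummy coefficient by the travel time coefficient. The short Python sketch below reproduces that arithmetic using the coefficient magnitudes quoted in the text; the function name is hypothetical, and the calculation assumes that the two effects have opposite signs so that they can cancel.

```python
def threshold_travel_time(mode_coefficient: float, travel_time_coefficient: float) -> float:
    """Travel time (minutes) at which the mode effect and the travel time effect cancel.

    Assumes the two coefficients act in opposite directions; only magnitudes are used.
    """
    return abs(mode_coefficient) / abs(travel_time_coefficient)


# Coefficient magnitudes as quoted in the text.
auto_full_time = threshold_travel_time(0.23055, 9.13e-4)
transit_full_time = threshold_travel_time(0.29184, 9.13e-4)
auto_school = threshold_travel_time(3.43e-2, 1.47e-2)

print(f"full-time, auto:    {auto_full_time:.0f} minutes")
print(f"full-time, transit: {transit_full_time:.0f} minutes")
print(f"school, auto:       {auto_school:.1f} minutes")
```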
The other locational variable, “Number of Location for Work”, has a negative effect on the duration of full-time and part-time work but the opposite effect for school. The work frequency per week dummy variable shows a negative effect for full-time workers and a positive effect for school. The personal attribute ‘age’ becomes significant for full-time and part-time workers only, but the signs are opposite. The models indicate that older people work shorter hours for full-time work. The variable ‘sex’ indicates that males work longer hours than females for full-time work, part-time work and school. The variable ‘marital status’ indicates that single people work shorter hours for full-time work. Income has negative effects on the duration of both full-time and part-time work. The job type dummy variables capture individual job-specific regulations, and it is seen that different job types have different effects (positive or negative) on the duration of full-time and part-time work. It is important to note that in the case of school the key variables should be the type and level of school; however, these variables are poorly coded in the data set. This problem is overcome by considering other variables that are basically reflections of the type and level of school, such as full-time/part-time student; students’ age; household structure as household size, number of


household kids, number of household teens, number of household adult children; number of days needed to go to school per week (frequency), etc. Considering that some variables have combined and balancing effects, as described previously for the travel time and travel mode variables, it can be said that the signs of the parameters in the semiparametric hazard models are according to expectation. But the parameter signs alone do not indicate the models’ predictive capability. To understand the models’ predictive capability, comparisons of the predicted and observed distributions are shown in Figures 02, 03 and 04. These figures indicate that although the likelihood ratios of the models exceed the required chi-square values by a wide margin (the significance level is zero, as shown in the corresponding tables), the semiparametric hazard models do not fit the observed pattern at all well. This is reflected in the corresponding pseudo R2 values as well. Among these three models the part-time work model has the highest value, 0.07. Several reasons may exist for this poor goodness of fit. It may be that the proportional hazard assumption embedded in this type of model does not apply to work/school episodes. It may also be that some key variables are missing from the data sets. Of course, it is also true that semiparametric models never predict well if the data possess some definite distributional patterns (Mohammadian and Doherty, 2004). If this is the case, then parametric models may yield better results. These models are discussed next.

Parametric Hazard Model-Weibull Distribution The equation for the survival distribution of the Weibull-Parametric hazard model is:

$$S(t) = \left[1 + \theta(\lambda t)^{p}\right]^{-1/\theta} = \left[1 + \theta\left(\exp\left(-\sum \beta' x_i\right) t\right)^{p}\right]^{-1/\theta}; \quad p = \frac{1}{\sigma \exp\left(\sum \beta'' x_j\right)} \qquad (4)$$

Here, $x_i$ is the set of covariates, $x_j$ is the set of heteroskedastic variables, $\beta'$ and $\beta''$ are the corresponding parameters, $\sigma$ is the variance parameter, $t$ is the continuous time scale and $\theta$ is the parameter of the gamma distribution of heterogeneity. However, for the data used in this study the value of $\theta$ becomes very small, so the value of $1/\theta$ becomes enormously large. As a result, the likelihood function is very volatile and does not converge. For this reason the Gamma heterogeneity component is dropped and the final equation becomes:

$$S(t) = \exp\left(-(\lambda t)^{p}\right) = \exp\left(-\left(\exp\left(-\sum \beta' x_i\right) t\right)^{p}\right); \quad p = \frac{1}{\sigma \exp\left(\sum \beta'' x_j\right)} \qquad (5)$$

Note that this same problem occurs for all other models described in the following sections. The parametric hazard models with the Weibull baseline distribution assumption for full-time workers, part-time workers and students are shown in Table 05. Figures 05, 06 and 07 show the fitting of


predicted survival distributions against observed data for the three models, respectively. In these three models the start time dummies show the same effects as those in the semiparametric models. As found previously, people starting work/school early in the morning generally have longer durations than those starting late in the afternoon. The ‘weekday/weekend’ dummy, however, is not statistically significant in the main part of any of the three models. For full-time and part-time workers it is nevertheless very significant within the heteroskedastic component. This indicates that the models capture the day-to-day dynamics of work duration. The signs of the parameters corresponding to ‘travel time’ and ‘modes’ (auto/transit) in these three models also match those in the semiparametric models. ‘Travel time’ shows a positive effect on full-time workers’ work duration. This effect, however, is then reduced by the modal (transit/auto) variable. For part-time workers, travel time has positive effects on duration. This effect is accentuated if the auto mode is used, but it decreases if transit is used. For students, travel time always has a positive effect on duration regardless of the mode type. The work location dummy for full-time workers, “Home-based work”, has a negative parameter, indicating that home-based workers work shorter hours. The work frequency per week dummy variable shows a negative effect for full-time workers and a positive effect for school. A priori, it is sensible that people working fewer days per week tend to work longer durations, and vice versa. The dummy variable for duration flexibility indicates that the opportunity for flexible work duration influences full-time workers to work longer durations, as was also found in the semiparametric model. Unlike the semiparametric model, the personal attribute ‘age’ does not enter the main model for full-time and part-time workers. 
This variable becomes statistically significant in the heteroskedastic (variance) component, where it has a negative parameter value. This indicates (as was found in the semiparametric model) that younger people work longer durations than older people. Similarly, the male/female dummy variable ‘sex’, which enters both the main and heteroskedastic components, indicates the same effects in these models as in the semiparametric models. For full-time and part-time workers the corresponding parameters are always positive, indicating that male workers work longer durations than female workers. In the case of school, this variable has a positive parameter in the main model but a negative (and smaller) parameter value in the heteroskedastic component. These two effects tend to balance one another and thereby reduce the male-female differential in school duration. Similarly, the dummy variable for ‘marital status’ enters both the main and heteroskedastic components, with negative and positive parameter values, respectively. It is very interesting that the variables that enter both the main and heteroskedastic components capture the heterogeneity of the population under study and at the same time indicate a direct influence on work durations. Another variable, ‘income’, also shows the same effect on duration in these models as in the semiparametric models.


The criticisms made of the semiparametric model concerning the lack of variables representing the type and level of school also apply here. However, the Weibull-Parametric model better captures heterogeneity in the population under study by incorporating the key variables of student type, sex and marital status in the heteroskedastic component. Finally, the comparisons of predicted and observed distributions shown in Figures 05, 06 and 07 indicate that the Weibull-Parametric hazard models fit the observed duration distributions better than the semiparametric models. The likelihood ratios of all three of these models exceed the required chi-square values by a wide margin (the significance level is zero, as mentioned in the corresponding tables). The pseudo R2 values are 0.3, 0.37 and 0.24 for full-time workers, part-time workers and students respectively, which are reasonable and much higher than those of the semiparametric models.
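As an illustration of equations (4) and (5), the Python sketch below evaluates both survival functions and checks numerically that equation (4) approaches equation (5) as the heterogeneity parameter θ becomes very small. All parameter and covariate values are hypothetical; they are not the estimates reported in Table 05.

```python
import math


def rate_parameter(beta_main, x_main):
    """lambda = exp(-sum_i beta'_i x_i)."""
    return math.exp(-sum(b * x for b, x in zip(beta_main, x_main)))


def shape_parameter(sigma, beta_het, x_het):
    """p = 1 / (sigma * exp(sum_j beta''_j x_j))."""
    return 1.0 / (sigma * math.exp(sum(b * x for b, x in zip(beta_het, x_het))))


def weibull_survival(t, lam, p):
    """Equation (5): S(t) = exp(-(lambda t)^p)."""
    return math.exp(-((lam * t) ** p))


def weibull_gamma_survival(t, lam, p, theta):
    """Equation (4): S(t) = [1 + theta (lambda t)^p]^(-1/theta)."""
    return (1.0 + theta * (lam * t) ** p) ** (-1.0 / theta)


# Hypothetical parameters and covariates, for illustration only.
lam = rate_parameter([5.5, 0.3], [1.0, 1.0])
p = shape_parameter(0.6, [-0.2], [1.0])

s_plain = weibull_survival(300.0, lam, p)
s_small_theta = weibull_gamma_survival(300.0, lam, p, 1e-6)
print(f"S(300) without heterogeneity: {s_plain:.4f}")
print(f"S(300) with theta = 1e-6:     {s_small_theta:.4f}")
```

With a very small θ the two survival values are practically indistinguishable, so in that limit little is lost by working with equation (5).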

Parametric Hazard Model-Log-Logistic and Lognormal Distribution Similar to the Weibull baseline distribution, the other two types of distributions considered are the Log-Logistic and Lognormal distributions. The equation of the survival function for the Log-Logistic distribution is

$$S(t) = \frac{1}{1 + (\lambda t)^{p}} = \frac{1}{1 + \left(\exp(-\beta' x)\, t\right)^{p}}, \quad p = \frac{1}{\sigma \exp\left(\sum \beta'' x_j\right)} \qquad (6)$$

The equation of the survival function for the Lognormal distribution is

$$S(t) = \Phi\left(-p \ln(\lambda t)\right) \qquad (7)$$

Here $\Phi$ is the cumulative normal distribution. The Log-Logistic distribution models are presented in Table 06, while the Lognormal distribution models are presented in Table 07.
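Equations (6) and (7) can be evaluated directly, as in the minimal Python sketch below. The parameter values are hypothetical. The example also shows a property that follows from both formulas: the survival probability equals 0.5 at t = 1/λ, so 1/λ is the median duration under either distribution.

```python
import math


def loglogistic_survival(t, lam, p):
    """Equation (6): S(t) = 1 / (1 + (lambda t)^p)."""
    return 1.0 / (1.0 + (lam * t) ** p)


def standard_normal_cdf(z):
    """Cumulative standard normal distribution, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_survival(t, lam, p):
    """Equation (7): S(t) = Phi(-p ln(lambda t))."""
    return standard_normal_cdf(-p * math.log(lam * t))


# Hypothetical parameters, for illustration only (median duration of 256 minutes).
lam, p = 1.0 / 256.0, 2.0
median_t = 1.0 / lam

for t in (128.0, median_t, 512.0):
    print(t, round(loglogistic_survival(t, lam, p), 4), round(lognormal_survival(t, lam, p), 4))
```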

For Full-Time workers’ duration, it is seen that the effects of the respective parameters on duration are the same for the Log-Logistic model as for the other models considered. The number of statistically significant variables, however, is higher in the Log-Logistic model than in the previous models. The likelihood ratio value is 1284 for the Log-Logistic model, 1126 for the Lognormal model and 976 for the Weibull model. The corresponding pseudo R2 value is highest for the Log-Logistic model, at 0.31, compared to 0.27 for the Weibull model and 0.26 for the Lognormal model. Thus, it seems that the Log-Logistic model performs better for Full-Time workers’ duration. For Part-Time workers’ duration the effects of the respective parameters on duration are also the same in the Log-Logistic and Lognormal models as in the Weibull and semiparametric models. Unlike the models for Full-Time workers, the number of statistically significant variables is the


same for the Weibull and Lognormal models, but the Log-Logistic model has a lower number of significant variables. The likelihood ratio value is 226 for the Log-Logistic model, 216 for the Lognormal model and 194 for the Weibull model. The pseudo R2 value is highest for the Log-Logistic model, at 0.36, compared to 0.34 for both the Weibull and Lognormal models. Although the pseudo R2 value is higher for the Log-Logistic model, the number of variables in the model should also be considered, since a higher number of statistically significant variables increases the explanatory power of the model. In that sense the Weibull and Lognormal models may be better than the Log-Logistic model. For School duration the effects of the respective parameters on duration are again the same in the Log-Logistic and Lognormal models as in the Weibull model, but the values are different. The numbers of statistically significant parameters are also the same for all three models. The likelihood ratio value is 372 for the Log-Logistic model, 368 for the Lognormal model and 408 for the Weibull model. The pseudo R2 value is highest for the Weibull model, 0.24, compared to 0.21 for the Lognormal and Log-Logistic models. Comparing the likelihood ratio values and the pseudo R2 values, the Weibull model seems to perform better than the others.

Flexible Parameter Hazard Model As previously described, the Flexible parameter hazard model is basically a proportional hazard model with a non-parametric baseline and covariates. In order to develop the Flexible Parameter hazard model, the observed durations are ordered into a number of arbitrary intervals. Care has been taken to keep the interval duration as small as possible, provided every interval contains at least one observation. For Full-Time and Part-Time workers the arbitrary divisions are developed considering 60-minute time intervals. The resulting models are presented in Table 08 and the model predictions are presented graphically in Figures 09 and 10. The threshold parameters shown in these tables correspond to the logarithms of the integrated baseline hazard rates for the corresponding intervals. The effects of the covariates considered in these two models are the same as those for the parametric and semiparametric hazard models. For Full-Time workers the likelihood ratio of this model is 1212 and the pseudo R2 value is 0.16, while for Part-Time workers the values are 205 and 0.17, respectively. Although the pseudo R2 values are lower than those of the parametric hazard models, Figures 09 and 10 indicate that these models better predict the shape of the observed survival distributions.


The School duration data allow us to consider 30-minute intervals. The resulting model is presented in Table 08 and the corresponding prediction is presented in Figure 11. The covariates have similar effects on duration in this model as in the parametric and semiparametric hazard models. The likelihood ratio of this model is 329 and the pseudo R2 value is 0.11. Despite the lower pseudo R2 value, this model again predicts the shape of the observed distribution better, as shown in Figure 11.
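The interval selection rule described for the Flexible Parameter hazard model (the finest interval width for which every interval contains at least one observation) can be sketched as follows. This is a simplified illustration with hypothetical function names, candidate widths and durations; it is not the procedure or data actually used to produce Table 08.

```python
def interval_counts(durations, width):
    """Count observations in consecutive intervals [k*width, (k+1)*width)."""
    n_intervals = int(max(durations) // width) + 1
    counts = [0] * n_intervals
    for d in durations:
        counts[int(d // width)] += 1
    return counts


def finest_feasible_width(durations, candidate_widths):
    """Return the smallest candidate width that leaves no interval empty, or None."""
    for width in sorted(candidate_widths):
        if all(c > 0 for c in interval_counts(durations, width)):
            return width
    return None


# Hypothetical durations in minutes, for illustration only.
durations = [20, 70, 95, 130, 200, 260, 310, 355]
width = finest_feasible_width(durations, [15, 30, 60])
print(width, interval_counts(durations, width))
```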

Limited Dependent Variable Model: Tobit Model The results for the continuous time Tobit model for Full-Time, Part-Time workers and School are reported in Table 09. The predictions of these models are presented graphically in Figures 12, 13 and 14. It is clear that the parameter effects on duration are the same as for the other models considered. For Full-Time workers the likelihood ratio is 1450 and pseudo R2 value is 0.07. For Part-Time workers the likelihood ratio is 216 and pseudo R2 value is 0.07. For School the likelihood ratio is 410 and pseudo R2 value is 0.07. So considering the pseudo R2 value, it is seen that Tobit models fit poorly in comparison to the Parametric and Flexible Parametric hazard models. The figures show that the Tobit model can follow the shape of the observed distributions but cannot predict the scale of the distribution. It is clear that for Full-Time workers, the Tobit model under-predicts longer durations (above 4 hours or so). For Part-Time workers, the Tobit model over-predicts durations below 5 to 6 hours and under-predicts longer durations.

COMPARISON OF MODELS From the discussions above, it can be inferred that the semiparametric models do not predict work/school durations well. Looking beyond the overall statistical measures (likelihood ratio, pseudo R2 etc.), the graphical comparisons show that the semiparametric models predict neither the shape nor the scale of the observed work/school duration distributions. The reason for this failure may be the presence of definite underlying duration distributions within the observed data and the inapplicability of the proportional hazard assumption inherent in the semiparametric model. Although the validity of the proportional hazard assumption can be tested using the Schoenfeld residuals (see Schoenfeld, 1980, 1982; STATA 2005), we defer this to the discussion of the flexible parameter hazard model, which is also based on the proportional hazard assumption. Meyer (1987) provides a good description of the shortcomings of the semiparametric model.


After the semiparametric models, the Tobit models’ performance is also poor. As already discussed, the Tobit models can predict the shape of the observed distributions better than the scale of the distributions. A reason for this may be the underlying linearity assumption of the model. The relationship between the explanatory variables and the duration may not be directly linear, as is assumed by the Tobit model. The type of data used for the estimation also affects the performance of the model. In the data set used here, the number of observations with longer durations (more than 8 hours) is very low compared to the number with medium-length durations (6 to 8 hours). For this reason, the Tobit model under-predicts longer durations and over-predicts the medium and shorter durations. However, the advantage of the Tobit models over the semiparametric hazard models is that they can consider population heterogeneity in terms of variance heteroskedasticity. Although the Flexible parameter hazard models are also based on the proportional hazard assumption (as are the semiparametric hazard models), the estimation process for the non-parametric baseline distribution of the flexible parameter hazard models is totally different from that of the semiparametric hazard models. Semiparametric hazard models are estimated using a partial likelihood estimation method, where the baseline rates are treated as nuisance parameters and are dropped from the estimation process. The flexible parameter hazard model, on the other hand, estimates the non-parametric baseline distribution jointly with the parameters of the covariates. For this reason flexible parameter hazard models can capture the underlying duration distribution better and thus outperform semiparametric hazard models. One of the major shortcomings of the flexible parameter hazard models developed in this study, which makes them perform more poorly than the parametric hazard models, is that they do not incorporate unobserved heterogeneity. 
An attempt was made to consider Gamma distributed and Inverse Gaussian distributed unobserved heterogeneity, but the data distribution made the corresponding likelihood functions unstable and unidentified. As a result, the likelihood functions did not converge. Another way of incorporating unobserved heterogeneity might be to assume nonparametric heterogeneity (see Bhat 1996). But the concern in this case was the need to discretize the durations. It would be best if we could discretize the observed durations into the finest intervals possible (say 5 minutes, 10 minutes or 15 minutes), but the data extracted from CHASE allowed us to discretize full-time and part-time work durations by a minimum of 60 minutes and schoolwork durations by a minimum of 30 minutes, which are quite coarse in comparison to the 5-minute time step used in a typical activity scheduler (e.g., TASHA; see Miller and Roorda, 2003). The chance of overfitting the data should also be a concern for such models. Now, the parametric hazard models assume that the underlying duration distribution takes a specific form. In this study, Weibull, Log-Logistic and Lognormal distributions were considered. The Exponential distribution was also considered, but the Weibull distribution performs better


(because the Exponential model considers the hazard rate to be constant over time), so the Exponential model is not reported in this paper. Practically, both the Weibull and Exponential distributions are special cases of the Gamma distribution. The functional form of the Gamma distribution is:

$$f(D) = \frac{(\lambda p)(\lambda D)^{p\theta - 1} \exp\left(-(\lambda D)^{p}\right)}{\Gamma(\theta)} \qquad (8)$$

Here, if the value of $\theta$ is 1 the distribution becomes the Weibull, and if both $\theta$ and $p$ are equal to 1 it becomes the Exponential distribution. So, the general functional form is the Gamma distribution. But for the Gamma distribution, the value of $\theta$ needs to be assumed prior to the estimation process (Greene, 2002), and it was found that setting $\theta$ equal to 1 (which yields the Weibull model) gives the better estimation in terms of the number of covariates considered, the likelihood ratio and the pseudo R2 value. On the other hand, the Gamma, Exponential, Weibull, Log-Logistic and Lognormal distributions are all specific forms of the Generalized F distribution. The Generalized F distribution contains 4 structural parameters, and setting those parameters to particular values generates the other distributions (see Greene, 2004). It would be best if the Generalized F distribution could be assumed and the data used to determine the values of the structural parameters. The data used in this study, however, force us to restrict the structural parameters of the Generalized F distribution to those that yield the Exponential, Weibull, Log-Logistic and Lognormal distributions only. As discussed in earlier sections, it is worth mentioning that in terms of goodness of fit and continuous time representation, the parametric models perform better than the other models developed and discussed in this paper.
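The nesting described for equation (8) can be checked numerically. The Python sketch below evaluates the Gamma density of equation (8) and compares it with the standard Weibull and Exponential densities at θ = 1 and at θ = p = 1. The parameter values are hypothetical and chosen only for illustration.

```python
import math


def gamma_density(d, lam, p, theta):
    """Equation (8): f(D) = (lambda p)(lambda D)^(p theta - 1) exp(-(lambda D)^p) / Gamma(theta)."""
    return (lam * p) * (lam * d) ** (p * theta - 1.0) * math.exp(-((lam * d) ** p)) / math.gamma(theta)


def weibull_density(d, lam, p):
    """Weibull density: lambda p (lambda D)^(p - 1) exp(-(lambda D)^p)."""
    return lam * p * (lam * d) ** (p - 1.0) * math.exp(-((lam * d) ** p))


def exponential_density(d, lam):
    """Exponential density: lambda exp(-lambda D)."""
    return lam * math.exp(-lam * d)


# Hypothetical parameters, for illustration only.
lam, p, d = 1.0 / 400.0, 1.8, 300.0

print(gamma_density(d, lam, p, 1.0), weibull_density(d, lam, p))
print(gamma_density(d, lam, 1.0, 1.0), exponential_density(d, lam))
```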

CONCLUSIONS This paper investigates alternative approaches to modelling work/school activity episode durations. The data used in this study are significantly different from the data used in other similar studies. The data used here include only the episodes initially added by the respondents within an interactive activity diary. The individual observations thus represent the outcome of work/school project activity episode generation and are assumed to be the input for an activity scheduler. The aim of this research is to develop models that can be used to simulate individual-specific work/school durations for an agent-based activity scheduler. The paper also compares different types of modelling techniques in terms of fitting the same observed data. It identifies the fact that work/school activity durations do possess specific underlying distributional patterns; as a result, semiparametric models are found not to fit observed behaviour well. The failure to incorporate Gamma heterogeneity in the parametric hazard models implies that work/school activities are significantly different (comparisons with other types of activities will be found in Habib and Miller, 2005); it is not only the method the researcher


chooses to model the duration but also the types and distributions of the data that dictate the method that should be used. This paper also presents a way of dealing with a non-homogeneous population in terms of parametric variance heterogeneity functions. The heteroskedastic variables are selected iteratively to identify the variables that better describe the non-homogeneity of the population under study. In conclusion, it can be said that among all models developed in this study, the parametric hazard models perform the best. Considering the goodness of fit and the number of covariates, among the parametric hazard models the Log-Logistic model performs better for Full-Time work duration and the Weibull model performs better for Part-Time work and Schoolwork durations. The sound behavioural basis of these selected models is ensured by the incorporation of various covariates. The interpretations of the signs and values of these parameters provide useful insight into the behavioural process of activity duration decision-making in the planning stage (here the planning stage indicates decisions prior to the start of the day). In addition, it should be noted that the effects of transportation system performance in terms of travel time and the travel modes used are found to be statistically significant, in contrast to the findings of other studies, such as Kitamura et al (1988).

ACKNOWLEDGEMENTS This research was funded by a Major Collaborative Research Initiative (MCRI) grant from the Social Sciences and Humanities Research Council (Canada).

REFERENCES:
1. Adler, T. and M. Ben-Akiva (1979), “A Theoretical and Empirical Model of Trip Chaining Behaviour”, Transportation Research, 13B, pp. 243-257
2. Amemiya, T. (1984), “Tobit Models: A Survey”, Journal of Econometrics, 24, pp. 3-61
3. Arentze, T. and H. Timmermans (2000), “ALBATROSS: A Learning Based Transportation Oriented Simulation System”, EIRSS. ISBN 90-6814-100-7
4. Axhausen, K. (1998), “Can We Ever Obtain the Data We Would Like to Have?”, in T. Gärling, T. Laitila and K. Westin (eds.), Theoretical Foundations of Travel Choice Modelling, Oxford: Pergamon Press
5. Axhausen, K. and T. Gärling (1992), “Activity Approach to Travel Analysis: Conceptual Frameworks, Models, and Research Problems”, Transport Reviews, Vol. 12, No. 4, pp. 323-341


6. Bhat, C. R. (1996), “A Hazard Based Duration Model of Shopping Activity with Nonparametric Baseline Specification and Nonparametric Control for Unobserved Heterogeneity”, Transportation Research B, Vol. 30, pp. 189-207
7. Bhat, C. R. (2000), “Chapter 6: Duration Modelling”, Handbook of Transportation Modeling, edited by D. A. Hensher and K. A. Button, Elsevier Science
8. Bhat, C. R.; J. Y. Guo; S. Srinivasan and A. Sivakumar (2004), “A Comprehensive Econometric Micro-Simulator For Daily Activity Travel Patterns (CEMDAP)”, TRB CD-ROM 2004
9. Bhat, C. R.; S. Srinivasan and J. Y. Guo (2002), “Activity-Based Travel Demand Modelling for Metropolitan Areas in Texas: Data Source, Sample Formation and Estimation Result”, Research Report 4080-3 or Research Project 0-4080, Department of Civil Engineering, University of Texas at Austin
10. Bhat, C. R.; T. Frusti; H. Zhao; S. Schönfelder and K. W. Axhausen (2004), “Intershopping Duration: An Analysis Using Multiweek Data”, Transportation Research Part B, 38, pp. 39-60
11. Cox, D. R. (1975), “Partial Likelihood”, Biometrika, Vol. 62, No. 2 (Aug 1975), pp. 269-276
12. Doherty, S. T. and E. J. Miller (2000), “A Computerized Household Activity Scheduling Survey”, Transportation, Volume 21, No. 1, pp. 75-97
13. Eberhard, L. K. (2002), “A 24-Hour Household-Level Activity-Based Travel Demand Model for the GTA”, M.A.Sc Thesis, Department of Civil Engineering, University of Toronto
14. Flinn, C. and J. Heckman (1982), “New Methods for Analyzing Structural Models of Labor Force Dynamics”, Journal of Econometrics, 18, pp. 115-168
15. Gärling, T.; M.-P. Kwan and R. D. Golledge (1994), “Computational-Process Modelling of Household Activity Scheduling”, Transportation Research B, Volume 28B, No. 5, pp. 355-364
16. Gould, W. and W. Sribney (1999), “Maximum Likelihood Estimation with Stata”, STATA Press
17. Greene, W. H. (2002), “LIMDEP Econometric Modelling Guide”, Econometric Software Inc.
18. Habib, K. M. N. and E. J. Miller (2005), “Development of Skeleton Activity Schedule for Full Time Workers, Part Time Workers and Students”, Working paper, Being Prepared for Submission in TRB 2006, Department of Civil Engineering, University of Toronto
19. Habib, K. M. N. and E. J. Miller (2005), “Investigating the Relationship between Start Time and Duration of Activities considering Planning and Execution Dynamics”, Working paper, Being Prepared for Submission in TRB 2006, Department of Civil Engineering, University of Toronto


20. Hägerstrand, T. (1970), “What about People in Regional Science?”, Papers of the Regional Science Association, 24, pp. 7-21
21. Han, A. and J. A. Hausman (1990), “Flexible Parametric Estimation of Duration and Competing Risk Model”, Journal of Applied Econometrics, Vol. 5, pp. 1-28
22. Heckman, J. and B. Singer (1982), “The Specification Problem in Econometric Models for Duration Data”, in W. Hildenbrand (ed.), Advances in Econometrics, Cambridge: Cambridge University Press
23. Heckman, J. and B. Singer (1984), “A Method for Minimizing the Distributional Assumptions in Econometric Models for Duration Data”, Econometrica, 52, pp. 271-320
24. Heckman, J. and B. Singer (1986), “Econometric Analysis of Longitudinal Data”, in Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, Volume III, Amsterdam: North-Holland
25. Hensher, D. A. and F. L. Mannering (1994), “Hazard-Based Duration Models and Their Application to Transport Analysis”, Transport Reviews, 14, pp. 63-82
26. Kaplan, E. L. and P. Meier (1958), “Nonparametric Estimation from Incomplete Observations”, American Statistical Association Journal, June 1958
27. Kawakami, S. and T. Isobe (1989), “Development of Activity Travel Scheduling Model Considering Time Constraints and Temporal Transferability: Test of a Model”, Proceedings of the 5th World Conference on Transport Research, Volume IV, pp. 221-233
28. Kiefer, N. M. (1988), “Economic Duration Data and Hazard Function”, Journal of Economic Literature, Volume XXVI (June 1988), pp. 646-679
29. Kitamura, R., J. P. Robinson, T. F. Golob, M. A. Bradley, J. Leonard and T. van der Hoorn (1992), “A Comparative Analysis of Time Use in the Netherlands and California”, Proceedings of 20th PTRC Summer Annual Meetings, Seminar E, PTRC, London, pp. 127-138
30. Kitamura, R., K. Nishi and K. Goulias (1988), “Trip Chaining Behaviour by Central City Commuters: A Causal Analysis of Time Space Constraints”, Paper presented at Oxford Conference on Travel and Transportation
31. McKelvey, R. D. and W. Zavoina (1975), “A Statistical Model for the Analysis of Ordinal Level Dependent Variables”, Journal of Mathematical Sociology, 4, pp. 103-120
32. Meyer, B. D. (1987), “Semiparametric Estimation of Hazard Model”, Chapter 1, PhD thesis, MIT
33. Miller, E. J. (2002), “Propositions for Household Decision-Making”, Paper presented in International Colloquium on the Behavioural Foundations of Integrated Land-use and Transport Models: Assumptions and New Conception Frameworks, Quebec City, June 2002


34. Miller, E. J. (2005), “Project Based Activity Scheduling for Person Agents”, 16th International Symposium in Transportation and Traffic Theory, Maryland, USA
35. Miller, E. J. and M. J. Roorda (2003), “A Prototype Model of Household Activity Scheduling”, Transportation Research Record 2003
36. Mohammadian, A. and S. T. Doherty (2004), “A Hazard Model for Duration of Time Between Planning and Execution of an Activity”, Proceedings of the Conference on Progress of Activity Based Analysis, Vaeshartelt Castle, Maastricht, the Netherlands, May 28-31, 2004
37. Pendyala, R. M., R. Kitamura, A. Kikuchi, T. Yamamoto and S. Fujii (2005), “FAMOS: The Florida Activity Mobility Simulator”, TRB CD-ROM 2005
38. Picado, R. (1999), “Non-Work Activity Scheduling Effects in the Timing of Work Trips”, Ph.D. Thesis, University of California, Berkeley
39. Prentice, R. L. (1976), “A Generalization of Probit and Logit Methods for Dose Response Curve”, Biometrics, 32, pp. 761-768, December 1976
40. Recker, W. W., M. G. McNally and G. S. Root (1986a), “A Model of Complex Travel Behaviour: Part 1: Theoretical Development”, Transportation Research A, 20, pp. 307-318
41. Recker, W. W., M. G. McNally and G. S. Root (1986b), “A Model of Complex Travel Behaviour: Part 2: An Operational Model”, Transportation Research A, 20, pp. 319-330
42. Roorda, M. J. and E. J. Miller (2004), “Strategies for Resolving Activity Scheduling Conflict: An Empirical Analysis”, Proceedings of Activity Based Modelling Conference, Maastricht, the Netherlands, 2004
43. Roorda, M. J. and E. J. Miller (2004), “Toronto Activity Panel Survey: Demonstrating the Benefits of a Multiple Instrument Panel Survey”, Proceedings of the Seventh International Conference on Travel Methods, Costa Rica, August 1-6, 2004
44. Schjerning, B., D. L. Maire and J. P. Peterson (2004), “An Economic Enquiry into Self Employment in Denmark”, Centre for Economic and Business Research (CEBR) Student Paper 2004-02
45. Schoenfeld, D. (1980), “Chi-Squared Goodness of Fit Test for the Proportional Hazard Regression Model”, Biometrika, 67, pp. 145-153
46. Schoenfeld, D. (1982), “Partial Residual for the Proportional Hazards Regression Model”, Biometrika, 69, pp. 239-241
47. Scott, D. M. (2000), “Toward an Operational Model of Daily Household Activity-Travel Behaviour”, Ph.D. Thesis, McMaster University
48. Speybroeck, N., D. Berkvens, A. M. Ntsakala, M. Aerts, N. Hens, G. V. Huylenbroeck and E. Thys (2004), “Classification Trees Versus Multinomial Models in the Analysis of Urban Farming System in Central Africa”, Agricultural Systems, 80, pp. 133-149
49. STATA (2005), “Base Reference Manual”, STATA Press


50. Tobin, J. (1958), “Estimation of Relationships for Limited Dependent Variable”, Econometrica, 26, No. 1, pp. 24-36

Table 01: Nonparametric Life Table for Duration of Works (Full Time Workers)

Survival Time (Minutes)  Enter   Censored   At Risk   Exited   Survival Rate   Hazard Rate
0.0-39.6                 1598    0          1598      87       1               0.0014
39.6-79.1                1511    0          1511      87       0.9456          0.0015
79.1-118.7               1424    0          1424      61       0.8911          0.0011
118.7-158.2              1363    0          1363      99       0.8529          0.0019
158.2-197.8              1264    0          1264      98       0.791           0.002
197.8-237.4              1166    0          1166      70       0.7297          0.0016
237.4-276.9              1096    0          1096      67       0.6859          0.0016
276.9-316.5              1029    0          1029      56       0.6439          0.0014
316.5-356.0              973     0          973       32       0.6089          0.0008
356.0-395.6              941     0          941       44       0.5889          0.0012
395.6-435.2              897     0          897       59       0.5613          0.0017
435.2-474.7              838     0          838       62       0.5244          0.0019
474.7-514.3              776     0          776       330      0.4856          0.0137
514.3-553.8              446     0          446       190      0.2791          0.0137
553.8-593.4              256     0          256       103      0.1602          0.0127
593.4-633.0              153     0          153       65       0.0957          0.0136
633.0-672.5              88      0          88        47       0.0551          0.0184
672.5-712.1              41      0          41        18       0.0257          0.0142
712.1-751.6              23      0          23        14       0.0144          0.0221
751.6-791.2              9       0          9         3        0.0056          0.0101
791.2-830.8              6       0          6         4        0.0038          0.0253
830.8-870.3              2       0          2         1        0.0013          0.0169
870.3-909.9              1       0          1         0        0.0006          0
909.9-949.4              1       0          1         0        0.0006          0
949.4-989.0              1       0          1         1        0.0006          0.0506
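The Survival Rate and Hazard Rate columns of Table 01 appear to follow the standard actuarial life-table estimator. That is an assumption here, since the estimator is not stated next to the table, but it reproduces the tabulated values to the printed precision. The sketch below rebuilds the columns from the exit counts alone; the function name `life_table` and the equal interval width of 989.0/25 minutes are illustrative choices, not taken from the paper.

```python
def life_table(exits, width, censored=None):
    """Build actuarial life-table rows from per-interval exit counts.

    Assumed estimator (not stated with the table): the hazard is evaluated
    at the interval midpoint, exits / (width * (at_risk - exits / 2)).
    """
    censored = censored or [0] * len(exits)
    enter = sum(exits) + sum(censored)   # everyone is at risk at time zero
    survival = 1.0                       # share still active at interval start
    rows = []
    for d, c in zip(exits, censored):
        at_risk = enter - c / 2.0        # censored cases count as half-exposed
        exposure = width * (at_risk - d / 2.0)
        hazard = d / exposure if exposure > 0 else 0.0
        rows.append({"enter": enter, "at_risk": at_risk, "exited": d,
                     "survival": survival, "hazard": hazard})
        if at_risk > 0:
            survival *= 1.0 - d / at_risk
        enter -= d + c
    return rows


# Exit counts copied from Table 01 (full-time workers, no censoring).
FULL_TIME_EXITS = [87, 87, 61, 99, 98, 70, 67, 56, 32, 44, 59, 62, 330,
                   190, 103, 65, 47, 18, 14, 3, 4, 1, 0, 0, 1]

if __name__ == "__main__":
    for row in life_table(FULL_TIME_EXITS, width=989.0 / 25)[:5]:
        print(row["enter"], round(row["survival"], 4), round(row["hazard"], 4))
```

With no censoring, the survival column reduces to the share of the original sample still working at the start of each interval, which is why it starts at 1.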

Table 02: Nonparametric Life Table for Duration of Works (Part Time Workers)

Survival Time            Enter   Censored   At Risk   Exited   Survival Rate   Hazard Rate
0.0-28.8                 248     0          248       7        1               0.001
28.8-57.6                241     0          241       11       0.9718          0.0016
57.6-86.4                230     0          230       14       0.9274          0.0022
86.4-115.2               216     0          216       24       0.871           0.0041
115.2-144.0              192     0          192       10       0.7742          0.0019
144.0-172.8              182     0          182       8        0.7339          0.0016
172.8-201.6              174     0          174       17       0.7016          0.0036
201.6-230.4              157     0          157       11       0.6331          0.0025
230.4-259.2              146     0          146       15       0.5887          0.0038
259.2-288.0              131     0          131       7        0.5282          0.0019
288.0-316.8              124     0          124       22       0.5             0.0068
316.8-345.6              102     0          102       8        0.4113          0.0028
345.6-374.4              94      0          94        2        0.379           0.0007
374.4-403.2              92      0          92        6        0.371           0.0023
403.2-432.0              86      0          86        15       0.3468          0.0066
432.0-460.8              71      0          71        13       0.2863          0.007
460.8-489.6              58      0          58        14       0.2339          0.0095
489.6-518.4              44      0          44        12       0.1774          0.011
518.4-547.2              32      0          32        5        0.129           0.0059
547.2-576.0              27      0          27        8        0.1089          0.0121
576.0-604.8              19      0          19        9        0.0766          0.0216
604.8-633.6              10      0          10        6        0.0403          0.0298
633.6-662.4              4       0          4         0        0.0161          0
662.4-691.2              4       0          4         0        0.0161          0
691.2-720.0              4       0          4         4        0.0161          0.0694

Table 03: Nonparametric Life Table for Duration of School Works (Students)

Survival Time            Enter   Censored   At Risk   Exited   Survival Rate   Hazard Rate
0.0-38.6                 632     0          632       62       1               0.0027
38.6-77.1                570     0          570       123      0.9019          0.0063
77.1-115.7               447     0          447       89       0.7073          0.0057
115.7-154.3              358     0          358       97       0.5665          0.0081
154.3-192.9              261     0          261       63       0.413           0.0071
192.9-231.4              198     0          198       33       0.3133          0.0047
231.4-270.0              165     0          165       22       0.2611          0.0037
270.0-308.6              143     0          143       19       0.2263          0.0037
308.6-347.1              124     0          124       19       0.1962          0.0043
347.1-385.7              105     0          105       21       0.1661          0.0058
385.7-424.3              84      0          84        33       0.1329          0.0127
424.3-462.9              51      0          51        17       0.0807          0.0104
462.9-501.4              34      0          34        12       0.0538          0.0111
501.4-540.0              22      0          22        6        0.0348          0.0082
540.0-578.6              16      0          16        5        0.0253          0.0096
578.6-617.1              11      0          11        4        0.0174          0.0115
617.1-655.7              7       0          7         0        0.0111          0
655.7-694.3              7       0          7         0        0.0111          0
694.3-732.9              7       0          7         2        0.0111          0.0086
732.9-771.4              5       0          5         0        0.0079          0
771.4-810.0              5       0          5         5        0.0079          0.0519

Table 04: Semiparametric Hazard model Variable Names Before 9am 9 To 10 Am 10 To 11 Am 11 Am To 12 Noon Start Time 12 To 1 Pm Dummy 2 To 3 Pm 3 To 4 Pm 4 To 5 Pm 5 To 6 Pm After 6 Pm Weekday Travel Time In Minutes Auto Transit Available Locations Frequency Per Week Duration Flexibility (1= Flexible, 0 Otherwise) Age in Years Sex (Male=1, Female=0) Marital Status (1 For Single, 0 Otherwise) Driving License Home Based Job Yearly Income In CAD Household Size No. Of Teen In Household No. Of Kids In Household No. Of Adult Child In Household Full Time Student Part Time Student Job05 Job09 Job10 Job Type Job12 Job13 Job15 Job16

Goodness of Fit

FULL-TIME WORKER Coefficient T Statistics -1.3507 -13.888 -0.87676 -6.982 -0.86318 -5.26 -1.0009 -5.324 -0.30223 -2.187 0.223692 1.408 0.580425

2.881

0.920157 6.00e-02 9.13e-04 -0.23055 -0.29184 3.17e-02 5.42e-02

6.31 1.067 1.026 -3.189 -2.827 2.462 1.83

-0.3173

-5.391

4.76e-03 -0.42729

1.687 -7.061

0.137047

2.199

-0.22616 0.204219 2.38e-06

-1.809 1.693 3.122

-0.48727 -4.299 -0.18674 -1.975 -0.13589 -1.99 -0.20989 -1.864 0.321252 2.523 -0.93469 -4.261 -0.31128 -2.758 Likelihood Ratio =1098 Degrees Of Significance =0.00; Pseudo R2 = 0.05

PART-TIME WORKER Coefficient T Statistics -1.40007 -7.552 -0.29709 -1.251 -0.79993 -3.077

0.727466

1.97

1.609063 -0.14883 -7.22e-03

4.778 -1.043 -1.874

0.954418 6.06e-02

3.783 1.751

-8.08e-03 -0.65675

-1.383 -4.207

1.36e-05 -0.15455

4.097 -2.808

-0.39412 0.867377 0.849751 0.909801

SCHOOLWORK Coefficient T Statistics -0.8501623 -4.338 0.1247736 0.472 -0.1122739 -0.407 -9.49e-02 -0.381 -0.3513023 -1.569 0.5575305 2.114 0.7261346 3.045 0.4132632 1.613 0.1534506 0.63 0.7092517 3.745 -0.17791 -1.949 -1.47e-02 -4.465 3.43e-02 0.24 -5.56e-02 -0.296 -0.3171823 -3.282 -9.10e-02 -1.738 -0.1042803

-0.913

-0.1520595

-1.545

-7.02e-02 0.3100455 0.2602853 -3.41e-02 -0.4826752

-0.507 5.737 1.902 -0.225 -2.534

-1.124 3.172 4.241 1.893

Likelihood Ratio =160 Degrees Of Significance =0.00; Pseudo R2 =0.07

Likelihood Ratio =313 Degrees Of Significance =0.00; Pseudo R2 =0.045

PART-TIME WORKER Coefficient T Statistics 5.885685 68.942 0.521351 6.288 0.136571 1.542 0.250613 2.023

SCHOOLWORK Coefficient T Statistics 4.7459146 24.916 0.4168828 4.267

Table 05: Weibull-Parametric Hazard model Variable Names Constant Before 9am 9 To 10 Am 10 To 11 Am 11 Am To 12 Noon Start Time 12 To 1 Pm Dummy 2 To 3 Pm 3 To 4 Pm 4 To 5 Pm 5 To 6 Pm After 6 Pm Weekday Travel Time In Minutes Auto

FULL-TIME WORKER Coefficient T Statistics 5.91475 85.839 0.3481246 10.324 0.2404988 5.237 0.2314107 4.183

-0.323633

-5.599

-0.33369

-2.195

-0.4135242

-6.344

-0.6202539

-13.394

-0.9004

-7.016

-0.0005785 0.1366611

-2.006 4.492

2.96e-03 8.22e-02

1.729 0.885


0.1262759 9.05e-02 0.1266841 -0.3565645 -0.4229596 -0.4316086 -0.1530203 -0.4384464 6.85e-02 6.95e-03 0.1161435

0.953 0.841 0.918 -2.737 -3.034 -3.219 -1.204 -4.76 1.137 3.15 1.214

Transit Available Locations Frequency Per Week Duration Flexibility (1= Flexible, 0 Otherwise) Age in Years Sex (Male=1, Female=0) Marital Status (1 For Single, 0 Otherwise) Driving License Home Based Job Yearly Income In CAD Household Size No. Of Teen In Household No. Of Kids In Household No. Of Adult Child In Household Full Time Student Part Time Student Job05 Job09 Job10 Job Type Job12 Job13 Job15 Job16

0.1814242 -0.0184836 -0.0367527

3.191 -2.705 -2.377

8.01e-02

2.462

-0.26195 -8.18e-02

-1.601 -5.219

0.1631654 0.1738283 2.67e-02

1.264 3.043 0.898

0.1266913

1.862

4.39e-02

0.498

-0.1313755 0.1864408 0.3910194

-1.552 1.537 2.576

-0.1638786

-2.204

0.2501023

2.376

-0.5872452 -0.3027753

-6.524 -2.146

0.933967

12.331

0.1028804

3.213

0.227464

2.531

-0.0733438

-2.05

-0.17833

-1.924

-3.23e-06

-2.161

-0.1605496

-3.725

0.1006012

0.1032748 -0.1801525 3.30e-01 0.0673013

1.461 -0.53468 -0.34322 -0.39094

2.409 -2.795 2.058 1.283

-4.019 -3.793 -2.857

Variables In Variance Heterogeneity Age In Years Sex (Male=1, Female=0) Marital Status (1 For Single, 0 Otherwise) Weekend Full Time Student Part Time Student Sigma Lambda P Median

Goodness of Fit

-6.61e-03 0.2006441

-3.794 6.158

-1.25e-02 0.424836

-3.431 4.155

0.1547296

4.184

0.679675

4.333

-0.1580671

-4.837

-0.2316

-1.957

Ancillary Parameters For Survival 0.5229283 13.319 0.538148 6.382 Parameters Of Underlying Distribution 0.00268 0.00338 2.27365 2.29859 317.52846 252.3097 Likelihood Ratio =976 Likelihood Ratio =194 Degrees Of Significance Degrees Of Significance =0.00; Pseudo R2 =0.3 =0.00; Pseudo R2 =0.37

0.0058 1.73235 139.46234 Likelihood Ratio =408 Degrees Of Significance =0.00; Pseudo R2 =0.24
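The Lambda, P and Median rows of Table 05 can be related to one another under a common Weibull parametrization, S(t) = exp(-(λt)^P). This form is an assumption (the parametrization is not spelled out with the table), but the medians it implies agree with the tabulated ones to within the rounding of λ. A minimal sketch, with hypothetical function names:

```python
import math


def weibull_survival(t, lam, p):
    """S(t) = exp(-(lam * t) ** p); assumed parametrization."""
    return math.exp(-((lam * t) ** p))


def weibull_hazard(t, lam, p):
    """h(t) = lam * p * (lam * t) ** (p - 1); increasing in t when p > 1."""
    return lam * p * (lam * t) ** (p - 1)


def weibull_median(lam, p):
    """Duration at which S(t) = 0.5, i.e. (ln 2) ** (1 / p) / lam."""
    return math.log(2.0) ** (1.0 / p) / lam


# (Lambda, P, tabulated median in minutes) as printed in Table 05.
TABLE_05 = {
    "full-time": (0.00268, 2.27365, 317.52846),
    "part-time": (0.00338, 2.29859, 252.3097),
    "schoolwork": (0.0058, 1.73235, 139.46234),
}

if __name__ == "__main__":
    for name, (lam, p, tabulated) in TABLE_05.items():
        print(name, round(weibull_median(lam, p), 1), tabulated)
```

All three estimates of P exceed 1, so under this parametrization the hazard of ending the episode rises with elapsed duration.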

Table 06: Log-Logistic Parametric Hazard model Variable Names Constant Before 9am 9 To 10 Am 10 To 11 Am 11 Am To 12 Noon Start Time 12 To 1 Pm Dummy 2 To 3 Pm 3 To 4 Pm 4 To 5 Pm 5 To 6 Pm After 6 Pm Weekday Travel Time In Minutes Auto Transit Available Locations

FULL-TIME WORKER Coefficient T Statistics 5.8741592 71.585 0.4218784 11.068 0.1953895 3.605 0.1297089 1.771

PART-TIME WORKER Coefficient T Statistics 5.6659006 73.774 0.530773 6.802

SCHOOLWORK Coefficient T Statistics 4.0414174 18.259 0.6366151 5.636

0.2523827

2.057

-0.4987701

-7.876

-0.3135386

-1.862

-0.8689455

-10.522

-0.8898076

-14.141

-1.1152846

-8.19

-0.0012028 0.1695707 0.2267905 -0.0165127

-4.394 4.775 3.727 -2.235

1.50e-03 0.1989357

1.385 2.435

-0.1039164

-6.968

0.2563137 8.05e-02 0.4277142 -8.33e-02 -0.1081444 -0.2778073 -3.96e-02 -0.1410401 3.59e-02 8.20e-03 0.2221166 0.2147272 0.1557057


1.699 0.594 2.846 -0.528 -0.766 -1.883 -0.252 -1.265 0.601 4.128 2.395 1.86 2.96

Frequency Per Week Duration Flexibility (1= Flexible, 0 Otherwise) Age in Years Sex (Male=1, Female=0) Marital Status (1 For Single, 0 Otherwise) Driving License Home Based Job Yearly Income In CAD Household Size No. Of Teen In Household No. Of Kids In Household No. Of Adult Child In Household Full Time Student Part Time Student Job05 Job09 Job10 Job11 Job Type Job12 Job13 Job15 Job16

-0.0628913

-3.574

0.0878718

2.404

0.1004163

2.73

0.144276

1.508

-0.0775724

-1.796

-0.2694028

-3.298

-0.1543849

-2.604

0.091355

-0.0253891 0.0658089 -0.2043531 0.4187

5.47e-02

1.779

0.1417211

2.071

0.1415014

1.565

-0.1214334 0.3733426 0.5204659

-1.476 2.439 2.934

1.083

-0.39 1.194 -2.759 2.619

-0.7220719 -0.3711649

-4.539 -4.6

-0.4521941

-2.552

Variables In Variance Heterogeneity Age In Years Sex (Male=1, Female=0) Marital Status (1 For Single, 0 Otherwise) Weekend No. Of Kids In Households Full Time Student Part Time Student Sigma Lambda P Median

Goodness of Fit

-1.12e-02 2.84e-01

-6.536 8.793

-1.04e-02 0.3717094

-2.984 3.583

-8.06e-02

-1.044

2.32e-01

6.029

0.472457

3.049

0.2735569

2.488

-1.67e-01

-5.268 0.1262159

2.532 -0.7278493 -0.3573143

-8.014 -2.623

0.6952177

13.306

Ancillary Parameters For Survival 0.5229283 13.319 0.3259295 6.167 Parameters Of Underlying Distribution 0.00325 0.00423 3.18258 3.29297 308.11437 236.3386 Likelihood Ratio =1284 Likelihood Ratio =226 Degrees Of Significance Degrees Of Significance =0.00; Pseudo R2 =0.31 =0.00; Pseudo R2 =0.36

0.0077 2.50929 129.87822 Likelihood Ratio =226 Degrees Of Significance =0.00; Pseudo R2 =0.36
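For the log-logistic model, the tabulated medians in Table 06 are close to 1/λ, which is what the parametrization S(t) = 1 / (1 + (λt)^P) implies. That parametrization is assumed here rather than quoted from the paper. Unlike the Weibull, the log-logistic hazard is non-monotonic when P > 1: it rises to a peak and then declines. The sketch below uses hypothetical function names and the Lambda and P values printed in Table 06.

```python
def loglogistic_survival(t, lam, p):
    """S(t) = 1 / (1 + (lam * t) ** p); assumed parametrization."""
    return 1.0 / (1.0 + (lam * t) ** p)


def loglogistic_hazard(t, lam, p):
    """h(t) = lam * p * (lam * t) ** (p - 1) / (1 + (lam * t) ** p), t > 0."""
    x = (lam * t) ** p
    return p * x / (t * (1.0 + x))


def loglogistic_median(lam):
    """S(t) = 0.5 exactly where lam * t = 1."""
    return 1.0 / lam


def loglogistic_hazard_peak(lam, p):
    """Duration at which the hazard is largest (only defined for p > 1)."""
    if p <= 1:
        raise ValueError("hazard is monotonically decreasing when p <= 1")
    return (p - 1.0) ** (1.0 / p) / lam


if __name__ == "__main__":
    # Lambda and P for full-time workers as printed in Table 06.
    lam, p = 0.00325, 3.18258
    print(round(loglogistic_median(lam), 1))
    print(round(loglogistic_hazard_peak(lam, p), 1))
```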

Table 07: Lognormal Parametric Hazard model Variable Names Constant Before 9am 9 To 10 Am 10 To 11 Am 11 Am To 12 Noon Start Time 12 To 1 Pm Dummy 2 To 3 Pm 3 To 4 Pm 4 To 5 Pm 5 To 6 Pm After 6 Pm Weekday Travel Time In Minutes Auto Transit Available Locations

FULL-TIME WORKER Coefficient T Statistics 5.9055235 63.791 0.4702363 10.143 0.2242617 3.401 0.1379423 1.565

PART-TIME WORKER Coefficient T Statistics 5.2024528 23.995 0.7012567 6.937 0.1686597 1.249 0.4981014 3.028 0.3465694

1.819

-9.414

0.2563172

1.272

-0.9732837

-15.202

-0.9701414

-6.039

-0.0010209

-4.001

1.54e-03 0.2490497

1.292 2.697

-0.0134897

-1.481

-8.81e-02

-5.903

-0.4552947

-5.705

-0.8507998


SCHOOLWORK Coefficient T Statistics 3.9612953 14.412 0.6800987 5.788 0.3247859 0.1358235 0.3449904 1.61e-02 -0.1103412 -0.2045928 -4.05e-02 -0.1070197 4.03e-02 1.01e-02 0.1907307 0.1033825 8.11e-02

1.981 0.982 2.497 0.083 -0.739 -1.335 -0.252 -0.985 0.603 4.104 1.609 0.819 1.582

Frequency Per Week Duration Flexibility (1= Flexible, 0 Otherwise) Age in Years Sex (Male=1, Female=0) Marital Status (1 For Single, 0 Otherwise) Driving License Home Based Job Yearly Income In CAD Household Size No. Of Teen In Household No. Of Kids In Household No. Of Adult Child In Household Full Time Student Part Time Student Job05 Job09 Job10 Job11 Job Type Job12 Job13 Job15 Job16

-0.0643577

-3.235

0.1264781

2.794

0.0945737

2.035

-0.1003681

-1.761

-0.1543849 -1.039e-06

-2.604 -1.887

0.091355

-0.0253891 0.0658089 -0.2043531 0.4187

5.04e-03

1.207

-0.3235967

-3.038

0.2258862

1.397

-0.6447071 -0.3487545

-4.045 -3.489

-0.5607937

-2.929

3.67e-02

1.029

0.1145551

1.379

0.1108716

1.022

-0.1385015 0.5370283 0.6846401

-1.47 2.485 2.895

1.083

-0.39 1.194 -2.759 2.619

Variables In Variance Heterogeneity Age In Years Sex (Male=1, Female=0) Marital Status (1 For Single, 0 Otherwise) Weekend No. Of Kids In Households Full Time Student Part Time Student Sigma Lambda P Median

Goodness of Fit

-8.84e-03 2.44e-01

-6.826 9.269

-9.96e-03 0.4140407

-3.236 5.093

-9.14e-02

-1.412

2.19e-01

6.51

0.2963996

2.111

0.2017943

2.199

-1.36e-01

-6.36 0.1415999

3.272 -0.7859277 -0.4786511

-10.258 -3.961

1.3953531

15.209

Ancillary Parameters For Survival 0.8345525 16.098 0.6135546 7.106 Parameters Of Underlying Distribution 0.00354 0.00451 1.49404 1.76687 282.4912 221.66906 Likelihood Ratio =1126 Likelihood Ratio =216 Degrees Of Significance Degrees Of Significance =0.00; Pseudo R2 =0.3 =0.00; Pseudo R2 =0.34

0.00822 1.34168 121.64942 Likelihood Ratio =368 Degrees Of Significance =0.00; Pseudo R2 =0.21

Table 08: Flexible Parameter Hazard model Variable Names Constant Before 9am 9 To 10 Am 10 To 11 Am 11 Am To 12 Noon Start Time 12 To 1 Pm Dummy 2 To 3 Pm 3 To 4 Pm 4 To 5 Pm 5 To 6 Pm After 6 Pm Weekday Travel Time In Minutes Auto Transit Available Locations

FULL-TIME WORKER Coefficient T Statistics 3.7837863 11.422 1.5072987 11.136 0.4176714 2.275 0.2114835 0.828

PART-TIME WORKER Coefficient T Statistics 4.8608091 6.568 1.8391388 5.705

-1.0184287

-1.531

-1.0873643

-4.215

-1.3023079

-1.658

-1.2688083

-3.495

-0.6833368

-1.065

-1.9883199

-8.362

-3.5371611

-6.19

-3.00e-03 0.6238824 0.8010446 -6.57e-02

-2.052 5.31 4.193 -2.889

1.55e-02

2.768

-0.7888955 -0.3813252

-1.161 -5.56


SCHOOLWORK Coefficient T Statistics 4.21730219 6.555 0.1533893 0.443 -0.4439924 -0.959 -0.2326146 -0.502 0.1735093 0.388 -0.191104 -0.441 -0.5404621 -0.97 -0.3550044 -0.824 -0.2489029 -0.531 -0.3869232 -0.96 1.62e-02 0.048 0.2146361 1.118 -9.88e-03 -1.834 2.90e-02 0.13 -0.18546 -0.582 -0.26327 -1.525

Frequency Per Week Duration Flexibility (1= Flexible, 0 Otherwise) Age in Years Sex (Male=1, Female=0) Marital Status (1 For Single, 0 Otherwise) Driving License Home Based Job Yearly Income In CAD Household Size No. Of Teen In Household No. Of Kids In Household No. Of Adult Child In Household Full Time Student Part Time Student Job05 Job09 Job10 Job11 Job Type Job12 Job13 Job15 Job16

-0.1689108

-3.31

0.5211651

4.844

-0.2998406

-1.516

-.175156

-2.113

-7.50e-03 0.5427561

-1.474 5.186

0.8211476

2.826

0.6212688

3.224

-0.1876462

-1.715

0.6950255

2.198

-0.5078359

-2.364

2.271

7.522

2.779 -2.4167 0.5765

10.598 -7.035 1.808

0.7307266 0.33163

3.52 2.064

-0.732508 1.9695061

-3.363 5.799

-1.9568299 -1.3157126

-4.456 -3.191

1.9498311

2.11

Threshold Parameters Mu( 1) Mu( 2) Mu( 3) Mu( 4) Mu( 5) Mu( 6) Mu( 7) Mu( 8) Mu( 9) Mu(10) Mu(11) Mu(12)

Goodness of Fit

1.0765302 12.73 1.9048308 18.799 2.5997728 23.321 3.0485652 25.947 3.3730845 27.292 3.7807296 28.891 4.6000839 33.245 6.3152408 40.748 7.3713376 43.097 8.4960321 40.617 9.451226 33.239 10.567978 24.259 Likelihood Ratio =1212 Degrees Of Significance =0.00; Pseudo R2 =0.16

1.7280062 6.209 2.4374611 8.126 3.2505071 9.783 3.9671956 11.095 4.4731801 11.39 5.1642178 12.34 5.8589529 13.716 6.9012972 14.698 8.2671062 15.137 9.2433857 13.186 1.7280062 6.209 2.4374611 8.126 Likelihood Ratio =205 Degrees Of Significance =0.00; Pseudo R2 =0.17

1.6540528 9.01 2.755467 12.571 3.4478816 15.123 3.8158936 16.322 4.2187651 16.929 4.588594 18.235 4.7308134 18.782 4.8400856 19.059 5.0416699 19.569 5.0976994 19.782 5.4316558 20.789 9.1038875 14.001 Likelihood Ratio =329 Degrees Of Significance =0.00; Pseudo R2 =0.11

Full-Time Workers

Minutes  Hours  Hazard Rate  Integrated Hazard  Survival S(T)  Cumulative F(T)  Probability f(T)
0        0      0            0                  1              0                0
60       1      0.021564     0.021564           0.978667       0.021333         0.021333
120      2      0.040044     0.061608           0.940251       0.059749         0.038416
180      3      0.072636     0.134244           0.874376       0.125624         0.065875
240      4      0.114599     0.248843           0.779702       0.220298         0.094674
300      5      0.114723     0.363566           0.695193       0.304807         0.08451
360      6      0.108439     0.472005           0.62375        0.37625          0.071442
420      7      0.16453      0.636535           0.529122       0.470878         0.094628
480      8      0.384109     1.020644           0.360363       0.639637         0.16876
540      9      0.757873     1.778517           0.168888       0.831112         0.191474
600      10     0.634091     2.412608           0.089581       0.910419         0.079307
660      11     0.669047     3.081655           0.045883       0.954117         0.043698
720      12     0.613076     3.694731           0.024854       0.975146         0.021029
780      13     0.671875     4.366606           0.012694       0.987306         0.01216
840      14     0.671875     5.038481           0.006484       0.993516         0.006211

Part-Time Workers

Minutes  Hours  Hazard Rate  Integrated Hazard  Survival S(T)  Cumulative F(T)  Probability f(T)
0        0      0            0                  1              0                0
60       1      0.0269546    0.0269546          0.9734054      0.0265946        0.0265946
120      2      0.110941     0.1378956          0.8711896      0.1288104        0.1022158
180      3      0.1223       0.2601956          0.7709008      0.2290992        0.1002889
240      4      0.231968     0.4921636          0.6113023      0.3886977        0.1595984
300      5      0.303959     0.7961226          0.4510746      0.5489254        0.1602278
360      6      0.281238     1.0773606          0.340493       0.659507         0.1105815
420      7      0.413581     1.4909416          0.2251605      0.7748395        0.1153325
480      8      0.454014     1.9449556          0.1429936      0.8570064        0.082167
540      9      0.624676     2.5696316          0.0765637      0.9234363        0.0664298
600      10     0.737984     3.3076156          0.0366033      0.9633967        0.0399604
660      11     0.621121     3.9287366          0.0196685      0.9803315        0.0169348
720      12     0.621121     4.5498576          0.0105687      0.9894313        0.0090998

Schoolwork

Minutes  Hours  Hazard Rate  Integrated Hazard  Survival S(T)  Cumulative F(T)  Probability f(T)
0        0      0            0                  1              0                0
30       0.5    0.041936     0.041936           0.958931       0.041069         0.041069
60       1      0.150606     0.192542           0.82486        0.17514          0.134072
90       1.5    0.272208     0.46475            0.628292       0.371708         0.196568
120      2      0.289341     0.754091           0.470438       0.529562         0.157854
150      2.5    0.204848     0.958939           0.383299       0.616701         0.087139
180      3      0.248164     1.207103           0.299062       0.700938         0.084237
210      3.5    0.250875     1.457978           0.232706       0.767294         0.066356
240      4      0.110338     1.568316           0.208396       0.791604         0.024311
270      4.5    0.087677     1.655992           0.190903       0.809097         0.017493
300      5      0.159077     1.815069           0.162827       0.837173         0.028076
330      5.5    0.047814     1.862883           0.155224       0.844776         0.007602
360      6      0.258111     2.120994           0.119912       0.880088         0.035312
390      6.5    0.97211      3.093104           0.045361       0.954639         0.074551
420      7      0.97211      4.065214           0.017159       0.982841         0.028202
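The columns of the baseline hazard tables for full-time workers, part-time workers and schoolwork are linked by simple identities: the integrated hazard is the running sum of the interval hazard rates, the survival rate is S(T) = exp(-integrated hazard), the cumulative distribution is F(T) = 1 - S(T), and the probability in each interval is the increment of F(T). The sketch below checks these identities against the first rows of the full-time table; the function name `hazard_to_survival` is illustrative only.

```python
import math


def hazard_to_survival(hazard_rates):
    """Derive cumulative hazard, S, F and per-interval probability.

    Each input value is the hazard accumulated over one interval, so the
    integrated hazard is simply the running sum.
    """
    rows = []
    integrated = 0.0
    previous_cdf = 0.0
    for h in hazard_rates:
        integrated += h
        survival = math.exp(-integrated)   # S(T) = exp(-H(T))
        cdf = 1.0 - survival               # F(T) = 1 - S(T)
        rows.append({"hazard": h, "integrated": integrated,
                     "survival": survival, "cdf": cdf,
                     "probability": cdf - previous_cdf})
        previous_cdf = cdf
    return rows


# First four interval hazards for full-time workers, as tabulated.
FULL_TIME_HAZARD = [0.0, 0.021564, 0.040044, 0.072636]

if __name__ == "__main__":
    for row in hazard_to_survival(FULL_TIME_HAZARD):
        print({k: round(v, 6) for k, v in row.items()})
```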

Table 09: Tobit model Variable Names Constant Before 9am 9 To 10 Am 10 To 11 Am 11 Am To 12 Noon Start Time 12 To 1 Pm Dummy 2 To 3 Pm 3 To 4 Pm 4 To 5 Pm 5 To 6 Pm After 6 Pm Weekday Travel Time In Minutes Auto Transit Available Locations Frequency Per Week Duration Flexibility (1= Flexible, 0 Otherwise) Age in Years Sex (Male=1, Female=0) Marital Status (1 For Single, 0

FULL-TIME WORKER Coefficient T Statistics 348.5406 19.738 130.45485 14.415 50.163674 4.299 40.969299 2.547

PART-TIME WORKER Coefficient T Statistics 417.77227 9.613 171.92322 8.463

-46.482307

-0.964

-75.701738

-1.088

-78.229453

-3.963

-107.70834

-4.39

-134.68815

-7.112

-180.40332

-3.565

-0.1187324 50.048555 61.162644 -4.3151679 -12.49959

-1.646 6.268 4.747 -2.597 -3.273

0.3872878

1.236

-51.859082 -22.150721 -36.323761

-1.402 -5.697 -3.089

31.472743

4.017

33.322638 -17.952373

4.431 -2.146

69.666462 -28.040871

3.43 -1.457


SCHOOLWORK Coefficient T Statistics 76.147967 2.388 140.43524 8.09 10.357122 0.396 35.793047 1.556 30.940012 1.403 47.284305 2.201 -20.179061 -0.546 -38.092626 -1.321 -21.290916 -0.791 14.897427 0.566 -19.586272 -0.906 14.887682 1.467 2.0803542 7.064 9.8893397 0.661 20.733009 1.234 33.388961 4.28 0.7933754 0.151

3.2273126

0.286

Otherwise) Driving License Home Based Job Yearly Income In CAD Household Size No. Of Teen In Household No. Of Kids In Household No. Of Adult Child In Household Full Time Student Part Time Student Job05 Job09 Job10 Job11 Job Type Job12 Job13 Job15 Job16

-39.056297 -1.61e-04 1.20e-03

-2.747 -1.579 0.098

49.752476 19.009296

2.882 1.611

63.437871 -148.08921

1.622 -4.784

17.553921 -59.902021 149.60023

1.341 -3.601 4.754

-53.550293 127.39061

-1.013 1.826

34.821803

2.044

-8.0654171 4.9323095 60.235199

-0.546 0.309 1.784

Variables In Variance Heterogeneity Age In Years Sex (Male=1, Female=0) Marital Status (1 For Single, 0 Otherwise) Weekend Household Size No. Of Kids In Households Full Time Student Part Time Student Sigma

Goodness of Fit

-3.78e-03 0.1732163

-2.21 5.357

-1.33e-02 0.353287

-3.966 3.291

-1.36e-03

-0.021

6.52e-02 -9.49e-02

1.487 -2.685

0.3498296

2.495

0.1517072

2.359

0.1193462

2.381 2.68e-02 0.6254023

0.273 4.664

Ancillary Parameters For Survival 136.1958 11.027 113.51331 3.322 Likelihood Ratio =1450 Likelihood Ratio =216 Degrees Of Significance Degrees Of Significance =0.00; Pseudo R2 =0.07 =0.00; Pseudo R2 =0.07


97.301797 10.253 Likelihood Ratio =410 Degrees Of Significance =0.00; Pseudo R2 =0.05

Figure 01: Comparison of Duration Distributions Derived from Two Different Data Sources
[Plot: Cumulative Probability Distributions (0 to 1) against Duration in Minutes (0 to 1000). Series: TTS Full Time Job, CHASE Full Time Job, TTS Part Time Job, CHASE Part Time Job, TTS School, CHASE School.]

Figure 02: Survival Distribution by Semiparametric Hazard Model: Full-Time Workers
Figure 03: Survival Distribution by Semiparametric Hazard Model: Part-Time Workers
[Plots: Survival (0 to 1) against Duration in Hours. Series: Predicted, Observed.]

Figure 04: Survival Distribution by Semiparametric Hazard Model: Schoolwork
Figure 05: Survival Distribution by Weibull Parametric Hazard Model: Full-Time Workers
[Plots: Survival (0 to 1) against Duration in Hours. Legend entries recovered from the two panels: Predicted, Observed, and Observed; 10% Error Offset.]

Figure 06: Survival Distribution by Weibull Parametric Hazard Model: Part-Time Workers
Figure 07: Survival Distribution by Weibull Parametric Hazard Model: Schoolwork
[Plots: Survival (0 to 1) against Duration in Hours. Series: Predicted, Observed; 10% Error Offset.]

Figure 08: Survival Distribution by Log-Logistic Parametric Hazard Model: Full-Time Workers
Figure 09: Survival Distribution by Flexible Parameter Hazard Model: Full-Time Workers
[Plots: Survival (0 to 1) against Duration in Hours. Series: Predicted, Observed; 10% Error Offset.]

Figure 10: Survival Distribution by Flexible Parameter Hazard Model: Part-Time Workers
Figure 11: Survival Distribution by Flexible Parameter Hazard Model: Schoolwork
[Plots: Survival (0 to 1) against Duration in Hours. Series: Predicted, Observed; 10% Error Offset.]

Figure 12: Survival Distribution by Tobit Model: Full-Time Workers
Figure 13: Survival Distribution by Tobit Model: Part-Time Workers
Figure 14: Survival Distribution by Tobit Model: Schoolwork
[Plots: Survival (0 to 1) against Duration in Hours. Series: Predicted, Observed; 10% Error Offset.]