
Environment and Planning A 2005, volume 37, pages 551-568

DOI:10.1068/a36167

The impact of simplification in a sequential rule-based model of activity-scheduling behavior

Elke A L M G Moons, Geert P M Wets

Data Analysis and Modeling Group, Limburgs Universitair Centrum, 3590 Diepenbeek, Belgium; e-mail: [email protected], [email protected]

Marc Aerts

Center for Statistics, Limburgs Universitair Centrum, 3590 Diepenbeek, Belgium; e-mail: [email protected]

Theo A Arentze, Harry J P Timmermans

Urban Planning Group, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands; e-mail: [email protected], [email protected]

Received 9 July 2003; in revised form 5 June 2004

Abstract. The aim of this paper is to gain a better understanding of the impact of simplification on a sequential model of activity-scheduling behavior which uses feature-selection methods. To that effect, the predictive performance of the Albatross model, which incorporates nine different facets of activity-travel behavior, based on the original full decision trees, is compared with the performance of the model based on trimmed decision trees. The results indicate that significantly smaller decision trees can be used for modeling the different choice facets of the sequential model system without losing much in predictive power. The performance of the models is compared at three levels: the choice-facet level, the activity-pattern level (comparing the observed and generated sequences of activities), and the trip-matrix level, comparing the correlation coefficients that determine the strength of the associations between the observed and the predicted origin-destination matrices. The results indicate that the model based on the trimmed decision trees predicts activity-diary schedules with a minimum loss of accuracy at the decision level. Moreover, the results indicate a slightly better performance at the activity-pattern and the trip-matrix level.

1 Introduction

During the last decade interest in spatial interaction patterns in transportation research and spatial sciences alike has shifted away from trips and tours to the analysis of complex daily activity-travel patterns (for example, Bhat and Koppelman, 1999). This shift in interest was motivated both by methodological and by policy considerations. It was realized that travel patterns are a manifestation of activity participation at different points in space (for example, Axhausen and Gärling, 1992). A focus on daily activity patterns as opposed to single trips and multistop multipurpose tours was felt to lead to potentially better predictions of travel demand in time and space. The activity-based approach would allow one to capture better the interdependencies of activity participation and travel, within a particular spatial and institutional context. Furthermore, it would allow one to assess the impact of such new policy areas as teleworking and teleshopping that were virtually impossible to tackle with conventional models (for example, see Timmermans et al, 2002).

From a methodological perspective, the new focus on timing and duration of activities led to the application of statistical methods, including hazard (Bhat, 1996) and Tobit models that were rather new to the field. In addition, and perhaps more challenging, were the attempts to develop more comprehensive models of activity-travel behavior. These models typically attempt to predict various facets of travel behavior. In addition to the traditional facets of destination, transport mode choice, and perhaps trip chaining (multipurpose trips), activity-based models consider activity


choice, timing, duration, travel party, and route choice. Moreover, various types of constraints (spatial, temporal, institutional, spatial-temporal) were incorporated in the modeling efforts, and in some cases the decisionmaking unit was the household as opposed to the individual (for example, Gliebe and Koppelman, 2002; Zhang et al, 2002). This increased complexity (more choice facets, more choice alternatives, preferences, and constraints, coordination of multiple persons, intertemporal and spatial dependencies, etc) caused a major challenge for the modeling community.

The multitude of modeling attempts seems to converge on two modeling approaches. First, the discrete-choice utility-maximizing models, originally developed for trip and tour data, were extended to include more facets. Examples of such utility-maximizing models include the Daily Activity Scheduling model (Bowman, 1998), the CATGW (Comprehensive Activity-Travel Generation for Workers) model system (Bhat, 1999), PCATS (Prism-Constrained Activity Travel Simulator) (Kitamura and Fujii, 1998), and Patricia (Borgers et al, 2002), to name a few. Second, as individuals do not necessarily maximize their utility, rule-based computational process models of activity-scheduling behavior have been developed. Unlike the econometric models, computational process models do not rely on algebraic equations but on a set of Boolean decision rules or neural networks to predict observed activity-travel patterns. Examples of such models include Scheduler (Gärling et al, 1989), AMOS (Activity-Mobility Simulator) (Pendyala et al, 1995), Albatross (Arentze and Timmermans, 2000; 2002), and Aurora (Joh et al, 2001a), although Aurora uses a combination of algebraic equations and decision heuristics. A more detailed state-of-the-art review is given in Timmermans et al (2002).

These competing modeling approaches each have their specific advantages and pitfalls. Protagonists of computational process models argue that the utility-maximizing econometric models do not reflect the true behavioral mechanisms underlying travel decisions and that these models are based on assumptions about travel behavior that are too rigorous (for example, Gärling et al, 1998). Likewise, advocates of econometric approaches argue that computational process models lack rigor, ease of interpretation, and the ability to assess the significance of the decision rules statistically. This leads to the paradox that, although computational process models have been developed to reflect better the behavioral mechanism underlying activity-travel decisions, they are often viewed as black boxes, consisting of a large number of decision rules the specific influence of which on the final outcome of the model is impossible to identify.

This discussion should take place in the context of the purpose of the model. If the goal is to understand behavioral mechanisms underlying travel behavior better, rules that better reflect actual decisionmaking seem paramount. On the other hand, if the goal is to predict travel patterns, the situation seems less clear as behaviorally sounder models do not necessarily also predict better. Unfortunately there is a lack of comparative studies in which the predictive performance of competing models, derived from the same data, is compared. Consequently, the discussion about future directions and pros and cons of the competing modeling approaches from a predictive point of view remains almost philosophical in nature.

Recently, several studies have indicated an increasing interest in the computational process model approach in order to model activity-diary data. These models derive choice rules (Boolean expressions) from activity-travel-diary data. This process of rule induction is similar to the process of parameter estimation in algebraic econometric models. Albatross (Arentze and Timmermans, 2000), the most complex fully operational computational process model to date, derives decision rules by using a CHAID (chi-squared automatic interaction detection) induction algorithm. This means that the set of condition variables, assumed to influence some facet of activity-travel behavior,


is successively split on the basis of the χ² measure, so as to find sets of conditions that are as homogeneous as possible until some stop criterion is met. This process can be represented in terms of a decision tree, which indicates which combination of condition states leads to a particular action (for more details see, for example, Arentze et al, 2000).

As indicated, a large number of such rules is often extracted from the data. Although a larger number of rules may be valuable when one wishes to understand the data better, from a predictive perspective a large number of rules may imply that the decision-tree induction algorithm has overfitted the data. The decision-tree structure obtained (set of decision rules) may then be very unstable and sensitive to highly correlated condition variables. Feature selection (FS) offers a solution to reducing the number of irrelevant attributes and as a consequence the size of the decision tree will often also be reduced. The key notion underlying FS is that the number of decision rules (size of the tree) is reduced by selecting and deleting irrelevant features or condition variables, on the basis of some statistical measure. The impact of FS on the predictive performance of rule-based models is, however, not clear a priori. On the one hand, because the irrelevant conditions are deleted, FS may not have a substantial negative effect on predictive performance. However, a smaller decision tree may also result in a higher probability of misclassification, leading to worse predictive performance.

It is against this background that in the present paper we report the findings of a methodological study that was conducted to gain a better understanding of the influence of a smaller set of decision rules (trimmed decision tree) on the predictive performance of sequential models of activity-scheduling behavior in general and the Albatross model in particular. In a previous paper (Moons et al, 2001), we examined the influence of irrelevant attributes on the performance of the decision tree for the transport mode agent of the Albatross model system. We found that a trimmed decision tree, involving considerably fewer decision rules, did not result in a significant drop in predictive performance compared with the original larger set of rules that was derived from the activity-travel diaries. In this sequel we will investigate to what extent this result can be generalized to the full set of decision trees, representing different choice facets, that make up the complete Albatross model system. The predictive performance will also be evaluated at activity-pattern level, where observed and generated sequences of activities are compared, and at trip-matrix level, where the correlation coefficients that determine the strength of the associations between the observed and predicted origin-destination (OD) matrices are judged against each other.

The paper is organized as follows. First, to give the necessary background for this study, the Albatross model will be summarized, followed by a discussion of tree induction and FS. Next, the design of the study will be explained in more detail. This is followed by a description of the data that were used for the analysis. In the next section we then report the results of the analysis. Finally, we will draw some conclusions and discuss the implications of our findings for the development and application of activity-based models of travel demand.

2 The Albatross system

The Albatross model was developed for the Dutch Ministry of Transportation (Arentze and Timmermans, 2000). In the present study we used the data that were used to find the set of rules for the original model. It should be noted, however, that in the meantime an extended set of rules, based on a larger dataset, has been derived (Arentze and Timmermans, 2002; 2004).
When we started this study, the larger dataset was not available.

[Figure 1: flowchart of the scheduling engine. Recoverable box labels: List of activities, Select, With whom, Duration, Start time, Trip chain, Mode 1, Mode 2, Location 1, Location 2, Add to list, Next activity, Next tour, Schedule; output: Schedule + mode + location.]

Figure 1. The Albatross scheduling engine.

The computational process model relies on a set of Boolean decision rules that are used to predict activity-travel patterns. These rules were extracted from activity-diary data. The activity-scheduling process is sequential in nature. Figure 1 provides a schematic representation of the Albatross scheduling model. The activity-scheduling agent of Albatross is based on an assumed sequential execution of decision trees to predict activity-travel patterns. The model first executes a set of decision rules to predict which activity will be inserted into the schedule. It then determines, on the basis of another set of rules, with whom the activity is conducted and the duration of the activity. The order in which activities are evaluated is predefined as: daily shopping, services, nondaily shopping, and social and leisure activities. The assignment of a scheduling position to each selected activity is the result of the next two steps. After a start-time interval has been selected for an activity, trip-chaining decisions determine for each activity whether the activity is to be connected with a previous and/or later activity. Those trip-chaining decisions are important not only for timing activities but also for organizing trips into tours. The next steps involve the choice of transport mode for work (referred to as mode 1), the choice of transport mode for other purposes (referred to as mode 2), and the choice of location. Possible interactions between mode and location choices are taken into account by using location information as a condition of mode-selection rules.

3 Decision-tree induction

Decision-tree induction is similar to parameter-estimation methods in econometric models. The goal of tree induction is to find the set of Boolean rules that best represents the empirical data. The original Albatross system was derived by using a χ²-based approach.



In this study, however, the decision trees were reinduced by using the C4.5 method (Quinlan, 1993) because this method is a benchmarking method in the data-mining community. Wets et al (2000) found approximately equal performance of these two tree-induction algorithms in terms of goodness of fit in a representative case study.

The C4.5 algorithm works as follows. Let there be a given set of choice observations i taken from activity-travel-diary data. Consider the n different attributes or conditions $X_{i1}, X_{i2}, \ldots, X_{in}$ and the choice or action variable $Y_i$, $Y_i \in \{1, 2, \ldots, p\}$ for $i = 1, \ldots, I$. In general, a decision tree consists of different layers of nodes. It starts from the root node in the first layer or first parent node. This parent node will split into daughter nodes on the second layer. In turn, each of these daughter nodes can become a new parent node in the next split, and this process may continue with further splits. A leaf node is a node which has no offspring nodes. Nodes in deeper layers become increasingly more homogeneous. An internal node is split by considering all allowable splits for all variables and the best split is the one with the most homogeneous daughter nodes. The C4.5 algorithm recursively splits the sample space on X into increasingly homogeneous partitions in terms of Y, until the leaf nodes contain only cases from a single class. Increase in homogeneity achieved by a candidate split is measured in terms of an information-gain ratio. To understand this concept, the following definitions are relevant:

Definition 1: information of a message. The information conveyed by a message depends on its probability and can be measured in bits as minus the logarithm to base 2 of that probability. For example, if there are four equally probable messages, the information conveyed by any of them is $-\log_2(1/4) = 2$ bits.
Definition 2: information of a message that a random case belongs to a certain class

$$-\log_2\left(\frac{\mathrm{freq}(C_i, T)}{|T|}\right) \text{ bits},$$

where T is a training set of cases, $C_i$ is a class i, and $\mathrm{freq}(C_i, T)$ is the number of cases in T that belong to class $C_i$. Based on these definitions, the average amount of information needed to identify the class of a case in a training set (also called entropy) can be deduced as follows:

Definition 3: entropy of a training set

$$\mathrm{info}(T) = -\sum_{i=1}^{k} \frac{\mathrm{freq}(C_i, T)}{|T|} \log_2\left(\frac{\mathrm{freq}(C_i, T)}{|T|}\right) \text{ bits}.$$

Entropy can also be measured after T has been partitioned into n sets by using the outcome of a test carried out on attribute X. This yields:

Definition 4: entropy after the training set has been partitioned on a test X

$$\mathrm{info}_X(T) = \sum_{i=1}^{n} \frac{|T_i|}{|T|}\, \mathrm{info}(T_i).$$

Using these two measurements, the gain criterion can be defined as follows:

Definition 5: gain criterion

$$\mathrm{gain}(X) = \mathrm{info}(T) - \mathrm{info}_X(T).$$
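Definitions 3 to 5 can be illustrated with a few lines of Python. This is a minimal sketch rather than the C4.5 implementation: the function names `info`, `info_x`, and `gain` simply mirror the notation of the definitions, and the toy observations (`mode`, `distance`, `parking`) are hypothetical values invented for the illustration.

```python
from collections import Counter
from math import log2


def info(labels):
    """Entropy info(T) in bits of a list of class labels (Definition 3)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def info_x(values, labels):
    """Entropy info_X(T) after partitioning T on the outcomes of attribute X (Definition 4)."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    return sum(len(subset) / total * info(subset) for subset in subsets.values())


def gain(values, labels):
    """Information gained by partitioning T with the test on X (Definition 5)."""
    return info(labels) - info_x(values, labels)


# Hypothetical toy data: eight observed mode choices and two condition variables.
mode = ["car", "car", "bike", "bike", "car", "bike", "bike", "car"]
distance = ["long", "long", "short", "short", "long", "short", "short", "long"]
parking = ["good", "bad", "good", "bad", "good", "bad", "good", "bad"]

print(info(mode))            # two equally frequent classes: 1 bit
print(gain(distance, mode))  # distance separates the classes completely
print(gain(parking, mode))   # each parking subset is still an even mix
```

In this toy set, splitting on `distance` removes all uncertainty about the class, whereas splitting on `parking` removes none, so the gain criterion would prefer `distance`.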



The gain criterion measures the information gained by partitioning the training set by using the test X. In ID3, the ancestor of C4.5, the test selected is the one which maximizes this information gain because one may expect that the remaining subsets in the branches will be the easiest to partition. Note, however, that this is by no means certain because we have looked ahead only one level deep in the tree. The gain criterion has proved only to be a good heuristic. Although the gain criterion performed quite well in practice, the criterion has one serious deficiency, that is, it tends to favor conditions or attributes with many outcomes. Therefore, in C4.5, a somewhat adapted form of the gain criterion is used, called the gain ratio criterion. According to this criterion, the gain attributable to conditions with many outcomes is adjusted by using some kind of normalization. In particular, the split info(X) measure is defined as:

Definition 6: split info of a test X

$$\mathrm{split\ info}(X) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \log_2\left(\frac{|T_i|}{|T|}\right).$$

This indicates the information generated by partitioning T into n subsets. Using this measure, the gain ratio is defined as:

Definition 7: gain ratio

$$\mathrm{gain\ ratio}(X) = \frac{\mathrm{gain}(X)}{\mathrm{split\ info}(X)}.$$
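The normalization in Definitions 6 and 7 can be sketched as follows. This is an illustration under stated assumptions, not C4.5 itself: the helper names and the toy data are hypothetical, returning 0 for a zero split info is a choice made here for the sketch, and the additional C4.5 constraint on average information gain is not implemented.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Entropy in bits of the empirical distribution of a list of items."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of the test on attribute X (Definition 7)."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(
        len(subset) / total * entropy(subset) for subset in subsets.values()
    )
    # Definition 6: split info is the entropy of the partition sizes |T_i| / |T|.
    split_info = entropy(values)
    # Assumption for this sketch: a test with a single outcome gets ratio 0.
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: an informative two-outcome attribute and an
# identifier-like attribute with a different outcome for every case.
mode = ["car", "car", "bike", "bike", "car", "bike", "bike", "car"]
distance = ["long", "long", "short", "short", "long", "short", "short", "long"]
respondent_id = [1, 2, 3, 4, 5, 6, 7, 8]

print(gain_ratio(distance, mode))       # gain 1 bit / split info 1 bit
print(gain_ratio(respondent_id, mode))  # gain 1 bit / split info 3 bits
```

Both attributes achieve the same information gain on this toy set, but the identifier-like attribute is penalized by its larger split info, which is the bias correction that motivates the gain ratio criterion.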

The gain ratio represents how much of the gained information is useful for classification. In the case of very small values of split info(X) (for trivial splits), the ratio will tend to infinity. Therefore, C4.5 will select the condition which maximizes the gain ratio, subject to the constraint that the information gain must be at least as large as the average information gain over all possible tests. After the tree has been built, pruning strategies are adopted. This means that the decision tree is simplified by discarding one or more subbranches and replacing them with leaves.

4 Feature selection

Feature-selection (FS) strategies are often applied to explore the effect of irrelevant attributes on the performance of classifier systems. An FS method ranks all the attributes or conditions (features) in descending order of relevance. This relevance can be measured in several ways, leading to two large subclasses in FS methods: the filter and the wrapper approach. The fundamental difference between these approaches is the evaluation criterion used to select or rank attributes. For wrappers, the selection or ranking results from the estimation of the performance on the associated induction algorithm, whereas the filter approach makes use only of the characteristics of the data itself. Both methods have been compared extensively (Hall, 1999a; 1999b; Koller and Sahami, 1996). In this analysis, the filter approach, more specifically the Relief-F FS method, is chosen because it can handle multiple classes of the dependent variable (the nine different choice facets that we are predicting range from two to seven classes) and because it can be easily combined with the C4.5 induction algorithm.

FS strategies can be regarded as one way of coping with correlation between attributes. This is relevant because the structure of trees is sensitive to possible multicollinearity, which implies that some variables would be redundant (given the presence of other variables).
Redundant variables do not affect the impact of



the remaining variables in the tree model, but it would simply be better if they were not used for splitting. Therefore, a good FS method would search for a subset of relevant features that are highly correlated with the class or action variable that the tree-induction algorithm is trying to predict, while mutually having the lowest possible correlations.

Relief (Kira and Rendall, 1992), the predecessor of Relief-F, is a distance-based feature-weighting algorithm. It orders attributes according to their importance. To each attribute it assigns the initial value of zero that will be adapted with each run through the instances of the dataset. The features with the highest values are considered to be the most relevant, whereas those with values close to zero or with negative values are judged irrelevant. Thus Relief imposes a ranking on features by assigning each a weight. The weight for a particular feature reflects its relevance in distinguishing the classes. In determining the weights, the concepts of near hit and near miss are central. A near hit of instance i is defined as the instance that is closest to i (based on Euclidean distance) and which is of the same class (concerning the output or action variable), whereas a near miss of instance i is defined as the instance that is closest to i (based on Euclidean distance) and which is of a different class (concerning the output variable). The algorithm attempts to approximate the following difference of probabilities for the weight of a feature X:

$$W_X = P(\text{different value of } X \mid \text{nearest instance of different class}) - P(\text{different value of } X \mid \text{nearest instance of same class}),$$

where P is probability. Thus, Relief works by randomly sampling an instance and locating its nearest neighbor from the same and opposite class. The nearest neighbor is defined in terms of the Euclidean distance, that is, in an n-dimensional space, the following distance measure:

$$d(x, y) = \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2},$$

where x and y are two n-dimensional vectors. By removal of the context sensitivity provided by the 'nearest instance' condition, attributes are treated as mutually independent, and the previous equation becomes:

$$\mathrm{Relief}_X = P(\text{different value of } X \mid \text{different class}) - P(\text{different value of } X \mid \text{same class}).$$

Relief-F (Kononenko, 1994) is an extension of Relief that can handle multiple classes and noise caused by missing values, outliers, etc. To increase the reliability of Relief's weight estimation, Relief-F finds the k nearest hits and misses for a given instance, where k is a parameter that can be specified by the user. For multiple-class problems, Relief-F searches for nearest misses from each different class (with respect to the given instance) and averages their contribution. The average is weighted by the prior probability of each class.

5 Study design

The overall aim of this study is to investigate whether a reduced set of decision rules underlying the Albatross model leads to a significant loss in predictive power. The original model consists of nine choice facets. For each of these choice facets, a set of decision rules was extracted from activity-travel diaries. To predict activity-travel patterns, these decision trees are executed sequentially in the Albatross system according to some scheduling process model. We will investigate the effect of a reduced set



of decision rules for each choice facet. First, we will build decision trees for each of the nine choice facets, by using the C4.5 algorithm (Quinlan, 1993). This approach will be called the full approach. Next, we will identify the relevant attributes for each of the nine choice facets separately, and then build the C4.5 trees incorporating only a subset of the most relevant features. This approach will be called the FS approach. The results of these two approaches will be compared at three levels of aggregation: the choice-facet level, the activity-pattern level, and the trip-matrix level.

In our first analysis (the full approach), the C4.5 trees were induced on the basis of one simple restriction: the final number of cases in a leaf node must meet a particular minimum. For eight out of the nine choice facets, this minimum was set equal to fifteen; for the very large dataset of the 'select' dimension, it was set to thirty. In both analyses we used the first 75% of the cases to build and optimize the decision trees for each choice facet, and the remaining subset of 25% was used as a validation set to compute accuracies (percentage of correctly classified instances), etc. These percentages are arbitrary but are common in validation studies (for example, see Wets et al, 2000).

In the second analysis (the FS approach), all the irrelevant attributes were first removed from the data by means of the Relief-F FS method with the k parameter set to 10. Next, the C4.5 trees were built on the basis of the same restriction and using only the remaining relevant attributes. To determine the selection of features, the following procedure was adopted. Several decision trees were built, each time with one more irrelevant attribute removed, starting with the attributes that appeared lowest in the ranking provided by the FS method. For each of these decision trees, the accuracy was calculated and compared with the accuracy of the full decision tree.
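The validation split and the stepwise removal of low-ranked attributes can be sketched in Python. The sketch assumes the selection rule used in this study, namely that the smallest tree whose validation accuracy is at most 2% below that of the full tree is retained; it further assumes that "smallest" means fewest retained features and that the 2% is an absolute difference in accuracy. The callable `accuracy_of`, the feature names, and the accuracy values are hypothetical stand-ins for inducing a C4.5 tree on the training cases and scoring it on the validation cases.

```python
def split_first_75(cases):
    """Return (training, validation): the first 75% of the cases and the remaining 25%."""
    cut = int(len(cases) * 0.75)
    return cases[:cut], cases[cut:]


def select_feature_subset(ranked_features, accuracy_of, max_drop=0.02):
    """Pick the smallest top-ranked feature subset within max_drop of the full accuracy.

    ranked_features: features ordered from most to least relevant (for example
    by Relief-F weight). accuracy_of: hypothetical callable that returns the
    validation accuracy of a tree built on the given features.
    """
    full_accuracy = accuracy_of(list(ranked_features))
    chosen = list(ranked_features)
    # Drop the lowest-ranked feature one at a time and keep the smallest
    # subset that still satisfies the accuracy constraint.
    for n_kept in range(len(ranked_features) - 1, 0, -1):
        subset = list(ranked_features[:n_kept])
        if full_accuracy - accuracy_of(subset) <= max_drop:
            chosen = subset
    return chosen


# Hypothetical ranking and validation accuracies, keyed by number of features kept.
ranking = ["distance", "parking", "train_service", "day", "gender"]
toy_accuracy = {5: 0.60, 4: 0.60, 3: 0.59, 2: 0.585, 1: 0.50}


def accuracy_of(features):
    return toy_accuracy[len(features)]


training, validation = split_first_75(list(range(100)))
print(len(training), len(validation))
print(select_feature_subset(ranking, accuracy_of))
```

With these made-up accuracies the two highest-ranked features are retained, because dropping a further feature would lower the accuracy by far more than the allowed margin.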
The smallest decision tree, which resulted in a maximum decrease of 2% in accuracy on the validation set compared with the decision tree including all features, was chosen as the final model for a single choice facet in the FS approach. This strategy was applied to all nine dimensions of the Albatross model.

To use a decision tree for prediction, a rule needs to be specified that assigns a class $Y_i$ to each case classified by the tree. Instead of just using the commonly used deterministic assignment rule of C4.5, a probabilistic assignment rule was used because this might result in a better prediction of the aggregate distributions in the activity-diary data. Each rule was assigned a probability distribution that was derived from

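[Figure 2: decision tree. The root splits on Distance (Short, Long); the Short branch splits on Parking (Bad, Good) and the Long branch on Train service (Bad, Good). Each of the four leaves carries a probability distribution over the modes Slow, Car, Car pool, and Public; in extraction order these are (0.90, 0.08, 0, 0.02), (0.71, 0.29, 0, 0), (0, 0.70, 0.30, 0), and (0, 0.40, 0.20, 0.40).]

Figure 2. Example of a decision tree.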



the frequency distribution over the different alternatives in the training set for each leaf. These corresponding probabilities will be reflected in the predicted activity schedules. For each choice facet, this set of probabilistic rules gives us the decision trees that are used in the analysis. A simplified example of a decision tree and its corresponding probability distributions are presented in figure 2.

6 The data

The analyses are based on the activity-diary data used to derive the original Albatross system. The data were collected in February 1997 for a random sample of 1649 respondents in the municipalities of Hendrik-Ido-Ambacht and Zwijndrecht (South Rotterdam Region) in the Netherlands. The activity diary asked respondents, for each successive activity, to provide information about the nature of the activity, the day, start and end times, the location where the activity took place, the transport mode (chain) and the travel time per mode, if relevant, accompanying individuals (alone, other member of household, other), and whether the activity was planned. Open time intervals were used to report the start and end times of activities. A precoded scheme was used for activity reporting. More details can be found in Arentze and Timmermans (2000). The data were cleaned by using a large set of rules incorporated in a dedicated computer program, called Sylvia (Arentze et al, 1999). These cleaned activity-travel diaries were used for the present analysis.

In this study we consider the sequential execution of nine different choices and we are looking for the most important attributes concerning each of these nine dependent variables. Table 1 is a summary of the general variables that were used for each choice facet of the model.
These include known household and person characteristics that might be relevant for the segmentation of the sample, including socioeconomic variables, such as household type, age group, child index, and socioeconomic class; information about the (normal) activity program on a weekly basis with regard to time engaged in work at the household or person level; and car availability at the household level, indicated by a ratio between the number of cars and the number of adult members, so that, for example, a single-adult household with one car is equivalent to a double-adult household with two cars.

Table 1. General characteristics used in the various choice facets of Albatross.

Label    Definition                                          Categories
DAY      Day of the week                                     1: Monday - 7: Sunday
CSEC     Socioeconomic class of the household                1: low - 4: high
CAGE     Age of the oldest person in the household           1: 64 years
CCOMP    Household type                                      1: single, no work, 2: single, work, 3: double, one work, 4: double, two work, 5: double, no work
CCHILD   Presence of children in the household               1: none, 2: 12 years
GEND     Gender of the person                                1: male, 2: female
NCAR     Ratio between number of cars and number of adults
HWORK1   Hours of official work per week of the person
HWORK    Hours of official work per week of the household

Categories for NCAR, HWORK1, and HWORK, as far as they can be recovered: 1: 38 0; 1: 1 - 32, 2: 33 - 38, 3: 39 - 60, >60



Furthermore, each dimension has its own list of more specific variables. We will not describe them here in detail. However, the relevant attributes will be discussed in the 'results' section of this paper.

7 Analysis and results

Model-performance tests were conducted at three levels: the choice-facet level, the activity-pattern level, and the trip-matrix level. At the choice-facet level, we will discuss the attributes that remained in the final decision-tree model of each of the two approaches. The expected hit ratio was also calculated for each decision tree. At the activity-pattern level, sequence-alignment methods (Joh et al, 2001b; 2001c; 2001d; 2001e) were used to assess the correspondence between the observed and predicted activity sequences. At the trip-matrix level, correlation coefficients were calculated to measure the degree of correspondence between the observed and the predicted OD matrices.

7.1 Choice-facet level

Table 2 provides the results of the analyses conducted to assess model performance at the choice-facet level. The first column of this table presents the nine choice facets of Albatross. The second column lists the number of alternatives (levels of the action variable), and the third column gives the total number of attributes that were considered in building the final decision tree. The fourth column shows the total number of leaves of the decision tree. The fifth column reports the expected number of hits, as defined in equation (1), and in the last column the expected number of hits is compared with the expected number under the null model [equation (2)]. In the present study, the null model assigns a new case to a category of the Y-variable with a probability equal to the number of observed cases in the category divided by the total number of cases in the dataset. From Arentze and Timmermans (2003), the expected number of hits or likelihood is equal to

$$e = \frac{1}{n} \sum_{k} \frac{\sum_{i} f_{ik}^{2}}{f_{k}}, \qquad (1)$$

where $e$ is the probability of correctly predicting the choice for any given case j in the sample space, $n$ is the total number of cases, $f_k$ is the number of cases at leaf node k, and $f_{ik}$ is the number of cases at leaf node k with observed choice i. By comparing $e$ with a null model, a measure of relative performance can be derived. The expected number of hits or the likelihood for this null model can be found from

$$e_0 = \frac{1}{n^{2}} \sum_{i} f_{i}^{2},$$

where $f_i$ is the overall frequency of choice i in the sample. The quotient

$$e_{\mathrm{ratio}} = \frac{e - e_0}{1 - e_0} \qquad (2)$$

then indicates the increase in likelihood as a ratio of the maximum increase that is possible given the null model. Note that this indicator is comparable to the (log-)likelihood ratio which is commonly used as a measure of goodness of fit for


561

Table 2. Model performance: choice facet level. Decision tree Full approach Mode for work Selection With whom Duration Start time Trip chain Mode other Location 1 Location 2

Number of alternatives

Number of attributes

Number of leaves

e

eratio

4 2 3 3 6 4 4 7 6

32 40 39 41 63 53 35 28 28

8 35 72 148 121 8 63 30 47

0.598 0.686 0.499 0.431 0.408 0.802 0.524 0.540 0.372

0.155 0.052 0.223 0.145 0.285 0.576 0.222 0.264 0.214

2 1 4 4 8 10 11 6 8

6 1 51 38 110 13 60 15 14

0.595 0.669 0.467 0.368 0.382 0.811 0.508 0.513 0.312

0.147 0.000 0.173 0.051 0.253 0.596 0.196 0.222 0.141

Feature-selection approach Mode for work 4 Selection 2 With whom 3 Duration 3 Start time 6 Trip chain 4 Mode other 4 Location 1 7 Location 2 6

conventional discrete choice models. This is the measure that is provided in the last column of table 2.

The results of the analysis indicate that FS generally generates considerably less complex decision trees than the full approach. One exception is the 'trip-chaining' choice facet, which has more leaves in the final tree with FS than in the tree without FS. A logical consequence of this result is that the likelihood ratio of the models with FS is generally smaller. However, a two-sample t-test comparing the likelihood ratio of the simple models with those of the more complex models yields a t-statistic of 0.5365 with a corresponding P-value of 0.599. Thus the null hypothesis stating that there is no difference between the likelihood ratio of the simpler and the more complex models cannot be rejected at a 5% significance level. This indicates that overall the simpler models do not perform significantly worse than the complex models.

Another analysis at the choice-facet level is concerned with comparing the most important attributes for the full and the FS approaches. In table 3 the four most relevant features for predicting each choice facet are described for each approach. The attributes on which the tree makes its first splits are considered to be more relevant than attributes on which splits are based further down in the partitioning process.

Table 3. Description of the most important attributes for each approach per choice facet. [The columns marking in which approach (full, feature selection) each attribute ranks among the most important are not legible in this copy.]

| Choice facet | Attribute | Variable description |
|---|---|---|
| Mode for work | RCABI | travel-time ratio between car and bicycle |
| | PTTMAX | maximum bicycle travel time across activities in the schedule of the partner during the chain of work episodes for which the transport mode is selected (in minutes) |
| | NCAR | number of cars per adult |
| | TTBIKE | objective travel time by bicycle to the location of chain of work episodes |
| Selection | YAVAIL | is selection of activities feasible given the evolving schedule and the minimum duration for the activity type? |
| | TMAX4 | maximum available time in a certain (the 4th) time interval in the fixed schedule |
| | YGROC | is there a grocery activity in the schedule? |
| | ATYPE | activity type |
| With whom | YAVAIL | is 'only others inside the household' available as a travel-party option given the household composition? |
| | ATYPE | activity type |
| | YLEISO | is there an out-of-home leisure activity in the schedule? |
| | YCAR2 | availability of car in a certain (the 2nd) time interval in the fixed schedule |
| | CCOMP | household type |
| | DAY | day of the week |
| Duration | YAVAIL3 | is the long duration class feasible given the schedule and the minimum duration for that class? |
| | AWITH | travel party |
| | YAVAIL2 | is the average duration class feasible given the schedule and the minimum duration for that class? |
| | TLEISO | total time of out-of-home leisure activities in the schedule |
| | DAY | day of the week |
| | ATYPE | activity type |
| | CSEC | socioeconomic class of the household |
| Start time | BTWO1 | is there a work activity with start time in period 1 of the schedule? |
| | TMAX2 | maximum available time in the 2nd time interval for the fixed schedule |
| | TMAX3 | maximum available time in the 3rd time interval for the fixed schedule |
| | IACT | number of the current activity type of an activity in the schedule |
| | TMAX4 | maximum available time in the 4th time interval for the fixed schedule |
| Trip chain | YCANTU | can an activity be conducted between two successive activities in the schedule given the existing space–time constraints? |
| | YCANNA | can an activity be conducted immediately after an activity in the schedule given the existing space–time constraints? |
| | YCANVO | can an activity be conducted immediately before an activity in the schedule given the existing space–time constraints? |
| | XNTIME | time available in the schedule after activity X |
| Mode for other than work trips | AWITH1 | travel party of the first activity in the concerned tour |
| | RCABI | travel-time ratio between car and bicycle |
| | GEND | gender of the person |
| | TTBIKE | shortest travel time by bicycle for the concerned tour (in minutes) |
| Location 1 | YAVAIL1 | is the option 'nearest location from home' feasible given the schedule? |
| | YAVAIL3 | is the option 'highest order location within 5 minutes extra travel time' feasible given the schedule? |
| | MODE | transport mode |
| | YAVAIL4 | is the option 'highest order location within 10 minutes extra travel time' feasible given the schedule? |
| | ATYPE | activity type |
| Location 2 | YAVAIL2 | is the option 'highest order location between 6 and 10 minutes extra travel time' feasible given the schedule? |
| | MODE | transport mode |
| | YAVAIL4 | is the option 'highest order location between 21 and 30 minutes extra travel time' feasible given the schedule? |
| | YAVAIL5 | is the option 'highest order location within more than 30 minutes extra travel time' feasible given the schedule? |
| | NOUT | number of out-of-home activities in the concerned tour |

For the 'mode for work' facet, clearly only transport characteristics are important. For the FS approach only the shortest travel time by bicycle seems to be relevant for the prediction of the transport mode for work. This might seem odd at first sight, but one has to bear in mind that the cycling facilities in the Netherlands are very good, and thus if there is the possibility of going to work by bicycle, many people will do so. In the full approach, the variable selection seems to point to the distinction between bicycle and car as transport mode. It should be noted that the dataset is quite skewed in favor of the car as the most frequently used transport mode for work. This possibly explains why variables that are concerned with public transport do not appear among the four most important variables in predicting this choice facet. The one variable that

is needed to make the splits in the FS approach (TTBIKE) does not occur in the list of the three variables needed to build the tree in the full approach; however, this variable is quite highly correlated with one variable of the full approach: r(TTBIKE, RCABI) = -0.72. This confirms our initial idea that irrelevant attributes can disturb the tree-induction process.

In the case of the 'selection' choice facet, which indicates which activity is included in the activity–travel schedule, the FS approach uses the unconditional probabilities. One reason for applying a single rule may be that the distribution of the choice variable in this dataset is very skewed: 79% of the entire dataset can be explained by this default rule. Another reason might be that C4.5 splits only on the basis of the modal class and not on the basis of the total frequency distribution over the classes of the response variable. If the latter had been the case, there would be at least a split on the activity type, because the chance of a 'yes' response varies highly across the different types of activities. In the full approach, we observe that the likelihood is not much higher, although fifteen variables were used to build the tree. So probably there is information lacking to predict this choice facet accurately. In general, variables which take activity type into consideration (in particular, grocery shopping appeared to have a high impact on schedule making in the full approach) are important for this choice facet, which seems logical. Moreover, the time component is crucial. Surprisingly, these are not the variables that are highest in the ranking of the FS method. However, we have to bear in mind that the C4.5 algorithm (in the full approach) does not take any correlation into account [for example, r(YAVAIL, ATYPE) = 0.22], implying it can select highly correlated attributes to build the tree if they increase the homogeneity of the split.

As for the 'with whom' facet, the attributes that play a role in the choice of whether the activity is performed alone or with others (a four-level variable) are more general (household) characteristics. It seems reasonable that household composition and activity type play a prominent role in building the trees. The three variables in the FS approach all reappear in the full approach, which needs nineteen features to build the complete model. Again, the variables in the two approaches do not match because of the possible high correlations between the variables in the full approach that do not occur in the FS approach and variables that are part of the FS approach [for example, r(CCOMP, YAVAIL) = 0.67].

The next choice facet is 'duration'. Strangely enough, the activity type and the travel party are the leading attributes in the FS approach, and time variables also play a role in the full approach. Four attributes are necessary to build the FS model, whereas twenty-eight are needed in the full approach. One variable that is important in the FS approach (the socioeconomic status of the household) does not appear among the variables in the full approach. Recall that the C4.5 tree does not account for correlation in the full approach, so highly correlated variables [for example, r(YAVAIL2, YAVAIL3) = 0.36] may occur together in the tree, whereas in fact they capture partially the same information.

Slightly different results were obtained for 'start time'. In both approaches time features were fundamental. The full approach needed twenty-eight variables to build the model, whereas in the FS approach only thirteen were necessary. In addition to the differences shown in the table, two relevant features in the FS approach did not come up in the full approach: the total time of work 1 including travel and the total time of work 1 and work 2, where work 1 stands for the primary work or school activity and work 2 denotes a voluntary work activity. With regard to the four most important variables, the full approach and the FS approach largely coincide. The variables included in these trees can be regarded as being robust for the prediction of the start-time dimension.

The next choice facet is 'trip chain'. Variables indicating whether there was enough time to include the activity in the corresponding place in the schedule are important in building the tree in both approaches. These attributes appear robust for predicting the trip-chain dimension. Only two additional variables in each approach were needed to build the full models. For the FS approach, these features described the number of mandatory out-of-home activities other than the work activity and whether there is a travel party available in the schedule before activity X, whereas in the full approach the two extra variables denoted whether the first activity is a grocery activity and whether there is a bring or get activity at all in the total schedule.

Features measuring the travel times by bicycle and by car, as well as the travel party and the gender, were used to predict the 'mode for other than work trips'. With regard to the transport modes chosen, we come to the same conclusions as for the 'mode for work' choice facet. The day of the week, the socioeconomic status of the household, and the total time of work 1 and work 2 in the schedule appear to be valuable features that are not incorporated in the full model induction tree. These variables are rather highly correlated with variables in the full approach [for example, r(CSEC, NCAR) = -0.51].

Almost the same four crucial variables occurred in both decision trees to predict 'location 1', and most of them were related to the feasibility of the location-selection heuristic given the schedule. Apart from these time-related variables, the activity type and the transport mode were also prominent variables. All variables that appeared to be important in the FS approach were also found in the full approach, together with four other features. These variables are reasonably robust in predicting the location.

Finally, for the 'location 2' facet, time-related variables also remained the most important ones. Six variables were necessary to model the FS approach, whereas fifteen (among which were the previous six) were needed in the full approach.

In summary, at the choice-facet level (that is, looking at each dimension separately) the two sets of decision trees do not differ dramatically in their predictive performance, because the likelihood ratios of the simple and the complex models are not significantly different. The variables selected as being most important for some choice facets also do not differ that much. However, a difference can be discerned in some trees, indicating high correlations between variables.

7.2 Activity-pattern level

The performance of the two model approaches at the activity-pattern level was assessed by comparing observed and predicted sequences of activities. Several sequence-alignment methods (SAM) were used to measure the goodness of fit. These methods measure the dissimilarity of two sequences in terms of the effort required to make them identical by using insertion, deletion, and substitution operators. Insertion and deletion operations incur the same cost of one unit, whereas substitution of an element requires twice that cost. The lower the SAM measure, the more similar the sequences are.

The mean length of the observed activity sequences is 5.16 with a standard deviation of 2.807. For the full approach the predicted mean length of the sequences is 5.286 with a standard deviation of 2.953, whereas for the FS approach the predicted mean length is 4.946 with a standard deviation of 3.041. We observe that in general the full approach predicts activity sequences that are somewhat too long, whereas those of the FS approach are a little too short. To align the predicted with the observed schedules, the full approach will therefore require more deletions, whereas the FS approach will demand more insertions. As both have the same cost, this normally would result in not too much difference between the SAM measures, unless the predicted activities deviate heavily from the observed ones.

The distances between the observed and predicted schedules for several SAM measures are provided in table 4. This table shows that the distance between the observed and the predicted schedules is in general slightly smaller for the FS approach, regardless of the specific SAM measure. The first set of four measures gives the unidimensional SAM for each activity-pattern attribute separately; the UDSAM is a weighted sum of attribute SAM values, whereby activity type was given a weight of two units and the other attributes a weight of one unit; and the MDSAM is the multidimensional SAM using the same weights. The MDSAM (Joh et al, 2001b; 2001c) differs from the UDSAM in that it takes possible correlations between choice facets into account by allowing the alignment procedure to implement joint operations. The results indicate that, although the predicted sequences are a little shorter than the observed ones in the FS approach, they deviate less from the observed activity sequences than the predicted sequences in the full approach.
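The cost scheme described above can be sketched in a few lines of Python. The code is a plain dynamic-programming alignment with unit cost for insertion and deletion and double cost for substitution; it is an illustration of that scheme only, not the position-sensitive or multidimensional methods of Joh et al. The function names and the example activity sequences are hypothetical. The second function forms the weighted sum that defines the UDSAM (weight two for activity type, one for the other attributes) and applies it to the mean distances reported in table 4.

```python
def alignment_distance(observed, predicted, indel_cost=1.0, subst_cost=2.0):
    """Minimum total cost of insertions, deletions, and substitutions
    needed to make the two sequences identical."""
    rows, cols = len(observed) + 1, len(predicted) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost          # only deletions/insertions possible
    for j in range(1, cols):
        dist[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = observed[i - 1] == predicted[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,                        # gap in predicted
                dist[i][j - 1] + indel_cost,                        # gap in observed
                dist[i - 1][j - 1] + (0.0 if same else subst_cost)  # match or substitute
            )
    return dist[-1][-1]


def weighted_sam_sum(sam_by_attribute, weights):
    """UDSAM as a weighted sum of the unidimensional SAM values."""
    return sum(weights[name] * value for name, value in sam_by_attribute.items())


# Hypothetical activity sequences.
observed = ["home", "work", "shop", "home"]
too_short = ["home", "work", "home"]             # one insertion needed
wrong_type = ["home", "work", "leisure", "home"] # one substitution needed

d_short = alignment_distance(observed, too_short)
d_wrong = alignment_distance(observed, wrong_type)

# Mean unidimensional distances from table 4.
weights = {"activity type": 2, "with": 1, "location": 1, "mode": 1}
sam_full = {"activity type": 2.929, "with": 3.205, "location": 3.188, "mode": 4.706}
sam_fs = {"activity type": 2.862, "with": 3.112, "location": 3.034, "mode": 4.559}

udsam_full = weighted_sam_sum(sam_full, weights)
udsam_fs = weighted_sam_sum(sam_fs, weights)
print(d_short, d_wrong, round(udsam_full, 3), round(udsam_fs, 3))
```

Under this cost scheme a substitution costs the same as a deletion followed by an insertion, so a predicted schedule that is one activity short is penalized less than one of the right length with one wrong activity. The weighted sums reproduce the UDSAM row of table 4 for the full approach (16.957) and, up to rounding of the inputs, for the FS approach (16.429 against the reported 16.430).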

Table 4. Model performance: activity-pattern level.

| Measure (a) | Mean distance, full approach | Mean distance, feature-selection approach |
|---|---|---|
| SAM (activity type) | 2.929 | 2.862 |
| SAM (with) | 3.205 | 3.112 |
| SAM (location) | 3.188 | 3.034 |
| SAM (mode) | 4.706 | 4.559 |
| UDSAM | 16.957 | 16.430 |
| MDSAM | 8.558 | 8.257 |

(a) SAM: unidimensional sequence-alignment measure; UDSAM: weighted sum of attribute SAM values; MDSAM: multidimensional SAM.

7.3 Trip-matrix level

At the trip-matrix level, the observed and predicted OD matrices were compared. An OD matrix contains the frequency of trips for each combination of origins (rows) and destinations (columns). Correlations were calculated between observed and predicted matrix entries in general and for trip matrices that are disaggregated each time in a different way based on some selected trip facets. The facets considered include transport mode, day of the week, and activity (purpose). In all cases, the cells of the OD matrices are rearranged into a single vector across categories to calculate the correlation coefficient. The results are listed in table 5.

Table 5 indicates that all correlation coefficients are similar. In general, as well as in the case of transport mode, the correlation coefficient is slightly lower for the FS approach, although the difference never exceeds 0.01. For OD matrices disaggregated by day and by primary activity, the FS approach even performs slightly better than the full approach.

Table 5. Model performance: trip-matrix level.

| Matrix | r (observed, predicted), full approach | r (observed, predicted), feature-selection approach |
|---|---|---|
| None | 0.962 | 0.961 |
| Mode | 0.885 | 0.876 |
| Day | 0.959 | 0.960 |
| Primary activity | 0.899 | 0.903 |
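The trip-matrix comparison can be sketched as follows. The Python code rearranges the cells of one or more OD matrices into a single vector and computes the Pearson correlation between the observed and predicted vectors, once for matrices disaggregated by transport mode and once for the aggregate matrix. All zone counts, the two modes, and the function names are hypothetical and serve only to show the mechanics; they are not data from the study.

```python
from math import sqrt


def flatten(matrices):
    """Rearrange the cells of one or more OD matrices into a single vector."""
    return [cell for matrix in matrices for row in matrix for cell in row]


def aggregate(matrices):
    """Sum several equally sized OD matrices cell by cell."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*matrices)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical trip counts between three zones (rows: origins, columns: destinations).
modes = ["car", "bike"]
observed = {
    "car": [[0, 12, 5], [10, 0, 7], [4, 6, 0]],
    "bike": [[0, 3, 1], [2, 0, 4], [1, 5, 0]],
}
predicted = {
    "car": [[0, 11, 6], [9, 0, 8], [5, 5, 0]],
    "bike": [[0, 4, 1], [2, 0, 3], [1, 6, 0]],
}

# Disaggregated by mode: one long vector across the mode-specific matrices.
r_by_mode = pearson_r(flatten([observed[m] for m in modes]),
                      flatten([predicted[m] for m in modes]))

# No disaggregation: correlate the cells of the aggregate matrices.
r_total = pearson_r(flatten([aggregate([observed[m] for m in modes])]),
                    flatten([aggregate([predicted[m] for m in modes])]))

print(round(r_by_mode, 3), round(r_total, 3))
```

In this toy example the aggregate correlation is higher than the mode-specific one because prediction errors in the two modes partly cancel when the matrices are summed; table 5 shows the same ordering between the 'None' and 'Mode' rows.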

8 Conclusions and discussion

Recently, computational process models of activity–travel behavior have been suggested in the literature. These models rely on a set of decision rules that are derived from activity–travel-diary data. The model is not developed using principles of full and perfect information and utility-maximizing behavior, but rather tries to find a set of conditions under which individuals with particular sets of characteristics exhibit similar activity–travel choice behavior. Decision-tree-induction algorithms are used to create classes that are homogeneous according to some information measure. Unfortunately, the results of the tree-induction algorithms can be heavily influenced by the inclusion of irrelevant attributes. On the one hand, this may lead to overfitting, but on the other hand, it is not evident whether the inclusion of irrelevant attributes would lead to a substantial loss of accuracy and/or predictive performance.

The aim of the study reported in this paper therefore was to explore this issue further in the context of the Albatross model system, currently the most comprehensive operational, computational, rule-based process model of travel demand. The results of the analyses conducted at the three different levels of performance indicate that the simpler models do not necessarily perform worse. In fact, more or less the same results were obtained at the activity-pattern level and at the trip-matrix level. At the choice-facet level, one can observe that a strong reduction in the size of the trees as well as in the number of predictors is possible without adversely affecting predictive performance too much. Thus, at least in this study, there is no evidence of substantial loss in predictive power in the sequential use of decision trees to predict activity–travel patterns. The results indicate that using feature selection in a step prior to tree induction can improve the performance of the resulting model.

It should be noted, however, that predictive performance and simplicity are not the only criteria. The most important criterion is that the model needs to be responsive to policy-sensitive attributes, and for that reason policy-sensitive attributes (such as service level of the transport system) should have a high priority in the selection of attributes if the model is to be used for predicting the impact of policies. The feature-selection method allows one to identify and then eliminate correlated factors that prevent the selection of the attributes of interest during the construction of the tree, so that the resulting model will be more robust to policy measures. Similarly, the results of a trimmed decision tree should be assessed in terms of behavioral mechanisms. On the one hand, if one has strong theoretical reasons for including particular conditions, they should be kept in the decision tree. On the other hand, feature selection also allows one to test assumptions of homogeneity in underlying behavioral mechanisms.
References

Arentze T A, Timmermans H J P, 2000 Albatross: A Learning-based Transportation Oriented Simulation System EIRASS, Eindhoven University of Technology, Eindhoven

Arentze T A, Timmermans H J P, 2002 Albatross 2.0 EIRASS, Eindhoven University of Technology, Eindhoven

Arentze T A, Timmermans H J P, 2003, "Measuring the goodness-of-fit of decision-tree models of discrete and continuous activity–travel choice: methods and empirical illustration" Journal of Geographical Systems 4 1–22

Arentze T A, Timmermans H J P, 2004, "A learning-based transportation oriented simulation system" Transportation Research B 38 613–633

Arentze T A, Hofman F, Kalfs N, Timmermans H J P, 1999, "(Sylvia) system for logical verification and inference of activity diaries" Transportation Research Record number 1660, 156–163

Arentze T, Hofman F, van Mourik H, Timmermans H, Wets G, 2000, "Using decision tree induction systems for modeling space–time behavior" Geographical Analysis 32 330–350

Axhausen K, Gärling T, 1992, "Activity-based approaches to travel analysis: conceptual frameworks, models and research problems" Transport Reviews 12 324–331

Bhat C R, 1996, "A generalized multiple durations proportional hazard model with an application to activity behavior during the work-to-home commute" Transportation Research 30B 465–480

Bhat C R, 1999, "A comprehensive and operational analysis framework for generating the daily activity travel profiles of workers", in Electronic Proceedings of the 78th Annual Meeting of the Transportation Research Board, Washington, DC Transportation Research Board, Washington, DC

Bhat C R, Koppelman F, 1999, "A retrospective and prospective survey of time use research" Transportation 26 119–129

Borgers A W J, Timmermans H J P, van der Waerden P J H J, 2002, "Patricia: predicting activity-travel interdependencies using a suite of choice-based, inter-linked analyses" Transportation Research Record number 1807, 145–153

Bowman J L, 1998 The Day Activity Schedule Approach to Travel Demand Analysis PhD dissertation, Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA

Gärling T, Brännäs K, Garvill J, Golledge R G, Gopa S, Holm E, Lindberg E, 1989, "Household activity scheduling", in Transport Policy Management and Technology Towards 2001: Selected Proceedings of the Fifth World Conference on Transport Research, Volume 4 (Western Periodicals, Ventura, CA) pp 235–248

Gärling T, Laitila T, Westin K, 1998, "Theoretical foundations of travel choice modeling: introduction", in Theoretical Foundations of Travel Choice Modeling Eds T Gärling, T Laitila, K Westin (Elsevier, Oxford) pp 1–32

Gliebe J P, Koppelman F S, 2002, "A model of joint activity participation between household members" Transportation 25 49–74

Hall M A, 1999a Correlation-based Feature Selection for Machine Learning PhD dissertation, Department of Computer Science, University of Waikato, Hamilton, New Zealand

Hall M A, 1999b, "Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper", in Proceedings of the Florida Artificial Intelligence Symposium (FLAIRS) (AAAI Press, Orlando, FL) pp 235–239

Joh C-H, Arentze T A, Timmermans H J P, 2001a, "Towards a theory and model of activity-travel rescheduling behavior", in Proceedings of the 9th World Conference on Transportation Research Seoul

Joh C-H, Arentze T A, Hofman F, Timmermans H J P, 2001b, "Activity pattern similarity: a multidimensional sequence alignment method" Transportation Research B 36 385–403

Joh C-H, Arentze T A, Timmermans H J P, 2001c, "Multidimensional sequence alignment methods for activity-travel pattern analysis: a comparison of dynamic programming and genetic algorithms" Geographical Analysis 33 247–270

Joh C-H, Arentze T A, Timmermans H J P, 2001d, "A position-sensitive sequence-alignment method illustrated for space–time activity-diary data" Environment and Planning A 33 313–338

Joh C-H, Arentze T A, Timmermans H J P, 2001e, "Pattern recognition in complex activity-travel patterns: a comparison of Euclidean distance, signal processing theoretical and multidimensional sequence alignment methods" Transportation Research Record number 1752, 16–22

Kira K, Rendell L A, 1992, "A practical approach to feature selection", in Proceedings of the Ninth International Conference on Machine Learning Aberdeen, Scotland, UK (Morgan Kaufmann, San Mateo, CA) pp 249–256

Kitamura R, Fujii S, 1998, "Two computational process models of activity-travel choice", in Theoretical Foundations of Travel Choice Modeling Eds T Gärling, T Laitila, K Westin (Elsevier, Oxford) pp 251–279

Koller D, Sahami M, 1996, "Toward optimal feature selection", in Proceedings of the 13th International Conference on Machine Learning, Bari, Italy Ed. L Saitta (Morgan Kaufmann, San Francisco, CA) pp 284–292

Kononenko I, 1994, "Estimating attributes: analysis and extensions of relief", in Proceedings of the 7th European Conference on Machine Learning, Catania, Italy (Springer, Berlin) pp 171–182

Moons E, Wets G, Aerts M, Vanhoof K, Arentze T A, Timmermans H J P, 2001, "The impact of irrelevant attributes on the performance of classifier systems in generating activity schedules", in Electronic Proceedings of the 81st Annual Meeting of the Transportation Research Board Transportation Research Board, Washington, DC

Pendyala R M, Kitamura R, Reddy D V G P, 1995, "A rule-based activity-travel scheduling algorithm integrating neural networks of behavioral adaptation", paper presented at the EIRASS Conference on Activity-Based Approaches, Eindhoven; copy available from Professor Pendyala, Department of Civil and Environmental Engineering, University of South Florida, Tampa, FL

Quinlan J R, 1993 C4.5: Programs for Machine Learning (Morgan Kaufmann, San Mateo, CA)

Timmermans H J P, Arentze T A, Joh C-H, 2002, "Analyzing space–time behavior: new approaches to old problems" Progress in Human Geography 26 175–190

Wets G, Vanhoof K, Arentze T A, Timmermans H J P, 2000, "Identifying decision structures underlying activity patterns: an exploration of data mining algorithms" Transportation Research Record number 1718, 1–9

Zhang J, Timmermans H J P, Borgers A W J, 2002, "A utility-maximizing model of time use incorporating group decisions mechanisms", in Electronic Proceedings of the 82nd Annual Meeting of the Transportation Research Board, Washington, DC Transportation Research Board, Washington, DC

© 2005 a Pion publication printed in Great Britain
