Calibrating Unsupervised Machine Learning Algorithms for the Prediction of Activity-Travel Patterns
Davy Janssens, Transportation Research Institute, Faculty of Applied Economic Sciences, Hasselt University
Email: [email protected]
Talk GRT, FUNDP, 18/12/2006
Why transportation modelling?
• The transport sector has several negative effects: social, economic and environmental
• Transport planning policies aim to reduce these effects by means of TDM (travel demand management)
– Alter behaviour without necessarily embarking on large-scale infrastructure expansion
– Examples: spreading peak-period travelling, congestion charging, etc.
• Transport modelling helps to analyse and understand the impact of policy decisions before they are implemented (prediction)
Travel Demand Nature
• Increasing modelling complexity: Trip-Based → Tour-Based → Activity-Based
Travel Demand Nature: Activity-based approach
– Travel demand is derived from the activities that individuals need/wish to perform
– Household and other social structures influence travel and activity behaviour
– Spatial, temporal, transportation and interpersonal interdependencies constrain activity/travel behaviour
→ Aim to predict which activities are conducted where, when, for how long, with whom, the transport mode involved and ideally also the implied route decision
Extracting Knowledge (Improve AB-models): Machine Learning

| Important characteristic | Task | Description | Technique | Example |
|---|---|---|---|---|
| Supervised (predictive) ML | Regression | Predicting an (often) continuous variable | Linear regression, NN, SVM | Predicting sales amount |
| Supervised (predictive) ML | Classification | Predicting a (categorical) dependent variable | DT (CHAID, C4.5), NN, rule induction | Predicting transport mode choice |
| Unsupervised (descriptive) ML | Association analysis | Identify relationships between items | Association rules | Identify frequently bought products |
| Unsupervised (descriptive) ML | Dependence analysis | Identify dependencies between items/variables | Bayesian networks, graphical methods | Identify dependencies between demographic variables |
| Unsupervised (descriptive) ML | Sequence analysis | Identify relationships between items over time | Sequential AR, Markov chains | Identify the time sequence of purchases |
| Unsupervised (descriptive) ML | Clustering | Identify homogeneous subpopulations | K-means clustering | Market segmentation |
| Reinforcement learning | Multiple | Learning through experience/interaction | Q-learning, temporal difference methods | Grid world, elevator dispatching |
Contributions of my work
The starting framework: Albatross
Albatross: The scheduling model (Arentze and Timmermans, 2000)
Aim: determine the schedule (= agenda) of activity-travel behaviour
Components:
1. a model of the sequential decision-making process [a-priori defined]
2. models to compute dynamic constraints on choice options [a-priori defined]
3. a set of decision trees representing choice behaviour of individuals related to each step in the process model [derived from observed choice behaviour]
The skeleton refers to the fixed and given part of the schedule; flexible activities are optional activities added to the skeleton
Albatross: (Arentze and Timmermans, 2000)
Each oval represents a DT
Part 1: Classification based on Association Rules in Albatross
Introduction: Classification and association rule mining: the difference
• Association rule mining
– Finding frequent patterns among sets of items in transaction databases
– Frequent pattern: determined by means of minimum support and minimum confidence criteria
– Most popular: Apriori
• Classification rule mining
– Organize and categorize (unseen) data into distinct classes
– Prediction (decision trees, classification rules, neural networks)
– Most popular: C4.5 (Quinlan, 1993)
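As a minimal illustration of what mining CARs means, the sketch below enumerates short antecedents exhaustively and keeps those that pass support and confidence thresholds. This is a toy stand-in, not the Apriori-based CBA-RG itself; the dataset and attribute names are made up.

```python
from itertools import combinations

def mine_cars(rows, class_attr, min_support=0.4, min_confidence=1.0):
    """Mine Class Association Rules (CARs): itemset -> class label.
    rows: list of dicts that all include the class attribute."""
    n = len(rows)
    items = sorted({(k, v) for r in rows for k, v in r.items() if k != class_attr})
    cars = []
    for size in (1, 2):                              # antecedents of length 1 and 2
        for combo in combinations(items, size):
            if len({k for k, _ in combo}) < size:    # one value per attribute
                continue
            covered = [r for r in rows if all(r[k] == v for k, v in combo)]
            if not covered:
                continue
            for label in {r[class_attr] for r in covered}:
                hits = sum(1 for r in covered if r[class_attr] == label)
                support, confidence = hits / n, hits / len(covered)
                if support >= min_support and confidence >= min_confidence:
                    cars.append((combo, label, support, confidence))
    return cars
```

The key point is the restriction of the rule's right-hand side to the class attribute, which is what distinguishes CARs from general association rules.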
Classification and association rule mining: how can they be integrated?
• Why
– Association rules search globally → full set of rules → all potentially important associations for making predictions are incorporated
– Results of previous research efforts to integrate them are encouraging
• How
– Focus on a special subset of association rules, namely those whose right-hand side is restricted to the class attribute. These are referred to as CARs: Class Association Rules
Application 1: CBA (originally from Liu et al., 1998; tested in Albatross in Janssens et al., 2005d)
• CBA consists of 2 parts:
– A rule generator (CBA-RG), based on Apriori for finding association rules. This part of the algorithm generates all CARs
– A classifier builder (CBA-CB), based on the generated CARs
An example
CBA-RG
• Generate all frequent itemsets: 11158!
• Generate and prune CARs: 34 (using default minimum support and confidence values in the original CBA program)
An example
CBA-CB
• Sort the generated rules (by confidence)
• Insert into the classifier and delete the cases covered by rule R
Rule 1: Var1 = 1 → class = 1 (100% conf)
→ Assign default class: 0
→ Compute total errors: 4
An example: CBA-CB
• Insert into the classifier and delete the cases covered by rule R
Rule 1: Var1 = 1 → class = 1 (100% conf) (default: 0, 4 errors)
Rule 10: Var5 = 2 → class = 1 (100% conf) (default: 0, 3 errors)
Rule 27: Var6 = 2 and Var3 = 2 → class = 1 (100% conf) → no case correctly classified → not added to the classifier
Rule 16: Var4 = 1 and Var1 = 2 → class = 0 (100% conf) (default: 1, 2 errors)
Etc.
An example: CBA-CB
• Errors will decline; discard those rules that do not improve the accuracy of the classifier
• Stop when no cases are left or when all rules are used
Final Classifier: - Error rate on training set: 5% - Error rate by 10-fold cross validation: 40% - Number of rules on training dataset: 4 (Rule 1; Rule 10; Rule 16; Rule 35; Default Class=1)
- Average number of rules by 10-fold cross validation: 3.7
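The greedy build-and-prune procedure walked through above can be sketched as follows. This is a reading of the slides' logic, not the original CBA implementation; the `class` key and the predicate representation of rule antecedents are assumptions.

```python
from collections import Counter

def build_classifier(rules, cases):
    """CBA-CB-style greedy builder (sketch).
    rules: list of (antecedent_predicate, label, confidence) triples."""
    ordered = sorted(rules, key=lambda r: -r[2])        # sort by confidence
    remaining, snapshots, rule_errors = list(cases), [], 0
    for antecedent, label, conf in ordered:
        if not remaining:
            break
        covered = [c for c in remaining if antecedent(c)]
        if not any(c["class"] == label for c in covered):
            continue                                    # no case correctly classified
        rule_errors += sum(1 for c in covered if c["class"] != label)
        remaining = [c for c in remaining if not antecedent(c)]
        counts = Counter(c["class"] for c in remaining)
        default = counts.most_common(1)[0][0] if counts else label
        # snapshot: (rule, current default class, total errors so far)
        snapshots.append(((antecedent, label), default,
                          rule_errors + len(remaining) - counts.get(default, 0)))
    if not snapshots:
        return [], Counter(c["class"] for c in cases).most_common(1)[0][0]
    best = min(range(len(snapshots)), key=lambda i: snapshots[i][2])
    return [s[0] for s in snapshots[:best + 1]], snapshots[best][1]

def predict(classifier, default, case):
    for antecedent, label in classifier:
        if antecedent(case):
            return label
    return default
```

Truncating the rule list at the snapshot with the lowest total error count is what produces the small final classifiers shown in the example.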
• Results in Albatross follow…
Developing an adapted CBA algorithm (see Janssens et al., 2005b, 2005d and Lan et al., 2006)
• Advancing the state of the art: use other rule-sorting criteria than CBA and build other classifiers
• CBA-1: sort rules by intensity of implication (Application 2)
• CBA-2: sort rules by a dilated chi-square measure (Application 3)
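As an illustration of what an alternative sorting criterion looks like, the sketch below computes a plain chi-square statistic for a rule's 2x2 contingency table. The intensity-of-implication and dilated chi-square measures actually used in CBA-1/CBA-2 are defined in the cited papers; this is only an illustrative stand-in.

```python
def chi_square(n, n_a, n_c, n_ac):
    """Chi-square statistic for rule A -> C from a 2x2 contingency table.
    n: total cases; n_a: cases matching the antecedent; n_c: cases in the
    class; n_ac: cases matching both. Higher values indicate stronger
    dependence between antecedent and class."""
    table = [
        (n_ac,       n_a - n_ac),            # antecedent holds
        (n_c - n_ac, n - n_a - n_c + n_ac),  # antecedent does not hold
    ]
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in (0, 1)]
    chi = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = row_tot[i] * col_tot[j] / n
            if expected:
                chi += (table[i][j] - expected) ** 2 / expected
    return chi
```

Rules with identical confidence can then be ranked by such a statistic instead of by confidence alone, which is the essence of the adapted sorting step.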
Example
• The approach was tested on an example. Outcome:
| | Original CBA | Adapted CBA |
|---|---|---|
| Error rate on training set | 5% | 10% |
| Error rate by 10-fold cross-validation | 40% | 20% |
| Number of rules on training dataset | 4 | 2 |
| Average number of rules by 10-fold cross-validation | 3.7 | 1.8 |
→ This example already shows that the sorting of the rules may indeed have an important impact on accuracy
→ A smaller classifier that performs better under cross-validation
→ Less overfitting? Better generalisation?
Results CBA, CBA-1, CBA-2 (Choice Facet Level)

| Dataset | CBA-1 Train (%) | CBA-1 Test (%) | CBA-1 rules | CBA-2 Train (%) | CBA-2 Test (%) | CBA-2 rules | CBA Train (%) | CBA Test (%) | CBA rules | CHAID Train (%) | CHAID Test (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Duration | 40.7 | 40.9 | 17 | 43.0 | 41.2 | 8 | 44.7 | 39.2 | 147 | 41.3 | 38.8 |
| Location1 | 64.5 | 68.1 | 25 | 59.6 | 60.8 | 3 | 66.3 | 62.7 | 234 | 57.5 | 58.9 |
| Location2 | 26.8 | 26.3 | 1 | 48.3 | 38.0 | 55 | 52.6 | 41.1 | 136 | 35.4 | 32.6 |
| Mode work | 74.7 | 76.8 | 38 | 75.6 | 74.4 | 12 | 83.5 | 73.7 | 172 | 64.8 | 66.7 |
| Mode other | 54.9 | 54.8 | 5 | 68.0 | 60.5 | 259 | 66.5 | 60.9 | 245 | 52.8 | 49.5 |
| Selection | 79.1 | 79.2 | 1 | 57.5 | 57.4 | 1 | 79.6 | 78.7 | 594 | 72.4 | 71.6 |
| Start time | 33.3 | 33.0 | 69 | 37.6 | 36.2 | 102 | 34.5 | 33.7 | 120 | 39.8 | 35.4 |
| Trip chain | 82.7 | 82.0 | 21 | 84.2 | 83.4 | 3 | 83.9 | 80.4 | 65 | 83.3 | 80.9 |
| With whom | 54.7 | 48.1 | 24 | 55.9 | 51.1 | 51 | 61.1 | 56.2 | 222 | 50.9 | 48.4 |
| Average accuracy | 56.8 | 56.6 | / | 58.9 | 55.9 | / | 63.6 | 58.5 | / | 55.4 | 53.6 |
| Average number of rules | / | / | 22.3 | / | / | 54.9 | / | / | 215 | / | / |

(CBA-1 = Adapted CBA-1, CBA-2 = Adapted CBA-2, CBA = Original CBA)
Part 2: Bayesian Networks
Application 1: Bayesian Networks (BN) (see also Janssens et al. 2005a)
• CARs: measure co-occurrence relationships between variables
• BN: not only capable of measuring, but also used for modelling and reasoning. Characteristics:
– Able to capture (complex) relationships between variables
– Able to be learned from data
– Visualize interdependencies between variables
– Prior and posterior probability distributions per variable
– Well suited to conduct what-if scenarios and sensitivity analysis
– White box
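The prior/posterior reasoning a BN supports can be illustrated with a minimal two-node example (the mode/distance numbers below are hypothetical, not taken from the study):

```python
def posterior(prior, likelihood, evidence):
    """Bayes update for a two-node network X -> E.
    prior: P(X); likelihood: P(E = e | X = x) per x; evidence: observed e."""
    unnorm = {x: prior[x] * likelihood[x][evidence] for x in prior}
    z = sum(unnorm.values())                 # normalising constant P(E = e)
    return {x: p / z for x, p in unnorm.items()}

# Hypothetical example: prior mode choice, updated after observing a long trip.
prior = {"car": 0.6, "bike": 0.4}
likelihood = {"car": {"long": 0.7, "short": 0.3},
              "bike": {"long": 0.2, "short": 0.8}}
post = posterior(prior, likelihood, "long")   # P(car | long) = 0.84
```

Changing the evidence and re-running the update is exactly the kind of what-if query that makes BNs attractive for sensitivity analysis.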
BN example: pruning a network (before → after)
Some empirical findings (conclusions)
• Conclusions:
– Better predictions. Reason: unlike decision trees (CHAID), variables are selected simultaneously, with no hierarchy of importance among the selected variables
– Selection of the variables is roughly the same in both approaches (→ the difference in performance is due more to their different nature than to additional insights)
– Much larger number of decision rules in Albatross compared with CHAID; however, performance is also OK on the test data
– Interpretation is an issue: BN link several variables in sometimes complex direct and indirect ways
Application 2: BNT (see also Janssens et al., 2005c)
• Advancing the state of the art: BNT (BN augmented Tree)
• Idea:
– Integrate BN with decision trees
– Use the information that is stored in the BN for building the DT
• A recursive algorithm has been developed which uses a mutual information criterion, derived from the BN, to build a decision tree
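A sketch of the splitting criterion: mutual information between a candidate variable and the class. For simplicity the sketch estimates it directly from data, whereas the actual BNT algorithm derives it from the learned BN.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """I(X; Y) in bits, estimated from observed (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def best_split(rows, class_attr):
    """Pick the attribute sharing the most information with the class."""
    attrs = [a for a in rows[0] if a != class_attr]
    return max(attrs, key=lambda a: mutual_information(
        [(r[a], r[class_attr]) for r in rows]))
```

Applying `best_split` recursively to the subsets induced by each split value yields a decision tree, which is the shape of the recursive algorithm described above.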
Results BN, BNT (Choice Facet Level)

| Dataset | BN Train (%) | BN Test (%) | BNT Train (%) | BNT Test (%) |
|---|---|---|---|---|
| Duration | 40.9 | 40.5 | 41.0 | 40.2 |
| Location1 | 69.6 | 67.9 | 69.4 | 68.5 |
| Location2 | 47.3 | 42.0 | 47.3 | 41.9 |
| Mode work | 76.9 | 77.9 | 77.0 | 78.3 |
| Mode other | 58.3 | 52.1 | 58.3 | 52.1 |
| Selection | 79.1 | 79.2 | 79.1 | 79.2 |
| Start time | 47.7 | 38.0 | 42.3 | 39.3 |
| Trip chain | 83.1 | 82.3 | 83.1 | 82.5 |
| With whom | 57.7 | 53.4 | 57.7 | 53.5 |
| Average | 62.3 | 59.3 | 61.7 | 59.5 |
Results BN, BNT (Complexity)

| Dataset | Individual rule complexity: BN (number of variables) | Individual rule complexity: BNT (depth of DT) | Model complexity: BN (total number of rules) | Model complexity: BNT (leaves / nodes) |
|---|---|---|---|---|
| Duration | 4 | 2 | 84 | 6/4 |
| Location1 | 9 | 7 | 5760 | 253/215 |
| Location2 | 10 | 6 | 9216 | 131/124 |
| Mode work | 8 | 5 | 36864 | 187/64 |
| Mode other | 5 | 4 | 432 | 108/45 |
| Start time | 11 | 5 | 983040 | 210/58 |
| Trip chain | 11 | 6 | 124416 | 384/175 |
| With whom | 4 | 3 | 280 | 70/16 |
Part 3: Development of a Markov Chain simulation framework; including a segmentation scheme
• In our approach: – Activity-Travel combinations that sequentially occur together (e.g. Have breakfast – Car – Work) – Almost completely data-driven (as opposed to domain knowledge) – Use Markov Chains
• Research Motivation: – Most previous research does not explicitly capture sequential information – Comparison with activity-scheduling models is lacking
Contributions (see also Janssens et al. 2005a, 2005e, 2007) Transition Matrices
Contributions: 1) Modification of the computation of transition probabilities
• 2 alternative approaches were implemented
• Shown that the type of calculation matters
• More efficient calculation
2) Segmentation procedure • Time information (bifurcation points) (relaxation of stationarity assumption) • Socio-demographic information (modified decision tree: use transition matrices for splitting the tree instead of class attributes)
…
3) Simulation procedure (Recursive application of information stored in transition matrices)
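The fit-and-simulate loop can be sketched as follows. This is a minimal version assuming a single transition matrix of fixed order; the study's two alternative computation approaches and the segmentation scheme are not reproduced here.

```python
import random
from collections import Counter, defaultdict

def fit_transitions(sequences, order=1):
    """Estimate transition probabilities P(next | last `order` states)
    from observed activity-travel sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[tuple(seq[i - order:i])][seq[i]] += 1
    return {state: {nxt: c / sum(cnt.values()) for nxt, c in cnt.items()}
            for state, cnt in counts.items()}

def simulate(matrix, start, length, order=1, rng=random):
    """Recursively draw states from the fitted transition matrix."""
    seq = list(start)
    while len(seq) < length:
        dist = matrix.get(tuple(seq[-order:]))
        if dist is None:
            break                       # unseen context: stop the chain
        states, probs = zip(*dist.items())
        seq.append(rng.choices(states, weights=probs)[0])
    return seq
```

Raising `order` trades data requirements against sequential fidelity, which is the trade-off visible in the simulation results that follow.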
Simulation Model: Results (observed tours: 2.435)

| Order of transition matrix | Without segmentation: predicted tours (Approach 1) | Without segmentation: predicted tours (Approach 2) | With segmentation: predicted tours (Approach 1) | With segmentation: predicted tours (Approach 2) |
|---|---|---|---|---|
| l = 1 | 1.321 | 1.621 | 2.215 | 2.232 |
| l = 2 | 1.745 | 1.954 | 2.314 | 2.319 |
| l = 3 | 2.148 | 2.128 | 2.112 | 2.141 |
| l = 4 | 2.312 | 2.316 | 1.543 | 1.510 |
| l = 5 | 2.414 | 2.424 | 1.541 | 1.432 |
| l = 6 | 2.721 | 2.621 | / | / |
| l = 7 | 2.732 | 2.730 | / | / |
| l = 8 | 2.012 | 2.003 | / | / |
| l = 9 | 1.521 | 1.621 | / | / |
| l = 10 | 0.927 | 1.222 | / | / |
Part 4: Reinforcement Learning
Allocating Time and Location Information: Reinforcement Learning
• Sequences of activities and travel were generated by iterative application of low- and high-order transition probability matrices
• However: time and location information is still missing in the generated patterns
• Solution: interaction with the environment, using a reinforcement learning technique (Q-learning)
– The decision (action) is based on:
• Current activity
• Start time of current activity
• Duration of current activity
• Duration of transfer between two locations
• Sequence of activities and transport modes
Reinforcement Learning: The Decision Agent
• The monkey wants to learn how to get to the banana without getting burned
• Define a set of states: positions in the grid world
• Define a set of actions: left, right, up, down
• Learning by doing:
– Observe state
– Select action
– Execute action
– Receive reward
– Memorise state-action pair and its utility
– Repeat
→ Learn an optimal sequence of actions
Reinforcement Learning • Assign rewards to each state – Time reward table: Based on observed frequency information – Location reward: Based on travel time information between locations: simultaneously maximise both time and location utility!
• Learn by Moving Around
Reinforcement Learning • Wherever the monkey now is, it knows how to get safely to the banana
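The learn-by-doing loop above can be sketched as tabular Q-learning on a 1-D grid world. The reward values, hyperparameters and grid are illustrative, not those of the study.

```python
import random

def q_learn(n_states, goal, episodes=500, alpha=0.5, gamma=0.9,
            epsilon=0.1, seed=0):
    """Tabular Q-learning on a 1-D grid world; actions: -1 (left), +1 (right)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in (-1, 1)}
    for _ in range(episodes):
        s = rng.randrange(n_states)
        while s != goal:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), n_states - 1)     # move, clipped at walls
            reward = 1.0 if s2 == goal else -0.1      # small cost per step
            # standard Q-learning update
            q[(s, a)] += alpha * (reward + gamma * max(q[(s2, -1)], q[(s2, 1)])
                                  - q[(s, a)])
            s = s2
    return q

def greedy_path(q, start, goal, n_states, limit=50):
    """Follow the learned greedy policy from any start state."""
    s, path = start, [start]
    while s != goal and len(path) < limit:
        a = max((-1, 1), key=lambda act: q[(s, act)])
        s = min(max(s + a, 0), n_states - 1)
        path.append(s)
    return path
```

After training, the greedy policy reaches the goal from every state, which is the "wherever the monkey now is" property.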
• Same as in our application: given the activity-travel sequence, determine the optimal time and location information
Home (0AM-6AM, location A) - Car - Work (6.15AM-12AM, location B) - Car - Shop (12.20PM-6PM, location C) - Walk - Leisure (6.20PM-12PM, location C)
Part 5: Final comparison of approaches that have been developed
Final Comparisons: Full simulation model vs Activity-scheduling
• SAM distance measures (full patterns)

| SAM distance measure | Albatross (CBA) | Albatross (BNT) | Albatross (C4.5) | Simulation model |
|---|---|---|---|---|
| SAM activity-type | 2.710 | 2.712 | 2.719 | 3.113 |
| SAM location | 3.111 | 3.101 | 3.109 | 5.135 |
| SAM mode | 4.515 | 4.419 | 4.439 | 4.975 |
| UDSAM | 16.319 | 16.313 | 16.328 | 17.551 |
• SAM distance measures (fixed elements removed)

| SAM distance measure | Albatross (CBA) | Albatross (BNT) | Albatross (C4.5) | Simulation model |
|---|---|---|---|---|
| SAM activity-type | 2.513 | 2.515 | 2.519 | 2.719 |
| SAM location | 2.682 | 2.685 | 2.690 | 4.832 |
| SAM mode | 2.625 | 2.622 | 2.629 | 2.929 |
| UDSAM | 11.356 | 11.352 | 11.362 | 11.921 |
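The SAM measures above are alignment (edit) distances between observed and predicted sequences. A minimal unidimensional sketch is shown below; the study's UDSAM combines several dimensions (activity type, location, mode), which this single-attribute version does not.

```python
def sam_distance(a, b, ins=1, dele=1, sub=1):
    """Levenshtein-style alignment distance between two activity sequences:
    the minimum total cost of insertions, deletions and substitutions
    needed to turn sequence a into sequence b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele                 # delete everything from a
    for j in range(1, n + 1):
        d[0][j] = j * ins                  # insert everything from b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,       # deletion
                          d[i][j - 1] + ins,        # insertion
                          d[i - 1][j - 1] + cost)   # match/substitution
    return d[m][n]
```

A lower distance means the predicted pattern reproduces the observed one more faithfully, which is how the tables above should be read.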
Final Comparisons (Activity-scheduling vs Simulation)
• Correlation coefficients between OD-matrices

| | Albatross (CBA) | Albatross (BNT) | Albatross (C4.5) | Simulation model |
|---|---|---|---|---|
| Correlation coefficient | 0.945 | 0.942 | 0.940 | 0.879 |
• SAM was somewhat worse for the simulation model, but the differences are minor → time allocation and sequencing (activity-travel) perform fairly well
• Larger differences for SAM location and for the OD-matrices:
– Need to be complemented in future research with additional information on facilities at each location (e.g. floor space)
– Better specification of the time and location relationship
References
• Janssens, D., Wets, G., Brijs, T., Vanhoof, K., Timmermans, H.J.P. and Arentze, T.A. (2004) Improving the performance of a multi-agent rule-based model for activity pattern decisions using Bayesian networks. Journal of the Transportation Research Board, 1894, 75-83.
• Janssens, D., Wets, G., Brijs, T. and Vanhoof, K. (2005a) The development of an adapted Markov chain modelling heuristic and simulation framework in the context of transportation research. Expert Systems with Applications, 28, 105-117.
• Janssens, D., Wets, G., Brijs, T. and Vanhoof, K. (2005b) Adapting the CBA algorithm by means of intensity of implication. Information Sciences, 173(4), 305-318.
• Janssens, D., Wets, G., Brijs, T., Vanhoof, K., Arentze, T.A. and Timmermans, H.J.P. (2005c) Integrating Bayesian networks and decision trees in a sequential rule-based transportation model. Forthcoming in European Journal of Operational Research.
• Lan, Y., Janssens, D., Chen, G. and Wets, G. (2006) Improving associative classification by incorporating novel interestingness measures. Forthcoming in Expert Systems with Applications.
• Janssens, D., Wets, G., Brijs, T. and Vanhoof, K. (2005d) Using an adapted classification based on associations algorithm in an activity-based transportation system. In Intelligent Data Mining: Techniques and Applications, 275-292.
• Janssens, D., Wets, G., Brijs, T. and Vanhoof, K. (2005e) Simulating daily activity patterns through the identification of sequential dependencies. In Timmermans, H.J.P. (ed.): Progress in Activity-Based Analysis, Elsevier, Amsterdam, 67-90.
• Arentze, T.A. and Timmermans, H.J.P. (2000) Albatross: A Learning-Based Transportation Oriented Simulation System. European Institute of Retailing and Services Studies, Eindhoven.
• Liu, B., Hsu, W. and Ma, Y. (1998) Integrating classification and association rule mining. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), New York, 80-86.