Calibrating Unsupervised Machine Learning Algorithms for the Prediction of Activity-Travel Patterns
Davy Janssens, Transportation Research Institute, Faculty of Applied Economic Sciences, Hasselt University
Email: [email protected]
Talk GRT, FUNDP, 18/12/2006
Why transportation modelling?
• The transport sector has several negative effects: social, economic and environmental
• Transport planning policies aim to reduce these effects by means of TDM (travel demand management)
– Alter behaviour without necessarily embarking on large-scale infrastructure expansion
– Examples: spreading peak-period travelling, congestion charging, etc.
• Transport modelling helps to analyse and understand the impact of policy decisions before they are implemented (prediction)
Travel Demand Nature
• Increasing modelling complexity: Trip-Based → Tour-Based → Activity-Based
Travel Demand Nature: Activity-based approach
– Travel demand is derived from the activities that individuals need/wish to perform
– Household and other social structures influence travel and activity behaviour
– Spatial, temporal, transportation and interpersonal interdependencies constrain activity/travel behaviour
→ Aim to predict which activities are conducted where, when, for how long, with whom, the transport mode involved and ideally also the implied route decision
Extracting Knowledge (Improve AB-models): Machine Learning

| Important characteristic | Task | Description | Technique | Example |
|---|---|---|---|---|
| Supervised (predictive) ML | Regression | Predicting an (often) continuous variable | Linear regression, NN, SVM | Predicting sales amount |
| Supervised (predictive) ML | Classification | Predicting a (categorical) dependent variable | DT (CHAID, C4.5), NN, rule induction | Predicting transport mode choice |
| Unsupervised (descriptive) ML | Association analysis | Identify relationships between items | Association rules | Identify frequently bought products |
| Unsupervised (descriptive) ML | Dependence analysis | Identify dependencies between items/variables | Bayesian networks, graphical methods | Identify dependencies between demographic variables |
| Unsupervised (descriptive) ML | Sequence analysis | Identify relationships between items over time | Sequential AR, Markov chains | Identify the time sequence of purchases |
| Unsupervised (descriptive) ML | Clustering | Identify homogeneous subpopulations | K-means clustering | Market segmentation |
| Reinforcement learning | Multiple | Learning through experience/interaction | Q-learning, temporal difference methods | Grid world, elevator dispatching |
Contributions of my work
The starting framework: Albatross
Albatross: The scheduling model (Arentze and Timmermans, 2000)
Aim: determine the schedule (= agenda) of activity-travel behaviour
Components:
1. a model of the sequential decision-making process [a-priori defined]
2. models to compute dynamic constraints on choice options [a-priori defined]
3. a set of decision trees representing choice behaviour of individuals related to each step in the process model [derived from observed choice behaviour]
The skeleton refers to the fixed and given part of the schedule; flexible activities are optional activities added to the skeleton
Albatross: (Arentze and Timmermans, 2000)
Each oval represents a DT
Part 1: Classification based on Association Rules in Albatross
Introduction: Classification and association rule mining: the difference
• Association rule mining
– Finding frequent patterns among sets of items in transaction databases
– Frequent pattern: determined by means of minimum support and minimum confidence criteria
– Most popular: Apriori
• Classification rule mining
– Organize and categorize (unseen) data into distinct classes
– Prediction (decision trees, classification rules, neural networks)
– Most popular: C4.5 (Quinlan, 1993)
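As a minimal illustration of what mining CARs means, the sketch below enumerates short antecedents exhaustively and keeps those that pass support and confidence thresholds. This is a toy stand-in, not the Apriori-based CBA-RG itself; the dataset and attribute names are made up.

```python
from itertools import combinations

def mine_cars(rows, class_attr, min_support=0.4, min_confidence=1.0):
    """Mine Class Association Rules (CARs): itemset -> class label.
    rows: list of dicts that all include the class attribute."""
    n = len(rows)
    items = sorted({(k, v) for r in rows for k, v in r.items() if k != class_attr})
    cars = []
    for size in (1, 2):                              # antecedents of length 1 and 2
        for combo in combinations(items, size):
            if len({k for k, _ in combo}) < size:    # one value per attribute
                continue
            covered = [r for r in rows if all(r[k] == v for k, v in combo)]
            if not covered:
                continue
            for label in {r[class_attr] for r in covered}:
                hits = sum(1 for r in covered if r[class_attr] == label)
                support, confidence = hits / n, hits / len(covered)
                if support >= min_support and confidence >= min_confidence:
                    cars.append((combo, label, support, confidence))
    return cars
```

The key point is the restriction of the rule's right-hand side to the class attribute, which is what distinguishes CARs from general association rules.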
Classification and association rule mining: how can they be integrated?
• Why
– Association rules search globally → full set of rules → all potentially important associations for making predictions are incorporated
– Results of previous research efforts to integrate them are encouraging
• How
– Focus on a special subset of association rules, namely those whose right-hand side is restricted to the class attribute. These are referred to as CARs: Class Association Rules
Application 1: CBA (originally from Liu et al., 1998; tested in Albatross in Janssens et al., 2005d)
• CBA consists of 2 parts:
– A rule generator (CBA-RG), based on Apriori for finding association rules. This part of the algorithm generates all CARs
– A classifier builder (CBA-CB), based on the generated CARs
An example
CBA-RG
• Generate all frequent itemsets: 11158!
• Generate and prune CARs: 34 (using default minimum support and confidence values in the original CBA program)
An example
CBA-CB
• Sort the generated rules (by confidence)
• Insert into the classifier and delete the cases covered by rule R
Rule 1: Var1 = 1 → class = 1 (100% conf)
→ Assign default class: 0
→ Compute total errors: 4
An example: CBA-CB
• Insert into the classifier and delete the cases covered by rule R
Rule 1: Var1 = 1 → class = 1 (100% conf) (default: 0, 4 errors)
Rule 10: Var5 = 2 → class = 1 (100% conf) (default: 0, 3 errors)
Rule 27: Var6 = 2 and Var3 = 2 → class = 1 (100% conf) → no case correctly classified → not added to the classifier
Rule 16: Var4 = 1 and Var1 = 2 → class = 0 (100% conf) (default: 1, 2 errors)
Etc.
An example: CBA-CB
• Errors will decline; discard those rules that do not improve the accuracy of the classifier
• Stop when no cases are left or when all rules are used
Final Classifier: - Error rate on training set: 5% - Error rate by 10-fold cross validation: 40% - Number of rules on training dataset: 4 (Rule 1; Rule 10; Rule 16; Rule 35; Default Class=1)
- Average number of rules by 10-fold cross validation: 3.7
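The greedy build-and-prune procedure walked through above can be sketched as follows. This is a reading of the slides' logic, not the original CBA implementation; the `class` key and the predicate representation of rule antecedents are assumptions.

```python
from collections import Counter

def build_classifier(rules, cases):
    """CBA-CB-style greedy builder (sketch).
    rules: list of (antecedent_predicate, label, confidence) triples."""
    ordered = sorted(rules, key=lambda r: -r[2])        # sort by confidence
    remaining, snapshots, rule_errors = list(cases), [], 0
    for antecedent, label, conf in ordered:
        if not remaining:
            break
        covered = [c for c in remaining if antecedent(c)]
        if not any(c["class"] == label for c in covered):
            continue                                    # no case correctly classified
        rule_errors += sum(1 for c in covered if c["class"] != label)
        remaining = [c for c in remaining if not antecedent(c)]
        counts = Counter(c["class"] for c in remaining)
        default = counts.most_common(1)[0][0] if counts else label
        # snapshot: (rule, current default class, total errors so far)
        snapshots.append(((antecedent, label), default,
                          rule_errors + len(remaining) - counts.get(default, 0)))
    if not snapshots:
        return [], Counter(c["class"] for c in cases).most_common(1)[0][0]
    best = min(range(len(snapshots)), key=lambda i: snapshots[i][2])
    return [s[0] for s in snapshots[:best + 1]], snapshots[best][1]

def predict(classifier, default, case):
    for antecedent, label in classifier:
        if antecedent(case):
            return label
    return default
```

Truncating the rule list at the snapshot with the lowest total error count is what produces the small final classifiers shown in the example.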
• Results in Albatross follow…
Developing an adapted CBA algorithm (see Janssens et al., 2005b, 2005d and Lan et al., 2006)
• Advancing the state of the art: use other rule-sorting criteria than CBA and build other classifiers
• CBA-1: sort rules by intensity of implication (Application 2)
• CBA-2: sort rules by a dilated chi-square measure (Application 3)
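As an illustration of what an alternative sorting criterion looks like, the sketch below computes a plain chi-square statistic for a rule's 2x2 contingency table. The intensity-of-implication and dilated chi-square measures actually used in CBA-1/CBA-2 are defined in the cited papers; this is only an illustrative stand-in.

```python
def chi_square(n, n_a, n_c, n_ac):
    """Chi-square statistic for rule A -> C from a 2x2 contingency table.
    n: total cases; n_a: cases matching the antecedent; n_c: cases in the
    class; n_ac: cases matching both. Higher values indicate stronger
    dependence between antecedent and class."""
    table = [
        (n_ac,       n_a - n_ac),            # antecedent holds
        (n_c - n_ac, n - n_a - n_c + n_ac),  # antecedent does not hold
    ]
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in (0, 1)]
    chi = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = row_tot[i] * col_tot[j] / n
            if expected:
                chi += (table[i][j] - expected) ** 2 / expected
    return chi
```

Rules with identical confidence can then be ranked by such a statistic instead of by confidence alone, which is the essence of the adapted sorting step.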
Example
• The approach was tested on an example. Outcome:
| | Original CBA | Adapted CBA |
|---|---|---|
| Error rate on training set | 5% | 10% |
| Error rate by 10-fold cross-validation | 40% | 20% |
| Number of rules on training dataset | 4 | 2 |
| Average number of rules by 10-fold cross-validation | 3.7 | 1.8 |
→ This example already shows that the sorting of the rules may indeed have an important impact on accuracy
→ A smaller classifier that performs better under cross-validation
→ Less overfitting? Better generalisation?
Results CBA, CBA-1, CBA-2 (Choice Facet Level)

| Dataset | CBA-1 Train (%) | CBA-1 Test (%) | CBA-1 rules | CBA-2 Train (%) | CBA-2 Test (%) | CBA-2 rules | CBA Train (%) | CBA Test (%) | CBA rules | CHAID Train (%) | CHAID Test (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Duration | 40.7 | 40.9 | 17 | 43.0 | 41.2 | 8 | 44.7 | 39.2 | 147 | 41.3 | 38.8 |
| Location1 | 64.5 | 68.1 | 25 | 59.6 | 60.8 | 3 | 66.3 | 62.7 | 234 | 57.5 | 58.9 |
| Location2 | 26.8 | 26.3 | 1 | 48.3 | 38.0 | 55 | 52.6 | 41.1 | 136 | 35.4 | 32.6 |
| Mode work | 74.7 | 76.8 | 38 | 75.6 | 74.4 | 12 | 83.5 | 73.7 | 172 | 64.8 | 66.7 |
| Mode other | 54.9 | 54.8 | 5 | 68.0 | 60.5 | 259 | 66.5 | 60.9 | 245 | 52.8 | 49.5 |
| Selection | 79.1 | 79.2 | 1 | 57.5 | 57.4 | 1 | 79.6 | 78.7 | 594 | 72.4 | 71.6 |
| Start time | 33.3 | 33.0 | 69 | 37.6 | 36.2 | 102 | 34.5 | 33.7 | 120 | 39.8 | 35.4 |
| Trip chain | 82.7 | 82.0 | 21 | 84.2 | 83.4 | 3 | 83.9 | 80.4 | 65 | 83.3 | 80.9 |
| With whom | 54.7 | 48.1 | 24 | 55.9 | 51.1 | 51 | 61.1 | 56.2 | 222 | 50.9 | 48.4 |
| Average accuracy | 56.8 | 56.6 | / | 58.9 | 55.9 | / | 63.6 | 58.5 | / | 55.4 | 53.6 |
| Average number of rules | / | / | 22.3 | / | / | 54.9 | / | / | 215 | / | / |

(CBA-1 = Adapted CBA-1, CBA-2 = Adapted CBA-2, CBA = Original CBA)
Part 2: Bayesian Networks
Application 1: Bayesian Networks (BN) (see also Janssens et al. 2005a)
• CARs: measure co-occurrence relationships between variables
• BN: not only capable of measuring, but also used for modelling and reasoning. Characteristics:
– Able to capture (complex) relationships between variables
– Able to be learned from data
– Visualize interdependencies between variables
– Prior and posterior probability distributions per variable
– Well suited to conduct what-if scenarios and sensitivity analysis
– White box
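The prior/posterior reasoning a BN supports can be illustrated with a minimal two-node example (the mode/distance numbers below are hypothetical, not taken from the study):

```python
def posterior(prior, likelihood, evidence):
    """Bayes update for a two-node network X -> E.
    prior: P(X); likelihood: P(E = e | X = x) per x; evidence: observed e."""
    unnorm = {x: prior[x] * likelihood[x][evidence] for x in prior}
    z = sum(unnorm.values())                 # normalising constant P(E = e)
    return {x: p / z for x, p in unnorm.items()}

# Hypothetical example: prior mode choice, updated after observing a long trip.
prior = {"car": 0.6, "bike": 0.4}
likelihood = {"car": {"long": 0.7, "short": 0.3},
              "bike": {"long": 0.2, "short": 0.8}}
post = posterior(prior, likelihood, "long")   # P(car | long) = 0.84
```

Changing the evidence and re-running the update is exactly the kind of what-if query that makes BNs attractive for sensitivity analysis.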
BN example: pruning a network (before → after)
Some empirical findings (conclusions)
• Conclusions:
– Better predictions. Reason: unlike decision trees (CHAID), variables are selected simultaneously, with no hierarchy of importance among the selected variables
– Selection of the variables is roughly the same in both approaches (→ the difference in performance is due more to their different nature than to additional insights)
– Much larger number of decision rules in Albatross compared with CHAID; however, performance is also OK on the test data
– Interpretation is an issue: BN link several variables in sometimes complex direct and indirect ways
Application 2: BNT (see also Janssens et al., 2005c)
• Advancing the state of the art: BNT (BN augmented Tree)
• Idea:
– Integrate BN with decision trees
– Use the information that is stored in the BN for building the DT
• A recursive algorithm has been developed which uses a mutual information criterion, derived from the BN, to build a decision tree
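A sketch of the splitting criterion: mutual information between a candidate variable and the class. For simplicity the sketch estimates it directly from data, whereas the actual BNT algorithm derives it from the learned BN.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """I(X; Y) in bits, estimated from observed (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def best_split(rows, class_attr):
    """Pick the attribute sharing the most information with the class."""
    attrs = [a for a in rows[0] if a != class_attr]
    return max(attrs, key=lambda a: mutual_information(
        [(r[a], r[class_attr]) for r in rows]))
```

Applying `best_split` recursively to the subsets induced by each split value yields a decision tree, which is the shape of the recursive algorithm described above.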
Results BN, BNT (Choice Facet Level)

| Dataset | BN Train (%) | BN Test (%) | BNT Train (%) | BNT Test (%) |
|---|---|---|---|---|
| Duration | 40.9 | 40.5 | 41.0 | 40.2 |
| Location1 | 69.6 | 67.9 | 69.4 | 68.5 |
| Location2 | 47.3 | 42.0 | 47.3 | 41.9 |
| Mode work | 76.9 | 77.9 | 77.0 | 78.3 |
| Mode other | 58.3 | 52.1 | 58.3 | 52.1 |
| Selection | 79.1 | 79.2 | 79.1 | 79.2 |
| Start time | 47.7 | 38.0 | 42.3 | 39.3 |
| Trip chain | 83.1 | 82.3 | 83.1 | 82.5 |
| With whom | 57.7 | 53.4 | 57.7 | 53.5 |
| Average | 62.3 | 59.3 | 61.7 | 59.5 |
Results BN, BNT (Complexity)

| Dataset | Individual rule complexity: BN (number of variables) | Individual rule complexity: BNT (depth of DT) | Model complexity: BN (total number of rules) | Model complexity: BNT (leaves / nodes) |
|---|---|---|---|---|
| Duration | 4 | 2 | 84 | 6/4 |
| Location1 | 9 | 7 | 5760 | 253/215 |
| Location2 | 10 | 6 | 9216 | 131/124 |
| Mode work | 8 | 5 | 36864 | 187/64 |
| Mode other | 5 | 4 | 432 | 108/45 |
| Start time | 11 | 5 | 983040 | 210/58 |
| Trip chain | 11 | 6 | 124416 | 384/175 |
| With whom | 4 | 3 | 280 | 70/16 |
Part 3: Development of a Markov Chain simulation framework; including a segmentation scheme
• In our approach: – Activity-Travel combinations that sequentially occur together (e.g. Have breakfast – Car – Work) – Almost completely data-driven (as opposed to domain knowledge) – Use Markov Chains
• Research Motivation: – Most previous research does not explicitly capture sequential information – Comparison with activity-scheduling models is lacking
Contributions (see also Janssens et al. 2005a, 2005e, 2007) Transition Matrices
Contributions: 1) Modification of the computation of transition probabilities
• 2 alternative approaches were implemented
• Shown that the type of calculation matters
• More efficient calculation
2) Segmentation procedure • Time information (bifurcation points) (relaxation of stationarity assumption) • Socio-demographic information (modified decision tree: use transition matrices for splitting the tree instead of class attributes)
…
3) Simulation procedure (Recursive application of information stored in transition matrices)
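The fit-and-simulate loop can be sketched as follows. This is a minimal version assuming a single transition matrix of fixed order; the study's two alternative computation approaches and the segmentation scheme are not reproduced here.

```python
import random
from collections import Counter, defaultdict

def fit_transitions(sequences, order=1):
    """Estimate transition probabilities P(next | last `order` states)
    from observed activity-travel sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[tuple(seq[i - order:i])][seq[i]] += 1
    return {state: {nxt: c / sum(cnt.values()) for nxt, c in cnt.items()}
            for state, cnt in counts.items()}

def simulate(matrix, start, length, order=1, rng=random):
    """Recursively draw states from the fitted transition matrix."""
    seq = list(start)
    while len(seq) < length:
        dist = matrix.get(tuple(seq[-order:]))
        if dist is None:
            break                       # unseen context: stop the chain
        states, probs = zip(*dist.items())
        seq.append(rng.choices(states, weights=probs)[0])
    return seq
```

Raising `order` trades data requirements against sequential fidelity, which is the trade-off visible in the simulation results that follow.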
Simulation Model: Results (observed tours: 2.435)

| Order of transition matrix | Without segmentation: predicted tours (Approach 1) | Without segmentation: predicted tours (Approach 2) | With segmentation: predicted tours (Approach 1) | With segmentation: predicted tours (Approach 2) |
|---|---|---|---|---|
| l = 1 | 1.321 | 1.621 | 2.215 | 2.232 |
| l = 2 | 1.745 | 1.954 | 2.314 | 2.319 |
| l = 3 | 2.148 | 2.128 | 2.112 | 2.141 |
| l = 4 | 2.312 | 2.316 | 1.543 | 1.510 |
| l = 5 | 2.414 | 2.424 | 1.541 | 1.432 |
| l = 6 | 2.721 | 2.621 | / | / |
| l = 7 | 2.732 | 2.730 | / | / |
| l = 8 | 2.012 | 2.003 | / | / |
| l = 9 | 1.521 | 1.621 | / | / |
| l = 10 | 0.927 | 1.222 | / | / |
Part 4: Reinforcement Learning
Allocating Time and Location Information: Reinforcement Learning
• Sequences of activities and travel were generated by iterative application of low- and high-order transition probability matrices
• However: time and location information is still missing in the generated patterns
• Solution: interaction with the environment, using a reinforcement learning technique (Q-learning)
– The decision (action) is based on:
• Current activity
• Start time of current activity
• Duration of current activity
• Duration of transfer between two locations
• Sequence of activities and transport modes
Reinforcement Learning: The Decision Agent
• The monkey wants to learn how to get to the banana without getting burned
• Define a set of states: positions in the grid world
• Define a set of actions: left, right, up, down
• Learning by doing:
– Observe state
– Select action
– Execute action
– Receive reward
– Memorise state-action pair and its utility
– Repeat
→ Learn an optimal sequence of actions
Reinforcement Learning • Assign rewards to each state – Time reward table: Based on observed frequency information – Location reward: Based on travel time information between locations: simultaneously maximise both time and location utility!
• Learn by Moving Around
Reinforcement Learning • Wherever the monkey now is, it knows how to get safely to the banana
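The learn-by-doing loop above can be sketched as tabular Q-learning on a 1-D grid world. The reward values, hyperparameters and grid are illustrative, not those of the study.

```python
import random

def q_learn(n_states, goal, episodes=500, alpha=0.5, gamma=0.9,
            epsilon=0.1, seed=0):
    """Tabular Q-learning on a 1-D grid world; actions: -1 (left), +1 (right)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in (-1, 1)}
    for _ in range(episodes):
        s = rng.randrange(n_states)
        while s != goal:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), n_states - 1)     # move, clipped at walls
            reward = 1.0 if s2 == goal else -0.1      # small cost per step
            # standard Q-learning update
            q[(s, a)] += alpha * (reward + gamma * max(q[(s2, -1)], q[(s2, 1)])
                                  - q[(s, a)])
            s = s2
    return q

def greedy_path(q, start, goal, n_states, limit=50):
    """Follow the learned greedy policy from any start state."""
    s, path = start, [start]
    while s != goal and len(path) < limit:
        a = max((-1, 1), key=lambda act: q[(s, act)])
        s = min(max(s + a, 0), n_states - 1)
        path.append(s)
    return path
```

After training, the greedy policy reaches the goal from every state, which is the "wherever the monkey now is" property.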
• Same as in our application: given the activity-travel sequence, determine the optimal time and location information
Home (0AM-6AM, location A) - Car - Work (6.15AM-12AM, location B) - Car - Shop (12.20PM-6PM, location C) - Walk - Leisure (6.20PM-12PM, location C)
Part 5: Final comparison of approaches that have been developed
Final Comparisons: Full simulation model vs Activity-scheduling
• SAM distance measures (full patterns)

| SAM distance measure | Albatross (CBA) | Albatross (BNT) | Albatross (C4.5) | Simulation model |
|---|---|---|---|---|
| SAM activity-type | 2.710 | 2.712 | 2.719 | 3.113 |
| SAM location | 3.111 | 3.101 | 3.109 | 5.135 |
| SAM mode | 4.515 | 4.419 | 4.439 | 4.975 |
| UDSAM | 16.319 | 16.313 | 16.328 | 17.551 |
• SAM distance measures (fixed elements removed)

| SAM distance measure | Albatross (CBA) | Albatross (BNT) | Albatross (C4.5) | Simulation model |
|---|---|---|---|---|
| SAM activity-type | 2.513 | 2.515 | 2.519 | 2.719 |
| SAM location | 2.682 | 2.685 | 2.690 | 4.832 |
| SAM mode | 2.625 | 2.622 | 2.629 | 2.929 |
| UDSAM | 11.356 | 11.352 | 11.362 | 11.921 |
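The SAM measures above are alignment (edit) distances between observed and predicted sequences. A minimal unidimensional sketch is shown below; the study's UDSAM combines several dimensions (activity type, location, mode), which this single-attribute version does not.

```python
def sam_distance(a, b, ins=1, dele=1, sub=1):
    """Levenshtein-style alignment distance between two activity sequences:
    the minimum total cost of insertions, deletions and substitutions
    needed to turn sequence a into sequence b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele                 # delete everything from a
    for j in range(1, n + 1):
        d[0][j] = j * ins                  # insert everything from b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,       # deletion
                          d[i][j - 1] + ins,        # insertion
                          d[i - 1][j - 1] + cost)   # match/substitution
    return d[m][n]
```

A lower distance means the predicted pattern reproduces the observed one more faithfully, which is how the tables above should be read.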
Final Comparisons (Activity-scheduling vs Simulation)
• Correlation coefficients between OD-matrices

| | Albatross (CBA) | Albatross (BNT) | Albatross (C4.5) | Simulation model |
|---|---|---|---|---|
| Correlation coefficient | 0.945 | 0.942 | 0.940 | 0.879 |
• SAM was somewhat worse for the simulation model, but the differences are minor → time allocation and sequencing (activity-travel) perform fairly well
• Larger differences for SAM location and for the OD-matrices:
– Need to be complemented in future research with additional information on facilities at each location (e.g. floor space)
– Better specification of the time and location relationship
References
• Janssens, D., Wets, G., Brijs, T., Vanhoof, K., Timmermans, H.J.P. and Arentze, T.A. (2004) Improving the performance of a multi-agent rule-based model for activity pattern decisions using Bayesian networks. Journal of the Transportation Research Board, 1894, 75-83.
• Janssens, D., Wets, G., Brijs, T. and Vanhoof, K. (2005a) The development of an adapted Markov chain modelling heuristic and simulation framework in the context of transportation research. Expert Systems with Applications, 28, 105-117.
• Janssens, D., Wets, G., Brijs, T. and Vanhoof, K. (2005b) Adapting the CBA algorithm by means of intensity of implication. Information Sciences, 173(4), 305-318.
• Janssens, D., Wets, G., Brijs, T., Vanhoof, K., Arentze, T.A. and Timmermans, H.J.P. (2005c) Integrating Bayesian networks and decision trees in a sequential rule-based transportation model. Forthcoming in European Journal of Operational Research.
• Lan, Y., Janssens, D., Chen, G. and Wets, G. (2006) Improving associative classification by incorporating novel interestingness measures. Forthcoming in Expert Systems with Applications.
• Janssens, D., Wets, G., Brijs, T. and Vanhoof, K. (2005d) Using an adapted classification based on associations algorithm in an activity-based transportation system. In Intelligent Data Mining: Techniques and Applications, 275-292.
• Janssens, D., Wets, G., Brijs, T. and Vanhoof, K. (2005e) Simulating daily activity patterns through the identification of sequential dependencies. In Timmermans, H.J.P. (ed.): Progress in Activity-Based Analysis, Elsevier, Amsterdam, 67-90.
• Arentze, T.A. and Timmermans, H.J.P. (2000) Albatross: A Learning-Based Transportation Oriented Simulation System. European Institute of Retailing and Services Studies, Eindhoven.
• Liu, B., Hsu, W. and Ma, Y. (1998) Integrating classification and association rule mining. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), New York, 80-86.