Tutorials

Monday, September 7, 2009:
- Learning from Multi-label Data, by G. Tsoumakas (Aristotle University of Thessaloniki), M.-L. Zhang (Hohai University), and Z.-H. Zhou (Nanjing University)
- Language and Document Analysis: Motivating Latent Variable Models, by W. Buntine (Helsinki Institute of IT, NICTA)
- Methods for Large Network Analysis, by V. Batagelj (University of Ljubljana)

Friday, September 11, 2009:
- Evaluation in Machine Learning, by P. Cunningham (University College Dublin)
- Transfer Learning for Reinforcement Learning Domains, by A. Lazaric (INRIA Lille) and M. Taylor (University of Southern California)
- Graphical Models, by T. Caetano (NICTA)

Tutorial Chair: C. Archambeau (University College London)

Learning from Multi-Label Data
Tutorial at ECML/PKDD’09, Bled, Slovenia, 7 September 2009

Grigorios Tsoumakas, Department of Informatics, Aristotle University of Thessaloniki, Greece
Min-Ling Zhang, College of Computer and Information Engineering, Hohai University, China
Zhi-Hua Zhou, LAMDA Group, National Key Laboratory for Novel Software Technology, Nanjing University, China

Outline
- Introduction
- Overview of existing techniques
- Advanced topics
- The Mulan open-source software

Outline
- Introduction
  - What is multi-label learning
  - Applications and datasets
  - Multi-label evaluation metrics
- Overview of existing techniques
- Advanced topics
- The Mulan open-source software


The Larger Picture
- Data with multiple target variables: what can the type of targets be?
  - Numerical targets
    - Ecological modeling and environmental applications
    - Industrial applications (automobile)
  - Categorical targets
    - Binary targets: the multi-label case
    - Multi-class targets
  - Ordinal targets
  - Combinations of types

Notation for Multi-Label Data
- A d-dimensional input space X (numeric or nominal features)
- A set of q output labels L = {λ1, ..., λq}
- A multi-label dataset of m training examples (x_i, Y_i), i = 1..m, where x_i ∈ X and Y_i ⊆ L

Multi-Label Learning Tasks (1/3)
- Classification
  - Produce a bipartition of the set of labels into a relevant (positive) and an irrelevant (negative) set
  - For example, given an unobserved instance x, produce a bipartition (P_x, N_x) with P_x ∪ N_x = L and P_x ∩ N_x = ∅

Multi-Label Learning Tasks (2/3)
- Ranking
  - Produce a ranking (total strict order) of all labels according to relevance to the given instance
  - For example, given an unobserved instance x, produce a ranking r_x, where r_x(λ) denotes the position of label λ in the ranking

Multi-Label Learning Tasks (3/3)
- Classification and Ranking
  - Produce both a bipartition and a ranking of all labels
  - The two should be consistent: r_x(λ) < r_x(λ') for every λ ∈ P_x and λ' ∈ N_x
  - For example, given an unobserved instance x, produce a bipartition and a ranking

Outline
- Introduction
  - What is multi-label learning
  - Applications and datasets
  - Multi-label evaluation metrics
- Overview of existing techniques
- Advanced topics
- The Mulan open-source software

Applications and Datasets
- (Semi-)automated annotation of large object collections for information retrieval: text/web, image, video, audio, biology
- Tag suggestion in Web 2.0 systems
- Query categorization
- Drug discovery
- Direct marketing
- Medical diagnosis

Text (1/4)
- News
  - An article concerning the Antikythera Mechanism can be categorized to Science/Technology and History/Culture
- Reuters Collection Version 1 [Lewis et al., JMLR04]
  - 804,414 newswire stories indexed by Reuters Ltd
  - 103 topics organized in a hierarchy, 2.6 per story on average
  - 350 industries (2-level hierarchy post-produced)
  - 296 geographic codes

Text (2/4)
- Research articles
  - A research paper on an ensemble method for multi-label classification can be assigned to the areas Ensemble methods and Structured output prediction
- Collections
  - OHSUMED [Hersh et al., SIGIR94]: Medical Subject Headings (MeSH) ontology
  - ACM-DL [Veloso et al., ECMLPKDD07]: 81,251 Digital Library articles; ACM Computing Classification System (1st level: 11 labels, 2nd level: 81 labels)

Text (3/4)
- EUR-Lex collection [Loza Mencia & Furnkranz, ECMLPKDD08]
  - 19,596 legal documents of the European Union (EU)
  - Hierarchy of 3,993 EUROVOC labels, 5.4 on average (EUROVOC is a multilingual thesaurus for EU documents)
  - 201 subject matters, 2.2 on average
  - 412 directory codes, 1.3 on average
- WIPO-alpha collection [Fall et al., SIGIRForum03]
  - World Intellectual Property Organization (WIPO)
  - 75,000 patents
  - 4-level hierarchy of ~5,000 categories

Text (4/4)
- Aviation safety reports (tmc2007)
  - Competition of the SIAM Text Mining 2007 Workshop
  - 28,596 NASA aviation safety reports in free-text form
  - 22 problem types that appear during flights, 2.2 annotations on average
- Free clinical text in radiology reports (medical)
  - Computational Medicine Center's 2007 Medical NLP Challenge [Pestian et al., ACL07w]
  - 978 reports, 45 labels, 1.3 labels on average

Web
- Email: Enron dataset
  - UC Berkeley Enron Email Analysis Project
  - 1,702 examples, 53 labels, 3.4 on average
  - 2-level hierarchy
- Web pages: hierarchical classification schemes
  - Open Directory Project
  - Yahoo! Directory [Ueda & Saito, NIPS02]

Image and Video
- Application: automated annotation for retrieval
- Datasets
  - Scene [Boutell et al., PR04]: 2,407 images, 6 labels, 1.1 on average
  - Mediamill [Snoek et al., MM06]: 85 hours of video data containing Arabic, Chinese, and US broadcast news sources, recorded during November 2004; 43,907 frames, 101 labels, 4.4 on average


Audio (1/2)
- Music and metadata database of the HiFind company
  - 450,000 categorized tracks since 1999
  - 935 labels from 16 categories (340 genre labels)
  - Annotation categories: style, genre, musical setup, main instruments, variant, dynamics, tempo, era/epoch, metric, country, situation, mood, character, language, rhythm, popularity
  - 25 annotators (musicians, music journalists) plus a supervisor
  - Software-based annotation takes 8 minutes per track on average, with 37 annotations per track on average
- A subset was used in [Pachet & Roy, TASLP09]: 32,978 tracks, 632 labels, 98 acoustic features

Audio (2/2)
- Emotional categorization of music
  - Relevant works: [Li & Ogihara, ISMIR03; TMM06; Wieczorkowska et al., IIPWM06]
  - Dataset emotions in [Trohidis et al., ISMIR08]: 593 tracks, 6 labels {happy, calm, sad, angry, quiet, amazed}, 1.9 on average
  - Some applications: song selection in mobile devices, music therapy, music recommendation systems, TV and radio programs
- Acoustic data [Streich & Buhmann, ECMLPKDD08]
  - Construction of hearing aid instruments
  - Labels: Noise, Speech, Music

Biology (1/2)
- Application: automated annotation of proteins with functions
- Annotation hierarchies
  - The Functional Catalogue (FunCat): a tree-shaped hierarchy of annotations for the functional description of proteins from several living organisms
  - The Gene Ontology (GO): a directed acyclic graph of annotations for gene products

Biology (2/2)
- Datasets
  - Yeast [Elisseeff & Weston, NIPS02]: 2,417 examples, 14 labels (1st FunCat level), 4.2 on average
  - Phenotype (yeast) [Clare & King, ECMLPKDD01]: 1,461 examples, 4 FunCat levels
  - 12 yeast datasets [Clare, PhdThesis03; Vens et al., MLJ08]
    - Gene expression, homology, phenotype, secondary structure
    - FunCat: 6 levels, 492 labels, 8.8 on average
    - GO: 14 levels, 3,997 labels, 35.0 on average

Tag Suggestion in Web 2.0 Systems
- Benefits: richer descriptions of objects, folksonomy alignment
- Input: feature representation of objects (content)
- Challenges: huge number of tags, fast online predictions
- Related work: [Song et al., CIKM08; Katakis et al., ECMLPKDD08w]

Query Categorization
- Benefits
  - Integrate query-specific rich content from vertical search results (e.g. from a database)
  - Identify relevant sponsored ads: place ads on categories vs. keywords
- Example: Yahoo! [Tang et al., WWW09]
  - 6,433 categories organized in an 8-level taxonomy
  - 1.5 million manually labeled unique queries
  - Labels per query range from 1 to 26 (1 label: 81%, 2 labels: 16%, 3+ labels: 3%)

Drug Discovery
- MDL Drug Data Report v. 2001
  - 119,110 chemical structures of drugs
  - 701 biological activities (e.g. calcium channel blocker, neutral endopeptidase inhibitor, cardiotonic, diuretic)
- Example: hypertension [Kawai & Takahashi, CBIJ09]
  - Two major activities of hypertension drugs: angiotensin converting enzyme inhibitor and neutral endopeptidase inhibitor
  - Compounds producing both of these specific activities were found to be an effective new type of drug

Other
- Direct marketing [Zhang et al., JMLR06]
  - A direct marketing company sends offers to clients for products of categories they are potentially interested in
  - Historical data of clients and the product categories they showed interest in (multiple categories)
  - Data from the Direct Marketing Association: 19 categories
  - Classification and ranking: send only relevant products, or send the top X products
- Medical diagnosis
  - A patient may be suffering from multiple diseases at the same time, e.g. {obesity, hypertension}

Multi-Label Statistics
- Label cardinality, c: the average number of labels per example
- Label density: label cardinality divided by the total number of labels, c/q
- Distinct labelsets: the number of different label combinations
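To make these definitions concrete, here is a minimal Java sketch (not Mulan code; the boolean-matrix representation of label assignments is this sketch's own assumption) that computes all three statistics for the small example dataset used later in this tutorial:

// Minimal sketch: multi-label statistics over a dataset whose label
// assignments are an m x q boolean matrix (true = label present).
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MultiLabelStats {
    public static void main(String[] args) {
        boolean[][] y = { // the 4-example, 4-label dataset used in this tutorial
            {true, false, false, true},   // {λ1, λ4}
            {false, false, true, true},   // {λ3, λ4}
            {true, false, false, false},  // {λ1}
            {false, true, true, true}     // {λ2, λ3, λ4}
        };
        int m = y.length, q = y[0].length, labelCount = 0;
        Set<String> distinct = new HashSet<>();
        for (boolean[] row : y) {
            for (boolean b : row) if (b) labelCount++;
            distinct.add(Arrays.toString(row)); // labelset as a canonical string
        }
        double cardinality = (double) labelCount / m;  // avg labels per example
        double density = cardinality / q;              // cardinality / q
        System.out.printf("cardinality=%.2f density=%.3f distinct=%d%n",
                cardinality, density, distinct.size()); // 2.00 0.500 4
    }
}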

Outline
- Introduction
  - What is multi-label learning
  - Applications and datasets
  - Multi-label evaluation metrics
- Overview of existing techniques
- Advanced topics
- The Mulan open-source software

Evaluation Metrics: A Taxonomy
- Based on calculation [Tsoumakas & Vlahavas, ECMLPKDD07]
  - Example-based: calculated separately for each test example and averaged across the test set
  - Label-based: calculated separately for each label and then averaged across all labels
- Based on the output of the learner
  - Binary prediction for each label
  - Ranking of the labels (example-based)
  - Probability or score for each label

Example-Based Binary (1/2)
- Notation
  - Y_i is the set of actual labels for instance x_i
  - P_i is the set of predicted labels for instance x_i
- Subset accuracy [Zhu et al., SIGIR05; Ghamrawi & McCallum, CIKM05]
  - (1/m) Σ_i I(P_i = Y_i), where I(true) = 1 and I(false) = 0
- Hamming loss [Schapire & Singer, MLJ00]
  - (1/m) Σ_i |P_i Δ Y_i| / q, where Δ is the symmetric difference (XOR) of the two sets
  - The average binary classification error
- Accuracy [Godbole & Sarawagi, PAKDD04]
  - (1/m) Σ_i |P_i ∩ Y_i| / |P_i ∪ Y_i|

Example-Based Binary (2/2)
- Information retrieval view [Godbole & Sarawagi, PAKDD04]
  - Precision: (1/m) Σ_i |P_i ∩ Y_i| / |P_i|
  - Recall: (1/m) Σ_i |P_i ∩ Y_i| / |Y_i|
  - F-measure: the harmonic mean of precision and recall
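A minimal Java sketch of these example-based binary measures, assuming label sets are represented as boolean arrays (the two test examples here are illustrative, not from the tutorial):

// Minimal sketch: example-based binary metrics over a small test set.
// Y[i] = true labels, P[i] = predicted labels, boolean arrays of length q.
public class ExampleBasedMetrics {
    public static void main(String[] args) {
        boolean[][] Y = {{true, false, false, true}, {false, false, true, true}};
        boolean[][] P = {{true, false, true, true}, {false, false, true, false}};
        int m = Y.length, q = Y[0].length;
        double subset = 0, hamming = 0, acc = 0, prec = 0, rec = 0, f1 = 0;
        for (int i = 0; i < m; i++) {
            int inter = 0, union = 0, xor = 0, sizeY = 0, sizeP = 0;
            for (int j = 0; j < q; j++) {
                if (Y[i][j] && P[i][j]) inter++;
                if (Y[i][j] || P[i][j]) union++;
                if (Y[i][j] != P[i][j]) xor++;
                if (Y[i][j]) sizeY++;
                if (P[i][j]) sizeP++;
            }
            subset += (xor == 0) ? 1 : 0;        // I(P_i == Y_i)
            hamming += (double) xor / q;          // symmetric difference / q
            acc += union == 0 ? 1 : (double) inter / union;
            double p = sizeP == 0 ? 0 : (double) inter / sizeP;
            double r = sizeY == 0 ? 0 : (double) inter / sizeY;
            prec += p; rec += r;
            f1 += (p + r == 0) ? 0 : 2 * p * r / (p + r);
        }
        System.out.printf("subsetAcc=%.2f hamLoss=%.2f acc=%.2f P=%.2f R=%.2f F1=%.2f%n",
                subset / m, hamming / m, acc / m, prec / m, rec / m, f1 / m);
    }
}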

Example-Based Ranking (1/2) [Schapire & Singer, MLJ00]
- One-error
  - Evaluates how many times the top-ranked label is not in the set of proper labels of the example
  - (1/m) Σ_i I(argmin_λ r_i(λ) ∉ Y_i)
- Coverage
  - Evaluates how many steps are needed, on average, to go down the label ranking to cover all proper labels of the example
  - (1/m) Σ_i max_{λ ∈ Y_i} r_i(λ) - 1

Example-Based Ranking (2/2) [Schapire & Singer, MLJ00]
- Ranking loss
  - Evaluates the average fraction of label pairs that are mis-ordered for the instance
  - (1/m) Σ_i |{(λ, λ') ∈ Y_i × Ȳ_i : r_i(λ) > r_i(λ')}| / (|Y_i| |Ȳ_i|)
- Average precision
  - Evaluates the average fraction of labels ranked above a proper label that are also proper labels
  - (1/m) Σ_i (1/|Y_i|) Σ_{λ ∈ Y_i} |{λ' ∈ Y_i : r_i(λ') ≤ r_i(λ)}| / r_i(λ)
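The ranking measures are easy to get wrong by one position, so here is a minimal Java sketch for a single illustrative example, assuming ranks start at 1 (top of the list):

// Minimal sketch: example-based ranking metrics for one example.
// rank[j] = position (1 = top) of label j; relevant[j] = ground truth.
public class RankingMetrics {
    public static void main(String[] args) {
        int[] rank = {2, 4, 1, 3};                       // λ3 ranked first
        boolean[] relevant = {true, false, false, true}; // Y = {λ1, λ4}
        int q = rank.length;
        // one-error: is the top-ranked label irrelevant?
        int top = -1;
        for (int j = 0; j < q; j++) if (rank[j] == 1) top = j;
        int oneError = relevant[top] ? 0 : 1;
        // coverage: worst rank of a relevant label, minus 1
        int cov = 0;
        for (int j = 0; j < q; j++) if (relevant[j]) cov = Math.max(cov, rank[j]);
        cov -= 1;
        int mis = 0, pairs = 0; // for ranking loss
        double ap = 0;          // for average precision
        int numRel = 0;
        for (int j = 0; j < q; j++) {
            if (!relevant[j]) continue;
            numRel++;
            int atOrAbove = 0;  // relevant labels ranked at or above λj
            for (int k = 0; k < q; k++) {
                if (!relevant[k]) { pairs++; if (rank[k] < rank[j]) mis++; }
                else if (rank[k] <= rank[j]) atOrAbove++;
            }
            ap += (double) atOrAbove / rank[j];
        }
        System.out.printf("oneError=%d coverage=%d rloss=%.2f avgPrec=%.2f%n",
                oneError, cov, (double) mis / pairs, ap / numRel);
    }
}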

Label-Based Binary
- Let B(TPj, FPj, TNj, FNj) be a binary evaluation measure calculated from the contingency table of label λj, e.g. accuracy = (TPj + TNj) / (TPj + FPj + TNj + FNj)

Contingency table for λj:
                      Actual POS | Actual NEG
Learner output POS  | TPj        | FPj
Learner output NEG  | FNj        | TNj

- Macro-averaging: ordinary averaging of a binary measure
  - B_macro = (1/q) Σ_j B(TPj, FPj, TNj, FNj)
- Micro-averaging: labels as different instances of the same global label
  - B_micro = B(Σ_j TPj, Σ_j FPj, Σ_j TNj, Σ_j FNj)
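A minimal Java sketch contrasting the two averaging schemes for the F-measure (the per-label counts are illustrative):

// Minimal sketch: macro- vs micro-averaged F1 from per-label counts.
public class MacroMicro {
    public static void main(String[] args) {
        // tp/fp/fn per label (tn is not needed for F1); illustrative numbers
        int[] tp = {50, 5}, fp = {10, 20}, fn = {10, 5};
        double macroF1 = 0;
        int TP = 0, FP = 0, FN = 0;
        for (int j = 0; j < tp.length; j++) {
            macroF1 += f1(tp[j], fp[j], fn[j]); // average of per-label F1
            TP += tp[j]; FP += fp[j]; FN += fn[j];
        }
        macroF1 /= tp.length;
        double microF1 = f1(TP, FP, FN);        // F1 of the pooled counts
        System.out.printf("macroF1=%.3f microF1=%.3f%n", macroF1, microF1);
    }
    static double f1(int tp, int fp, int fn) {
        return (2.0 * tp) / (2.0 * tp + fp + fn);
    }
}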

Probabilities or Scores per Label
- A bipartition can be obtained via thresholding: example/label-based binary measures
- A ranking can be obtained after solving ties: example-based ranking measures
- A vertical (per-label) ranking can be computed: ranking measures calculated vertically (label-based)
- Threshold-independent label-based evaluation
  - Only for probabilities!
  - Area under a ROC or PR curve

Which Measure Should I Use?
- Computer-aided annotation by humans (e.g. tag suggestion in Web 2.0 systems): example-based ranking measures
- Automated annotation for retrieval: macro-averaged F-measure
- Direct marketing: example-based precision and ranking
- Query categorization: example-based precision
- Ideas?

Outline
- Introduction
- Overview of existing techniques
  - Problem transformation methods
  - Algorithm adaptation methods
  - From ranking to classification
- Advanced topics
- The Mulan open-source software


A Categorization [Tsoumakas & Katakis, IJDWM07]
- Problem transformation methods
  - Transform the learning task into one or more single-label classification tasks
  - Algorithm independent
  - Some could be used for feature selection as well
- Algorithm adaptation methods
  - Extend specific learning algorithms to handle multi-label data directly
  - Boosting, generative (Bayesian), SVM, decision tree, neural network, lazy, ...

Problem Transformation Methods
- Binary relevance
- Ranking via single-label learning
- Pairwise methods: ranking by pairwise comparison, calibrated label ranking
- Methods that combine labels: Label Powerset, Pruned Sets
- Ensemble methods: RAkEL, EPS

Example Multi-Label Dataset

L = {λ1, λ2, λ3, λ4}

Ex # | Features | Label set
1    | x1       | {λ1, λ4}
2    | x2       | {λ3, λ4}
3    | x3       | {λ1}
4    | x4       | {λ2, λ3, λ4}

Binary Relevance (BR)
- How it works
  - Learns one binary classifier for each label
  - Outputs the union of their predictions
  - Can do ranking if the classifiers output scores
- Limitation: does not consider label relationships
- Complexity: O(qm)

The example dataset is transformed into one binary dataset per label:

Ex # | λ1    | λ2    | λ3    | λ4
1    | true  | false | false | true
2    | false | false | true  | true
3    | true  | false | false | false
4    | false | true  | true  | true
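A minimal Java sketch of the BR transformation itself (training the q classifiers is omitted; the boolean matrix encodes the label sets of the example dataset above):

// Minimal sketch: the BR transformation. From one multi-label dataset it
// derives q binary target vectors, one per label; each would then be paired
// with the shared feature vectors to train q independent binary classifiers.
public class BinaryRelevanceTransform {
    public static void main(String[] args) {
        boolean[][] y = {
            {true, false, false, true},
            {false, false, true, true},
            {true, false, false, false},
            {false, true, true, true}
        };
        int m = y.length, q = y[0].length;
        for (int j = 0; j < q; j++) {
            StringBuilder sb = new StringBuilder("dataset for λ" + (j + 1) + ":");
            for (int i = 0; i < m; i++)
                sb.append(" ex").append(i + 1).append('=').append(y[i][j]);
            System.out.println(sb); // e.g. "dataset for λ1: ex1=true ex2=false ..."
        }
    }
}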

Ranking via Single-Label Learning
- Basic concept
  - Transform the multi-label dataset to a single-label multi-class dataset with the labels as classes
  - A single-label classifier that outputs a score (e.g. probability) for each class can produce a ranking
- Transformations [Boutell et al., PR04; Chen et al., ICDM07]: ignore; select-max, select-min, select-random; copy, copy-weight (entropy)

Ignore
- Simply ignore all multi-label examples: major information loss!

Ex # | Label set            Ex # | Label
1    | {λ1, λ4}             3    | λ1
2    | {λ3, λ4}
3    | {λ1}
4    | {λ2, λ3, λ4}

Select Min, Max and Random
- Select one of the labels: the most frequent (Max), the least frequent (Min), or a random one: information loss!

Ex # | Label set      | Max | Min | Random
1    | {λ1, λ4}       | λ4  | λ1  | λ4
2    | {λ3, λ4}       | λ4  | λ3  | λ3
3    | {λ1}           | λ1  | λ1  | λ1
4    | {λ2, λ3, λ4}   | λ4  | λ2  | λ2

Copy and Copy-Weight (Entropy)
- Replace each example (x_i, Y_i) with |Y_i| examples (x_i, λ_j), one for each λ_j ∈ Y_i
- Copy-weight requires learners that take example weights into account: it weights each copy by 1/|Y_i|
- No information loss, but the number of examples increases to O(mc)

Ex # | Label | Weight
1a   | λ1    | 0.50
1b   | λ4    | 0.50
2a   | λ3    | 0.50
2b   | λ4    | 0.50
3    | λ1    | 1.00
4a   | λ2    | 0.33
4b   | λ3    | 0.33
4c   | λ4    | 0.33

Ranking by Pairwise Comparison (1/4)
- How it works [Hullermeier et al., AIJ08]
  - Learns q(q-1)/2 binary models, one for each pair of labels (λi, λj), 1 ≤ i < j ≤ q
  - Each model is trained on the examples that are annotated with at least one of the two labels, but not both, and learns to separate the corresponding labels
  - Given a new instance, all models are invoked and a ranking is obtained by counting the votes received by each label

Ranking by Pairwise Comparison (2/4)

Original dataset (Ex #: label set): 1: {λ1, λ4}, 2: {λ3, λ4}, 3: {λ1}, 4: {λ2, λ3, λ4}

Pairwise training sets (true = the first label of the pair is the relevant one):
1vs2: ex1 true, ex3 true, ex4 false
1vs3: ex1 true, ex2 false, ex3 true, ex4 false
1vs4: ex2 false, ex3 true, ex4 false
2vs3: ex2 false
2vs4: ex1 false, ex2 false
3vs4: ex1 false

Ranking by Pairwise Comparison (3/4)

For a new instance x', the six models predict: 1vs2 → λ1, 1vs3 → λ3, 1vs4 → λ1, 2vs3 → λ3, 2vs4 → λ2, 3vs4 → λ3

Label | Votes
λ1    | 2
λ2    | 1
λ3    | 3
λ4    | 0

Ranking: λ3 > λ1 > λ2 > λ4
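The vote-counting step can be sketched in a few lines of Java (the pairwise predictions below are hard-coded from the worked example; a real implementation would obtain them from the q(q-1)/2 trained models):

// Minimal sketch: RPC vote counting. winner[a][b] holds the (0-based) label
// index, a or b, predicted by the model for the pair (λa, λb), a < b.
public class PairwiseVoting {
    public static void main(String[] args) {
        int q = 4;
        // 1vs2->λ1, 1vs3->λ3, 1vs4->λ1, 2vs3->λ3, 2vs4->λ2, 3vs4->λ3
        int[][] winner = {
            {-1, 0, 2, 0},
            {-1, -1, 2, 1},
            {-1, -1, -1, 2}
        };
        int[] votes = new int[q];
        for (int a = 0; a < q - 1; a++)
            for (int b = a + 1; b < q; b++)
                votes[winner[a][b]]++;
        for (int j = 0; j < q; j++)
            System.out.println("λ" + (j + 1) + ": " + votes[j] + " votes");
        // prints λ1: 2, λ2: 1, λ3: 3, λ4: 0 -> ranking λ3 > λ1 > λ2 > λ4
    }
}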

Ranking by Pairwise Comparison (4/4)
- Time complexity
  - Training: O(mqc), where c is the label cardinality; each example x appears in |Px|(q-|Px|) < |Px|q datasets
  - Testing: needs to query O(q^2) binary models
- Space complexity
  - Needs to maintain O(q^2) binary models in memory
  - Pairwise decision tree/rule learning models might be simpler than one-vs-rest ones
  - Perceptrons/SVMs store a constant number of parameters per model

Calibrated Label Ranking (1/4)
- How it works [Furnkranz et al., MLJ08]
  - Extends ranking by pairwise comparison by introducing an additional virtual label λV, with the purpose of separating positive from negative labels
  - Pairwise models that include the virtual label correspond to the models of binary relevance
    - All examples are used
    - When a label is true, the virtual label is considered false
    - When a label is false, the virtual label is considered true
  - The final ranking includes the virtual label, which acts as the split point between positive and negative labels

Calibrated Label Ranking (2/4)

Original dataset (Ex #: label set): 1: {λ1, λ4}, 2: {λ3, λ4}, 3: {λ1}, 4: {λ2, λ3, λ4}

Models against the virtual label (these are the BR models):
1vsV: ex1 true, ex2 false, ex3 true, ex4 false
2vsV: ex1 false, ex2 false, ex3 false, ex4 true
3vsV: ex1 false, ex2 true, ex3 false, ex4 true
4vsV: ex1 true, ex2 true, ex3 false, ex4 true

Pairwise models (as in RPC):
1vs2: ex1 true, ex3 true, ex4 false
1vs3: ex1 true, ex2 false, ex3 true, ex4 false
1vs4: ex2 false, ex3 true, ex4 false
2vs3: ex2 false
2vs4: ex1 false, ex2 false
3vs4: ex1 false

Calibrated Label Ranking (3/4)

For a new instance x', the pairwise models predict 1vs2 → λ1, 1vs3 → λ1, 1vs4 → λ1, 2vs3 → λ2, 2vs4 → λ2, 3vs4 → λ4, and the virtual-label models predict 1vsV → λ1, 2vsV → λV, 3vsV → λV, 4vsV → λV

Label | Votes
λ1    | 4
λ2    | 2
λ3    | 0
λ4    | 1
λV    | 3

Ranking: λ1 > λV > λ2 > λ4 > λ3, so the bipartition is {λ1} vs {λ2, λ3, λ4}

Calibrated Label Ranking (4/4)
- Benefits
  - Improved ranking performance
  - Classification and ranking (consistent)
- Limitations
  - Space complexity (as in RPC); a solution for perceptrons [Loza Mencia & Furnkranz, ECMLPKDD08]
  - Querying q^2 + q models at runtime; the QWeighted algorithm [Loza Mencia et al., ESANN09]

Label Powerset (LP) (1/5)
- How it works
  - Each different set of labels in a multi-label training set becomes a different class in a new single-label classification task
  - Given a new instance, the single-label classifier of LP outputs the most probable class (a set of labels)

Ex # | Label set      | Class
1    | {λ1, λ4}       | 1001
2    | {λ3, λ4}       | 0011
3    | {λ1}           | 1000
4    | {λ2, λ3, λ4}   | 0111
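A minimal Java sketch of the LP transformation, mapping each distinct labelset to a class id (the 0/1 string encoding mirrors the table above):

// Minimal sketch: the LP transformation. Each distinct labelset, encoded as
// a 0/1 string such as "1001", becomes a class of a multi-class problem.
import java.util.LinkedHashMap;
import java.util.Map;

public class LabelPowersetTransform {
    public static void main(String[] args) {
        boolean[][] y = {
            {true, false, false, true},
            {false, false, true, true},
            {true, false, false, false},
            {false, true, true, true}
        };
        Map<String, Integer> classOf = new LinkedHashMap<>();
        for (int i = 0; i < y.length; i++) {
            StringBuilder code = new StringBuilder();
            for (boolean b : y[i]) code.append(b ? '1' : '0');
            Integer c = classOf.get(code.toString());
            if (c == null) {                  // first time this labelset is seen
                c = classOf.size();
                classOf.put(code.toString(), c);
            }
            System.out.println("ex" + (i + 1) + " -> class " + c + " (" + code + ")");
        }
        System.out.println(classOf.size() + " distinct classes");
    }
}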

Label Powerset (LP) (2/5)
- Ranking
  - Possible if a classifier that outputs scores (e.g. probabilities) is used [Read, NZCSRS08]: the score of each label is the sum of the scores of the classes that contain it
  - Are the bipartition and ranking always consistent? Consider two class distributions, P1 and P2:

c    | P1(c|x) | P2(c|x) | λ1 λ2 λ3 λ4
1001 | 0.7     | 0.1     | 1  0  0  1
0011 | 0.2     | 0.3     | 0  0  1  1
1000 | 0.1     | 0.4     | 1  0  0  0
0111 | 0.0     | 0.2     | 0  1  1  1

Σ P(c|x)·λj, per label: under P1: 0.8, 0.0, 0.2, 0.9; under P2: 0.5, 0.2, 0.5, 0.6

- Under P2 the most probable class is 1000, giving the bipartition {λ1}, yet the per-label sums rank λ4 highest, so bipartition and ranking are not always consistent

Label Powerset (LP) (3/5)
- Complexity
  - Depends on the number of distinct labelsets that exist in the training set
  - Upper bounded by min(m, 2^q); usually much smaller, but still larger than q
- Limitations
  - High complexity
  - Limited training examples for many classes
  - Cannot predict unseen labelsets

Label Powerset (LP) (4/5)

Labelsets per dataset (Bound = min(m, 2^q); Actual = distinct labelsets; Diversity = Actual/Bound):

Dataset   | m     | q   | Bound | Actual | Diversity
emotions  | 593   | 6   | 64    | 27     | 0.42
enron     | 1702  | 53  | 1702  | 753    | 0.44
hifind    | 32971 | 632 | 32971 | 32734  | 0.99
mediamill | 43907 | 101 | 43907 | 6555   | 0.15
medical   | 978   | 45  | 978   | 94     | 0.10
scene     | 2407  | 6   | 64    | 15     | 0.23
tmc2007   | 28596 | 22  | 28596 | 1341   | 0.05
yeast     | 2417  | 14  | 2417  | 198    | 0.08

Label Powerset (LP) (5/5)

[Figure: log-log plot for mediamill of the number of classes (label combinations) against their number of appearances; e.g. 4104 labelsets appear only once, 895 twice, 384 three times, 215 four, 156 five, 111 six, 85 seven, 77 eight, 48 nine, and 39 ten times.]

Pruned Sets (1/2)
- How it works [Read, NZCSRS08; Read et al., ICDM08]
  - Follows the transformation of LP, but also ...
  - Prunes examples whose labelsets (classes) occur fewer times than a small user-defined threshold p (e.g. 2 or 3): this deals with the large number of infrequent classes
  - Re-introduces pruned examples along with subsets of their labelsets that do occur more than p times
    - Strategy A: rank the subsets by size/number of examples and keep the top b of those
    - Strategy B: keep all subsets of size greater than b

Pruned Sets (2/2)

Example with p = 3:

Labelset       | Count
λ1             | 16
λ2             | 14
{λ2, λ3}       | 12
{λ1, λ4}       | 8
{λ3, λ4}       | 7
{λ1, λ2, λ3}   | 2

The labelset {λ1, λ2, λ3} occurs fewer than p times, so its examples are pruned and re-introduced with frequent subsets; the candidates are {λ2, λ3} (size 2, count 12), {λ1} (size 1, count 16), and {λ2} (size 1, count 14):
- Strategy A with b = 2 keeps the top two ranked subsets
- Strategy B with b = 1 keeps only the subsets of size greater than 1, i.e. {λ2, λ3}

Random k-Labelsets (1/3)
- How it works [Tsoumakas & Vlahavas, ECMLPKDD07]
  - Randomly break a large set of labels into a number (n) of subsets of small size (k), called k-labelsets
  - For each of them, train a multi-label classifier using the LP method
  - Given a new instance, query the models and average their decisions per label; thresholding gives the final prediction
- Benefits
  - Computationally simpler sub-problems
  - More balanced training sets
  - Can predict unseen labelsets

Random k-Labelsets (2/3)

Seven LP models h1..h7 are trained on the 3-labelsets {λ1, λ2, λ6}, {λ2, λ3, λ4}, {λ3, λ5, λ6}, {λ2, λ4, λ5}, {λ1, λ4, λ5}, {λ1, λ2, λ3}, {λ1, λ4, λ6}; each outputs 0/1 predictions for its own labels. Averaging the votes per label gives λ1: 3/4, λ2: 1/4, λ3: 2/3, λ4: 1/4, λ5: 1/3, λ6: 2/3, and thresholding at 0.5 yields the final prediction 1 0 1 0 0 1, i.e. {λ1, λ3, λ6}.
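A minimal Java sketch of the vote-averaging step, here with three hypothetical member models rather than the seven above:

// Minimal sketch: RAkEL-style vote averaging. Each member votes 0/1 only on
// the labels of its own k-labelset; votes are averaged per label and
// thresholded (0.5 here) for the final bipartition.
public class RakelVoting {
    public static void main(String[] args) {
        int q = 6;
        double[] sum = new double[q];
        int[] count = new int[q];
        // each row: a member's k-labelset (0-based) and its 0/1 predictions
        int[][] labelsets = {{0, 1, 5}, {1, 2, 3}, {2, 4, 5}};
        int[][] preds     = {{1, 0, 1}, {0, 1, 0}, {1, 0, 0}};
        for (int h = 0; h < labelsets.length; h++)
            for (int p = 0; p < labelsets[h].length; p++) {
                sum[labelsets[h][p]] += preds[h][p];
                count[labelsets[h][p]]++;
            }
        for (int j = 0; j < q; j++) {
            double avg = count[j] == 0 ? 0 : sum[j] / count[j];
            System.out.printf("λ%d: avg=%.2f -> %d%n", j + 1, avg, avg > 0.5 ? 1 : 0);
        }
    }
}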

Random k-Labelsets (3/3)
- Comments
  - The mean number of votes per label is nk/q; the larger it is, the higher the effectiveness
  - This characterizes RAkEL as an ensemble method
- How to set the parameters k and n?
  - k should be small enough to avoid LP's problems
  - n should be large enough to obtain more votes per label
  - Proposed default parameters: k = 3, n = 2q (6 votes per label)

Ensembles of Pruned Sets
- How it works [Read et al., ICDM08]
  - Constructs n pruned-sets models by sampling the training set (e.g. 63%)
  - Given a new instance, queries the models and averages their decisions (each decision concerns all labels)
  - A ranking is obtained; thresholding is used to obtain the bipartition

Outline
- Introduction
- Overview of existing techniques
  - Problem transformation methods
  - Algorithm adaptation methods
  - From ranking to classification
- Advanced topics
- The Mulan open-source software


Algorithm Adaptation Methods
- Boosting: AdaBoost.MH [Schapire & Singer, MLJ00]
- Generative (Bayesian): [McCallum, AAAI99w], [Ueda & Saito, NIPS03]
- SVM: Rank-SVM [Elisseeff & Weston, NIPS02]
- Decision tree: Multi-label C4.5 [Clare & King, PKDD01]
- Neural network: BP-MLL [Zhang & Zhou, TKDE06]
- Lazy (kNN): ML-kNN [Zhang & Zhou, PRJ07]
- ...

AdaBoost.MH
- Description: the core of BoosTexter, a successful multi-label text categorization system [Schapire & Singer, MLJ00]
- Basic strategy: map the original multi-label learning problem into a binary learning problem, which is then solved by the traditional AdaBoost algorithm [Freund & Schapire, JCSS97]
- Example transformation: transform each multi-label training example (x_i, Y_i) into q binary-labeled examples, one per label y; the new instance is the concatenation of x_i and y, and its binary label is +1 if y ∈ Y_i and -1 otherwise

AdaBoost.MH (Cont’)
- Training procedure: classical AdaBoost is employed to learn from the transformed binary-labeled examples iteratively
- Weak hypotheses (base learners): decision stumps (one-level decision trees); e.g. for a text categorization task, each possible term w (e.g. a bigram) specifies a weak hypothesis that, for a text document x and label y, predicts c_1y if w occurs in x and c_0y otherwise, where c_0y and c_1y are the predicted outputs
- In each boosting round, the choice of weak hypothesis as well as its combination weight is optimized towards minimizing the empirical Hamming loss

Generative (Bayesian) Approach
- Description: model the generative procedure of multi-label texts [McCallum, AAAI99w] [Ueda & Saito, NIPS03]
- Basic assumption: the word distribution given a set of labels is determined by a mixture (linear combination) of word distributions, one for each single label
- Settings: a word vocabulary; P(w|y), the word distribution given a single label y; the word distribution given a set of labels Y, P(w|Y) = Σ_{y∈Y} π_y P(w|y), with a q-dimensional mixture weight vector π for Y

Generative (Bayesian) Approach (Cont’)
- MAP (Maximum A Posteriori) principle: given a test document x*, its associated label set Y* is determined as Y* = argmax_Y P(Y|x*) = argmax_Y P(Y) P(x*|Y) [applying the Bayes rule] = argmax_Y P(Y) Π_{w∈x*} P(w|Y) [assuming word independence]
  - The prior probability P(Y) is directly estimated from the training set by frequency counting
  - P(w|Y) is the mixture of word distributions; the parameters π and P(w|y) are learned by an EM-style procedure
- Note: these two generative approaches are specific to text applications rather than general-purpose multi-label learning methods

Rank-SVM
- Description: a maximum-margin approach to multi-label learning, implemented with the kernel trick to incorporate non-linearity [Elisseeff & Weston, NIPS02]
- Basic strategy: assume one classifier for each individual label, define a "multi-label margin" on the whole training set, and optimize it under a QP (quadratic programming) framework
- Classification system: q linear classifiers f_k(x) = <w_k, x> + b_k, each with weight vector w_k and bias b_k

Rank-SVM (Cont’)
- Margin definition
  - Margin of a multi-label example (x_i, Y_i): labels in Y_i should be ranked higher than labels not in Y_i, i.e. min_{(k,l) ∈ Y_i × Ȳ_i} (<w_k - w_l, x_i> + b_k - b_l) / ||w_k - w_l||
  - Margin on the training set S: the minimum margin over all its examples
- QP formulation (ideal case): maximize the margin subject to correct rankings; solved by introducing slack variables and then optimized in its dual form (with incorporation of the kernel trick)

Multi-Label C4.5
- Description: an extension of the popular C4.5 decision tree to deal with multi-label data [Clare & King, PKDD01]
- Basic strategy: define a "multi-label entropy" over a set of multi-label examples, based on which the information gain of a splitting attribute is calculated; a decision tree is then constructed recursively in the same way as C4.5
- Multi-label entropy: given a set S of multi-label examples, let p(y) denote the probability that an example in S has label y; then
  MLEnt(S) = - Σ_y [ p(y) log p(y) + (1 - p(y)) log(1 - p(y)) ]

BP-MLL
- Description: an extension of popular BP (back-propagation) neural networks to deal with multi-label data [Zhang & Zhou, TKDE06]
- Basic strategy: define a novel global error function capturing the characteristics of multi-label learning, i.e. labels belonging to an example should be ranked higher than those not belonging to that example
- Network architecture
  - Single-hidden-layer feed-forward neural network; adjacent layers are fully connected
  - Each input unit corresponds to a dimension of the input space
  - Each output unit corresponds to an individual label

BP-MLL (Cont’)
- Global error function: given a multi-label training set S, the global training error is E = Σ_i E_i, where the error of the network on (x_i, Y_i) is
  E_i = (1 / (|Y_i| |Ȳ_i|)) Σ_{(j,k) ∈ Y_i × Ȳ_i} exp(-(c_ij - c_ik)),
  with c_ij the actual network output on x_i for the j-th label
- This approximately optimizes the ranking loss criterion: it leads the system to output larger values for the labels belonging to the test instance and smaller values for the labels not belonging to it
- Parameter optimization: gradient descent with the error back-propagation strategy

ML-kNN
- Description: an extension of the popular kNN to deal with multi-label data [Zhang & Zhou, PRJ07]
- Basic strategy: based on statistical information derived from the neighboring examples (a membership counting statistic), the MAP principle is utilized to determine the label set of an unseen example
- Settings
  - N(x): the k nearest neighbors of x identified in the training set
  - C_x: a q-dimensional membership counting vector, whose l-th component counts the number of examples in N(x) having the l-th label
  - H_l^1 (H_l^0): the event that an example has (does not have) the l-th label
  - E_j^l: the event that there are exactly j examples in N(x) having the l-th label

ML-kNN (Cont’)
- Procedure: given a test example x, its associated label set Y is determined as follows
  - Identify its k nearest neighbors N(x) in the training set
  - Compute its membership counting vector C_x based on N(x)
  - Determine each label using the MAP principle: the l-th label is predicted iff P(H_l^1) P(E_{C_x(l)}^l | H_l^1) > P(H_l^0) P(E_{C_x(l)}^l | H_l^0)
- The probabilities needed are directly estimated from the training set by frequency counting
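A minimal Java sketch of the per-label MAP decision, assuming the prior and the likelihood arrays have already been estimated from the training set by frequency counting (all numbers here are illustrative):

// Minimal sketch of the ML-kNN decision rule for one label l, with
// pre-estimated prior P(H_l^1) and likelihoods P(E_j^l | H_l^b), j = 0..k.
public class MlKnnDecision {
    public static void main(String[] args) {
        int k = 5; // number of neighbors; likelihood arrays have k+1 entries
        double priorHas = 0.3;                                  // P(H_l^1)
        double[] likHas = {0.05, 0.10, 0.20, 0.30, 0.20, 0.15}; // P(E_j | H_l^1)
        double[] likNot = {0.40, 0.30, 0.15, 0.10, 0.04, 0.01}; // P(E_j | H_l^0)
        int c = 4; // membership count: neighbors in N(x) having label l
        double scoreHas = priorHas * likHas[c];
        double scoreNot = (1 - priorHas) * likNot[c];
        boolean predict = scoreHas > scoreNot;                  // MAP principle
        System.out.println("P(H1)P(E|H1)=" + scoreHas + " P(H0)P(E|H0)=" + scoreNot
                + " -> label " + (predict ? "present" : "absent"));
    }
}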

Outline
- Introduction
- Overview of existing techniques
  - Problem transformation methods
  - Algorithm adaptation methods
  - From ranking to classification
- Advanced topics
- The Mulan open-source software

From Ranking to Classification
- BR can output numerical scores for each label
  - E.g. perceptrons, SVMs, decision trees, kNN, Bayes
  - An intuitive threshold turns these scores into 0/1 decisions (e.g. 0 for perceptrons and SVMs, 0.5 for probabilistic/confidence outputs)
- The same applies to other problem transformation and algorithm adaptation methods
- Are there general approaches to deliver a classification from a ranking?

From Ranking to Classification
- Thresholding strategies
  - RCut, PCut, SCut, RTCut, SCutFBR [Yang, SIGIR01]
  - A study of SCutFBR [Fan and Lin, TechRep07]
- Learning the number of labels
  - Based on the ranking [Elisseeff and Weston, NIPS02]
  - Based on the content [Tang et al., WWW09]

Thresholding Strategies: RCut
- How it works: given a document, it sorts the labels by score and selects the top k labels, where k ∈ [1, q]
- How to set the parameter k?
  - It can be specified by the user; typically it is set to the label cardinality of the training set
  - It can be globally tuned using a validation set
- Comments
  - What if the number of labels per example varies?
  - It does not perform well in practice [Tang et al., WWW09]
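A minimal Java sketch of RCut (the scores are illustrative):

// Minimal sketch of RCut: sort labels by score and keep the top k.
import java.util.Arrays;
import java.util.Comparator;

public class RCut {
    public static void main(String[] args) {
        double[] scores = {0.9, 0.2, 0.7, 0.4}; // one score per label
        int k = 2;                              // e.g. rounded label cardinality
        Integer[] idx = {0, 1, 2, 3};
        // sort label indices by descending score
        Arrays.sort(idx, Comparator.comparingDouble((Integer j) -> -scores[j]));
        for (int r = 0; r < k; r++)
            System.out.println("select λ" + (idx[r] + 1)); // λ1, λ3
    }
}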

Thresholding Strategies: PCut
- How it works
  - For each label λj, sort the test instances by score and assign λj to the top kj = k·P(λj) test instances
  - P(λj) is the prior probability of a document belonging to λj (estimated on the training set)
  - k is a proportionality constant that trades off false positives against false negatives; it can be globally tuned as in RCut
- Comments: it requires the prediction scores for all test instances, so it is not suitable for online decisions

Thresholding Strategies: SCut
- How it works: for each label λj, tune a separate threshold based on a validation set
- Comments
  - In contrast to RCut and PCut, it tunes a separate parameter for each label
  - Requires a validation phase whose complexity is linear with respect to q
  - Overfits when the binary learning problem is unbalanced (few positive examples), yielding too high or too low thresholds

Thresholding Strategies: RTCut
- How it works
  - Given a document, it sorts the labels by a synthetic score combining rank and score and selects those above a threshold t
  - t is optimized using a validation set
- Synthetic score: ss(λj) = r(λj) + s(λj) / (max_{j∈L} s(λj) + 1)

Thresholding Strategies: SCutFBR
- How it works
  - When the SCut threshold is too high, macro-F1 is hurt
  - When the SCut threshold is too low, both macro- and micro-F1 are hurt (so prefer increasing the threshold)
  - Solution: when the tuned threshold falls below a given value fbr, then ...
    - SCutFBR.0: set the threshold to infinity
    - SCutFBR.1: set the threshold to the largest score observed during validation

Learning the Number of Labels
- [Elisseeff & Weston, NIPS02]
  - Input: a q-dimensional feature space with the obtained scores for each label
  - Output: the threshold t that minimizes the symmetric difference between predicted and true sets
  - Learning: linear least squares
- [Tang et al., WWW09]
  - Input: original feature space, scores, sorted scores
  - Output: the size of the labelset
  - Learning: multi-class classification, with a cut-off parameter

Outline
- Introduction
- Overview of existing techniques
- Advanced topics
  - Learning in the presence of label structure
  - Multi-instance multi-label learning
- The Mulan open-source software


Hierarchy Types and Implications
- Trees
  - Single parent per label
  - When an object is labeled with a node, it is also labeled with its parent (paths)
  - Path types: annotation paths end at a leaf (full paths) or at internal nodes (partial paths)
- Directed acyclic graphs (DAGs)
  - Multiple parents per label
  - When an object is labeled with a node, either it is also labeled with all its parents (multiple inheritance), or with at least one of its parents

A Simple Approach
- Ignore the hierarchy
  - Simple binary relevance
  - Should be used as a baseline
  - In the case of full paths, learn leaf models only!

Hierarchical Binary Relevance
- Training [Koller & Sahami, ICML97; Cesa-Bianchi et al., JMLR06]
  - One binary model at each node, using only those examples that are annotated with the parent node
- Predictions are formed top-down
  - A node can predict true only if its parent predicted true
  - What about probabilities? p(λ) = p(λ | par(λ)) · p(par(λ))
  - When thresholding, the threshold for a node should not be higher than that of its parent
- Comments: handles both partial and full paths
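A minimal Java sketch of the probability chaining on a hypothetical three-node path (the taxonomy and the conditional probabilities are made up for illustration):

// Minimal sketch: top-down probability chaining in hierarchical BR,
// p(λ) = p(λ | par(λ)) * p(par(λ)), on a tiny hypothetical tree.
import java.util.LinkedHashMap;
import java.util.Map;

public class TopDownHbr {
    public static void main(String[] args) {
        // parent of each node (null = child of the root); parents listed first
        Map<String, String> parent = new LinkedHashMap<>();
        parent.put("science", null);
        parent.put("physics", "science");
        parent.put("optics", "physics");
        // conditional probabilities p(λ | par(λ)) from the per-node models
        Map<String, Double> cond = Map.of("science", 0.9, "physics", 0.8, "optics", 0.4);
        Map<String, Double> prob = new LinkedHashMap<>();
        for (String node : parent.keySet()) {
            String par = parent.get(node);
            double pPar = par == null ? 1.0 : prob.get(par);
            prob.put(node, cond.get(node) * pPar);
        }
        System.out.println(prob); // science=0.9, physics≈0.72, optics≈0.29
    }
}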

Hierarchical Multi-Label Learning
- Generalization of HBR [Tsoumakas et al., ECMLPKDD08w]
  - Training and testing follow the same approach as HBR
  - One multi-label learner is trained at each internal node
  - If BR is used at each node, we get HBR
- TreeBoost.MH [Esuli et al., IR08]: an instantiation using AdaBoost.MH

Other Approaches
- Predictive Clustering Trees [Vens et al., MLJ08]
- B-SVM [Cesa-Bianchi et al., ICML06]
  - Train similarly to HBR
  - Bottom-up Bayesian combination of probabilities
- Bayesian networks [Barutcuoglu et al., BIOINF06]
  - Train independent binary classifiers for each label
  - Combine them using a Bayesian network to correct inconsistencies

Outline
- Introduction
- Overview of existing techniques
- Advanced topics
  - Learning in the presence of label structure
  - Multi-instance multi-label learning
- The Mulan open-source software

Consider ...
- An image usually contains multiple regions, each of which can be represented by an instance; the image can simultaneously belong to multiple classes: Elephant, Lion, Grassland, Tropic, Africa, ...

Consider ...
- A document usually contains multiple sections, each of which can be represented by an instance; the document can simultaneously belong to multiple categories: scientific novel, Jules Verne's writing, book on traveling, ...

MIML: Multi-Instance Multi-Label Learning

Why MIML?
- An appropriate representation is important: having an appropriate representation is as important as having a strong learning algorithm
- Real-world objects usually come with input ambiguity as well as output ambiguity
- Traditional supervised learning, multi-instance learning, and multi-label learning are degenerate versions of MIML

Why MIML? (Cont’)

[Figure: the four learning frameworks: traditional supervised learning (one instance, one label), multi-label learning (one instance, multiple labels), multi-instance learning (multiple instances, one label), and multi-instance multi-label learning (multiple instances, multiple labels).]

Why MIML? (Cont’)
- Learning a one-to-many mapping is an ill-posed problem: why are there multiple labels? A many-to-many mapping seems better
- Moreover, MIML offers a possibility for understanding the relationship between instances and labels: an object is represented by multiple instances corresponding to its different aspects, and each instance may account for some of the labels

Why MIML? (Cont’)
- MIML can also be helpful for learning single-label examples involving complicated high-level concepts


Multi-Instance Multi-Label Learning
- MIML task: learn a function from a given dataset {(X_i, Y_i)}, where X_i ⊆ Ξ is a set of instances {x_i1, ..., x_in_i} and Y_i ⊆ Ψ is a set of labels {y_i1, ..., y_il_i}
  - Ξ: the instance space; Ψ: the set of class labels
  - n_i: the number of instances in X_i; l_i: the number of labels in Y_i

Solving MIML by Degeneration
- Solution 1: degenerate MIML into multi-instance learning (MIL) via category-wise decomposition, and then MIL into single-instance single-label learning (SISL); MIMLBoost (built on MIBoosting) is an illustration of Solution 1
- Solution 2: degenerate MIML into multi-label learning (MLL) via representation transformation (turning ambiguous bags of instances into unambiguous single-instance representations), and then MLL into SISL; MIMLSVM (built on MLSVM) is an illustration of Solution 2
- Both may suffer from information loss during the degeneration process!

Advanced Topics
- Solving MIML by regularization; no access to original data objects; learning single-label examples involving complicated high-level concepts: Zhou et al., MIML: A Framework for Learning with Ambiguous Objects, CoRR abs/0808.3231, 2008
- Large-margin MIML algorithm M3MIML [Zhang & Zhou, ICDM'08]
- MIML for image annotation [Zha et al., CVPR'08]
- MIML metric learning [Jin et al., CVPR'09]
- ...

Outline
- Introduction
- Overview of existing techniques
- Advanced topics
- The Mulan open-source software

What It Is
- Mulan: an open-source software for multi-label learning; current version: 1.0.1
- Open-source software => scientific progress: "better reproducibility of experimental results, quicker detection of errors, innovative applications, and faster adoption of machine learning methods in other disciplines and in industry" (JMLR OSS)
- Built on top of Weka: well-established code and user base
- Programming language: Java

Data Format
- ARFF file:

@relation MultiLabelExample

@attribute feature1 numeric
@attribute feature2 numeric
@attribute feature3 numeric
@attribute label1 {0, 1}
@attribute label2 {0, 1}
@attribute label3 {0, 1}
@attribute label4 {0, 1}
@attribute label5 {0, 1}

@data
2.3,5.6,1.4,0,1,1,0,0

- XML file: declares which attributes are labels (and, optionally, their hierarchy); see the sketch below
- Datasets available at: http://mlkd.csd.auth.gr/multilabel.html
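The slide's XML example did not survive extraction; a minimal flat labels file matching the ARFF above would look roughly as follows (the namespace follows the Mulan distribution, but the exact schema should be checked against LabelsBuilder's documentation):

<labels xmlns="http://mulan.sourceforge.net/labels">
  <label name="label1"></label>
  <label name="label2"></label>
  <label name="label3"></label>
  <label name="label4"></label>
  <label name="label5"></label>
</labels>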

Hierarchies of Labels

[Figure: an example of a label hierarchy declared in the XML file.]

Validity Checks
- All labels specified in the XML file must also be defined in the ARFF file with the same name
- Label names must be unique
- Each ARFF label attribute must be nominal with binary values {0, 1}
- Data should be consistent with the hierarchy: if a child label appears in an example, then all its parent labels must also appear

mulan.core
- Exception handling infrastructure: WekaException, MulanException, MulanRuntimeException
- Util: utilities
- MulanJavadoc: documentation support

mulan.data (1/2)
- MultiLabelInstances
- LabelsMetaData and LabelsMetaDataImpl
- LabelNode and LabelNodeImpl
- LabelsBuilder: creates a LabelsMetaData instance from an XML file
- LabelSet: labelset representation class

mulan.data (2/2)
- Converters from LibSVM and CLUS formats: ConverterLibSVM, ConverterCLUS
- Statistics
  - Number of numeric/nominal attributes, labels
  - Cardinality, density, distinct labelsets
  - Phi-correlation matrix, co-occurrence matrix

MultiLabelInstances
- Fields: Instances dataSet; LabelsMetaData labelsMetaData
- Constructors: MultiLabelInstances(String arffFile, String xmlFile); MultiLabelInstances(String arffFile, int numLabels); MultiLabelInstances(Instances d, LabelsMetaData m)
- Methods: getDataset(): Instances; getLabelsMetaData(): LabelsMetaData; getLabelIndices(): int[]; getFeatureIndices(): int[]; getNumLabels(): int; clone(): MultiLabelInstances; reintegrateModifiedDataSet(Instances d): MultiLabelInstances

LabelsMetaData
- LabelsMetaDataImpl
  - Fields: Map allLabelNodes; Set rootLabelNodes
  - Methods: getRootLabels(): Set; getLabelNames(): Set; getLabelNode(String labelName): LabelNode; getNumLabels(): int; containsLabel(String labelName): boolean; isHierarchy(): boolean; addRootNode(LabelNode rootNode); removeLabelNode(String labelName): int

LabelNode
- LabelNodeImpl
  - Fields: Set childrenNodes; String name; LabelNode parentNode
  - Methods: getChildren(): Set; getChildredLabels(): Set; getDescendantLabels(): Set; getName(): String; getParent(): LabelNode; hasChildren(): boolean; hasParent(): boolean; addChildNode(LabelNode node): boolean; removeChildNode(LabelNode node): boolean

mulan.transformation
- Package mulan.transformation
  - Includes data transformation approaches, common for learning and feature selection
  - BinaryRelevance, LabelPowerset, mulan.transformation.multiclass
  - Simple transformations that support complex ones: RemoveAllLabels

mulan.transformation.multiclass
- MultiClassTransformation: transformInstances(MultiLabelInstances d): Instances
- MultiClassTransformationBase: transformInstances(MultiLabelInstances d): Instances; transformInstance(Instance i): List
- Implementations: Copy, CopyWeight, Ignore, SelectRandom, SelectBasedOnFrequency, SelectionType

mulan.classifier (1/3)
- MultiLabelLearner
- MultiLabelLearnerBase
- MultiLabelOutput
- Sub-packages: mulan.classifier.transformation, mulan.classifier.meta, mulan.classifier.lazy, mulan.classifier.neural

mulan.classifier (2/3)
- MultiLabelLearner: build(MultiLabelInstances instances); makePrediction(Instance instance): MultiLabelOutput; makeCopy(): MultiLabelLearner
- MultiLabelLearnerBase
  - Fields: int numLabels; int[] labelIndices, featureIndices; boolean isDebug
  - Methods: build(MultiLabelInstances instances); buildInternal(MultiLabelInstances instances); makeCopy(); debug(String msg); setDebug(boolean debug); getDebug(): boolean

mulan.classifier (3/3)
- MultiLabelOutput: boolean[] bipartition; double[] confidences; int[] ranking

mulan.classifier.transformation
- TransformationBasedMultiLabelLearner
  - Field: Classifier baseClassifier
  - Constructors: TransformationBasedMultiLabelLearner(); TransformationBasedMultiLabelLearner(Classifier base)
  - getBaseClassifier(): Classifier
- BinaryRelevance
- CalibratedLabelRanking: boolean useStandardVoting
- LabelPowerset, PPT, MultiLabelStacking, MultiClassLearner

mulan.classifier.meta
- MultiLabelMetaLearner
  - Field: MultiLabelLearner baseLearner
  - Constructors: MultiLabelMetaLearner(); MultiLabelMetaLearner(MultiLabelLearner base)
  - getBaseLearner(): MultiLabelLearner
- RAkEL, HOMER, HMC

Lazy and Neural Methods
- Package mulan.classifier.lazy
  - MultiLabelKNN: abstract base class for kNN-based methods
  - Implemented algorithms: MLkNN [Matlab version at http://lamda.nju.edu.cn/datacode/MLkNN.htm], BRkNN, IBLR_ML [Cheng & Hullermeier, MLJ09]
- Package mulan.classifier.neural
  - BPMLL [Matlab version at http://lamda.nju.edu.cn/datacode/BPMLL.htm]
  - Package mulan.classifier.neural.model: support classes

Evaluation (1/6)
- Evaluator
  - Constructors: Evaluator(); Evaluator(int seed)
  - evaluate(MultiLabelLearner learner, MultiLabelInstances test): Evaluation
  - crossValidate(MultiLabelLearner learner, MultiLabelInstances test, int numFolds): Evaluation

Evaluation (2/6)
- Evaluation
  - getExampleBasedMeasures(): ExampleBasedMeasures
  - getLabelBasedMeasures(): LabelBasedMeasures
  - getConfidenceLabelBasedMeasures(): ConfidenceLabelBasedMeasures
  - getRankingBasedMeasures(): RankingBasedMeasures
  - getHierarchicalMeasures(): HierarchicalMeasures
  - toString(); toCSV()

Evaluation (3/6)
- ExampleBasedMeasures
  - Constructors: ExampleBasedMeasures(MultiLabelOutput[] output, boolean[][] trueLabels, double forgivenessRate); ExampleBasedMeasures(MultiLabelOutput[] output, boolean[][] trueLabels); ExampleBasedMeasures(ExampleBasedMeasures[] array)
  - getSubsetAccuracy(): double; getAccuracy(): double; getHammingLoss(): double; getPrecision(): double; getRecall(): double; getFMeasure(): double

Evaluation (4/6)
- LabelBasedMeasures
  - Constructors: LabelBasedMeasures(MultiLabelOutput[] output, boolean[][] trueLabels); LabelBasedMeasures(LabelBasedMeasures[] array)
  - Per label: getLabelAccuracy(int label): double; getLabelPrecision(int label): double; getLabelRecall(int label): double; getLabelFMeasure(int label): double
  - Averaged: getAccuracy(Averaging type): double; getPrecision(Averaging type): double; getRecall(Averaging type): double; getFMeasure(Averaging type): double
- Averaging: MACRO, MICRO

Evaluation (5/6)
- RankingBasedMeasures
  - Constructors: RankingBasedMeasures(MultiLabelOutput[] output, boolean[][] trueLabels); RankingBasedMeasures(ExampleBasedMeasures[] array)
  - getAvgPrecision(): double; getOneError(): double; getRankingLoss(): double; getCoverage(): double

Evaluation (6/6)
- ConfidenceLabelBasedMeasures
  - Constructors: ConfidenceLabelBasedMeasures(MultiLabelOutput[] output, boolean[][] trueLabels); ConfidenceLabelBasedMeasures(ConfidenceLabelBasedMeasures[] array)
  - getLabelAUC(int label): double; getAUC(Averaging type): double
- HierarchicalMeasures
  - Constructors: HierarchicalMeasures(MultiLabelOutput[] output, boolean[][] trueLabels, LabelsMetaData metaData); HierarchicalMeasures(HierarchicalMeasures[] array)
  - getHierarchicalLoss(): double

Examples
- Package mulan.examples: TrainTestExperiment, CrossValidationExperiment, EstimationOfStatistics, GettingPredictionsOnTestSet, AttributeSelectionTest

An Example

// Imports assumed for Mulan 1.0.x on top of Weka: the slides place
// MultiLabelInstances in mulan.data and BinaryRelevance in
// mulan.classifier.transformation, while the packages of Evaluator and
// Evaluation are an assumption here (check the distribution's Javadoc).
import mulan.classifier.transformation.BinaryRelevance;
import mulan.data.MultiLabelInstances;
import mulan.evaluation.Evaluation;
import mulan.evaluation.Evaluator;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;

MultiLabelInstances train, test;
train = new MultiLabelInstances("yeast-train.arff", "yeast.xml");
test = new MultiLabelInstances("yeast-test.arff", "yeast.xml");

Classifier base = new NaiveBayes();
BinaryRelevance br = new BinaryRelevance(base);
br.build(train);

Evaluator eval = new Evaluator();
Evaluation results = eval.evaluate(br, test);
System.out.println(results.toString());

Mulan Credits
- Main co-developers: Robert Friberg, Lefteris Spyromitros, Jozef Vilcek
- Code contributors: Stavros Bakirtzoglou, Weiwei Cheng, Ioannis Katakis, Sang-Hyeun Park, Elise Rairat, George Saridis, George Traianos

Thank you!

Bibliography
- An online multi-label learning bibliography is maintained at http://www.citeulike.org/group/7105/tag/multilabel and currently includes more than 90 articles
- You can ...
  - Grab BibTeX and RIS records
  - Subscribe to the corresponding RSS feed
  - Follow links to the papers' full PDFs (may require access to digital libraries)
  - Export the complete bibliography for BibTeX or EndNote use (requires a CiteULike account)