Learning from Multi-Label Data
Tutorial at ECML/PKDD'09, Bled, Slovenia, 7 September 2009

Grigorios Tsoumakas — Department of Informatics, Aristotle University of Thessaloniki, Greece
Min-Ling Zhang — College of Computer and Information Engineering, Hohai University, China
Zhi-Hua Zhou — LAMDA Group, National Key Laboratory for Novel Software Technology, Nanjing University, China
Outline Introduction Overview of existing techniques Advanced topics The Mulan open-source software
Outline
Introduction
What is multi-label learning Applications and datasets Multi-label evaluation metrics
Overview of existing techniques Advanced topics The Mulan open-source software
The Larger Picture
Data with multiple target variables. What can the type of the targets be?
- Numerical: e.g. ecological modeling and environmental applications, industrial applications (automobile)
- Categorical
  - Binary targets → multi-label data
  - Multi-class targets
- Ordinal
- Combination of types
Notation for Multi-Label Data
- A d-dimensional input space X, with numeric or nominal features
- A set of q output labels L = {λ1, λ2, ..., λq}
- A multi-label dataset D of m training examples (xi, Yi), i = 1..m, where xi ∈ X and Yi ⊆ L
Multi-Label Learning Tasks (1/3)
Classification
- Produce a bipartition of the set of labels into a relevant (positive) and an irrelevant (negative) set
- For example, given L = {λ1, λ2, λ3, λ4} and an unobserved instance x, produce a bipartition such as ({λ1, λ4}, {λ2, λ3})
Multi-Label Learning Tasks (2/3)
Ranking
- Produce a ranking (total strict order) of all labels according to their relevance to the given instance
- For example, given L = {λ1, λ2, λ3, λ4} and an unobserved instance x, produce a ranking r, where r(λj) denotes the position of label λj in the ranking
Multi-Label Learning Tasks (3/3)
Classification and Ranking
- Produce both a bipartition and a ranking of all labels
- They should be consistent: every label in the relevant set must be ranked higher than every label in the irrelevant set
- For example, given L = {λ1, λ2, λ3, λ4} and an unobserved instance x, produce a bipartition and a ranking
Outline
Introduction
What is multi-label learning Applications and datasets Multi-label evaluation metrics
Overview of existing techniques Advanced topics The Mulan open-source software
Applications and Datasets
- (Semi-)automated annotation of large object collections for information retrieval: text/web, image, video, audio, biology
- Tag suggestion in Web 2.0 systems
- Query categorization
- Drug discovery
- Direct marketing
- Medical diagnosis
Text (1/4)
News
- An article concerning the Antikythera Mechanism can be categorized to Science/Technology and History/Culture
- Reuters Collection Version I [Lewis et al., JMLR04]
  - 804414 newswire stories indexed by Reuters Ltd
  - 103 topics organized in a hierarchy, 2.6 on average
  - 350 industries (2-level hierarchy post-produced)
  - 296 geographic codes
Text (2/4)
Research articles
- A research paper on an ensemble method for multi-label classification can be assigned to the areas Ensemble methods and Structured output prediction
Collections
- OHSUMED [Hersh et al., SIGIR94]: Medical Subject Headings (MeSH) ontology
- ACM-DL [Veloso et al., ECMLPKDD07]: ACM Computing Classification System (1st level: 11 labels, 2nd level: 81 labels), 81251 Digital Library articles
Text (3/4)
EUR-Lex collection [Loza Mencia & Furnkranz, ECMLPKDD08]
- 19596 legal documents of the European Union (EU)
- Hierarchy of 3993 EUROVOC labels, 5.4 on average (EUROVOC is a multilingual thesaurus for EU documents)
- 201 subject matters, 2.2 on average
- 412 directory codes, 1.3 on average
WIPO-alpha collection [Fall et al., SIGIRForum03]
- World Intellectual Property Organization (WIPO)
- 75000 patents
- 4-level hierarchy of ~5000 categories
Text (4/4)
Aviation safety reports (tmc2007)
- Competition of the SIAM Text Mining 2007 Workshop
- 28596 NASA aviation safety reports in free-text form
- 22 problem types that appear during flights, 2.2 annotations on average
Free clinical text in radiology reports (medical)
- Computational Medicine Center's 2007 Medical NLP Challenge [Pestian et al., ACL07w]
- 978 reports, 45 labels, 1.3 labels on average
Web
Email
- Enron dataset: UC Berkeley Enron Email Analysis Project
  - 1702 examples, 53 labels, 3.4 on average, 2-level hierarchy
Web pages
- Hierarchical classification schemes: Open Directory Project, Yahoo! Directory [Ueda & Saito, NIPS02]
Image and Video
Application
- Automated annotation for retrieval
Datasets
- Scene [Boutell et al., PR04]: 2407 images, 6 labels, 1.1 on average
- Mediamill [Snoek et al., MM06]: 85 hours of video data containing Arabic, Chinese, and US broadcast news sources, recorded during November 2004; 43907 frames, 101 labels, 4.4 on average
Audio (1/2)
Music and metadata database of the HiFind company
- 450000 categorized tracks since 1999
- 935 labels from 16 categories (340 genre labels)
Annotation
- Style, genre, musical setup, main instruments, variant, dynamics, tempo, era/epoch, metric, country, situation, mood, character, language, rhythm, popularity
- 25 annotators (musicians, music journalists) plus a supervisor
- Software-based annotation takes 8 minutes per track on average, 37 annotations per track on average
A subset was used in [Pachet & Roy, TASLP09]: 32,978 tracks, 632 labels, 98 acoustic features
Audio (2/2)
Emotional categorization of music
- Dataset emotions in [Trohidis et al., ISMIR08]: 593 tracks, 6 labels {happy, calm, sad, angry, quiet, amazed}, 1.9 on average
- Relevant works: [Li & Ogihara, ISMIR03; TMM06; Wieczorkowska et al., IIPWM06]
- Some applications: song selection in mobile devices, music therapy, music recommendation systems, TV and radio programs
Acoustic data [Streich & Buhmann, ECMLPKDD08]
- Construction of hearing-aid instruments
- Labels: Noise, Speech, Music
Biology (1/2)
Applications
Automated annotation of proteins with functions
Annotation hierarchies
The Functional Catalogue (FunCat)
A tree-shaped hierarchy of annotations for the functional description of proteins from several living organisms
The Gene Ontology (GO)
A directed acyclic graph of annotations for gene products
Biology (2/2)
Datasets
- Yeast [Elisseeff & Weston, NIPS02]: 2417 examples, 14 labels (1st FunCat level), 4.2 on average
- Phenotype (yeast) [Clare & King, ECMLPKDD01]: 1461 examples, 4 FunCat levels
- 12 yeast datasets [Clare, PhdThesis03; Vens et al., MLJ08]
  - Gene expression, homology, phenotype, secondary structure
  - FunCat: 6 levels, 492 labels, 8.8 on average
  - GO: 14 levels, 3997 labels, 35.0 on average
Tag Suggestion in Web 2.0 Systems
Benefits
- Richer descriptions of objects
- Folksonomy alignment
Input
- Feature representation of objects (content)
Challenges
- Huge number of tags
- Fast online predictions
Related work
- [Song et al., CIKM08; Katakis et al., ECMLPKDD08w]
Query Categorization
Benefits
- Integrate query-specific rich content from vertical search results (e.g. from a database)
- Identify relevant sponsored ads: place ads on categories vs. keywords
Example: Yahoo! [Tang et al., WWW09]
- 6433 categories organized in an 8-level taxonomy
- 1.5 million manually labeled unique queries
- Labels per query range from 1 to 26 (1 label: 81%, 2 labels: 16%, 3+ labels: 3%)
Drug Discovery
MDL Drug Data Report v. 2001
- 119110 chemical structures of drugs
- 701 biological activities (e.g. calcium channel blocker, neutral endopeptidase inhibitor, cardiotonic, diuretic)
Example: Hypertension [Kawai & Takahashi, CBIJ09]
- Two major activities of hypertension drugs: angiotensin converting enzyme inhibitor, neutral endopeptidase inhibitor
- Compounds producing both of these activities were found to form an effective new type of drug
Other
Direct marketing [Zhang et al., JMLR06]
- A direct marketing company sends offers to clients for products of categories in which they are potentially interested
- Historical data of clients and the (possibly multiple) product categories they showed interest in; data from the Direct Marketing Association
- Classification and ranking over 19 categories: send only relevant products, or send the top X products
Medical diagnosis
- A patient may be suffering from multiple diseases at the same time, e.g. {obesity, hypertension}
Multi-Label Statistics
- Label cardinality, c: the average number of labels per example
- Label density: label cardinality divided by the total number of labels (c/q)
- Distinct labelsets: the number of different label combinations in the dataset
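As a concrete illustration, a minimal Java sketch (not Mulan's API) that computes these three statistics from a boolean label matrix, using the small example dataset that appears later in this tutorial:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LabelStats {
    public static void main(String[] args) {
        // example dataset: {λ1,λ4}, {λ3,λ4}, {λ1}, {λ2,λ3,λ4}
        boolean[][] Y = {
            {true, false, false, true},
            {false, false, true, true},
            {true, false, false, false},
            {false, true, true, true}
        };
        int m = Y.length, q = Y[0].length, labelCount = 0;
        Set<String> distinct = new HashSet<>();
        for (boolean[] row : Y) {
            for (boolean b : row) if (b) labelCount++;
            distinct.add(Arrays.toString(row)); // one entry per distinct labelset
        }
        double cardinality = (double) labelCount / m; // 8/4 = 2.0
        double density = cardinality / q;             // 2.0/4 = 0.5
        System.out.println("c=" + cardinality + ", density=" + density
                + ", distinct labelsets=" + distinct.size()); // 4
    }
}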
Outline
Introduction
What is multi-label learning Applications and datasets Multi-label evaluation metrics
Overview of existing techniques Advanced topics The Mulan open-source software
Evaluation Metrics: A Taxonomy
Based on calculation [Tsoumakas & Vlahavas, ECMLPKDD07]
- Example-based: calculated separately for each test example and averaged across the test set
- Label-based: calculated separately for each label and then averaged across all labels
Based on the output of the learner
- Binary prediction for each label
- Ranking of the labels (example-based)
- Probability or score for each label
Example-Based Binary (1/2)
Notation
- Yi is the set of actual labels for instance xi
- Pi is the set of predicted labels for instance xi
Subset accuracy [Zhu et al., SIGIR05; Ghamrawi & McCallum, CIKM05]
- (1/m) Σi I(Pi = Yi), where I(true) = 1 and I(false) = 0
Hamming loss [Schapire & Singer, MLJ00]
- (1/m) Σi |Pi Δ Yi| / q, where Δ is the symmetric difference (XOR) operation
- Average binary classification error
Accuracy [Godbole & Sarawagi, PAKDD04]
- (1/m) Σi |Pi ∩ Yi| / |Pi ∪ Yi|
Example-Based Binary (2/2)
Information retrieval view [Godbole & Sarawagi, PAKDD04]
- Precision: (1/m) Σi |Pi ∩ Yi| / |Pi|
- Recall: (1/m) Σi |Pi ∩ Yi| / |Yi|
- F-measure (harmonic mean): (1/m) Σi 2|Pi ∩ Yi| / (|Pi| + |Yi|)
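A minimal Java sketch of the example-based binary measures above, assuming predictions and ground truth are given as boolean matrices (toy values, not from any dataset in this tutorial):

public class ExampleBasedMetrics {
    public static void main(String[] args) {
        boolean[][] P = {{true, false, false, true}, {true, false, true, true}};
        boolean[][] Y = {{true, false, false, true}, {false, false, true, true}};
        int m = P.length, q = P[0].length;
        double hamming = 0, subset = 0, precision = 0, recall = 0, f1 = 0;
        for (int i = 0; i < m; i++) {
            int xor = 0, inter = 0, pSize = 0, ySize = 0;
            for (int j = 0; j < q; j++) {
                if (P[i][j] != Y[i][j]) xor++;            // symmetric difference
                if (P[i][j] && Y[i][j]) inter++;
                if (P[i][j]) pSize++;
                if (Y[i][j]) ySize++;
            }
            hamming += (double) xor / q;
            subset += (xor == 0) ? 1 : 0;                 // I(Pi = Yi)
            precision += pSize > 0 ? (double) inter / pSize : 0;
            recall += ySize > 0 ? (double) inter / ySize : 0;
            f1 += (pSize + ySize) > 0 ? 2.0 * inter / (pSize + ySize) : 0;
        }
        System.out.printf("HL=%.3f SA=%.3f P=%.3f R=%.3f F=%.3f%n",
                hamming / m, subset / m, precision / m, recall / m, f1 / m);
    }
}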
Example-Based Ranking (1/2)
One-error
Evaluates how many times the top-ranked label is not in the set of proper labels of the example
Coverage
Evaluates how many steps are needed, on average, to go down the label list to cover all proper labels of the example
[Schapire & Singer, MLJ00]
Example-Based Ranking (2/2)
Ranking loss
Evaluates the average fraction of label pairs that are mis-ordered for the instance
Average precision
Evaluates the average fraction of labels ranked above a proper label which are also proper labels [Schapire & Singer, MLJ00]
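A sketch of two of these ranking measures (one-error and coverage) for a single test example, assuming per-label scores where higher means more relevant; averaging over the test set is omitted for brevity:

public class RankingMetrics {
    // position of label j when labels are sorted by decreasing score (0 = top)
    static int rank(double[] scores, int j) {
        int r = 0;
        for (double s : scores) if (s > scores[j]) r++;
        return r;
    }
    public static void main(String[] args) {
        double[] scores = {0.9, 0.2, 0.7, 0.1};        // toy scores for λ1..λ4
        boolean[] relevant = {true, false, true, false};
        // one-error: is the top-ranked label irrelevant?
        int top = 0;
        for (int j = 1; j < scores.length; j++) if (scores[j] > scores[top]) top = j;
        int oneError = relevant[top] ? 0 : 1;
        // coverage: rank of the lowest-ranked relevant label
        int coverage = 0;
        for (int j = 0; j < scores.length; j++)
            if (relevant[j]) coverage = Math.max(coverage, rank(scores, j));
        System.out.println("one-error=" + oneError + " coverage=" + coverage);
    }
}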
Label-Based Binary
Contingency table for label λj:

                     Actual POS   Actual NEG
Learner output POS:  TPj          FPj
Learner output NEG:  FNj          TNj

Let B(TPj, FPj, TNj, FNj) be a binary evaluation measure calculated from the above contingency table, e.g. accuracy = (TPj + TNj) / (TPj + FPj + TNj + FNj)
Macro-averaging
- Ordinary averaging of a binary measure: B_macro = (1/q) Σj B(TPj, FPj, TNj, FNj)
Micro-averaging
- Labels are treated as different instances of the same global label: B_micro = B(Σj TPj, Σj FPj, Σj TNj, Σj FNj)
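A small Java sketch contrasting the two averaging modes for the F-measure, given per-label contingency counts (toy numbers):

public class MacroMicro {
    public static void main(String[] args) {
        int[] tp = {10, 2}, fp = {5, 1}, fn = {3, 8}; // two labels
        double macro = 0;
        int TP = 0, FP = 0, FN = 0;
        for (int j = 0; j < tp.length; j++) {
            macro += f1(tp[j], fp[j], fn[j]);          // average of per-label F1
            TP += tp[j]; FP += fp[j]; FN += fn[j];     // pool counts globally
        }
        System.out.println("macro-F1=" + macro / tp.length
                + " micro-F1=" + f1(TP, FP, FN));
    }
    static double f1(int tp, int fp, int fn) {
        return (2.0 * tp) / (2.0 * tp + fp + fn);
    }
}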
Probabilities or Scores per Label
- A bipartition can be obtained via thresholding → example/label-based binary measures
- A ranking can be obtained after resolving ties → example-based ranking measures
- A vertical (per-label) ranking can be computed → ranking measures calculated vertically (label-based)
- Threshold-independent label-based evaluation (only for probabilities!): area under a ROC or PR curve
Which Measure Should I Use?
- Computer-aided annotation by humans (e.g. tag suggestion in Web 2.0 systems) → example-based ranking measures
- Automated annotation for retrieval → macro-averaged F-measure
- Direct marketing → example-based precision and ranking
- Query categorization → example-based precision
Outline Introduction Overview of existing techniques
Problem transformation methods Algorithm adaptation methods From ranking to classification
Advanced topics The Mulan open-source software
A Categorization
[Tsoumakas & Katakis, IJDWM07]
Problem transformation methods
They transform the learning task into one or more single-label classification tasks
They are algorithm independent
Some could be used for feature selection as well
Algorithm adaptation methods
They extend specific learning algorithms in order to handle multi-label data directly
Boosting, generative (Bayesian), SVM, decision tree, neural network, lazy, ……
Problem Transformation Methods
- Binary relevance
- Ranking via single-label learning
- Pairwise methods: ranking by pairwise comparison, calibrated label ranking
- Methods that combine labels: Label Powerset, Pruned Sets
- Ensemble methods: RAkEL, EPS
Example Multi-Label Dataset
L = {λ1, λ2, λ3, λ4}

Ex #  Features  Label set
1     x1        {λ1, λ4}
2     x2        {λ3, λ4}
3     x3        {λ1}
4     x4        {λ2, λ3, λ4}
Binary Relevance (BR)
How it works
- Learns one binary classifier for each label
- Outputs the union of their predictions
- Can do ranking if the classifiers output scores
Limitations
- Does not consider label relationships
- Complexity O(qm)

Ex #  Label set       Ex #  λ1      Ex #  λ2      Ex #  λ3      Ex #  λ4
1     {λ1, λ4}        1     true    1     false   1     false   1     true
2     {λ3, λ4}        2     false   2     false   2     true    2     true
3     {λ1}            3     true    3     false   3     false   3     false
4     {λ2, λ3, λ4}    4     false   4     true    4     true    4     true
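A minimal sketch of the BR transformation on the example dataset: one boolean target vector per label, each of which would be paired with the shared features to train an independent binary classifier.

public class BRTransform {
    public static void main(String[] args) {
        boolean[][] Y = { // the example dataset above
            {true, false, false, true},
            {false, false, true, true},
            {true, false, false, false},
            {false, true, true, true}
        };
        int q = Y[0].length;
        for (int j = 0; j < q; j++) {
            boolean[] target = new boolean[Y.length];
            for (int i = 0; i < Y.length; i++) target[i] = Y[i][j];
            // train the j-th binary classifier on (features, target) here;
            // at prediction time, output the union of positive predictions
            System.out.println("dataset for λ" + (j + 1) + ": "
                    + java.util.Arrays.toString(target));
        }
    }
}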
Ranking via Single-Label Learning
Basic concept
- Transform the multi-label dataset to a single-label multi-class dataset with the labels as classes
- A single-label classifier that outputs a score (e.g. probability) for each class can produce a ranking
Transformations [Boutell et al., PR04; Chen et al., ICDM07]
- ignore
- select-max, select-min, select-random
- copy, copy-weight (entropy)
Ignore
- Simply ignore all multi-label examples
- Major information loss!

Ex #  Label set         Ex #  Label set
1     {λ1, λ4}      →   3     {λ1}
2     {λ3, λ4}
3     {λ1}
4     {λ2, λ3, λ4}
Select Min, Max and Random
Select one of the labels
- Most frequent label (Max)
- Least frequent label (Min)
- Random selection
Information loss!

Ex #  Label set       Max   Min   Random
1     {λ1, λ4}        λ4    λ1    λ4
2     {λ3, λ4}        λ4    λ3    λ3
3     {λ1}            λ1    λ1    λ1
4     {λ2, λ3, λ4}    λ4    λ2    λ2
Copy and Copy-Weight (Entropy)
- Copy replaces each example (xi, Yi) with |Yi| examples (xi, λj), one for each λj ∈ Yi
- Copy-weight additionally weights each new example by 1/|Yi|; it requires learners that take example weights into account
- No information loss; increased number of examples, O(mc)

Ex #  Label set         Ex #  Label  Weight
1     {λ1, λ4}          1a    λ1     0.50
2     {λ3, λ4}          1b    λ4     0.50
3     {λ1}              2a    λ3     0.50
4     {λ2, λ3, λ4}      2b    λ4     0.50
                        3     λ1     1.00
                        4a    λ2     0.33
                        4b    λ3     0.33
                        4c    λ4     0.33
Ranking by Pairwise Comparison (1/4)
How it works [Hullermeier et al., AIJ08]
- Learns q(q−1)/2 binary models, one for each pair of labels (λi, λj), 1 ≤ i < j ≤ q
- Each model is trained on the examples that are annotated with exactly one of the two labels, and learns to separate them
- Given a new instance, all models are invoked and a ranking is obtained by counting the votes received by each label
Ranking by Pairwise Comparison (2/4)

Ex #  Label set
1     {λ1, λ4}
2     {λ3, λ4}
3     {λ1}
4     {λ2, λ3, λ4}

One training set per label pair; true means the first label of the pair is the relevant one:

Ex #  1vs2     Ex #  1vs3     Ex #  1vs4
1     true     1     true     2     false
3     true     2     false    3     true
4     false    3     true     4     false
               4     false

Ex #  2vs3     Ex #  2vs4     Ex #  3vs4
2     false    1     false    1     false
               2     false
Ranking by Pairwise Comparison (3/4)
New instance x'; predictions of the pairwise models:

1vs2  1vs3  1vs4  2vs3  2vs4  3vs4
λ1    λ3    λ1    λ3    λ2    λ3

Label  Votes
λ1     2
λ2     1
λ3     3
λ4     0

Ranking: λ3 > λ1 > λ2 > λ4
Ranking by Pairwise Comparison (4/4)
Time complexity
- Training: O(mqc), where c is the label cardinality; each example x appears in |Px|(q−|Px|) < |Px|q datasets
- Testing: needs to query O(q²) binary models
Space complexity
- Needs to maintain O(q²) binary models in memory
- Pairwise decision tree/rule learning models might be simpler than one-vs-rest ones
- Perceptrons/SVMs store a constant number of parameters per model
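A sketch of RPC's prediction step: each of the q(q−1)/2 pairwise models votes for one of its two labels, and the labels are ranked by vote count; predict() below is a stand-in for the trained binary models.

public class PairwiseVoting {
    public static void main(String[] args) {
        int q = 4;
        int[] votes = new int[q];
        for (int i = 0; i < q; i++)
            for (int j = i + 1; j < q; j++)
                votes[predict(i, j)]++; // the (λi, λj) model votes for one label
        // rank labels by decreasing vote count (ties broken by index here)
        Integer[] order = {0, 1, 2, 3};
        java.util.Arrays.sort(order, (a, b) -> votes[b] - votes[a]);
        for (int r = 0; r < q; r++)
            System.out.println((r + 1) + ". λ" + (order[r] + 1)
                    + " (" + votes[order[r]] + " votes)");
    }
    // stand-in for invoking the trained binary model of the pair (λi, λj);
    // returns the index of the winning label
    static int predict(int i, int j) {
        return (i + j) % 2 == 0 ? i : j;
    }
}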
Calibrated Label Ranking (1/4)
How it works [Furnkranz et al., MLJ08]
- Extends ranking by pairwise comparison by introducing an additional virtual label λV, with the purpose of separating positive from negative labels
- Pairwise models that include the virtual label correspond to the models of binary relevance
  - All examples are used
  - When a label is true, the virtual label is considered false; when a label is false, the virtual label is considered true
- The final ranking includes the virtual label, which acts as the split point between positive and negative labels
Calibrated Label Ranking (2/4)

Ex #  Label set
1     {λ1, λ4}
2     {λ3, λ4}
3     {λ1}
4     {λ2, λ3, λ4}

Models against the virtual label (identical to binary relevance):

Ex #  1vsV     Ex #  2vsV     Ex #  3vsV     Ex #  4vsV
1     true     1     false    1     false    1     true
2     false    2     false    2     true     2     true
3     true     3     false    3     false    3     false
4     false    4     true     4     true     4     true

Pairwise models (as in RPC):

Ex #  1vs2     Ex #  1vs3     Ex #  1vs4
1     true     1     true     2     false
3     true     2     false    3     true
4     false    3     true     4     false
               4     false

Ex #  2vs3     Ex #  2vs4     Ex #  3vs4
2     false    1     false    1     false
               2     false
Calibrated Label Ranking (3/4)
New instance x'; predictions of the pairwise and virtual-label models:

1vs2  1vs3  1vs4  2vs3  2vs4  3vs4  1vsV  2vsV  3vsV  4vsV
λ1    λ1    λ1    λ2    λ2    λ4    λ1    λV    λV    λV

Label  Votes
λ1     4
λ2     2
λ3     0
λ4     1
λV     3

Ranking: λ1 > λV > λ2 > λ4 > λ3
Bipartition: the labels ranked above λV are the positive ones → {λ1}
Calibrated Label Ranking (4/4)
Benefits
- Improved ranking performance
- Classification and ranking (consistent)
Limitations
- Space complexity (as in RPC)
  - A solution for perceptrons [Loza Mencia & Furnkranz, ECMLPKDD08]
- Querying q² + q models at runtime
  - QWeighted algorithm [Loza Mencia et al., ESANN09]
Label Powerset (LP) (1/5)
How it works
- Each different set of labels in a multi-label training set becomes a different class in a new single-label classification task
- Given a new instance, the single-label classifier of LP outputs the most probable class (a set of labels)

Ex #  Label set       Ex #  Class
1     {λ1, λ4}        1     1001
2     {λ3, λ4}        2     0011
3     {λ1}            3     1000
4     {λ2, λ3, λ4}    4     0111
Label Powerset (LP) (2/5)
Ranking
- Possible if a classifier that outputs scores (e.g. probabilities) is used [Read, NZCSRS08]
- Are the bipartition and ranking always consistent? Consider two score distributions P and P':

c      P(c|x)  P'(c|x)  λ1  λ2  λ3  λ4
1001   0.7     0.1      1   0   0   1
0011   0.2     0.3      0   0   1   1
1000   0.1     0.4      1   0   0   0
0111   0.0     0.2      0   1   1   1

Σ P(c|x)·λj:   0.8  0.0  0.2  0.9
Σ P'(c|x)·λj:  0.5  0.2  0.5  0.6

- Under P', the most probable class is 1000, giving the bipartition {λ1}, yet λ4 gets the highest summed score (0.6 > 0.5): the bipartition and the ranking can disagree
Label Powerset (LP) (3/5)
Complexity
- Depends on the number of distinct labelsets that exist in the training set
- It is upper-bounded by min(m, 2^q)
- It is usually much smaller, but still larger than q
Limitations
- High complexity
- Limited training examples for many classes
- Cannot predict unseen labelsets
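A sketch of the LP transformation itself, encoding each labelset as a bit-string class value as in the table above:

import java.util.LinkedHashMap;
import java.util.Map;

public class LabelPowersetTransform {
    public static void main(String[] args) {
        boolean[][] Y = { // the example dataset
            {true, false, false, true},
            {false, false, true, true},
            {true, false, false, false},
            {false, true, true, true}
        };
        Map<String, Integer> classes = new LinkedHashMap<>();
        for (boolean[] row : Y) {
            StringBuilder bits = new StringBuilder();
            for (boolean b : row) bits.append(b ? '1' : '0');
            classes.putIfAbsent(bits.toString(), classes.size()); // new class per labelset
            System.out.println(bits); // 1001, 0011, 1000, 0111
        }
        // train any multi-class learner on these class values
        System.out.println(classes.size() + " classes"); // upper-bounded by min(m, 2^q)
    }
}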
Label Powerset (LP) (4/5)

            Labelsets
Dataset     m      q    Bound   Actual  Diversity
emotions    593    6    64      27      0.42
enron       1702   53   1702    753     0.44
hifind      32971  632  32971   32734   0.99
mediamill   43907  101  43907   6555    0.15
medical     978    45   978     94      0.10
scene       2407   6    64      15      0.23
tmc2007     28596  22   28596   1341    0.05
yeast       2417   14   2417    198     0.08

(Bound = min(m, 2^q); Diversity = Actual / Bound.)
Label Powerset (LP) (5/5)
[Log-log plot of the number of classes (label combinations) against their number of appearances in mediamill: 4104 labelsets appear once, 895 twice, 384 three times, 215 four, 156 five, 111 six, 85 seven, 77 eight, 48 nine, and 39 ten times.]
Pruned Sets (1/2)
How it works [Read, NZCSRS08; Read et al., ICDM08]
- Follows the transformation of LP, but also…
- Prunes examples whose labelsets (classes) occur fewer times than a small user-defined threshold p (e.g. 2 or 3)
  - Deals with the large number of infrequent classes
- Re-introduces the pruned examples along with subsets of their labelsets that do occur more than p times
  - Strategy A: rank the subsets by size/number of examples and keep the top b of them
  - Strategy B: keep all subsets of size greater than b
Pruned Sets (2/2)
p = 3

Labelset      Count
λ1            16
λ2            14
λ1, λ4        8
λ3, λ4        7
λ1, λ2, λ3    2    ← pruned (count < p)

Candidate subsets of the pruned labelset {λ1, λ2, λ3}:

Subsets   Size  Count
λ2, λ3    2     12
λ1        1     16
λ2        1     14

Strategy A, b=2: keep the top 2 subsets
Strategy B, b=1: keep the subsets of size greater than 1, i.e. {λ2, λ3}
Random k-Labelsets (1/3)
How it works [Tsoumakas & Vlahavas, ECMLPKDD07]
- Randomly break a large set of labels into a number (n) of subsets of small size (k), called k-labelsets
- For each of them, train a multi-label classifier using the LP method
- Given a new instance, query all models and average their decisions per label
- Threshold the average votes to obtain the final bipartition
Benefits
- Computationally simpler sub-problems
- More balanced training sets
- Can predict unseen labelsets
Random k-Labelsets (2/3)

Model  3-labelset      (each model predicts 0/1 for the three labels it covers)
h1     {λ1, λ2, λ6}
h2     {λ2, λ3, λ4}
h3     {λ3, λ5, λ6}
h4     {λ2, λ4, λ5}
h5     {λ1, λ4, λ5}
h6     {λ1, λ2, λ3}
h7     {λ1, λ4, λ6}

                  λ1   λ2   λ3   λ4   λ5   λ6
average votes     3/4  1/4  2/3  1/4  1/3  2/3
final prediction  1    0    1    0    0    1

(The 0/1 votes of the models are averaged only over the models whose k-labelset contains the label.)
Random k-Labelsets (3/3)
Comments
- The mean number of votes per label is nk/q; the larger it is, the higher the effectiveness
- This characterizes RAkEL as an ensemble method
How to set parameters k and n?
- k should be small enough to avoid LP's problems
- n should be large enough to obtain more votes
- Proposed default parameters: k=3, n=2q (6 votes per label)
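A rough sketch of RAkEL's two core steps with the proposed defaults; the LP models' 0/1 votes are mocked with random values, and labelsets are drawn by shuffling (the original method samples k-labelsets without replacement):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RakelSketch {
    public static void main(String[] args) {
        int q = 6, k = 3, n = 2 * q; // proposed defaults: k=3, n=2q
        Random rnd = new Random(1);
        double[] votes = new double[q];
        int[] counts = new int[q];
        for (int model = 0; model < n; model++) {
            List<Integer> labels = new ArrayList<>();
            for (int j = 0; j < q; j++) labels.add(j);
            Collections.shuffle(labels, rnd);
            for (int j : labels.subList(0, k)) {        // one random k-labelset
                votes[j] += rnd.nextBoolean() ? 1 : 0;  // mocked 0/1 vote of its LP model
                counts[j]++;
            }
        }
        for (int j = 0; j < q; j++) { // threshold the average vote at 0.5
            boolean positive = counts[j] > 0 && votes[j] / counts[j] > 0.5;
            System.out.println("λ" + (j + 1) + ": " + positive);
        }
    }
}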
Ensembles of Pruned Sets
How it works [Read et al., ICDM08]
- Constructs n pruned sets models by sampling the training set (e.g. 63%)
- Given a new instance, queries the models and averages their decisions (each decision concerns all labels)
- A ranking is obtained; thresholding is used to obtain the bipartition
Outline Introduction Overview of existing techniques
Problem transformation methods Algorithm adaptation methods From ranking to classification
Advanced topics The Mulan open-source software
Algorithm Adaptation Methods
- Boosting: AdaBoost.MH [Schapire & Singer, MLJ00]
- Generative (Bayesian): [McCallum, AAAI99w], [Ueda & Saito, NIPS03]
- SVM: Rank-SVM [Elisseeff & Weston, NIPS02]
- Decision tree: Multi-label C4.5 [Clare & King, PKDD01]
- Neural network: BP-MLL [Zhang & Zhou, TKDE06]
- Lazy (kNN): ML-kNN [Zhang & Zhou, PRJ07]
- ......
AdaBoost.MH
Description: the core of a successful multi-label text categorization system, BoosTexter [Schapire & Singer, MLJ00]
Basic strategy: map the original multi-label learning problem into a binary learning problem, which is then solved by the traditional AdaBoost algorithm [Freund & Schapire, JCSS97]
Example transformation: transform each multi-label training example (xi, Yi) into q binary-labeled examples ((xi, y), Yi[y]), one for each label y; the instance part is the concatenation of xi and y, and Yi[y] = +1 if y ∈ Yi and −1 otherwise
AdaBoost.MH (Cont')
Training procedure: classical AdaBoost is employed to learn from the transformed binary-labeled examples iteratively
Weak hypotheses (base learners): have the basic form of a decision stump (one-level decision tree); e.g. for a text categorization task, each possible term w (e.g. a bigram) specifies a weak hypothesis h(x, y) that outputs c0y if w does not occur in x and c1y if it does, where x is a text document and c0y, c1y are real-valued predicted outputs
In each boosting round, the choice of weak hypothesis as well as its combination weight are optimized towards minimizing the empirical Hamming loss
Generative (Bayesian) Approach
Description: models the generative procedure of multi-label texts [McCallum, AAAI99w] [Ueda & Saito, NIPS03]
Basic assumption: the word distribution given a set of labels is determined by a mixture (linear combination) of word distributions, one for each single label
Settings:
- A word vocabulary
- P(w|y): word distribution given a single label y
- P(w|Y): word distribution given a set of labels Y
- λ(Y): q-dimensional mixture weights for Y, so that P(w|Y) = Σy λy(Y) P(w|y)

Generative (Bayesian) Approach (Cont')
MAP (Maximum A Posteriori) principle: given a test document x*, its associated label set Y* is determined as
Y* = arg maxY P(Y|x*) = arg maxY P(Y) P(x*|Y)  [applying the Bayes rule]
with P(x*|Y) = Πw∈x* P(w|Y)  [assuming word independence]
- The prior probability P(Y) is directly estimated from the training set by frequency counting
- The parameters of the mixture of word distributions are learned by an EM-style procedure
Note: these two generative approaches are specific to text applications rather than general-purpose multi-label learning methods
Rank-SVM
Description: a maximum margin approach for multi-label learning, implemented with the kernel trick to incorporate non-linearity [Elisseeff & Weston, NIPS02]
Basic strategy: assume one classifier for each individual label, define a "multi-label margin" on the whole training set, and maximize it under a QP (quadratic programming) framework
Classification system: q linear classifiers fk(x) = <wk, x> + bk, each with weight vector wk and bias bk
Rank-SVM (Cont')
Margin definition:
- Margin of a multi-label example (xi, Yi): labels in Yi should be ranked higher than labels not in Yi; it is the minimum, over all pairs (k, l) with k ∈ Yi and l ∉ Yi, of (<wk − wl, xi> + bk − bl) / ||wk − wl||
- Margin for the training set S: the minimum of the margins over all examples in S
QP formulation (ideal case): solved by introducing slack variables and then optimized in its dual form (with incorporation of the kernel trick)
Multi-Label C4.5
Description: an extension of the popular C4.5 decision tree to deal with multi-label data [Clare & King, PKDD01]
Basic strategy: define a "multi-label entropy" over a set of multi-label examples, based on which the information gain of selecting a splitting attribute is calculated; a decision tree is then constructed recursively in the same way as in C4.5
Multi-label entropy: given a set S of multi-label examples, let p(y) denote the probability that an example in S has label y; then
entropy(S) = − Σy [ p(y) log p(y) + (1 − p(y)) log(1 − p(y)) ]
BP-MLL
Description: an extension of popular BP neural networks to deal with multi-label data [Zhang & Zhou, TKDE06]
Basic strategy: define a novel global error function capturing the characteristics of multi-label learning, i.e. labels belonging to an example should be ranked higher than those not belonging to that example
Network architecture:
- Single-hidden-layer feed-forward neural network; adjacent layers are fully connected
- Each input unit corresponds to a dimension of the input space; each output unit corresponds to an individual label

BP-MLL (Cont')
Global error function: given a multi-label training set S, the global training error on S is E = Σi Ei, with
Ei = (1 / (|Yi| |Ȳi|)) Σ(j,k)∈Yi×Ȳi exp(−(cij − cik))
- Ei: the error of the network on (xi, Yi); cij: the actual network output on xi for the j-th label
- This approximately optimizes the ranking loss criterion: it leads the system to output larger values for the labels belonging to the instance and smaller values for the labels not belonging to it
Parameter optimization: gradient descent + error back-propagation strategy
ML-kNN
Description: an extension of the popular kNN to deal with multi-label data [Zhang & Zhou, PRJ07]
Basic strategy: based on statistical information derived from the neighboring examples (i.e. a membership counting statistic), the MAP principle is utilized to determine the label set of an unseen example
Settings:
- N(x): the k nearest neighbors of x identified in the training set
- C(x): q-dimensional membership counting vector, where the l-th dimension Cl(x) counts the number of examples in N(x) having the l-th label
- Hl1 (Hl0): the event that an example has (does not have) the l-th label
- Elj: the event that there are exactly j examples in N(x) having the l-th label

ML-kNN (Cont')
Procedure: given a test example x, its associated label set Y is determined as follows
- Identify its k nearest neighbors in the training set, i.e. N(x)
- Compute its membership counting vector C(x) based on N(x)
- For each label l, include l in Y iff P(Hl1 | El,Cl(x)) > P(Hl0 | El,Cl(x)) (MAP principle)
Probabilities needed: directly estimated from the training set by frequency counting
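A sketch of ML-kNN's MAP rule for a single label l; all probabilities below are illustrative stand-ins for the frequency-counted estimates.

public class MlKnnDecision {
    public static void main(String[] args) {
        int k = 5, j = 4;   // 4 of the 5 nearest neighbors carry label l
        double pH1 = 0.3;   // prior P(an example has label l), assumed value
        double[] pEgivenH1 = {0.05, 0.10, 0.15, 0.20, 0.30, 0.20}; // P(E_j | H1), j = 0..k
        double[] pEgivenH0 = {0.40, 0.30, 0.15, 0.10, 0.04, 0.01}; // P(E_j | H0), j = 0..k
        // MAP comparison: P(H1|E_j) ∝ P(H1) P(E_j|H1) vs P(H0|E_j) ∝ P(H0) P(E_j|H0)
        double scorePos = pH1 * pEgivenH1[j];
        double scoreNeg = (1 - pH1) * pEgivenH0[j];
        System.out.println("predict label l: " + (scorePos > scoreNeg)); // true here
    }
}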
Outline Introduction Overview of existing techniques
Problem transformation methods Algorithm adaptation methods From ranking to classification
Advanced topics The Mulan open-source software
From Ranking to Classification
- BR can output numerical scores for each label (e.g. perceptrons, SVMs, decision trees, kNN, Bayes)
- We use an intuitive threshold to go from these scores to 0/1 decisions (e.g. 0 for perceptrons and SVMs, 0.5 for probabilistic/confidence outputs)
- The same applies to other problem transformation and algorithm adaptation methods
- Are there general approaches to deliver a classification from a ranking?
From Ranking to Classification
Thresholding strategies
- RCut, PCut, SCut, RTCut, SCutFBR [Yang, SIGIR01]
- A study of SCutFBR [Fan and Lin, TechRep07]
Learning the number of labels
- Based on the ranking [Elisseeff and Weston, NIPS02]
- Based on the content [Tang et al., WWW09]
Thresholding Strategies: RCut
How it works
- Given a document, it sorts the labels by score and selects the top k labels, where k ∈ [1, q]
How to set the parameter k?
- It can be specified by the user; typically it is set to the label cardinality of the training set
- It can be globally tuned using a validation set
Comments
- What if the number of labels per example varies?
- It does not perform well in practice [Tang et al., WWW09]
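A minimal RCut sketch: sort the labels by score and keep the top k.

import java.util.Arrays;
import java.util.Comparator;

public class RCut {
    public static void main(String[] args) {
        double[] scores = {0.2, 0.9, 0.4, 0.7}; // toy scores for λ1..λ4
        int k = 2; // e.g. the (rounded) label cardinality of the training set
        Integer[] order = {0, 1, 2, 3};
        Arrays.sort(order, Comparator.comparingDouble((Integer j) -> scores[j]).reversed());
        boolean[] bipartition = new boolean[scores.length];
        for (int r = 0; r < k; r++) bipartition[order[r]] = true;
        System.out.println(Arrays.toString(bipartition)); // λ2 and λ4 selected
    }
}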
Thresholding Strategies: PCut
How it works
- For each label λj, sort the test instances by score and assign λj to the top kj = k·P(λj) test instances
- P(λj) is the prior probability of a document belonging to λj (estimated on the training set)
- k is a proportionality constant that trades off false positives against false negatives; it can be globally tuned as in RCut
Comments
- It requires the prediction scores for all test instances, so it is not suitable for online decisions
Thresholding Strategies: SCut
How it works
- For each label λj, tune a separate threshold based on a validation set
Comments
- In contrast to RCut and PCut, it tunes a separate parameter for each label
- Requires a validation phase, whose complexity is linear in q
- Overfits when the binary learning problem is unbalanced (few positive labels), producing too high or too low thresholds
Thresholding Strategies: RTCut
How it works
- Given a document, it sorts the labels by a synthetic score and selects those above a threshold t, which is optimized using a validation set
Synthetic score, combining a label's rank with its normalized score:
ss(λj) = r(λj) + s(λj) / (maxj∈L s(λj) + 1)
Thresholding Strategies: SCutFBR
How it works
- When the SCut threshold is too high, macro-F1 is hurt; when it is too low, both macro- and micro-F1 are hurt (so prefer increasing the threshold)
- Solution: when the calculated threshold falls below a given value fbr, then…
  - SCutFBR.0: set the threshold to infinity
  - SCutFBR.1: set the threshold to the largest score observed during validation
Learning the Number of Labels
[Elisseeff & Weston, NIPS02]
- Input: a q-dimensional feature space with the obtained scores for each label
- Output: the threshold t that minimizes the symmetric difference between predicted and true sets
- Learning: linear least squares
[Tang et al., WWW09]
- Input: original feature space, scores, sorted scores
- Output: the size of the labelset
- Learning: multi-class classification, with a cut-off parameter
Outline Introduction Overview of existing techniques Advanced topics
Learning in the presence of Label Structure Multi-instance multi-label learning
The Mulan open-source software
Hierarchy Types and Implications
Trees
- Single parent per label
- When an object is labeled with a node, it is also labeled with its parent (paths)
- Path types
  - Annotation paths end at a leaf (full paths)
  - Annotation paths end at internal nodes (partial paths)
Directed acyclic graphs (DAGs)
- Multiple parents per label
- When an object is labeled with a node, either
  - it is also labeled with all its parents (multiple inheritance), or
  - it is also labeled with at least one of its parents
A Simple Approach
Ignore the hierarchy
- Simple binary relevance
- Should be used as a baseline
- Learn leaf models only, in the case of full paths!
Hierarchical Binary Relevance
Training [Koller & Sahami, ICML97; Cesa-Bianchi et al., JMLR06]
- One binary model at each node, using only those examples that are annotated with the parent node
Predictions are formed top-down
- A node can predict true only if its parent predicted so
- What about probabilities? p(λ) = p(λ|par(λ)) p(par(λ))
- When thresholding, the threshold for a node should not be higher than that of its parent
Comments
- Handles both partial and full paths
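A sketch of the top-down probability computation, p(λ) = p(λ|par(λ)) p(par(λ)), on a toy hierarchy with mocked per-node classifier outputs:

import java.util.HashMap;
import java.util.Map;

public class HierarchicalBR {
    public static void main(String[] args) {
        // toy hierarchy: root -> A, root -> B, A -> A1
        Map<String, String> parent = Map.of("A", "root", "B", "root", "A1", "A");
        // stand-ins for the node classifiers' conditional outputs p(λ | par(λ))
        Map<String, Double> conditional = Map.of("A", 0.8, "B", 0.3, "A1", 0.6);
        Map<String, Double> marginal = new HashMap<>();
        marginal.put("root", 1.0);
        for (String node : new String[]{"A", "B", "A1"}) // parents before children
            marginal.put(node, conditional.get(node) * marginal.get(parent.get(node)));
        System.out.println(marginal); // A=0.8, B=0.3, A1=0.48
    }
}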
Hierarchical Multi-Label Learning
Generalization of HBR [Tsoumakas et al., ECMLPKDD08w]
- Training and testing follow the same approach as HBR
- One multi-label learner is trained at each internal node
- If BR is used at each node, then we get HBR
TreeBoost.MH [Esuli et al., IR08]
- Instantiation using AdaBoost.MH
Other Approaches
Predictive Clustering Trees [Vens et al., MLJ08]
B-SVM [Cesa-Bianchi et al., ICML06]
- Trained similarly to HBR
- Bottom-up Bayesian combination of probabilities
Bayesian networks [Barutcuoglu et al., BIOINF06]
- Train independent binary classifiers for each label
- Combine them using a Bayesian network to correct inconsistencies
Outline Introduction Overview of existing techniques Advanced topics
Learning in the presence of Label Structure Multi-instance multi-label learning
The Mulan open-source software
Consider … An image usually contains multiple regions each can be represented by an instance
The image can simultaneously belong to multiple classes: Elephant, Lion, Grassland, Tropic, Africa, ……
Consider … A document usually contains multiple sections each can be represented by an instance
The document can simultaneously belong to multiple categories: Science fiction novel, Jules Verne's writing, Book on traveling, ……
Multi-Instance Multi-Label (MIML) Learning
Why MIML?
- Having an appropriate representation is as important as having a strong learning algorithm
- Real-world objects usually come with input ambiguity as well as output ambiguity
- Traditional supervised learning, multi-instance learning and multi-label learning are degenerated versions of MIML
Why MIML? (Cont’)
Traditional supervised learning
Multi-label learning
Multi-instance learning
Multi-instance multi-label learning
Why MIML? (Cont')
- Learning a one-to-many mapping is an ill-posed problem: why are there multiple labels?
- A many-to-many mapping seems better; moreover, MIML also offers a possibility for understanding the relationship between instances and labels
[Diagram: an object is represented by multiple instances, each capturing a different aspect, and each instance relates to one or more labels]
Why MIML? (Cont’) MIML can also be helpful for learning single-label examples involving complicated high-level concepts
Multi-Instance Multi-Label Learning
MIML task: to learn a function f: 2^Ξ → 2^Ψ from a given data set {(X1, Y1), ..., (Xm, Ym)}, where each Xi ⊆ Ξ is a set of ni instances and each Yi ⊆ Ψ is a set of li labels
- Ξ: the instance space
- Ψ: the set of class labels
- ni: the number of instances in Xi
- li: the number of labels in Yi
Solving MIML by Degeneration
Two degeneration routes from MIML (ambiguous) down to traditional single-instance single-label learning, SISL (unambiguous):
- Solution 1: MIML → multi-instance learning (MIL), via category-wise decomposition → SISL; MIMLBoost (using MIBoosting) is an illustration of Solution 1
- Solution 2: MIML → multi-label learning (MLL), via representation transformation → SISL; MIMLSVM (using MLSVM) is an illustration of Solution 2
- May suffer from information loss during the degeneration process!
Advanced Topics
- Solving MIML by regularization
- No access to the original data objects
- Learning single-label examples involving complicated high-level concepts
  - Zhou et al., MIML: A Framework for Learning with Ambiguous Objects, CoRR abs/0808.3231, 2008
- Large margin MIML algorithm M3MIML [Zhang & Zhou, ICDM'08]
- MIML for image annotation [Zha et al., CVPR'08]
- MIML metric learning [Jin et al., CVPR'09]
- ……
Outline Introduction Overview of existing techniques Advanced topics The Mulan open-source software
What It Is
Mulan
- An open-source software library for multi-label learning
- Open source software => scientific progress: "better reproducibility of experimental results, quicker detection of errors, innovative applications, and faster adoption of machine learning methods in other disciplines and in industry" (JMLR OSS)
- Built on top of Weka: well-established code and user base
- Current version: 1.0.1
- Programming language: Java
Data Format
ARFF file:

@relation MultiLabelExample

@attribute feature1 numeric
@attribute feature2 numeric
@attribute feature3 numeric
@attribute label1 {0, 1}
@attribute label2 {0, 1}
@attribute label3 {0, 1}
@attribute label4 {0, 1}
@attribute label5 {0, 1}

@data
2.3,5.6,1.4,0,1,1,0,0

Plus an XML file specifying the labels (and any hierarchy)
Datasets available at: http://mlkd.csd.auth.gr/multilabel.html
Hierarchies of Labels
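A hedged Java sketch of defining a small label hierarchy programmatically with the mulan.data classes listed below; the LabelNodeImpl(String) constructor is an assumption, and in practice the hierarchy is normally supplied in the XML file and read via LabelsBuilder.

import mulan.data.LabelNodeImpl;
import mulan.data.LabelsMetaDataImpl;

public class BuildHierarchy {
    public static void main(String[] args) {
        LabelNodeImpl sports = new LabelNodeImpl("sports");     // assumed constructor
        LabelNodeImpl football = new LabelNodeImpl("football");
        sports.addChildNode(football);   // "football" becomes a child of "sports"
        LabelsMetaDataImpl meta = new LabelsMetaDataImpl();
        meta.addRootNode(sports);        // register the root of the hierarchy
        System.out.println(meta.isHierarchy() + " " + meta.getNumLabels());
    }
}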
Validity Checks All labels specified in the XML file must be also defined in the ARFF file with same name Label names must be unique Each ARFF label attribute must be nominal with binary values {0, 1} Data should be consistent with the hierarchy
If any child label appears at an example, then all parent labels must also appear
mulan.core
- Exception handling infrastructure: WekaException, MulanException, MulanRuntimeException
- Utilities: Util
- Documentation support: MulanJavadoc
mulan.data (1/2)
- MultiLabelInstances
- LabelsMetaData and LabelsMetaDataImpl
- LabelNode and LabelNodeImpl
- LabelsBuilder: creates a LabelsMetaData instance from an XML file
- LabelSet: labelset representation class
mulan.data (2/2)
- Converters from LibSVM and CLUS formats: ConverterLibSVM, ConverterCLUS
- Statistics
  - Number of numeric/nominal attributes, labels
  - Cardinality, density, distinct labelsets
  - Phi-correlation matrix, co-occurrence matrix
MultiLabelInstances
- Instances dataSet
- LabelsMetaData labelsMetaData
- MultiLabelInstances(String arffFile, String xmlFile)
- MultiLabelInstances(String arffFile, int numLabels)
- MultiLabelInstances(Instances d, LabelsMetaData m)
- getDataset(): Instances
- getLabelsMetaData(): LabelsMetaData
- getLabelIndices(): int[]
- getFeatureIndices(): int[]
- getNumLabels(): int
- clone(): MultiLabelInstances
- reintegrateModifiedDataSet(Instances d): MultiLabelInstances
LabelsMetaData / LabelsMetaDataImpl
- Map allLabelNodes
- Set rootLabelNodes
- getRootLabels(): Set
- getLabelNames(): Set
- getLabelNode(String labelName): LabelNode
- getNumLabels(): int
- containsLabel(String labelName): boolean
- isHierarchy(): boolean
- addRootNode(LabelNode rootNode)
- removeLabelNode(String labelName): int
LabelNode / LabelNodeImpl
- Set childrenNodes
- String name
- LabelNode parentNode
- getChildren(): Set
- getChildredLabels(): Set
- getDescendantLabels(): Set
- getName(): String
- getParent(): LabelNode
- hasChildren(): boolean
- hasParent(): boolean
- addChildNode(LabelNode node): boolean
- removeChildNode(LabelNode node): boolean
mulan.transformation
Package mulan.transformation
- Common for learning and feature selection
- Includes data transformation approaches: BinaryRelevance, LabelPowerset, RemoveAllLabels
Package mulan.transformation.multiclass
- Simple transformations that support complex ones
mulan.transformation.multiclass
MultiClassTransformation
- transformInstances(MultiLabelInstances d): Instances
MultiClassTransformationBase
- transformInstances(MultiLabelInstances d): Instances
- transformInstance(Instance i): List
Implementations: Copy, CopyWeight, Ignore, SelectRandom, SelectBasedOnFrequency (with SelectionType)
mulan.classifier (1/3)
- MultiLabelLearner, MultiLabelLearnerBase, MultiLabelOutput
- Sub-packages: mulan.classifier.transformation, mulan.classifier.meta, mulan.classifier.lazy, mulan.classifier.neural
mulan.classifier (2/3)
MultiLabelLearner
- build(MultiLabelInstances instances)
- makePrediction(Instance instance): MultiLabelOutput
- makeCopy(): MultiLabelLearner
MultiLabelLearnerBase
- int numLabels
- int[] labelIndices, featureIndices
- boolean isDebug
- build(MultiLabelInstances instances)
- buildInternal(MultiLabelInstances instances)
- makeCopy()
- debug(String msg)
- setDebug(boolean debug)
- getDebug(): boolean
mulan.classifier (3/3)
MultiLabelOutput
- boolean[] bipartition
- double[] confidences
- int[] ranking
mulan.classifier.transformation
TransformationBasedMultiLabelLearner
- Classifier baseClassifier
- TransformationBasedMultiLabelLearner()
- TransformationBasedMultiLabelLearner(Classifier base)
- getBaseClassifier(): Classifier
Implementations: BinaryRelevance, CalibratedLabelRanking (boolean useStandardVoting), LabelPowerset, PPT, MultiLabelStacking, MultiClassLearner
mulan.classifier.meta
MultiLabelMetaLearner
- MultiLabelLearner baseLearner
- MultiLabelMetaLearner(MultiLabelLearner base)
- getBaseLearner(): MultiLabelLearner
Implementations: RAkEL, HOMER, HMC
Lazy and Neural Methods
Package mulan.classifier.lazy
- MultiLabelKNN: abstract base class for kNN-based methods
- Implemented algorithms
  - MLkNN [Matlab version at http://lamda.nju.edu.cn/datacode/MLkNN.htm]
  - BRkNN
  - IBLR_ML [Cheng & Hullermeier, MLJ09]
Package mulan.classifier.neural
- BPMLL [Matlab version at http://lamda.nju.edu.cn/datacode/BPMLL.htm]
Package mulan.classifier.neural.model
- Support classes
Evaluation (1/6)
Evaluator
- Evaluator()
- Evaluator(int seed)
- evaluate(MultiLabelLearner learner, MultiLabelInstances test): Evaluation
- crossValidate(MultiLabelLearner learner, MultiLabelInstances test, int numFolds): Evaluation
Evaluation (2/6)
Evaluation
- getExampleBasedMeasures(): ExampleBasedMeasures
- getLabelBasedMeasures(): LabelBasedMeasures
- getConfidenceLabelBasedMeasures(): ConfidenceLabelBasedMeasures
- getRankingBasedMeasures(): RankingBasedMeasures
- getHierarchicalMeasures(): HierarchicalMeasures
- toString()
- toCSV()
Evaluation (3/6)
ExampleBasedMeasures
- ExampleBasedMeasures(MultiLabelOutput[] output, boolean[][] trueLabels, double forgivenessRate)
- ExampleBasedMeasures(MultiLabelOutput[] output, boolean[][] trueLabels)
- ExampleBasedMeasures(ExampleBasedMeasures[] array)
- getSubsetAccuracy(): double
- getAccuracy(): double
- getHammingLoss(): double
- getPrecision(): double
- getRecall(): double
- getFMeasure(): double
Evaluation (4/6)
LabelBasedMeasures
- LabelBasedMeasures(MultiLabelOutput[] output, boolean[][] trueLabels)
- LabelBasedMeasures(LabelBasedMeasures[] array)
- getLabelAccuracy(int label): double
- getLabelPrecision(int label): double
- getLabelRecall(int label): double
- getLabelFMeasure(int label): double
- getAccuracy(Averaging type): double
- getPrecision(Averaging type): double
- getRecall(Averaging type): double
- getFMeasure(Averaging type): double
Averaging
- MACRO, MICRO
Evaluation (5/6)
RankingBasedMeasures
- RankingBasedMeasures(MultiLabelOutput[] output, boolean[][] trueLabels)
- RankingBasedMeasures(ExampleBasedMeasures[] array)
- getAvgPrecision(): double
- getOneError(): double
- getRankingLoss(): double
- getCoverage(): double
Evaluation (6/6)
ConfidenceLabelBasedMeasures
- ConfidenceLabelBasedMeasures(MultiLabelOutput[] output, boolean[][] trueLabels)
- ConfidenceLabelBasedMeasures(ConfidenceLabelBasedMeasures[] array)
- getLabelAUC(int label): double
- getAUC(Averaging type): double
HierarchicalMeasures
- HierarchicalMeasures(MultiLabelOutput[] output, boolean[][] trueLabels, LabelsMetaData metaData)
- HierarchicalMeasures(HierarchicalMeasures[] array)
- getHierarchicalLoss(): double
Examples
Package mulan.examples
- TrainTestExperiment
- CrossValidationExperiment
- EstimationOfStatistics
- GettingPredictionsOnTestSet
- AttributeSelectionTest
An Example

MultiLabelInstances train, test;
train = new MultiLabelInstances("yeast-train.arff", "yeast.xml");
test = new MultiLabelInstances("yeast-test.arff", "yeast.xml");

Classifier base = new NaiveBayes();
BinaryRelevance br = new BinaryRelevance(base);
br.build(train);

Evaluator eval = new Evaluator();
Evaluation results = eval.evaluate(br, test);
System.out.println(results.toString());
Mulan Credits
Main co-developers
- Robert Friberg, Lefteris Spyromitros, Jozef Vilcek
Code contributors
- Stavros Bakirtzoglou, Weiwei Cheng, Ioannis Katakis, Sang-Hyeun Park, Elise Rairat, George Saridis, George Traianos
Thank you!
Bibliography
An online multi-label learning bibliography is maintained at http://www.citeulike.org/group/7105/tag/multilabel; it currently includes more than 90 articles
You can…
- Grab BibTeX and RIS records
- Subscribe to the corresponding RSS feed
- Follow links to the papers' full PDFs (may require access to digital libraries)
- Export the complete bibliography for BibTeX or EndNote use (requires a CiteULike account)