Transfer Learning Lectures: Introduction Qiang Yang Hong Kong University of Science and Technology Hong Kong, China
http://www.cse.ust.hk/~qyang
Harbin 2011
A Major Assumption
Training and future (test) data
follow the same distribution, and are in same feature space
When distributions are different
Part-of-Speech tagging, Named-Entity Recognition, Classification
When Features are different
Heterogeneous: different feature spaces. Training: Text
Apples
The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ...
Bananas
Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ...
Future: Images
Transfer Learning?
People often transfer knowledge to novel situations
Chess → Checkers; C++ → Java; Physics → Computer Science
Transfer Learning: the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (or new domains).
Transfer Learning: Source Domains
(Diagram: source-domain training data (labeled or unlabeled) and target-domain data (labeled or unlabeled) are the input to learning; the output is evaluated on unlabeled target-domain test data.)
Text Classification/Clustering
(Figure: ODP/sports category statistics, with counts 21, 7498, 17016, and 7700.)
Transfer learning settings (source data vs. target data):
- Inductive Transfer Learning: labeled data are available in the target domain.
  - Case 1: no labeled data in the source domain → Self-taught Learning.
  - Case 2: labeled data are available in the source domain → Multi-task Learning (source and target tasks are learned simultaneously).
- Transductive Transfer Learning: labeled data are available only in the source domain.
  - Assumption: different domains but a single task → Domain Adaptation.
  - Assumption: single domain and single task → Sample Selection Bias / Covariate Shift.
- Unsupervised Transfer Learning: no labeled data in either the source or the target domain.
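A standard correction for the sample-selection-bias / covariate-shift setting is importance weighting: weight each source example by the density ratio p_target(x) / p_source(x), so the weighted source loss approximates the target-domain loss. A minimal sketch (the densities are assumed given here; in practice the ratio must be estimated):

```python
def importance_weights(p_target, p_source):
    """Density-ratio weights w(x) = p_target(x) / p_source(x)."""
    return [pt / ps for pt, ps in zip(p_target, p_source)]

def weighted_error(predictions, labels, weights):
    """Weighted 0/1 error over the source sample."""
    total = sum(weights)
    wrong = sum(w for p, y, w in zip(predictions, labels, weights) if p != y)
    return wrong / total

# Toy example: the second source point is twice as likely under the
# target distribution, so its misclassification counts double.
w = importance_weights(p_target=[0.1, 0.2], p_source=[0.1, 0.1])
print(weighted_error([1, 0], [1, 1], w))  # 0.6666666666666666
```
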
Different Learning Problems (multiple-domain data):
- Feature spaces heterogeneous:
  - Instance alignment available → Multi-view Learning.
  - No instance alignment → Heterogeneous Transfer Learning.
- Feature spaces homogeneous:
  - Data distributions different → Transfer Learning across Different Distributions (Domain Adaptation).
  - Data distributions the same → Traditional Machine Learning.
Domain Adaptation in NLP Applications
Automatic Content Extraction, Sentiment Classification, Part-Of-Speech Tagging, NER, Question Answering, Classification, Clustering
Selected Methods
Domain adaptation for statistical classifiers [Hal Daume III & Daniel Marcu, JAIR 2006], [Jiang and Zhai, ACL 2007] Structural Correspondence Learning [John Blitzer et al. ACL 2007] [Ando and Zhang, JMLR 2005] Latent subspace [Sinno Jialin Pan et al. AAAI 08]
Correct the decision boundary by re-weighting: the objective combines a loss function on the target domain data, a loss function on the source domain data, and a regularization term. Instead of uniform weights, differentiate the cost for misclassification of the target and source data.
Applications: cross-domain POS tagging, entity type classification, personalized spam filtering.
TrAdaBoost builds on Hedge(β) and AdaBoost [Freund et al. 1997]:
- The whole training data set consists of source domain labeled data and target domain labeled data; target domain unlabeled data are held out for testing.
- For misclassified examples: increase the weights of the misclassified target data, and decrease the weights of the misclassified source data.
- Evaluation with 20NG (http://people.csail.mit.edu/jrennie/20Newsgroups/): error reduced from 22% to 8%.
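The weight update just described can be sketched in code. This is an illustrative rendering of the TrAdaBoost rule from the ICML'07 paper, with simplified bookkeeping (binary error indicators and an externally supplied target error rate), not the authors' implementation:

```python
import math

def tradaboost_update(weights, errors, n_source, epsilon_t, n_rounds):
    """One TrAdaBoost weight update.

    weights:   current example weights, source examples first
    errors:    1 if the example was misclassified this round, else 0
    n_source:  number of source examples
    epsilon_t: weighted error on the target portion this round
    n_rounds:  total number of boosting rounds
    """
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))
    beta_t = epsilon_t / (1.0 - epsilon_t)
    new_weights = []
    for i, (w, e) in enumerate(zip(weights, errors)):
        if i < n_source:
            new_weights.append(w * beta ** e)        # shrink bad source points
        else:
            new_weights.append(w * beta_t ** (-e))   # grow bad target points
    return new_weights

# A misclassified source example loses weight, a misclassified target
# example gains weight, correctly classified examples keep theirs.
w = tradaboost_update([1.0, 1.0, 1.0, 1.0], [1, 0, 0, 1],
                      n_source=2, epsilon_t=0.25, n_rounds=10)
```
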
Three text collections: 20 newsgroups, SRAA, and Reuters-21578. The data are split based on sub-categories.
- 20 newsgroups (20,000 documents, 20 data sets):
  - Old: comp.graphics (comp), comp.os.ms-windows.misc (comp), sci.crypt (sci), sci.electronics (sci)
  - New: comp.sys.ibm (?), comp.windows.x (?), sci.med (?), sci.space (?)
- SRAA (A. McCallum, 70,000 articles):
  - Old: sim-auto (auto), sim-aviation (aviation)
  - New: real-auto (?), real-aviation (?)
- Reuters-21578
Old Domain: [A1 = +, B1 = -], New Domain: [A2 = ?, B2 = ?]
Co-Clustering based Classification (KDD 2007) Co-clustering is applied between features (words) and target-domain documents Word clustering is constrained by the labels of in-domain (Old) documents
The word clustering part in both domains serve as a bridge
Co-Clustering based Classification (cont.)
Notation: Do: clusters (classification) w.r.t. the target-domain documents; W: clusters w.r.t. the common features (words); C: class-labels of the source-domain documents.
Optimization function: minimize the loss in mutual information before and after clustering/classification, both between Do and W and between C and W.
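The quantity being preserved by this objective is mutual information between the co-clustered variables. As a reference point, the mutual information of a joint distribution can be computed as follows (a generic helper, not the authors' code):

```python
import math

def mutual_information(joint):
    """I(X; Y) in bits for a 2-D joint probability table summing to 1."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi

# Perfectly correlated variables carry 1 bit; independent ones carry 0.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
```
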
Co-Clustering based Classification (cont.)
Structural Correspondence Learning [Blitzer et al. ACL 2007]
SCL builds on [Ando and Zhang, JMLR 2005]. Method:
1. Define pivot features: features common to the two domains.
2. Find non-pivot features in each domain.
3. Build classifiers through the non-pivot features.
(1) The book is so repetitive that I found myself yelling …. I will definitely not buy another.
(2) Do not buy the Shark portable steamer …. Trigger mechanism is defective.
Book Domain
Kitchen Domain
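The pivot-selection step can be illustrated with a toy frequency rule. SCL actually picks pivots by frequency (and, for sentiment, mutual information with the label), so the `min_count` threshold and the helper below are simplifying assumptions:

```python
from collections import Counter

def select_pivots(source_docs, target_docs, min_count=2):
    """Features occurring at least min_count times in BOTH domains."""
    src = Counter(f for doc in source_docs for f in doc)
    tgt = Counter(f for doc in target_docs for f in doc)
    return sorted(f for f in src
                  if src[f] >= min_count and tgt[f] >= min_count)

# Toy documents echoing the book/kitchen reviews above: "not buy"
# appears in both domains and becomes a pivot.
books = [["not", "buy", "repetitive"], ["not", "buy", "boring"]]
kitchen = [["not", "buy", "defective"], ["not", "buy", "broken"]]
print(select_pivots(books, kitchen))  # ['buy', 'not']
```
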
Received-Signal-Strength (RSS) based localization in an indoor WiFi environment.
(Figure: a mobile device measures signal strengths of -30dBm, -70dBm, and -40dBm from access points 1, 2, and 3.)
Where is the mobile device?
(location_x, location_y)
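For context, a common baseline for RSS-based localization is nearest-neighbor matching against a labeled radio map. This sketch is a generic baseline under that assumption, not necessarily the method used in these slides:

```python
def localize(rss, radio_map):
    """Nearest-neighbor lookup: radio_map is a list of
    (rss_vector, (x, y)) pairs; returns the closest location."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(radio_map, key=lambda entry: dist(rss, entry[0]))[1]

# Toy radio map with two calibrated positions.
radio_map = [([-30, -70, -40], (1.0, 2.0)),
             ([-60, -35, -50], (4.0, 1.0))]
print(localize([-32, -68, -41], radio_map))  # (1.0, 2.0)
```
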
Distribution Changes
The mapping function f learned in the offline phase can become out of date, and recollecting the WiFi data is very expensive. How can the model be adapted?
(Figure: signal distributions drift between the day and night time periods.)
Transfer Component Analysis Sinno Jialin Pan et al. IJCAI 2009, TNN 2011
If two domains are related, …
Observations Source Domain data
Target Domain data
Latent factors
Common latent factors across domains
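TCA seeks components under which the source and target distributions match, measured by the Maximum Mean Discrepancy (MMD). With a linear kernel the empirical MMD reduces to the squared distance between the two domain means, sketched here (an illustrative computation, not the TCA optimization itself):

```python
def mmd_linear(source, target):
    """||mean(source) - mean(target)||^2 for lists of feature vectors."""
    d = len(source[0])
    mu_s = [sum(x[j] for x in source) / len(source) for j in range(d)]
    mu_t = [sum(x[j] for x in target) / len(target) for j in range(d)]
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

# Domains with identical means have zero linear MMD.
print(mmd_linear([[0.0, 0.0], [2.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]))  # 0.0
```
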
Semi-Supervised TCA Con: when the discriminative directions of the source and target domains are different, SSTCA may not work well.
Input: a lot of unlabeled data in a source domain and a few unlabeled data in a target domain. Goal: cluster the target domain data. Assumption: the source domain and target domain data share some common features, which can help clustering in the target domain. Main idea: extend the information-theoretic co-clustering algorithm [Dhillon et al. KDD-03] for transfer learning.
Objective function to be minimized (given as an equation in the original slides).
References
http://www.cse.ust.hk/~qyang/publications.html
http://www1.i2r.a-star.edu.sg/~jspan/SurveyTL.htm
Wenyuan Dai, Qiang Yang, Gui-Rong Xue and Yong Yu. Boosting for Transfer Learning. In Proceedings of the 24th International Conference on Machine Learning (ICML'07), Corvallis, Oregon, USA, June 20-24, 2007. Pages 193-200.
Wenyuan Dai, Gui-Rong Xue, Qiang Yang and Yong Yu. Co-clustering based Classification for Out-of-domain Documents. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'07), San Jose, California, USA, Aug 12-15, 2007. Pages 210-219.
Sinno Jialin Pan, Ivor W. Tsang, James T. Kwok and Qiang Yang. Domain Adaptation via Transfer Component Analysis. IEEE Transactions on Neural Networks, Volume 22, Issue 2, Pages 199-210, February 2011.
Transfer Learning and Its Applications (2): Transfer Learning 2: Cross Task/Label Space. Qiang Yang, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong
http://www.cse.ust.hk/~qyang
Cross Task/Label Space Transfer
Target class changes: target-transfer learning.
Training: a 2-class problem. Testing: a 10-class problem. Traditional methods fail.
Solution: find out what has not changed between training and testing.
Related Works
Cross-Domain Learning: TrAdaBoosting (ICML 2007), Co-Clustering based Classification (SIGKDD 2007), TPLSA (SIGIR 2008), NBTC (AAAI 2007)
Translated Learning: cross-lingual classification (WWW 2008), cross-media classification (NIPS 2008)
Unsupervised Transfer Learning: self-taught clustering (ICML 2008)
Our Work (cont)
Wenyuan Dai, Yuqiang Chen, Gui-Rong Xue, Qiang Yang, and Yong Yu. Translated Learning. In Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS 2008), Vancouver, British Columbia, Canada, December 2008.
Xiao Ling, Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu. Cross-Domain Spectral Learning. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), Las Vegas, Nevada, USA, August 24-27, 2008. Pages 488-496.
Wenyuan Dai, Qiang Yang, Gui-Rong Xue and Yong Yu. Self-taught Clustering. In Proceedings of the 25th International Conference on Machine Learning (ICML 2008), Helsinki, Finland, July 5-9, 2008. Pages 200-207.
Wenyuan Dai, Qiang Yang, Gui-Rong Xue and Yong Yu. Boosting for Transfer Learning. In Proceedings of the 24th International Conference on Machine Learning (ICML'07), Corvallis, Oregon, USA, June 20-24, 2007. Pages 193-200.
Wenyuan Dai, Gui-Rong Xue, Qiang Yang and Yong Yu. Co-clustering based Classification for Out-of-domain Documents. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'07), San Jose, California, USA, Aug 12-15, 2007. Pages 210-219.
Dou Shen, Jian-Tao Sun, Qiang Yang and Zheng Chen. Building Bridges for Web Query Classification. In Proceedings of the 29th ACM International Conference on Research and Development in Information Retrieval (SIGIR'06), Seattle, USA, August 6-11, 2006. Pages 131-138.
Classification in KDDCUP 2005
ACM KDDCUP'05 Winner; SIGIR'06; ACM Transactions on Information Systems, 2006.
Joint work with Dou Shen, Jian-Tao Sun and Zheng Chen
QC as Machine Learning: inspired by the KDDCUP'05 competition
Classify a query into a ranked list of categories Queries are collected from real search engines Target categories are organized in a tree with each node being a category
Related Works
Document/Query Expansion
Borrow text from extra data source
Using hyperlink [Glover 2002]; Using implicit links from query log [Shen 2006]; Using existing taxonomies [Gabrilovich 2005];
Query expansion [Manning 2007]: global methods (independent of the queries) and local methods using relevance feedback or pseudo-relevance feedback.
Query Classification/ Clustering
Classify Web queries by geographical locality [Gravano 2003]; classify queries according to their functional types [Kang 2003]; Beitzel et al. studied topical classification as we do, but with manually classified data [Beitzel 2005]; Beeferman and Wen worked on query clustering using clickthrough data [Beeferman 2000; Wen 2001].
Target-transfer Learning in QC
Classifier, once trained, stays constant
Target classes before: Sports (Olympics, Football, NBA), Stock Market (Asian, Dow, Nasdaq), History (Chinese, World).
Target classes now: Sports, Politics (European, US, China).
How to allow the target to change?
Application: advertisements come and go, but our query-target mapping need not be retrained!
We call this the target-transfer learning problem.
Solutions: Query Enrichment + Staged Classification
Solution: Bridging classifier
Step 1: Query enrichment
Enrich each query with textual information (title, snippet, full text) and category information from search results.
Step 2: Bridging Classifier
Wish to avoid:
When target is changed, training needs to repeat!
Solution:
Connect the target taxonomy and queries by taking an intermediate taxonomy as a bridge
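The bridging idea can be written as a sum over intermediate categories, p(C_target | q) ≈ Σ_i p(C_target | I_i) · p(I_i | q): only p(I_i | q) touches the query, so changing the target categories never forces retraining the query side. A toy sketch with made-up probabilities:

```python
def bridge_score(p_target_given_inter, p_inter_given_query):
    """p(C_target | q) ~= sum_i p(C_target | I_i) * p(I_i | q)."""
    return sum(pt * pi for pt, pi in
               zip(p_target_given_inter, p_inter_given_query))

# Two intermediate categories; the numbers are illustrative only.
print(bridge_score([0.9, 0.1], [0.5, 0.5]))  # 0.5
```
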
Category Selection for Intermediate Taxonomy
Category Selection for Reducing Complexity
Total Probability (TP)
Mutual Information
Experiment ─ Data Sets & Evaluation
ACM KDDCUP
Started in 1997, the ACM KDDCup is the leading data mining and knowledge discovery competition in the world, organized by ACM SIGKDD.
ACM KDDCUP 2005
Task: categorize 800K search queries into 67 categories.
Three awards: (1) Performance Award; (2) Precision Award; (3) Creativity Award.
Participation: 142 registered groups; 37 solutions submitted from 32 teams.
Evaluation data: 800 queries randomly selected from the 800K query set; 3 human labelers labeled the entire evaluation query set.
Evaluation measurements: Precision and Performance (F1). We won all three awards.
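For reference, the precision and F1 measures can be computed per category as below (a standard helper, not the official KDDCUP'05 scorer):

```python
def precision_recall_f1(true_labels, pred_labels, category):
    """Per-category precision, recall, and F1 over paired label lists."""
    tp = sum(1 for t, p in zip(true_labels, pred_labels)
             if t == category and p == category)
    fp = sum(1 for t, p in zip(true_labels, pred_labels)
             if t != category and p == category)
    fn = sum(1 for t, p in zip(true_labels, pred_labels)
             if t == category and p != category)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

print(precision_recall_f1(["a", "a", "b"], ["a", "b", "b"], "a"))
# (1.0, 0.5, 0.6666666666666666)
```
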
Summary: Target-Transfer Learning
(Diagram: a query is classified to intermediate classes, which are then matched to the target classes by similarity.)
Activity Recognition
With sensor data collected on mobile devices
Context: location, weather, etc., sensed from GPS, WiFi, RFID, Bluetooth, etc.
Various models can be used
Non-sequential models:
Naïve Bayes, SVM …
Sequential models:
HMM, CRF …
Cross Domain Activity Recognition [Zheng, Hu, Yang, Ubicomp 2009]
Challenge: a new domain of activities without labeled data.
Cross-domain activity recognition: transfer some available labeled data from source activities to help train the recognizer for the target activities.
(Example activity domains: indoor cleaning, laundry, dishwashing.)
Source-Selection-Free Transfer Learning (SSFTL). Evan Wei Xiang, Sinno Jialin Pan, Weike Pan, Jian Su and Qiang Yang. Source-Selection-Free Transfer Learning. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-11), Barcelona, Spain, July 2011.
Supervised learning: a lack of labeled training data always happens.
Transfer learning: applicable when we have some related source domains.
But where are the "right" source data? We may have an extremely large number of choices of potential sources to use.
SSFTL: predictions for the incoming test data.
(Diagram: incoming test data from the target domain are mapped by the learned projection matrices W and V and compared against label prototypes.)
No need to use the base models explicitly! With the parameter matrix W, we can make predictions on any incoming test data based on the distance to the label prototypes, without calling the base classification models.
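The prediction step reduces to projecting a test example and choosing the nearest label prototype. A toy sketch (the identity projection and the prototype vectors are made-up values, not learned ones):

```python
def project(x, W):
    """Multiply row vector x by matrix W (both as lists of lists)."""
    return [sum(xi * wij for xi, wij in zip(x, col))
            for col in zip(*W)]

def nearest_prototype(z, prototypes):
    """Index of the prototype closest to z in squared Euclidean distance."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(prototypes)), key=lambda i: dist(z, prototypes[i]))

W = [[1.0, 0.0], [0.0, 1.0]]        # identity projection for the toy case
protos = [[0.0, 0.0], [1.0, 1.0]]   # two made-up label prototypes
print(nearest_prototype(project([0.9, 0.8], W), protos))  # 1
```
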
Experiments on SSFTL
From Wikipedia, we got 800K pages with 100K categories to build base classifiers; 50K category pairs were picked for binary base classifiers. Training takes 2 days on a single processor!
Social media: 200K tags on 5M pages, 22K users.
Results of SSFTL vs. baselines:

Data     SVM   TSVM  SSFTL
20NG     69.8  75.7  80.6
Google   62.1  69.7  73.4
AOL      72.1  74.1  74.3
Reuters  69.7  63.3  74.3
Latent dimension |V| = 100. 20NG: the 20 Newsgroups dataset is a text collection of ~20,000 newsgroup documents spread evenly across 20 different topics. We created 190 target binary text classification tasks, such as hockey vs. religion and medicine vs. motorcycles.
References
http://www.cse.ust.hk/~qyang/publications.html
Dou Shen, Jian-Tao Sun, Qiang Yang and Zheng Chen. Building Bridges for Web Query Classification. In Proceedings of the 29th ACM International Conference on Research and Development in Information Retrieval (SIGIR'06), Seattle, USA, August 6-11, 2006. Pages 131-138.
Vincent W. Zheng, Derek H. Hu and Qiang Yang. Cross-Domain Activity Recognition. In Proceedings of the 11th International Conference on Ubiquitous Computing (Ubicomp-09), Orlando, Florida, USA, Sept. 30-Oct. 3, 2009.
Evan Wei Xiang, Sinno Jialin Pan, Weike Pan, Jian Su and Qiang Yang. Source-Selection-Free Transfer Learning. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-11), Barcelona, Spain, July 2011.
Towards Heterogeneous Transfer Learning Qiang Yang Hong Kong University of Science and Technology Hong Kong, China
http://www.cse.ust.hk/~qyang
TL Resources
http://www.cse.ust.hk/TL
Learning by Analogy
Gentner 1983: Structural Correspondence
Mapping between source and target:
mapping between objects in different domains e.g., between computers and humans
mapping can also be between relations
Anti-virus software vs. medicine
Falkenhainer, Forbus, and Gentner (1989), the Structure-Mapping Engine: incremental transfer of knowledge via comparison of two domains
Case-based Reasoning (CBR)
e.g., CHEF [Hammond, 1986], AI planning of recipes for cooking; HYPO (Ashley 1991), …
Heterogeneous Transfer Learning (multiple-domain data):
- Feature spaces heterogeneous:
  - Instance alignment available → Multi-view Learning.
  - No instance alignment → Heterogeneous Transfer Learning (HTL).
- Feature spaces homogeneous:
  - Data distributions different → Transfer Learning across Different Distributions.
  - Data distributions the same → Traditional Machine Learning.
Cross-language Classification. WWW 2008: X. Ling et al., "Can Chinese Web Pages be Classified with English Data Sources?"
(Diagram: a classifier is learned from labeled English Web pages and used to classify unlabeled Chinese Web pages.)
HTL Setting: Text to Images
Source data: labeled or unlabeled. Target training data: labeled. Training: Text
Apple
The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ...
Banana
Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ...
Testing: Images
HTL for Images: 3 Cases
- Source data unlabeled, target data unlabeled → clustering.
- Source data unlabeled, target training data labeled → HTL for image classification.
- Source data labeled, target training data labeled → translated learning (classification).
Annotated PLSA Model for Clustering (Caltech 256 data): heterogeneous transfer learning yields an average entropy improvement of 5.7%.
(Diagram: SIFT features of image instances in the target data and words from the source data, e.g. Flickr tags such as Lion, Animal, Simba, Hakuna Matata, FlickrBigCats, are linked through shared topics.)
“Heterogeneous transfer learning for image classification” Y. Zhu, G. Xue, Q. Yang et al. AAAI 2011
Case 2: source is not labeled; goal: classification.
- Source data: unlabeled.
- Target data: a few labeled images as training samples.
- Testing samples: not available during training.
Structural Transfer Learning
Structural Transfer
Transfer Learning from Minimal Target Data by Mapping across Relational Domains
Lilyana Mihalkova and Raymond Mooney. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09), pages 1163-1168, Pasadena, CA, July 2009. "Use the short-range clauses in order to find mappings between the relations in the two domains, which are then used to translate the long-range clauses."
Transfer Learning by Structural Analogy.
Huayan Wang and Qiang Yang. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-11). San Francisco, CA USA. August, 2011. Find the structural mappings that maximize structural similarity
References
http://www.cse.ust.hk/~qyang/publications.html
Huayan Wang and Qiang Yang. Transfer Learning by Structural Analogy. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-11), San Francisco, CA, USA, August 2011.
Yin Zhu, Yuqiang Chen, Zhongqi Lu, Sinno J. Pan, Gui-Rong Xue, Yong Yu and Qiang Yang. Heterogeneous Transfer Learning for Image Classification. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-11), San Francisco, CA, USA, August 2011.
Qiang Yang, Yuqiang Chen, Gui-Rong Xue, Wenyuan Dai and Yong Yu. Heterogeneous Transfer Learning for Image Clustering via the Social Web. In Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP (ACL-IJCNLP'09), Singapore, August 2009. Pages 1-9. Invited paper.
Wenyuan Dai, Yuqiang Chen, Gui-Rong Xue, Qiang Yang, and Yong Yu. Translated Learning. In Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS 2008), Vancouver, British Columbia, Canada, December 2008.
Transfer Learning in Link Prediction and Social Web Qiang Yang Hong Kong University of Science and Technology Hong Kong, China
http://www.cse.ust.hk/~qyang
Network Transfer: Topological Mapping
Expressed Through a Codebook
Recommendation Systems
Product Recommendation as Link Prediction
Task: predict missing links in a network. Focus: a bipartite graph of users and items.
(Figure: a sparse user-item rating matrix with observed ratings 1, 5, 3, 4 and missing entries marked '?'.)
Realistic Setting
- Training data: a sparse rating matrix with many missing entries.
- Test data: held-out ratings to be predicted by the CF model.
- Performance depends on the density of the rating matrix.
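A minimal instance of a CF model for such a sparse rating matrix is low-rank matrix factorization fitted by SGD on the observed entries only. This generic sketch stands in for whichever CF model the slides assume; the ratings, ranks, and hyperparameters are illustrative:

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, steps=5000):
    """ratings: list of (user, item, value) triples for observed cells.
    Returns user factors U and item factors V with rating ~= U[u] . V[i]."""
    random.seed(0)
    U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(U[u][f] * V[i][f] for f in range(k))
            for f in range(k):
                u_f = U[u][f]                # keep the pre-update value
                U[u][f] += lr * err * V[i][f]
                V[i][f] += lr * err * u_f
    return U, V

# Observed cells of a 2x2 rating matrix; cell (1, 1) is missing.
U, V = factorize([(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0)], n_users=2, n_items=2)
print(sum(U[1][f] * V[1][f] for f in range(2)))  # prediction for the missing cell
```

The learned factors reproduce the observed ratings and fill in the missing cell, which is exactly the link-prediction view of recommendation described above.
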