Transfer Learning Lectures: Introduction

Qiang Yang
Hong Kong University of Science and Technology
Hong Kong, China

http://www.cse.ust.hk/~qyang

Harbin 2011


A Major Assumption

Training and future (test) data follow the same distribution and are in the same feature space.

When distributions are different

- Part-of-Speech tagging
- Named-Entity Recognition
- Classification

When Features are different

- Heterogeneous: different feature spaces

Training: Text

- Apples: "The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ..."
- Bananas: "Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ..."

Future: Images

Transfer Learning?

People often transfer knowledge to novel situations:

- Chess → Checkers
- C++ → Java
- Physics → Computer Science

Transfer Learning: the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (or new domains).

Transfer Learning: Source Domains

Input → Learning → Output

                 Source Domains         Target Domain
Training Data    Labeled/Unlabeled      Labeled/Unlabeled
Test Data                               Unlabeled

Text Classification/Clustering

[Figure: document counts across ODP/sports sub-categories]

Transfer Learning Settings

- Inductive Transfer Learning: labeled data are available in a target domain.
  - Case 1: no labeled data in the source domain → Self-taught Learning
  - Case 2: labeled data are available in the source domain; source and target tasks are learned simultaneously → Multi-task Learning
- Transductive Transfer Learning: labeled data are available only in a source domain.
  - Assumption: different domains but a single task → Domain Adaptation
  - Assumption: single domain and single task → Sample Selection Bias / Covariate Shift
- Unsupervised Transfer Learning: no labeled data in either the source or the target domain.

Different Learning Problems

Multiple domain data:

- Heterogeneous feature spaces:
  - Instance alignment? Yes → Multi-view Learning
  - Instance alignment? No → Heterogeneous Transfer Learning (e.g., source domain: text such as "Apple is a fruit that can be found …", "Banana is the common name for …"; target domain: images)
- Homogeneous feature spaces:
  - Data distribution different → Transfer Learning across Different Distributions (Domain Adaptation)
  - Data distribution same → Traditional Machine Learning

Domain Adaptation in NLP Applications

- Automatic Content Extraction
- Sentiment Classification
- Part-Of-Speech Tagging
- NER
- Question Answering
- Classification
- Clustering

Selected Methods

- Domain adaptation for statistical classifiers [Hal Daume III & Daniel Marcu, JAIR 2006], [Jiang and Zhai, ACL 2007]
- Structural Correspondence Learning [John Blitzer et al., ACL 2007], [Ando and Zhang, JMLR 2005]
- Latent subspace [Sinno Jialin Pan et al., AAAI 08]

Instance Re-weighting

- Applications: cross-domain POS tagging, entity type classification, personalized spam filtering
- Idea: correct the decision boundary by re-weighting the training instances instead of using uniform weights
- Objective: a loss function on the target domain data, plus a loss function on the source domain data, plus a regularization term
- Differentiate the cost for misclassification of the target and source data
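The objective above can be written down concretely. A minimal sketch, assuming a log-loss base learner; the trade-off weights `lam_src` and `lam_reg` and all function names are illustrative, not from the slides:

```python
import numpy as np

def domain_weighted_loss(y_src, p_src, y_tgt, p_tgt, w, lam_src=0.3, lam_reg=0.01):
    """Loss on target data + down-weighted loss on source data + regularizer.

    lam_src < 1 encodes a lower cost for misclassifying source examples,
    so the decision boundary is corrected mainly by the target-domain loss.
    """
    eps = 1e-12  # avoid log(0)
    def log_loss(y, p):
        return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return log_loss(y_tgt, p_tgt) + lam_src * log_loss(y_src, p_src) + lam_reg * np.sum(w ** 2)
```

With this weighting, the same mistake is cheaper on source data than on target data, which is exactly the asymmetry the slide describes.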

TrAdaBoost [Dai et al., ICML 2007]

- Training data: source domain labeled data + target domain labeled data (the whole training data set); target domain unlabeled data for testing
- For misclassified examples:
  - increase the weights of the misclassified target data, as in AdaBoost [Freund et al. 1997]
  - decrease the weights of the misclassified source data, as in Hedge (β) [Freund et al. 1997]
- Evaluation with 20NG (http://people.csail.mit.edu/jrennie/20Newsgroups/): error reduced from 22% to 8%
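The two weight-update rules can be sketched in a few lines. This is a simplified illustration with my own function names; the exact constants and derivation are in the Dai et al. ICML '07 paper:

```python
import numpy as np

def tradaboost_update(w_src, w_tgt, miss_src, miss_tgt, n_iters):
    """One round of TrAdaBoost-style re-weighting (sketch).

    miss_src / miss_tgt are boolean arrays marking misclassified examples.
    Target mistakes get *heavier* (AdaBoost rule); source mistakes get
    *lighter* (Hedge rule), so conflicting source data fade away.
    """
    # Weighted training error measured on the target domain only
    eps = float(np.sum(w_tgt * miss_tgt) / np.sum(w_tgt))
    eps = min(max(eps, 1e-10), 0.499)                  # keep the factors finite
    beta_t = eps / (1.0 - eps)                         # AdaBoost factor (< 1)
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_iters))  # Hedge factor
    w_tgt = w_tgt * beta_t ** (-miss_tgt.astype(float))   # increase target weights
    w_src = w_src * beta ** miss_src.astype(float)        # decrease source weights
    return w_src, w_tgt
```

Iterating this update over boosting rounds gradually silences source examples that disagree with the target labels.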

Data Sets

- 20 newsgroups (20,000 documents, 20 data sets)
  - Old: comp.graphics (comp), comp.os.ms-windows.misc (comp), sci.crypt (sci), sci.electronics (sci)
  - New: comp.sys.ibm (?), comp.windows.x (?), sci.med (?), sci.space (?)
- SRAA (A. McCallum, 70,000 articles)
  - Old: sim-auto (auto), sim-aviation (aviation)
  - New: real-auto (?), real-aviation (?)
- Reuters-21578

- Three text collections: 20 newsgroups, SRAA, Reuters-21578
- The data are split based on sub-categories (e.g., rec vs. sci)
- Old domain: [A1 = +, B1 = -]; new domain: [A2 = ?, B2 = ?]

Co-Clustering based Classification (KDD 2007)

- Co-clustering is applied between features (words) and target-domain documents
- Word clustering is constrained by the labels of in-domain (Old) documents
- The word clustering part in both domains serves as a bridge

Co-Clustering based Classification (cont.)

- Notation: Do, the clusters (classification) w.r.t. the target-domain documents; W, the clusters w.r.t. the common features (words); C, the class labels of the source-domain documents
- Optimization function: minimize the loss in mutual information before and after clustering/classification
  - between Do and W
  - between C and W
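The quantity being minimized, the loss in mutual information between a co-occurrence table and its co-clustered summary, can be computed directly. A small sketch with my own helper names; CoCC itself alternates document- and word-cluster updates to shrink this loss:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X; Y) in bits for a joint distribution given as a 2-D array."""
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

def mi_loss(p_xy, row_clusters, col_clusters):
    """Loss in mutual information after co-clustering the joint table."""
    p_cc = np.zeros((row_clusters.max() + 1, col_clusters.max() + 1))
    for i, r in enumerate(row_clusters):
        for j, c in enumerate(col_clusters):
            p_cc[r, c] += p_xy[i, j]            # aggregate mass per co-cluster
    return mutual_information(p_xy) - mutual_information(p_cc)
```

Merging rows (or columns) that behave differently increases the loss, which is why good co-clusters preserve the document-word association structure.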


Structural Correspondence Learning [Blitzer et al. ACL 2007]

- SCL builds on [Ando and Zhang, JMLR 2005]
- Method:
  - Define pivot features: common in the two domains
  - Find non-pivot features in each domain
  - Build classifiers through the non-pivot features

Example (sentiment):

(1) "The book is so repetitive that I found myself yelling … I will definitely not buy another." (Book domain)

(2) "Do not buy the Shark portable steamer … Trigger mechanism is defective." (Kitchen domain)
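The three steps can be sketched as follows. This is a toy rendering with my own function names, assuming a least-squares pivot predictor in place of the paper's modified Huber loss:

```python
import numpy as np

def scl_shared_projection(X_src, X_tgt, pivot_idx, k=2):
    """SCL sketch: predict each pivot feature from the non-pivot features
    on the *unlabeled* union of both domains, then SVD the stacked predictor
    weights to get a k-dimensional shared ("correspondence") subspace."""
    X = np.vstack([X_src, X_tgt])                 # pivot prediction needs no labels
    nonpivot_idx = [j for j in range(X.shape[1]) if j not in set(pivot_idx)]
    A = X[:, nonpivot_idx]
    W = np.column_stack([np.linalg.lstsq(A, X[:, p], rcond=None)[0]
                         for p in pivot_idx])     # one linear predictor per pivot
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return nonpivot_idx, U[:, :k]                 # projection for non-pivot features

def scl_augment(X, nonpivot_idx, theta):
    """Train the final classifier on original + SCL-induced shared features."""
    return np.hstack([X, X[:, nonpivot_idx] @ theta])
```

Non-pivot features that co-occur with the same pivots in both domains (e.g., "repetitive" and "defective" with "do not buy") end up close in the shared subspace.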

RSS-based Indoor Localization

- Received-Signal-Strength (RSS) based localization in an indoor WiFi environment
- A mobile device hears access points 1-3 at, e.g., -30 dBm, -40 dBm, and -70 dBm
- Where is the mobile device? (location_x, location_y)
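The offline/online localization pipeline assumed here can be sketched as k-nearest-neighbour fingerprinting, a standard baseline; the RSS values and coordinates below are made up:

```python
import numpy as np

def knn_localize(fingerprints, locations, rss_query, k=3):
    """Fingerprinting sketch: offline, store (RSS vector, location) pairs;
    online, average the locations of the k closest fingerprints in signal
    space to estimate (location_x, location_y)."""
    dists = np.linalg.norm(fingerprints - rss_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return locations[nearest].mean(axis=0)
```

The mapping from RSS vectors to locations is exactly the function f whose drift the next slide discusses.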

Distribution Changes

- The mapping function f learned in the offline phase can be out of date (e.g., between the night and day time periods).
- Re-collecting the WiFi data is very expensive.
- How to adapt the model?

Transfer Component Analysis

Sinno Jialin Pan et al. IJCAI 2009, TNN 2011

If two domains are related, …

- Observation: the source domain data and the target domain data share common latent factors across domains.
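TCA makes this concrete by learning components that pull the two domains' distributions together while preserving data variance. A minimal linear-kernel sketch; the regularizer `mu` and the dimensions are illustrative, and the full method in Pan et al. supports arbitrary kernels:

```python
import numpy as np

def tca(X_src, X_tgt, n_components=2, mu=1.0):
    """Transfer Component Analysis, linear-kernel sketch.

    L encodes the empirical MMD between the domains; H is the centering
    matrix for variance preservation. The leading eigenvectors of
    (K L K + mu I)^{-1} K H K give the transfer components."""
    X = np.vstack([X_src, X_tgt])
    n1, n2 = len(X_src), len(X_tgt)
    n = n1 + n2
    K = X @ X.T                                           # linear kernel matrix
    e = np.vstack([np.full((n1, 1), 1.0 / n1),
                   np.full((n2, 1), -1.0 / n2)])
    L = e @ e.T                                           # MMD coefficient matrix
    H = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
    M = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(M)
    top = np.argsort(-vals.real)[:n_components]
    return K @ vecs[:, top].real                          # embedded source+target rows
```

A standard classifier trained on the embedded source rows can then be applied to the embedded target rows.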

Semi-Supervised TCA

- Con: when the discriminative directions of the source and target domains are different, SSTCA may not work well.

Self-taught Clustering

- Input: a lot of unlabeled data in a source domain and a few unlabeled data in a target domain.
- Goal: clustering the target domain data.
- Assumption: the source domain and target domain data share some common features, which can help clustering in the target domain.
- Main idea: extend the information-theoretic co-clustering algorithm [Dhillon et al. KDD-03] for transfer learning.
- Objective function to be minimized: [equation not recovered from the slide]

References

- http://www.cse.ust.hk/~qyang/publications.html
- http://www1.i2r.a-star.edu.sg/~jspan/SurveyTL.htm
- Wenyuan Dai, Qiang Yang, Gui-Rong Xue and Yong Yu. Boosting for Transfer Learning. ICML '07, Corvallis, Oregon, USA, June 20-24, 2007, pages 193-200.
- Wenyuan Dai, Gui-Rong Xue, Qiang Yang and Yong Yu. Co-clustering based Classification for Out-of-domain Documents. ACM KDD '07, San Jose, California, USA, Aug 12-15, 2007, pages 210-219.
- Sinno Jialin Pan, Ivor W. Tsang, James T. Kwok and Qiang Yang. Domain Adaptation via Transfer Component Analysis. IEEE Transactions on Neural Networks, Volume 22, Issue 2, pages 199-210, February 2011.

Transfer Learning and Its Applications (2): Cross Task/Label Space

Qiang Yang
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology, Hong Kong

http://www.cse.ust.hk/~qyang

Cross Task/Label Space Transfer

- Target class changes → target-transfer learning
  - Training: 2-class problem; testing: 10-class problem. Traditional methods fail.
- Solution: find out what is not changed between training and testing.

Related Works

- Cross-Domain Learning
  - TrAdaBoosting (ICML 2007)
  - Co-Clustering based Classification (SIGKDD 2007)
  - TPLSA (SIGIR 2008)
  - NBTC (AAAI 2007)
- Translated Learning
  - Cross-lingual classification (WWW 2008)
  - Cross-media classification (NIPS 2008)
- Unsupervised Transfer Learning
  - Self-taught clustering (ICML 2008)

Our Work (cont.)

- Wenyuan Dai, Yuqiang Chen, Gui-Rong Xue, Qiang Yang, and Yong Yu. Translated Learning. NIPS 2008, Vancouver, British Columbia, Canada, December 2008.
- Xiao Ling, Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu. Cross-Domain Spectral Learning. ACM KDD 2008, Las Vegas, Nevada, USA, August 24-27, 2008, pages 488-496.
- Wenyuan Dai, Qiang Yang, Gui-Rong Xue and Yong Yu. Self-taught Clustering. ICML 2008, Helsinki, Finland, July 5-9, 2008, pages 200-207.
- Wenyuan Dai, Qiang Yang, Gui-Rong Xue and Yong Yu. Boosting for Transfer Learning. ICML '07, Corvallis, Oregon, USA, June 20-24, 2007, pages 193-200.
- Wenyuan Dai, Gui-Rong Xue, Qiang Yang and Yong Yu. Co-clustering based Classification for Out-of-domain Documents. ACM KDD '07, San Jose, California, USA, Aug 12-15, 2007, pages 210-219.
- Dou Shen, Jian-Tao Sun, Qiang Yang and Zheng Chen. Building Bridges for Web Query Classification. ACM SIGIR '06, Seattle, USA, August 6-11, 2006, pages 131-138.

Classification in KDDCUP 2005

- ACM KDDCUP 05 Winner
- SIGIR 06; ACM Transactions on Information Systems, 2006
- Joint work with Dou Shen, Jian-Tao Sun and Zheng Chen

QC as Machine Learning

- Inspired by the KDDCUP '05 competition
- Classify a query into a ranked list of categories
- Queries are collected from real search engines
- Target categories are organized in a tree, with each node being a category

Related Works

- Document/Query Expansion
  - Borrow text from an extra data source:
    - using hyperlinks [Glover 2002]
    - using implicit links from query logs [Shen 2006]
    - using existing taxonomies [Gabrilovich 2005]
  - Query expansion [Manning 2007]:
    - global methods: independent of the queries
    - local methods: using relevance feedback or pseudo-relevance feedback
- Query Classification/Clustering
  - Classify Web queries by geographical locality [Gravano 2003]
  - Classify queries according to their functional types [Kang 2003]
  - Beitzel et al. studied topical classification as we do, but with manually classified data [Beitzel 2005]
  - Beeferman and Wen worked on query clustering using clickthrough data [Beeferman 2000; Wen 2001]

Target-transfer Learning in QC

- A classifier, once trained, stays constant
  - Target classes before: Sports, Politics (European, US, China)
  - Target classes now: Sports (Olympics, Football, NBA), Stock Market (Asian, Dow, Nasdaq), History (Chinese, World)
  - How to allow the target to change?
- Application: advertisements come and go, but our query-target mapping need not be retrained!
- We call this the target-transfer learning problem.

Solutions: Query Enrichment + Staged Classification (via a Bridging Classifier)

Step 1: Query Enrichment

- Textual information: title, snippet, full text
- Category information

Step 2: Bridging Classifier

- Wish to avoid: when the target is changed, training needs to be repeated!
- Solution: connect the target taxonomy and the queries by taking an intermediate taxonomy as a bridge
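The bridge amounts to one matrix product: queries are classified once into the intermediate taxonomy, and target categories are scored through pre-computed target-to-intermediate similarities. A sketch with my own names and toy numbers:

```python
import numpy as np

def bridging_scores(p_inter_given_q, sim_target_inter):
    """Score target categories for one query through the intermediate taxonomy:
    score(c_T) = sum_i sim(c_T, c_I_i) * p(c_I_i | q).

    When the target taxonomy changes, only sim_target_inter is recomputed;
    the query -> intermediate classifier never needs retraining."""
    return sim_target_inter @ p_inter_given_q

# Toy bridge: 3 intermediate categories, 2 target categories
p_inter = np.array([0.7, 0.2, 0.1])             # classifier output p(c_I | q)
sim = np.array([[0.9, 0.0, 0.1],                # target category 0 vs. intermediates
                [0.1, 0.8, 0.6]])               # target category 1 vs. intermediates
best = int(np.argmax(bridging_scores(p_inter, sim)))
```

Because the expensive query classifier is frozen, swapping in a new target taxonomy only costs recomputing the small similarity matrix.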

Category Selection for the Intermediate Taxonomy

- Category selection for reducing complexity:
  - Total Probability (TP)
  - Mutual Information (MI)
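Both criteria prune the intermediate taxonomy down to the categories that carry the most signal. A sketch of the Total Probability variant (the MI variant would rank by mutual information with the target classes instead); the matrix values are illustrative:

```python
import numpy as np

def select_by_total_probability(p_inter_given_q, k):
    """Total Probability (TP) selection sketch: rank intermediate categories
    by their probability mass accumulated over the training queries and keep
    the top k, shrinking the bridge without retraining anything.

    p_inter_given_q: (n_queries, n_intermediate) classifier outputs."""
    tp = p_inter_given_q.sum(axis=0)
    return np.argsort(-tp)[:k]
```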

Experiment ─ Data Sets & Evaluation

- ACM KDDCUP: starting in 1997, the ACM KDDCup is the leading data mining and knowledge discovery competition in the world, organized by ACM SIGKDD.
- ACM KDDCUP 2005
  - Task: categorize 800K search queries into 67 categories
  - Three awards: (1) Performance Award; (2) Precision Award; (3) Creativity Award
  - Participation: 142 registered groups; 37 solutions submitted from 32 teams
  - Evaluation data: 800 queries randomly selected from the 800K query set; 3 human labelers labeled the entire evaluation query set
  - Evaluation measurements: Precision and Performance (F1)
  - We won all three awards.

Summary: Target-Transfer Learning

Query --classify--> Intermediate class --similarity--> Target class

Activity Recognition

- With sensor data collected on mobile devices
  - Location: GPS, WiFi, RFID
  - Context: location, weather, etc., from GPS, RFID, Bluetooth, etc.
- Various models can be used
  - Non-sequential models: Naïve Bayes, SVM, …
  - Sequential models: HMM, CRF, …

Cross Domain Activity Recognition [Zheng, Hu, Yang, Ubicomp 2009]

- Challenge: a new domain of activities without labeled data (e.g., cleaning, laundry, dishwashing, indoor activities)
- Cross-domain activity recognition: transfer some available labeled data from the source activities to help train the recognizer for the target activities.

Source-Selection-Free Transfer Learning (SSFTL)

Evan Wei Xiang, Sinno Jialin Pan, Weike Pan, Jian Su and Qiang Yang. Source-Selection-Free Transfer Learning. IJCAI-11, Barcelona, Spain, July 2011.

- Supervised learning: a lack of labeled training data always happens.
- Transfer learning: helps when we have some related source domains.
- Where are the "right" source data? We may have an extremely large number of choices of potential sources to use.

SSFTL: Predictions for the Incoming Test Data

- A learned projection matrix W (d × m) maps the incoming target-domain test data into a latent space; a projection matrix V (q × m) holds the label prototypes of the binary ("A vs. B") base tasks.
- No need to use the base models explicitly! With the parameter matrix W, we can make a prediction on any incoming test data based on its distance to the label prototypes, without calling the base classification models.
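That prediction rule is a nearest-prototype lookup in the latent space. A sketch with invented toy dimensions (W: d × m; V: one m-dimensional prototype per label):

```python
import numpy as np

def ssftl_predict(X, W, V):
    """SSFTL test-time sketch: project each test document with W, then
    assign the label of the nearest prototype row of V. No base
    classification model is called at prediction time."""
    Z = X @ W                                            # (n, m) latent documents
    d2 = ((Z[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # squared distances
    return np.argmin(d2, axis=1)
```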

Experiments on SSFTL

- From Wikipedia, we got 800K pages with 100K categories to build base classifiers; we picked 50K pairs for binary base classifiers.
  - Training takes 2 days on a single processor!
- Social media: 200K tags on 5M pages, 22K users.

Accuracy (latent dimension |V| = 100):

Data      SVM    TSVM   SSFTL
20NG      69.8   75.7   80.6
Google    62.1   69.7   73.4
AOL       72.1   74.1   74.3
Reuters   69.7   63.3   74.3

20NG: the 20 Newsgroups dataset is a text collection of ~20,000 newsgroup documents spread evenly across 20 different topics. We created 190 target binary text classification tasks, like hockey vs. religion, medicine vs. motorcycles, etc.

References

- http://www.cse.ust.hk/~qyang/publications.html
- Dou Shen, Jian-Tao Sun, Qiang Yang and Zheng Chen. Building Bridges for Web Query Classification. ACM SIGIR '06, Seattle, USA, August 6-11, 2006, pages 131-138.
- Vincent W. Zheng, Derek H. Hu and Qiang Yang. Cross-Domain Activity Recognition. Ubicomp-09, Orlando, Florida, USA, Sept. 30-Oct. 3, 2009.
- Evan Wei Xiang, Sinno Jialin Pan, Weike Pan, Jian Su and Qiang Yang. Source-Selection-Free Transfer Learning. IJCAI-11, Barcelona, Spain, July 2011.

Towards Heterogeneous Transfer Learning

Qiang Yang
Hong Kong University of Science and Technology
Hong Kong, China

http://www.cse.ust.hk/~qyang

TL Resources

- http://www.cse.ust.hk/TL

Learning by Analogy

- Gentner 1983: Structural Correspondence
  - Mapping between source and target: mapping between objects in different domains (e.g., between computers and humans); the mapping can also be between relations (e.g., anti-virus software vs. medicine)
- Falkenhainer, Forbus, and Gentner (1989), the Structure-Mapping Engine: incremental transfer of knowledge via comparison of two domains
- Case-based Reasoning (CBR): e.g., CHEF [Hammond, 1986], AI planning of recipes for cooking; HYPO (Ashley 1991), …

Heterogeneous Transfer Learning

Multiple domain data:

- Heterogeneous feature spaces:
  - Instance alignment? Yes → Multi-view Learning
  - Instance alignment? No → Heterogeneous Transfer Learning (HTL) (e.g., source domain: text such as "Apple is a fruit that can be found …", "Banana is the common name for …"; target domain: images)
- Homogeneous feature spaces:
  - Data distribution different → Transfer Learning across Different Distributions
  - Data distribution same → Traditional Machine Learning

Cross-language Classification

WWW 2008: X. Ling et al. Can Chinese Web Pages be Classified with English Data Sources?

- Learn a classifier from labeled English Web pages, then use it to classify unlabeled Chinese Web pages.

HTL Setting: Text to Images

- Source data: labeled or unlabeled
- Target training data: labeled

Training: text

- Apple: "The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ..."
- Banana: "Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ..."

Testing: images

HTL for Images: 3 Cases

- Source data unlabeled, target data unlabeled → clustering
- Source data unlabeled, target training data labeled → HTL for image classification
- Source data labeled, target training data labeled → translated learning: classification

Annotated PLSA Model for Clustering

- Data: Caltech 256 images with SIFT features; tags from Flickr.com (e.g., Lion, Animal, Simba, Hakuna, Matata, FlickrBigCats, …)
- The model connects words from the source data, latent topics, image features, and image instances in the target data.
- Heterogeneous transfer learning: average entropy improvement of 5.7%.

"Heterogeneous transfer learning for image classification", Y. Zhu, G. Xue, Q. Yang et al., AAAI 2011.

Case 2: Source is not Labeled; Goal: Classification

- Target data: a few labeled images as training samples
- Source data: unlabeled
- Testing samples: not available during training

Structural Transfer Learning

- Transfer Learning from Minimal Target Data by Mapping across Relational Domains. Lilyana Mihalkova and Raymond Mooney. IJCAI-09, pages 1163-1168, Pasadena, CA, July 2009.
  - "use the short-range clauses in order to find mappings between the relations in the two domains, which are then used to translate the long-range clauses"
- Transfer Learning by Structural Analogy. Huayan Wang and Qiang Yang. AAAI-11, San Francisco, CA, USA, August 2011.
  - Find the structural mappings that maximize structural similarity.

References

- http://www.cse.ust.hk/~qyang/publications.html
- Huayan Wang and Qiang Yang. Transfer Learning by Structural Analogy. AAAI-11, San Francisco, CA, USA, August 2011.
- Yin Zhu, Yuqiang Chen, Zhongqi Lu, Sinno J. Pan, Gui-Rong Xue, Yong Yu and Qiang Yang. Heterogeneous Transfer Learning for Image Classification. AAAI-11, San Francisco, CA, USA, August 2011.
- Qiang Yang, Yuqiang Chen, Gui-Rong Xue, Wenyuan Dai and Yong Yu. Heterogeneous Transfer Learning for Image Clustering via the Social Web. ACL-IJCNLP '09, Singapore, Aug 2009, pages 1-9. Invited paper.
- Wenyuan Dai, Yuqiang Chen, Gui-Rong Xue, Qiang Yang, and Yong Yu. Translated Learning. NIPS 2008, Vancouver, British Columbia, Canada, December 2008.

Transfer Learning in Link Prediction and Social Web

Qiang Yang
Hong Kong University of Science and Technology
Hong Kong, China

http://www.cse.ust.hk/~qyang

Network Transfer: Topological Mapping

- Expressed through a codebook

Recommendation Systems

Product Recommendation as Link Prediction

- Task: predict missing links in a network
- Focus: a bipartite graph of users and items

Realistic Setting

- Training data: a sparse rating matrix (a few observed ratings such as 1, 5, 3, 4; most entries are unknown "?")
- Test data: the held-out "?" entries
- A CF model is trained on the rating matrix; rating-matrix density
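Filling the "?" entries is commonly done by low-rank factorization of the observed ratings. A gradient-descent sketch; the function name and hyper-parameters are illustrative, not tuned:

```python
import numpy as np

def factorize_ratings(R, observed, k=2, lr=0.01, reg=0.02, epochs=1500, seed=0):
    """Rank-k matrix factorization for link/rating prediction (sketch):
    fit R ~ U V^T on the observed entries only, then read predictions for
    the missing links out of U V^T. Denser rating matrices give the CF
    model more signal to fit."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((R.shape[0], k))
    V = 0.1 * rng.standard_normal((R.shape[1], k))
    for _ in range(epochs):
        E = observed * (R - U @ V.T)       # error on observed entries only
        U += lr * (E @ V - reg * U)
        V += lr * (E.T @ U - reg * V)
    return U @ V.T
```

On a sparse matrix this baseline struggles, which is precisely the motivation for transferring knowledge from a related, denser rating network.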