Unifying Personalized PageRank and Prolog
William W. Cohen with: William Yang Wang, Katie Mazaitis, Einat Minkov, Ni Lao, Tom Mitchell & others
Machine Learning Dept. and Language Technologies Inst. School of Computer Science Carnegie Mellon University
My History
[Diagram labels: Machine Learning; Representation languages: DBs, KR; Text cat, IR, IE]
History 82 84 86 88 90 92 94 96 98 00 04 08 12
• 1982/1984: Ehud Shapiro’s thesis:
– MIS: Learning logic programs as debugging an empty Prolog program
– Thesis contained 17 figures and a 25-page appendix that were a full implementation of MIS in Prolog
– Incredibly elegant work
• “Computer science has a great advantage over other experimental sciences: the world we investigate is, to a large extent, our own creation, and we are the ones to determine if it is simple or messy.”
History 82 84 86 88 90 92 94 96 98 00 04 08 12
• Grad school Rutgers, job at AT&T • Worked in group doing KR, DB, learning, information retrieval, … • My work: learning logical (description-logic-like, Prolog-like, rule-based) representations that model large noisy real-world datasets.
History 82 84 86 88 90 92 94 96 98 00 04 08 12
• The web takes off – as predicted by William Gibson
• IR folks start looking at retrieval and question answering with the Web
• Alon Halevy (DB guy) starts the Information Manifold project to integrate data on the web
– VLDB 2006 10-year Best Paper Award for 1996 paper on IM
• I got very interested in information integration….
History 82 84 86 88 90 92 94 96 98 00 04 08 12
• As the world of computer science gets richer and more complex, computer science can no longer limit itself to studying “our own creation”.
• Tension exists between
– Elegant theories of representation
– The not-so-elegant real world that is being represented
• Concise logical representations often “don’t fit” complex real-world data
History 82 84 86 88 90 92 94 96 98 00 04 08 12
• The beauty of the real world is its complexity….
WHIRL language:
Query Q: SELECT R.a,S.a,S.b,T.b FROM R,S,T WHERE R.a~S.a and S.b~T.b (~ means TFIDF-similar)
Link items as needed by Q.
Incrementally produce a ranked list of possible links, with “best matches” first. User (or downstream process) decides how much of the list to generate and examine.
R.a      S.a     S.b     T.b
Anhai    Anhai   Doan    Doan
Dan      Dan     Weld    Weld
William  Will    Cohen   Cohn
Steve    Steven  Minton  Mitton
William  David   Cohen   Cohn
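A rough sketch of the kind of similarity join the WHIRL query describes: score every pair of strings by TF-IDF cosine similarity and list the best matches first. This is not the WHIRL implementation. The character-trigram tokenizer, the particular TF-IDF weighting, and the function names are assumptions made for illustration, and the sketch scores all pairs up front, whereas the slide describes producing the ranked list incrementally.

```python
import math
from collections import Counter

def trigrams(s):
    """Character trigrams of the lowercased, space-padded string (assumed tokenizer)."""
    padded = f" {s.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def tfidf_vectors(strings):
    """Return one unit-length TF-IDF vector (dict: token -> weight) per string."""
    bags = [Counter(trigrams(s)) for s in strings]
    n = len(bags)
    df = Counter(tok for bag in bags for tok in bag)  # document frequency per token
    vectors = []
    for bag in bags:
        vec = {t: math.log(1 + tf) * math.log(1 + n / df[t]) for t, tf in bag.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def similarity_join(r_col, s_col):
    """All (score, r, s) pairs with non-zero similarity, best matches first."""
    vectors = tfidf_vectors(r_col + s_col)
    r_vecs, s_vecs = vectors[:len(r_col)], vectors[len(r_col):]
    pairs = [(cosine(u, v), a, b)
             for a, u in zip(r_col, r_vecs)
             for b, v in zip(s_col, s_vecs)]
    return sorted((p for p in pairs if p[0] > 0), reverse=True)

# First-name columns loosely based on the R.a / S.a example above.
r_a = ["Anhai", "Dan", "William", "Steve"]
s_a = ["Anhai", "Dan", "Will", "Steven", "David"]
ranked = similarity_join(r_a, s_a)
for score, a, b in ranked:
    print(f"{score:.2f}  {a} ~ {b}")
```

A consumer of `ranked` can stop reading as soon as the scores drop below whatever it considers a plausible link, which mirrors the "user decides how much of the list to examine" idea.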
History 82 84 86 88 90 92 94 96 98 00 04 08 12
• Alon Halevy (DB guy) starts the Information Manifold project to integrate data on the web
– VLDB 2006 10-year Best Paper Award for 1996 paper on IM
• William Cohen (ML guy) wrote the WHIRL system, bridging KR/DB ideas with a key IR idea: integration by reasoning about the similarity of strings
• Combining complex models of similarity and logic
– SIGMOD 2008 10-Year Best Paper Award for 1998 paper on WHIRL
Beyond TFIDF: graph similarity 82 84 86 88 90 92 94 96 98 00 04 08 12
[Figure: graph with string nodes “William W. Cohen, CMU”, “Dr. W. W. Cohen”, “Christos Faloutsos, CMU”, “George H. W. Bush”, “George W. Bush” and term nodes cohen, dr, william, w, cmu]
Personal Info Management as Similarity Queries on a Graph
Einat Minkov, Univ Haifa [SIGIR 2006, EMNLP 2008, TOIS 2010]
[Figure: email graph; node labels include NSF, William, graph, proposal, CMU, 6/17/07, 6/18/07 and an email address; edge labels include Sent To and Term In Subject]
Beyond TFIDF: graph similarity 82 84 86 88 90 92 94 96 98 00 04 08 12
• Personalized PageRank aka Random Walk with Restart:
– Similarity measure for nodes in a graph, analogous to TFIDF for text in a WHIRL database
– natural extension to PageRank
– amenable to learning parameters of the walk (gradient search, w/ various optimization metrics):
• Toutanova, Manning & Ng, ICML 2004; Nie et al, WWW 2005; Xi et al, SIGIR 2005
– very fast to compute
– queries:
Given type t* and node x, find y:T(y)=t* and y~x
Given type t* and nodes X, find y:T(y)=t* and y~X
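To make the typed similarity query concrete, here is a minimal sketch that computes random walk with restart by plain power iteration and then keeps only nodes of the requested type. The tiny email graph, the node names, the `node_type` mapping, and the default restart probability are all made up for illustration; nothing here is taken from the system described on the slides.

```python
from collections import defaultdict

def undirected(edges):
    """Adjacency lists for an undirected graph given as (u, v) pairs."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)
    return dict(graph)

def personalized_pagerank(graph, seeds, alpha=0.15, iterations=100):
    """Power iteration for a random walk that restarts at `seeds` with prob. alpha.

    Assumes every node has at least one outgoing edge."""
    restart = {node: 0.0 for node in graph}
    for s in seeds:
        restart[s] += 1.0 / len(seeds)
    p = dict(restart)
    for _ in range(iterations):
        nxt = {node: alpha * restart[node] for node in graph}
        for u, nbrs in graph.items():
            share = (1.0 - alpha) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        p = nxt
    return p

def similar_of_type(graph, node_type, seeds, t_star, **kwargs):
    """Given type t* and nodes X (seeds), rank y with T(y) = t* by similarity to X."""
    scores = personalized_pagerank(graph, seeds, **kwargs)
    hits = [(score, y) for y, score in scores.items()
            if node_type[y] == t_star and y not in seeds]
    return sorted(hits, reverse=True)

# Hypothetical email graph: messages linked to subject terms and people.
graph = undirected([
    ("msg1", "term:proposal"), ("msg1", "person:andy"),
    ("msg2", "term:proposal"), ("msg2", "person:jason"),
    ("msg3", "term:lunch"), ("msg3", "person:kim"),
])
node_type = {n: "file" if n.startswith("msg") else n.split(":")[0] for n in graph}
ranked = similar_of_type(graph, node_type, ["person:andy"], "person")
print(ranked)
```

In this toy graph `person:jason` is reachable from the query node through the shared term, while `person:kim` sits in a disconnected component and gets no score.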
Tasks can be reduced to similarity queries
Person name disambiguation
[ term “andy” file msgId ] “person”
Threading
– What are the adjacent messages in this thread?
– A proxy for finding “more messages like this one”
[ file msgId ] “file”
Alias finding
– What are the email-addresses of Jason ?...
[ term Jason ] “email-address”
Meeting attendees finder
– Which email-addresses (persons) should I notify about this meeting?
[ meeting mtgId ] “email-address”
Results on one task + Learning
[Figure: recall (0% to 100%) vs. rank (1 to 10); panel title: PERSON NAME DISAMBIGUATION, Mgmt. game]
Beyond TFIDF: graph similarity 82 84 86 88 90 92 94 96 98 00 04 08 12
• Personalized PageRank aka Random Walk with Restart:
– Given type t* and nodes X, find y:T(y)=t* and y~X
• New and better learning methods – richer parameterization – faster PPR inference – structure learning
• Other tasks: – relation-finding in parsed text – information management for biologists – inference in large noisy knowledge bases – work with Ni Lao (formerly CMU, now Google)
History Machine Learning
Representation languages: DBs, KR
Linguistic similarity: NLP, IE, IR
Machine Learning
Representation languages: DBs, KR
Linguistic → graph similarity: NLP, IE, IR
Machine Learning
Representation languages: DBs, KR
????
Linguistic → graph similarity: NLP, IE, IR
Unifying Personalized PageRank and Prolog: ProPPR
William Yang Wang, Katie Mazaitis
Sample ProPPR program….
Horn rules
features of rules
D’oh! This is a graph!
.. and search space…
• Score for a query soln (e.g., “Z=sport” for “about(a,Z)”) depends on probability of reaching a ☐ node*
• learn transition probabilities based on features of the rules
• implicit “reset” transitions with (p≥α) back to query node
• Looking for answers supported by many short proofs
“Grounding” size is O(1/αε) … i.e., independent of DB size → fast approx incremental inference (Andersen, Chung, Lang, 08)
Learning: supervised variant of personalized PageRank (Backstrom & Leskovec, 2011)
*Exactly as in Stochastic Logic Programs [Cussens, 2001]
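One way to see why the grounding size can be bounded by O(1/αε) regardless of database size is a push-style approximation of personalized PageRank. The sketch below is a generic version of that idea, not ProPPR's grounding code: the function name, the ring-shaped test graph, and the parameter values are assumptions for illustration. Each push moves at least α·ε·deg(u) of probability mass into the estimate, and the estimates never sum to more than 1, so the total out-degree of pushed nodes stays below 1/(αε).

```python
from collections import deque

def approximate_ppr(graph, seed, alpha=0.2, eps=1e-3):
    """Push-style approximate personalized PageRank from a single seed.

    graph: dict node -> list of out-neighbors (every node needs >= 1 out-edge).
    Returns (p, r, work): estimates, leftover residuals, and the summed
    out-degree of all pushed nodes."""
    p, r = {}, {seed: 1.0}
    queue, queued = deque([seed]), {seed}
    work = 0
    while queue:
        u = queue.popleft()
        queued.discard(u)
        nbrs = graph[u]
        mass = r.get(u, 0.0)
        if mass <= eps * len(nbrs):
            continue  # residual too small to be worth pushing
        r[u] = 0.0
        p[u] = p.get(u, 0.0) + alpha * mass  # keep an alpha fraction here
        work += len(nbrs)
        share = (1.0 - alpha) * mass / len(nbrs)
        for v in nbrs:  # spread the rest over the out-neighbors
            r[v] = r.get(v, 0.0) + share
            if v not in queued and r[v] > eps * len(graph[v]):
                queue.append(v)
                queued.add(v)
    return p, r, work

def ring(n):
    """Hypothetical test graph: n nodes in a cycle, each linked to both neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

p_small, _, work_small = approximate_ppr(ring(1_000), seed=0)
p_large, _, work_large = approximate_ppr(ring(100_000), seed=0)
print(len(p_small), len(p_large), work_small, work_large)
```

The two calls touch the same number of nodes even though the second graph is 100 times larger, which is the sense in which the explored subgraph does not depend on the size of the underlying data.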
Sample Task: Citation Matching
• Task: citation matching (Alchemy: Poon & Domingos).
• Dataset: CORA dataset, 1295 citations of 132 distinct papers.
• Training set: sections 1-4.
• Test set: section 5.
• ProPPR program: translated from corresponding Markov logic network (dropping non-Horn clauses)
• # of rules: 21.
Task: Citation Matching
Time: Citation Matching vs Alchemy
“Grounding” is independent of DB size
Accuracy: Citation Matching
Our rules UW rules
AUC scores: 0.0=low, 1.0=hi w=1 is before learning
It gets better…..
• Learning uses many example queries
• e.g: sameCitation(c120,X) with X=c123+, X=c124-, …
• Each query is grounded to a separate small graph (for its proof)
• Goal is to tune weights on these edge features to optimize RWR on the query-graphs.
• Can do SGD and run RWR separately on each query-graph
• Graphs do share edge features, so there’s some synchronization needed
Learning can be parallelized by splitting on the separate “groundings” of each query
Another Sample Task
Lao: A learned random walk strategy is a weighted set of random-walk “experts”, each of which is a walk constrained by a path (i.e., sequence of relations)
Recommending papers to cite in a paper being prepared:
1) papers co-cited with on-topic papers
6) approx. standard IR retrieval
7,8) papers cited during the past two years
12-13) papers published during the past two years
Another study: learning inference rules for a noisy KB (Lao, Cohen, Mitchell 2011)
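[Figure: fragment of the KB graph. HinesWard is linked to Steelers by AthletePlaysForTeam, and Steelers to NFL by TeamPlaysInLeague; the query is AthletePlaysInLeague(HinesWard, ?). Other labels visible: IsA, PlaysIn, American, isa-1, “Synonyms of the query team”.]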
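The idea of a walk constrained by a path of relations, and of combining several such "experts" with weights, can be sketched in a few lines. This uses the athlete, team and league relations named on this slide, but the KB layout, the extra IsA fact, the expert weights, and the plain weighted sum used for scoring are all assumptions for illustration rather than the published method.

```python
from collections import defaultdict

def path_walk(kb, source, path):
    """Distribution over end nodes of a random walk from `source` that must
    follow the relations in `path`, choosing uniformly among matching edges.
    Probability mass that reaches a dead end is dropped."""
    dist = {source: 1.0}
    for relation in path:
        nxt = defaultdict(float)
        for node, prob in dist.items():
            targets = kb.get((node, relation), [])
            for t in targets:
                nxt[t] += prob / len(targets)
        dist = dict(nxt)
    return dist

def score(kb, source, target, weighted_paths):
    """Weighted sum of the path experts' probabilities of reaching `target`."""
    return sum(w * path_walk(kb, source, path).get(target, 0.0)
               for path, w in weighted_paths)

# Toy KB keyed by (node, relation); the IsA fact is hypothetical.
kb = {
    ("HinesWard", "AthletePlaysForTeam"): ["Steelers"],
    ("Steelers", "TeamPlaysInLeague"): ["NFL"],
    ("HinesWard", "IsA"): ["Athlete"],
}
# Hypothetical learned weights for two path experts.
experts = [
    (("AthletePlaysForTeam", "TeamPlaysInLeague"), 2.0),
    (("IsA",), -0.5),
]
print(path_walk(kb, "HinesWard", ("AthletePlaysForTeam", "TeamPlaysInLeague")))
print(score(kb, "HinesWard", "NFL", experts))
```

Each expert contributes only where its path actually leads, so a candidate answer is supported in proportion to how many weighted paths reach it.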
• Paths learned are like ProPPR rules
• …but they are learned separately for each relation type, and one learned rule can’t call another
athletePlaysSport(Athlete,Sport) ← onTeam(Athlete,Team), teamPlaysSport(Team,Sport).
teamPlaysSport(Team,Sport) ← memberOf(Team,Conference), hasMember(Conference,Team2), plays(Team2,Sport).
teamPlaysSport(Team,Sport) ← onTeam(Athlete,Team), athletePlaysSport(Athlete,Sport).
• Paths learned are like ProPPR rules
• …but they are learned separately for each relation type, and one learned rule can’t call another
athletePlaysSportViaRule(Athlete,Sport) ← onTeamViaKB(Athlete,Team), teamPlaysSportViaKB(Team,Sport).
teamPlaysSportViaRule(Team,Sport) ← memberOfViaKB(Team,Conference), hasMemberViaKB(Conference,Team2), playsViaKB(Team2,Sport).
teamPlaysSportViaRule(Team,Sport) ← onTeamViaKB(Athlete,Team), athletePlaysSportViaKB(Athlete,Sport).
Experiment: • Take top K paths for each predicate learned by Lao’s PRA • (I don’t know how to do structure learning for ProPPR yet) • Convert to a mutually recursive ProPPR program • Train weights on entire program (~=800 rules, 12k queries)
athletePlaysSport(Athlete,Sport) ← onTeam(Athlete,Team), teamPlaysSport(Team,Sport).
athletePlaysSport(Athlete,Sport) ← athletePlaysSportViaKB(Athlete,Sport).
teamPlaysSport(Team,Sport) ← memberOf(Team,Conference), hasMember(Conference,Team2), plays(Team2,Sport).
teamPlaysSport(Team,Sport) ← onTeam(Athlete,Team), athletePlaysSport(Athlete,Sport).
teamPlaysSport(Team,Sport) ← teamPlaysSportViaKB(Team,Sport).
Joint Inference for Relation Prediction
• Task: link prediction.
• Dataset: a subset of 19,527 beliefs from NELL.
• Training set: 12,331 queries.
• Test set: 1,185 queries.
• # Rules: 797.
You can do more with ProPPR…
Machine Learning
Representation languages: DBs, KR
ProPPR
Linguistic → graph similarity: NLP, IE, IR
• Semantically simple
• Extends PPR and Prolog
• Scalable and flexible:
• Applicable to very large databases even with arbitrary recursion in a logic program
• Easily parallelizable learning-to-perform-PPR method
o Not (yet) fast
o Learned probabilities are about a proof process on a logic program, not about state of the world