An Open Framework for Multi-source, Cross-domain Personalisation with Semantic Interest Graphs
Benjamin Heitmann, Digital Enterprise Research Institute (DERI), National University of Ireland, Galway

Problem

Users expect personalised experiences

Preferences are:


many domains




Travel

New site: 

no user profile

no recommendations

Not just cold-start problem: 



rec. approach




domain specific web sites

Closed approach to cross-domain personalisation

Centralised user profile

Data sharing and aggregation

Closed system

No portability

User trade-offs: privacy, trust, data ownership, control

Examples: Facebook, Google+, Twitter

[Figure labels: express preference; authentication for user action; web site interaction; cross-domain data sharing if authorised by user]

recommendations for external site provided by Facebook

Killer application for the Semantic Web?

Significance

Problems with current recommendation process:

1. Only single source recommendations

2. Only single domain recommendations

3. No portability of user profile data

Who needs this? Sounds far fetched?

Enabling Networked Knowledge Benjamin Heitmann, slide: 5 /18

Research questions

Architecture 

Enabling of a decentralised eco-system?

Aggregation of user profile fragments?

Privacy-enabling and interoperable at the same time?

User model 

Data structure for merging?

Background knowledge?

Domain definitions?

Algorithm 

Type of algorithm?

Data sets and metrics for evaluation?

Alternative: an open framework for cross-domain recommendations

Architecture for privacy-enabled profile exchange

private and secure aggregation of profiles

Distributed and domain-agnostic user model

Cross-domain recommendation algorithm

merging of profiles

New recommendations:

Travel destinations:


semantic interest graph

spreading activation algorithm finds new recommendations

Architecture for privacy-enabled and portable user profiles

User profile: 

Profile data expressed using RDF (FOAF+SIOC)

WebID provides identity (2 parts)

– private SSL Key in user agent

– public SSL Key in FOAF profile


Roles:

user agents: manage user identities

profile storage service: stores 1 or more profiles

data consumers: provide services for users

[Figure: the private key is stored in the user agent; the public key is stored in the FOAF profile on the profile storage site; a data consumer retrieves the user profile if the user authorises it]
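
A minimal sketch of the key check behind the roles above, assuming the identity is verified by comparing the public key presented by the user agent with the keys listed in the FOAF profile. `ClientCertificate`, `PROFILE_STORE`, `identity_verified` and `fetch_profile` are hypothetical names, and the in-memory dictionary only stands in for a real profile storage service; no TLS handshake or RDF parsing is performed here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# An RSA public key reduced to (modulus, exponent). In a real deployment the
# key comes from the client certificate and from the RDF profile document.
PublicKey = Tuple[int, int]


@dataclass
class ClientCertificate:
    webid_uri: str         # URI claimed by the user agent
    public_key: PublicKey  # the matching private key never leaves the user agent


# Hypothetical stand-in for a profile storage service:
# WebID URI -> public keys listed in the FOAF profile at that URI.
PROFILE_STORE: Dict[str, List[PublicKey]] = {
    "https://example.org/alice#me": [(0xC0FFEE, 65537)],
}


def identity_verified(cert: ClientCertificate) -> bool:
    """True if the profile behind the claimed WebID lists the presented key."""
    listed_keys = PROFILE_STORE.get(cert.webid_uri, [])
    return cert.public_key in listed_keys


def fetch_profile(cert: ClientCertificate, authorised_by_user: bool) -> str:
    """Release the profile only for a verified identity and with user consent."""
    if not identity_verified(cert):
        raise PermissionError("public key not listed in FOAF profile")
    if not authorised_by_user:
        raise PermissionError("user did not authorise this data consumer")
    return f"profile of {cert.webid_uri}"
```

The two separate checks mirror the two conditions on the slide: the identity has to match the profile, and the user has to authorise the data consumer.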

RDF is a graph!

graph with typed edges and typed vertices

expressed as triples of: subject, predicate, object

entity types:

– URIs

– Strings, optionally with language tag XOR data type

– Blank Nodes

Rules for the triples:

– subject can be URI or Blank, predicate just URI, object anything.
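
The triple rules above can be sketched in a few lines of Python. The class names `URI`, `BlankNode`, `Literal` and the function `is_valid_triple` are illustrative assumptions and do not come from any RDF library.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    text: str
    language: Optional[str] = None  # e.g. "en"
    datatype: Optional[URI] = None  # e.g. the URI of xsd:integer

    def __post_init__(self) -> None:
        # Language tag XOR data type: a string may carry one of them, never both.
        if self.language is not None and self.datatype is not None:
            raise ValueError("literal cannot have both language tag and data type")


Term = Union[URI, BlankNode, Literal]


def is_valid_triple(subject: Term, predicate: Term, obj: Term) -> bool:
    """Subject: URI or blank node. Predicate: URI only. Object: any term."""
    return (
        isinstance(subject, (URI, BlankNode))
        and isinstance(predicate, URI)
        and isinstance(obj, (URI, BlankNode, Literal))
    )
```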

User model

Semantic interest graphs: 

interests represented as

DBpedia URIs

supports merging


transferable between systems

independent of actual inventory

Unsuitable alternatives: 

item-rating vector

lists of plain text items/tags
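
A small sketch of why URI-based interests support merging, assuming each source site exports its profile fragment as a list of DBpedia URIs. The site names and the function `merge_profiles` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

DBPEDIA = "http://dbpedia.org/resource/"


def merge_profiles(fragments: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Merge profile fragments into one map: interest URI -> sources stating it."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for source, interests in fragments.items():
        for uri in interests:
            merged[uri].add(source)
    return dict(merged)


# Two fragments from different domains; both sites are made up for illustration.
fragments = {
    "music-site.example": [DBPEDIA + "Radiohead", DBPEDIA + "Douglas_Adams"],
    "book-site.example": [DBPEDIA + "Douglas_Adams", DBPEDIA + "Kurt_Vonnegut"],
}
merged = merge_profiles(fragments)
```

Because both fragments use the same URI for the same concept, the shared interest collapses into a single entry. Plain text tags such as "douglas adams" and "D. Adams" would stay separate, and an item-rating vector would be tied to the inventory of one site.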

Semantic background knowledge + domain definitions

Background knowledge: 


semantic graph

indirect connections

categories and properties for concepts

Domain definitions: using SKOS (Simple Knowledge Organisation System)

define recommendable entities of domain

– Music: artists, cds, tracks, genre

– Food: restaurant, dish, chef, ingredient



travel destination
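
As an illustration of how a domain definition restricts what can be recommended, the sketch below replaces the SKOS vocabulary with a plain dictionary. The type names follow the Music and Food examples above; the entity-to-type assignments and the function `recommendable` are assumptions made for the example.

```python
from typing import Dict, Set

# Plain-dictionary stand-in for SKOS-based domain definitions.
DOMAIN_DEFINITIONS: Dict[str, Set[str]] = {
    "music": {"artist", "cd", "track", "genre"},
    "food": {"restaurant", "dish", "chef", "ingredient"},
}


def recommendable(entity_types: Dict[str, str], domain: str) -> Set[str]:
    """Keep only the entities whose type belongs to the given domain."""
    allowed = DOMAIN_DEFINITIONS[domain]
    return {entity for entity, etype in entity_types.items() if etype in allowed}


# Hypothetical candidate entities with their types.
candidates = {
    "Radiohead": "artist",
    "OK Computer": "cd",
    "Risotto": "dish",
}
```

Calling `recommendable(candidates, "music")` keeps the artist and the cd and drops the dish, so the same candidate set can serve several target domains.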

Cross-domain recommendation algorithm

Algorithm:

Spreading activation

uses semantic network

able to provide results of specified target domain (or inventory)

Unconstrained SA == depth-first search (pictured)

Why does that already work?

DBpedia is super dense!

avg degree: ~20

[Figure: example DBpedia graph with the legend entries "User profile", "Recommendable items" and "Start of spreading activation". Node labels: Richard Dawkins, Atheism Activists, Kurt Vonnegut, Douglas Adams, The Hitchhikers Guide to the Galaxy (novel), Restaurant at the end of the universe, Cambridge, United Kingdom. Edge labels: dc:subject, influencedBy, author, birthplace, subsequentWork, subdivisionName]
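
A minimal, generic sketch of spreading activation on a toy graph, to make the idea concrete. It is a simple wave-based variant and not the implementation described in these slides: the traversal order, the `decay` factor, the equal split among neighbours and the function name `spread_activation` are all assumptions. The node names are borrowed from the pictured example, but the edges are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Set


def spread_activation(
    graph: Dict[str, List[str]],
    seeds: List[str],
    targets: Set[str],
    waves: int = 3,
    decay: float = 0.5,
) -> List[str]:
    """Rank target-domain nodes by the activation they receive from the seeds."""
    activation: Dict[str, float] = defaultdict(float)
    frontier: Dict[str, float] = {node: 1.0 for node in seeds}
    for _ in range(waves):
        next_frontier: Dict[str, float] = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            # Each node passes a decayed, equal share to every neighbour.
            share = energy * decay / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    candidates = [n for n in activation if n in targets and n not in seeds]
    return sorted(candidates, key=lambda n: (-activation[n], n))


# Toy undirected graph; the connections are made up for illustration.
graph = {
    "Douglas Adams": ["Hitchhikers Guide", "Restaurant", "Cambridge"],
    "Hitchhikers Guide": ["Douglas Adams", "Restaurant"],
    "Restaurant": ["Douglas Adams", "Hitchhikers Guide"],
    "Cambridge": ["Douglas Adams", "United Kingdom"],
    "United Kingdom": ["Cambridge"],
}
travel_destinations = {"Cambridge", "United Kingdom"}
ranking = spread_activation(graph, ["Douglas Adams"], travel_destinations)
```

Starting from a literary interest, the activation reaches nodes of the travel domain through indirect connections, which is the cross-domain effect the slide describes. Restricting the result to `targets` plays the role of the target domain (or inventory).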

SA configuration

Constraints: 

fan-out / distance


Semantic configuration:

type of activations (domain definition)

activation type

Link type weights


Node/Link black-lists

num of target activations

fanout penalty

distance penalty

initial activation spread

activation threshold

maximum degree

Iterative algorithm:

num of phases: re-activation after stabilising?

num of waves per phase: how often to spread?
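
One way to collect the parameters listed above into a single configuration object is sketched below. The slides name the parameters but not their formulas or values, so every default value and the way `passed_activation` combines the penalties are assumptions for illustration; `SAConfig` and `passed_activation` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SAConfig:
    # Semantic configuration
    link_type_weights: Dict[str, float] = field(default_factory=dict)
    link_blacklist: Set[str] = field(default_factory=set)
    node_blacklist: Set[str] = field(default_factory=set)
    # Constraints (all defaults are placeholder values)
    initial_activation: float = 1.0
    activation_threshold: float = 0.01
    fanout_penalty: float = 0.1
    distance_penalty: float = 0.5
    maximum_degree: int = 1000
    num_target_activations: int = 10
    # Iterative algorithm
    num_phases: int = 1
    waves_per_phase: int = 3


def passed_activation(
    cfg: SAConfig,
    activation: float,
    link_type: str,
    target: str,
    source_degree: int,
    distance: int,
) -> float:
    """Activation sent along one link, or 0.0 if a constraint blocks it.

    Assumes source_degree >= 1 and distance >= 0.
    """
    if link_type in cfg.link_blacklist or target in cfg.node_blacklist:
        return 0.0
    if source_degree > cfg.maximum_degree:
        return 0.0
    if activation < cfg.activation_threshold:
        return 0.0
    weight = cfg.link_type_weights.get(link_type, 1.0)
    # Assumed penalty shapes: both shrink the signal hyperbolically.
    fanout_factor = 1.0 / (1.0 + cfg.fanout_penalty * (source_degree - 1))
    distance_factor = 1.0 / (1.0 + cfg.distance_penalty * distance)
    return activation * weight * fanout_factor * distance_factor
```

Keeping the semantic settings (weights, black-lists) apart from the numeric constraints makes it possible to tune one group without touching the other.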

Implementation: 

HDT RDF store (in-house)

Neo4j, Giraph/Hadoop: unsuccessful

Evaluation plan

Theoretical framework: Link prediction problem

Metrics:

AUC & precision

diversity, novelty, personalisation, heterogeneity

Data sources:

User profiles: StackExchange network (cross-domain user profiles)

Background knowledge: DBpedia

Baseline algorithms: 

Linked Data Semantic Distance (LDSD)

Random Walk with Restart (RWR)

Collaborative Filtering (CF)
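
For the AUC and precision metrics named above, a common formulation in link prediction is sketched below: AUC as the probability that a held-out link scores higher than a non-link (ties count half), and precision over the top-k ranked candidates. Whether the planned evaluation uses exactly these variants is an assumption; the function names and the sample scores are made up.

```python
from itertools import product
from typing import Dict, Set


def auc(scores: Dict[str, float], positives: Set[str]) -> float:
    """Probability that a random positive outranks a random negative."""
    pos = [s for item, s in scores.items() if item in positives]
    neg = [s for item, s in scores.items() if item not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def precision_at_k(scores: Dict[str, float], positives: Set[str], k: int) -> float:
    """Fraction of the k highest-scored items that are held-out positives."""
    top = sorted(scores, key=lambda item: (-scores[item], item))[:k]
    return sum(1 for item in top if item in positives) / k


# Made-up algorithm scores; "a" and "c" are the held-out interests.
scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
held_out = {"a", "c"}
```

With these sample values, three of the four positive/negative pairs are ordered correctly, so `auc(scores, held_out)` is 0.75, and `precision_at_k(scores, held_out, 2)` is 0.5.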

Impact

Academia:   

Personalisation: hot topic in Semantic Web community

Many (!) workshops

Best paper award at I-Semantics 2012: “Linked Open Data to support Content-based Recommender Systems”, Di Noia et al.

Industry: 

current Cisco Ireland collaboration

bought by eBay in 2011

StumbleUpon presentation on Wednesday at RecSys 2012

Personalisation has become a commodity

Facebook approach requires multi-source, cross-domain recs.

Decentralised SocNets like Diaspora*

Achievements and future plans

Finished: 

algorithm implementation


Next step:

Off-line evaluation using all StackExchange data

User study, depends on time constraints

Publications:

“Personalisation of Social Web Services in the Enterprise Using Spreading Activation for Multi-Source, Cross-Domain Recommendations”, AAAI Spring Symposium on Intelligent Web Services Meet Social Computing, 2012.

“An architecture for privacy-enabled user profile portability on the Web of Data”, HetRec Workshop at RecSys 2010.

“An empirically-grounded conceptual architecture for applications on the Web of Data”, IEEE Transactions on Systems, Man and Cybernetics, Part C - Applications and Reviews, 2011.

Summary

Goal of research:

alternative to current closed ecosystems

open framework for cross-domain & multi-source recommendations

1.) Architecture for privacy-enabled profile exchange:

mechanism for authorisation of data exchange through user

enables private and secure profile exchange

2.) Distributed and domain-agnostic user model

provides semantic graph as user model, background knowledge and domain definitions

enables aggregating and merging of profiles

3.) Cross-domain recommendation algorithm

provides graph-based algorithm

enables personalisation in a target domain using any interests

Characteristics of the Spreading Activation algorithm

SA is very different from e.g. PageRank

“SA is depth-first search, guided/interrupted by domain logic and algorithm conditions”

Challenges when implementing SA:

requires semantic graph

size of data (DBpedia: 11 million entities, 40 million edges)

iterative algorithm

embedding of domain logic

stateful nodes

execution speed for Cisco ADVANSSE use case

