Enabling Personalization Services on the Edge

Xing Xie, Hua-Jun Zeng, Wei-Ying Ma
Microsoft Research Asia
3F, Sigma Center, No 49 Zhichun Road
Beijing 100080, P.R. China
{i-xingx, i-hjzeng, wyma}@microsoft.com

ABSTRACT
In this paper, we describe our work on enabling personalization services on the edge of the Internet. In contrast to traditional approaches, this method offers several advantages. First, content providers can make their content people-aware while the content is delivered to the user. Second, since the personalization work is distributed to an overlay network of edge servers, the scalability and availability of the services are greatly improved. Third, the abundance of user data collected at edge servers enables us to build more accurate user models, which further enhance the performance of personalization services. We describe the implementation of a prototype system named Avatar and present some experimental results.

Keywords
Content delivery network, edge computing, adaptive content delivery, personalized Web, user modeling
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM Multimedia ’02, Dec 1-6, 2002, Juan Les Pins, France. Copyright 2002 ACM 1-58113-000-0/00/0000…$5.00.

1. INTRODUCTION
As the Internet and pervasive Web services continue to penetrate users’ daily lives, there is a growing demand for Web services, information search, and content delivery that are personalized and customized to the needs of the individual user. Personalization or customization of the Web can be seen as a negotiation between content and user contexts. Therefore, both content structure and user information are needed before personalization can be performed.

Currently, personalization is mostly performed at the servers of popular Web sites. Since the server generates and hosts the content, it knows the content structure and semantics, while user information is obtained either through a static profile entered by the user in advance or through techniques such as collaborative filtering. These server-side approaches have several limitations. First, for most small-to-medium Web sites, it is hard to obtain a complete user profile, as users are often reluctant or too impatient to provide personal information. Second, a static profile cannot keep up with the dynamic nature of a user’s interests and preferences. To alleviate this limitation, user modeling is often applied, which, however, requires a large amount of user data in order to obtain accurate results. These problems have prevented the wide deployment of personalization services on the Web today.

Personalization can also be performed at the client side. The simplest example is configuring a Web browser to decide whether pictures in Web pages are shown (e.g., to save bandwidth). Web Montage [2] is a recent work on client-side personalization for routine Web browsing, which tries to assemble links and content into a single personalized view. Since the user information is stored on the client device, privacy can be managed easily. However, without the cooperation of Web servers, client-side approaches are limited in their ability to provide rich and complex personalization services.

Extending the functionality of proxies for personalization has been studied by numerous researchers. The WBI framework [9] is a typical work in this area; it enables intermediaries to incorporate processes that produce personalized information. In [7] and [8], an overlay network of proxy servers was used to provide intelligent content-oriented services, including personalization, to the user. The benefit of proxy-based personalization is two-fold. First, it offloads the content provider, so the scalability and availability of the service can be greatly improved. Second, the collection of user data shared by a network of proxy servers can greatly enhance the performance of user modeling.

In this paper, we describe our work on enabling personalization services on the edge through a network of proxy servers (also called edge servers). We first introduce the edge services architecture in Section 2, and then discuss the implementation of our personalization services on the edge in Section 3. Section 4 describes our user modeling algorithm and the query interface for user information. Experimental results are provided in Section 5. Section 6 summarizes our work and future directions.
2. EDGE SERVICES ARCHITECTURE
As the Internet moves towards a service-centric model, more and more computational resources are being deployed at the edge of the network for value-added services [1][7][8][13]. OPES [11], a working group in the IETF, is currently addressing the problem of extending the functionality of a caching proxy to provide additional services that mediate, modify, and monitor object requests and responses. The Framework for Personalization of Content and Services (FPCS) [12] is a general framework for performing personalization services based on the OPES architecture. Edge Side Includes (ESI) [4] is a mechanism that allows an embedded script in an HTML file to instruct caching proxies to assemble content accordingly. Our work is closely related to these ongoing standardization efforts and can easily accommodate them once they are finalized.
Figure 1. Edge services framework.
Figure 2. The framework of personalization services.
[Diagram labels recovered from Figures 1 and 2: Web Server, Internet, Edge Server, Database, Application, Presentation, Query Interface, User Log, User Modeling, User Profile Store.]

In the following, we briefly describe this edge services framework and discuss how the personalization services can be decomposed into a sequence of steps and executed through the collaboration of the origin server, the edge server, and the client device.

As shown in Figure 1, in the classical client-server model, a Web server can be divided into three tiers: database, application logic, and presentation. Traditional caching proxies can be viewed as the first example of an edge server: they move the presentation tier closer to the user by caching the static output of a Web server, which reduces the user’s perceived latency. For personalized content, which is dynamic in nature, part of the application tier needs to move to the edge as well, so as to offload the processing and better serve users in different geographical regions.

The edge services framework consists of an overlay network of edge servers that act as resources for origin servers and are capable of caching content as well as replicating certain application logic and executables from origin servers. To enable personalization services on the edge, the process is decomposed as follows:

1. The origin server only needs to provide the logic on how to personalize Web content or a service for a particular user based on his profile. This logic can be written as an edge-side include or an executable that queries the user’s profile and makes decisions accordingly.
2. The query to the user’s profile is resolved by the edge server, which may contact other servers for information.

3. The process of personalizing content and services is executed at the edge server.

The details of how to deploy personalization services are discussed in the next section.

3. PERSONALIZATION SERVICES
User information is the key to personalization services. In practice, it can be entered directly by end users or transmitted from the client machine to the server through a protocol like CC/PP [3]. However, such a static profile cannot cope with shifts in the user’s interests and preferences. Therefore, provided that proper privacy is enforced, user modeling based on Web logs collected at edge servers can be used to discover new user information.

Figure 2 shows the framework of our personalization services. It consists of a Web log database, a user modeling module, a query interface module, and a user profile store. A server-side script containing instructions on how to personalize the corresponding content or application communicates with the query interface to obtain user information. We assume the edge servers have a trust relationship with the user for managing his profile. The edge servers provide a query interface and a pre-defined schema that allow the origin server to access information about the user. Every client visit is recorded in the Web log database. The user modeling module periodically analyzes the log, creating or updating user profiles in a local database.

When a client sends an HTTP request, it is redirected to an appropriate edge server by the origin Web server. Assuming that the requested page is a server-side script that has been delivered to the edge in advance, it is executed directly on the edge. If the script contains function calls to personalization modules, these calls are transformed into queries and sent to the query interface. The implementation of the personalization module and the user modeling module is discussed in the next section.

4. USER MODELING
Our user modeling consists of two parts: a static profile, and a dynamic profile that is learned from the user’s previous browsing patterns and access history, as shown in Figure 3. The query interface is provided to other Web applications for answering questions related to the user’s interests and preferences.

4.1 Profile Structure
The static profile (often called a stereotype) is represented through a frame that includes a collection of slots. Each slot is identified by a name and holds a symbolic or numeric value. Storing static profiles on a server lets users enter their profile only once while making it available to many applications. To overcome the incompleteness and invariability of static profiles, a learning process is used to create dynamic profiles of users. The dynamic profile is represented by an ontology specifying the user’s interests based on a tree hierarchy of categories. To simplify the learning problem, we eliminate the possibility of multiple inheritance in the structure. Each node of the ontology denotes a domain that the user is interested in, and a numeric score associated with each node denotes the degree of interest. The names and descriptions associated with each node are obtained from the Web directory of the Open Directory Project (ODP) [10]. We extract 9 categories and about 300 sub-categories from it. The same structure is applied to each user, but with different interest scores on the nodes. Although this loses some individuality, it enables transformation and matching between ontologies and also reduces the storage space required for each user.

Figure 3. The user profile store contains both the static and dynamic profiles.
[Diagram labels: Users, Applications, User Profile Store, Static Profile, Dynamic Profile, Manually input, User’s behavior, Query, Answer.]

The probability that a user is interested in a particular node given the context of the parent node can be calculated as follows:

$$p(N_{child} \mid N_{parent}) = \frac{W_{child}}{\sum_{i \in \{\text{children of } N_{parent}\}} W_i} \qquad (1)$$

where N denotes a node and W is the score associated with the node.

4.2 The Learning Algorithm
Dynamic profiles are initially converted from static profiles. As time goes on, they are updated by learning from the user’s behavior. The scores of the nodes in the ontology of a particular user are calculated by classifying the content that the user has visited. Here we only consider content to be Web pages. When a Web page d is visited by the user, we first analyze its structure and filter out redundant information such as decoration, navigation, and advertisements. The remaining content of the page is represented using a bag-of-words model. Then, we compute the membership value of the page d in each category c_i of the 300 sub-categories in the ontology using the following equation:

$$\Delta W_i = \frac{\mathit{time} \cdot \mathit{similarity}(d, c_i)}{\mathit{length}} \qquad (2)$$

where time is the user’s dwell time and length is the length of the Web page d. ∆W_i is then added to the score of category c_i.

The accuracy of our user modeling relies heavily on time. In the beginning, when the user has visited only a small number of Web pages, the user’s true interests cannot be completely revealed. As time elapses, the user model becomes more and more accurate. However, over a long period, some of the user’s interests may shift as well. In our implementation, a “forgetting” factor γ is used to decrease the effect of early access records:

$$\Delta W_i = \gamma^{\mathit{days}} \cdot \frac{\mathit{time} \cdot \mathit{similarity}(d, c_i)}{\mathit{length}} \qquad (3)$$

where γ is a value less than 1, so that older access records contribute less to the score.

4.3 Query Interface
Based on its knowledge about the user, the system provides a query interface and a pre-defined question schema that the origin server can use to get information about the user. For example, when receiving a request, the Web site could send a query about the user’s interest in buying a DVD to the query interface. The system responds Yes/No with a confidence measure based on the knowledge in the user profile store. Web applications can then personalize the content or service, for example by embedding relevant advertisements. This query interface is realized as a set of APIs that can be invoked by the scripts in Web pages and applications. The main part of this set of APIs is the query functions, through which the origin server can access certain properties of the user on the edge of the Internet. One of the query functions is IsInterestedIn(), which takes a domain of interest as input and outputs whether, or how likely, the corresponding user is interested in it. When IsInterestedIn() is called, the system calculates the probability of the user’s interest in this node according to Equation (1).
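To make Equations (1) to (3) and the IsInterestedIn() query concrete, the following is a minimal Python sketch, not the Avatar implementation. All names in it (Node, score_increment, conditional_interest, is_interested_in) are hypothetical. It also rests on several assumptions that the text does not state: the exponent "days" in Equation (3) is taken to be the age of the access record in days, dwell time and page length are plain numbers in arbitrary units, the similarity value is supplied by some external classifier, and the Yes/No answer is obtained by comparing the probability from Equation (1) against an arbitrary threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One category in the interest ontology (a tree: each node has one parent)."""
    name: str
    score: float = 0.0  # W in Equation (1)
    children: List["Node"] = field(default_factory=list)
    # Excluded from repr/compare to avoid walking the parent/child cycle.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add_child(self, name: str, score: float = 0.0) -> "Node":
        child = Node(name, score, parent=self)
        self.children.append(child)
        return child


def score_increment(dwell_time: float, similarity: float, page_length: float,
                    age_days: float = 0.0, gamma: float = 0.9) -> float:
    """Equation (3). With age_days == 0 this reduces to Equation (2).

    Assumption: age_days is the age of the access record in days,
    and gamma < 1 is the forgetting factor (0.9 is an arbitrary default).
    """
    return (gamma ** age_days) * dwell_time * similarity / page_length


def conditional_interest(node: Node) -> float:
    """Equation (1): W_child divided by the sum of W_i over all siblings."""
    if node.parent is None:
        return 1.0  # assumption: the root is always "of interest"
    total = sum(sibling.score for sibling in node.parent.children)
    return node.score / total if total > 0 else 0.0


def is_interested_in(node: Node, threshold: float = 0.5) -> Tuple[bool, float]:
    """Hypothetical stand-in for IsInterestedIn(): (yes/no, probability)."""
    p = conditional_interest(node)
    return p >= threshold, p


# Small demonstration with made-up numbers.
root = Node("Top")
shopping = root.add_child("Shopping")
sports = root.add_child("Sports")
# A page visited today that matches "Shopping".
shopping.score += score_increment(dwell_time=60, similarity=0.5, page_length=1000)
# A page visited two days ago that matches "Sports"; it is discounted by gamma**2.
sports.score += score_increment(dwell_time=60, similarity=0.5, page_length=1000,
                                age_days=2, gamma=0.5)
print(is_interested_in(shopping))
```

In this sketch the older "Sports" visit contributes a quarter of the weight of the recent "Shopping" visit, so the conditional probability of "Shopping" under the root is higher. A real deployment would also need a policy for how the threshold and the confidence measure are reported to the origin server, which the text leaves open.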
5. EXPERIMENTS
We have built a prototype system named Avatar to evaluate our system design. It is currently deployed in our corporate network. Figure 4 is a snapshot of the tool for managing the user profile store, which was built upon our previous work [6]. The left panel of the tool lists all users in the system. The top-right corner shows the static profile of the selected user. The bottom-right window displays the user’s dynamic profile. The length of each red bar indicates the user’s degree of interest in the corresponding node.

Figure 5 shows an example of performing personalization services on the edge of the Internet. In this example, we generate customized advertisements in a Web page for different users according to their interests. Since each user needs a unique ID, Microsoft’s Passport is employed here to identify the user, which is more precise than using only the IP address. Avatar.Profile is a class provided by the query interface discussed in Section 4.3 for accessing the user information on edge servers. GetProfile is a static member function of the class Avatar.Profile. It returns a profile object corresponding to the given user ID. After a user profile object is obtained, applications can select customized advertisements based on the user’s current interest. The code for this page is written and embedded by the origin Web server and executed at the edge server.
[Figure 5 annotations: Web page; a dynamic area defined by ASP.NET; C# code for generating dynamic content in the corresponding area; client.]

    // Get user ID by Microsoft's Passport
    System.Web.Security.PassportIdentity id =
        (System.Web.Security.PassportIdentity) Page.User.Identity;
    // Get user's profile from the database
    Avatar.Profile profile = Avatar.Profile.GetProfile (id.Name);
    // Generate customized ad
    string AdText = "";  // the markup of this format string was lost in extraction
    string AdURL = GetAdURL (profile.CurrentInterest);
    string AdImgSrc = GetAdImgSrc (profile.CurrentInterest);
    lbAdv.Text = String.Format (AdText, AdURL, AdImgSrc);
    …

Figure 5. An example of personalized advertisement insertion.

[Figure 4 labels: User list, Static Profile, Dynamic Profile.]
Figure 4. Tool for managing the user profile store.

We evaluated the performance of the user modeling scheme using ground truth collected from six users. The relative entropy measure [5] is used to compute the similarity between the learned and true ontologies that describe the user’s interests. It is calculated by applying the following relation recursively:

$$D'(r) = D(r) + \sum_{\forall k} p(k \mid r) \, D'(k) \qquad (4)$$

where r is a common node of the two ontologies, k is an immediate child node of r, D is the entropy of the current node, D′ is the relative entropy of the current node, and p(k|r) is calculated by Equation (1).

The intuitive meaning of the relative entropy measure is: given a learned ontology, what is the error if this ontology is used to describe the user’s interests? The average relative entropy is 2.25. Compared to 3.22, which is the relative entropy of totally randomized ontologies, our algorithm is quite effective.

6. SUMMARY AND FUTURE WORK
In this paper, we presented an edge services architecture for providing personalization services on the edge of the Internet. In contrast to traditional approaches, this architecture offers several advantages, such as improved scalability and availability of the services and the ability to conduct more accurate user modeling. Many issues remain to be studied, including security and the protection of the user’s privacy. The management of user profile databases is also much like a distributed database problem. We plan to further investigate these issues in our future work.

7. ACKNOWLEDGEMENT
Thanks to Zheng Chen, Yu Chen, and Zheng Zhang for many valuable inputs on this work. We are also thankful to all the colleagues at Microsoft Research Asia who took part in the evaluation of this system.

8. REFERENCES
[1] Akamai. http://www.akamai.com.
[2] Anderson, C. R., and Horvitz, E. Web montage: a dynamic personalized start page. in Proceedings of WWW’02 (Honolulu, Hawaii, May 2002), ACM Press, 704-712.
[3] Composite Capabilities / Preferences Profile Working Group (CC/PP). http://www.w3.org/Mobile/CCPP.
[4] Edge Side Includes (ESI). http://www.esi.org.
[5] Koh, W., and Mui, L. An information-theoretic approach for ontology-based interest matching. IJCAI’01 Workshop on Ontology Learning (Seattle, WA, Aug. 2001).
[6] Liu, W. Y., Chen, Z., et al. Ubiquitous media agents for managing personal multimedia files. in Proceedings of ACM Multimedia’01 (Ottawa, Canada, Sep. 2001), ACM Press, 519-521.
[7] Ma, W. Y., Shen, B., and Brassil, J. Content services network: the architecture and protocols. in Proceedings of 6th International Workshop on Web Caching and Content Distribution (Boston, June 2001), 83-101.
[8] Ma, W. Y., Yuan, C., et al. Media companion: delivering content-oriented Web services to Internet media. Microsoft Technical Report, MSR-TR-2002-61, Microsoft Research, Apr. 2002.
[9] Maglio, P., and Barrett, R. Intermediaries personalize information streams. Communications of the ACM, 43(8), Aug. 2000, 96-101.
[10] Open Directory Project (ODP). http://dmoz.org.
[11] Open Pluggable Edge Services (OPES). http://ietf-opes.org.
[12] Personalization of Content and Services (PCS). http://pcs.eng.registro.br.
[13] WebSphere Edge Services. IBM white paper. Jan. 2000.