Recommending Access to Web Resources based on User’s Profile and Traceability
Nuno Bettencourt and Nuno Silva
GECAD - Knowledge Engineering and Decision Support Research Center
Polytechnic of Porto
Porto, Portugal
nmb,[email protected]

Abstract—Given the constant demand for sharing and managing resources available on the Internet, users need better tools and applications for resource authorization as well as for recommendation about resource sharing. Access control permissions should be independent of the physical location and type of the resources available on different web servers. This paper presents a novel approach for access control and recommendation that exploits users’ FOAF profiles, resource metadata and traceability information, seamlessly captured during the resources’ publication process.

I. INTRODUCTION

With the propagation of Web 2.0, publishing content (e.g. photos, documents, movies and comments) has become an everyday task, whether through blogs, forums, file repositories or other kinds of hosting services. Some users tend to publish publicly, while others use the Internet as a means of sharing content with restricted groups, namely their social network contacts. Publishing content is easy, but managing meaningful access authorization to resources is difficult and burdensome. This is especially due to (i) the distributed and decentralized nature of the web, (ii) the increasing number of resources a user publishes and (iii) the ever-evolving number and characteristics of the people the owner/author is connected to.

More and more communication operators are providing access to social network and instant messaging services (e.g. Twitter [1], Facebook [2] messaging) on their electronic devices (e.g. smart phones, TVs, automobile appliances). The Social Web appears firstly as a way of connecting friends online and secondly as a way of sharing information and publishing content for those in each community.

A social network service is an application that creates and manages communities of people. These services are responsible for registration, authentication and support for the management of each person’s network of contacts. A person is typically registered in many social network services with different purposes (e.g. professional, friendship, hobbies), but minimal or no interaction is allowed between social network services. This is due to privacy restrictions imposed by those services: they do not allow social networks to be exported between different social environments, nor can the networks be used by computers outside that domain.

Another paradigm suggests the web publication of each person’s social network relationships as an unrestricted file. The extensive publication of users’ social network relationships generates a wide network of relationships that can be publicly applied.

While friendship/relationship recommendation has become trivial, resource recommendation between users or even communities, based on user profile information and linked resources, is still an open issue on the current web. It is our belief that resource recommendation can also aid users in creating access groups for their resources, or in maintaining existing groups, by enhancing content sharing with existing or new relationships.

To that end, we seek a system that provides resource access control management and recommendation over physically distributed resources. We suggest exploiting information from users’ social profiles and relationships together with traceability information. Traceability information is meta-information generated automatically by every action a user executes on the Internet; it acts like a breadcrumb trail for every uploaded resource and can be used semantically by any other person or computer. No existing system provides the necessary traceability features. Our proposal aims at recommending resources to relationships in a way that requires few technological changes to existing web servers and seamlessly integrates user authentication, authorization and traceability into current web technology.

The rest of the paper is organized as follows. Section II describes related work. Section III addresses access recommendation. Section IV introduces an architectural overview of the system. Section V describes how recommendation and traceability are achieved. Finally, Section VI gives an overview of the proposed solution and suggests further research.

II. RELATED WORK

Authentication is the process of certifying that a user is who he or she claims to be by means of presenting authentic credentials.
Authorization, on the other hand, consists of evaluating whether an authenticated user should have access to a resource. Providing single sign-on mechanisms and distributed authentication over multiple websites has become an everyday issue, as users log in to different websites that each demand authentication. Some authentication methods work at the OSI Application Layer (HTTP), like OpenID [3], Single Sign-On (SSO) [4] and proprietary web forms with login and passwords. Others work at the OSI Transport Layer, like FOAF+SSL [5] and RDFAuth [6].

FOAF+SSL is a new authentication method that does not rely on a user/password combination, but instead uses a self-signed certificate based on a RESTful [7] architecture. This authentication method can be used across multiple web sites and social networks, acting in a similar way to OpenID but, while the latter is built solely upon PKI (Public Key Infrastructure), the former is based on a Web ID [8] (a URI representing a user, which is in fact an “identifier” for any entity of type Person on the Internet).

FOAF profiles apply the FOAF ontology [9] to describe users’ profiles and their relationships with others. Multiple FOAF files give rise to a social network. FOAF profile information combined with self-signed certificates is being applied in projects such as the FOAF+SSL authentication method. Although the FOAF+SSL approach deals quite well with authentication, authorization is still a pending issue.

There are a few systems (TAAC [10] and TAMI [11]) in the area of cross-domain authentication and authorization, but none really addresses the question of resource recommendation. While published work [12] built with TAMI relies on OpenID authentication, our approach is similar to TAAC in using FOAF+SSL for authentication. Whereas [12] is browser dependent, forcing users to install Tabulator [13] or its plug-in for the Firefox browser, our architecture is realized with just a technological enhancement on each HTTP web server, in a similar approach to TAAC, by implementing special modules on web servers that intercept requests.
Whilst the system provided in [12] has a nucleus that acts as a proxy and maintains all the data and URIs to resources, our approach makes use of available writable LODs to create associations between resources and publishers. It requires neither a central proxy for accessing files nor any change to the end user’s browsing experience, because access control is enforced on each web server, which can connect to any available reasoner to evaluate access rules. TAAC logs access to resources, which becomes quite useful when analyzing and providing context for resources. Our approach widens that perspective by also providing means of creating traceability for each newly uploaded resource, maintaining an association between every available resource and its publisher, which is essential for recommendation, as will be explained later.

III. ACCESS RECOMMENDATION

Access recommendation is the process by which a system notifies a resource author that another user in his/her social network would probably benefit from, or enjoy, having access to that resource. Traditionally, when users wish to share documents or resources with relationships, they have to pick which relationships should be given access. In an access recommendation system, the process aids the user in granting or denying access to existing resources by making use of any kind of similarity factor between resources and relationships and suggesting which users should be given access to each resource. If a web application responsible for ensuring access control is aware of all resources and user relationships, it could recommend sharing a resource with new friends that have recently become part of any of the social networks the user participates in, or even recommend removing access rights for a particular user.

In our proposal, recommendation would be achieved by creating rules that match similarities between contexts [14]; therefore, a context would be generated for every resource and relationship. A resource context is produced by analyzing the resource’s content and metadata, while a relationship context is created based on the existing relationship depth [15] between users, each user’s profile, linked available resources and consequent relationships.

There are four typical situations in which a recommendation scenario might be triggered:

When a user uploads new resources to the Internet, the application analyzes their content, and similarities between the resource context and related users’ contexts are used to propose which users should gain access to the uploaded resource. In our proposal, this applies to users whose social relationships are described by a FOAF profile and who have resources available and linked to them. In this situation, the system recommends that the publishing user restrict access to the uploaded resource, firstly by denying all access to it and secondly by recommending who should be given access.

When a resource is edited or an access control rule is changed, recommendation also takes place, as there might be users that should be excluded from, and others that should be included in, the new access control, because context analysis and existing rules might have changed.

Whenever a new relationship enters a user’s social network, the user’s context is recalculated. User context is also based on the relationships a user might have.
Thus, when new relationships become available in a user’s social network, the system also needs to try to recommend granting these new connections access to existing resources. Because context computation is chained with other users, the inclusion of a new relationship might change a user’s context, which would also have an impact on the resources that user might have access to.

The recommendation system must also be activated by a timer, which periodically runs as a background task. This would reanalyze available resources, user relationships and their profiles, and thus recommend granting or restricting access to users.

In most systems, resource access control is an explicit relation established between the resource and the user or group of users. If a user is part of a group that has access to the resource, then this user is authorized to access that resource. For years, people have been using this paradigm as it solves part of the authorization problem. It is our belief that, at the moment, it is more important to understand the rationale behind the creation of a group/role than to explicitly provide individual user access (Figure 1). If the reason for a group’s creation is described, then a user could stop worrying about maintaining his/her access groups to resources, as the computer would understand the rationale implied in the group creation and hence could manage/recommend it (Figure 2). It is not our intention to replace group access mechanisms, but to aid the manager with recommendations in those decisions, and thereby to capture the knowledge in those rules for future use. Semiautomatic access group maintenance, based on similarities between resources, relationships between users or domain knowledge, is halfway to an automatic recommendation system based on FOAF profiles and context (Figure 3).

IV. ARCHITECTURE OVERVIEW

In order to improve resource sharing and recommendation among users, each resource’s access control needs to be managed in a single place, independently of where the resource is physically located. To achieve this goal, authentication and authorization between different social networks should be resolved beforehand. While not excluding any existing proprietary authentication method, we assume a FOAF+SSL authentication mechanism for HTTP web servers. Whilst not a single sign-on approach, it could coexist with other authentication methods on legacy web servers, while providing authentication across multiple web sites and social networks.

Architectures for building access control systems already exist and should therefore be reused and improved to comply with current expectations. Typically, access control systems are split into several components (Figure 4):



• Policy Enforcement Point (PEP): enforces authentication and guarantees authorized access to resources;
• Policy Information Point (PIP): retrieves data related to users, resources and trace information;
• Policy Decision Point (PDP): evaluates rules and policies in order to grant or deny authorization to resources;
• Policy Administration Point (PAP): enables users to build and manage access rules and policies over existing resources.

In the proposed solution, these components play their roles in particular ways, as described next.

Figure 2. FOAF based access rules. Example rule: Person(?auth) ^ isAuthor(?auth, ?res) ^ isFriend(?auth, ?friend) -> hasAccess(?res, ?friend)

A. Policy Enforcement Point (PEP)

Every PEP component is installed and resides on a legacy server, and provides the means of validating FOAF+SSL authentication for every user, even if the user is not registered with the domain’s proprietary application. This component is responsible for tasks such as authentication, authorization and generating traceability information. The PEP intercepts every request a server receives, whether for downloading or uploading resources. When a request is intercepted, the PEP uses FOAF+SSL to authenticate the user. If the requester cannot provide FOAF+SSL credentials, the PEP passes the request directly to the web server application without interfering in the request process; consequently, the authentication process is left entirely to the domain server. If FOAF+SSL authentication credentials are provided, a Web ID representing the user is extracted from the certificate, the user’s identity is confirmed and logged, and resource access control is evaluated.

Every response sent to the user is also intercepted. This time, the system extracts the response URI from the HTTP response headers to check whether the request was a resource upload (usually, when a resource is uploaded, the server then redirects to a page where the resource can be viewed). At that moment, the PEP component initiates a request to write traceability information through a PIP component, so that metadata about the action that was taken can be created in any available LOD.

Figure 1. Static access rules. Example facts: hasAccess(Res1, Mary), hasAccess(Res1, Jane)

Figure 3. FOAF and Context based access rules. Example rule: Person(?auth) ^ isCoworker(?auth, ?coworker) ^ isAuthor(?auth, ?res_author) ^ isAuthor(?coworker, ?res_coworker) ^ similarContext(?res_author, ?res_coworker) -> hasAccess(?res_author, ?coworker)

Figure 4. Access Control Architecture. Diagram of the User, Owner, Server, PEP, PDP, PAP and PIP components; labelled interactions include authentication, getResource(ResUri), getUserInfo(WebId), hasAccess(WebId, ResUri)?, getRules(), getExtraInfo(), get/setRules(), getResources(WebId), getRelationships(WebId) and setAccessControl().

The PEP is also responsible for authorizing or denying access to resources. Whenever FOAF+SSL authentication is possible, the PEP component communicates with a PDP component, sending the requested URI and the user’s Web ID. Based on the requested resource and the requester’s identification, the PDP decides whether or not to give access to the resource; the PEP thus acts as a broker that enforces authorization over resources on the local domain according to the PDP’s decision. The most relevant aspect of this component is that it not only enforces FOAF+SSL authentication but also automatically traces users’ uploading actions and enforces access control over resources.
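The request-handling behaviour described for the PEP, together with the friend-based rule of Figure 2, can be illustrated with a minimal Python sketch. It is not the authors’ implementation: the names `Request`, `make_friend_rule_pdp` and `PolicyEnforcementPoint` are hypothetical, certificate verification is assumed to have happened already (the request simply carries a Web ID or `None`), and the PDP is reduced to an in-memory function, whereas the paper foresees a separate component backed by a reasoner.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Request:
    uri: str
    # Web ID taken from an already verified FOAF+SSL certificate (assumption);
    # None means the client presented no such certificate.
    web_id: Optional[str] = None


def make_friend_rule_pdp(authors: Dict[str, str],
                         friends: Dict[str, Set[str]]) -> Callable[[str, str], bool]:
    """Hypothetical PDP for: isAuthor(a, res) ^ isFriend(a, f) -> hasAccess(res, f)."""
    def decide(web_id: str, uri: str) -> bool:
        author = authors.get(uri)
        if author is None:
            return False  # unknown resource: deny by default (an assumption)
        # Letting authors reach their own resources is also an assumption.
        return web_id == author or web_id in friends.get(author, set())
    return decide


@dataclass
class PolicyEnforcementPoint:
    decide: Callable[[str, str], bool]     # stands in for the call to a PDP
    legacy_app: Callable[[Request], str]   # the unmodified web application
    access_log: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, request: Request) -> str:
        if request.web_id is None:
            # No FOAF+SSL credentials: pass the request through untouched, so
            # the legacy application keeps control of its own authentication.
            return self.legacy_app(request)
        self.access_log.append((request.web_id, request.uri))  # log identity
        if self.decide(request.web_id, request.uri):
            return self.legacy_app(request)
        return "403 Forbidden"
```

Keeping the decision function separate from the enforcement class mirrors the PEP/PDP split: the enforcement point only brokers the decision and never evaluates rules itself.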

B. Policy Decision Point (PDP)

A Policy Decision Point is responsible for deciding whether or not to grant access to a resource, that is, for checking whether a given user should have access to a certain resource. In our architecture, two requirements are mandatory to enable authorization over resources: users must be provided with a Web ID, and rules/policies relating users and resources must exist. The PDP obtains the former from the PEP component, and the latter from the location for rule storage specified in the FOAF profile. According to those rules and the user Web ID supplied by the PEP component, the PDP evaluates whether the requester should have access to the resource. If more information is needed to evaluate access, the PDP may contact any PIP to obtain it (e.g. if the FOAF profile of the resource publisher is required, the PDP contacts an available PIP to retrieve it), and only when all the necessary information has been gathered does the PDP evaluate any rule or policy.

C. Policy Information Point (PIP)

A Policy Information Point component is responsible for obtaining information that is not available inside a local system. That information might be available among Internet LODs, websites, databases or any other kind of information repository that may be registered with the component.

This component acts as a broker interface to existing repositories that hold information about resources, authors, web servers, etc. These components not only act as data providers; they are also capable of creating new data and making it available in those information repositories. They are responsible for creating and publicizing new traceability information about any resource and replicating it to other information repositories. These components also provide querying capabilities over existing information repositories, and are therefore able to answer queries (e.g. “return a list of all resources for a given user Web ID”; “return a resource author”, etc.).

D. Policy Administration Point (PAP)

As mentioned before, we aim to achieve recommendation over web resources. To do so, it is necessary to have a web interface where users can configure rules and policies over resources and relationships. A Policy Administration Point provides the visual interface component through which a user manages access to resources. This is also the place where recommendation messages are displayed to the user. Acting like an access control management console, the visual interface provided by this component is able to control all resources spread across different web server domains. This component uses existing PIPs to retrieve data about users, resources and relationships, so that rules can be created upon them. Every access control rule is translated into a policy language. The user will be able to provide either the location of a reasoner where his/her rules reside or simply a location for the rules themselves, which can be fetched and reasoned over by any reasoner that can interpret them. For the latter option, security governing which reasoners can fetch the rules must be implemented, which is outside the scope of this work.

V. RECOMMENDATION AND TRACEABILITY

Common resource access control is defined by granting or denying (user) access to resources. For recommendation to be widely used, access control systems must shift to a paradigm in which all access is controlled via rules defined by the resource publisher. These rules should be defined based on the resources, relationships and calculated contexts. It is much more important to understand why a user should have certain rights over a resource, and to use that knowledge, than just to give plain and simple direct access to the user. Based on the previous architecture, the recommendation process will run in the Policy Administration Point as a proactive decision support component. The PAP collects the available resources from a given author or user and recommends access rights to other users.
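As an illustration of how a PAP might turn the context-based rule of Figure 3 into recommendations, the following Python sketch can be considered. It rests on assumptions that are not part of the paper: a resource context is reduced to a set of keywords, `similarContext` is approximated by Jaccard similarity against a hypothetical `threshold`, and the function name `recommend_access` is invented for the example.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity, used here as a stand-in for similarContext."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_access(author: str,
                     coworkers: Dict[str, List[str]],
                     authored: Dict[str, List[str]],
                     contexts: Dict[str, Set[str]],
                     threshold: float = 0.5) -> Set[Tuple[str, str]]:
    """Suggest (resource, coworker) pairs following the Figure 3 rule:
    isCoworker(a, c) ^ isAuthor(a, r1) ^ isAuthor(c, r2)
    ^ similarContext(r1, r2) -> hasAccess(r1, c).
    """
    suggestions = set()
    for coworker in coworkers.get(author, []):
        pairs = product(authored.get(author, []), authored.get(coworker, []))
        for own_res, coworker_res in pairs:
            if jaccard(contexts[own_res], contexts[coworker_res]) >= threshold:
                suggestions.add((own_res, coworker))
    return suggestions
```

The function only produces suggestions; in line with the proposal, the decision to grant access would remain with the resource publisher.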

Figure 5. Traceability overview diagram. Participants: User, Web Server (with DB) and LOD Triple EndPoint (RDF). Steps: 1. upload(file.ext, User FOAF+SSL Certificate); 2. trace(File URL, WebServer FOAF+SSL Certificate, User FOAF URL); 3. trace(Trace URL, Triple EndPoint FOAF+SSL Certificate, WebServer FOAF URL).

A. User and Resources

In order to get a list of all resources made available by a user, it is necessary to relate users and resources. Yet, while there is always a user who published those resources online, little information is available about that. We assume that a given resource might have one or more authors, but that only one of them (the publisher) has the ability to upload the resource and manage its access permissions. Internet resources are necessarily addressed by a URI, may have content and sometimes have metadata associated with them. These three premises are necessary for a fully functional recommendation system. The user, whether acting as a reader or as a publisher, must be identified by a Web ID that unequivocally identifies the user in question. This Web ID must point to a semantically enabled metadata structure called a FOAF profile, which contains public information about the user, the social networks the user might be part of and other semantically structured information. While some resource formats contain metadata (e.g. PDF, Word documents, photos), they are not yet semantically ready. So far, some of those resources have properties for the author’s name, but few enable the association between resource URI and Web ID.

B. Traceability

Relating resources to their corresponding author/publisher on the Internet is quite difficult. In web applications where user registration is necessary, that association is created in a domain-restricted and proprietary style, which is only accessible via the domain application’s authentication and is usually not accessible from other systems. Therefore, these associations are not sufficient for the system to retrieve all published resources that are spread over the whole Internet within different application domains. Keeping a semantic, unrestricted, open-format correspondence between available resources and authors is essential; otherwise it is not possible to create queries that return all available resources from a user. Web-domain-specific metadata is hard to query and to align or map [16] with other web-domain-specific metadata. Even though it is not the main focus of this work, the association between resources and web publishers needs to be maintained; research in this area is being carried out under the name of resource traceability (Figure 5).

C. Traceability information-based Rules

With traceability information, the task of retrieving any user’s resources is much easier and more complete, in that no specific wrappers need to be developed for each domain’s web site to get the resources. All the rules created between resources and relationships are kept in a machine-readable format. Several policy languages exist that can be applied to this architecture (e.g. KAoS [17], RuleML [18], SWRL [19], Rei [20] and Rein [21]). However, some of them do not work with OWL ontologies (but provide other kinds of functionality, like logic programming), while others are more suited to access control for Web Services. Given the characteristics of these languages, a combination of SWRL and Rein should be used to control access to resources. Every rule definition is kept private to its publisher, and this privacy must always be retained. The stronger a relationship is, the higher the probability of a user granting that relationship access to resources. The opposite can also be inferred; therefore, exposing rule definitions publicly can be personally harmful for the user, as it would deeply expose the user’s relationships with others.
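The kind of query that traceability is meant to enable (for example, returning all resources published by a given Web ID, or the publisher of a resource) can be sketched as follows. This is only an in-memory stand-in for a writable LOD triple store: the `Trace` record loosely mirrors the arguments of the second step in Figure 5, but the class and method names are hypothetical, and certificates are replaced by plain identifier strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Trace:
    resource_uri: str      # where the uploaded file can be retrieved
    server_web_id: str     # identifies the web server that recorded the upload
    publisher_web_id: str  # Web ID of the user who uploaded the resource


class TraceStore:
    """Toy replacement for a LOD triple endpoint holding trace records."""

    def __init__(self) -> None:
        self._traces: List[Trace] = []

    def trace(self, resource_uri: str, server_web_id: str,
              publisher_web_id: str) -> Trace:
        record = Trace(resource_uri, server_web_id, publisher_web_id)
        if record not in self._traces:  # ignore exact duplicates
            self._traces.append(record)
        return record

    def resources_of(self, publisher_web_id: str) -> List[str]:
        """All resources a given Web ID has published, across servers."""
        return [t.resource_uri for t in self._traces
                if t.publisher_web_id == publisher_web_id]

    def publisher_of(self, resource_uri: str) -> Optional[str]:
        """Publisher of a resource, or None if no trace is known."""
        for t in self._traces:
            if t.resource_uri == resource_uri:
                return t.publisher_web_id
        return None
```

Because every record carries the publisher’s Web ID rather than a site-specific account name, the same lookup works for resources hosted on unrelated servers, which is the property the section argues for.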

VI. CONCLUSIONS

In conclusion, we propose a system that would require few technological changes to existing web servers, with the aim of providing resource access control management over physically distributed resources, resource traceability and resource recommendation between users. Our goal was to extend the FOAF+SSL approach in order to provide authentication, authorization and criteria for how access to resources should actually be granted. As FOAF+SSL is an active research field, some problems still need to be solved, such as guaranteeing that a subject is not claiming to be someone or something it is not. At the moment, the same problem arises with the simple login/password methodology because, when user registration occurs, a user can pretend to be someone else.

Technologically, this proposal is viable through the development of Apache [22] modules since, according to Netcraft [23], the majority of all web sites on the Internet, and almost all social network systems, run on Apache web servers. As a result, this work will first be implemented for deployment on this HTTP web server platform. This work will be deployed and act in parallel with existing communities’ authentication and authorization control systems, allowing dynamic access control over a user’s Internet resources.

The proposed architecture provides several improvements to the existing web. Some of the benefits that would be achieved with this kind of distributed authorization and recommendation are: (i) centralized management of resources hosted on different web servers, without the need to use any web system’s proprietary access control, provided that resources are fully addressable using URIs; (ii) flexibility, by making it possible to share protected content from one web system (e.g. Flickr) with users that are friends of, or have some kind of relationship with, the resource publisher but are not registered on that web system, provided that simple FOAF+SSL authentication exists on the web server hosting the resource; (iii) freedom of choice regarding the hosting service on which users host their resources (i.e. nowadays users tend to publish content on the social network where most of their relationships reside, even if that social network is not the best fit for the type of resource being shared: Flickr® isn’t suitable for videos, just as YouTube™ isn’t suitable for photos); in the future, with no boundaries set between different social networks, users can choose the best service to host their content without losing their access control; (iv) resource access control and recommendation over any owned resource on the Internet, regardless of which server it is hosted on; (v) rules and policies for access control can remain in a “single” place which is linked from a FOAF profile.

This proposal does not consist of creating a single repository, or restricting traceability to a locally centralized server. Even though we are not yet focusing on distribution and technological implementation details, we assume the use of data replication, peer-to-peer technologies or any other means of spreading traceability information over several different web servers, so that no access bottleneck arises.

Yet, the system has limitations: so far it only works for first-degree relationships and does not yet deal with relationship types that have different semantics (e.g. some relationships are symmetric while others are transitive).

In the future, a system like this may also suggest new relationships between two unrelated users, based on similarity evaluation between their contexts and on the public information they might have available on the Internet (e.g. it could propose a connection between two researchers that have published content on the same subject, even if those users are not registered on the same domain). The whole system brings the concept of the Internet closer to the end user, as one can almost use it as a massive hard drive with active journaling, which is made available through traceability.

ACKNOWLEDGMENT

This work is partially supported by the Portuguese MCT-FCT project COALESCE (PTDC/EIA/74417/2006).

REFERENCES

[1] “Twitter.” [Online]. Available: http://twitter.com/
[2] “Facebook.” [Online]. Available: http://www.facebook.com/
[3] “OpenID foundation.” [Online]. Available: http://openid.net/
[4] “SSO (Single Sign-On).” [Online]. Available: http://www.google.com/support/a/bin/answer.py?hl=en&answer=60224
[5] H. Story, B. Harbulot, I. Jacobi, and M. Jones, “FOAF+SSL: RESTful authentication for the social web,” in Proceedings of the 1st Workshop on Trust and Privacy on the Social and Semantic Web, Jun. 2009.
[6] “RDFAuth: sketch of a buzzword compliant authentication protocol.” [Online]. Available: http://blogs.sun.com/bblfish/entry/rdfauth_sketch_of_a_buzzword
[7] “RESTful web services.” [Online]. Available: http://java.sun.com/developer/technicalArticles/WebServices/restful/
[8] “WebID - ESW wiki.” [Online]. Available: http://esw.w3.org/topic/WebID
[9] “FOAF vocabulary specification.” [Online]. Available: http://xmlns.com/foaf/spec/
[10] “TAAC in action - it is a mystery...” [Online]. Available: http://www.pipian.com/blog/2008/12/12/taac-in-action/
[11] “Transparent accountable datamining initiative (TAMI).” [Online]. Available: http://dig.csail.mit.edu/TAMI/
[12] C. M. A. Yeung, L. Kagal, N. Gibbins, and N. Shadbolt, “Providing access control to online photo albums based on tags and linked data,” in AAAI Spring Symposium on Social Semantic Web: Where Web 2.0 Meets Web 3.0, Mar. 2009.
[13] T. Berners-Lee, Y. Chen, L. Chilton, D. Connolly, R. Dhanaraj, J. Hollenbach, A. Lerer, and D. Sheets, “Tabulator: Exploring and analyzing linked data on the semantic web,” in Proceedings of the 3rd International Semantic Web User Interaction Workshop, 2006.
[14] S. Ghita, W. Nejdl, and R. Paiu, “Semantically rich recommendations in social networks for sharing, exchanging and ranking semantic context,” in ESWC workshop on Ontologies in P2P communities, p. 10, 2005.
[15] S. Wasserman, K. Faust, and D. Iacobucci, Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge University Press, 1994.
[16] N. Silva and J. Rocha, “Multidimensional service-oriented ontology mapping,” Int. J. Web Eng. Technol., vol. 2, no. 1, pp. 50–80, 2005.
[17] A. Uszok, J. Bradshaw, M. Johnson, R. Jeffers, A. Tate, J. Dalton, and S. Aitken, “KAoS policy management for semantic web services,” IEEE Intelligent Systems, vol. 19, no. 4, pp. 32–41, 2004.
[18] “The rule markup initiative.” [Online]. Available: http://ruleml.org/
[19] I. Horrocks, P. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean, “SWRL: a semantic web rule language combining OWL and RuleML,” World Wide Web Consortium, Tech. Rep., May 2004.
[20] L. Kagal, “Rei: A policy language for the Me-Centric project,” HP Labs, Tech. Rep., Sep. 2002, http://www.hpl.hp.com/techreports/2002/HPL-2002-270.html.
[21] L. Kagal and T. Berners-Lee, “Rein: Where policies meet rules in the semantic web,” Laboratory, Massachusetts Institute of Technology, 2005.
[22] “The apache software foundation.” [Online]. Available: http://www.apache.org/
[23] “Netcraft ltd - internet research, Anti-Phishing and PCI security services.” [Online]. Available: http://news.netcraft.com/
