A Comprehensive Access Control System for Scientific Applications

Muhammad I. Sarfraz, Peter Baker, Jia Xu, and Elisa Bertino
Purdue University, West Lafayette, Indiana 47906, USA
{msarfraz,pnbaker,xu222,bertino}@purdue.edu

Abstract. Web-based scientific applications provide a means to share scientific data across diverse groups and disciplines, extending beyond the local computing environment. However, organizing and sharing large, heterogeneous datasets is challenging because the data are often sensitive. In this paper we analyze the security requirements of scientific applications and present an authorization model that facilitates the organization and sharing of data without compromising its security.

Keywords: Access Control, Scientific Applications.

1 Introduction

Web-based scientific applications provide an infrastructure that allows scientists and researchers to run scientific computations, data analysis and visualization through their web browsers. Such applications also provide a collaborative environment in which scientists and researchers can work together by sharing their tools and datasets. However, organizing and sharing large, heterogeneous datasets is challenging because the data are often sensitive and must be protected from unauthorized use. An inadequate or unreliable authorization mechanism significantly increases the risk of unauthorized use of scientific data.

To address this, we present an access control system for scientific applications. We formulate a methodology that incorporates principles from security management and software engineering. From a security management perspective, the goal is to meet the requirements for access management in scientific applications. From a software engineering perspective, the goal is to incorporate well-known principles of software engineering into the design of the access control model, yielding a specification that allows authorizations to be developed and managed in a standardized manner.

The remainder of this paper is organized as follows. Section 2 describes the data model, and Section 3 discusses the authorization requirements for scientific applications. In Section 4, we present the authorization model based on the requirements in Section 3. Section 5 discusses the key components of the access control system of the Computational Research Infrastructure for Science (CRIS), a web-based scientific application. Related work is presented in Section 6, and Section 7 concludes the paper.

J. Lopez, X. Huang, and R. Sandhu (Eds.): NSS 2013, LNCS 7873, pp. 749–755, 2013. © Springer-Verlag Berlin Heidelberg 2013

2 Data Model

An authorization model must be consistent with the objects supported in a scientific application. This section describes the key concepts that characterize the various objects in the scientific domain and their impact on the design of an authorization model.

Object Hierarchy: Data object hierarchies are a common approach to organizing large amounts of data by exploiting relationships among the data objects. An object hierarchy is represented as a tree structure. From an access control perspective, the hierarchical organization of data objects should reduce the total number of permission assignments, and thus the cost of permission administration.

Datasets and Versions: Most large scientific datasets are assembled from samples collected over time and are versioned for the purpose of long-term preservation and re-use of primary research data. From an access control perspective, authorizations must be specifiable both on a versioned dataset as a whole and on individual versions of the dataset.

Scientific Workflows: A key impediment for scientists is how to automate their manual, repetitive scientific tasks. Workflows have emerged as an alternative to ad hoc approaches for constructing computational scientific experiments. From an access control perspective, authorizations on workflows can be specified at two different granularities: (1) access is granted or denied on an individual workflow; (2) access is granted or denied on an individual task within a workflow. We have adopted the latter as the default and support the former as a user option.

Computational Tools: Most research activities in Web-based scientific applications have focused on the development of new computational tools to support scientific discovery. Since computational tools access large amounts of data, an important implication from an access control perspective is the need to prevent unauthorized access to a dataset when it is accessed as part of the execution of a tool.

3 Authorization Requirements

This section highlights the security management issues that impact the design of an authorization model for scientific applications. In what follows, we assume a general notion of authorization by which an authorization is defined in terms of a subject, a permission, an object, an object owner and an object class.

Implicit Authorization. An inefficient way to implement an authorization mechanism is to explicitly store all authorizations for all system subjects desiring access and all system objects whose access has been requested. In contrast, the concept of implicit authorization makes it unnecessary to store all authorizations explicitly. The idea is that a permission of a certain type defined for a subject on a certain object implies other authorizations, i.e., authorizations can be propagated automatically. Hence the authorization mechanism can compute authorizations from a minimal set of explicitly stored authorizations in order to prevent unauthorized access. Furthermore, in order to allow exceptions to an authorization, an authorization is distinguished as either positive or negative. A positive authorization grants access, and a negative authorization is an explicit denial.

Dataset Security. Scientific applications allow a user to develop a computational tool and then grant the run authorization on this tool to other users. An important question is whether the authorization to directly access a dataset d must be checked when d is accessed as part of the execution of the tool. There are two approaches. In the first approach, all accesses made during the execution of the tool are checked against the user who invoked the tool. Thus a user must possess all authorizations on the datasets accessed by the tool, and the authorization controls embedded in the tool could therefore be easily bypassed. While this can be acceptable in some situations, it is exactly the opposite of what should be done to protect data. In the second approach, a user having the authorization to execute a tool should not have any authorization to directly read or modify the datasets accessed by the tool. When the tool is executed, permissions on any dataset that has not been granted to the tool are checked against the user executing the tool. Note that only an owner may grant execution authorizations on a dataset.

Sandbox Search. Sandbox Search functionality allows a user to search whether certain data exists, but this does not imply the right to see the actual data. The user must have permission to access the data in order to retrieve it. One may ask why a user should ever be forbidden from searching, given that the data cannot be retrieved without permission on it.
This is for two reasons: (1) users performing such queries can consume a lot of resources, and (2) in some cases data owners may not want a browsing query to report the existence of their data, as this may reveal information intended to be hidden.

Temporal Constraints. In many situations, permissions have a temporal dimension in that they are limited in time or hold only for specific periods. Therefore the temporal constraints surrounding an access request must be evaluated in order to grant or deny access to objects. Each authorization has an associated time interval, representing the set of time instants for which the authorization is granted.

Conflict Resolution. Authorization rules must be specified correctly to ensure that authorized access is allowed while unauthorized access is denied. Identifying and resolving a conflict before it results in the denial of a legitimate access request is essential to the usability of any access control system.

4 Authorization Model Design

In this section, we present a general authorization model for scientific applications by formalizing the authorization requirements described in Section 3. The authorization model is an extension of the earlier work by Rabitti et al. [4].


Basic Definition. An authorization is defined as a tuple $(s, o, p, s', c)$ where $s \in S$, the set of subjects; $o \in O$, the set of objects; $p \in P$, the set of permissions; $s' \in \mathit{owner}(o) \subseteq S$; and $c \in C$, the set of object classes. A function $f$ is defined to determine whether an authorization $(s, o, p, s', c)$ is True or False:

$$f : S \times O \times P \times S \times C \to \{\text{True}, \text{False}\}$$
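As an illustration only, the five-component tuple can be written down as a small Python record. The names `Authorization` and `f_explicit`, and the sample subject, object and owner strings, are hypothetical and not taken from CRIS. The lookup below covers only authorizations stored explicitly; the implicit propagation defined later in this section is not modelled here.

```python
from typing import NamedTuple


class Authorization(NamedTuple):
    """One authorization tuple (s, o, p, s', c); field names are illustrative."""
    subject: str     # s, a subject in S
    obj: str         # o, an object in O
    permission: str  # p, a permission in P
    owner: str       # s', an owner of o
    obj_class: str   # c, the class of o


def f_explicit(ab, s, o, p, owner, c):
    """Return True only if the tuple is stored explicitly in the base `ab`.

    This is a simplification of f: implied authorizations are ignored.
    """
    return Authorization(s, o, p, owner, c) in ab


# A one-entry authorization base with made-up names.
AB = {Authorization("plant-lab", "project-1", "read", "alice", "Project")}
```

Keeping the owner and the object class inside the tuple means a single set can hold authorizations for every kind of object, which is the reason the model gives for extending the tuple.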

Definition 1. A positive authorization is a tuple $(s, o, p, s', c)$ with $s \in S$, $o \in O$, $p \in P$, $s' \in S$ and $c \in C$. A negative authorization is a tuple $(s, o, \neg p, s', c)$ with $s \in S$, $o \in O$, $p \in P$, $s' \in S$ and $c \in C$.

Definition 2. An authorization base (AB) is a set of explicit authorizations $(s, o, p, s', c)$ with $s \in S$, $o \in O$, $p \in P$, $s' \in S$ and $c \in C$, where $p$ is positive or negative; that is, $AB \subseteq S \times O \times P \times S \times C$.

The model in [4] is extended to include the owner of the object and the class of the object as part of the authorization tuple. For the purpose of Dataset Security (Section 3), it is essential to determine that the privilege granted to a subject $s$ on an object $o$ is granted by the owner of $o$. The object class is included to differentiate between types of objects, since all authorizations are stored in one base, namely AB.

Implicit Authorization. An explicitly specified authorization may imply authorizations along any combination of two dimensions of the authorization definition, namely the subject and the object. The function $i(s, o, p, s', c)$ computes the value, True or False, of an authorization $(s, o, p, s', c)$ from the explicit authorizations in AB if either the authorization $(s, o, p, s', c)$ or $(s, o, \neg p, s', c)$ can be deduced from some $(s_1, o_1, p_1, s_1', c_1)$.

Definition 3. Function $i(s, o, p, s', c)$ is defined as

$$i : S \times O \times P \times S \times C \to \{\text{True}, \text{False}\}$$

If $(s, o, p, s', c) \in AB$, then $i(s, o, p, s', c) = \text{True}$; else, if there exists $(s_1, o_1, p_1, s_1', c_1) \in AB$ such that $(s_1, o_1, p_1, s_1', c_1) \to (s, o, p, s', c)$, then $i(s, o, p, s', c) = \text{True}$; else, if there exists $(s_1, o_1, \neg p_1, s_1', c_1) \in AB$ such that $(s_1, o_1, \neg p_1, s_1', c_1) \to (s, o, \neg p, s', c)$, then $i(s, o, p, s', c) = \text{False}$.

We now formally define the three domains $S$, $O$ and $P$ and the rules used for deducing implicit authorizations from explicitly defined authorizations.

Subjects are organized into groups, and authorizations are associated with groups, thus reducing the number of explicit authorizations. The idea of groups is similar to user-role assignment in Role Based Access Control (RBAC). The groups form a hierarchy called a Group Hierarchy (GH), where a node in the hierarchy represents a group and a directed arc from group A to group B indicates that the authorizations for group A subsume the authorizations for group B.
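The three cases of Definition 3 can be sketched in Python for propagation along the subject dimension only. Everything named here (`Auth`, `GROUP_ARCS`, `subsumed_by`, `implies`, the group names) is hypothetical. The sketch also assumes a particular propagation direction that the text does not spell out: a positive authorization on group B is inherited by any group A with a path A → B, while a negative authorization on A carries down to B. Definition 3 leaves the case where nothing can be deduced open, so the sketch returns `None` there.

```python
from typing import NamedTuple


class Auth(NamedTuple):
    subject: str
    obj: str
    permission: str
    owner: str
    obj_class: str
    positive: bool = True  # False encodes (s, o, ¬p, s', c)


# Hypothetical Group Hierarchy: an arc A -> B means A subsumes B.
GROUP_ARCS = {"pi": ["postdoc"], "postdoc": ["student"], "student": []}


def subsumed_by(group):
    """Groups reachable from `group` along arcs, including `group` itself."""
    seen, stack = set(), [group]
    while stack:
        g = stack.pop()
        if g not in seen:
            seen.add(g)
            stack.extend(GROUP_ARCS.get(g, []))
    return seen


def implies(a, b):
    """Assumed rule for a -> b; only the subject may differ."""
    if a[1:] != b[1:]:  # object, permission, owner, class and sign must match
        return False
    if a.positive:
        return a.subject in subsumed_by(b.subject)  # positive flows upward
    return b.subject in subsumed_by(a.subject)      # negative flows downward


def i(ab, s, o, p, owner, c):
    """Follow the three cases of Definition 3; None means 'not deducible'."""
    pos = Auth(s, o, p, owner, c, True)
    neg = Auth(s, o, p, owner, c, False)
    if pos in ab:
        return True
    if any(implies(a, pos) for a in ab):
        return True
    if any(implies(a, neg) for a in ab):
        return False
    return None
```

With one explicit read authorization on the lowest group, every group above it obtains read access without any further stored entry, which is the saving that implicit authorization is meant to provide.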

A user has permission $p$ on object $o$ if there exists a group $s$ such that $f(s, o, p, s', c) = \text{True}$ and the user belongs to $s$.

Permissions in our model take values from the set {read, write, create, delete, execute}, and implication between two authorizations does not occur along the domain $P$. A permission is stored as a cumulative permission represented by an integer bit mask in which each bit represents one permission. Hence only one entry is needed to store an authorization for a particular object, which reduces the need for implicit authorization along the domain $P$.

Objects are organized in a Hierarchical Object Lattice (HOL) in the form of a rooted acyclic graph in which each node is a Project, Experiment, Job or Workflow. An arc from node A to node B in the HOL indicates that object A implies object B. Note that authorizations can only propagate in the $O$ domain when objects are hierarchical, and not in the case of tools. In the case of tools, each tool directly references the dataset(s) it uses, and each dataset directly references its versions. An authorization on a dataset must be explicitly defined and cannot be implied by an authorization on a tool that accesses the dataset.

Dataset Security. Whenever an authorization request for a tool $t \in O$ is evaluated by function $f$ to be True, the function check performs an additional check: it returns False if the user can invoke the tool but is not authorized to execute it on the given dataset, and True if the user can execute the tool on that dataset. Two other functions, grant and revoke, respectively grant and revoke authorizations. They return True if the grant or revocation has been carried out correctly and False otherwise. The function grant is organized as follows. First, a check is done to verify that the user is the owner of the dataset, since only the owner can grant authorizations.
Next, a check is done to verify whether the subject receiving the authorization is the owner; in this case, an error is returned since the owner already has all the authorizations and therefore the authorization is not needed. Otherwise, an authorization rule is added to the authorization base. The function revoke is organized in a similar way, recalling that only the owner of a dataset may revoke authorizations on it. The main difference is that a check is done to verify whether the authorization to be revoked exists in the authorization base.

Sandbox Search. A user is allowed only to execute a browsing query on the existence of data. If the function $f$ returns False, the user does not have the authorization to search and no search results are returned. If $f$ returns True, the function match is called to check whether the object being searched for exists.

Temporal Constraints. We consider a temporal constraint to be associated with each authorization and refer to an authorization together with a temporal constraint as a temporal authorization. A temporal authorization $([t_1, t_2], (s, o, p, s', c))$ states that subject $s$ has permission $p$ on object $o$ during the interval from $t_1$ to $t_2$. Note that an authorization without any temporal constraint can be represented as a temporal authorization whose validity spans from the time at which the authorization is granted to infinity.
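A temporal authorization of the form $([t_1, t_2], (s, o, p, s', c))$ can be sketched as an interval attached to the tuple. The names `TemporalAuthorization`, `holds_at` and `unbounded` are hypothetical, the interval is assumed to be closed at both ends, and an open-ended validity ("to infinity") is represented here by `None`, which is a choice of this sketch rather than of the paper.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Tuple


class TemporalAuthorization(NamedTuple):
    start: datetime                        # t1
    end: Optional[datetime]                # t2, or None for "to infinity"
    auth: Tuple[str, str, str, str, str]   # (s, o, p, s', c)


def holds_at(ta, instant):
    """True if `instant` lies inside the closed interval [t1, t2]."""
    if instant < ta.start:
        return False
    return ta.end is None or instant <= ta.end


def unbounded(auth, granted_at):
    """Wrap a plain authorization so it is valid from the grant time onward."""
    return TemporalAuthorization(granted_at, None, auth)
```

An access request would then be evaluated only against those temporal authorizations for which `holds_at` is true at the time of the request.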


Conflict Resolution. To prevent conflicts, we ensure that any operation on AB leaves AB in a state satisfying the resolution, consistency and redundancy invariants. The resolution invariant adopts the negative-takes-precedence approach: if there is one reason to authorize an access and another to deny it, the access is denied. The consistency invariant ensures that false authorizations are not added to AB, and the redundancy invariant ensures that an authorization is not in AB if it is implied by another authorization.
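The negative-takes-precedence rule can be illustrated with a short decision function. This is a sketch under simplifying assumptions: the tuple is reduced to subject, object, permission and sign (owner and class are omitted), the names `Auth` and `decide` are hypothetical, and a request to which no authorization applies is denied, which is a closed-policy default assumed here rather than stated in the text. The consistency and redundancy invariants are not modelled.

```python
from typing import NamedTuple


class Auth(NamedTuple):
    subject: str      # group the authorization is attached to
    obj: str
    permission: str
    positive: bool    # False for a negative authorization


def decide(ab, groups, obj, permission):
    """Resolve a request for a user who belongs to every group in `groups`.

    Any applicable negative authorization wins over positive ones.
    """
    applicable = [
        a for a in ab
        if a.subject in groups and a.obj == obj and a.permission == permission
    ]
    if any(not a.positive for a in applicable):
        return False  # one reason to deny is enough
    # Assumed default: deny when nothing applies.
    return any(a.positive for a in applicable)
```

A user who belongs both to a group with a positive authorization and to a group with a negative one is therefore denied, which is how exceptions to a broad grant can be expressed.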

5 CRIS Access Control System

In this section, we discuss the key components and implementation of the CRIS access control system. We adopt the access control framework of Spring Security, as it provides comprehensive authorization services and is widely used in enterprise applications. The system illustrates how the authorization requirements of Section 3 and the authorization model of Section 4 are used in the design and enforcement of access control for scientific applications.

Authorization Base (AB). The AB realizes the authorization base of our model and consists of four tables provided by the default implementation of Spring Security:

– acl_sid uniquely identifies any principal or authority in the system. A principal is a user and an authority is a group of users. Spring Security also supports group hierarchies and allows configuring which groups include others.
– acl_class uniquely identifies any domain object class in the system.
– acl_object_identity stores information for each unique domain object, along with its parent, its owner and whether authorization entries inherit from any parent.
– acl_entry stores the individual permissions assigned to each principal or authority and whether each permission is positive or negative.
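The cumulative permission described in Section 4, where one integer per entry carries all five permissions, can be sketched with Python's `IntFlag`. The bit positions below are assumptions made for illustration and are not taken from CRIS or from Spring Security; `Permission` and `has_permission` are hypothetical names.

```python
from enum import IntFlag


class Permission(IntFlag):
    """One bit per permission; the bit assignment is an assumption."""
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    EXECUTE = 16


def has_permission(mask, required):
    """True if every bit in `required` is set in the stored integer mask."""
    return (mask & required) == required


# A single stored integer carrying both read and execute.
mask = int(Permission.READ | Permission.EXECUTE)

# Adding and removing permissions only rewrites that one integer.
mask_with_write = mask | Permission.WRITE
mask_without_read = mask & ~Permission.READ
```

Because all permissions for a subject on an object fit in one integer, a single row per subject and object suffices, which matches the observation that no implication along the permission domain is needed.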

Authorization Module (AM). The AM gives a CRIS user the ability to create and store authorizations in the authorization base for the various objects in the user's workspace, and consequently to allow access to authorized objects. If the authorization specified by the user is not already stored in AB or implied by an existing authorization in AB, the authorization is inserted into AB. After any operation on AB, the state of AB is checked to ensure that it satisfies the resolution, redundancy and consistency invariants described in Section 4. Checking whether a user has authorization on the requested object means evaluating the function $f$, which is defined in terms of the function $i$. If $i$ returns True, the user gets access to the desired object. In the case of tools, an additional check is done by invoking the check method in order to get access to the dataset(s) associated with the tool.
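The additional check for tools, together with the owner-only grant and revoke functions of Section 4, might be sketched as follows. This is one possible reading, not the CRIS implementation: the class `AuthorizationBase`, its fields, and the permission names are hypothetical, owners are assumed to hold every permission on their objects, a dataset is "granted to a tool" by storing an execute rule with the tool as subject, and the fallback permission checked against the user is assumed to be read.

```python
class AuthorizationBase:
    """Toy authorization base; storage is simplified to plain triples."""

    def __init__(self, owners):
        self.owners = dict(owners)  # object -> owning subject
        self.rules = set()          # explicit (subject, object, permission)

    def f(self, subject, obj, permission):
        # Assumption: an owner holds every permission on its own objects.
        if self.owners.get(obj) == subject:
            return True
        return (subject, obj, permission) in self.rules

    def grant(self, grantor, subject, obj, permission):
        if self.owners.get(obj) != grantor:
            return False  # only the owner may grant
        if subject == grantor:
            return False  # the owner already has all authorizations
        self.rules.add((subject, obj, permission))
        return True

    def revoke(self, revoker, subject, obj, permission):
        if self.owners.get(obj) != revoker:
            return False  # only the owner may revoke
        rule = (subject, obj, permission)
        if rule not in self.rules:
            return False  # nothing to revoke
        self.rules.remove(rule)
        return True


def check(ab, user, tool, dataset):
    """Extra check performed once f allows the user to run the tool."""
    if not ab.f(user, tool, "execute"):
        return False
    if (tool, dataset, "execute") in ab.rules:
        return True  # the dataset owner granted the dataset to the tool
    # Otherwise fall back to the user's own permission on the dataset.
    return ab.f(user, dataset, "read")
```

In this reading, a user who may run a tool can reach a dataset through it only if the dataset's owner has granted that dataset to the tool, or if the user could read the dataset directly.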

6 Related Work

Our work is related to many areas of access control, specifically access control specification in scientific applications. Martin et al. [3] propose a number of aspects or areas of security relevant to eScience projects, and Flechais and Sasse [2] examine the steps that can be taken to ensure that security requirements are correctly identified and that security measures are usable by the intended research community. The various authorization requirements addressed in our paper have also been discussed in different domains. Rabitti et al. [4] developed a comprehensive authorization model centered around implicit authorizations and designed for next-generation database systems. Bertino [1] proposes a model to provide data hiding and security in which authorizations specify privileges for users to execute methods on objects. In summary, while some of the authorization requirements have been studied in detail, a comprehensive access control system for the domain of scientific applications has not been addressed in the literature; our work provides such a system.

7 Conclusion

In this paper we presented an access control system suited for Web-based scientific applications. Given the scale and depth of modern-day scientific applications, it is imperative that the methodology used to formulate an authorization model be based on standardized constructs. We formulated an authorization model based on authorization requirements and well-known principles of software engineering to yield a specification that can be readily integrated into existing systems.

References

1. Bertino, E.: Data Hiding and Security in Object-Oriented Databases. In: 8th International Conference on Data Engineering, pp. 338–347. IEEE Computer Society (1992)
2. Flechais, I., Sasse, M.: Stakeholder Involvement, Motivation, Responsibility, Communication: How to Design Usable Security in eScience. International Journal of Human-Computer Studies 67(4), 281–296 (2009)
3. Martin, A., Davies, J., Harris, S.: Towards a Framework for Security in eScience. In: 6th IEEE International Conference on eScience, pp. 230–237. IEEE Computer Society (2010)
4. Rabitti, F., Bertino, E., Kim, W., Woelk, D.: A Model of Authorization for Next-Generation Database Systems. ACM Transactions on Database Systems 16(1), 88–131 (1991)
