CNL4DSA∗ a Controlled Natural Language for Data Sharing Agreements Ilaria Matteucci
Marinella Petrocchi
Marco Luca Sbodio
IIT-CNR, Italy
IIT-CNR, Italy
Hewlett-Packard
[email protected]
[email protected] [email protected]
ABSTRACT A Data Sharing Agreement (DSA) is an agreement among contracting parties regulating how they share data. A DSA represents a flexible mean to assure privacy of data exchanged on the Web. As an example, a set of intelligent user agents may interact with each other, and by means of DSA, may negotiate privacy requirements on behalf of human users. However, a key factor for the adoption of privacy’s technologies is not only their reliability, but also their usability. Here, we propose CNL4DSA, a Controlled Natural Language for DSA aiming at lowering the barrier to adoption of DSA, and, at the same time, ensuring mapping to formal languages that enable the automatic verification of agreements.
Keywords Privacy Policy, Data Sharing Agreements, Controlled Natural Language, DSA authoring, DSA analysis.
1.
INTRODUCTION
Data sharing is what makes the Web unique. Since its early days, the Web has conveyed an ever growing explosion of data into our everyday life. Recent trends, such as blogging, social networking, on-line chat, and service-based computing have increased the amount of personal data that users share or put on the Web. Additionally, the Web is an open, heterogeneous and constantly changing environment. Ensuring privacy on the Web does not only require standards and technologies, but also the ability to combine them into a simple and extensible framework that supports both the users needs, and the constantly evolving Web infrastructure. In this scenario, Data Sharing Agreements (DSA) become an increasingly important research topic. A DSA is a formal legal agreement among two or more parties regulating ∗This work was partly supported by the EU project FP7214859 Consequence (Context-aware data-centric information sharing).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC’10 Submission to the Technical Track Privacy on the Web Copyright 2010 ACM 978-1-60558-638-0/10/03 ...$10.00.
how they share data, and usually consists of a set of authorizations and obligations clauses. The purpose of a DSA is exactly that of defining what parties are allowed or required to do with respect to data covered by the agreement. Note that data sharing agreements are usually subject to a lifecycle consisting (at least) of the following phases: definition, enforcement, disposal1 . During the definition phase, the parties negotiate the respective authorizations and obligations on data covered by the agreement. During the enforcement phase, the DSA clauses are enacted by an appropriate infrastructure ensuring that data exchange among parties comply with the DSA clauses. Finally, a DSA may be disposed of when its validity expires, or when one of the contracting parties decides to recede from it. We observe that the definition phase is iterative: authoring of the DSA is followed by analysis of its content in order to identify possible conflicts or incompatibilities among authorizations/obligations clauses. This process is iterated until all conflicts are solved, and parties have reached agreement on the content of the DSA. DSA promise to be a flexible mechanism to ensure privacy on the Web: an intelligent user agent may in fact autonomously negotiate privacy requirements on behalf of a human user, based, for example, on a set of user preferences, or on a dynamic user interaction. Noticeably, such an intelligent user agent may interact both with other user agents (when negotiating user-to-user privacy requirements), and with Web site agents (when negotiating privacy requirements of the user with respect to the service provided by the Web site). Another interesting aspect of DSA is their multilateralism. Whereas a privacy policy is usually understood as a legal document unilaterally stating how a service provider retains, processes, distributes, and deletes service consumer’s data, a DSA is intrinsically a multilateral agreement in which all parties actively participate. From a user perspective, a key factor for the adoption of privacy technologies, is not only their reliability, but also their simplicity and usability. Currently, there are various policy specification languages (for example the W3C Recommendation P3P [7], Rei [13], and EPAL [21]) that address different aspects, but generally fail to meet the usability requirement, because they often rely on a formal and unintuitive syntax. In order to increase the readability of a policy language, some researchers ([3, 15, 16], and more recently [6]) have proposed the adoption of controlled natural languages (CNL). Following this approach, we propose here a Controlled Natural Language for Data Sharing Agreements 1 The description of the complete lifecycle is out of scope for this paper.
(CNL4DSA), which aims at ensuring (i) simplicity for endusers, (ii) extensibility for coping with the evolving Web environment, and (iii) safe translation to formal specifications allowing for automated verification and enforcement of the authorizations and obligations clauses of a DSA. In this paper we focus mainly on the definition of CNL4DSA, and on its translation to a process algebra based formalism (POLPA) [1, 19], suitable for the analysis of the DSA clauses. The paper is structured as follows: Section 2 presents syntax and semantics of CNL4DSA. In Section 3 we introduce few illustrative examples on the use of the language, while Section 4 focuses on the feasibility of translating CNL4DSA to a high level formal language amenable for verification. Upon discussing related work in the area (Section 5), we conclude with final remarks and future work.
2.
CNL4DSA: SYNTAX AND SEMANTICS
Some examples of Data Sharing Agreements can be found in [9, 10, 11, 20]. In general, a DSA consists of various parts, among which a Title, a validity Period, the list of Data covered by the agreement, the list of involved Parties, their respective Signatures, and (possibly empty) Authorizations and Obligations sections. In this paper we focus on the Authorizations and Obligations sections, illustrating the use of CNL4DSA to define their content in a human understandable way, and, at the same time, to derive a formal specification suitable for automated verification. When the Authorizations and Obligations sections are empty, we assume that the DSA does not impose any obligation, and that it has only the following implicit authorization: “all entites (users, groups, etc.) belonging to Parties are not authorized to make any action on Data during Period ”. Authorizations and obligations refer to data and involve parties specified in the Data and Parties sections of the DSA respectively. Although the definition of the Data and Parties sections is out of scope for this paper, we remark that data items and parties are referred to by using unique identifiers (possibly a URI), and that data types, and related attributes, like, e.g., author, size, creation date, .. are defined in formal vocabularies (ontologies). In order to specify authorizations and obligations we introduce the notion of fragment denoted as f, f1 , . . ., and ranged over the set F. The fragment is a tuple consisting of four elements, f = hs, a, o, [v]i, where s is the subject, a is the action, o is the object and v is a variable. Square brackets denote that v is an optional element. The fragment f = hs, a, o, [v]i expresses that “the subject s performs the action a on the object o”. The optional value v typically represents the date and/or time, characterizing the occurrence of f . Finally, we observe that the action a is taken from a list of predefined actions (possibly deriving from an ontology), and that each action can be decorated with additional information (e.g., the action “send” is decorated with the additional information “to . . .” specifying the recipient). An example of a fragment using an action is “Bob reads Alice’s profile”, where “Bob” is the subject, “reads” is the action, and “Alice’s profile” is the object2 . Our assumption is that everything that is not explicitly expressed is forbidden. Thus, we consider that authoriza2 In the examples, we avoid as much as possible the use of anonymous unique identifiers or URIs, but, for the sake of readability, we prefer using names or English expressions.
tions and obligations are always given in a positive form, e.g., “Alice can parse the content of personal e-mails”. More complex expressions are generated by combining fragments. We refer to such expressions as composite fragments, and we denote them as F (ranged over the set CF). We distinguish two disjoint sets of composite fragments: authorization fragments, denoted by FA and ranged over the set AUT H, and obligation fragments, denoted by FO and ranged over the set OBL. Usually, fragments are evaluated within a specific context. In CNL4DSA, a basic context is a predicate c that characterizes environmental condition, such as time and location. Some examples of simple contexts are “more than 1 year ago” or “inside the facility”. In order to describe complex agreements, contexts need to be composable. Hence, starting from the basic context c, we use the boolean connectors and, or, and not for describing a composite context C (ranged over the set C) which is defined inductively as follows: C := c | C and C | C or C | not c For modelling the behaviour of our DSA, we adopt a labelled transition system [2] which add modalities to the derivations of DSA, by exploiting a modal transition system MTS [18, 17]. In its original version, MTS is a structure (S, A, →2 , →3 ), where S is a certain set of specifications (e.g., process expressions, within the framework of Process Algebras), A is the set of actions which specifications may perform, and →3 , →2 ⊆ S × A × S are the two modal transition relations expressing admissible and necessary requirements to the bea haviour of the specifications. In particular, S −→3 S 0 , with 0 S, S ∈ S and a ∈ A means that it is admissible that the implementation of S performs a and then behaves like S 0 . a Dually, S −→2 S 0 represents a transition in which the implementation of S is required to perform a and then behaves like S 0 . It is assumed that all the required transitions are admissible transitions.
Authorization Fragment. The syntax of a composite authorization fragment is inductively defined as follows: FA := nil | can f | FA ; FA | if C then FA | af ter f then FA | (FA )
The intuition for the composite authorization fragment is the following: • nil can do nothing. • can f is the atomic authorization fragment. Its informal meaning is the subject s can perform the action a on the object o, with optional value v. can f expresses that f is allowed, but not required. • FA ; FA is a list of composite authorization fragments. The list constitutes the authorization section of the considered DSA. Whenever one term of the list performs a f -transition, then that term evolves to the correspondent derivative. • if C then FA expresses the logical implication between a composite context C and a composite authorization fragment: if C holds, then FA is permitted. • af ter f then FA represents the temporal sequence of fragments. Informally, after f has happened, then the composite authorization fragment FA is permitted. Also, we assume that (af ter f then FA1 );(af ter f then FA2 ) behaves as af ter f then (FA1 ; FA2 ). Finally, round brackets indicate precedences among fragments.
Let →3 be the smallest subset of AUT H × F × AUT H, closed under the following rules: f
(can)
f
can f −→3 nil
0 3 FA (if ) C = true FA −→ f 0 if C then FA −→3 FA
f
(; )
F 1A −→3 F 10A f F 1A ; F 2A −→3 F 10A ; F 2A
(af ter)
f
after f then FA −→3 FA
Figure 1: Operational semantics of FA , where the symmetric rule for ; is omitted. Figure 1 shows the operational semantics of FA in terms of a modified modal transition system MTSAuth = (AUT H, F, →3 , C). As usual, rules are expressed in terms of a set of premises, possibly empty (above the line) and a conclusion (below the line). MTSAuth deals with authorized transitions only, and it considers also the set of contexts because the DSA transitions may depend also on the value of such contexts, see rule if in Figure 1. The introduction of C (= a set of predicates) in a labelled transition system is a standard practice [2]. We observe that the if operator implies the binding of variable appearing in the context C.
Obligation Fragment. Similarly, the syntax of a composite obligation fragment is inductively defined as follows: FO := nil | must f | FO ; FO | if C then FO | af ter f then FO | (FO )
The intuition for the composite obligation fragment is the following: • nil expresses no obligation. • must f is the atomic obligation fragment. Its meaning is the subject s must perform action a on the object o, with optional value v. Thus, the f -transition is required. • FO ; FO represents a list of composite obligation fragments. The list constitutes the obligation section of the considered DSA. Whenever one term of the list performs a f -transition, then that term evolves to the correspondent derivative. • if C then FO expresses the logical implication between a context C and a composite obligation fragment. It means that if C holds, then FO is required. • af ter f then FO represents the temporal sequence of fragments. It means that after that f is performed, then FO is required. Round brackets denote precedence. The operational semantics of FO is expressed in terms of the modal transition system MTSObl = (OBL, F, →2 , C). The axioms and rules are similar to the ones presented for FA , apart from changing the transition relation, that becomes →2 . For example, the axiom for the atomic obliga.3 tion fragment is: (must) f must f −→2 nil
3.
SOME EXAMPLES OF PATTERNS
Here, we introduce some illustrative examples of authorizations and obligations, and we show their CNL4DSA representation. In the first example we consider the case in 3 In [18, 17], the axioms defining the behaviour of 2 are also given in terms of −→3 , stating that all the required transitions are also admissible.
which Alice and Bob are members of the same community portal, and they agree on sharing some data: Alice must send the list of her new contacts to Bob, every time that Bob logs into the portal. We can express this using CNL4DSA as: AFTER f1 THEN MUST f2 • f1 = {Bob logs into the portal} is an atomic fragment with subject “Bob” and action a = “logs into the portal”; • f2 = {Alice sends the list of her new contacts to Bob} is an atomic fragment with subject “Alice”, action “send to Bob”, and object “the list of her new contacts”. In the second example we consider the case in which Alice is registering as a user of the on-line travel agent NiceTravel, and she wants to specify how NiceTravel can share her preferences: NiceTravel can share Alice’s preferences after one year of receiving them. We can express this using CNL4DSA as: AFTER f1[v] THEN ( IF c1 THEN CAN f2 ) • f1[v] = {NiceTravel receives preferences from Alice at v} is an atomic fragment with action a = “receives from Alice”, subject s = “NiceTravel”, and object o = “preferences”; the variable v is a marker characterizing the occurrence of f1; • c1 = {after 1 year of v} is a context; • f2 = {NiceTravel send preferences to AN Y } is an atomic fragment with subject s = “NiceTravel”, action a = “send to ANY”, and object o = “preferences”; we observe that AN Y is a predefined identifier representing any subject. Finally, we consider an example involving a so called “responsive forwarding” statement [22, 23]. Here Bob is the customer of the on-line bookstore ReadMore, and he wants to specify obligations concerning his purchases and the way that ReadMore communicates with Bob’s credit card company SecureCard: ReadMore will delete the credit card details received from Bob, after two years of storing them. Also, ReadMore will send to SecureCard each purchase order that it receives from Bob. We can express these statements with a list of FO .
• P1 parAlf a P2 represents concurrent activity requiring synchronization between the policies P1 and P2 . In particular, any action belonging to the set of actions Alf a can only occur when both P1 and P2 permit that action; • Z is the constant process used for identifying processes. . Indeed, we write Z = P to use the identifier Z to refer to the process P .
AFTER f1 THEN ( AFTER f2 THEN ( IF c1 THEN MUST f3 ) ) ; AFTER f4 THEN MUST f5 • f1 = {ReadMore receives the credit card details CCB from Bob} is an atomic fragment with subject s = “ReadMore”, action a = “receives from Bob”, and object o = “credit card details CCB ”; • f2 = {ReadMore stores CCB at v} is an atomic fragment having subject s = “ReadMore”, action a = “stores”, and object o = “CCB ”; the variable v is a marker characterizing the occurrence of f2; • c1 = {after two years of v} is a context; • f3 = {ReadMore deletes CCB } is atomic fragment with subject s = “ReadMore”, action a = “deletes”, and object o = “CCB ”; • f4 = {ReadMore receives a purchase order P OB from Bob} is an atomic fragment with subject s = “ReadMore”, action a = “receives from Bob”, and object o = “a purchase order P OB ”; • f5 = {ReadMore send P OB to SecureCard} is an atomic fragment with subject s = “ReadMore”, action “send to SecureCard”, and object “P OB ”.
As usual for (process) description languages, derived operators may be defined. For instance, by using the constant definition, the sequence and the parallel operators, the replicate operator r(P ) can be derived. Informally, r(P ) is the parallel composition of P an unbounded number of times. With Z = P par∅ Z, r(P ) = Z. In POLPA, a process P expresses the admissible behaviour of a system. Obligations are explicitly identified by using special start and end tags: Ob = startob .P.endob . These tags are only syntactic and they serve for distinguish obligation and authorization processes. We sketch a possible mapping from CNL4DSA to POLPA for the Authorization Fragment. Let M be the mapping function, then M is inductively defined as follows: • • • •
We introduced the identifiers CCB and P OB to avoid repeating the corresponding objects descriptions.
4.
FROM AUTHORING TO ANALYSIS
A DSA has a complex lifecycle, which comprehends at least the authoring, enforcement, and disposal phases. Passing from authoring to enforcement means the successful overcoming of the verification phase, in which the content of the DSA is analyzed to identify, and possibly fix, conflicts among authorizations/obligations clauses. CNL4DSA has been proposed with an eye for formal verification. Its semi-formal structure makes it suitable for a simple, intuitive and easy mapping to high-level formal policy languages. Here, we consider the POLPA process algebra [19, 1], built on top of CSP [12] and enriched with a predicate construct expressing that a transition happens only if a predicate is true. This language proved to be useful for expressing usage control policies, in particular for web applications [1] or for GRID systems [19]. We recall syntax and semantics of the language. The reader is referred to [19] for more details. The syntax of POLPA is as follows: P ::= stop | all | α(~ x).P | p(~ x).P | x := e.P | P1 orP2 P1 parAlf a P2 | Z with parameterized actions α(~ x) ∈ Act and possibly parameterized predicates p(~ x). The informal semantics of the language is the following: • stop is the process that allows nothing; • all is the process that allows all; • α(~x).P is the process that performs action α(~ x) and then behaves as P; • p(~x).P is the process that behaves as P whenever the predicate p(~x) is true; otherwise, it gets stuck; • x := e.P assigns to variable x the value of expression e and then behaves as P . In P , occurrences of x are replaced by e; • P1 or P2 is the non-deterministic choice between the two processes;
M (canf r) 7→ α(~ x).stop; M (F rA ; F rA ) 7→ M (F rA )par∅ M (F rA ) M ( if C then F rA ) 7→ p(~ x).M (F rA ) M ( after fr then F rA ) 7→ α(~ x).M (F rA )
The replicate operator may simulate the persistence of the authorizations during the DSA validity period. If an authorization says: “a webmail service can archive my read e-mail after 11:00 PM”, then the context “after 11:00 PM” must be evaluated every day during the DSA validity period. The mapping function for the Obligation Fragment is essentially the same as the one for the Authorization Fragment, with the exception that the mapping of the composite F rOb is enclosed into the startob , endob tags. Each atomic context c corresponds to a predicate p(~ x), expressing attributes either of location, or of time, or of roles, etc... Contexts, as well as predicates, are combined by logical connectives with exactly the same logical meaning. The atomic fragment f r corresponds to the action α(~ x), with parameters ~ x representing the subject, the object, and other optional parameters. Hence, the fragments’ set given by the ontology corresponds to the actions’ set Act in POLPA.
5.
RELATED WORK
Here we compare our proposal with existing frameworks. [4] investigates platform-independent policy frameworks to specify, analyze, and deploy security and networking policies. In particular the authors describe a scenario-based demo of a portal prototype for usable and effective policy authoring through either natural language or structured lists that manage policies from the specification to the possible enforcement. However, they do not present any formal language for authoring. Here, we describe a language by giving its syntax and semantics. The main goal is an easy mapping into a high level language for analysing and enforcing any agreements we are able to write. [22, 23] specifically focus on DSA. They model the agreement as a set of obligation constraints. Obligations are expressed as distributed temporal logic predicates (DTL), a generalization of linear temporal logic including both pasttime and future-time temporal operators. With respect to
the logical language in [22, 23], our CNL4DSA aims at a more friendly DSA authoring phase, by letting even the non expert users to easily type their DSA in a structured but yet understandable way. Nevertheless, we also take into account the problem of (automated) DSA verification: CNL4DSA translation to a formal language such as POLPA let us leverage existing analysis frameworks like, e.g., Maude [5]. [14] advocates the idea of using controlled English as a Semantic Web language allowing the creation of Semantic Web content in a user-friendly and yet logically precise way. The resulting framework is quite flexible and extensible. CNL4DSA has a more limited scope: rather than providing a comprehensive controlled natural language, we focus on a reduced set of constructs that allows users to define authorizations and obligations statements. On the other side, we observe that due to the specific domain requirements, CNL4DSA expressiveness goes beyond that of [14] in at least the definition of temporal constructs such as af ter f thenFA/Ob . Also [8] presents a logic-based policy analysis framework which (i) is expressive, (ii) considers obligations and authorizations, (iii) includes a dynamic system model, and (iv) gives useful diagnostic information. Our language considers authorizations and obligations, although it is not directly thought for their analysis. Instead, the main aim is to help the user to write DSA in a structured way. CNL4DSA plays the role of an authoring user oriented language.
6.
CONCLUSIONS
We focused on the definition of CNL4DSA, and on its translation to a process algebra based formalism (POLPA). Compared to other approaches, our work aims at defining a language that is (i) simple and human readable, (ii) sufficiently expressive to specify common authorizations and obligations in an agreement, and (iii) translatable to one (or more) formal language(s) enabling the DSA analysis. This approach is intended to facilitate the development of user-friendly editing tools that can exploit capabilities of background analysis engines. The combination of such tools would allow users to dynamically define DSA ensuring appropriate privacy conditions, and detect/solve conflicts before the actual enforcement of the resulting policies. We are currently developing an executable specification of POLPA with Maude [5], that offer built-in model checking capabilities for the analysis of DSA clauses. We intend to exploit Maude features to develop a DSA analysis engine that can detect possible conflicts within a DSA. We plan to interface this back-end analysis engine with a user-friendly DSA authoring tool, which will enable end-users to easily write CNL4DSA fragments, and to see possible conflicts detected by the analysis tool. Finally, we also plan to extend the expressiveness of CNL4DSA by adding, e.g., a while construct to deal with context conditions having a duration.
7.
ACKNOWLEDGMENTS
We would like to thank the anonymous reviewers for their helpful comments. We also thank Marco Casassa Mont (HP Labs, Bristol) and Fabio Martinelli (IIT-CNR) for their precious comments and hints.
8.
REFERENCES
[1] B. Aziz et al. Controlling usage in business process workflows through fine-grained security policies. In TrustBus, LNCS 5185. Springer, 2008.
[2] J. A. Bergstra, A. Ponse, and S. A. Smolka, editors. Handbook of Process Algebra. Elsevier, 2001. [3] C. Brodie et al. An empirical study of natural language parsing of privacy policy rules using the sparcle policy workbench. In SOUPS ’06. ACM, 2006. [4] C. Brodie et al. The coalition policy management portal for policy authoring, verification, and deployment. In POLICY, pages 247–249, 2008. [5] M. Clavel et al., editors. All About Maude - A High-Performance Logical Framework, How to Specify, Program and Verify Systems in Rewriting Logic, volume 4350 of LNCS. Springer, 2007. [6] J. D. Coi, P. K¨ arger, D. Olmedilla, and S. Zerr. Using Natural Language Policies for Privacy Control in Social Platforms. In SPOT’09, 2009. [7] L. Cranor, M. Langheinrich, M. Marchiori, M. Presler-Marshall, and J. Reagle. The Platform for Privacy Preferences 1.0 (P3P1.0) Specification. W3C Recommendation http://www.w3.org/TR/P3P/, April 2002. [8] R. Craven et al. Expressive policy analysis with enhanced system dynamicity. In ASIACCS, 2009. [9] Us datatrust terms of use. http://www.usdatatrust.com/disclaimer.htm, 2008. [10] Economic and social data service end user license. http://www.esds.ac.uk/aandp/create/eul.asp, 2006. [11] Data sharing protocol for crime reduction. www.crimereduction.homeoffice.gov.uk/infosharing.doc, 2006. [12] C. A. R. Hoare. Communicating sequential processes. Commun. ACM, 21(8):666–677, 1978. [13] L. Kagal. Rei : A policy language for the me-centric project. Technical report, HP Labs, September 2002. [14] K. Kaljurand. Attempto Controlled English as a Semantic Web Language. PhD thesis, in Mathematics and Computer Science, Tartu Univ., 2007. [15] C.-M. Karat, C. Brodie, and J. Karat. Usable privacy and security for personal information management. Commun. ACM, 49(1):56–57, 2006. [16] J. Karat et al. Designing natural language and structured entry methods for privacy policy authoring. In INTERACT, pages 671–684, 2005. [17] K. G. Larsen. Modal specifications. In Automatic Verification Methods for Finite State Systems, pages 232–246, 1989. [18] K. G. Larsen and B. Thomsen. A modal process logic. In LICS, pages 203–210, 1988. [19] F. Martinelli and P. Mori. A model for usage control in grid systems. In GRID-STP07, 2007. [20] MedicAid and MediCare Services. Inter-agency data-sharing agreement. www.cms.hhs.gov/states/letters/smd10228.asp, 2004. [21] M. Schunter and C. Powers. The enterprise privacy authorization language (epal 1.1). http://www.zurich.ibm.com/security/ enterprise-privacy/epal/, June 2003. [22] V. Swarup et al. A data sharing agreement framework. In ICISS, pages 22–36, 2006. [23] V. Swarup et al. Specifying data sharing agreements. In POLICY, pages 157–162, 2006.