Secure Publishing of XML Documents

Barbara Carminati
Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano
[email protected]
Abstract. Secure publication of XML data over the Internet is becoming a crucial need, as XML is rapidly becoming a standard for document representation and exchange over the Web. Publishing services must have a mechanism ensuring that a user receives all and only those portions of information he/she is entitled to access (for instance, those for which the user has paid a subscription fee) and that these contents are not eavesdropped during their transmission from the publishing service to the user. In this paper we present the work carried out in the area of secure XML publishing as part of the Ph.D. activity. More precisely, we describe a preliminary architecture for an XML Publisher, emphasizing the open issues to be covered.
1 Introduction
Due to the widespread use of XML [17] as a standard for document representation and exchange over the Web, the development of a framework supporting secure publishing of XML documents is becoming a crucial need [1,2]. Publishing services for XML documents (called XML Publishers, hereafter) must deal with a scenario characterized by very large and heterogeneous communities of subjects, which selectively access a possibly large source of XML documents containing information of different sensitivity degrees. Thus, in order to support a selective dissemination of data it is necessary, as a first step, to define an access control model specifically tailored for XML documents. Then, according to the stated access control policies, the XML Publisher must ensure a secure dissemination of XML documents to subjects. Secure dissemination in a Web environment entails addressing three main issues: authenticity, integrity, and confidentiality. Ensuring document authenticity means that the subject receiving a document is assured that the document contents come from the source they claim to be from. Ensuring document integrity means ensuring that the contents of the document are not altered during its transmission from the source to the intended recipient. Ensuring document confidentiality means that the document contents can only be disclosed to subjects authorized according to the specified access control policies. In this paper we present the work carried out in the area of secure XML publishing as part of the Ph.D. activity.

The remainder of this paper is organized as follows. In Section 2 we summarize the main requirements for an access control model for XML documents, whereas in Section 3 we briefly explain a preliminary architecture for an XML Publisher, emphasizing the open issues to be covered. Then, in Section 4 we give an overview of the preliminary results obtained in our Ph.D. work. Finally, in Section 5 we point out the main issues we plan to investigate in the future.

A.B. Chaudhri et al. (Eds.): EDBT 2002 Workshops, LNCS 2490, pp. 587–596, 2002. © Springer-Verlag Berlin Heidelberg 2002
2 Access Control Models for XML Documents
The first step in defining a secure XML Publisher is the definition of an access control model specifically tailored to the protection of XML documents. The development of such a model poses several new requirements, which imply revisiting the models developed in the context of traditional DBMS environments. These requirements are briefly summarized in what follows.

Selective and differentiated protection of document contents. XML documents may have a nested or hierarchical structure, being defined in terms of components that can themselves be organized into subcomponents. Moreover, XML documents can be inter-linked through the XLink language [16]. Additionally, different components of the same XML document very often have different protection requirements. To support a differentiated and selective protection of Web documents, access control policies must be flexible enough to support a wide spectrum of protection granularity levels, identified on the basis of both the structure and the contents of documents. Examples of protection granularities are a single document, a set of documents, an element of a document, and an attribute of a document. For example, consider an XML document describing a purchase order, which provides descriptive information about the order and about the item(s) associated with it, with links to additional documents describing related clients. With fine-grained protection objects, different policies can be formulated for the protection of the purchase order contents, stating for instance that the names of items can be made available to everyone, whereas information regarding the carrier should be released only to selected subjects, or that the link(s) to client documentation should be kept hidden from most subjects and made accessible only to a restricted number of authorized subjects.

Protecting documents at the intensional level. XML documents may have an associated DTD or XML Schema [6] specifying their structure. The access control model must provide the ability to exploit this intensional description in the specification of the access control policies. For example, it must be possible to specify access control policies at the DTD level, which apply to all valid documents conforming to the DTD. In this way, the definition of access control policies for XML documents exploits a notion of schema or type, in analogy with conventional policies for relational and object-oriented databases.
Specifying subjects by means of credentials. The population accessing XML document sources is generally composed of heterogeneous subjects, characterized by different skills and needs. Moreover, the population is dynamic, in that the number and type of subjects are not known a priori and can change very frequently over time. In this context, conventional identity-based access control schemes are not sufficient, and access control policies based on the notion of credential [15] are required. Subject credentials assert certified properties of a subject, either personal characteristics or characteristics and properties deriving from relationships the subject has with other subjects (e.g., qualification within an organization). In this way, the specification of access control policies becomes more direct and intuitive, since policies are defined in general terms, close to the rules and conventions holding for the documents to be protected.

Exception management and propagation. Supporting fine-grained policies could lead to the specification of a possibly high number of access control policies, regardless of the level (instance or schema) at which policies are specified. In fact, one should specify a policy for each different protection granule of a target protection object. Additionally, the presence of several protection granularity levels calls for a flexible and concise way of specifying exceptions, that is, situations where a protection object has security requirements different from those of its siblings in the hierarchical structure. To cope with these two requirements, access control policies for XML documents need to rely on the advanced features of positive and negative policies and of propagation. Positive policies specify permissions, while negative ones specify denials. Propagation means that policies (either positive or negative) specified for a protection object at a given granularity level “apply by default” to all protection objects related to it according to a certain relationship in the hierarchical structure. By combining positive and negative policies with propagation, the number of policies to be specified for data protection and exception management is considerably reduced. By exploiting propagation, document protection can be enforced by specifying a variable number of policies on the DTD, depending on the protection requirements to be enforced. For example, documents with homogeneous protection requirements can be protected by specifying only one policy at the document level, with a cascading propagation to all protection granules within the document. A document having highly heterogeneous protection requirements can be secured by defining a number of policies, one for each protection object with different protection needs, without propagation or with a limited propagation within the document itself.
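The interplay of credentials, positive and negative policies, and propagation can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the model defined in this work: the `Policy` class, the slash-separated target paths, and the conflict rule (the deepest matching target wins, and a denial wins ties) are hypothetical choices made for the example. The purchase order document mirrors the example given earlier in this section.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Policy:
    required: Dict[str, str]  # credential properties the subject must hold
    target: str               # slash-separated tag path, e.g. "order/carrier"
    sign: str                 # "+" grants access, "-" denies it
    cascade: bool = True      # propagate to all descendants of the target


def decide(path: List[str], policies: List[Policy], creds: Dict[str, str]) -> bool:
    """Assumed conflict rule: deepest matching target wins, denial wins ties."""
    best_depth, allowed = -1, False  # no applicable policy means access denied
    for p in policies:
        if any(creds.get(k) != v for k, v in p.required.items()):
            continue  # the subject's credentials do not satisfy this policy
        tgt = p.target.split("/")
        matches = tgt == path or (p.cascade and path[:len(tgt)] == tgt)
        if not matches:
            continue
        depth = len(tgt)
        if depth > best_depth or (depth == best_depth and p.sign == "-"):
            best_depth, allowed = depth, p.sign == "+"
    return allowed


def view(elem, policies, creds, path=None) -> Optional[ET.Element]:
    """Return a copy of elem holding only the authorized portions, or None."""
    path = (path or []) + [elem.tag]
    if not decide(path, policies, creds):
        return None  # simplification: a denied element hides its whole subtree
    copy = ET.Element(elem.tag, dict(elem.attrib))
    copy.text = elem.text
    for child in elem:
        sub = view(child, policies, creds, path)
        if sub is not None:
            copy.append(sub)
    return copy


order = ET.fromstring(
    '<order><item><name>Widget</name></item>'
    '<carrier>ACME</carrier><client href="c1.xml"/></order>'
)
policies = [
    Policy({}, "order", "+"),                            # everyone, cascading
    Policy({"role": "customer"}, "order/carrier", "-"),  # exception
    Policy({"role": "customer"}, "order/client", "-"),   # exception
]
customer_view = view(order, policies, {"role": "customer"})
staff_view = view(order, policies, {"role": "logistics"})
```

Here a single cascading positive policy protects the whole document, and two negative policies express the exceptions for subjects holding the (hypothetical) `role=customer` credential, so three policies suffice instead of one per element.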
3 Secure Dissemination of XML Documents
Once the access control policies for a given source have been specified, XML documents belonging to the source can be released to subjects on the basis of the specified policies. An XML Publisher must support at least two different distribution modes: pull and push. Under the pull mode, subjects explicitly request the XML documents (or portions of documents) when needed. Upon a document request, the XML Publisher first verifies whether the requesting subject is entitled to access the requested document according to the specified access control policies, and then returns the subject a view of the requested document that contains all and only those portions for which he/she has a corresponding authorization. When no authorizations are found, access is denied. Besides the traditional pull mode, a push dissemination mode can also be successfully adopted in the Web context. It is suitable for XML documents that must be released to a large community of subjects and that show a regular behaviour with respect to their release (e.g., they must be distributed periodically or when some pre-defined events happen). Under the push mode, the XML Publisher periodically broadcasts (portions of) its documents to authorized subjects, without the need of an explicit access request by the subjects. There is thus the need to integrate into the XML Publisher mechanisms enabling both pull and push distribution of XML documents.

The Ph.D. activity has focused mainly on the support for the push distribution mode. The main problem in supporting information push is that, since different subjects may have the privilege to see different, selected portions of the same document, it may entail generating different physical views of the same document and sending them to the proper subjects. Moreover, due to the possibly high number of subjects accessing an XML Publisher service and the wide range of protection granularities, the number of such views may become rather large, so that such an approach cannot be applied in practice. For these reasons, in the Ph.D. activity we have investigated the use of encryption techniques to efficiently support information push. The idea is to encrypt different portions of the same document with different encryption keys, on the basis of the specified access control policies. The same encrypted copy of the document is then broadcast to all subjects, whereas each subject only receives the key(s) for the portion(s) he/she is enabled to access.

In such an approach the main problems to be faced regard the generation and management of the encryption keys. Indeed, a key requirement is to minimize the number of keys that need to be generated in order to encrypt the various document portions. Another important aspect to be covered is the definition of an efficient key management scheme able to minimize the number of keys to be sent to the subjects. Moreover, since the Web environment is highly dynamic, the access control policies specified over an XML source could change very frequently, both in terms of the subject and of the object specification, and policy updates could require modifying the generated document encryption. There is thus the need for strategies to efficiently update the document encryption upon a policy modification.

In accordance with the above-mentioned requirements, we have come up with an architecture for an XML Publisher supporting both the information pull and push approaches, which is depicted in Figure 1. In the proposed architecture, subjects are required to register with the publishing service during a mandatory subscription phase for the definition of subject credentials. Based on the subject credentials, the XML Publisher then releases to the subject the key(s) necessary to decrypt all and only those portions which he/she is entitled to access. Additionally, the architecture contains two modules enabling information push and information pull. By means of the information pull module, when a subject requests an XML document through an XML query [7,11,21], the XML Publisher first verifies whether the subject is entitled to access the requested document, and then sends the subject the suitable view. By contrast, by means of the information push module the XML Publisher broadcasts the same encrypted copy of the document to all the subjects. Each subject can then decrypt only those portions that are encrypted with one of the keys received in the subscription phase. The architecture is completed by an Encrypted Document Base (EDB), which stores an encrypted copy of those XML documents that should be distributed under the push mode.
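As a rough illustration of the push approach described above, the following sketch encrypts document portions so that a single broadcast copy serves every subject. It rests on several assumptions that are not taken from this paper: portions are flat strings keyed by hypothetical ids, one key is generated per distinct set of applicable policies (one possible way of keeping the number of keys small), and `toy_cipher` is a deliberately simplistic stand-in for a real symmetric cipher that must not be used in practice.

```python
import hashlib
import secrets
from typing import Dict, FrozenSet, Set, Tuple


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream. Toy stand-in for a real cipher, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


Config = FrozenSet[str]  # the set of policy ids that apply to a portion


def encrypt_document(portions: Dict[str, Tuple[str, Set[str]]]):
    """Encrypt each portion under the key of its policy configuration."""
    keys: Dict[Config, bytes] = {}
    encrypted: Dict[str, Tuple[Config, bytes]] = {}
    for pid, (text, policy_ids) in portions.items():
        config = frozenset(policy_ids)
        if config not in keys:  # assumption: one key per distinct configuration
            keys[config] = secrets.token_bytes(32)
        encrypted[pid] = (config, toy_cipher(keys[config], pid.encode(), text.encode()))
    return encrypted, keys


def keys_for_subject(keys, subject_policies: Set[str]):
    """Subscription phase: release keys of configurations the subject satisfies."""
    return {cfg: k for cfg, k in keys.items() if cfg & subject_policies}


def decrypt_view(encrypted, subject_keys) -> Dict[str, str]:
    """Subject side: decrypt only the portions for which a key was received."""
    return {
        pid: toy_cipher(subject_keys[cfg], pid.encode(), ct).decode()
        for pid, (cfg, ct) in encrypted.items()
        if cfg in subject_keys
    }


portions = {
    "item_name": ("Widget", {"Pi", "Pj"}),
    "price": ("9.99", {"Pi", "Pj"}),
    "carrier": ("ACME", {"Pj"}),
}
broadcast, all_keys = encrypt_document(portions)  # the same copy goes to everyone
view_h = decrypt_view(broadcast, keys_for_subject(all_keys, {"Pi"}))
view_k = decrypt_view(broadcast, keys_for_subject(all_keys, {"Pj"}))
```

In this example three portions need only two keys; user h, to whom only policy Pi applies, recovers the two portions covered by Pi, while user k, covered by Pj, recovers all three from the same broadcast.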
[Figure: the XML Publisher with its Policy base, EDB, Information Pull module, and Information Push module. In the Subscription phase, users h and k send their subscription credentials and receive the keys of the policies (Pi, Pj) that apply to them; document portions are labeled Pi, Pj, and PiPj.]

Fig. 1. XML publisher architecture
4 Preliminary Results
In this section we give a brief overview of the main results obtained by the Ph.D. activities of this year.

Administrative Operations. Supporting information push makes access control policy management more difficult. Indeed, each time an access control policy is inserted, revoked, or modified, the EDB should be modified accordingly. A first preliminary effort of our research is thus to provide the Security Administrator with efficient tools for managing such operations. A first step towards the development of an efficient XML Publisher is therefore defining an efficient strategy for incrementally maintaining the EDB upon the execution of an administrative operation changing the set of access control policies specified for the source. In particular, in [5] we have proposed a set of algorithms for incrementally maintaining the EDB by changing all and only those portions of the document that are actually affected by the insertion or revocation of a policy, without the need of re-encrypting the document from scratch. Besides presenting the algorithms and the related data structures, in [5] we have provided a complexity study of the proposed algorithms.

Publishing Service for Digital Libraries of XML documents. A relevant issue investigated during the Ph.D. activity is the definition of a service for the secure dissemination of data belonging to Digital Libraries of XML documents. The digital library must have a mechanism ensuring that, only during the subscribed period, a subject receives all and only those portions of the library he/she is entitled to access (for instance, those for which the user has paid a subscription fee) and that these contents are not eavesdropped during their transmission from the library to the subject. We have thus proposed a possible approach to a Publishing Service for Digital Libraries in [3]. This approach is based on the concept of package. A package is defined as a collection of components, where each component consists of a portion of information (an XML document) together with a subscription period. The subject has complete freedom in defining the information he/she is interested in receiving. Indeed, a subject can subscribe to different packages, and each package may contain portions with different subscription periods, according to the subject's needs.
Additionally, the subject can select, for each package, the mode according to which he/she wants to receive its content. In this respect too, the proposed publishing service supports a variety of package distribution modes, and the subject can select the most suitable one according to his/her needs. More precisely, three different distribution modes are supported: Pull, Push, and Notify. If a subject selects the pull mode for a given package, he/she has to explicitly request the package portions when needed. By contrast, under the push mode the subject does not have to explicitly request the package. Rather, the publishing service sends the modified portions, periodically or when a modification occurs, to all the users subscribed to a package to which the portions belong, without the need of an explicit request. Finally, if the notify mode is selected, the subject only receives a notification that a modification has occurred in one of his/her packages, and he/she has to explicitly request the portion(s) he/she is interested in receiving.

The flexibility provided to the subjects requires the development of suitable encryption schemes for the secure delivery of package contents to subscribers. The schemes must ensure that a subject can access the portions of information to which he/she has subscribed for the duration of his/her subscription, and that such information is no longer accessible by the subject when the subscription expires. To fulfill these requirements, in [3] we have proposed different encryption schemes for the different distribution modes we support. More precisely, the scheme proposed for pull and notify packages relies on symmetric keys [13]. Upon the completion of the subscription phase, the subject receives from the service a unique symmetric key, which is then used by the publishing service to encrypt the portions of the package returned to the subject as the answer to an access request. In this way, only the entitled user is able to decrypt the answer, because he/she is the only one sharing the encryption key with the publishing service. For packages to be delivered under the push mode, we have proposed a different approach aiming at minimizing the number of keys that need to be generated while guaranteeing the security of information delivery. In this approach, the service generates an encryption key for each component (called component key). When a subject subscribes to a package in push mode, he/she receives all the keys associated with the components belonging to the package. Then, when the publishing service needs to send a portion, it encrypts the portion only once with a session key and sends it to all the subjects. Together with the encrypted portion, it also sends different encryptions of the session key, one for each component key corresponding to a non-expired component. This encryption scheme ensures that the portion can be accessed only by subjects whose subscription period has not expired.

Owner-Publisher Architecture. Another issue we have investigated is the definition of a novel framework for XML Publishers, which is based on a distinction between the role of the Owner and that of the Publisher of the information. In such an approach the Owner is the producer of the information and is responsible for specifying the access control policies regulating the access to or the distribution of the information it produces.
The Publisher, in contrast, is responsible for managing (a portion of) the Owner information and for answering subject queries. Making a distinction between the Owner and the Publisher has several benefits. First, like any decentralized architecture, such a solution has the advantage of being scalable. For instance, within an organization there may exist one Publisher service for the internal exchange of data and another one for external dissemination. As a consequence, this approach also reduces the risk that the Owner becomes a bottleneck of the entire system, and it relieves the Owner of expensive activities such as key management and query optimisation. In such an architecture too, an important issue is the secure dissemination of XML documents. Together with the traditional requirements (integrity, authenticity, and confidentiality), a further important requirement is that a subject receiving an answer to an access request must be able to verify the completeness of the answer, that is, it must be able to verify that it has received all the document(s) (or portion(s) of document(s)) that it is entitled to access according to the stated access control policies. In [4] we have developed an approach addressing the problem of document contents authenticity and completeness. The key point of the approach is that it does not require the Publisher to be trusted with respect to authenticity and completeness, while still ensuring that a subject is able to verify the authenticity and the completeness of the answer returned by a Publisher. This capability is obtained using digital signature and hashing techniques.

Figure 2 illustrates the approach. It requires that a subject first subscribes to the Owner. As a result of the subscription process, the Owner returns the subject a policy configuration, which is a certificate containing information about the access control policies that apply to the subject. The subject policy configuration is signed with the private key of the Owner to prevent the subject from altering its contents. Additionally, the Owner sends the Publisher the documents it is entitled to manage, along with information on which subjects can access which portions of the set of documents. The Owner also sends the Publisher a summary signature, computed using a technique based on Merkle hash trees [10], for each document the Publisher is entitled to manage. This hash value is signed with the private key of the Owner and is used by a subject to verify the authenticity and the completeness of the answer returned by the Publisher. Such additional information is added to the XML document and coded using XML, resulting in what we call a security enhanced XML document. This approach has the benefit of supporting a uniform management of XML data and related security information. Details on how the summary signature can be used to prove the authenticity and completeness of query results can be found in [4].
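To give an intuition of how a Merkle-style hash lets a subject check an answer coming from an untrusted Publisher, the sketch below computes a hash bottom-up over an XML tree and lets the Publisher replace withheld subtrees with their hash values. It is a simplified illustration and not the technique detailed in [4]: the hash layout, the `pruned` placeholder element, and the function names are assumptions of this example, and the Owner's digital signature over the root hash is omitted (the root hash is simply assumed to reach the subject in an authenticated way). Showing that the withheld subtrees are exactly those the subject is not entitled to see would additionally require the policy information, which is outside the scope of the sketch.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Set


def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts, so that part boundaries are unambiguous."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big") + part)
    return digest.digest()


def merkle_hash(elem: ET.Element) -> bytes:
    """Hash of an element: its tag, attributes, text, and its children's hashes."""
    if elem.tag == "pruned":  # hypothetical placeholder for a withheld subtree
        return bytes.fromhex(elem.attrib["hash"])
    attrs = "".join(f"{k}={v};" for k, v in sorted(elem.attrib.items()))
    child_hashes = [merkle_hash(child) for child in elem]
    return h(elem.tag.encode(), attrs.encode(), (elem.text or "").encode(), *child_hashes)


def prune(elem: ET.Element, hidden_tags: Set[str]) -> ET.Element:
    """Publisher side: replace each hidden subtree with a placeholder carrying its hash."""
    if elem.tag in hidden_tags:
        return ET.Element("pruned", {"hash": merkle_hash(elem).hex()})
    copy = ET.Element(elem.tag, dict(elem.attrib))
    copy.text = elem.text
    for child in elem:
        copy.append(prune(child, hidden_tags))
    return copy


doc = ET.fromstring(
    "<order><item><name>Widget</name></item><carrier>ACME</carrier></order>"
)
root_hash = merkle_hash(doc)      # stands in for the Owner's signed summary value
answer = prune(doc, {"carrier"})  # Publisher's reply to a restricted subject
answer_is_authentic = merkle_hash(answer) == root_hash
```

The subject recomputes the hash of the reply and compares it with the value vouched for by the Owner. Altering a visible node, or silently dropping a subtree instead of replacing it with its hash, changes the recomputed root and is therefore detected.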
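[Figure: interactions among the Owner, the Publisher, and the Subject (client). The Owner sends the security enhanced XML document to the Publisher; the Subject sends a subscription request to the Owner and receives its subject policy configuration; the Subject sends a query and its subject policy configuration to the Publisher and receives a reply document; a correctness request from the Subject to the Owner returns the correct result and a verification package. Legend: mandatory interactions, optional interactions.]

Fig. 2. System architecture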
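The push-mode scheme for packages described earlier in this section (component keys protecting a per-portion session key) can be sketched as follows. This is a minimal illustration under assumptions that do not come from [3]: the `Component` class, the message layout, and the per-message nonce are hypothetical, and `toy_cipher` is a simplistic stand-in for a real symmetric cipher that must not be used in practice.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream. Toy stand-in for a real cipher, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


@dataclass
class Component:
    comp_id: str
    portion: str  # id of the portion of information
    start: date   # subscription period, inclusive
    end: date
    key: bytes = field(default_factory=lambda: secrets.token_bytes(32))


Message = Tuple[bytes, bytes, Dict[str, bytes]]  # nonce, ciphertext, wrapped keys


def push(portion: str, content: bytes, components: List[Component], today: date) -> Message:
    """Encrypt the portion once; wrap the session key per non-expired component."""
    nonce = secrets.token_bytes(16)
    session_key = secrets.token_bytes(32)
    ciphertext = toy_cipher(session_key, nonce, content)
    wrapped = {
        c.comp_id: toy_cipher(c.key, nonce, session_key)
        for c in components
        if c.portion == portion and c.start <= today <= c.end
    }
    return nonce, ciphertext, wrapped


def receive(message: Message, my_keys: Dict[str, bytes]) -> Optional[bytes]:
    """Subject side: unwrap the session key with a held component key, if any fits."""
    nonce, ciphertext, wrapped = message
    for comp_id, key in my_keys.items():
        if comp_id in wrapped:
            session_key = toy_cipher(key, nonce, wrapped[comp_id])
            return toy_cipher(session_key, nonce, ciphertext)
    return None


half_year = Component("c1", "news", date(2002, 1, 1), date(2002, 6, 30))
full_year = Component("c2", "news", date(2002, 1, 1), date(2002, 12, 31))
msg = push("news", b"<news>...</news>", [half_year, full_year], date(2002, 9, 1))
```

A subscriber whose component has expired still holds its old component key, but the session key of a new message is no longer wrapped under that key, so the new content stays inaccessible without any key having to be revoked.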
5 Future Work
We plan to extend the work carried out so far in the area of XML security along several directions. A first extension we are currently working on is the addition of a temporal dimension to access control policies. More precisely, we plan to extend our access control model with temporal access control policies, that is, policies that hold only in a specific (periodic) temporal interval. For instance, using a temporal policy it is possible to authorize a subject to access a document only in selected hours within a day.
Introducing the support for temporal access control policies in the XML Publisher adds complexity to the information push approach, in particular with respect to key management. Indeed, as we saw in Section 3, we support information push by encrypting different portions of the same document with different encryption keys. Consider, for instance, two access control policies acpi and acpj with two disjoint intervals of validity. Moreover, suppose that there exists a set of document nodes N to which only policies acpi and acpj apply. It is not possible to generate a unique key for encrypting these nodes, because the policies have two different intervals of validity. That is, the subjects to which either policy acpi or policy acpj applies can access the nodes in N, but in different temporal intervals. It is thus necessary to encrypt the nodes in N with a different encryption key for each sub-interval resulting from the intersection between the intervals of validity of acpi and acpj. Thus, the introduction of a temporal dimension in the access control policies makes it necessary to investigate new key assignment schemes. In particular, we plan to investigate a new key assignment scheme, adapted from [14], which allows an efficient hierarchical management of temporal keys.

In the XML Publisher architecture presented in Figure 1, XML can be used to specify security-related information, such as digital signatures, access control policies, and encrypted contents. Expressing security information using XML has several advantages in a Web environment, in that it facilitates its transmission and exchange among heterogeneous sources. Additionally, the protection of XML documents and of their security-related information is uniform, in that security information is coded as XML documents and can thus be protected using the same mechanisms developed for the protection of XML documents. Thus, an important issue to be covered is the definition of protocols supporting the proposals of the World Wide Web Consortium (W3C) for XML signature [19] and for XML encryption [18].

Another important aspect we are currently working on is the definition of a formal model for the XML Publisher architecture. We are currently investigating the use of a model based on case-place automata, an abstract formal framework for the ‘idealized workers and idealized managers’ (IWIM) model [12], as the formal foundation of our architecture. The availability of a formal description of the architecture will be the basis for formally proving important security properties of the service (such as authenticity, integrity, and confidentiality).

Acknowledgments. I would like to thank my thesis advisors, Professor Elisa Bertino of the University of Milano and Professor Elena Ferrari of the University of Insubria, Como.
References

1. R.J. Anderson, J.H. Lee. Jikzi: A New Framework for Secure Publishing. Security Protocols: Proceedings of the 5th International Workshop.
2. E. Bertino, B. Carminati, E. Ferrari. XML Security. Information Security Technical Report, Vol 6-2, 44-58, 2001, Elsevier Advanced Technology.
3. E. Bertino, B. Carminati, E. Ferrari. A Secure Publishing Service for Digital Libraries of XML Documents. Information Security Conference (ISC01), Lecture Notes in Computer Science, 2200:347-362, Malaga, Spain, 2001.
4. E. Bertino, B. Carminati, E. Ferrari, B. Thuraisingham, A. Gupta. Selective and Authentic Third-party Distribution of XML Document. Technical Report DSI, University of Milano. Submitted for publication.
5. B. Carminati, E. Ferrari. Management of Access Control Policies for XML Document Sources. Technical Report DSI, University of Milano. Submitted for publication.
6. D. Beech, M. Maloney, N. Mendelsohn, H. Thompson. XML Schema Part 1: Structures. W3C Proposed Recommendation, October 2000.
7. A. Deutsch, M. Fernandez, D. Florescu, A. Levy, D. Suciu. A Query Language for XML. In Proc. Int’l Conference on World Wide Web, Toronto, Canada, May 1999. Available at: http://www.research.att.com/suciu.
8. C. Geuer Pollmann. The XML Security Page. Available at http://www.nue.et-inf.uni-siegen.de/~geuer-pollmann/xml security.html.
9. H. Gladney, J. Lotspiech. Safeguarding Digital Library Contents and Users: Assuring Convenient Security and Data Quality. D-lib Magazine, May 1997.
10. R.C. Merkle. A Certified Digital Signature. In Advances in Cryptology-Crypto ’89, 1989.
11. J. Robbie. XQL’99 Proposal, 1999. Available at http://metalab.unc.edu/xql/xql-proposal.html.
12. P. Katis, N. Sabadini, R.F.C. Walters. A formalization of the IWIM model. COORDINATION 2000, A. Porto, G.-C. Roman (Eds.), LNCS 1906, pages 267-283, 2000.
13. W. Stallings. Network Security Essentials: Applications and Standards. Prentice Hall, 2000.
14. Wen-Guey Tzeng. A Time-Bound Cryptographic Key Assignment Scheme for Access Control in a Hierarchy. IEEE Transactions on Knowledge and Data Engineering, 2001.
15. M. Winslett, N. Ching, V. Jones, I. Slepchin. Using Digital Credentials on the World Wide Web. Journal of Computer Security, 7, 1997.
16. World Wide Web Consortium. XLink XML Linking Language, 1.0, 1999. W3C Recommendation. Available at http://www.w3.org/TR/xlink/
17. World Wide Web Consortium. Extensible Markup Language (XML) 1.0, 1998.
18. World Wide Web Consortium. XML Encryption Working Group, 2001. http://www.w3.org/TR/2001/CR-xmldsig-core-20010419/
19. World Wide Web Consortium. XML Signature, 2001. http://www.w3.org/TR/2001/CR-xmldsig-core-20010419/
20. World Wide Web Consortium. XML Path Language (XPath), 1.0, 1999. W3C Recommendation. Available at http://www.w3.org/TR/xpath.
21. World Wide Web Consortium. XML Query (XQuery), 1.0, 2001. W3C Working Draft. Available at http://www.w3.org/TR/xquery.