SUPPORTING TRUSTED DATA EXCHANGES IN COOPERATIVE INFORMATION SYSTEMS

Paola Bertolazzi(1), Maria Grazia Fugini(2), Massimo Mecella(3), Barbara Pernici(2), Pierluigi Plebani(2), Monica Scannapieco(1,3)

(1) Istituto di Analisi dei Sistemi ed Informatica, Consiglio Nazionale delle Ricerche (IASI-CNR) - [email protected]
(2) Dipartimento di Elettronica e Informazione, Politecnico di Milano - {fugini,pernici}@elet.polimi.it, [email protected]
(3) Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza" - {mecella,monscan}@dis.uniroma1.it

Contact author: Monica Scannapieco, Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Salaria 113 (2nd floor, room 231), I-00198 Roma, Italy. Phone: +39 06 49918479, Fax: +39 06 85300849. E-mail: [email protected]
Abstract. In cooperative processes and in e-services, an evaluation of the quality of exchanged data is essential for building mutual trust among cooperating organizations and for correctly performing cooperative activities. Several quality dimensions, related to the intrinsic nature of data and to the context of the cooperative process where data are used, must be taken into consideration. In addition, in order to accomplish a trusted cooperative environment, data sensitivity parameters must be taken into account. A model for data quality in cooperative information systems and e-applications is proposed, together with an architecture for trusted exchanges of data and of the quality information associated with it. The strategic use of the model and of the architecture is discussed.

Keywords: Data quality, workflow systems, e-services, security
1 INTRODUCTION

Recently, the widespread use of information technology and the availability of networking services have enabled new types of applications, characterized by several geographically distributed interacting organizations. The term Cooperative Information Systems (CIS) is used to denote distributed information systems that are employed by users of different organizations under a common goal (Mylopoulos and Papazoglou 1997, Brodie 1998). A recent extension of CIS allows e-services to be provided on line in a cooperative context, by means of e-applications (VLDB-TES 2000, Mecella and Pernici 2001). In addition to geographical distribution and inter-organization cooperation, in e-applications (i) cooperating organizations may not know each other in advance and (ii) e-services can be composed both at design time and at run time. Whereas in traditional "closed" CIS mutual knowledge and agreements upon the design of applications are the basis for cooperation, the availability of a complex platform for e-services (Mecella et al. 2001a) allows "open" cooperation among different organizations. An approach towards e-applications can be found in UDDI, an initiative for defining eXtensible Markup Language (XML) (W3C 1998) documents to publish and discover services on the Web.
The UDDI Business Registry stores different types of information about a service, that is, business contact information ("white pages"), business category information ("yellow pages") and technical service information ("green pages") (UDDI 2000). In such a framework, organizations willing to offer e-services describe them informally, as free text, and other organizations willing to use these e-services interact with the offering organization on the basis of agreed upon interfaces. Other proposals for architectures for e-services based on workflow systems have been presented in the literature (VLDB-TES 2000, Casati et al. 2001, Mecella et al. 2001a). The starting point of all these approaches is the concept of cooperative process, also referred to as macro process (Mecella and Batini 2001) or multi-enterprise process (MEP, Schuster et al. 2000), defined as a complex workflow involving different organizations; unlike traditional workflow processes, where all the activities concern the same enterprise, in a cooperative process the activities involve different organizations, either because they form together a virtual enterprise or because they exchange services and information in a coordinated way. The approach presented in (Mecella et al. 2001a), which constitutes the underlying framework of this paper, assumes that a cooperative process can be abstracted and modeled as a set of e-services exported by the cooperating organizations. The definition of a cooperative process as a set of e-services constitutes a reference schema for the cooperation among organizations; an e-service represents a "contract" on which an organization involved in the cooperative process agrees.

Organizations cooperating in CIS/e-applications can be of two types:

• trusted organizations: data transmission occurs among organizations which trust each other due to organizational reasons (e.g., homogeneous work groups in a departmental structure, or supply-chain relationships among organizations forming a virtual enterprise);

• external organizations: data are transmitted among cooperating entities in general, possibly accessing external data sources.
Every time mutual knowledge among the organizations participating in CIS/e-applications is not given in advance, new mechanisms are needed to ensure that mutual trust is established during cooperative process executions. Trust regards mainly two aspects: (i) the quality of the data being exchanged, and (ii) a secure environment for information exchange that protects sensitive information. The properties that indicate the quality of exchanged data are both intrinsic to the data itself and process dependent, i.e., they depend on the activity in which the data are used and on when they are used. We argue that organizations need to specify and exchange information explicitly describing the quality of the data circulating in CIS/e-applications. The availability of quality data allows interacting organizations to assess the quality of received and available data before using them. Sensitivity concerns both the correct authentication of the cooperating organizations and the guarantee that only authorized organizations can read, use, and generate data in the cooperative process. To protect sensitive information, security technologies can be used, e.g., based on digital certificates and signatures, allowing the cooperating organizations to establish a secure communication environment and to ensure the needed level of confidentiality.

The goal of the present paper is to propose a model for data quality, including both traditional and original quality dimensions, and an architecture for trusted data exchange supporting sensitivity among cooperating organizations. Both quality and sensitivity are used to define the level of trust of the CIS/e-application.

The paper is organized as follows. In Section 2, we first introduce a running example, used throughout the paper to illustrate our approach, and then we discuss both classical data quality dimensions and the additional information that must be associated with data to build mutual trust in CIS/e-applications. The running example stems from the experience of the Italian e-Government initiative (Mecella and Batini 2001), which provides the motivation for our work and the test bed in which we will try our approach. In Section 3, the model for data quality is presented in detail, whereas in Section 4 the cooperative framework is described.
Finally, in Section 5 we discuss the strategic use of trusted data exchanges in CIS/e-applications. Section 6 discusses related work specifically focused on data quality issues and data security aspects, and Section 7 concludes the paper by outlining future work.
2 A RUNNING EXAMPLE AND THE DATA QUALITY DIMENSIONS

In this section we lay the basis for a conceptual framework for trusted data exchanges in CIS/e-applications. First, we briefly introduce a running example; then we define data quality dimensions for trusted cooperation in environments such as the one described in the example.

2.1 A RUNNING EXAMPLE
An example taken from the Italian e-Government scenario (Mecella and Batini 2001) will be used throughout the paper. In Italy, the Unitary Network project and the related Nationwide Cooperative Information System (Batini et al. 2001) are currently under way, with the aim of implementing a "secure Intranet" interconnecting public administrations and of developing a Unitary Information System of Italian Public Administrations in which each subject can participate by providing services (e-services) to other subjects. Specifically, each administration is represented as a domain, and each domain offers data and application services, deployed and made accessible through cooperative gateways. Similar initiatives are under way also in the United Kingdom, where the e-Government Interoperability Framework (e-GIF) sets out the government's technical policies and standards for achieving interoperability and information systems coherence across the UK public sector. For this purpose the government has launched the UK GovTalk initiative (CITU 2000), a joint government and industry forum for generating and agreeing standards, through the definition of XML Document Type Definitions (DTDs) (Goldfarb and Prescod 2000) to be used for information exchange. In this paper we use as a running example a simplified version of the cooperative process for income management (see Figure 1).
[Diagram omitted: swimlanes for the Citizen, the Department of Finance e-Service, the City Council e-Service, the Italian Social Security Service e-Service and Other Organizations, exchanging the Income-tax Return (I), Family Status Request/Documentation (fsr, fsd), Pension Plan Status Request/Documentation (ps, ds), Inquiry Notification (in), Documentation (d) and OK Notification (OK); activities include Receive Income-tax Return, Send/Receive Family Status Request and Documentation, Send/Receive Pension Plan Status Request and Documentation, Calculate Expected Taxes, Open Inquiry and Send OK Notification, with guards such as [There are retired relatives living with] and [Else].]
Figure 1. UML Activity Diagram of the cooperative process "Income Management" and the identified e-services.
Citizens send income-tax returns to the Department of Finance, which, after executing some activities of its own competence, needs to access the family composition of the citizen from other administrations for the purpose of cross-checking data. The family composition of the citizen is checked against data available from the City Council where the citizen is resident. Information about retirement plans (in case some retired persons exist in the family) is obtained from the Italian Social Security Service.

In more detail, the workflow consists of the Department of Finance receiving income-tax returns from citizens (sent by ordinary mail, and nowadays also submitted through a Web portal); the Department, in order to verify the correct amount of taxes, needs to check the incomes of all the people forming the same family as the citizen; it therefore requests the family composition from the City Council where the citizen lives. After receiving the family status of the citizen, the Department queries the Italian Social Security Service in order to know the amount of pension received by the retired persons in the citizen's family; this activity is carried out only if there are retired persons living with the citizen. After collecting all this information, the Department owns all the data needed to check income-tax returns and possibly start further actions against fraudulent citizens.

Until recently, the information exchange described above was carried out using paper documents; the document exchange activated specific processes in each organization aimed at producing response documents. Now, on the basis of the Unitary Network and Nationwide CIS projects, each administration can develop e-services (shown in Figure 1), allowing other cooperating organizations to ask for and obtain the requested data. In the present paper we assume that data are exchanged as XML documents and described through DTDs agreed upon by all the cooperating administrations.

The cooperation is effective if exchanged data are trusted, that is, their quality is assessed and their security is guaranteed: if the quality of each exchanged data item is assessed, the receiving organization can set up appropriate measures to face poor quality situations. As an example, if the citizen address provided by a City Council is assessed as not up to date, the Department of Finance can arrange different activities to validate updated data against other organizations (e.g., telecommunication companies maintain billing addresses for their customers). Security requirements in this scenario regard the authentication of the cooperating organizations, the decision on the sensitivity levels of data, and the certification of the data transmission. Communication can be assumed to be either trusted (e.g., between the Department of Finance and the City Council) or untrusted (e.g., between the citizen and the City Council).
2.2 DATA QUALITY DIMENSIONS
We distinguish two kinds of data quality dimensions: intrinsic to data and process specific. Intrinsic data quality dimensions characterize properties that are inherent to data, i.e., depend on the very nature of data; an example is a dimension specifying whether the data about the family composition of a citizen are up to date. Process specific quality dimensions describe properties that depend on the cooperative process in which data are exchanged; in our reference example, the "timeliness" of the data exchanged between the Department of Finance and the City Council is a parameter that is fundamental to measure the efficiency and effectiveness of the cooperative process. Process specific parameters are an original contribution of this paper, as we show how quality is related also to the usage of data and to its evaluation in a cooperative framework.

As regards intrinsic data quality dimensions, we refer to a subset of the ones proposed in the literature, considering the most important ones (Wand and Wang 1996); we provide new definitions based on the classical ones, in order to adapt them to the CIS/e-application context. We will refer only to data quality dimensions concerning data values; conversely, we do not deal with aspects concerning the data quality of logical schemas and data formats. In the following definitions, by schema element we mean, for instance, an entity in an Entity-Relationship schema or a class in an object oriented schema expressed in the Unified Modeling Language (UML 2000).

2.2.1 Intrinsic data quality dimensions

Our purpose is to associate data with those dimensions that are useful for organizations receiving data to evaluate and validate them before further use. We associate to data (i) syntactic and semantic accuracy, (ii) completeness, (iii) currency, and (iv) internal consistency.
Syntactic and Semantic Accuracy. In (Redman 1996) accuracy refers to the proximity of a value v to a value v′ considered as correct. Based on such a definition, we introduce a further distinction between syntactic and semantic accuracy.

Syntactic Accuracy: the distance between v and v′, where v′ is the value considered syntactically correct.

Semantic Accuracy: the distance between v and v′, where v′ is the value considered semantically correct.

Let us consider the following examples:

• Person is a schema element with Name as the attribute of interest, and p an instance of Person (the dot notation refers to instances and their attributes, i.e., a.x indicates the value of the attribute x on a specific instance a of the schema element A). If p.Name has a value v = JON, while v′ = JOHN, this is a case of low syntactic accuracy, as JON is not an admissible value according to a dictionary of English names;

• if p.Name has a value v = ROBERT, while v′ = JOHN, this is a case of low semantic accuracy, as v is a syntactically admissible value, but the person whose name is stored as ROBERT is named JOHN in the real world.
Syntactic accuracy can be easily checked by comparing data values with reference dictionaries. Semantic accuracy is more difficult to quantify since, according to our definition, the terms of comparison have to be derived from the real world, and so the verification of semantic accuracy may be expensive. Semantic accuracy can be checked through comparison of the information related to the same instance stored in different databases. A typical process that aims at identifying similar instances consists of two phases:

• a searching phase, in which possibly matching instances are identified (Bitton and DeWitt 1983, Hernandez and Stolfo 1998, Monge and Elkan 1997);

• a matching phase, in which a decision about a match, a non-match or a possible match is taken (Hernandez and Stolfo 1998, Monge and Elkan 1997, Cochinwala et al. 1998).

Usually, the decision is made in an automatic or semi-automatic way, on the basis of the database which is considered as storing the correct values. As an example, all the attribute values related to p (with p.Name = ROBERT), such as DateOfBirth and EmployeeNumber, could be compared with another instance of Person from a different database considered as correct. In such a case, the process of checking the semantic accuracy requires matching < ROBERT, 11-20-1974, 1024 > and < JOHN, 11-20-74, 1024 >, that is (i) recognizing the two instances as a potential match, (ii) deciding for a match of the two instances, and (iii) correcting ROBERT into JOHN.
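A minimal sketch of the syntactic check, assuming a reference dictionary of admissible names and plain edit (Levenshtein) distance as the distance measure, could look as follows; the dictionary content and the Java rendering are illustrative assumptions, not a prescribed algorithm.

import java.util.List;

public class SyntacticAccuracy {

    // Classic dynamic-programming Levenshtein distance between two strings.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int subst = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + subst);
            }
        }
        return d[a.length()][b.length()];
    }

    // Distance of v from the syntactically closest dictionary entry:
    // 0 means v is an admissible value, higher means lower syntactic accuracy.
    static int syntacticDistance(String v, List<String> dictionary) {
        return dictionary.stream()
                         .mapToInt(entry -> editDistance(v, entry))
                         .min()
                         .orElse(v.length());
    }

    public static void main(String[] args) {
        List<String> names = List.of("JOHN", "ROBERT", "MARY");
        System.out.println(syntacticDistance("JON", names));    // 1: close to JOHN
        System.out.println(syntacticDistance("ROBERT", names)); // 0: admissible value
    }
}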
Completeness. We define this dimension as follows.

Completeness: the degree to which the values of a schema element are present in the schema element instance.

In evaluating completeness, it is important to consider the meaning of null values of an attribute, depending on the attribute being mandatory, optional, or inapplicable: a null value for a mandatory attribute is associated with a lower completeness, whereas completeness is not affected by optional or inapplicable null values. As an example, let us consider the attribute Email of the Person schema element; a null value for the Email attribute may have different meanings, that is (i) the specific person has no email address, and therefore the attribute is inapplicable (this case does not impact on completeness), or (ii) the specific person has an email address but it has not been stored (in this case completeness is low).
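A possible reading of this definition as code, under the assumption that an instance is a set of attribute/value pairs and that the set of mandatory attributes is known, is sketched below; the attribute names are taken from the example above.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class Completeness {

    // Completeness as the fraction of mandatory attributes whose value is present;
    // optional and inapplicable attributes are simply not counted.
    static double completeness(Map<String, Object> instance, Set<String> mandatory) {
        long present = mandatory.stream()
                                .filter(attr -> instance.get(attr) != null)
                                .count();
        return (double) present / mandatory.size();
    }

    public static void main(String[] args) {
        Map<String, Object> person = new HashMap<>();
        person.put("Name", "JOHN");
        person.put("DateOfBirth", "11-20-1974");
        person.put("Email", null); // optional attribute: does not affect completeness

        System.out.println(completeness(person, Set.of("Name", "DateOfBirth"))); // 1.0
    }
}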
Currency. The currency dimension refers only to data values that may vary in time; as an example, values of Address may vary in time, whereas DateOfBirth can be considered invariant. Therefore currency can be defined as the "age" of a value, namely:

Currency: the distance between the instant when a value is last updated and the instant when the value itself is used.

It can be measured either by associating to each value an "updating timestamp" (Missier et al. 2001) or a "transaction time" in temporal databases (Tansell et al. 1993).
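Under the "updating timestamp" interpretation, currency reduces to a simple time difference, as the following sketch makes concrete; the timestamp value is an illustrative assumption.

import java.time.Duration;
import java.time.Instant;

public class Currency {

    // Currency as the distance between the updating timestamp of a value
    // and the instant when the value is used.
    static Duration currency(Instant lastUpdated, Instant usedAt) {
        return Duration.between(lastUpdated, usedAt);
    }

    public static void main(String[] args) {
        Instant lastUpdated = Instant.parse("2001-01-15T10:00:00Z"); // updating timestamp
        Duration age = currency(lastUpdated, Instant.now());
        System.out.println("Value age in days: " + age.toDays());
    }
}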
Internal Consistency. Consistency implies that two or more values do not conflict with each other. By internal consistency we mean that all the values that are compared in order to evaluate consistency belong to a specific instance of a schema element. A semantic rule is a constraint that must hold among values of attributes of a schema element, depending on the application domain modeled by the schema element. On the basis of this definition, internal consistency can be defined as:

Internal Consistency: the degree to which the values of the attributes of an instance of a schema element satisfy the specific set of semantic rules defined on the schema element.

As an example, if we consider Person with attributes Name, DateOfBirth, Sex and DateOfDeath, some possible semantic rules to be checked are:

• the values of Name and Sex are consistent; if Name has a value v = JOHN and the value of Sex is FEMALE, this is a case of internal inconsistency;

• the value of DateOfBirth must precede the value of DateOfDeath.
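A minimal sketch of how internal consistency could be computed as the fraction of satisfied semantic rules is given below; the Person structure and the two rules mirror the examples above and are illustrative assumptions.

import java.time.LocalDate;
import java.util.List;
import java.util.function.Predicate;

public class InternalConsistency {

    record Person(String name, LocalDate dateOfBirth, String sex, LocalDate dateOfDeath) {}

    // Internal consistency as the fraction of semantic rules satisfied by one instance.
    static double internalConsistency(Person p, List<Predicate<Person>> rules) {
        long satisfied = rules.stream().filter(r -> r.test(p)).count();
        return (double) satisfied / rules.size();
    }

    public static void main(String[] args) {
        List<Predicate<Person>> rules = List.of(
            // Name and Sex must be consistent (toy dictionary-based check).
            p -> !(p.name().equals("JOHN") && p.sex().equals("FEMALE")),
            // DateOfBirth must precede DateOfDeath, when the latter is present.
            p -> p.dateOfDeath() == null || p.dateOfBirth().isBefore(p.dateOfDeath())
        );
        Person p = new Person("JOHN", LocalDate.of(1974, 11, 20), "FEMALE", null);
        System.out.println(internalConsistency(p, rules)); // 0.5: one rule violated
    }
}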
2.2.2 Process specific dimensions

The need for data quality dimensions that depend on the context is recognized in (Wang and Strong 1996); we observe that in CIS/e-applications the context is the cooperative process, and data quality dimensions are related to the evolution of data over time and within the process. We have therefore chosen and adapted some of the dimensions proposed in (Wang and Strong 1996) (timeliness and source reliability), and in addition we propose new dimensions dependent on cooperative processes (importance and confidentiality). Process specific dimensions are tied to specific data exchanges within the process, rather than to the whole process. Hence, in the following definitions, we consider a data exchange as a triple < source organization i, destination organization j, exchange id >, representing the cooperating organizations involved in the data exchange and the specific exchange (note that two organizations may be involved in more than one exchange of the same data within the same cooperative process).
Timeliness. It can be defined as follows:

Timeliness: the availability of data on time, that is, within the time constraints specified by the destination organization.

For instance, we can associate a low timeliness value with the schedule of the lessons in a University, if such a schedule becomes available on line only after the lessons have already started. For computing this dimension, each organization has to indicate the due time, i.e., the latest time within which data have to be received. According to our definition, the timeliness of a value cannot be determined until it is received by the destination organization.
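Operationally, the check reduces to comparing the arrival time of data with the due time declared by the destination organization, as in the following sketch; the due time value is an illustrative assumption.

import java.time.Instant;

public class Timeliness {

    // Data are timely when they arrive no later than the declared due time.
    static boolean isTimely(Instant arrivedAt, Instant dueTime) {
        return !arrivedAt.isAfter(dueTime);
    }

    public static void main(String[] args) {
        Instant dueTime = Instant.parse("2001-06-30T23:59:59Z"); // declared by the destination
        System.out.println(isTimely(Instant.parse("2001-06-28T09:00:00Z"), dueTime)); // true
        System.out.println(isTimely(Instant.parse("2001-07-02T09:00:00Z"), dueTime)); // false
    }
}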
Importance. This dimension can be defined as:

Importance: the significance of data for the destination organization.

As an example, we can consider an organization B (e.g., the Department of Finance) that cannot start an internal process until an organization A (e.g., the City Council) transfers values of the schema element X (e.g., the family composition of a citizen); in this case, the importance of X for B is high. Importance is a complex dimension that can be defined on the basis of specific indicators measuring, for a schema element: the number of instances managed by the destination organization per temporal unit, the number of processes internal to the destination organization in which the data are used, and the ratio between the number of core business processes using the data and the overall number of internal processes using the data. Therefore importance is:

Importance(data, destination org.) = f( # instances of data,
    # internal processes of destination org. using data,
    # core business processes of destination org. using data /
        # internal processes of destination org. using data )
Source Reliability. It can be defined as:

Source Reliability: the credibility of a source organization with respect to provided data; it refers to the pair < source, data >.

The dependence on < source, data > can be clarified through an example: a University (source) has a reputation of high reliability when treating data regarding its students and offered courses, but it can have a low reliability when releasing information regarding forthcoming commercial events related to companies that offer internships, since such information is not totally within the competence of the University. As another example, the source reliability of the Italian Department of Finance concerning the Address of citizens is lower than that of City Councils, whereas concerning the SocialSecurityNumber its source reliability is the highest among all Italian administrations. The values of source reliability may depend on the methods each organization uses to clean its data and to measure their quality.
Confidentiality. In a cooperative process, sensitivity concerns protecting data from accidental and fraudulent misuse. In general, three dimensions are associated with secure information exchange: confidentiality, integrity, and authentication. Confidentiality means that data are not read during transmission, integrity that they are not altered, and authentication that sources and destinations are correct. In the following, we assume that integrity and authentication are in any case guaranteed by CIS/e-applications, as detailed later in this paper, and we associate with data additional information only about confidentiality.

Confidentiality: indicates whether data must be protected from access by non authorized users.

As an example, let us consider the instance of Person with Name = "John", DateOfBirth = "11-20-1974", Sex = "M". Using the public key of the recipient key-pair, data can be ciphered, obtaining a sequence such as:

D"å–àVÌÁÇ9•ûeÑÉÔ;ÿaˆäqÜdNÞוeYdXN}-çÊCª•éï$t

In this way, only the recipient can decrypt the message, using the private key of the same key-pair.
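The following sketch reproduces this confidentiality mechanism with standard Java cryptography; generating the recipient key-pair in-process and the serialized form of the instance are illustrative simplifications.

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import javax.crypto.Cipher;

public class ConfidentialityDemo {

    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair recipientKeys = gen.generateKeyPair(); // the recipient key-pair

        byte[] data = "Name=JOHN;DateOfBirth=11-20-1974;Sex=M".getBytes();

        Cipher rsa = Cipher.getInstance("RSA");
        rsa.init(Cipher.ENCRYPT_MODE, recipientKeys.getPublic());
        byte[] ciphered = rsa.doFinal(data); // unreadable during transmission

        rsa.init(Cipher.DECRYPT_MODE, recipientKeys.getPrivate());
        System.out.println(new String(rsa.doFinal(ciphered))); // only the recipient can do this
    }
}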
3 DATA AND QUALITY MODELS

3.1 DATA MODEL
In our framework, all the organizations involved in CIS/e-applications need to export their data according to specific schemas; we refer to these schemas as cooperative data schemas. They are class schemas defined in accordance with the ODMG Object Model (Cattell and Barry 1997). Specifically, they describe the types of the exchanged data items, where types can be:

• classes, whose instances have their own identities;

• literals, whose instances do not have identities and are identified by their values.

It is possible to define new classes as collections of objects (instances are objects) and also structured literals, as records of literals. As an example, Figure 2 shows a detail of the cooperative data schema exported by the City Council in our reference example. This schema defines Citizen as a class, and Address as a structured literal (a record).
struct Address {
    string street;
    string cityName;
    string state;
    string country;
    short ZIPCode;
};
...
class Citizen {
    attribute string name;
    attribute string surname;
    attribute string SSN;
    attribute Date birthDate;
    attribute Address currentAddress;
    ...
};

Figure 2. The cooperative data schema exported by the City Council (detail)

3.2 DATA QUALITY MODEL

This section defines the conceptual data quality model that each cooperating organization has to adopt in order to export the quality of its own data. First we define the notion of cooperative data quality schema; then we distinguish between intrinsic and process specific data quality schemas and describe them in detail. A cooperative data quality schema is a UML Class Diagram associated to a cooperative data schema, describing the data quality of each element of the data schema.
3.2.1 Intrinsic Data Quality Schemas

Intrinsic data quality dimensions can be modeled by considering classes, that we call dimension classes, describing the data quality of the data schema elements with reference to a specific dimension; therefore dimension classes represent specific intrinsic data quality dimensions (e.g., completeness or currency). We distinguish two types of dimension classes, according to whether they refer to a class or to a structured literal of a cooperative data schema, namely dimension classes and dimension structured literals. Each dimension class represents the abstraction of the values of a specific data quality dimension for each of the attributes of the class or of the structured literal to which it refers, and to which it is associated by a one-to-one association. A dimension class (or dimension structured literal) is represented by a UML class labeled with the stereotype «DimensionClass» («DimensionStructuredLiteral»), and the name of the class should be <DimensionName_ClassName> (<DimensionName_SLName>). As an example, the class Citizen may be associated with a dimension class, labeled with the stereotype «DimensionClass» and named SyntacticAccuracy_Citizen; its attributes correspond to the syntactic accuracy of the attributes Name, Surname, SSN, etc. (see Figure 3, referring to Figure 2).
[Diagram omitted: the class Citizen (attributes Name, Surname, SSN) linked by a one-to-one association to the dimension class SyntacticAccuracy_Citizen (attributes Name, Surname, SSN).]
Figure 3. An example of dimension class.
3.2.2 Process specific data quality schemas

Tailoring UML in a way similar to the one adopted for intrinsic data quality dimensions, we introduce process dimension classes, which represent process specific data quality dimensions in the same way as dimension classes represent intrinsic data quality dimensions, and the exchange structured literal, necessary to characterize process dimension classes. According to the definitions proposed in Section 2, process specific data quality dimensions are tied to a specific exchange within a cooperative process; in our framework, a cooperative process is modeled as the interaction of the different e-services provided by the different organizations, and we introduce exchange structured literals to represent the dependence of process specific dimensions on source and destination e-services (and on the organizations exporting such e-services). We distinguish two types of process dimension classes, process dimension classes and process dimension structured literals; they include the values of the attributes of the class or of the structured literal to which they refer, and to which they are associated by a one-to-one association. We use the stereotypes «ProcessDimensionClass» and «ProcessDimensionStructuredLiteral» for process dimension classes and process dimension structured literals, respectively. The name of the class should be <DimensionName_ClassName> (<DimensionName_SLName>). See Figure 4 as an example.

[Diagram omitted: the class Citizen (attributes Name, Surname, SSN) linked by a one-to-one association to the process dimension class Importance_Citizen (attributes Name, Surname, SSN).]
Figure 4. An example of process dimension class.
An exchange structured literal is a structured literal associated to process dimension classes. It includes the following mandatory attributes:

• source e-service,
• destination e-service,
• process identifier,
• exchange identifier.

Since, within a cooperative process, two e-services may have more than one exchange, it is necessary to introduce an exchange identifier, in order to identify the exchange itself univocally. Exchange structured literals are labeled with the stereotype «ExchangeStructuredLiteral».

The considerations exposed in this section are summarized in Figure 5, in which the quality referring to both intrinsic and process specific dimensions for the Citizen class is represented; the intrinsic data quality dimensions (syntactic and semantic accuracy, completeness, currency, internal consistency) are labeled with the stereotype «DimensionClass», whereas the process specific data quality dimensions (timeliness, importance, source reliability, confidentiality) are labeled with the stereotype «ProcessDimensionClass» and are associated to the structured literal Exchange_Info, labeled with the stereotype «ExchangeStructuredLiteral».
[Diagram omitted: the class Citizen (attributes Name, Surname, SSN) is associated with the dimension classes SyntacticAccuracy_Citizen, SemanticAccuracy_Citizen, Completeness_Citizen, Currency_Citizen and InternalConsistency_Citizen, and with the process dimension classes Timeliness_Citizen, Importance_Citizen and SourceReliability_Citizen, each replicating the attributes Name, Surname, SSN; the process dimension classes are associated with the exchange structured literal Exchange_Info (attributes SourceEService, DestinationEService, ProcessID, ExchangeID).]
Figure 5. Cooperative data quality schema (detail referring to the Citizen class). All the associations are one-to-one.
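One possible plain-class rendering of this schema, with names following the <DimensionName_ClassName> convention, is sketched below; the attribute types are illustrative assumptions, since the model itself is expressed in UML.

// The class of the cooperative data schema.
class Citizen {
    String name;
    String surname;
    String ssn;
}

// Intrinsic dimension class: one quality value per attribute of Citizen.
class SyntacticAccuracy_Citizen {
    double name;    // syntactic accuracy of Citizen.name
    double surname;
    double ssn;
}

// Exchange structured literal: ties process specific dimensions
// to one specific data exchange within a cooperative process.
class Exchange_Info {
    String sourceEService;
    String destinationEService;
    String processID;
    String exchangeID;
}

// Process dimension class: its quality values are meaningful only with
// respect to the exchange described by the associated Exchange_Info.
class Importance_Citizen {
    double name;
    double surname;
    double ssn;
    Exchange_Info exchange;
}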
4 THE FRAMEWORK FOR TRUSTED COOPERATION

4.1 THE ARCHITECTURE FOR TRUSTED E-SERVICES
Many approaches can be adopted to allow different organizations to cooperate through the definition and development of CIS/e-applications, as described in the introduction. The approach adopted in this paper is workflow-based, that is, the different organizations export the data and services necessary to carry out the specific cooperative processes in which they participate. Such an approach requires an agreement on the data and service models exported by the different organizations (Mecella et al. 2001a).

In this section, we describe the architecture enabling trusted CIS/e-applications, focusing on its two central elements, namely data quality and security. The starting point of our framework is the definition of a conceptual cooperative workflow specification, that is, an abstract workflow description that hides the details of process execution in each of the cooperating organizations; an example of such a specification for the running example has been shown in Figure 1. On the basis of such a schema, each organization defines its cooperative data schemas, which specify the structure of the exchanged data. Such schemas are the static interfaces of the e-services that implement the cooperative process through exchanges of trusted data and service requests among the different cooperating organizations. As an example, in Figure 1 the areas delimited by dotted lines identify the e-services. In addition to data schemas, each organization exports the cooperative data quality schemas described in Section 3.2, in which information about the quality of the exported data is modeled.

The proposed architecture is shown in Figure 6. Each cooperating organization exports e-services as application components deployed on cooperative gateways; a cooperative gateway is the computing server platform which hosts these components. Different technologies, such as OMG Common Object Request Broker Architecture (OMG 1998), SUN Enterprise JavaBeans (Monson-Haefel 2000), and Microsoft Enterprise .NET (Trepper 2000), allow the effective development of such architectural elements, as detailed in (Mecella and Batini 2000). A cooperative process is therefore realized through the coordination of different e-services, provided by e-applications. An e-application realizes the "glue" interconnecting and orchestrating the different e-services; such a "glue" needs to be based on the cooperative schemas, regarding both data and their quality.
[Diagram omitted: cooperative organizations export e-services, deployed on cooperative gateways; e-applications provide the software "glue" among e-services, based on the cooperative data and data quality schemas; infrastructure elements comprise a repository of e-services, a source reliability manager, and a certification authority with a certificate repository and a certificate revocation list.]
Figure 6. The architecture for trusted CIS/e-applications.
Some elements provide infrastructure services needed for the correct and effective deployment of trusted e-services in the context of this architecture:

• a repository, which stores e-service specifications, that is, the data schemas, data quality schemas and application interfaces provided by each e-service; this repository is accessed at run time by e-applications to discover and compose the e-services that each organization makes available;

• a source reliability manager, which, for each e-service and for each data item exported by such an e-service, certifies its source reliability (see Section 2.2.2); the source reliability manager therefore stores triples < e-service, data, source reliability value >;

• a certification authority, providing digital certificates, a certificate repository and a certificate revocation list (Housley et al. 1999); the roles of these elements will be described in the next section, where the security aspects concerning data exchange are discussed.

4.2 EXCHANGE UNIT FORMAT
Different information needs to be associated with each data exchange in order to support trust; we define an exchange unit as data:

• transmitted from one e-service to another in the cooperative process,
• associated with quality data, and
• transmitted according to security rules.

All data are exchanged according to the exchange unit format (shown in Figure 7), in order to ensure that they all can be adequately validated by the receiving organization (this concept will be further explained in Section 5).
[Diagram omitted: the exchange unit comprises information about data and process (data, quality data, history, sensitivity information) and security aspects (digital certificate, digital signature).]
Figure 7. Exchange Unit Format.
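A possible in-memory layout of the exchange unit, mirroring the components of Figure 7, is sketched below; the field types are illustrative assumptions rather than a normative format.

import java.util.List;

// One entry of the history component (see the n-uples defined below).
class HistoryEntry {
    String sourceEService;
    String destinationEService;
    String operation;          // read, clean, realign, ...
    String linkToPreviousData;
    String timeliness;
}

// The sensitivity information component.
class SensitivityInformation {
    boolean dataConfidential;        // confidentiality flags, one per component
    boolean qualityDataConfidential;
    boolean historyConfidential;
    String encryptionMethod;         // e.g., RSA with SHA1 for the signature
    byte[] encryptedSessionKey;      // symmetric session key, wrapped for the destination
}

// The exchange unit itself.
class ExchangeUnit {
    byte[] data;                     // XML document (cooperative data schema)
    byte[] qualityData;              // XML document (cooperative data quality schema)
    List<HistoryEntry> history;
    SensitivityInformation sensitivity;
    byte[] digitalCertificate;       // X.509 certificate of the source
    byte[] digitalSignature;         // PKCS#7 signature over the components and the certificate
}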
Data are exchanged as XML files; specifically, cooperative data schemas are described as DTDs. As an example, Figure 8 shows an XML document corresponding to the detail of the cooperative data schema exported by the City Council (refer to Figure 2).

<citizen>
    <name>John</name>
    <surname>McLeod</surname>
    <SSN>000111222333</SSN>
    <birthDate>10 06 1945</birthDate>
    ...
    <currentAddress>
        <street>...</street>
        <cityName>New York</cityName>
        <state>NY</state>
        <country>USA</country>
        ...
    </currentAddress>
    ...
</citizen>

Figure 8. A possible XML document corresponding to the cooperative data schema shown in Figure 2.
Quality data concerning intrinsic dimensions (i.e., syntactic and semantic accuracy, completeness, currency and internal consistency) may be the result of an assessment activity performed by each organization on the basis of traditional methods for measuring data quality, e.g., the statistical methods proposed in (Morey 1982).
History. It can be defined as a list of n-uples < source e-service, destination e-service, operation, link to previous data, timeliness >, describing the history of the manipulations applied to data. For the purposes of the present paper, we assume that:

• the history of data tracks the transfer of data among interacting organizations (i.e., e-services) only if the nature of the data is not changed by processing executed by the destination organization;

• if a value is changed, it will be transferred in a new data exchange, starting a new history list;

• the operations that preserve the history are those that do not alter the identities of the exchanged data, that is: read, clean (according to data cleaning algorithms), and realign operations (such as changing the format of dates from the European to the American one).
Sensitivity information denotes the level of confidentiality of the data being transferred and, according to this level, information useful for its encryption. The confidentiality level can be assigned to data according to standard security policies, e.g., using rules for data labeling (Castano et al. 1995). Depending on the relevance level of the exchanged data, confidentiality can be ensured at different granularity levels: we can encrypt (i) only the data package, (ii) also quality data and history, (iii) no data parts, or (iv) any combination thereof. To cope with these requirements, sensitivity information comprises:

• Confidentiality: for each component of the exchange unit (i.e., data, quality data, history), we define a boolean value (confidentiality flag) indicating whether the component is confidential or not.

• Encryption method: indicates the asymmetric encryption algorithm (e.g., RSA) and the hash algorithm (e.g., SHA1) (Tanenbaum 1996) to be used to generate the digital signature (see Figure 7).

• Session key: the key to be used to encrypt the relevant information using symmetric cryptography (e.g., Triple-DES) (Tanenbaum 1996), in order to improve transmission performance.
Security aspects of the exchange unit need to be addressed, namely (i) integrity, (ii) authentication and (iii) confidentiality. Integrity is provided by creating a secure and efficient transmission channel through the following components of the exchange unit:

• the digital certificate, owned by the source organization;

• the digital signature of both the listed components of the exchange unit and the digital certificate.

The digital certificate is issued by a Certification Authority, basically according to the X.509 format (some extensions may be required but, as they regard data contents and source rather than data exchange, they are not further analyzed in this paper) (Housley et al. 1999). The digital signature is created according to the PKCS#7 specification (RSA Laboratories 1993), thus allowing the destination organization to verify the integrity of the data and of the digital certificate. By signing also the certificate, we guarantee the association between the data and their creator.

Authentication can be weak or strong:

• Weak authentication, required for trusted organizations, means that the destination e-service checks the signature of the source e-service using the public key of the source e-service, but trusts the certificate of the source e-service. The advantage is that data transmission is fast and reliable: trusted organizations know each other by means of a list of certificates (in the certificate repository); the integrity and reliability of such lists are under the responsibility of the Certification Authority.

• Strong authentication, required for untrusted/external organizations, uses a Public Key Infrastructure (PKI) (Housley et al. 1999), specifically a certificate revocation list, in order to validate the certificate of the source e-service.
Finally, as regards confidentiality, data, quality data and history are encrypted using the session key included in the sensitivity information part, according to the values of the confidentiality flags. To avoid disclosure of the session key, the key itself is encrypted by the source e-service using the public key of the destination one.
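The overall scheme (session-key encryption of the components, wrapping of the session key for the destination, and signature by the source) can be sketched with standard Java cryptography as follows; in-process key generation and the SHA1-with-RSA signature stand in for the full PKCS#7 and X.509 machinery.

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class SecureExchangeDemo {

    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair source = gen.generateKeyPair();      // source signing key-pair
        KeyPair destination = gen.generateKeyPair(); // destination key-pair

        byte[] payload = "<citizen>...</citizen>".getBytes();

        // 1. Encrypt the payload with a fresh Triple-DES session key.
        SecretKey sessionKey = KeyGenerator.getInstance("DESede").generateKey();
        Cipher des = Cipher.getInstance("DESede");
        des.init(Cipher.ENCRYPT_MODE, sessionKey);
        byte[] encryptedPayload = des.doFinal(payload);

        // 2. Wrap the session key with the destination's public key,
        //    so that only the destination can recover it.
        Cipher rsa = Cipher.getInstance("RSA");
        rsa.init(Cipher.WRAP_MODE, destination.getPublic());
        byte[] wrappedKey = rsa.wrap(sessionKey);

        // 3. Sign the encrypted payload with the source's private key.
        Signature sig = Signature.getInstance("SHA1withRSA");
        sig.initSign(source.getPrivate());
        sig.update(encryptedPayload);
        byte[] signature = sig.sign();

        // Destination side: verify the signature, unwrap the key, decrypt.
        sig.initVerify(source.getPublic());
        sig.update(encryptedPayload);
        System.out.println("Signature valid: " + sig.verify(signature));

        rsa.init(Cipher.UNWRAP_MODE, destination.getPrivate());
        SecretKey recovered = (SecretKey) rsa.unwrap(wrappedKey, "DESede", Cipher.SECRET_KEY);
        des.init(Cipher.DECRYPT_MODE, recovered);
        System.out.println(new String(des.doFinal(encryptedPayload)));
    }
}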
4.3 QUALITY DIMENSIONS VS. FRAMEWORK ELEMENTS
The information transmitted in the exchange unit does not describe all the data quality dimensions introduced in Section 2.2. Some of the dimensions are associated to data during exchanges between source and destination organizations, whereas other data quality dimensions are evaluated, or directly associated to data, by the destination organization. A summary describing where the quality dimensions are elaborated is shown in Table 1. All intrinsic quality data are transmitted together with data by the source organization. Process related quality data reflect instead the dynamic nature of data within the processes. Timeliness is evaluated by the destination organization, according to the expected arrival time of the data (i.e., the due time). The importance of data is associated to data by the destination organization according to the activity to be performed on these data. The destination organization accesses the source reliability manager to know the reliability of the source organization with respect to the exchanged data; finally, confidentiality is transferred by means of the sensitivity information in the exchange unit.
Intrinsic dimensions (Syntactic and Semantic Accuracy, Completeness, Currency, Internal Consistency): transferred with data within the exchange unit; evaluated by the source organization.

Process specific dimensions:
- Timeliness: not transferred; evaluated by the destination organization.
- Importance: not transferred; associated to data by the destination organization.
- Source Reliability: not transferred; provided by the Source Reliability Manager, which is accessed by the destination organization.
- Confidentiality: transferred with data; associated to data by the source organization.

Table 1. Quality dimensions in cooperative processes.
5 THE FRAMEWORK UNDER A STRATEGIC PERSPECTIVE

The framework proposed in the previous sections for data exchange in CIS/e-applications allows organizations to assess received data upon receiving them. In the present section, we discuss methodological issues related to the interpretation and the possible strategic uses of information about the trust of the exchanged data. From a methodological point of view, we can examine different points related to data quality evaluation by an organization:

• data creation;
• assessment of the quality of received data;
• evaluation of acceptable quality levels;
• actions to be taken when low quality data are received.
Data exchanged in the cooperative environment can be originated internally in the organizations. Upon data creation, it is necessary to evaluate the quality of the newly created data, in particular with respect to accuracy, completeness, and internal consistency. Accuracy can be assessed according to statistical evaluations based on the type of data creation, be it manual data entry or capture using OCR systems. Corrections to such assessments can be applied if data cleaning techniques are used on the created data to improve their quality.
As regards the assessment of the quality of received data by destination organizations, it is important to note that quality is not an absolute value: it is mainly related to the intended use of the data by the destination organization in that specific exchange within the process. Several conflicting considerations can be made based on the available quality parameters, yielding different evaluations; we discuss some examples in the following. Let us suppose that a given organization B receives from an organization A an exchange unit x. First of all, B must compute the timeliness of all the data values of x, on the basis of their due time. Importance affects the assessment of timeliness; as an example, if importance is "high" but data are not delivered in time, then B will consider them "poor quality" data during the evaluation phase. All the intrinsic data quality values can be weighted on the basis of the related values of importance and source reliability, by using some weighting function chosen by the organization B. The values of importance of a given data item are chosen by the organization B, whereas the source reliability of A, with respect to the specific data, is maintained by the Source Reliability Manager. In many cases there is a trade-off between source reliability, importance and other dimensions; as an example, B may consider that a "low" source reliability for data within x is balanced by a "high" accuracy for them.

The assessment can be done either on single data values or on the whole exchange unit; it is a choice of the destination organization to aggregate and elaborate received data to assess a global quality value. On the other hand, it is not possible to disaggregate data received as a single value with respect to quality parameters; as an example, if an Address instance is transmitted as composed of Street, ZIPCode, CityName, State and Country, it is possible to evaluate both the quality of each value and the global quality of the Address instance. Conversely, if the Address value is transmitted as a simple string, it is possible to evaluate only its quality as such, and not the quality of each of its components.
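As an illustration, one possible weighting and acceptance function is sketched below; the weights, thresholds and scales (all values normalized to [0, 1], higher meaning better) are illustrative assumptions chosen by the destination organization B.

import java.util.Map;

public class QualityAssessment {

    // Weighted global quality: the intrinsic values are averaged and then
    // combined with the source reliability of A (weights are illustrative).
    static double weightedQuality(Map<String, Double> intrinsic, double sourceReliability) {
        double avg = intrinsic.values().stream()
                              .mapToDouble(Double::doubleValue)
                              .average().orElse(0.0);
        return 0.6 * avg + 0.4 * sourceReliability;
    }

    // Acceptability: the more important the data, the stricter the required
    // quality; important data delivered late are rejected outright.
    static boolean accept(double quality, double importance, boolean timely) {
        if (importance > 0.8 && !timely) return false;
        return quality >= 0.5 + 0.3 * importance;
    }

    public static void main(String[] args) {
        Map<String, Double> intrinsic = Map.of(
            "syntacticAccuracy", 0.9, "completeness", 0.8, "currency", 0.7);
        double q = weightedQuality(intrinsic, 0.6);
        System.out.println("Global quality: " + q);            // 0.72
        System.out.println("Accept: " + accept(q, 0.9, true)); // false: 0.72 < 0.77
    }
}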
Once the quality of received data has been assessed, it can be evaluated for acceptability, by using a multi-argument function. The decision whether to accept or reject incoming data depends on complex trade-offs among quality parameters; as an example, while in some cases the timeliness of data is more important than its accuracy, in other cases the contrary is true, and the organization B prefers receiving late but accurate data. The result of this step is a decision on the acceptance of the received exchange unit; e.g., if the importance of x is "very high" whereas its overall quality is "very poor", B can decide to reject it. In this evaluation, some organizations may choose to examine the complete history of the data, basing acceptance not only on information concerning the last exchange, but also on all the manipulations and timeliness information about previous data exchanges, as stored in the history component of x. For instance, data cleaning operations already applied to the data by other organizations can support the evaluation of a greater global data quality value.
After the decision to accept and use the data, it is possible to continue the execution of the cooperative process, according to the cooperative workflow specification. Conversely, if the quality of the available data is insufficient, it is necessary to take corrective actions to improve it. Several actions are possible:

• Received data are rejected, and the source organization A is requested to resend the same data with better quality parameters. This option is viable when the low global quality is not related to a lack of timeliness.

• An e-service can raise an exception to its normal execution. An exception causes the activation of other e-services that are not part of the normal workflow.

• A data cleaning or improvement action is undertaken inside the e-service of the organization B in order to improve data quality.

From these basic considerations, we now derive suggestions for the design, implementation and strategic use of trust parameters, and for improvement and possibly restructuring interventions.
Framework design and management issues. The design and maintenance of the cooperative environment supporting trusted data exchanges comprise several aspects:

• Granularity criteria for e-service design. An e-service can be designed to cover a whole organization, or portions thereof, hence at different granularities. We just mention criteria that can be adopted here, such as the criteria employed for workflow process design: homogeneity of activities, manageability of problems, number of interfaces to be designed, number of agents assigned to activities. Other criteria can be taken from the literature on (distributed) data design: dimension of the exchanged data units, number of data values to be transmitted if the designed exchange unit is too small/large, granularity of the encryption and signature/certificate mechanisms that ensure security and reliability. Both classes of criteria can be applied to design the e-services at a correct level. Obviously, the granularity deeply impacts the efficiency and the maintainability of the environment. As an example, let us consider Figure 1 and the problems related to the introduction of a new e-service: this can imply the substitution of a schema portion related to an e-service (i.e., a portion delimited by dotted lines in the figure) with that of the new provider organization (which, for instance, can offer the same service under competitive conditions), or the reorganization of the whole cooperative workflow specification, if a new e-service is defined and added to the existing cooperative process.

• Benchmarking. The quality parameters described in the paper can be regarded as strategic means for benchmarking the cooperative process design, since they help in monitoring e-services and help:
- to better define the granularity of the schemas;
- to restructure and re-engineer the schemas;
- destination organizations to improve their relationships with other entities (e.g., their business customers);
- source organizations to improve their services (e.g., balancing accuracy vs. timeliness vs. importance).

• Accounting and Monitoring. To help improve framework efficacy, a mechanism of e-monitoring can be set up to observe quality information, thus supporting the tracing, analysis, and certification of data exchanges. Accounting information is a basic aspect of e-monitoring. It should also be accompanied by documentation about data flows, by testing and probing reports resulting from samples of the framework operation, and by trust reports that contain all the security relevant parameters (history of flows, of user behavior, of security violations, and so on). Another way of verifying the quality of the design is the observation of exceptions to the normal flow. Frequent exceptions can be a symptom of the mis-functioning of some e-services, due to various reasons. One is straightforward and is generally concerned with wrong design choices (wrong granularity being one example for all). A second type of cause can be the low quality of the data provided by a given e-service; for example, data from one provider e-service (i.e., organization) always present "very low" timeliness, or are scarcely secure. Triggers can be inserted in the cooperative workflow specification to monitor these anomalies in order to:
- signal to the destination organization that a given provider e-service is unreliable;
- signal to the source organization that the quality of the data provided by its e-services is low and that it might become out-of-market.
Anomalies can therefore be regarded as a means to strategically monitor framework design and performance, and for organizations to improve their strategic orientations.

• Contractual aspects bound to e-service executions. Cooperating organizations should be able to get certification of the data exchanged, of their quality and sensitivity levels, and of user satisfaction measured through quality parameters.

• Compliance between the cooperative data model and the organizational model. This aspect can be studied by observing the overall behavior of the e-services, the customer satisfaction, the percentage of discarded data, the exceptions occurring during workflow executions, and so on; in particular, exceptions and their management are useful to decide whether an e-service has to be corrected or redesigned.
Implementation framework. Several elements are needed in order to realize the proposed framework for trusted cooperation, specifically (i) a descriptive structure, consisting of models and languages able to describe e-services at a high level of abstraction, and (ii) tools for mapping such a structure onto a multi-technology platform and for initializing the trust parameters.
E-services are described by using an abstract description language (Mecella et al. 2001a, Mecella et al. 2001b), which is an abstraction of technological component models (e.g., CORBA, EJB, .NET); each e-service needs to be effectively provided by an organization as a component in a specific implementation technology. The implementation interfaces of such a component can be generated from the e-service specification, on the basis of specific generation rules and tools. The use of different component models, one for e-service description and many at the technological level, is due to the coexistence of different cooperative technologies and to the opportunity, in a multi-organization environment, of integrating all these components by adopting a technology-independent component model for them.

The coordination of the different e-services composing a cooperative process is carried out by the coordination "glue" inside e-applications; such glue is able to coordinate different components by generating at run time specific service requests, according to the specific implementation technologies; the generation of the specific service requests is made possible by the e-service descriptions stored in the repository and by the availability of mapping modules which realize the transformation rules from e-service descriptions to technological component models.

The focus of this paper is on introducing trust in the data exchanges among e-services; cooperative data schemas represent the interfaces of the e-services, that is, the specifications of the input and output data of e-services. In order to provide trust, we have also introduced cooperative data quality schemas, to be offered by e-services as part of their interfaces, and security aspects in the communication among e-services. Finally, as far as the management of trust parameters is concerned, the framework is assumed to be initialized with fixed quality and sensitivity information and to be then updated using a Feedback and Monitoring Module (Bellettini et al. 1999), which observes the behavior and improves performance over time using feedback about quality parameters, triggers, user actions, and customer satisfaction.
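As an illustration, a technology-level interface that could be generated from the abstract description of the City Council e-service of the running example is sketched below; the names and signatures, and the reuse of the ExchangeUnit class sketched earlier, are illustrative assumptions.

public interface CityCouncilEService {

    // Returns the family status documentation for the citizen identified in
    // the request; both input and output travel as exchange units whose data
    // components are XML documents valid against the agreed DTDs.
    ExchangeUnit getFamilyStatus(ExchangeUnit familyStatusRequest);

    // The cooperative data quality schema describing the quality of the data
    // this e-service exports (Section 3.2), as published in the repository.
    String getDataQualitySchema();
}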
6 RELATED WORK As in our framework trust is obtained by introducing quality information and security, we will briefly describe related work in these fields. The notion of data quality has been widely investigated in the literature; among the many proposals we cite the definitions of data quality as “fitness for use” (Wang and Strong 1996), and as “the distance between the data views presented by an information system and the same data in the real world” (Orr 1998, Wand and Wang 1996). The former definition emphasizes the subjective nature of data quality, whereas the latter is an “operational” definition, although defining data quality on the basis of comparisons with the real world is a very difficult task. In this paper we have considered data quality as an implicit concept strictly dependent from a set of dimensions; they are usually defined in the data quality literature as quality properties or characteristics of data (e.g., accuracy, completeness, consistency, etc.). Many definitions of data quality dimensions have been proposed; among them we cite: the classification given in (Wang and Strong 1996), in which four categories (i.e., intrinsic, contextual, representation and accessibility aspects of data) are identified for data quality dimensions, and the taxonomy proposed in (Redman 1996), in which more than twenty data quality dimensions are classified into three categories, namely conceptual view, values and format. A survey of data quality dimensions is given in (Wang et al. 1995); it is important to note that in the literature there is not an agreement not only on the set of the dimensions strictly characterizing data quality, but also on the meaning of each of them. We have defined some dimensions based on the ones proposed in the literature, and we have introduced some new quality dimensions, as they are specifically relevant in cooperative environments. Data quality issues have been addressed in several research areas, i.e., data cleaning, quality management in information systems, data warehousing, integration of heterogeneous databases and web information sources. As of our knowledge, many aspects concerning data quality in CIS/e-applications have not 32
yet been addressed; nevertheless, when dealing with data quality issues in cooperative environments, some of the results already achieved for traditional and web information systems can be borrowed. In CIS/e-applications, the main data quality problems are:
• Assessment of the quality of the data exported by each organization;
• Methods and techniques for exchanging quality information;
• Improvement of quality;
• Heterogeneity, due to the presence of different organizations, in general with different semantics about data.
As regards the assessment of the quality of intra-organizational data, results achieved in the data cleaning area (Elmagarmid et al. 1996, Hernandez and Stolfo 1998, Galhardas et al. 2000), as well as in the data warehouse area (Vassiliadis et al. 1999, Jeusfeld et al. 1998), can be adopted; a minimal assessment sketch is given below. Heterogeneity has been widely addressed in the literature, especially focusing on schema integration issues (Batini et al. 1984, Gertz 1998, Ullman 1997, Madnick 1999, Calvanese et al. 1998). Improvement, as well as methods and techniques for exchanging quality information, has been only partially addressed in the literature (e.g., Mihaila et al. 1998); these issues are the main focus of this paper: we have proposed a conceptual model for exchanging such information in a cooperative framework, together with some hints for improvement based on the availability of quality information. Finally, as regards security, the main problems tackled in the literature concern data protection during storage and transmission, with the associated aspects of confidentiality, integrity and authentication (Castano et al. 1995). Several solutions have been proposed, based on standards and specifications regarding the use of cryptography for signatures and certificates, such as PKCS#7 and RFC 2459 (RSA Laboratories 1993, Housley et al. 1999). We have relied on these standard proposals for data exchange.
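As an illustration of what an assessment phase might compute, the following sketch (hypothetical, in Python) measures a single common dimension, completeness, as the ratio of non-null values in the records an organization exports; a real assessment would of course cover further dimensions such as accuracy and consistency.

    # Minimal completeness assessment over exported records: the score is the
    # fraction of non-null, non-empty field values among all expected values.
    from typing import Dict, List, Optional

    Record = Dict[str, Optional[str]]

    def completeness(records: List[Record], fields: List[str]) -> float:
        """Fraction of non-null field values over all exported records."""
        total = len(records) * len(fields)
        if total == 0:
            return 1.0  # nothing expected, nothing missing
        filled = sum(1 for r in records for f in fields
                     if r.get(f) not in (None, ""))
        return filled / total

    exported = [
        {"name": "Rossi", "fiscal_code": "RSSMRA", "address": None},
        {"name": "Bianchi", "fiscal_code": "", "address": "Via Salaria 113"},
    ]
    score = completeness(exported, ["name", "fiscal_code", "address"])
    print(f"completeness = {score:.2f}")  # 4 of 6 values present -> 0.67

A score of this kind is exactly the sort of quality information that, in our framework, an organization would export alongside its data for receiving organizations to inspect.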
7 CONCLUDING REMARKS AND FUTURE WORK
In this paper an approach to trusted data exchange in cooperative processes has been presented. The main emphasis of this work has been on accompanying the exchanged information with additional information that enables receiving organizations to assess the suitability of data before using them. In addition, a framework has been proposed to allow trusted data exchange with quality information in a secure environment.
The data quality problem in cooperative environments is, in general, still an open issue. Further work is needed to precisely define the data quality dimensions proposed in the literature. In the context of cooperative processes, our approach is, to our knowledge, the first proposal of a comprehensive framework for trusted data exchange based on data quality information. Our approach will be validated on practical cases in the public administration domain, and based on these experiences the model will be refined.
In the present paper, we have concentrated our attention on data exchange within a cooperative process. Although we have identified the exact format of the exchange unit, we still need to explore possible ways to translate not only data, but also the quality, history and sensitivity information of the exchange unit, into XML structures; a possible rendering is sketched below. Based on the proposed approach, future work will also concentrate on process improvement driven by the evaluation of the quality of exchanged data. In fact, the analysis of the quality of the data being exchanged, its evaluation by receiving organizations, and the compensating actions triggered when data quality is considered insufficient can be the basis for new process improvement techniques. In addition, more work is needed to provide mechanisms to associate information about the reliability of data sources, to validate such information, and to revise it according to a statistical analysis of process instances evaluated in the past. Future work about sensitivity and security regards the extension of XML DTDs to treat security properties at the needed level of data granularity (e.g., data item, quality attributes, and other detail levels).
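Purely as an illustration of such a translation, the following sketch (in Python, using the standard xml.etree.ElementTree module) renders a hypothetical exchange unit carrying data together with quality, history and sensitivity information; the element names are invented for the example and do not reproduce the exact exchange-unit format defined in this paper.

    # Hypothetical XML rendering of an exchange unit: data plus its quality,
    # history and sensitivity metadata, bundled in a single document.
    import xml.etree.ElementTree as ET

    unit = ET.Element("ExchangeUnit")

    data = ET.SubElement(unit, "Data")
    ET.SubElement(data, "Citizen", name="Rossi", fiscal_code="RSSMRA")

    quality = ET.SubElement(unit, "Quality")
    ET.SubElement(quality, "Dimension", name="completeness", value="0.67")
    ET.SubElement(quality, "Dimension", name="currency", value="2001-07-01")

    history = ET.SubElement(unit, "History")
    ET.SubElement(history, "Source", organization="CitizenRegistry")

    sensitivity = ET.SubElement(unit, "Sensitivity")
    ET.SubElement(sensitivity, "Level", value="restricted")

    # Serialize the whole unit for transmission to the receiving organization.
    print(ET.tostring(unit, encoding="unicode"))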
ACKNOWLEDGEMENTS
The authors thank Carlo Batini for discussions and suggestions about this work.
REFERENCES
Batini C., Cappadozzi E., Mecella M., Talamo M. (2001): Cooperative Architectures: The Italian Way Along e-Government. To appear in Elmagarmid A.K., McIver Jr W.J. (eds.): Advances in Digital Government: Technology, Human Factors, and Policy, Kluwer Academic Publishers, 2001.
Batini C., Lenzerini M., Navathe S.B. (1984): A Comparative Analysis of Methodologies for Database Schema Integration. ACM Computing Surveys, vol. 15, no. 4, 1984.
Bellettini C., Damiani E., Fugini M.G. (1999): Design of an XML-based Trader for Dynamic Identification of Distributed Services. Proceedings of the 1st Symposium on Reusable Architectures and Components for Developing Distributed Information Systems (RACDIS'99), Orlando, FL, USA, August 1999.
Bitton D., DeWitt D. (1983): Duplicate Record Elimination in Large Data Files. ACM Transactions on Database Systems, vol. 8, no. 2, 1983.
Brodie M.L. (1998): The Cooperative Computing Initiative. A Contribution to the Middleware and Software Technologies. GTE Laboratories Technical Publication, 1998. Available on-line (link checked July 1st, 2001): http://info.gte.com/pubs/PITAC3.pdf.
Calvanese D., De Giacomo G., Lenzerini M., Nardi D., Rosati R. (1998): Information Integration: Conceptual Modeling and Reasoning Support. Proceedings of the 6th International Conference on Cooperative Information Systems (CoopIS'98), New York City, NY, USA, 1998.
Casati F., Sayal M., Shan M.C. (2001): Developing E-Services for Composing E-Services. Proceedings of the 13th International Conference on Advanced Information Systems Engineering (CAiSE 2001), Interlaken, Switzerland, 2001.
Castano S., Fugini M.G., Martella G., Samarati P. (1995): Database Security. Addison Wesley, 1995.
Cattell R.G.G., Barry D.K. (eds.) (1997): The Object Database Standard: ODMG 2.0. Morgan Kaufmann Publishers, 1997.
Central IT Unit (CITU) of the Cabinet Office (2000): The GovTalk Initiative. http://www.govtalk.gov.uk/ (link checked July 1st, 2001).
Cochinwala M., Kurien V., Lalk G., Shasha D. (1998): Efficient Data Reconciliation. Bellcore Technical Report, 1998.
Elmagarmid A., Horowitz B., Karabatis G., Umar A. (1996): Issues in Multisystem Integration for Achieving Data Reconciliation and Aspects of Solutions. Bellcore Research Technical Report, 1996.
Galhardas H., Florescu D., Shasha D., Simon E. (2000): An Extensible Framework for Data Cleaning. Proceedings of the 16th International Conference on Data Engineering (ICDE 2000), San Diego, CA, USA, 2000.
Gertz M. (1998): Managing Data Quality and Integrity in Federated Databases. Second Annual IFIP TC-11 WG 11.5 Working Conference on Integrity and Internal Control in Information Systems, Airlie Center, Warrenton, VA, USA, 1998.
Hernandez M.A., Stolfo S.J. (1998): Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem. Journal of Data Mining and Knowledge Discovery, vol. 1, no. 2, 1998.
Housley R., Ford W., Polk W., Solo D. (1999): Internet X.509 Public Key Infrastructure Certificate and CRL Profile. Network Working Group, Standards Track, RFC 2459, 1999.
Jeusfeld M.A., Quix C., Jarke M. (1998): Design and Analysis of Quality Information for Data Warehouses. Proceedings of the 17th International Conference on Conceptual Modeling (ER'98), Singapore, 1998.
Madnick S. (1999): Metadata Jones and the Tower of Babel: The Challenge of Large-Scale Semantic Heterogeneity. Proceedings of the 3rd IEEE Meta-Data Conference (Meta-Data '99), Bethesda, MD, USA, 1999.
Mecella M., Batini C. (2000): Cooperation of Heterogeneous Legacy Information Systems: a Methodological Framework. Proceedings of the 4th International Enterprise Distributed Object Computing Conference (EDOC 2000), Makuhari, Japan, 2000.
Mecella M., Batini C. (2001): Enabling Italian e-Government Through a Cooperative Architecture. In Elmagarmid A.K., McIver Jr W.J. (eds.): Digital Government. IEEE Computer, vol. 34, no. 2, February 2001.
Mecella M., Pernici B. (2001): Designing Wrapper Components for e-Services in Integrating Heterogeneous Systems. To appear in VLDB Journal, Special Issue on e-Services, 2001.
Mecella M., Pernici B., Craca P. (2001b): Compatibility of Workflow e-Services in a Cooperative Multi-Platform Environment. To appear in Proceedings of the 2nd VLDB Workshop on Technologies for E-Services (VLDB-TES 2001), Roma, Italy, September 2001.
Mecella M., Pernici B., Rossi M., Testi A. (2001a): A Repository of Workflow Components for Cooperative e-Applications. Proceedings of the 1st IFIP TC8 Working Conference on E-Commerce/E-Business, Salzburg, Austria, 2001.
Mihaila G., Raschid L., Vidal M. (1998): Querying Quality of Data Metadata. Proceedings of the 6th International Conference on Extending Database Technology (EDBT'98), Valencia, Spain, 1998.
Missier P., Scannapieco M., Batini C. (2001): Cooperative Architectures. Introducing Data Quality. Technical Report 14-2001, Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Roma, Italy, 2001.
Monge A., Elkan C. (1997): An Efficient Domain Independent Algorithm for Detecting Approximate Duplicate Database Records. Proceedings of the SIGMOD Workshop on Research Issues on DMKD, 1997.
Monson-Haefel R. (2000): Enterprise JavaBeans (2nd Edition). O'Reilly, 2000.
Morey R.C. (1982): Estimating and Improving the Quality of Information in the MIS. Communications of the ACM, vol. 25, no. 5, 1982.
Mylopoulos J., Papazoglou M. (eds.) (1997): Cooperative Information Systems. IEEE Expert Intelligent Systems & Their Applications, vol. 12, no. 5, September/October 1997.
Object Management Group (1998): The Common Object Request Broker Architecture and Specifications. Revision 2.3. Object Management Group, Document formal/98-1201, Framingham, MA, 1998.
Orr K. (1998): Data Quality and Systems Theory. Communications of the ACM, vol. 41, no. 2, 1998.
Redman T.C. (1996): Data Quality for the Information Age. Artech House, 1996.
RSA Laboratories (1993): Cryptographic Message Syntax Standard. RSA Laboratories Technical Note, Version 1.5, 1993.
Schuster H., Georgakopoulos D., Cichocki A., Baker D. (2000): Modeling and Composing Service-based and Reference Process-based Multi-enterprise Processes. Proceedings of the 12th International Conference on Advanced Information Systems Engineering (CAiSE 2000), Stockholm, Sweden, 2000.
Tanenbaum A.S. (1996): Computer Networks, Third Edition. Prentice Hall, 1996.
Tansel A., Snodgrass R., Clifford J., Gadia S., Segev A. (eds.) (1993): Temporal Databases. Benjamin-Cummings, 1993.
Trepper C. (2000): E-Commerce Strategies. Microsoft Press, 2000.
UDDI.org (2000): UDDI Technical White Paper, 2000. Available on-line (link checked July 1st, 2001): http://www.uddi.org/pubs/Iru_UDDI_Technical_White_Paper.pdf.
Ullman J.D. (1997): Information Integration Using Logical Views. Proceedings of the International Conference on Database Theory (ICDT'97), Greece, 1997.
Vassiliadis P., Bouzeghoub M., Quix C. (1999): Towards Quality-Oriented Data Warehouse Usage and Evolution. Proceedings of the 11th International Conference on Advanced Information Systems Engineering (CAiSE'99), Heidelberg, Germany, 1999.
VLDB-TES (2000): Proceedings of the 1st VLDB Workshop on Technologies for E-Services (VLDB-TES 2000), Cairo, Egypt, 2000.
Wand Y., Wang R.Y. (1996): Anchoring Data Quality Dimensions in Ontological Foundations. Communications of the ACM, vol. 39, no. 11, 1996.
Wang R.Y., Storey V.C., Firth C.P. (1995): A Framework for Analysis of Data Quality Research. IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 4, 1995.
Wang R.Y., Strong D.M. (1996): Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems, vol. 12, no. 4, 1996.
World Wide Web Consortium (W3C) (1998): Extensible Markup Language (XML) Version 1.0. February 1998. http://www.w3.org.
Goldfarb C.F., Prescod P. (2000): The XML Handbook. Prentice Hall, 2000.
Object Management Group (OMG) (2000): OMG Unified Modeling Language Specification. Version 1.3. Object Management Group, Document formal/2000-03-01, Framingham, MA, 2000.