From Object-Oriented Conceptual Multidimensional Modeling ... - gplsi

0 downloads 0 Views 514KB Size Report
To achieve this goal, we provide a DTD that allows us to generate valid XML documents that comprise all MD modeling properties considered by our approach,.
From Object-Oriented Conceptual Multidimensional Modeling into XML Sergio Luján-Mora, Enrique Medina, and Juan Trujillo Departamento de Lenguajes y Sistemas Informáticos Universidad de Alicante. SPAIN

{slujan,emedina,jtrujillo}@dlsi.ua.es

Abstract. Data warehouses and OLAP applications are based on the

multidimensional (MD) modeling. In this context, we have previously proposed a conceptual modeling approach, based on the Unied Modeling Language (UML), to consider main MD properties at the conceptual level. In this paper, we present how to store a conceptual model accomplished with our approach using the eXtensible Markup Language (XML), a widely used standard for information exchange. To achieve this goal, we provide a DTD that allows us to generate valid XML documents that comprise all MD modeling properties considered by our approach, facilitating the interchange of information between enterprises. Afterwards, we also provide XSLT stylesheets that allow us to automatically generate HTML pages from XML documents, thereby supporting dierent views of the same MD model easily. Finally, our CASE tool gives support to all theoretical issues presented in the paper.

Keywords: Multidimensional modeling, UML, XML, DTD, Web

1

Introduction

Data warehouses (DW), OLAP applications and multidimensional databases (MDDB), based on the multidimensional (MD) modeling, provide companies with huge historical information for the decision making process. In the last years, there have been some proposals to accomplish the conceptual MD modeling of these systems [1][2][3,4][5]. Due to space constraints, we refer the reader to [6] for detailed comparison and discussion about these models. In this paper, we will use the Object-Oriented (OO) conceptual MD modeling approach, based on the Unied Modeling Language (UML) [7], presented in [3,4] for a proper conceptual MD modeling. This proposal considers main relevant MD issues at the conceptual level such as the many-to-many relationships between facts and dimensions, degenerate dimensions, multiple and alternative path classication hierarchies, and non-strict and complete hierarchies. Furthermore, our approach is supported by a CASE tool that allows us to semi-automatically generate the implementation of a model into a target commercial OLAP tool such as Informix Metacube or Cognos Powerplay, thereby allowing us to check the validity and suitability of the proposed approach.

On the other hand, it is widely accepted that software systems should have a proper documentation at a high level of abstraction to facilitate their change,

1 (XML) [8] is

adaptation and extension. The eXtensible Markup Language

rapidly being adopted as a specic standard syntax for the exchange of semistructured data [9]. Furthermore, XML is an open neutral platform and vendor independent meta-language standard, which allows to reduce the cost, complexity, and eort required in integrating data within enterprises and between enterprises. One common feature of this semi-structured data is the lack of schema, so the data is describing itself. Therefore, XML documents can have dierent structures and can represent heterogeneous kinds of data (biological, statistical, medical, etc.). Nevertheless, XML documents can be associated to a Document Type Denition (DTD) or an XML Schema [10], both of which allow us to describe and constraint the structure of XML documents. In this way, an XML document can be validated against these DTDs or XML Schemas to check its correctness. Moreover, thanks to the use of eXtensible Stylesheet Language Transformations (XSLT) [11], XSLT stylesheets allow users to express their intentions about how XML documents should be presented, so they could be automatically transformed into other formats, e.g. HTML documents. An immediate consequence is that we can dene dierent XSLT stylesheets to provide dierent visualizations of the same XML document. In this paper, we present how to store a conceptual model accomplished with our approach in an XML document. To help this goal, we dene the metamodel of our OO conceptual MD modeling approach using a UML class diagram, as well as the implementation of this metamodel in a DTD. This DTD is used to automatically validate XML documents that store all the MD properties of our models in a standard interchange format, so any external application could benet from the expressiveness of our conceptual MD approach. Furthermore, we use XSLT stylesheets and XML documents in a transformation process to obtain HTML pages that can represent dierent views of the same MD model. Following these considerations, the remainder of this paper is structured in Fig. 1 as follows: Section 2 describes the basis of our OO conceptual MD modeling approach, as well as the metamodel denition for this approach using UML. Section 3 presents the DTD created from the metamodel that allows us to validate XML documents that store our models. Then, Section 4 uses XSLT stylesheets to automatically generate HTML pages from XML documents, thereby allowing us to publish dierent views of the model in the web. In Section 5 we present our conclusions and benets of our proposal. Finally, future works that are currently being considered are presented in Section 6.

1

XML is a subset of the Standard Generalized Markup Language (SGML).

Fig. 1.

2

Structure of sections of this paper.

Conceptual multidimensional modeling

Several conceptual MD models have been lately presented that provide an easy set of graphical structures to facilitate the task of conceptual MD modeling, as commented previously in the introduction. In this paper, we will use the OO approach based on the UML notation presented in [3,4], as it considers many relevant MD aspects at the conceptual level. In this section, we will briey summary how our approach represents both the structural and the dynamic part of MD modeling.

2.1

MD modeling with UML

In our approach, main MD modeling structural properties are specied by means of a UML class diagram in which the information is clearly separated into facts and dimensions. Dimensions and facts are considered by

dimension classes

and

fact classes

respectively. Then, fact classes are specied as composite classes in shared aggregation relationships of

n dimension classes. Thanks to the exibility of shared

aggregation relationships that UML provides,

many-to-many

relationships be-

tween facts and particular dimensions can be considered by indicating the 1..* cardinality on the dimension class role. For example, on the left hand side of

Sales has a many-to-many relationship with Product and a one-to-many relationship with the dimension

Fig. 2, we can see how the fact class the dimension class class

Time.

Fig. 2.

Multidimensional modeling using UML.

By default, all measures in the fact class are considered additive. For nonadditive measures, additive rules are dened as constraints and are also placed in somewhere around the fact class. Furthermore, derived measures can also be explicitly considered (constraint

/ ) and their derivation rules are placed between

braces in somewhere around the fact class, as can be seen in Fig. 2. Our approach also allows us to dene identifying attributes in the fact class, if convenient, by placing the constraint

{OID}

next to a measure name. In this

way we can represent degenerate dimensions [12][13], thereby providing other fact features in addition to the measures for analysis. For example, we could store the ticket and line numbers as other ticket features in a fact representing sales tickets, as reected in Fig. 2. With respect to dimensions, every

classication hierarchy

level is specied by

a class (called a base class). An association of classes species the relationships between two levels of a classication hierarchy. The only prerequisite is that these classes must dene a Directed Acyclic Graph (DAG) rooted in the dimension class (constraint

{dag}

placed next to every dimension class). The DAG

structure can represent both alternative path and multiple classication hier-

identifying attribute {D}). These attributes

archies. Every classication hierarchy level must have an (constraint

{OID}) and a descriptor

attribute (constraint

are necessary for an automatic generation process into commercial OLAP tools,

1 and 1..* dened in the target associated class role addresses the concepts of strictness and non-strictness. In addition, dening the {completeness} constraint in

as these tools store this information in their metadata. The multiplicity

the target associated class role addresses the completeness of a classication hierarchy (see an example on the center of Fig. 2). Our approach considers all classication hierarchies non-complete by default. The

categorization of dimensions,

used to model additional features for an

entity's subtypes, is considered by means of generalization-specialization relationships. However, only the dimension class can belong to both a classication and specialization hierarchy at the same time. An example of categorization for the Product dimension can be observed on the right hand side of Fig. 2. Regarding the dynamic part, we also provide a UML-compliant class notation (called

cube class ) to specify initial user requirements. A cube class is structured

Fig. 3.

The UML class diagram to represent the metamodel schema.

into three sections:

measures, to specify which fact attributes are analyzed; slice, dice, to dene grouping conditions

to express constraints in terms of lters; and

of the data. Later, a set of basic OLAP operations (e.g. roll-up, drill-down, slice, etc.) is provided to accomplish the further data analysis phase from these cube classes. Finally, this approach provides a set of modules with adequate transformation rules to export the accomplished conceptual MD model into a commercial OLAP tool, which allows us to check the validity of the proposed approach.

2.2

Metamodel

In this section, we present how to model the metamodel of our OO conceptual MD models using a UML class diagram. The metamodel schema of our OO aproach is illustrated in Fig. 3, where the class that represents the general information about any model is allocated in the class

goldmodel.

This schema represents the main MD properties of our conceptual modeling approach. In this way, dimensions and facts are represented using the classes

dimclass and factclass, respectively. Then, the association class sharedagg is used to express the shared aggregation relationships between facts and dimensions. Notice that many-to-many relationships between facts and dimensions are supported by assigning the same value M to both attributes in the class

sharedagg.

With respect to fact attributes, the class

roleA

and

roleB

factatt supplies the attribute OID

to indicate the role played by the fact attribute, e.g. identier (as a degenerated dimension) or measure for analysis. Later, the attribute

derivationRule

allows us to specify its counterpart for derived measures. Additivity is consid-

additivity between the classes factatt and dimclass. The possible combinations of dierent types of aggregation, i.e. SUM, ered using the association class

2

AVG, MIN, MAX, COUNT, NOT , are dened by giving boolean values to their equivalent attributes, i.e.

isSUM, isAVG, isMIN, isMAX, isCOUNT, isNOT.

On the other hand, dimensions may have one or more hierarchies. The rst level of the hierarchy for both classication and categorization relationships is considered as the default hierarchy using the classes

asoclevel and catlevel.

From these classes, the rest of the hierarchy structure is derived using the classes

relationasoc

and

relationcat

respectively. In order to include dimensional

attributes for each level in the dimension hierarchy, both types of relationships

level, which owns the attributes in an aggregation dimatt, with the possibility to indicate whether they

are super-typed into the class relationship with the class

are identifying or descriptor attributes by giving values to attributes

OID

and

D

within the given class. Notice that non-strict hierarchies between dimension associations are also supported by assigning the same value M to both attributes

roleA and roleB in the class relationasoc. Furthermore, completeness is also supported by assigning the value `true' to the attribute completeness within the class

relationasoc.

Regarding the dynamic part, the class

cubeclass along with the class slice

allows us to specify lters in initial user requirements. In addition, grouping conditions are represented by means of the

cubeclass

and

levels.

dice aggregation between the classes

Every issue considered in the UML class diagram of Fig. 3 that expresses a MD feature at the conceptual level is represented in a DTD element. In this way, any MD model accomplished by our approach can be stored in an XML document that should be validated against the DTD.

3

XML DTD and XML Document

As said in the introduction, our goal is to provide a common standard format to store and interchange models accomplished by our OO conceptual approach. To achieve this goal, this section presents the DTD

3 corresponding to the metamodel

schema previously dened. This DTD allows us to structure XML documents that store our MD models together with all the MD expressiveness commented in Section 2.

3.1

XML DTD

The representation of the DTD as a tree structure is illustrated in Fig. 4 for the sake of clearness and comprehension. As it can be observed, we denote every node of the tree with a label. Then, every label has its correspondence with one

2 3

To represent a non-additive measure. The complete denition of the DTD has more than 150 lines and it is not completely shown due to space constraints.

Fig. 4.

The DTD represented as a tree structure.

element in the DTD. Following a top-down path in the tree, we clearly identify all the MD properties supported by our OO conceptual MD model. More precisely, the root of the tree is tagged as

golmodel

and corresponds to the element

while the child nodes of a node represent the subelements of an element in the

4

DTD . To indicate the multiplicity (cardinality) of the subelements, we use the standard modiers provided by the DTD syntax: and

+

?

for 0 or 1,

*

for 0 or more,

for 1 or more. Furthermore, we dene additional tags (in plural form)

in order to group common elements together, so that they can be exploited to provide optimum and correct comprehension of the model, e.g. tags in plural like

factclasses

or

dimclasses.

Within this DTD, fact classes labeled

factclasses

in Fig. 4, may have

no fact attributes to consider fact-less fact tables, as can be observed in the multiplicity (?) for the tag

factatts:




The attributes of the elements are not represented in the tree for the sake of simplicity.

autoResize %Boolean; "true">



In case that fact attributes exist in the fact class, derivation rules can be dened for these attributes by means of

derivationRule

in order to express calculated

measures. Additivity is also supported in terms of the DTD element

additivity

along with the information about how a measure is aggregated along a particular dimension. Notice that many-to-many relationships between facts and dimensions can also be expressed by assigning the same value M to both attributes

roleA

and

roleB

in the DTD element

sharedagg.

With respect to dimensions, the following fragment of the DTD contains elements to express dimensions and classication hierarchies by means of association relationships:





Notice that attributes within these elements in the DTD allow us to express all the MD properties of our model. In this way, non-strictness may be dened by assigning the same value M to both attributes DTD element attribute

relationasoc,

completeness.

3.2

and

roleB

in the

Moreover, identifying and descriptor attributes within

dimensions can be dened using the DTD attributes

dimatt.

roleA

along with completeness by means of the DTD

OID

and

D,

respectively in

XML Document

Once the specication for the DTD is accomplished, XML documents may be generated by our CASE tool from the OO conceptual models, as reected in Fig. 5. Therefore, Fig. 6 shows the visualization of an XML document generated by our CASE tool in a popular web browser, i.e. Microsoft Internet Explorer.

Fig. 5.

Fig. 6.

An XML document generated by our CASE Tool.

An XML document displayed in Microsoft Internet Explorer without XSLT.

Furthermore, this browser brings the possibility to validate an XML docu-

5

ment against our DTD , so all the MD properties of our models are considered.

4

XSLT and the Web

Another relevant issue of our approach was to provide dierent views of the MD models in the web. However, nowadays only some web browsers support XML. In addition to this inconvenience, XML documents only contain data, so that no information about presentation aspects is included. To solve this problem, XSL Formatting Objects

6 (XSL FO) [14] is a recent technology that allows us

to dene the presentation for XML documents, although there is no current browser that supports it. As a consequence, we are currently forced to transform XML documents to HTML pages

7 in order to publish them in the web.

The best method to accomplish this task is the use of XSLT, as a language for transforming XML documents into other XML documents. XSLT stylesheets describe a set of patterns (templates) to match both elements and attributes dened in a DTD, in order to apply specic transformations for each match considered. Thanks to XSLT, the source document can be ltered and reordered in constructing the resulting output. Therefore, our XML documents can be tailored to represent dierent views of the same MD model in two ways: using one XSLT stylesheet for each view or using only one global stylesheet with an input parameter to select the desirable view. As an example, Fig. 7 illustrates the overall transformation process for a MD model composed of two fact classes sharing common dimensions. The MD model is stored in an XML document and an XSLT stylesheet is provided to generate dierent views of the MD model as dierent HTML pages. These views correspond to

fact class 1

and

2

respectively and they only contain relevant

information about each fact class in the model, i.e. dimensions in the MD model not belonging to a particular fact class are not shown in the corresponding view. In our work, we have used two dierent processors: Microsoft XML Core Services (MSXML) 4.0

8 (formerly known as the Microsoft XML Parser) and Instant

9 Saxon 6.5 . Both of them are full compliance with XSLT 1.0. However, Instant 10 features, including xsl:script and Saxon also allows the use of XSLT 1.1

xsl:document. 5 6 7

Therefore, we have considered two approaches: one for version

It is necessary to install the add-in Internet Explorer Tools for Validating XML and Viewing XSLT Output, available from

http://msdn.microsoft.com.

XSL FO describes how XML documents will look when displayed or printed. Actually, the XML documents are transformed to XHTML 1.0. In addition, we also use Cascading Style Sheets (CSS), because it gives us more control over how pages

8 9 10

are displayed. The MSXML 4.0 features are available in Internet Explorer only via scripting. It can

http://msdn.microsoft.com. http://saxon.sourceforge.net.

be downloaded from It is available from

The status of this version is working draft. Besides, XSLT 2.0 is in requirements working draft.

Fig. 7.

Generating dierent views in HTML from the same MD model.

1.0 that generates an only HTML page with internal links, and the other for version 1.1 that has the benet of generating a collection of HTML pages with links between them (the number of pages depends on the number of fact classes and dimension classes dened in the model). Due to space constraints, it is not possible to include the complete denition of the XSLT stylesheet here. Therefore, we only exhibit some fragments of the XSLT. The rst example shows the instructions that generate the HTML code

factatt):

to display information about fact attributes (



Notice that XSLT instructions and HTML tags are intermingled. The XSLT processor copies the HTML tags to the transformed document and interprets any XSLT instruction encountered. Applied to this example, the value of the attributes of the element

factatt

are inserted in the resulting document in

an HTML table; we can notice the attributes commented in Section 2: atomic, derivation rule, OID, and so on. As above commented, XSLT 1.1 allows to create dierent HTML outputs and connect them from an only XML document. The following instructions show how we create the links between the main page, the page that represents the MD model, and the pages that correspond to the fact classes:

Fact class: As we have dened the

@id

attribute of

factclass

as

ID

in the DTD, and

thus, it has a unique value, we use it as the name of the new documents ().

4.1

and the target document in the links

Exploring the model in a web browser

The resulting HTML pages allow us to navigate through the dierent views of the model in a web browser. All the information about the MD properties of the model is represented in the HTML pages. For example, in Fig. 8 we show an example of navigation for one of the views. The rst page is showed in Fig. 8.1 and contains the general description of the model: name, creation date, last modied, and the names of the fact classes and dimension classes, which are active links. Whenever it is possible, there is a link connecting dierent pieces of information. For example, if the fact class

Sales is selected, the page showed in Fig. 8.2

is loaded. This page shows the information about the selected fact class: name, measures, methods, and shared aggregations. In the example, three measures,

inventory, num_ticket

and

qty,

Sales

contains

and the rst one is an active

link because it has additivity rules. If the link is selected, a new oating page is showed (Fig. 8.3) with the additivity rule. Finally, Fig. 8.4 shows the denition of the dimension class

Time: attributes,

methods, association levels and categorization levels. In this example, this dimension class has two association levels:

Month and Week, which are again active

links and allow us to continue exploring the model. This page can be reached from pages in Fig. 8.1, 8.2, and 8.3, because there exists a relationship between them.

5

Conclusions

We have presented how to store MD models accomplished by our OO conceptual approach, based on UML, in XML documents. In this way, we are able to consider the main MD properties from a conceptual level such as many-to-many relationships between facts and dimensions, additivity, non-strictness and so on. To validate these XML documents, we have dened the metamodel that corresponds to our models by means of a DTD. The benet of this approach is

8.1

8.2

8.3

8.4

Fig. 8. Model navigation in the web using a browser.

that we make use of standard technology that is increasingly becoming accepted together with the task of MD modeling by means of our OO conceptual approach. Furthermore, we provide XSLT stylesheets that allow us to generate dierent views on a HTML format of the same MD model. Therefore, management of dierent views for dierent users could be easily achieved.

6

Future Work

Due to the rapid evolution of the used technology, we are considering to adapt our proposal to the new emerging standards. In this sense, our intention is to provide an XML Schema [10] instead of DTDs to represent our metamodel. XML schemas provide a means for dening the semantics of XML documents, not provided by DTDs. For example, we can dene types for the attributes in order to constraint their values. Furthermore, some MD properties like completeness, non-strictness or many-to-many relationships between facts and dimensions could be easily expressed in a more natural way by means of provided by the XML Schema syntax.

minOccurs and maxOccurs clauses

With respect to the presentation, XSL FO can be used to specify in deeper detail the pagination, layout, and styling information that will be applied to XML documents. However, to the best of our knowledge, there are no current tools that completely provide support for XSL FO. From the architecture perspective, we run this process using a client/server technology, i.e. the XSLT stylesheet is applied to the XML document in the server and the HTML is returned to the client browser. In the future, when the browsers completely support XML and XSLT, the transformation will be able to be performed in the browser. This approach has the advantage of removing some of the processing load from the server. Finally, we are currently studying the Common Warehouse Metamodel (CWM) [15] as a common framework to easily interchange warehouse metadata between distributed heterogenous environments. We have detected some semantics of our conceptual model that are not supported by the MD submodel of the CWM. Therefore, a future line of research is to extend the CWM for supporting all the expressiveness of our OO conceptual approach.

References 1. Golfarelli, M., Rizzi, S.:

A methodological Framework for Data Warehouse De-

sign. In: Proc. of the ACM 1st Intl. Workshop on Data warehousing and OLAP (DOLAP'98), Washington D.C., USA (1998) 39 2. Sapia, C., Blaschka, M., Höing, G., Dinter, B.: the Multidimensional Paradigm.

Extending the E/R Model for

In: Proc. of the 1st Intl. Workshop on Data

Warehouse and Data Mining (DWDM'98). Volume 1552 of LNCS., Springer-Verlag (1998) 105116 3. Trujillo, J., Gómez, J., Palomar, M.: Modeling the Behavior of OLAP Applications Using an UML Compilant Approach. In: Proc. of the 1st Intl. Conf. On Advances in Information Systems (ADVIS'00). Volume 1909 of LNCS., Springer-Verlag (2000) 1423 4. Trujillo, J., Palomar, M., Gómez, J., Song, I.Y.: Designing Data Warehouses with OO Conceptual Models. IEEE Computer, special issue on Data Warehouses 34 (2001) 5. Tryfona, N., Busborg, F., Christiansen, J.: starER: A Conceptual Model for Data Warehouse Design. In: Proc. of the ACM 2nd Intl. Workshop on Data warehousing and OLAP (DOLAP'99), Kansas City, Missouri, USA (1999) 6. Abelló, A., Samos, J., Saltor, F.: Benets of an Object-Oriented Multidimensional Data Model. In Dittrich, K., Guerrini, G., Merlo, I., Oliva, M., Rodriguez, E., eds.: Proc. of the Symposium on Objects and Databases in 14th European Conference on Object-Oriented Programming (ECOOP'00). Volume 1944 of LNCS., SpringerVerlag (2000) 141152 7. Object Management Group (OMG): Unied Modeling Language (UML). Internet: http://www.omg.org/cgi-bin/doc?formal/01-09-67 (2001) 8. World Wide Web Consortium (W3C): eXtensible Markup Language (XML) 1.0 (SE). Internet: http://www.w3.org/TR/2000/REC-xml-20001006 (2000) 9. Abiteboul, S., Buneman, P., Suciu, D.:

Data on the Web: From Relations to

Semistructured Data and XML. Kaufman Publishers (2000)

10. World

Wide

Web

Consortium

(W3C):

XML

Schema.

Internet:

http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/ (2001) 11. World Wide Web Consortium (W3C): XSL Transformations (XSLT) Version 1.0. Internet: http://www.w3.org/TR/1999/REC-xslt-19991116 (1999) 12. Giovinazzo, W.: Object-Oriented Data Warehouse Design. Building a star schema. Prentice-Hall, New Jersey, USA (2000) 13. Kimball, R.: The data warehousing toolkit. 2 edn. John Wiley (1996) 14. World Wide Web Consortium (W3C): Extensible Stylesheet Language (XSL) 1.0. Internet: http://www.w3.org/TR/2001/REC-xsl-20011015/ (2001) 15. Object Management Group (OMG):

Common Warehouse Metamodel (CWM).

Internet: http://www.omg.org/cgi-bin/doc?ad/2001-02-01 (2000)