1998multidimension query in heterogeneous

MULTI-DIMENSION QUERY IN HETEROGENEOUS DATABASE

Shi-Ming Huang and Chung-Da Fu*
Department of Information Management
*Department of Computer Science and Engineering
Tatung Institute of Technology, 40 Chungshan N. Road, 3rd sec., Taipei, 104, Taiwan, R.O.C.
Tel: (886)2-25925252-3291  Fax: (886)2-25925252-2288
E-mail: [email protected]

ABSTRACT

Data warehousing and decision support applications need to aggregate data across many dimensions, looking for unusual or useful patterns. Most current database languages, such as SQL, only allow users to retrieve data as zero-dimensional, one-dimensional, or two-dimensional aggregates. Applications need the N-dimensional generalization of these operators. Unfortunately, current solutions require application programmers to develop complex programs to meet these requirements. Such programs are inflexible and cannot be modified dynamically, which makes the applications unfriendly or even causes them to fail. Furthermore, most current multi-dimension query systems support only one DBMS, whereas many real applications run in a heterogeneous environment. In this paper, we describe an architecture that allows users to issue multi-dimension queries directly in a field-based integrated heterogeneous database system.

I. INTRODUCTION

Most recent database query languages, such as SQL and OSQL, support only one- or two-dimension query methods. This kind of query is suitable for traditional reporting, but not for decision making. A data warehousing or executive information system application needs to query data in different dimensions to enable users to make a decision. For example, a sales department manager may need to query the same season's data by transport, area, type of product, and year in order to make the right decision.

With the traditional approach to the multi-dimension query problem, a complex program usually has to be written for each different goal, and users cannot change the query condition dynamically. A decision-making process in the real world, however, requires a dynamic query environment that supports different points of view on an object. A flexible, high-performance multi-dimension querying method for a dynamically changing environment is needed.

As information technology has advanced, the database system model has changed from the hierarchical and network models to the relational data model and even the object-oriented database model. Different database systems have different functional characteristics, and different systems are selected for different needs. Within one enterprise, because of the time at which systems were built, job requirements, and environment, different departments may have different database systems. The environment is heterogeneous.

Considering the above situations, a data warehousing or decision support application for an enterprise needs to have multi-dimension query in a heterogeneous environment.

This paper describes a new architecture for a multi-dimension query system in a Heterogeneous Database System (HDBS) environment. Section II describes related work for this research. Section III describes our integration method. Section IV describes the multi-dimension query system. Section V is a case study. Section VI describes our prototype system. Finally, we give a summary of this paper.

II. RELATED WORKS

The related work for this research includes multi-dimension query and database integration.

A. Multi-Dimension Query

The most common technologies for multi-dimension query are Data Cube [1][2] and Star Schema [3][4].

• Data Cube

(Figure 1 panels: zero-dimension, a point: Sum. One-dimension, a line: Sum by Color (Red, White, Blue). Two-dimension, a face: Color by Manufacture (Ford, Benz). Three-dimension, a cube: Manufacture by Color by Year (1995, 1996, 1997).)

Figure 1 Data Cube for transaction data for car sales

A data cube can generally represent 3-dimension data. Color and motion can be added to represent 5-dimension data. Figure 1 describes the representation of zero-dimension to three-dimension data for the transaction data of car sales.
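The aggregates in Figure 1 can be illustrated with a minimal Python sketch. It assumes a small in-memory list of car sales rows; the row values and the names `SALES`, `aggregate`, and `data_cube` are hypothetical and are not taken from the paper. It is only meant to show how one sum is produced for every subset of the dimensions, from the zero-dimension point up to the three-dimension cube.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical car sales rows: (manufacture, color, year, units sold).
SALES = [
    ("Ford", "Red", 1995, 5),
    ("Ford", "White", 1996, 3),
    ("Benz", "Red", 1995, 2),
    ("Benz", "Blue", 1997, 4),
]
DIMENSIONS = ("manufacture", "color", "year")


def aggregate(rows, dims):
    """Sum units grouped by the named dimensions (no dimensions gives the grand total)."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


def data_cube(rows):
    """Return one aggregate table per subset of dimensions (0-D up to 3-D)."""
    return {
        dims: aggregate(rows, dims)
        for n in range(len(DIMENSIONS) + 1)
        for dims in combinations(DIMENSIONS, n)
    }


cube = data_cube(SALES)
print(cube[()])           # zero-dimension point: the overall sum
print(cube[("color",)])   # one-dimension line: sum by color
```

With three dimensions the sketch produces eight aggregate tables, which is why hand-written programs for each combination quickly become unwieldy.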

• Star Schema

A star schema includes one fact table and several dimension tables. The fact table stores the transaction data. The dimension tables store the dimension data used for analysis.

(Figure 2 labels: Year, Quarter, Month, Week; Division, Group, Unit; Cust Type; Product, Seller, Buyer, Units, Price, Discount; Channel; Office; Date; District, Region, Geography; each hierarchy is topped by All.)

Figure 2 Star Schema for a retailing company

Figure 2 shows the data for a retailing company. It includes one fact table and six dimension tables for the transaction analysis.

B. Database Integration

There are several technologies for database integration, such as the Global Schema Approach, the Multidatabase Language Approach, the Frame Model, etc. They are described as follows.

• Global Schema Approach

This approach creates a global schema from local external schemas [5]. The global system supports a common data model and a global data language. Multidatabase users view the global schema as the definition of a single database.

• Multidatabase Language Approach

This approach puts most of the integration responsibility on users [6]. Global users are aware of multiple data sources. Instead of providing a global schema, a common name space is defined across all participating DBMSs. Users use the global multidatabase language to define the sources of data, and how the data is integrated, transferred and presented.

• Frame Model

The frame model approach has been used in recent years to couple different existing systems [10]. Frame-based systems collect all information related to one concept in one place. The frame model is a higher-order synthesis which includes frame concepts, semantic data modeling concepts and object-oriented concepts to ensure no real distinction between "Data" and "Knowledge". The frame model consists of four classes: Main, Attribute, Method and Constraint in a data dictionary. It is implemented with a knowledge representation schema that includes object structure descriptions (i.e. classes), user-defined relationships between entities, and structure inheritance descriptions defined by taxonomies of structure that support data and behavior inheritance (i.e. abstract relationship). An expert system frame model is a good example of a knowledge based model which fulfills the requirements for constructing an integrated EDS [7].

III. A FIELD-BASED INTEGRATED HETEROGENEOUS DATABASE

Most current database integration methods for an HDBS use a table-based integration approach, such as the Global Schema Approach [5], the Multidatabase Language Approach [6], etc. These approaches treat each LDBS as a single unit for the integration. As a result, system integration becomes very difficult and the global database schema becomes very complicated. As such, database schema integration technology is needed to ensure that no data is duplicated.

From an application point of view, especially in an executive information system (EIS) or a data warehouse system, users only need to query the data from several fields of local databases, not a whole database. In this section, we provide a simple integration methodology, called Field-based Integration, to construct the global view for an EIS application in a heterogeneous environment.

A. A Field-Based Integration Approach

(Figure 3 labels: Integrated Database Meta-Data with the Global Database View; Heterogeneous Database Meta-Data with the Local Database Meta-Data Field Information; Database Gateway; one Direct Schema Translation per local database system: Network, Relational, Hierarchy, Object-oriented.)

Figure 3 Field-Based Integration Approach

The methodology of our approach is described as follows:

Step 1: Identify each local database which will be integrated to form the HDBS.
Step 2: Directly translate all field information of each local database into our global schema.
Step 3: Create global fields to integrate different local fields.
Step 4: Define new global views for each global database application by using global fields.

B. Local Database Meta-Data

In our approach, the field information is stored in a local database meta-data system. The meta-data system is created by using a frame object oriented data model (FOODM) [7]. FOODM can represent databases of different data models in a single uniform model. We extend the model so that our meta-data system can manage the heterogeneous information. Figure 4 shows an OMT schema of our local database meta-data system.

Our local database meta-data schema includes the server class, database class, header class, attribute class, method class, and operator class. The server class describes the server information. The database class describes which server each database belongs to. The header class describes the table/class information. The attribute class describes the local data field information. The method class describes the methods of an object-oriented database. The operator class describes the operator information.

  Server Class:     Server ID, Server Name, Server Location, Server Type
  Database Class:   Database ID, Database Name, Server ID
  Header Class:     Class ID, Primary Key, Class Name, Database ID, Parents ID
  Attribute Class:  Attribute Name, Class ID, Method Name, Attribute Type, Default Value, Cardinality, Description
  Method Class:     Method Name, Class ID, Parameters, Return Type, Program Body, Description
  Operator Class:   Field Type, Operator, Return Type
  (Associations between classes are marked 1+, i.e. one or more, in the diagram.)

Figure 4 Local Database Meta-Data

C. Integrated Database Meta-Data

The global view of an application for an HDBS is stored in the integrated database meta-data system. Figure 5 shows an OMT schema of our integrated database meta-data system. The meta-data includes the global table class, global field class, and conflict rule class. The global table class describes the global table view information. The global field class describes the fields that we want to integrate into the global table view. The conflict rule class describes the conflict resolutions for local fields.

  Global Table Class:   Global Table Name, Global Field Name, Join Field Name, Key Type
  Global Field Class:   Global Field Name, Local Field Name, Local Class ID, Field Kind (Attribute / Method / Conflict Rule)
  Conflict Rule Class:  Local Field Name, Local Class ID, Field Type, Rule Body
  (Associations between classes are marked 1+ in the diagram.)

Figure 5 Integrated Database Meta-Data Schema

D. Data Conflict Resolution

In our system, the conflict rule class stores the information needed to resolve data conflicts between different local databases. There are several types of data conflict. They are described as follows.

1. Value to Value
• Attribute Name Conflict: The two fields have the same meaning and data type, but the field names are different.
• Data Type Conflict: The two fields have the same meaning but different data types, such as date vs. numeric, date vs. string, numeric vs. string, etc.
• Semantic Conflict (Range Conflict): For example, one database uses a string to store the color, while the other uses a color number, such that color number 1 means white, color number 2 means yellow, color number 3 means black, etc.

2. Value to Method
A method in OSQL is just like a field in the class. If the return value is the same as the other field, then the rule is just like the attribute name conflict. If the method return type is not the same as the other field, then we must write a rule like a data meaning or data type conflict rule.

3. Value to Derived Value
The derived value is a method defined on the global schema; it is not actually defined on the local database. The conflict is the same as the value to method conflict.

4. Single Value to Multi Value
This conflict situation appears only in the OODB. We can use a derived table to extract the contents from the set. Then we can select the single-valued field into the derived table.

Different conflict rule algorithms have been applied in our approach to solve these conflict problems. The details of these algorithms can be found in [9].

E. Direct Schema Translation

This section describes our direct schema translation methodology. In our approach, we only consider the field translation, not the semantic translation. All the methodologies share the same Step 1 and Step 2, described as follows.

Step 1. Map the server information into the server class.
Step 2. Map the database information into the database class.

The following steps are different for each kind of DBMS. There are four different methodologies to translate the existing data model into our frame model. They are described as follows.

• Relational to Frame Model
Step 3. Map each table information into the header class.
Step 4. Map each field of table information into the attribute class.

• Object-Oriented Data Model to Frame Model
Step 3. Map each object into the header class.
Step 4. Map each attribute into the attribute class.
Step 5. Map each method into the method class.
Step 6. Map each operator into operator class.
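The relational and object-oriented translations above can be sketched in a few lines of Python. This is only an illustration under assumptions: the dataclasses keep a reduced subset of the meta-data fields, the table and class names and ID values are illustrative inputs, and the function names `translate_relational` and `translate_object` are hypothetical rather than part of the described system. The operator mapping (Step 6) is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Header:
    class_id: int
    class_name: str
    database_id: int
    parents_id: Optional[int] = None


@dataclass
class Attribute:
    attribute_name: str
    class_id: int
    attribute_type: str


@dataclass
class Method:
    method_name: str
    class_id: int
    return_type: str


@dataclass
class MetaData:
    headers: List[Header] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)
    methods: List[Method] = field(default_factory=list)


def translate_relational(meta, class_id, database_id, table, columns):
    """Steps 3 and 4 for a table: one header row, then one attribute row per field."""
    meta.headers.append(Header(class_id, table, database_id))
    for name, type_name in columns:
        meta.attributes.append(Attribute(name, class_id, type_name))


def translate_object(meta, class_id, database_id, cls, attrs, methods):
    """Steps 3 to 5 for an object class: header, attributes, then methods."""
    translate_relational(meta, class_id, database_id, cls, attrs)
    for name, return_type in methods:
        meta.methods.append(Method(name, class_id, return_type))


meta = MetaData()
translate_relational(meta, 6, 4, "Table1",
                     [("productid", "String"), ("style", "String")])
translate_object(meta, 5, 3, "Class1",
                 [("ProductId", "String"), ("Price", "Int")],
                 [("Create_Class1", "Class1")])
print(len(meta.headers), len(meta.attributes), len(meta.methods))
```

Because only field-level information is copied, the object-oriented case can reuse the relational routine and simply add the method rows, which mirrors the shared Steps 3 and 4 in the lists above.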

• Network Data Model to Frame Model
Step 3. Map each record type into the header class.
Step 4. Map each data item information into the attribute class.

• Hierarchy Data Model to Frame Model
Step 3. Map each record type into the header class.
Step 4. Map each field information into the attribute class.

IV. MULTI-DIMENSION QUERY SYSTEM

To do multi-dimension queries in our system, a star schema for each application needs to be stored in our meta-data system.

A. Star Schema Meta-Data

A star schema approach is applied in our system for multi-dimension query. Figure 6 shows the OMT model for the star schema meta-data in our system. The star schema meta-data includes two classes, i.e. Dimension Class and Fact Class. One Fact object can contain several Dimension objects.

  Fact Class:       Star Schema Name, Primary Field, Data Field
  Dimension Class:  Star Schema Name, Dimension Field, Primary Field, Level Number, Description
  (The association from Fact Class to Dimension Class is marked 1+.)

Figure 6 Star Schema Meta-Data

B. Query Language

We enhance the function of SQL to enable users to do multi-dimension query. We add the X/Y dimension columns to describe the dimension condition for the multi-dimension query.

  SELECT [Alias.] Select_Item [AS Column_Name]
      [, [Alias.] Select_Item [AS Column_Name] ...]
  FROM GlobalTableName/StarSchemaName
      [, GlobalTableName [Alias] ...]
  [XDIMENSION BY Column_name [ROLLUP/DRILLDOWN] [LEVEL number]
      [, Column_name [ROLLUP/DRILLDOWN] [LEVEL number]...]]
  [YDIMENSION BY Column_name [ROLLUP/DRILLDOWN] [LEVEL number]
      [, Column_name [ROLLUP/DRILLDOWN] [LEVEL number]...]]
  [WHERE condition expression]

The Select_Items are the output fields that are selected. The GlobalTableNames are the source tables of the global schema. The StarSchemaName is the target star schema; only one star schema can be selected. The Column_name of XDIMENSION is the dimension column used for the X dimension of the multi-dimension query. The [ROLLUP/DRILLDOWN] option is the scroll condition. If the ROLLUP option is selected, the scroll direction is up. If the DRILLDOWN option is selected, the scroll direction is down. The level number determines the scroll level. YDIMENSION works the same way as XDIMENSION. The condition expression is a boolean expression, such as 'fielda = fieldb'.

C. Result Representation

A grid is selected to represent our multi-dimension query result. It enhances the original two-dimension table representation, similar to the Informix MetaCube system. We enhance the row and column caption presentation, which extends the grid from two dimensions to multiple dimensions.

Figure 7 shows the structure of the grid. This example analyzes the transaction data by the year, quarter, color, and size dimensions.

(Figure 7 layout: the X dimension level 1 captions are years with totals, e.g. "85,1200" and "86,1680"; the X dimension level 2 captions are quarters with subtotals, e.g. "1,120", "2,240", "3,360", "4,480"; the Y dimension level 1 captions are colors (White, Blue, Yellow, Red) with totals; the Y dimension level 2 captions are sizes (S, M, L) with subtotals; the cells hold the aggregated values.)

Figure 7 Output Format

V. A CASE STUDY

There are two local databases to be integrated into an HDBS. One is a relational database (i.e. Db2); the other is an object-oriented database (i.e. Db1). The structure of the class in the object-relational database is:

  Create class Class1{
    Property:
      ProductId char[10];
      Price int;
      Color integer;
      Size char[10];
    Method:
      Class1 Create_Class1();}

The structure of the table in the relational database is:

  create table Table1{
    productid char[10];
    style char[10];
    color char[10];}

Our integration methodology can be described as follows.

Step 1. Map the server information into the server class.

  Server Class
  Server ID   Server Name   Server IP        Server Type
  1           Server1       140.129.20.252   ORDB
  2           Server2       140.129.20.106   RDB

Step 2. Map the database information into the database class.

  Database Class
  Database ID   Database Name   Server ID
  3             Db1             1
  4             Db2             2

Step 3.1 Relational to Frame Model

Step 3.1.1. Map each table information into the header class.

  Header Class
  Class ID   Identifier   Class Name   Database ID   Parents ID
  6          ProductID    Table1       4             Null

Step 3.1.2. Map each field of table information into the attribute class.

  Attribute Class
  Attribute Name   Class ID   Method Name   Attribute Type   Default Value   Cardinality   Description
  Productid        6          Null          String           Null            1             The Id of product on table1.
  Color            6          Null          String           Null            1             The size of product
  Style            6          Null          Int              0               1             The price of product

Step 3.2 Object Oriented to Frame Model

Step 3.2.1 Map each object into the header class.

  Header Class
  Class ID   Identifier   Class Name   Database ID   Parents ID
  5          ProductID    Class1       3             Null

Step 3.2.2 Map each attribute into the attribute class.

  Attribute Class
  Attribute Name   Class ID   Method Name   Attribute Type   Default Value   Cardinality   Description
  Productid        5          Null          String           Null            1             The Id of product on class1.
  Color            5          Null          Integer          Null            1             The color number of product
  Size             5          Null          String           Null            1             The size of product
  Price            5          Null          Int              0               1             The price of product
  Color            5          Null          String           Null            1             The color of product

Step 3.2.3 Map each method into the method class.

  Method Class
  Method Name     Class ID   Parameters   Return Type   Program Body    Description
  Create_Class1   5          Void         Class1        Create class1   Create class1

Step 3.2.4 Map each operator into operator class.

Not applicable in this case.

Step 4 Create the Global Fields

Step 4.1 Identify the relation between global fields and local fields.

  Global Field Class
  Global Field Name   Local Field Name   Local Class ID   Field Kind
  Gprice              Price              5                Attribute
  Gcolor              Color              5                Attribute
  Gsize               Size               5                Attribute
  Gcolor              Color              6                Conflict Rule
  Gstyle              Style              6                Attribute
  GproductId          Productid          5                Attribute
  GproductId          ProductId          6                Attribute

Step 4.2 The global database designer must know the conflict situation. He/She must create conflict rules to solve these problems.

  Conflict Rule Class
  Local Field Name   Local Class ID   Field Type   Rule Body
  Color              6                String       Switch color{ Case 1: return "white" Case 2: return "yellow" Case 3: return "black"}

Step 5 Create a global view for the application.

  Global Table Class
  Global Table Name   Global Field Name   Join Field Name   Key Type
  Gsales              Gcolor              ProductID         Non-Key
  Gsales              Gsize               ProductID         Non-Key
  Gsales              Gprice              ProductID         Non-Key
  Gsales              Gstyle              Productid         Non-Key
  Gsales              GproductId          Productid         Primary

Using our method, a global view Gsales, which includes the fields Gcolor, Gsize, Gprice, Gstyle and GproductId, is created. A global view over the two databases is therefore achieved.

There is an application which needs the following star schema (see Figure 8) to do multi-dimension query.

  Store Dimension:    Store Key, City, State, Country
  Time Dimension:     Period Key, Day, Month, Quarter, Year
  Product Dimension:  Product ID, Color, Size, Manufacture
  Fact Table:         Store Key, Product ID, Period Key, Units, Price

Figure 8 Star Schema Example

Our star schema meta-data can store the above star schema as follows.

• Dimension Class

  Star Schema Name   Dimension Field   Primary Field   Level Number   Description
  Sales_Star         City              Store key       1              The area dimension level 1.
  Sales_Star         State             Store key       2              The area dimension level 2.
  Sales_Star         Country           Store key       3              The area dimension level 3.
  Sales_Star         Day               Period Key      1              The time dimension level 1.
  Sales_Star         Month             Period Key      2              The time dimension level 2.
  Sales_Star         Quarter           Period Key      3              The time dimension level 3.
  Sales_Star         Year              Period Key      4              The time dimension level 4.
  Sales_Star         Color             Product ID      1              The product dimension level 1.
  Sales_Star         Size              Product ID      2              The product dimension level 2.
  Sales_Star         Manufacture       Product ID      3              The product dimension level 3.

• Fact Class

  Star Schema Name   Primary Field   Data Field
  Sales_Star         Store key       Units
  Sales_Star         Store key       Price
  Sales_Star         Period Key      Units
  Sales_Star         Period Key      Price
  Sales_Star         Product ID      Units
  Sales_Star         Product ID      Price
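Given such a star schema, an X/Y dimension query amounts to grouping fact rows by the chosen dimension columns and summing a data field into a grid. The following Python sketch is a simplified illustration and not the algorithm of the prototype, which the paper does not spell out. The rows, the function name `cross_tab`, and the treatment of roll-up as "group by the coarser column only" are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical fact rows, already joined with their dimension fields.
ROWS = [
    {"Year": 1995, "Quarter": 1, "Color": "White", "Size": "S", "Units": 10},
    {"Year": 1995, "Quarter": 1, "Color": "White", "Size": "S", "Units": 5},
    {"Year": 1995, "Quarter": 2, "Color": "Red", "Size": "M", "Units": 20},
    {"Year": 1996, "Quarter": 1, "Color": "White", "Size": "S", "Units": 30},
]


def cross_tab(rows, x_dims, y_dims, measure):
    """Sum `measure` into a grid keyed by (Y caption tuple, X caption tuple)."""
    grid = defaultdict(int)
    for row in rows:
        x_key = tuple(row[d] for d in x_dims)
        y_key = tuple(row[d] for d in y_dims)
        grid[(y_key, x_key)] += row[measure]
    return dict(grid)


# Two-level captions on both axes, as in the grid of Figure 7.
detail = cross_tab(ROWS, ["Year", "Quarter"], ["Color", "Size"], "Units")

# A rolled-up view that keeps only the coarser column on each axis.
rolled = cross_tab(ROWS, ["Year"], ["Color"], "Units")

for (y_key, x_key), units in sorted(rolled.items()):
    print(y_key, x_key, units)
```

Changing the dimension lists is all that is needed to view the same facts from another angle, which is the kind of dynamic change of query condition the language is designed to allow.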

After this step, users can use our SQL-like query language to make the multi-dimension query.

VI. SYSTEM IMPLEMENTATION

To evaluate whether our architecture is workable, we implemented a prototype system, which is coded in Delphi [10]. The test bed of the system integrates an MS-SQL relational DBS [11] and a Uni-SQL object-oriented DBS [12] to form an HDBS. Figure 10 shows the test bed platform.

(Figure 10 labels: Multi-Dimension Query System, DELPHI, Communication System, RPC, ODBC, Uni-SQL Object-Relational Database System, MS SQL Relational Database System.)

Figure 10 Testing Bed Platform

The prototype system includes two main functions, i.e. a query system and a meta-data maintenance system. The query system enables users to do multi-dimension query and simple global SQL query. The meta-data maintenance system enables users to maintain and design the global database meta-data, e.g. star schema meta-data, integrated database meta-data, and local database meta-data. Figure 12 shows the multi-dimension query window in our prototype system. Figure 13 shows the result window of the multi-dimension query.

Figure 11 System Main Window

Figure 12 Multi-Dimension Query Window

Figure 13 Query Result Form

VII. SUMMARY

Our approach supports field-based multi-dimension query in a heterogeneous database environment. The field-based integration method gives us a more flexible integration method, and the global schema gives us a global view of the whole database environment. The star schema supports multi-dimension query on the global schema.

Compared with other multi-database systems (such as UniSQL/M), our approach has the following advantages:

• It resolves the conflict problem.
• It allows multi-dimension query.
• It is simple and flexible.

ACKNOWLEDGEMENT

The work presented in this paper has been supported by the National Science Council, Taiwan, R.O.C., under Grant No. NSC 88-2213-E-036-005. We deeply appreciate their financial support and encouragement.

References

[1] Harinarayan, V., Rajaraman, A., and Ullman, J.D., "Implementing data cubes efficiently", 1996, Proc. ACM SIGMOD, Montreal, pp. 205-216.
[2] Jim Gray, Surajit Chaudhuri, Adam Bosworth, Don Reichart, Murali Venkatrao, Frank Pellow, Hamid Pirahesh, 1996, "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and Sub-Totals", Microsoft Research, Advance Technology Division.
[3] Devlin, B., "Data warehouse: from architecture to implementation", Addison Wesley Longman, Inc., 1997, pp. 236-238.
[4] Won Kim, 1995, "Modern Database Systems: The Object Model, Interoperability, and Beyond", ACM Press, Addison-Wesley (pub), ISBN 0-201-59098-0.
[5] Wei, Chih-ping, "Schema management for large-scale multidatabase systems", Thesis (Ph.D.), University of Arizona, 1996.
[6] Song, Wei, "Schema integration: principles, methods, and applications", Thesis (Ph.D.), Stockholm University, Department of Computer and Systems Sciences, 1995, pp. 17-26.
[7] Joseph Fong, Shi-Ming Huang, "A Frame Model Approach for Expert and Database Integration", International Journal of Software Engineering and Knowledge Engineering, Dec 1999.
[8] Shi-Ming Huang, "An object oriented model for Data, Knowledge, and System Reengineering", Database Reengineering and Interoperability, Plenum Press, New York, 1996.
[9] Chung-Da Fu, "Multi-dimension Query in Heterogeneous Database", Master Thesis, Information Engineering, Tatung Institute of Technology, 1998.
[10] Microsoft Corporation, "Microsoft SQL Server User's Guide", 1995.
[11] Jef Duntemann, "Delphi Programming Starter Kit", The Coriolis Group, Inc., 1995.
[12] UniSQL, Inc., "UniSQL User's Manual", 1996.
