MULTI-DIMENSION QUERY IN HETEROGENEOUS DATABASE
Shi-Ming Huang and Chung-Da Fu*
Department of Information Management
*Department of Computer Science and Engineering
Tatung Institute of Technology, 40 Chungshan N. Road, 3rd Sec., Taipei, 104, Taiwan, R.O.C.
Tel: (886)2-25925252-3291 Fax: (886)2-25925252-2288
E-mail: [email protected]
ABSTRACT
Data warehousing and decision support applications need to aggregate data across many dimensions in search of unusual or useful patterns. Most current database languages, such as SQL, only allow users to retrieve zero-dimensional, one-dimensional, or two-dimensional aggregates. Applications need the N-dimensional generalization of these operators. Unfortunately, current solutions require application programmers to develop complex programs to meet these requirements. Such programs are inflexible and cannot be modified dynamically, which makes the applications unfriendly to users or even causes them to fail. Furthermore, most current multi-dimension query systems support only a single DBMS, whereas many real applications run in a heterogeneous environment. In this paper, we describe an architecture that allows users to issue multi-dimension queries directly against a field-based integrated heterogeneous database system.
I. INTRODUCTION
Most recent database query languages, such as SQL and OSQL, support only one- or two-dimension query methods. This kind of query is suitable only for traditional reporting, not for decision making. A data warehouse or an executive information system needs to query data along different dimensions so that users can make decisions. For example, a sales department manager may need to query the data for the same season by transport, area, type of product, and year in order to make the right decision.

With the traditional approach, solving a multi-dimension query problem usually requires writing a complex program for each goal, and users cannot change the query conditions dynamically. A real-world decision-making process, however, requires a dynamic query environment that supports different points of view on an object. A flexible, high-performance multi-dimension query method for a dynamically changing environment is therefore needed.

As information technology has advanced, the database system model has changed from the hierarchical and network models to the relational model, and even to the object-oriented model. Different database systems have different functional characteristics, and we select different database systems for different needs. Within one enterprise, because of differences in when systems were built, in job requirements, and in operating environment, different departments may run different database systems. The environment is heterogeneous.

Given these conditions, a data warehousing or decision support application for an enterprise needs multi-dimension queries in a heterogeneous environment.

This paper describes a new architecture for a multi-dimension query system in a Heterogeneous Database System (HDBS) environment. Section II describes related work. Section III describes our integration method. Section IV describes the multi-dimension query system. Section V is a case study. Section VI describes our prototype system. Finally, we summarize the paper.

II. RELATED WORKS
The related work for this research covers multi-dimension query and database integration.

A. Multi-Dimension Query
The most common technologies for multi-dimension query are the Data Cube [1][2] and the Star Schema [3][4].

• Data Cube
[Data cube diagram. Zero dimensions: a point (Sum). One dimension: a line over Color (Red, White, Blue). Two dimensions: a face over Color and Manufacture (Ford, Benz). Three dimensions: a cube over Color, Manufacture, and Year (1995, 1996, 1997).]
Figure 1 Data Cube for transaction data for car sales

A data cube can generally represent 3-dimension data. Color and motion can be added to represent 5-dimension data. Figure 1 shows how zero-dimension through three-dimension data are represented for car sales transaction data.
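As an illustration of the zero- to three-dimension aggregates described for the data cube, the following minimal Python sketch sums a measure over every level of grouping. The sample rows, the `units` measure, and the `aggregate` helper are hypothetical and are not taken from the paper; only the dimension names (color, manufacture, year) follow Figure 1.

```python
from collections import defaultdict

# Hypothetical car sales facts: (color, manufacture, year, units).
SALES = [
    ("Red", "Ford", 1995, 10),
    ("Red", "Benz", 1995, 5),
    ("White", "Ford", 1996, 7),
    ("Blue", "Benz", 1997, 3),
    ("Red", "Ford", 1996, 4),
]
DIMENSIONS = ("color", "manufacture", "year")


def aggregate(rows, dims):
    """Sum units grouped by the named dimensions (a tuple of 0 to 3 names)."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


point = aggregate(SALES, ())                        # zero dimensions: one grand total
line = aggregate(SALES, ("color",))                 # one dimension: total per color
face = aggregate(SALES, ("color", "manufacture"))   # two dimensions
cube = aggregate(SALES, DIMENSIONS)                 # three dimensions
```

Each result is a dictionary keyed by a tuple of dimension values, so the empty tuple holds the zero-dimension total and longer tuples address the line, face, and cube cells.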
• Star Schema
A star schema consists of one fact table and several dimension tables. The fact table stores the transaction data. Each dimension table stores the data for one dimension of analysis.

[Star schema diagram. Fact table: Product, Seller, Buyer, Units, Price, Discount, Channel, Office, Date. Dimension hierarchy labels shown in the figure include All, Year, Quarter, Month, Week; All, Division, Group, Unit; All, Cust Type; and All, District, Region, Geography.]
Figure 2 Star Schema for a retailing company

Figure 2 shows the data for a retailing company. It includes one fact table and six dimension tables for transaction analysis.

B. Database Integration
There are several technologies for database integration, such as the Global Schema Approach, the Multidatabase Language Approach, and the Frame Model. They are described below.

• Global Schema Approach
This approach creates a global schema from local external schemas [5]. The global system supports a common data model and a global data language. Multidatabase users view the global schema as the definition of a single database.

• Multidatabase Language Approach
This approach puts most of the integration responsibility on users [6]. Global users are aware of multiple data sources. Instead of providing a global schema, a common name space is defined across all participating DBMSs. Users use the global multidatabase language to define the sources of data and how the data is integrated, transferred, and presented.

• Frame Model
The frame model approach has been used in recent years to couple different existing systems [10]. Frame-based systems collect all information related to one concept in one place. The frame model is a higher-order synthesis that combines frame concepts, semantic data modeling concepts, and object-oriented concepts so that there is no real distinction between “Data” and “Knowledge”. It consists of four classes in a data dictionary: Main, Attribute, Method, and Constraint. It is implemented with a knowledge representation schema that includes object structure descriptions (i.e. classes), user-defined relationships between entities, and structure inheritance descriptions defined by taxonomies of structure that support data and behavior inheritance (i.e. abstract relationships). An expert system frame model is a good example of a knowledge-based model that fulfills the requirements for constructing an integrated EDS [7].

III. A FIELD-BASED INTEGRATED HETEROGENEOUS DATABASE
Most current database integration methods for an HDBS use a table-based integration approach, such as the Global Schema Approach [5] or the Multidatabase Language Approach [6]. These approaches treat each local database system (LDBS) as one unit object for the integration. As a result, system integration becomes very difficult and the global database schema becomes very complicated. Database schema integration technology is therefore needed to ensure that no data is duplicated.
From an application point of view, especially in an executive information system (EIS) or a data warehouse system, users only need to query data from several fields of the local databases, not from a whole database. In this section, we provide a simple integration methodology, called Field-Based Integration, for constructing the global view of an EIS application in a heterogeneous environment.

A. A Field-Based Integration Approach
[Architecture diagram, from top to bottom: the global database view; the heterogeneous database meta-data, made up of the integrated database meta-data and the local database meta-data (field information); the database gateway; and a direct schema translation for each local database system (network, relational, hierarchical, object-oriented).]
Figure 3 Field-Based Integration Approach

The methodology of our approach is as follows:
Step 1: Identify each local database that will be integrated to form the HDBS.
Step 2: Directly translate the field information of each local database into our global schema.
Step 3: Create global fields to integrate the different local fields.
Step 4: Define new global views for each global database application by using the global fields.

B. Local Database Meta-Data
In our approach, the field information is stored in a local database meta-data system. The meta-data system is built on the frame object-oriented data model (FOODM) [7]. FOODM can represent databases of different data models in a single uniform model. We extend the model so that our meta-data system can manage the heterogeneous information. Figure 4 shows an OMT schema of our local database meta-data system. The schema includes the server class, database class, header class, attribute class, method class, and operator class. The Server Class describes the server information. The Database Class describes which server each database belongs to. The Header Class describes the table/class information. The Attribute Class describes the local data field information. The Method Class describes the methods of an object-oriented database. The Operator Class describes the operator information.

[OMT diagram; the links between classes are marked 1+.]
Server Class: Server ID, Server Name, Server Location, Server Type
Database Class: Database ID, Database Name, Server ID
Header Class: Class ID, Primary Key, Class Name, Database ID, Parents ID
Attribute Class: Attribute Name, Class ID, Method Name, Attribute Type, Default Value, Cardinality, Description
Method Class: Method Name, Class ID, Parameters, Return Type, Program Body, Description
Operator Class: Field Type, Operator, Return Type
Figure 4 Local Database Meta-Data

C. Integrated Database Meta-Data
The global view of an application for an HDBS is stored in the integrated database meta-data system.

[OMT diagram; the links between classes are marked 1+.]
Global Table Class: Global Table Name, Global Field Name, Join Field Name, Key Type
Global Field Class: Global Field Name, Local Field Name, Local Class ID, Field Kind (Attribute / Method / Conflict Rule)
Conflict Rule Class: Local Field Name, Local Class ID, Field Type, Rule Body
Figure 5 Integrated Database Meta-Data Schema
Figure 5 shows an OMT schema of our integrated database meta-data system. The meta-data includes the global table class, the global field class, and the conflict rule class. The global table class describes the global table view. The global field class describes the fields that we want to integrate into the global table view. The conflict rule class describes how conflicts between local fields are resolved.

D. Data Conflict Resolution
In our system, the conflict rule class stores the information needed to resolve data conflicts between different local databases. There are several types of data conflict, described below.

1. Value to Value
• Attribute name conflict: the two fields have the same meaning and data type, but their names are different.
• Data type conflict: the two fields have the same meaning, but their data types are different, such as date vs. numeric, date vs. string, or numeric vs. string.
• Semantic conflict (range conflict): for example, one database stores the color as a string, while another stores it as a color number, where color number 1 means white, color number 2 means yellow, color number 3 means black, and so on.

2. Value to Method
A method in OSQL behaves like a field of the class. If its return type is the same as the type of the other field, the rule is like the one for an attribute name conflict. If the return type differs, we must write a rule like the one for a data meaning (semantic) or data type conflict.

3. Value to Derived Value
A derived value is a method defined on the global schema rather than on a local database. The conflict is handled in the same way as a value-to-method conflict.

4. Single Value to Multi Value
This conflict appears only in the OODB. We can use a derived table to extract the contents of the set, and then select the single-valued field from the derived table.

Different conflict rule algorithms have been applied in our approach to solve these conflict problems. The details of these algorithms can be found in [9].

E. Direct Schema Translation
This section describes our direct schema translation methodology. In our approach, we consider only field translation, not semantic translation. All the methodologies share the same Step 1 and Step 2:
Step 1. Map the server information into the server class.
Step 2. Map the database information into the database class.
The following steps differ for each kind of DBMS. There are four methodologies for translating an existing data model into our frame model, described below.

• Relational to Frame Model
Step 3. Map each table into the header class.
Step 4. Map each field of the table into the attribute class.

• Object-Oriented Data Model to Frame Model
Step 3. Map each object into the header class.
Step 4. Map each attribute into the attribute class.
Step 5. Map each method into the method class.
Step 6. Map each operator into the operator class.
• Network Data Model to Frame Model
Step 3. Map each record type into the header class.
Step 4. Map each data item into the attribute class.

• Hierarchical Data Model to Frame Model
Step 3. Map each record type into the header class.
Step 4. Map each field into the attribute class.

IV. MULTI-DIMENSION QUERY SYSTEM
To support multi-dimension queries in our system, a star schema for each application must be stored in our meta-data system.

A. Star Schema Meta-Data
A star schema approach is applied in our system for multi-dimension query. Figure 6 shows the OMT model for the star schema meta-data in our system. The star schema meta-data includes two classes, the Dimension Class and the Fact Class. One fact object can contain several dimension objects.

[OMT diagram; the Fact Class is linked to the Dimension Class with the mark 1+.]
Fact Class: Star Schema Name, Primary Field, Data Field
Dimension Class: Star Schema Name, Dimension Field, Primary Field, Level Number, Description
Figure 6 Star Schema Meta-Data

B. Query Language
We extend SQL to enable users to issue multi-dimension queries. We add X and Y dimension clauses to describe the dimension conditions of the multi-dimension query.

SELECT
    [Alias.] Select_Item [AS Column_Name]
    [, [Alias.] Select_Item [AS Column_Name] ...]
FROM GlobalTableName/StarSchemaName
    [, GlobalTableName [Alias] ...]
[XDIMENSION BY Column_name [ROLLUP/DRILLDOWN] [LEVEL number]
    [, Column_name [ROLLUP/DRILLDOWN] [LEVEL number] ...]]
[YDIMENSION BY Column_name [ROLLUP/DRILLDOWN] [LEVEL number]
    [, Column_name [ROLLUP/DRILLDOWN] [LEVEL number] ...]]
[WHERE condition expression]

The Select_Items are the output fields that are selected. The GlobalTableNames are the source tables of the global schema. The StarSchemaName is the target star schema; only one star schema can be selected. The Column_name of XDIMENSION is a dimension placed on the X dimension of the multi-dimension query. The [ROLLUP/DRILLDOWN] option is the scroll condition: with ROLLUP the scroll condition is up, and with DRILLDOWN it is down. The level number determines the scroll level. YDIMENSION works the same way as XDIMENSION. The condition expression is a boolean expression, such as 'fielda = fieldb'.

C. Result Representation
A grid is used to present the multi-dimension query result. It extends the original two-dimension table representation, in a way similar to the Informix Meta-Cube System. We enhance the row and column captions, which extends the grid from two dimensions to multiple dimensions.
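The idea of extending a two-dimension grid through multi-level captions can be sketched in Python as follows. This is only an illustration under assumptions: the `build_grid` helper, the sample rows, and the choice to treat the X dimension fields as one caption axis and the Y dimension fields as the other are hypothetical and do not describe the prototype's actual implementation.

```python
from collections import defaultdict


def build_grid(rows, x_dims, y_dims, measure):
    """Pivot dict rows into a grid keyed by tuple captions.

    x_dims and y_dims are lists of field names; each caption is the tuple of
    values for those fields, so two fields give a two-level caption.
    """
    cells = defaultdict(int)
    x_caps, y_caps = set(), set()
    for row in rows:
        x = tuple(row[d] for d in x_dims)
        y = tuple(row[d] for d in y_dims)
        x_caps.add(x)
        y_caps.add(y)
        cells[(y, x)] += row[measure]
    return sorted(x_caps), sorted(y_caps), dict(cells)


# Hypothetical transaction rows, not taken from the paper.
ROWS = [
    {"year": 85, "quarter": 1, "color": "White", "size": "S", "units": 10},
    {"year": 85, "quarter": 2, "color": "White", "size": "S", "units": 20},
    {"year": 86, "quarter": 1, "color": "White", "size": "M", "units": 20},
    {"year": 85, "quarter": 1, "color": "White", "size": "S", "units": 5},
]

x_caps, y_caps, cells = build_grid(
    ROWS, ["year", "quarter"], ["color", "size"], "units"
)
```

Because each caption is a tuple, adding a further field to `x_dims` or `y_dims` adds a caption level without changing the two-axis layout of the grid.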
Figure 7 shows the structure of the grid. This example analyzes the transaction data by the year, quarter, color, and size dimensions.

[Grid diagram with two-level captions on both axes, labeled Xdimension Level 1/Level 2 Data and Ydimension Level 1/Level 2 Data. One axis shows captions such as "85,1200" and "86,1680" over "1,120 2,240 3,360 4,480" and "1,240 2,360 3,480 4,600"; the other shows the colors White, Blue, Yellow, and Red over the sizes S, M, and L. The cells hold the data values.]
Figure 7 Output Format

V. A CASE STUDY
Two local databases are to be integrated into an HDBS. One is a relational database (Db2); the other is an object-oriented database (Db1).

The structure of the class in the object-relational database is:
Create class Class1{
    Property:
        ProductId char[10];
        Price int;
        Color integer;
        Size char[10];
    Method:
        Class1 Create_Class1();}

The structure of the table in the relational database is:
create table Table1{
    productid char[10];
    style char[10];
    color char[10];}

Our integration methodology proceeds as follows.

Step 1. Map the server information into the server class.

Server Class
Server ID   Server Name   Server IP        Server Type
1           Server1       140.129.20.252   ORDB
2           Server2       140.129.20.106   RDB

Step 2. Map the database information into the database class.

Database Class
Database ID   Database Name   Server ID
3             Db1             1
4             Db2             2

Step 3.1 Relational to Frame Model
Step 3.1.1 Map each table into the header class.

Header Class
Class ID   Identifier   Class Name   Database ID   Parents ID
6          ProductID    Table1       4             Null

Step 3.1.2 Map each field of the table into the attribute class.

Attribute Class
Attribute Name   Class ID   Method Name   Attribute Type   Default Value   Cardinality   Description
Productid        6          Null          String           Null            1             The Id of product on table1.
Color            6          Null          String           Null            1             The size of product
Style            6          Null          Int              0               1             The price of product

Step 3.2 Object-Oriented to Frame Model
Step 3.2.1 Map each object into the header class.

Header Class
Class ID   Identifier   Class Name   Database ID   Parents ID
5          ProductID    Class1       3             Null

Step 3.2.2 Map each attribute into the attribute class.
Attribute Class
Attribute Name   Class ID   Method Name   Attribute Type   Default Value   Cardinality   Description
Productid        5          Null          String           Null            1             The Id of product on class1.
Color            5          Null          Integer          Null            1             The color number of product
Size             5          Null          String           Null            1             The size of product
Price            5          Null          Int              0               1             The price of product
Color            5          Null          String           Null            1             The color of product

Step 3.2.3 Map each method into the method class.

Method Class
Method Name     Class ID   Parameters   Return Type   Program Body    Description
Create_Class1   5          Void         class1        Create class1   Create class1

Step 3.2.4 Map each operator into the operator class.
Not applicable in this case.

Step 4 Create the Global Fields
Step 4.1 Identify the relations between global fields and local fields.

Global Field Class
Global Field Name   Local Field Name   Local Class ID   Field Kind
Gprice              Price              5                Attribute
Gcolor              Color              5                Attribute
Gsize               Size               5                Attribute
Gcolor              Color              6                Conflict Rule
Gstyle              Style              6                Attribute
GproductId          Productid          5                Attribute
GproductId          ProductId          6                Attribute

Step 4.2 The global database designer must know the conflict situations and create conflict rules to resolve them.

Conflict Rule Class
Local Field Name   Local Class ID   Field Type   Rule Body
Color              6                String       Switch color{ Case 1: return “white” Case 2: return “yellow” Case 3: return “black”}

Step 5 Create a global view for the application.

Global Table Class
Global Table Name   Global Field Name   Join Field Name   Key Type
Gsales              Gcolor              ProductID         Non-Key
Gsales              Gsize               ProductID         Non-Key
Gsales              Gprice              ProductID         Non-Key
Gsales              Gstyle              Productid         Non-Key
Gsales              GproductId          Productid         Primary

With this method, a global view Gsales is created that includes the fields Gcolor, Gsize, Gprice, Gstyle, and GproductId. A global view across the two databases is thereby achieved.

An application needs the following star schema (see Figure 8) to perform multi-dimension queries.

[Star schema diagram.]
Fact Table: Store Key, Product ID, Period Key, Units, Price
Store Dimension: Store Key, City, State, Country
Product Dimension: Product ID, Color, Size, Manufacture
Time Dimension: Period Key, Day, Month, Quarter, Year
Figure 8 Star Schema Example

Our star schema meta-data stores the above star schema as follows.

• Dimension Class
Star Schema Name   Dimension Field   Primary Field   Level Number   Description
Sales_Star         City              Store key       1              The area dimension level 1.
Sales_Star         State             Store key       2              The area dimension level 2.
Sales_Star         Country           Store key       3              The area dimension level 3.
Sales_Star         Day               Period Key      1              The time dimension level 1.
Sales_Star         Month             Period Key      2              The time dimension level 2.
Sales_Star         Quarter           Period Key      3              The time dimension level 3.
Sales_Star         Year              Period Key      4              The time dimension level 4.
Sales_Star         Color             Product ID      1              The product dimension level 1.
Sales_Star         Size              Product ID      2              The product dimension level 2.
Sales_Star         Manufacture       Product ID      3              The product dimension level 3.

• Fact Class
Star Schema Name   Primary Field   Data Field
Sales_Star         Store key       Units
Sales_Star         Store key       Price
Sales_Star         Period Key      Units
Sales_Star         Period Key      Price
Sales_Star         Product ID      Units
Sales_Star         Product ID      Price

Figure 11 System Main Window

After this step, users can use our SQL-like query language to issue multi-dimension queries.
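To make the shape of such a query concrete, the following Python sketch composes a query string that follows the grammar given in Section IV.B. The helper names `dimension_clause` and `build_query` are hypothetical and are not part of the prototype; the example only uses names from the Sales_Star case study (Units, Year, Color), and it makes no claim about how the system interprets the scroll option or level number.

```python
def dimension_clause(keyword, dims):
    """Render an XDIMENSION/YDIMENSION clause from (column, scroll, level) tuples."""
    if not dims:
        return ""
    parts = []
    for column, scroll, level in dims:
        part = column
        if scroll is not None:
            # The grammar only allows these two scroll options.
            if scroll not in ("ROLLUP", "DRILLDOWN"):
                raise ValueError(f"unknown scroll option: {scroll}")
            part += f" {scroll}"
        if level is not None:
            part += f" LEVEL {level}"
        parts.append(part)
    return f" {keyword} BY " + ", ".join(parts)


def build_query(select_items, source, x_dims=(), y_dims=(), where=None):
    """Compose a query string following the grammar of Section IV.B."""
    query = "SELECT " + ", ".join(select_items) + " FROM " + source
    query += dimension_clause("XDIMENSION", x_dims)
    query += dimension_clause("YDIMENSION", y_dims)
    if where:
        query += " WHERE " + where
    return query


query = build_query(
    ["Units"],
    "Sales_Star",
    x_dims=[("Year", "DRILLDOWN", 1)],
    y_dims=[("Color", None, None)],
)
```

Keeping the optional scroll option and level number as `None` when they are omitted mirrors the bracketed, optional parts of the grammar.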
VI. SYSTEM IMPLEMENTATION
To evaluate whether our architecture is workable, we implemented a prototype system coded in Delphi [10]. The test bed integrates an MS-SQL relational DBS [11] and a Uni-SQL object-oriented DBS [12] to form an HDBS. Figure 10 shows the test bed platform.

[Test bed diagram: the Multi-Dimension Query System, written in Delphi, connects through a communication system (labeled RPC and ODBC) to the Uni-SQL object-relational database system and the MS SQL relational database system.]
Figure 10 Testing Bed Platform

The prototype system includes two main functions: a query system and a meta-data maintenance system. The query system enables users to issue multi-dimension queries and simple global SQL queries. The meta-data maintenance system enables users to maintain and design the global database meta-data, e.g. the star schema meta-data, integrated database meta-data, and local database meta-data. Figure 12 shows the multi-dimension query window in our prototype system. Figure 13 shows the result window of the multi-dimension query.

Figure 12 Multi-Dimension Query Window
Figure 13 Query Result Form

VII. SUMMARY
Our approach supports field-based multi-dimension queries in a heterogeneous database environment. The field-based integration method gives us a more flexible way to integrate databases, and the global schema gives us a global view of the whole database environment. The star schema supports multi-dimension queries over the global schema.

Compared with other multidatabase systems (such as UniSQL/M), our approach has the following advantages:
• It resolves the conflict problem.
• It allows multi-dimension queries.
• It is simple and flexible.

ACKNOWLEDGEMENT
The work presented in this paper has been supported by the National Science Council, Taiwan, R.O.C., under Grant No. NSC 88-2213-E-036-005. We deeply appreciate their financial support and encouragement.

References
[1] Harinarayan, V., Rajaraman, A., and Ullman, J.D., “Implementing data cubes efficiently”, Proc. ACM SIGMOD, Montreal, 1996, pp. 205-216.
[2] Jim Gray, Surajit Chaudhuri, Adam Bosworth, Don Reichart, Murali Venkatrao, Frank Pellow, Hamid Pirahesh, “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and Sub-Totals”, Microsoft Research, Advance Technology Division, 1996.
[3] Devlin, B., “Data warehouse: from architecture to implementation”, Addison Wesley Longman, Inc., 1997, pp. 236-238.
[4] Won Kim, “Modern Database Systems: The Object Model, Interoperability, and Beyond”, ACM Press, Addison-Wesley, 1995, ISBN 0-201-59098-0.
[5] Wei, Chih-ping, “Schema management for large-scale multidatabase systems”, Thesis (Ph.D.), University of Arizona, 1996.
[6] Song, Wei, “Schema integration: principles, methods, and applications”, Thesis (Ph.D.), Stockholm University, Department of Computer and Systems Sciences, 1995, pp. 17-26.
[7] Joseph Fong, Shi-Ming Huang, “A Frame Model Approach for Expert and Database Integration”, International Journal of Software Engineering and Knowledge Engineering, Dec 1999.
[8] Shi-Ming Huang, “An object oriented model for Data, Knowledge, and System Reengineering”, Database Reengineering and Interoperability, Plenum Press, New York, 1996.
[9] Chung-Da Fu, “Multi-dimension Query in Heterogeneous Database”, Master Thesis, Information Engineering, Tatung Institute of Technology, 1998.
[10] Microsoft Corporation, “Microsoft SQL Server User’s Guide”, 1995.
[11] Jeff Duntemann, “Delphi Programming Starter Kit”, The Coriolis Group, Inc., 1995.
[12] UniSQL, Inc., “UniSQL User’s Manual”, 1996.