Handbook of Research on Fuzzy Information Processing in Databases

Handbook of Research on Fuzzy Information Processing in Databases José Galindo University of Málaga, Spain

Volume I

Information science reference Hershey • New York

Acquisitions Editor: Kristin Klinger
Development Editor: Kristin Roth
Senior Managing Editor: Jennifer Neidig
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Copy Editor: April Schmidt, Shanelle Ramelb
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.

Published in the United States of America by Information Science Reference (an imprint of IGI Global), 701 E. Chocolate Avenue, Suite 200, Hershey PA 17033. Tel: 717-533-8845. Fax: 717-533-8661. E-mail: [email protected] Web site: http://www.igi-global.com

and in the United Kingdom by Information Science Reference (an imprint of IGI Global), 3 Henrietta Street, Covent Garden, London WC2E 8LU. Tel: 44 20 7240 0856. Fax: 44 20 7379 0609. Web site: http://www.eurospanbookstore.com

Copyright © 2008 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher. Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Handbook of research on fuzzy information processing in databases / Jose Galindo, editor. p. cm.
Summary: "This book provides comprehensive coverage and definitions of the most important issues, concepts, trends, and technologies in fuzzy topics applied to databases, discussing current investigation into uncertainty and imprecision management by means of fuzzy sets and fuzzy logic in the field of databases and data mining. It offers a guide to fuzzy information processing in databases"--Provided by publisher.
Includes bibliographical references and index.
ISBN-13: 978-1-59904-853-6 (hardcover)
ISBN-13: 978-1-59904-854-3 (ebook)
1. Databases--Handbooks, manuals, etc. 2. Data mining--Handbooks, manuals, etc. 3. Fuzzy mathematics--Handbooks, manuals, etc. I. Galindo, Jose, 1970
QA76.9.D32H336 2008
005.74--dc22
2007037381

British Cataloguing in Publication Data A Cataloguing in Publication record for this book is available from the British Library. All work contributed to this book set is original material. The views expressed in this book are those of the authors, but not necessarily of the publisher. If a library purchased a print copy of this publication, please go to http://www.igi-global.com/agreement for information on activating the library's complimentary electronic access to this publication.


Chapter II

An Overview of Fuzzy Approaches to Flexible Database Querying Sławomir Zadrożny Polish Academy of Sciences, Poland Guy de Tré Ghent University, Belgium Rita de Caluwe Ghent University, Belgium Janusz Kacprzyk Polish Academy of Sciences, Poland

Abstract In reality, a lot of information is available only in an imperfect form. This might be due to imprecision, vagueness, uncertainty, incompleteness, or ambiguities. Traditional database systems can only adequately cope with perfect data. Among others, fuzzy set theory has been applied to deal with imperfections of data in a more natural way and to enhance the accessibility of databases. In this chapter, we give an overview of main trends in the research on flexible querying techniques that are based on fuzzy set theory. Both querying techniques for traditional databases as well as querying techniques for fuzzy databases are described. The discussion comprises both the relational and the object-oriented database modeling approaches.

Introduction Databases are a very important component in computer systems. Because of their increasing number and volume, good and accurate accessibility to a database becomes even more important. A lot of research has already been done to improve

database access. In this research, many aspects have been dealt with, among which we mention file organization, indexing, querying techniques, query languages, and other data access techniques. In this chapter, we give an overview of the main research results on the development of flexible querying techniques that are based on fuzzy



set theory (Zadeh, 1965) and its related possibility theory (Dubois & Prade, 1988; Zadeh, 1978). The scope of the chapter is further limited to an overview of those techniques that aim to enhance database querying by introducing fuzzy preferences (Bosc, Kraft, & Petry, 2005). Other techniques not dealt with in this chapter include:

• Self-correcting querying systems that can correct syntactic and semantic errors in query formulations.
• Navigational querying systems that allow intelligent navigation through the database.
• Cooperative querying systems that support “indirect” answers like summaries, conditional answers, and contextual background information for (empty) results (Gaasterland, Godfrey, & Minker, 1992).

We will assume a simplified view of the database query as a combination of a number of conditions that are to be met by the data sought. The introduction of fuzzy preferences in queries can be done at two levels: inside query conditions and between query conditions. Fuzzy preferences are introduced inside query conditions via flexible search criteria and allow one to express, in a gradual way, that some values are more desirable than others. Fuzzy preferences between query conditions are expressed via grades of importance assigned to particular query conditions, indicating that the satisfaction of some query conditions is more desirable than the satisfaction of others. Because of the use of fuzzy preferences and the central role of fuzzy set theory, the flexible querying approaches dealt with in this chapter will be called fuzzy querying in the remainder of the chapter.

The research on fuzzy querying already has a long history. It has been inspired by the success of fuzzy logic in modeling natural language propositions. The use of such propositions in queries, in turn, seems to be very natural for human users of any information system, notably the database management system. Later on, the interest in fuzzy querying has been reinforced by the omnipresence of network-based applications, related

to buzzwords of modern information technology, such as e-commerce, e-government, and so forth. These applications evidently call for a flexible querying capability when users are looking for some goods, hotel accommodations, and so forth, that may be best described using natural language terms like cheap, large, close to the airport, and so on. Another amplification of the interest in fuzzy querying comes from developments in the area of data warehousing and data mining related applications. For example, a combination of fuzzy querying and data mining interfaces (Kacprzyk & Zadrożny, 2000a, 2000b) or fuzzy logic and the OLAP (Online Analytical Processing) technology (Laurent, 2003) may lead to new, effective, and more efficient solutions in this area.

The remainder of the chapter is organized as follows. In the next section, some preliminaries are presented. In the Fuzzy Querying of Crisp Relational Databases section, the results on fuzzy querying in classical relational databases are presented, while the Fuzzy Querying of Fuzzy Relational Databases and Object-Oriented Approaches sections deal with the same issues for the fuzzy and object-oriented cases, respectively. Finally, some concluding remarks are given.

Other chapters in this volume also deal with some particular cases of fuzzy querying. Among the most relevant ones, we want to mention here the chapters written by:

• Thomopoulos, Buche, and Haemmerlé, who describe flexible querying with hierarchical fuzzy sets.
• Dubois and Prade, who handle bipolar queries.
• Takači and Škrbić, who deal with introducing priorities in fuzzy queries.
• Barranco, Campaña, and Medina, who write about a fuzzy object-relational database model and some strategies for fuzzy queries in this model.
• De Tré, Demoor, Callens, and Gosseye, who present some flexible querying techniques that are based on case-based reasoning.


Preliminaries

In order to review and discuss main contributions to the research area of fuzzy querying, we have to introduce the terminology and notation related to the basics of database management and fuzzy logic. A relational database may be meant in an abstract sense as a collection of relations or, informally, of tables (Codd, 1970) which represent them, comprising rows and columns. Each relation R, or relational variable R (Date, 2004), is defined via the relation schema:

\[ R(A_1 : Dom(A_1), A_2 : Dom(A_2), \ldots, A_n : Dom(A_n)) \tag{1} \]

where the $A_i$'s are the names of attributes (columns) and the $Dom(A_i)$'s are their associated domains. Each relation (table) represents a class of objects (meant as in common parlance rather than in the object-oriented paradigm) essential for a part of the real world modeled by a given database. A tuple (row) of such a relation represents a particular object of such a class.

The most interesting operation on a database, from this chapter’s perspective, is the retrieval of data satisfying certain conditions. Usually, to retrieve data, a user forms a query specifying these conditions (criteria). The retrieval process may be understood as the calculation of a matching degree for each tuple of the relevant relation(s). Classically, a row either matches the query or not; that is, the concept of matching is binary. In the context of flexible criteria, a degree of matching is considered.

Usually two general formal approaches to querying are assumed: the relational algebra and the relational calculus. The former has a procedural character: a query consists here of a sequence of operations on relations that finally yield the requested data. These operations comprise five basic ones (union, difference, projection, selection, and cross product) that may be combined to obtain some derived operations such as intersection, division, and join. The latter approach, known in two flavors as the tuple relational calculus (TRC) or the domain relational calculus (DRC), is of a more declarative nature. Here a query just describes what information is requested, but how it is to be retrieved from a database is left to the database management system.

The exact form of queries is not of utmost importance for our considerations, as we focus on the condition part of queries. However, some reported research in this area directly employs the de facto standard querying language for relational databases, that is, SQL (Structured Query Language) (cf. Melton & Simon, 2002; Ramakrishnan & Gehrke, 2000). Thus, we will also sometimes refer to the SELECT instruction of this language and its WHERE clause, where query conditions are specified.

We will use the following concepts and notation concerning fuzzy logic. A fuzzy set FS in the universe U is characterized by a membership function:

\[ \mu_{FS} : U \to [0,1] \tag{2} \]

For each element $x \in U$, $\mu_{FS}(x)$ denotes the membership grade or extent to which x belongs to FS. On the one hand, fuzzy sets make it possible to represent vague concepts like “tall man” in an appropriate way, taking into account the graduality of such a concept. On the other hand, a fuzzy set that is interpreted as a possibility distribution can be used to represent the uncertainty about the value of a variable, for example, representing the height of a man (Dubois & Prade, 1988; Zadeh, 1978). Possibility distributions are denoted by $\pi$. The notation $\pi_X$ is often used to indicate that the distribution concerns the variable X:

\[ \pi_X : U \to [0,1] \tag{3} \]

where X takes values from a universe U.

Possibility and necessity measures can provide for the quantification of such an uncertainty. These measures are denoted by $\Pi$ and $N$, respectively, that is:

\[ \Pi : \tilde{\wp}(U) \to [0,1] \quad \text{and} \quad N : \tilde{\wp}(U) \to [0,1] \tag{4} \]

where $\tilde{\wp}(U)$ stands for the family of fuzzy sets defined over U. Assuming that all we know about the value of a variable X is a possibility distribution $\pi_X$, these measures, for a given fuzzy set FS, assess to what extent it is possible ($\Pi$) or sure ($N$) that the value of X belongs to FS. More precisely, if $\pi_X$ is the underlying possibility distribution, then:

\[ \Pi_X(FS) = \sup_{u \in U} \min(\pi_X(u), \mu_{FS}(u)) \tag{5} \]

\[ N_X(FS) = \inf_{u \in U} \max(1 - \pi_X(u), \mu_{FS}(u)) \tag{6} \]
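The two measures can be illustrated on a finite universe, where the supremum and infimum reduce to a maximum and a minimum. The following is only a minimal sketch: the discretized universe of heights, the fuzzy set "tall", and the distribution "about 180 cm" are hypothetical choices made for this example and do not come from the chapter.

```python
# Sketch of Equations (5) and (6) on a finite universe.
# All concrete shapes and numbers below are illustrative assumptions.

def possibility(pi_x, mu_fs, universe):
    """Equation (5): max over u of min(pi_X(u), mu_FS(u))."""
    return max(min(pi_x(u), mu_fs(u)) for u in universe)


def necessity(pi_x, mu_fs, universe):
    """Equation (6): min over u of max(1 - pi_X(u), mu_FS(u))."""
    return min(max(1.0 - pi_x(u), mu_fs(u)) for u in universe)


# Hypothetical universe: heights in whole centimetres.
universe = range(150, 211)


def mu_tall(u):
    """Hypothetical fuzzy set 'tall': 0 up to 170 cm, 1 from 185 cm, linear in between."""
    return min(1.0, max(0.0, (u - 170) / 15))


def pi_height(u):
    """Hypothetical possibility distribution 'about 180 cm' (triangular, spread 10 cm)."""
    return max(0.0, 1.0 - abs(u - 180) / 10)


poss = possibility(pi_height, mu_tall, universe)
nec = necessity(pi_height, mu_tall, universe)
print(f"possibility = {poss:.3f}, necessity = {nec:.3f}")
```

Because the distribution used here is normalized (it reaches 1 for some element), the necessity degree does not exceed the possibility degree, so the two values delimit an interval.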

Sometimes, the interval $[N_X(FS), \Pi_X(FS)]$ is used as an estimate of the possibility that the actual value of X comes from FS.

The possibility (necessity) that two variables X and Y, the values of which are given by possibility distributions $\pi_X$ and $\pi_Y$, are in relation θ, for example, equality, is computed as follows. The joint possibility distribution, $\pi_{XY}$, of X and Y on U × U (assuming non-interactivity of the variables) is given by:

\[ \pi_{XY}(u, w) = \min(\pi_X(u), \pi_Y(w)) \tag{7} \]

Relation θ may be fuzzy and represented by a fuzzy set $FS \in \tilde{\wp}(U \times U)$ such that $\mu_{FS}(u, w) = \mu_\theta(u, w)$. The possibility (resp. necessity) measure associated with $\pi_{XY}$ will be denoted by $\Pi_{XY}$ (resp. $N_{XY}$). Then, we calculate the measures of the variables in relation θ as follows:

\[ \text{Possibility}(X \,\theta\, Y) = \Pi_{XY}(FS) = \sup_{u, w \in U} \min(\pi_X(u), \pi_Y(w), \mu_\theta(u, w)) \tag{8} \]

\[ \text{Necessity}(X \,\theta\, Y) = N_{XY}(FS) = \inf_{u, w \in U} \max(1 - \pi_X(u), 1 - \pi_Y(w), \mu_\theta(u, w)) \tag{9} \]

Knowing the possibility distributions of two variables X and Y, one may be interested in how similar these distributions are to each other. Obviously, Equations (8) and (9) may provide some assessment of this similarity, but other indices of similarity are also applicable. This leads to a distinction, proposed by Bosc, Duval, and Pivert (2000), between representation-based and value-based comparisons of possibility distributions. We will discuss this later in the Fuzzy Querying of Fuzzy Relational Databases section.

As an alternative to possibility and necessity measures, extended possibilistic truth values (EPTVs) can be used to quantify uncertainty (de Tré, 2002). An EPTV is defined as a possibility distribution in the universe $I^* = \{T, F, \bot\}$ that consists of the three truth values T (true), F (false), and ⊥ (undefined), that is:

\[ \tilde{t}^* : P \to \tilde{\wp}(I^*) \tag{10} \]

where P denotes the universe of all propositions. In general, the EPTV $\tilde{t}^*(p)$ of a proposition $p \in P$ has the following format:

\[ \tilde{t}^*(p) = \{(T, \mu_{\tilde{t}^*(p)}(T)), (F, \mu_{\tilde{t}^*(p)}(F)), (\bot, \mu_{\tilde{t}^*(p)}(\bot))\} \tag{11} \]

Hereby $\mu_{\tilde{t}^*(p)}(T)$ denotes the possibility that p is true, $\mu_{\tilde{t}^*(p)}(F)$ is the possibility that p is false, and $\mu_{\tilde{t}^*(p)}(\bot)$ is the possibility that some elements of p are not applicable, undefined, or not supplied. EPTVs extend the approach of possibility and necessity measures with an explicit facility to deal with the inapplicability of information as can, for example, occur with the evaluation of query conditions.

In Table 1, some special cases of EPTVs are presented. These cases are verified as follows:

• If it is completely possible that the proposition is true and no other truth values are possible, then it means that the proposition is true.
• If it is completely possible that the proposition is false and no other truth values are possible, then it means that the proposition is false.
• If it is completely possible that the proposition is true, it is completely possible that the proposition is false, and it is not possible that the proposition is inapplicable, then it means that the proposition is applicable, but unknown. This truth value will be called, in short, unknown.
• If it is completely possible that the proposition is inapplicable and no other truth values are possible, then it means that the proposition is inapplicable.
• If all truth values are completely possible, then this means that no information about the truth of the proposition is available. The proposition might be inapplicable, but might also be true or false. This truth value will be called, in short, unavailable.

Table 1. Special EPTVs

| $\tilde{t}^*(p)$ | Interpretation |
|---|---|
| {(T,1)} | p is true |
| {(F,1)} | p is false |
| {(T,1), (F,1)} | p is unknown |
| {(⊥,1)} | p is inapplicable |
| {(T,1), (F,1), (⊥,1)} | Information about p is not available |

Assume again that all we know about the value of a variable X is a possibility distribution $\pi_X$, defined over a universe U. Then the EPTV of the proposition “X is FS,” which expresses to which extent the value of X is compatible with the value represented by a given fuzzy set FS in U, can be calculated by:

\[ \mu_{\tilde{t}^*(\text{'X is FS'})}(T) = \sup_{u \in U} \min(\pi_X(u), \mu_{FS}(u)) \tag{12} \]

\[ \mu_{\tilde{t}^*(\text{'X is FS'})}(F) = \sup_{u \in U \setminus \{\bot_U\}} \min(\pi_X(u), 1 - \mu_{FS}(u)) \tag{13} \]

\[ \mu_{\tilde{t}^*(\text{'X is FS'})}(\bot) = \min(\pi_X(\bot_U), 1 - \mu_{FS}(\bot_U)) \tag{14} \]

where $\bot_U$ represents a special “undefined” element of U that is used to model cases where a regular element of U is not applicable (cf. Prade & Testemale, 1984).
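As a small illustration of Equations (12) through (14), the sketch below computes the three components of an EPTV over a finite universe. The encoding of the undefined element as `None`, the dictionary keys, the fuzzy set "young", and the possibility distribution of the age are all assumptions made for this example only.

```python
# Sketch of Equations (12)-(14); every concrete value here is hypothetical.

BOTTOM = None  # assumed encoding of the special "undefined" element of U


def eptv_x_is_fs(pi_x, mu_fs, regular_universe):
    """Return the EPTV of 'X is FS' as a dict with keys 'T', 'F', 'bottom'.

    pi_x and mu_fs must accept every regular element and also BOTTOM.
    """
    regular = list(regular_universe)
    full = regular + [BOTTOM]
    # Equation (12): supremum over the whole universe, BOTTOM included.
    true_deg = max(min(pi_x(u), mu_fs(u)) for u in full)
    # Equation (13): supremum over the regular elements only.
    false_deg = max(min(pi_x(u), 1.0 - mu_fs(u)) for u in regular)
    # Equation (14): only the undefined element matters.
    undef_deg = min(pi_x(BOTTOM), 1.0 - mu_fs(BOTTOM))
    return {"T": true_deg, "F": false_deg, "bottom": undef_deg}


# Hypothetical knowledge about an age: 30 is fully possible, 35 is possible
# to degree 0.6, and it is possible to degree 0.2 that age is inapplicable.
_PI_AGE = {30: 1.0, 35: 0.6, BOTTOM: 0.2}


def pi_age(u):
    return _PI_AGE.get(u, 0.0)


def mu_young(u):
    """Hypothetical fuzzy set 'young': 1 up to 25, 0 from 40, linear in between."""
    if u is BOTTOM:
        return 0.0
    return min(1.0, max(0.0, (40 - u) / 15))


result = eptv_x_is_fs(pi_age, mu_young, range(0, 101))
print(result)
```

Keeping the undefined element outside the regular universe makes the different ranges of the suprema in Equations (12) and (13) explicit in the code.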

Fuzzy Querying of Crisp Relational Databases

In this case, a classical crisp relational database is assumed, while queries are allowed to contain natural language terms in their conditions. The main lines of research include the study of the idea of modeling linguistic terms in queries using elements of fuzzy logic (Tahani, 1977); enhancements of the fuzzy query formalism with flexible aggregation operators (Bosc & Pivert, 1993; Dubois & Prade, 1997; Kacprzyk, Zadrożny, & Ziółkowski, 1989; Kacprzyk & Ziółkowski, 1986); and practical problems with embedding fuzzy constructs in the syntax of the standard SQL (Bosc, 1999; Bosc & Pivert, 1992a, 1992b, 1995; de Tré, Verstraete, Hallez, Matthé, & de Caluwe, 2006; Galindo, Medina, Pons, & Cubero, 1998; Galindo, Urrutia, & Piattini, 2006; Kacprzyk & Zadrożny, 1995; Umano & Fukami, 1994).

Fuzzy Preferences Inside Query Conditions

Tahani (1977) was the first to propose the use of fuzzy logic to improve the flexibility of crisp database queries. He proposed a formal approach and architecture to deal with simple fuzzy queries. His query language is based on SQL. Tahani proposed to use vague terms typical for natural language, for example, “high” and “young” in “WHERE salary = HIGH AND age = YOUNG.” The semantics of these vague terms is provided by appropriate fuzzy sets. The matching degree, $\gamma$, for such extended queries is calculated as follows. For a tuple t and a simple (elementary) condition Q of type A = l, where A is an attribute (e.g., “age”) and l is a linguistic (fuzzy) term (e.g., “YOUNG”), the value of the function $\gamma$ is:

\[ \gamma(Q, t) = \mu_l(x) \tag{15} \]

where x is t[A], that is, the value of tuple t for attribute A, and $\mu_l$ is the membership function of the fuzzy set representing the linguistic term l. The matching function $\gamma$ for complex conditions, exemplified by “age = YOUNG AND (salary = HIGH OR empyear = RECENT),” is obtained by applying the semantics of the fuzzy logical connectives; that is:

\[ \gamma(P \wedge Q, t) = \min(\gamma(P, t), \gamma(Q, t)) \tag{16} \]

\[ \gamma(P \vee Q, t) = \max(\gamma(P, t), \gamma(Q, t)) \tag{17} \]

\[ \gamma(\neg Q, t) = 1 - \gamma(Q, t) \tag{18} \]

where P, Q are conditions. The min and max operators may be replaced by, for example, t-norm and t-conorm operators (Klement, Mesiar, & Pap, 2000) to model the conjunction and disjunction connectives, respectively.

The classical querying formalisms of the relational data model were also studied from the perspective of fuzzy querying. The relational algebra may be fairly easily adapted. However, for some operations, multiple fuzzy versions have been proposed. One such operation lacking a clear, widely accepted fuzzy counterpart is the division of relations, which has been studied by many researchers, including Yager (1991), Dubois and Prade (1996), and Galindo, Medina, Cubero, and Garcia (2001); see also a chapter by Bosc et al. in this volume.

The relational calculus attracted much less attention. One of the earliest contributions in this area is the work of Takahashi (1995), who proposes the FQL (Fuzzy Query Language), meant as a fuzzy extension of the domain relational calculus (DRC). A more complete approach has been proposed by Buckles, Petry, and Sachar (1989). Even though it was developed in the framework of a fuzzy database model, it covers all aspects relevant for the fuzzy relational calculus. Zadrożny and Kacprzyk (2002) also proposed to interpret elements of DRC in terms of a variant of fuzzy logic. This approach also makes it possible to account for preferences between query conditions in a uniform way.
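The matching degree of Equations (15) through (18) can be sketched in a few lines for the example condition “age = YOUNG AND (salary = HIGH OR empyear = RECENT).” This is a minimal illustration, not Tahani's implementation: the ramp-shaped membership functions, their breakpoints, the helper names (`atom`, `AND`, `OR`, `NOT`), and the sample rows are all hypothetical.

```python
# Sketch of Tahani-style matching degrees; all shapes and data are assumed.

def ramp_down(lo, hi):
    """Membership equal to 1 up to lo, 0 from hi on, linear in between."""
    return lambda x: min(1.0, max(0.0, (hi - x) / (hi - lo)))


def ramp_up(lo, hi):
    """Membership equal to 0 up to lo, 1 from hi on, linear in between."""
    return lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo)))


# Hypothetical definitions of the linguistic terms.
TERMS = {
    "YOUNG": ramp_down(25, 40),
    "HIGH": ramp_up(3000, 6000),
    "RECENT": ramp_up(2000, 2005),
}


def atom(attribute, term):
    """Equation (15): membership grade of t[attribute] in the fuzzy set of the term."""
    return lambda t: TERMS[term](t[attribute])


def AND(p, q):
    return lambda t: min(p(t), q(t))      # Equation (16)


def OR(p, q):
    return lambda t: max(p(t), q(t))      # Equation (17)


def NOT(p):
    return lambda t: 1.0 - p(t)           # Equation (18)


query = AND(atom("age", "YOUNG"),
            OR(atom("salary", "HIGH"), atom("empyear", "RECENT")))

employees = [
    {"name": "A", "age": 28, "salary": 5400, "empyear": 1999},
    {"name": "B", "age": 37, "salary": 6500, "empyear": 2004},
    {"name": "C", "age": 24, "salary": 2500, "empyear": 2003},
]

# Rank rows by decreasing matching degree.
ranked = sorted(((query(t), t["name"]) for t in employees), reverse=True)
for degree, name in ranked:
    print(f"{name}: {degree:.2f}")
```

Replacing `min` and `max` in `AND` and `OR` by another t-norm and t-conorm pair changes the aggregation without touching the rest of the sketch.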

Fuzzy Preferences Between Query Conditions

The next step is to distinguish simple (fuzzy) conditions composing a query with respect to their importance. To model the relative importance of conditions, weights are associated with them. Usually, a weight $w_i$ is represented by a real number of the unit interval, that is, $w_i \in [0,1]$. Hereby, as extreme cases, $w_i = 0$ models “not important at all” and $w_i = 1$ represents “fully important.” A weight $w_i$ is associated with each (fuzzy) condition $P_i$. Assume that the matching degree of a condition $P_i$ with an importance weight $w_i$ is denoted by $\gamma(P_i^*, t)$. In order to be meaningful, weights should satisfy the following requirements (Dubois, Fargier, & Prade, 1997):

• In order to have an appropriate scaling, it must hold that at least one of the associated weights is 1, that is, $\max_i w_i = 1$.
• If $w_i = 1$ and the matching function equals 0 for $P_i$, that is, $\gamma(P_i, t) = 0$, then the impact of the weight should be 0, or $\gamma(P_i^*, t) = 0$. In other words, if $P_i$ is not satisfied at all and $P_i$ is fully important, then the weight should not modify the matching degree.
• If $w_i = 1$ and the matching function equals 1 for $P_i$, or $\gamma(P_i, t) = 1$, then the impact of the weight should be 1, or $\gamma(P_i^*, t) = 1$. In other words, if $P_i$ is completely satisfied and $P_i$ is fully important, then the weight should not modify the matching degree.
• Lastly, if $w_i = 0$, then the result should be such as if $P_i$ did not exist.

The impact of a weight can be modeled by first matching the condition as if there is no weight and then modifying the resulting matching degree in accordance with the weight. A modification function that strengthens the match of more important conditions and weakens the match of less important conditions is used for this purpose.

Different interpretations are possible. From a conceptual point of view, a distinction can be made between static weights and dynamic weights. Static weights are fixed, known in advance, and can be directly derived from the formulation of the query. These weights are independent of the values of the record(s) on which the query criteria act and are not allowed to change during query processing. A further, orthogonal distinction can be made between static weight assignments, where it is also known in advance with which condition a weight is associated (e.g., in a situation where the user explicitly states preferences), and dynamic weight assignments, where the associations between weights and conditions depend on the actual attribute values of the record(s) on which the query conditions act (e.g., in a situation where most criteria have to be satisfied, but it is not important which ones).

Static weights. In most approaches, static weights are used. As Dubois and Prade (1997) discovered, some of the most practical interpretations of static weights may be formalized within a universal scheme. Namely, let us assume that query condition P is a conjunction of weighted elementary query conditions $P_i$ (for a disjunction a similar scheme may be offered). Let us denote by $\gamma(P_i, t)$ the matching degree for a tuple t of such an elementary condition $P_i$ without any importance weight assigned. Then, Dubois and Prade (1997) propose to use the following formula to compute the matching degree, $\gamma(P_i^*, t)$, of an elementary condition $P_i$ with an importance weight $w_i \in [0,1]$ assigned:

\[ \gamma(P_i^*, t) = (w_i \Rightarrow \gamma(P_i, t)) \tag{19} \]

where ⇒ is an operator modeling a fuzzy implication connective. The overall matching degree of the whole query composed of the conjunction of conditions $P_i$ is calculated using the standard min-operator. Depending on the type of the fuzzy implication operator used, we get various interpretations of importance weights. For example, using the Dienes implication, we obtain from Equation (19):

\[ \gamma(P_i^*, t) = \max(\gamma(P_i, t), 1 - w_i) \tag{20} \]

For a small importance ($w_i$ close to 0), the satisfaction of elementary condition $P_i$ does not bear on the satisfaction of the overall query. On the other hand, with $w_i$ close to 1, the satisfaction of the elementary condition is essential for the matching of the overall query P. Consequently, the requirements for weights, proposed by Dubois et al. (1997) and mentioned in the item list above, are satisfied. For the Gödel implication, Equation (19) turns into:

\[ \gamma(P_i^*, t) = \begin{cases} 1 & \text{if } \gamma(P_i, t) \geq w_i \\ \gamma(P_i, t) & \text{otherwise} \end{cases} \tag{21} \]

This is the interpretation presumably first discussed by Yager (cf. for a reference Dubois & Prade, 1997). The importance weight $w_i$ is here treated as a threshold: if condition $P_i$ is satisfied to a degree at least equal to this threshold, then the weighted condition $P_i^*$ is considered to be fully satisfied. Otherwise the matching degree for $P_i^*$ equals that for $P_i$.

Finally, another interpretation of importance is obtained when the Goguen implication is used in Equation (19):

\[ \gamma(P_i^*, t) = \begin{cases} 1 & \text{if } \gamma(P_i, t) \geq w_i \\ \gamma(P_i, t) / w_i & \text{otherwise} \end{cases} \tag{22} \]

In fact, here we still have a threshold-type interpretation, as in the previous case, but the undersatisfaction of the condition is treated in a more continuous way. For still another interpretation of importance, see Zadrożny (2005). The use of importance weights indirectly leads to an unconventional aggregation of partial matching degrees.

Dynamic weights. The approach described for static weights, based on Equation (19), has been refined (Dubois & Prade, 1997) to deal with a variable importance $w_i \in [0,1]$ depending on the matching degree of the associated elementary condition. For example, in a specific context, it may be useful to assume $w_i$ to be constant for a relatively high satisfaction of the elementary condition, but an extremely low satisfaction should be more strongly reflected in the overall matching by automatically increasing the weight $w_i$. For instance, when we want a car of a moderate price, if a particular car has a very high price, the price criterion becomes more important ($w_i = 1$) in order to reject that car.

More generally, when using dynamic weights and dynamic weight assignments, neither the weights nor the associations between weights and criteria are known in advance. Both the weights and their assignments then depend on the attribute values of the record(s) on which the query criteria act. This kind of flexibility is required to avoid some unnatural behavior of the query evaluation in cases where, for example, a condition is of limited importance only within a given range of values, such as when the condition “high salary” is not important unless the salary value is extremely high.
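The three static-weight interpretations obtained from Equation (19), namely Equations (20), (21), and (22), can be compared on the same data with a short sketch. The function names, the sample matching degrees, and the sample weights are hypothetical and serve only to show how the choice of implication changes the overall degree of a weighted conjunction; dynamic weights are not covered here.

```python
# Sketch of weighted conjunction under three fuzzy implications.
# Sample degrees and weights are assumed values for illustration.

def dienes(w, g):
    """Equation (20): max(g, 1 - w)."""
    return max(g, 1.0 - w)


def goedel(w, g):
    """Equation (21): 1 if g >= w, otherwise g."""
    return 1.0 if g >= w else g


def goguen(w, g):
    """Equation (22): 1 if g >= w, otherwise g / w.

    The division is only reached when g < w, hence w > 0.
    """
    return 1.0 if g >= w else g / w


def weighted_conjunction(degrees, weights, implication):
    """Apply Equation (19) to each condition, then aggregate with min."""
    return min(implication(w, g) for g, w in zip(degrees, weights))


# Unweighted matching degrees of three elementary conditions for one tuple,
# and their importance weights (the largest weight is 1, as required).
degrees = [0.9, 0.3, 0.6]
weights = [1.0, 0.4, 0.0]

overall = {
    "Dienes": weighted_conjunction(degrees, weights, dienes),
    "Goedel": weighted_conjunction(degrees, weights, goedel),
    "Goguen": weighted_conjunction(degrees, weights, goguen),
}
for name, value in overall.items():
    print(f"{name}: {value:.2f}")
```

In all three variants a condition with weight 0 contributes the neutral value 1 to the min aggregation, which corresponds to the last requirement in the list of requirements for weights.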

Other approaches. Other flexible schemes of aggregation are also a direct subject of research in the framework of flexible fuzzy logic based querying. In Kacprzyk and Ziółkowski (1986) and Kacprzyk et al. (1989), the aggregation of partial queries (conditions) guided by a linguistic quantifier was first described. In such approaches, conditions of the following form are considered:

\[ P = \Psi \text{ out of } \{P_1, \ldots, P_k\} \tag{23} \]

where Ψ is a linguistic (fuzzy) quantifier and $P_i$ is an elementary condition to be aggregated. For example, in the context of a U.S.-based company, one may classify an order as troublesome if it meets most of the following conditions: “comes from outside of USA,” “its total value is low,” “its shipping costs are high,” “employee responsible for it is John Doe (known to be not completely reliable),” “the amount of ordered goods in stock is not much greater than the ordered amount,” and so forth. The overall matching degree may be computed using any of the approaches used to model linguistic quantifiers. In Kacprzyk and Ziółkowski (1986) and Kacprzyk et al. (1989), first the linguistic quantifiers in the sense of Zadeh (1983) and later the OWA operators (Yager, 1994) are used (cf. Kacprzyk & Zadrożny, 1997). Such approaches also make it possible to take into account the importance of the conditions to be aggregated. There are many works on this topic studying various possible interpretations of linguistic quantifiers for flexible querying purposes, such as Bosc, Pivert, and Lietard (2001), Bosc, Lietard, and Pivert (2003), Galindo et al. (2006), and Vila, Cubero, Medina, and Pons (1997).
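A condition of the form (23) can be evaluated, for instance, as sketched below. Two standard readings are shown: the quantifier applied to the average degree of satisfaction (in the spirit of Zadeh's calculus of linguistically quantified propositions) and an OWA aggregation whose weights are derived from the quantifier. The piecewise linear definition of “most,” the way of deriving the OWA weights, and the sample degrees are assumptions of this sketch, not necessarily the exact choices of the cited papers.

```python
# Sketch of quantifier-guided aggregation; "most" and the data are assumed.

def most(x):
    """Hypothetical quantifier 'most': 0 up to 0.3, 1 from 0.8, linear in between."""
    return min(1.0, max(0.0, (x - 0.3) / 0.5))


def zadeh_quantified(degrees, quantifier):
    """Quantifier applied to the proportion (mean degree) of satisfied conditions."""
    return quantifier(sum(degrees) / len(degrees))


def owa_quantified(degrees, quantifier):
    """OWA aggregation with weights w_i = Q(i/k) - Q((i-1)/k).

    The weights are applied to the degrees sorted in decreasing order.
    """
    k = len(degrees)
    ordered = sorted(degrees, reverse=True)
    owa_weights = [quantifier(i / k) - quantifier((i - 1) / k)
                   for i in range(1, k + 1)]
    return sum(w * d for w, d in zip(owa_weights, ordered))


# Matching degrees of five elementary conditions for one order (assumed).
degrees = [1.0, 0.8, 0.7, 0.2, 0.0]

zadeh_degree = zadeh_quantified(degrees, most)
owa_degree = owa_quantified(degrees, most)
print(f"Zadeh-style: {zadeh_degree:.2f}, OWA: {owa_degree:.2f}")
```

With a quantifier that jumps from 0 to 1 only at proportion 1, the OWA weights concentrate on the smallest degree and the aggregation reduces to min, that is, to the ordinary fuzzy conjunction.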

Practical Approaches More practical approaches to flexible fuzzy querying in crisp databases are well represented by SQLf (SQLfuzzy) (Bosc & Pivert, 1995) and FQUERY (FuzzyQUERY) for Access (Kacprzyk & Zadrożny, 1995). The former is an extension of


SQL introducing linguistic (fuzzy) terms wherever it makes sense, and the latter is an example of the implementation of a specific “fuzzy extension” of SQL for Microsoft Access®, a popular desktop DBMS (database management system). Also, Galindo et al.’s (1998) FSQL (FuzzySQL) features the capability of fuzzy querying of a, in principle, crisp database. However, as it is a more comprehensive approach, it will be considered in the section on fuzzy databases. Moreover, in another chapter by Urrutia, Tineo, and Gonzalez in this volume, the reader can find a comparison of SQLf and FSQL. FQUERY. In Kacprzyk and Zadrożny (1995), an extension of the Access SQL language, with the linguistic terms in the spirit of the approaches discussed earlier, has been presented. The following types of linguistic terms have been considered: fuzzy values (e.g., “YOUNG”); fuzzy relations (fuzzy comparison operators) (e.g., “MUCH GREATER THAN”); and fuzzy quantifiers (e.g., “MOST”). The matching degree is calculated according to the previously discussed semantics of fuzzy predicates and linguistically quantified propositions. This extension to SQL has been implemented as an add-in, FQUERY for Access, to Microsoft Access, thus extending the native Access’s querying interface with the capability of manipulating linguistic terms. In FQUERY for Access, the user composes a query using a QBE (query-by-example) type user interface provided by the host environment, that is, Microsoft Access. The resulting rows are ordered decreasingly with respect to the matching degree. FQUERY has been one of the first implementations demonstrating the usefulness of fuzzy querying features for a crisp database. In addition to the syntax and semantics of the extended SQL, the authors have also proposed a scheme for the elicitation and manipulation of linguistic terms to be used in queries. The concept of FQUERY has been further developed in two directions. 
In Zadrożny and Kacprzyk (1998) and Kacprzyk and Zadrożny (2001), the


very same concept has been applied in the Internet environment (WWW). Another line of development (Kacprzyk & Zadrożny, 2000a; Kacprzyk & Zadrożny 2000b) consists in adding some data mining capabilities to the existing fuzzy querying interface. Such a combined interface partially employs the same modules and data structures as the ones used in FQUERY and seems to be a promising direction for the development of advanced OLAP and data analysis tools. SQLf. So far we have only discussed the “fuzzification” of conditions appearing in the WHERE clause of the SQL’s SELECT instruction. In Bosc and Pivert (1992b), Bosc and Pivert (1995), and Bosc and Pivert (1997a), a new language, called SQLf, has been proposed. This language is a much more comprehensive and complete “fuzzy” extension of the crisp SQL language. In SQLf, linguistic terms may appear as fuzzy values, relations, and quantifiers (associated with aggregation operators) in the WHERE clause and other clauses. The linguistic quantifiers may be used together with subqueries. This is called by Bosc et al. the vertical quantification in contrast to the horizontal quantification when a quantifier plays the role of an aggregation operator and replaces the AND or OR connectives in a condition as in (23). All the operations of the relational algebra (implicitly or explicitly used in SQL’s SELECT instruction) are redefined to properly process fuzzy relations that appear when parts of a fuzzy query are processed. Other operations typical for SQL are also redefined, including the partition of relations (GROUP BY clause) and the operators “IN” and “NOT IN” used along with subqueries. All the features of SQL have been redefined in such a way so as to preserve the equivalences that occur in the “crisp” SQL. A number of pilot implementations of SQLf have been developed (e.g., Gonçalves & Tineo, 2001a, 2001b). Other approaches. 
Other approaches and implementations for the flexible querying of crisp relational databases, based on similar principles as


explained above, exist. Among these, we should mention the PRETI-platform that is intended as an experimental environment for the exchange of expertise (de Calmès, Dubois, Hüllermeier, Prade, & Sedes, 2002) and the approach based on EPTVs (de Tré, de Caluwe, Tourné, & Matthé, 2003; de Tré et al., 2006).
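The flexible querying of a crisp table described in this section (fuzzy values, fuzzy comparison operators, and rows ordered decreasingly by matching degree) can be illustrated with a minimal sketch. It is not the implementation of FQUERY for Access or SQLf: the membership functions `young` and `much_greater_than`, their breakpoints, the `Employee` table, and the use of the minimum for AND are all assumptions made for illustration.

```python
from dataclasses import dataclass


def young(age: float) -> float:
    """Hypothetical membership function for the fuzzy value YOUNG."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 15  # linear descent between the assumed breakpoints


def much_greater_than(x: float, y: float) -> float:
    """Hypothetical fuzzy relation MUCH GREATER THAN, saturating at x - y = 1000."""
    return max(0.0, min(1.0, (x - y) / 1000))


@dataclass
class Employee:
    name: str
    age: float
    salary: float


def fuzzy_select(rows):
    """Rank rows by the degree of 'age IS YOUNG AND salary MUCH GREATER THAN 2000'."""
    scored = []
    for row in rows:
        # AND is interpreted as the minimum (an assumed, common choice).
        degree = min(young(row.age), much_greater_than(row.salary, 2000))
        if degree > 0:
            scored.append((degree, row.name))
    # Resulting rows are ordered decreasingly with respect to the matching degree.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


EMPLOYEES = [
    Employee("Bob", 31, 2600),
    Employee("Cy", 45, 5000),
    Employee("Ann", 24, 3200),
]
RANKED = fuzzy_select(EMPLOYEES)
```

With these assumed definitions, Ann matches fully, Bob matches partially, and Cy is excluded because his degree of being young is zero.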

Fuzzy Querying of Fuzzy Relational Databases Fuzzy databases intend to grasp imperfect information about a modeled part of the world and represent it directly in a database. The most straightforward application of fuzzy logic to the classical relational data model is by assuming that the relations in a database themselves are also fuzzy. Each tuple of a relation (table) is associated with a membership degree. This approach is often neglected because the interpretation of the membership degree is unclear. On the other hand, it is worth noticing that fuzzy queries, as discussed in the previous section, in fact produce fuzzy relations. Two leading approaches to the representation of imperfect information in relational databases are the possibilistic model (Prade & Testemale, 1984, 1987) and the similarity relation based model (Buckles & Petry, 1982; Petry, 1996). More recently, an extended possibilistic approach, based on EPTVs, has been proposed (de Tré & de Caluwe, 2003). The main idea behind the possibilistic data model is to represent the imprecisely known value of an attribute via a possibility distribution on the domain of this attribute. For example, if all that is known about the age of a suspect in a criminal investigation is that he is “young,” then in a corresponding database, this information may be represented by a suitable possibility distribution on, for example, the interval [1,100]. This calls for some special measures both in data representation and querying, which will be described in the next section. The similarity based approach is rooted in the observation that by specifying the search conditions of a query, the user actually looks not only for tuples exactly satisfying them but also for similar tuples. Thus, a similarity relation on the attribute domain is assumed. The values taken by a similarity relation are in the unit interval [0,1], where 0 corresponds to “totally different” and 1 to “totally similar.” It is a fuzzy binary relation such that its membership function expresses the similarity degree between the pairs of the domain elements. Similarity relations are usually provided by the user. The extended possibilistic approach is an extension of the possibilistic approach. It explicitly deals with the inapplicability of information during the evaluation of the query conditions: if some of the query conditions are inapplicable, this will be reflected by the model. We briefly discuss the main concepts of fuzzy querying as proposed for both leading models of fuzzy databases. Next, fuzzy querying in the extended possibilistic approach, as well as in some hybrid approaches, is briefly described.

The Possibilistic Approach Prade and Testemale (1984) proposed an algebra for retrieving information from a fuzzy possibilistic relational database. The principles of this algebra can be illustrated by an example of the selection operator. The syntax of the condition is more or less the same as previously, but the attributes may take possibilistic distributions as values. Two types of elementary conditions are considered:

(i) A θ a, where A is the name of an attribute, θ is a comparison operator (fuzzy or not), and a is a constant (fuzzy or not); (ii) A θ B, where A and B are names of attributes.

The computed matching degree of an elementary condition against a tuple t is expressed by a pair: the possibility and necessity measure of some sets (with respect to the possibility distributions A(t) and B(t)). In case of (i), it is the set, crisp or fuzzy, of the elements from the domain of A in


relation θ with a constant a. In the second case (ii), it is the subset of the Cartesian product of domains of A and B containing only the pairs of elements being in relation θ. In this case, a joint possibility distribution over the Cartesian product of the domains of A and B is used. Formally, the matching degree for case (i) is computed as follows. Let us denote by FS the set (in general fuzzy) whose possibility and necessity measures have to be computed. Its membership function for the elements of the domain of A is as follows:

$$\mu_{FS}(d) = \sup_{d' \in Dom(A)} \min\bigl(\mu_{\theta}(d, d'), \mu_{a}(d')\bigr), \quad d \in Dom(A) \tag{24}$$

where μa is the membership function of the constant a. The possibility and necessity measures of the set FS with respect to the possibility distribution πA(t) (the value of the attribute A for the tuple t) are computed as in Equations (2)-(3). For the second form of atomic condition (ii), the set F comprises the pairs of elements (d, d′), d ∈ Dom(A), d′ ∈ Dom(B), such that d θ d′ is satisfied. Thus, its membership function is identical to that of θ and the possibility and necessity measures are computed as in Equations (8) through (9).

Baldwin, Coyne, and Martin (1993) have implemented a system for querying a possibilistic relational database using semantic unification and the evidential logic rule. The queries are composed of one or more conditions, the importance of each condition, a “filtering” function (similar to the notion of quantifier), and a threshold. The particularity of their work is the process, semantic unification, used for matching the fuzzy values of the criteria with the possibility distributions of the attributes of a tuple. As a result, one obtains an interval [n, p] where, similar to the previous case, n (necessity) is the certain degree of matching and p (possibility) is the maximum possible degree of matching. However, this time the calculations are based on the mass assignments theory developed by Baldwin et al.
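The computation of the matching degree for an elementary condition of type (i) can be sketched over a small discrete domain. The sketch below builds the set FS as in Equation (24) and then evaluates a (possibility, necessity) pair. Equations (2)-(3) are not reproduced in this part of the chapter, so the sketch assumes the standard possibility theory definitions, Π(FS) = sup min(πA(d), μFS(d)) and N(FS) = inf max(1 − πA(d), μFS(d)). The domain, the fuzzy constant `MU_YOUNG`, and the attribute value `PI_AGE` are hypothetical.

```python
def condition_set(domain, theta, mu_a):
    """Membership function of FS: sup over d' of min(theta(d, d'), mu_a(d'))."""
    return {
        d: max(min(theta(d, d_prime), mu_a[d_prime]) for d_prime in domain)
        for d in domain
    }


def possibility(pi_a, mu_fs):
    """Assumed standard definition: sup over d of min(pi_A(d), mu_FS(d))."""
    return max(min(pi_a[d], mu_fs[d]) for d in mu_fs)


def necessity(pi_a, mu_fs):
    """Assumed standard definition: inf over d of max(1 - pi_A(d), mu_FS(d))."""
    return min(max(1.0 - pi_a[d], mu_fs[d]) for d in mu_fs)


def equals(d, d_prime):
    """Crisp equality used as the comparison operator theta."""
    return 1.0 if d == d_prime else 0.0


# Hypothetical discrete age domain, fuzzy constant "young",
# and an attribute value meaning "about 30".
DOMAIN = [20, 30, 40, 50]
MU_YOUNG = {20: 1.0, 30: 0.7, 40: 0.2, 50: 0.0}
PI_AGE = {20: 0.5, 30: 1.0, 40: 0.5, 50: 0.0}

MU_FS = condition_set(DOMAIN, equals, MU_YOUNG)
MATCH = (possibility(PI_AGE, MU_FS), necessity(PI_AGE, MU_FS))
```

With crisp equality as θ, the set FS coincides with the fuzzy constant itself, and the condition "age = young" is matched with possibility 0.7 and necessity 0.5 for this tuple.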

Bosc and Pivert (1997b) have proposed a new type of query for possibilistic databases. These queries address the representation of the attribute’s value (i.e., features of the corresponding possibility distribution) rather than the value itself. Examples of basic queries of this new type are: “Find tuples such that all the values d1, d2, …, dn are possible for attribute A”; “Find tuples such that more than n values are possible to a degree higher than λ for attribute A.” For the first of these queries, the matching degree is computed using the formula:

$$\min\bigl(\pi_A(d_1), \pi_A(d_2), \ldots, \pi_A(d_n)\bigr) \tag{25}$$

where A is an attribute; d1, d2, …, dn are values from its domain Dom(A); and πA is the possibility distribution representing the value of A. The tuples such that more than n values are possible to a degree higher than λ for attribute A are retrieved using the condition:

$$\mathrm{Card\_cut}(A, \lambda) > n \tag{26}$$

where

$$\mathrm{Card\_cut}(A, \lambda) = \bigl|\{d \mid d \in Dom(A) \wedge \pi_A(d) \geq \lambda\}\bigr| \tag{27}$$

and λ is a value in the interval [0,1]. These basic queries and the scheme for computing their matching degree may then be used to process more complex queries like: “Find the tuples where for attribute A the value d1 is more possible than the value d2”; “Find the tuples where for attribute A only one value is completely possible.”
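The two basic representation-based queries, Equations (25) through (27), can be sketched directly on a possibility distribution stored as a dictionary. This is only an illustration: the function names and the sample distribution `PI_COLOR` are hypothetical, and values absent from the dictionary are assumed to have possibility 0.

```python
def all_possible(pi_a, values):
    """Equation (25): degree to which all the listed values are possible for A."""
    return min(pi_a.get(d, 0.0) for d in values)


def card_cut(pi_a, lam):
    """Equation (27): number of domain values possible to a degree of at least lam."""
    return sum(1 for degree in pi_a.values() if degree >= lam)


def more_than_n_possible(pi_a, n, lam):
    """Equation (26): crisp condition Card_cut(A, lam) > n."""
    return card_cut(pi_a, lam) > n


# Hypothetical possibility distribution for the attribute "color" of one tuple.
PI_COLOR = {"red": 1.0, "orange": 0.8, "yellow": 0.3}
```

For this tuple, `all_possible(PI_COLOR, ["red", "orange"])` yields 0.8, and two values are possible to a degree of at least 0.5, so the condition of Equation (26) holds for n = 1 but not for n = 2.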


There are other works on fuzzy querying in the possibilistic approach (de Caluwe, 2002; Umano, 1982; Umano & Fukami, 1994; Zemankova-Leech & Kandel, 1984).

The Similarity Relation Based Approach The research on querying in similarity relation based fuzzy databases has been summarized in Buckles and Petry (1985), Buckles et al. (1989), and Petry (1996). A complete set of operations of the relational algebra has been defined for the similarity relation based model. These operations result from their classical counterparts by the replacement of the concept of equality of two domain values with the concept of similarity of two domain values. The conditions of queries are composed of crisp predicates as in a regular query language. Additionally, a set of level thresholds may be submitted as a part of the query. A threshold may be specified for each attribute appearing in the query’s condition. Such a threshold indicates what degree of similarity of two values from the domain of a given attribute justifies considering them equal. The concept of threshold level also plays a central role in the definition of the redundancy concept for this database model. Two tuples are redundant if the values of all corresponding attributes are similar (to a level higher than a selected degree).

There are also a number of hybrid models proposed in the literature. Takahashi (1993) has proposed a model for a fuzzy relational database assuming possibility distributions as attribute values. Additionally, fuzzy sets are used as tuple truth-values. For example, a tuple t may express that “It is quite true that John’s age is nearly 40.” Medina, Pons, and Vila (1994) propose a fuzzy database model, GEFRED (Generalized Fuzzy Relational Database), trying to integrate the advantages of both the possibilistic and similarity based models. The data are stored as generalized fuzzy relations that extend the relations of the relational model by allowing imprecise information and a compatibility degree associated with

each attribute value. They also define an algebra, called a generalized fuzzy relational algebra, to manipulate information stored in such a fuzzy database. Galindo, Medina, and Aranda (1999) have extended the GEFRED model with a fuzzy domain relational calculus (FDRC). The GEFRED model has been implemented using the crisp commercial DBMS Oracle (Galindo et al., 1998). The implementation supports FSQL, a “fuzzy” SQL. This fuzzy extension to SQL includes the linguistic labels (terms; fuzzy values) and fuzzy comparison operators (relations) that have been discussed in the previous sections. Each condition can be assigned a fulfillment threshold (α ∈ [0,1]) requiring that this condition has to be satisfied at least to a degree α (thus, in some sense, changing a fuzzy condition to a crisp one). In Galindo, Medina, Cubero, and Garcia (2000), the fuzzy quantifiers have been included into their FDRC language. Some applications of FSQL have been reported (Barranco, Campaña, Medina, & Pons, 2004; Galindo et al., 2006).
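The role of per-attribute thresholds in the similarity relation based model, described at the start of this section, can be sketched as follows. The similarity tables, attribute names, and sample tuples are hypothetical, and the comparison uses "at least the threshold" as an assumed reading of the level condition; this is an illustration rather than the algebra of Buckles and Petry.

```python
def similarity(table, x, y):
    """Look up a symmetric similarity degree; identical values are totally similar."""
    if x == y:
        return 1.0
    return table.get((x, y), table.get((y, x), 0.0))


def redundant(t1, t2, relations, thresholds):
    """Two tuples are redundant if every attribute pair reaches its threshold."""
    return all(
        similarity(relations[attr], t1[attr], t2[attr]) >= thresholds[attr]
        for attr in thresholds
    )


# Hypothetical user-provided similarity relations on two attribute domains.
RELATIONS = {
    "color": {("red", "crimson"): 0.9, ("red", "orange"): 0.6},
    "size": {("large", "big"): 0.8, ("large", "medium"): 0.4},
}

T1 = {"color": "red", "size": "large"}
T2 = {"color": "crimson", "size": "big"}
```

With thresholds of 0.8 on both attributes, `T1` and `T2` are treated as redundant; raising the threshold on `size` to 0.9 makes them distinct, which shows how the thresholds replace crisp equality in this model.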

An Extended Possibilistic Approach In the extended possibilistic approach, the computed matching degree of an elementary condition against a tuple t is expressed by an EPTV. This EPTV represents the extent to which it is (un)certain that t belongs to the result of a flexible query. Let us again denote by FS the set (in general, fuzzy) whose EPTV with respect to the possibility distribution πA(t) has to be computed. Then the computation of the EPTV can be done as in Equations (12) through (14). In case of composed query conditions, the resulting EPTV can be obtained by aggregating the EPTVs of the elementary conditions. Hereby, generalizations of the logical connectives of the conjunction (∧), disjunction (∨), negation (¬), implication (⇒), and equivalence (⇔) can be applied (de Tré, 2002; de Tré & de Baets, 2003). The extended possibilistic approach is an extension of the possibilistic approach based on possibility and necessity measures presented in Prade and Testemale (1984). It offers additional facilities


to cope with the inapplicability of information at the logical level: if some of the query conditions are inapplicable for a given tuple t, this will be explicitly reflected in the EPTV representing the matching degree for the tuple (de Tré, de Caluwe, & Prade, in press).

Object-Oriented Approaches With object-oriented databases becoming mature, research on “fuzzy” object-oriented databases has drawn a lot of attention. Nowadays, several fuzzy object-oriented database models exist. Based on some of them, prototypes have already been implemented. The majority of the presented models do not conform to a single underlying object data model, which is a logical consequence of the present lack of (formal) object standards. The most recent version of the ODMG (Object Data Management Group) proposal (Cattell & Barry, 2000) offers the best perspectives, although it still suffers from some shortcomings such as the absence of formal semantics and does not have the status of an official standard (Alagić, 1997; Kim, 1994). Informally, an object database is a collection of objects that are instances of classes and typically have their own identity. Each class is characterized by its structure, usually specified by a finite number of attributes Ai: Dom(Ai) as in the relational model, and by its behavior, specified by a finite number of operations. Classes are interrelated via association relationships, which allow objects to be associated with other objects, and via inheritance relationships, which allow characteristics to be shared among classes. Research on fuzzy object-oriented databases can also be subdivided into two main approaches: those based on a possibilistic model and those based on a similarity relation based model. Furthermore, an extended possibilistic approach and some other alternative approaches have been proposed. Most research interest has been in the development of semantically richer data modeling facilities. Fuzzy querying of fuzzy object-oriented databases


has in most cases been performed using similar techniques as described in the previous sections of this chapter.

Possibilistic Models Among the possibilistic approaches are the object-centered model of Rossazza (1990) and Rossazza, Dubois, and Prade (1997); the object-oriented model of Tanaka, Kobayashi, and Sakanoue (1991); the FOOD model (Fuzzy Object-Oriented Data model) of Bordogna, Lucarella, and Pasi (1994) and Bordogna, Pasi, and Lucarella (1999); the fuzzy algebra of Rocacher and Connan (1996); the UFO (Uncertainty and Fuzziness in an Object-oriented model) model of Van Gyseghem (1998); the fuzzy association algebra model of Na and Park (1997); the FIRMS (Fuzzy Information Retrieval and Management System) model of Mouaddib and Subtil (1997); and the FOODM (Fuzzy Object-Oriented Database Model) model of Marín, Pons, and Vila (2000). In the object-centered model of Rossazza (1990) and Rossazza et al. (1997), all information is contained in objects that are completely described by a set of attributes. For these objects, no behavior is defined. Objects with the same attributes are collected in classes that are organized in class hierarchies. A range of allowed values and a range of typical values have been specified for the attributes. These ranges may be fuzzy. Various kinds of (graded) inclusion relations between classes have been defined: in order to find out to which extent a class is a subclass of another class, the ranges of their corresponding attributes are compared with each other, using a “default reasoning” technique as proposed in Reiter (1980). In the object-oriented model of Tanaka et al. (1991), fuzziness is considered with respect to both the structural and the behavioral aspects of objects. Attribute values can be fuzzy.
Furthermore, fuzziness is considered at the levels of instantiation, inheritance, and relationships between objects by introducing some special classes. Special comparison operators, which are obtained by applying


Zadeh’s (1975) extension principle, are provided to compare instances of fuzzy classes and to support flexible querying. The FOOD model of Bordogna et al. (1994, 1999) is based on a visualization paradigm that supports the representation of data semantics and the direct browsing of information. It has been defined as an extension of a graph-based object model in which both the database scheme and instances are represented as directed labeled graphs. A database manipulation language has been described in terms of graph transformations. A prototype of the model has been implemented (Bordogna, Leporati, Lucarella, & Pasi, 2000). The fuzzy algebra of Rocacher and Connan (1996) is an extension of the so-called EQUAL-algebra, which is part of the object-oriented database model ENCORE (Shaw & Zdonik, 1990). The extension is based on an early version of the ODMG data model (Cattell & Barry, 2000) and is aimed at the modeling and manipulation of fuzzy data. The extended operators are “union,” “intersection,” “difference,” “select,” “image” (to invoke functions on objects), “project,” “join,” “flatten,” “nest,” and “unnest.” Additionally, specific operators have been provided to generate and to compare fuzzy sets. The UFO model of Van Gyseghem (1998) has been an attempt to extend an object-oriented database model as generally as possible in order to be able to deal with fuzziness as well as with uncertainty. Different concepts of the object orientation have been extended (attributes, methods, objects, classes, inheritance, instantiation, etc.). A specific feature of this approach is the use of “role” objects to properly deal with the manipulation of uncertain data. In the approach of Na and Park (1997), a fuzzy object-oriented data model has been built by means of fuzzy classes and fuzzy associations. A fuzzy database is represented by a fuzzy schema graph at schema level and a fuzzy object graph at object instance level.
Data manipulation is handled by means of a fuzzy association algebra which consists of operators that can operate on the fuzzy association patterns of homogeneous and heterogeneous structures. As the result of these operators, truth values are returned with the patterns. The FIRMS model of Mouaddib and Subtil (1997) can deal with fuzzy, uncertain, and incomplete information. At the base of the model are the concepts of a “nuanced value” and “nuanced domain.” Furthermore, a fuzzy thesaurus is used to restrict the allowed domain values of discrete attributes. A formal grammar is used to generate the characteristic membership functions of the thesaurus terms. In the FIRMS model, no class hierarchies are supported. The FOODM model of Marín et al. (2000) illustrates how different sources of vagueness can be managed over a regular object-oriented database model. It is founded on the concept of a “fuzzy type” where properties are ranked in different levels of precision according to their relationship with the type. Objects are created using α-cuts of their fuzzy type. The architecture of a prototype implementation of the model has been presented in Berzal, Marín, Pons, and Vila (2003).
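The idea of creating objects from α-cuts of a fuzzy type, mentioned above for the FOODM model, can be illustrated with a generic α-cut over a set of graded properties. This is a loose sketch of the general α-cut operation and not the actual FOODM mechanism: the fuzzy type `EMPLOYEE_TYPE`, its property names, and its degrees are hypothetical.

```python
def alpha_cut(fuzzy_type, alpha):
    """Return the crisp set of properties whose degree is at least alpha."""
    return {prop for prop, degree in fuzzy_type.items() if degree >= alpha}


# Hypothetical fuzzy type: each property is graded by how strongly it
# belongs to the type (1.0 = essential, lower = more peripheral).
EMPLOYEE_TYPE = {"name": 1.0, "salary": 0.8, "hobby": 0.4}

CORE_PROPERTIES = alpha_cut(EMPLOYEE_TYPE, 0.8)
ALL_PROPERTIES = alpha_cut(EMPLOYEE_TYPE, 0.3)
```

A high α keeps only the properties most closely tied to the type, while a low α yields a more detailed object structure.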

Similarity Relation Based Models George (1992) and George, Yazici, Petry, and Buckles (1997) have proposed an object-oriented database model that facilitates an enhanced representation of different types of imprecision and utilizes a similarity relation to generalize equality to similarity. Similarity makes it possible to represent imprecision in data and imprecision in inheritance. An object algebra based on extensions of the five traditional operators (union, difference, product, projection, and selection) and three operators to handle nested class data has been provided to support querying.

Other Approaches In the “rough” object-oriented database of Beaubouef and Petry (2002), an indiscernibility relation and approximation regions of rough set theory are used to incorporate uncertainty and


vagueness into the database model. As is the case for fuzzy relational databases, the EPTVs have also been applied in fuzzy object-oriented databases. The database model of the constraint based approach of de Tré, de Caluwe, and Van der Cruyssen (2000) and de Tré and de Caluwe (2005) is consistent with the ODMG data model (Cattell & Barry, 2000). Both the data(base) semantics and the flexible querying criteria are expressed by generalized constraints. A many-valued possibilistic logic based on EPTVs is used in order to be able to explicitly cope with missing information and to express query satisfaction.

Concluding Remarks In this chapter, we have presented an overview of some of the most important contributions in two main sub-areas of fuzzy querying: in crisp and fuzzy databases. We have discussed the first sub-area in more detail because it still seems to be more promising. Both the relational and the object-oriented database modeling and querying approaches have been described.

Acknowledgment The authors would like to thank the reviewers and the editor Dr. Jose Galindo (University of Malaga, projects TIN2006-14285 and TIN2006-07262 by Ministry of Education and Science of Spain) for their valuable comments and suggestions regarding the original manuscript of this chapter which greatly helped to shape its final version.

References

Alagić, S. (1997). The ODMG object model: Does it make sense? ACM SIGPLAN Notices, 32(10), 253-270.

Baldwin, J. F., Coyne, M. R., & Martin, T. P. (1993). Querying a database with fuzzy attribute values by iterative updating of the selection criteria. In A. L. Ralescu (Ed.), Proceedings of the Workshop on Fuzzy Logic in Artificial Intelligence (LNCS 847, pp. 62-76). London: Springer-Verlag.

Barranco, C. D., Campaña, J., Medina, J. M., & Pons, O. (2004). ImmoSoftWeb: A Web based fuzzy application for real estate management. In J. Favela, E. Menasalvas, & E. Chávez (Eds.), Advances in Web intelligence (pp. 196-206). Berlin: Springer.

Beaubouef, T., & Petry, F. E. (2002). Uncertainty in OODB modeled by rough sets. In Proceedings of the 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems (IPMU 2002) (pp. 1697-1703), Annecy, France.

Berzal, F., Marín, N., Pons, O., & Vila, M. A. (2003). FoodBi: Managing fuzzy object-oriented data on top of the Java platform. In Proceedings of the 10th International Fuzzy Systems Association (IFSA) World Congress (pp. 384-387), Istanbul, Turkey.

Bordogna, G., Leporati, A., Lucarella, D., & Pasi, G. (2000). The fuzzy object-oriented database management system. In G. Bordogna & G. Pasi (Eds.), Recent issues on fuzzy databases (pp. 209-236). Heidelberg, Germany: Physica-Verlag.

Bordogna, G., Lucarella, D., & Pasi, G. (1994). A fuzzy object oriented data model. In Proceedings of the 3rd IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’94) (pp. 313-318), Orlando, FL.

Bordogna, G., Pasi, G., & Lucarella, D. (1999). A fuzzy object-oriented data model for managing vague and uncertain information. International Journal of Intelligent Systems, 14(7), 623-651.

Bosc, P. (1999). Fuzzy databases. In J. Bezdek (Ed.), Fuzzy sets in approximate reasoning and information systems (pp. 403-468). Boston: Kluwer Academic Publishers.

Bosc, P., Duval, L., & Pivert, O. (2000). Value-based and representation-based querying of possibilistic databases. In G. Bordogna & G. Pasi (Eds.), Recent research issues on fuzzy databases (pp. 3-27). Heidelberg: Physica-Verlag.

Bosc, P., Kraft, D., & Petry, F. E. (2005). Fuzzy sets in database and information systems: Status and opportunities. Fuzzy Sets and Systems, 153(3), 418-426.

Bosc, P., Lietard, L., & Pivert, O. (2003). Sugeno fuzzy integral as a basis for the interpretation of flexible queries involving monotonic aggregates. Information Processing and Management, 39(2), 287-306.

Bosc, P., & Pivert, O. (1992a). Some approaches for relational databases flexible querying. International Journal on Intelligent Information Systems, 1, 323-354.

Bosc, P., & Pivert, O. (1992b). Fuzzy querying in conventional databases. In L. A. Zadeh & J. Kacprzyk (Eds.), Fuzzy logic for the management of uncertainty (pp. 645-671). New York: Wiley.

Bosc, P., & Pivert, O. (1993). An approach for a hierarchical aggregation of fuzzy predicates. In Proceedings of the 2nd IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’93) (pp. 1231-1236), San Francisco, CA.

Bosc, P., & Pivert, O. (1995). SQLf: A relational database language for fuzzy querying. IEEE Transactions on Fuzzy Systems, 3, 1-17.

Bosc, P., & Pivert, O. (1997a). Fuzzy queries against regular and fuzzy databases. In T. Andreasen, H. Christiansen, & H. L. Larsen (Eds.), Flexible query answering systems. Dordrecht: Kluwer Academic Publishers.

Bosc, P., & Pivert, O. (1997b). On representation-based querying of databases containing ill-known values. In Z. W. Ras & A. Skowron (Eds.), Proceedings of the 10th International Symposium on Foundations of Intelligent Systems (LNCS 1325, pp. 477-486). London: Springer-Verlag.

Bosc, P., Pivert, O., & Lietard, L. (2001). Aggregate operators in database flexible querying. In Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2001) (pp. 1231-1234), Melbourne, Australia.

Buckles, B. P., & Petry, F. E. (1982). A fuzzy representation of data for relational databases. Fuzzy Sets and Systems, 7, 213-226.

Buckles, B. P., & Petry, F. E. (1985). Query languages for fuzzy databases. In J. Kacprzyk & R. Yager (Eds.), Management decision support systems using fuzzy sets and possibility theory (pp. 241-251). Cologne, Germany: Verlag TÜV Rheinland.

Buckles, B. P., Petry, F. E., & Sachar, H. S. (1989). A domain calculus for fuzzy relational databases. Fuzzy Sets and Systems, 29, 327-340.

Cattell, R. G. G., & Barry, D. (Eds.). (2000). The object data standard: ODMG 3.0. San Francisco: Morgan Kaufmann.

Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377-387.

Date, C. J. (2004). An introduction to database systems (8th ed.). Boston: Pearson Education Inc.

de Calmès, M., Dubois, D., Hüllermeier, E., Prade, H., & Sedes, F. (2002). A fuzzy set approach to flexible case-based querying: Methodology and experimentation. In Proceedings of the 8th International Conference, Principles of Knowledge Representation and Reasoning (KR2002) (pp. 449-458), Toulouse, France.

de Caluwe, R. (2002). Principles of fuzzy databases. In J. Kacprzyk, M. Krawczak, & S. Zadrożny (Eds.), Issues in information technology (pp. 151-172). Warszawa, Poland: Exit.

de Tré, G. (2002). Extended possibilistic truth values. International Journal of Intelligent Systems, 17, 427-446.

de Tré, G., & de Baets, B. (2003). Aggregating constraint satisfaction degrees expressed by possibilistic truth values. IEEE Transactions on Fuzzy Systems, 11(3), 361-368.


de Tré, G., & de Caluwe, R. (2003). Modelling uncertainty in multimedia database systems: An extended possibilistic approach. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 11(1), 5-22.

de Tré, G., & de Caluwe, R. (2005). A constraint based fuzzy object oriented database model. In Z. Ma (Ed.), Advances in fuzzy object-oriented databases: Modelling and applications (pp. 1-45). Hershey, PA: Idea Group Publishing.

de Tré, G., de Caluwe, R., & Prade, H. (in press). Null values in fuzzy databases. Journal of Intelligent Information Systems.

de Tré, G., de Caluwe, R., Tourné, K., & Matthé, T. (2003). Theoretical considerations ensuing from experiments with flexible querying. In T. Bilgiç, B. De Baets, & O. Kaynak (Eds.), Proceedings of the IFSA 2003 World Congress (LNCS 2715, pp. 388-391). Springer.

de Tré, G., de Caluwe, R., & Van der Cruyssen, B. (2000). A generalised object-oriented database model. In G. Bordogna & G. Pasi (Eds.), Recent issues on fuzzy databases (pp. 155-182). Heidelberg, Germany: Physica-Verlag.

de Tré, G., Verstraete, J., Hallez, A., Matthé, T., & de Caluwe, R. (2006). The handling of select-project-join operations in a relational framework supported by possibilistic logic. In Proceedings of the 11th International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems (IPMU) (pp. 2181-2188), Paris, France.

Dubois, D., Fargier, H., & Prade, H. (1997). Beyond min aggregation in multicriteria decision: (Ordered) weighted min, discri-min and leximin. In R. R. Yager & J. Kacprzyk (Eds.), The ordered weighted averaging operators: Theory and applications (pp. 181-192). Boston: Kluwer Academic Publishers.

Dubois, D., & Prade, H. (1988). Possibility theory. New York: Plenum Press.


Dubois, D., & Prade, H. (1996). Semantics of quotient operators in fuzzy relational databases. Fuzzy Sets and Systems, 78, 89-93.

Dubois, D., & Prade, H. (1997). Using fuzzy sets in flexible querying: Why and how? In T. Andreasen, H. Christiansen, & H. L. Larsen (Eds.), Flexible query answering systems. Dordrecht: Kluwer Academic Publishers.

Gaasterland, T., Godfrey, P., & Minker, J. (1992). An overview of cooperative answering. Journal of Intelligent Information Systems, 1, 123-157.

Galindo, J., Medina, J. M., & Aranda, M. C. (1999). Querying fuzzy relational databases through fuzzy domain calculus. International Journal of Intelligent Systems, 14, 375-411.

Galindo, J., Medina, J. M., Cubero, J. C., & Garcia, M. T. (2000). Fuzzy quantifiers in fuzzy domain calculus. In Proceedings of the International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems (IPMU’2000) (pp. 1697-1704), Madrid, Spain.

Galindo, J., Medina, J. M., Cubero, J. C., & Garcia, M. T. (2001). Relaxing the universal quantifier of the division in fuzzy relational databases. International Journal of Intelligent Systems, 16(6), 713-742.

Galindo, J., Medina, J. M., Pons, O., & Cubero, J. C. (1998). A server for fuzzy SQL queries. In T. Andreasen, H. Christiansen, & H. L. Larsen (Eds.), Proceedings of the Third International Conference on Flexible Query Answering Systems (LNAI 1495, pp. 164-174). London: Springer-Verlag.

Galindo, J., Urrutia, A., & Piattini, M. (2006). Fuzzy databases: Modeling, design and implementation. Hershey, PA: Idea Group Publishing.

George, R. (1992). Uncertainty management issues in the object-oriented database model. PhD Thesis, Tulane University, New Orleans, LA, USA.

George, R., Yazici, A., Petry, F. E., & Buckles, B. P. (1997). Modeling impreciseness and uncertainty in the object-oriented data model: A similarity-based approach. In R. de Caluwe (Ed.), Fuzzy and uncertain object-oriented databases: Concepts and models (pp. 63-95). Singapore: World Scientific.

Gonçalves, M., & Tineo, L. (2001a). SQLf flexible querying language extension by means of the norm SQL2. In Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’2001) (pp. 473-476).

Gonçalves, M., & Tineo, L. (2001b). SQLf3: An extension of SQLf with SQL3 features. In Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE’2001) (pp. 477-480).

Kacprzyk, J., & Zadrożny, S. (1995). FQUERY for Access: Fuzzy querying for Windows-based DBMS. In P. Bosc & J. Kacprzyk (Eds.), Fuzziness in database management systems (pp. 415-433). Heidelberg, Germany: Physica-Verlag.

Kacprzyk, J., & Zadrożny, S. (1997). Implementation of OWA operators in fuzzy querying for Microsoft Access. In R. R. Yager & J. Kacprzyk (Eds.), The ordered weighted averaging operators: Theory and applications (pp. 293-306). Boston: Kluwer Academic Publishers.

Kacprzyk, J., & Zadrożny, S. (2000a). On a fuzzy querying and data mining interface. Kybernetika, 36, 657-670.

Kacprzyk, J., & Zadrożny, S. (2000b). On combining intelligent querying and data mining using fuzzy logic concepts. In G. Bordogna & G. Pasi (Eds.), Recent research issues on fuzzy databases (pp. 67-81). Heidelberg: Physica-Verlag.

Kacprzyk, J., & Zadrożny, S. (2001). Using fuzzy querying over the Internet to browse through information resources. In B. Reusch & K. H. Temme (Eds.), Computational intelligence in theory and practice (pp. 235-262). Heidelberg: Physica-Verlag.

Kacprzyk, J., Zadrożny, S., & Ziółkowski, A. (1989). FQUERY III+: A “human-consistent” database querying system based on fuzzy logic with linguistic quantifiers. Information Systems, 14, 443-453.

Kacprzyk, J., & Ziółkowski, A. (1986). Database queries with fuzzy linguistic quantifiers. IEEE Transactions on Systems, Man and Cybernetics, 16, 474-479.

Kim, W. (1994). Observations on the ODMG-93 proposal for an object-oriented database language. ACM SIGMOD Record, 23(1), 4-9.

Klement, E. P., Mesiar, R., & Pap, E. (Eds.). (2000). Triangular norms. Kluwer Academic Publishers.

Laurent, A. (2003). Querying fuzzy multidimensional databases: Unary operators and their properties. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 11, 31-46.

Marín, N., Pons, O., & Vila, M. A. (2000). Fuzzy types: A new concept of type for managing vague structures. International Journal of Intelligent Systems, 15(11), 1061-1085.

Medina, J. M., Pons, O., & Vila, M. A. (1994). GEFRED: A generalized model of fuzzy relational databases. Information Sciences, 76(1-2), 87-109.

Melton, J., & Simon, A. R. (2002). SQL:1999: Understanding relational language components. Morgan Kaufmann.

Mouaddib, N., & Subtil, P. (1997). Management of uncertainty and vagueness in databases: The FIRMS point of view. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 5(4), 437-457.

Na, S., & Park, S. (1997). Fuzzy object-oriented data model and fuzzy association algebra. In R. de Caluwe (Ed.), Fuzzy and uncertain object-oriented databases: Concepts and models. Singapore: World Scientific.

Petry, F. E. (1996). Fuzzy databases: Principles and applications. Boston: Kluwer Academic Publishers.

Prade, H., & Testemale, C. (1984). Generalizing database relational algebra for the treatment of incomplete or uncertain information and vague queries. Information Sciences, 34, 115-143.

Prade, H., & Testemale, C. (1987). Representation of soft constraints and fuzzy attribute values by means of possibility distributions in databases. In J. C. Bezdek (Ed.), Analysis of fuzzy information. Boca Raton, FL: CRC Press.

Ramakrishnan, R., & Gehrke, J. (2000). Database management systems. McGraw-Hill.

Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence, 13(1), 81-132.

Rocacher, D., & Connan, F. (1996). A fuzzy algebra for object oriented databases. In Proceedings of the 4th European Congress on Intelligent Techniques and Soft Computing (EUFIT’96) 2 (pp. 871-876), Aachen, Germany.

Rossazza, J.-P. (1990). Utilisation de hiérarchies de classes floues pour la représentation de connaissances imprécises et sujettes à exception: Le système “SORCIER.” PhD Thesis, Université Paul Sabatier, Toulouse, France.

Rossazza, J.-P., Dubois, D., & Prade, H. (1997). A hierarchical model of fuzzy classes. In R. de Caluwe (Ed.), Fuzzy and uncertain object-oriented databases: Concepts and models (pp. 21-61). Singapore: World Scientific.

Shaw, G. M., & Zdonik, S. B. (1990). A query algebra for object-oriented databases. In Proceedings of the 6th International Conference on Data Engineering (ICDE’90) (pp. 154-162), Los Angeles, CA.

Tahani, V. (1977). A conceptual framework for fuzzy query processing: A step toward very intelligent database systems. Information Processing and Management, 13, 289-303.

Takahashi, Y. (1993). Fuzzy database query languages and their relational completeness theorem. IEEE Transactions on Knowledge and Data Engineering, 5, 122-125.

Takahashi, Y. (1995). A fuzzy query language for relational databases. In P. Bosc & J. Kacprzyk (Eds.), Fuzziness in database management systems (pp. 365-384). Heidelberg, Germany: Physica-Verlag.

Tanaka, K., Kobayashi, S., & Sakanoue, T. (1991). Uncertainty management in object-oriented database systems. In Proceedings of the International Conference on Database and Expert System Applications (DEXA’91) (pp. 251-256). Berlin: Springer-Verlag.

Umano, M. (1982). FREEDOM-0: A fuzzy database system. In M. Gupta & E. Sanchez (Eds.), Fuzzy information and decision processes (pp. 339-347). Amsterdam: North-Holland.

Umano, M., & Fukami, S. (1994). Fuzzy relational algebra for possibility-distribution-fuzzy relational model of fuzzy data. Journal of Intelligent Information Systems, 3, 7-27.

Van Gyseghem, N. (1998). Imprecision and uncertainty in the UFO database model. Journal of the American Society for Information Science, 49(3), 236-252.

Vila, M. A., Cubero, J.-C., Medina, J.-M., & Pons, O. (1997). Using OWA operator in flexible query processing. In R. R. Yager & J. Kacprzyk (Eds.), The ordered weighted averaging operators: Theory and applications (pp. 258-274). Boston: Kluwer Academic Publishers.

Yager, R. R. (1991). Fuzzy quotient operators for fuzzy relational databases. In Proceedings of the International Fuzzy Engineering Symposium (IFES’91) (pp. 289-296), Yokohama, Japan.

Yager, R. R. (1994). Interpreting linguistically quantified propositions. International Journal of Intelligent Systems, 9, 541-569.

Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338-353.

Zadeh, L. A. (1975). The concept of a linguistic variable and its application to approximate reasoning (parts I, II, and III). Information Sciences, 8, 199-251, 301-357; 9, 43-80.

Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1(1), 3-28.

Zadeh, L. A. (1983). A computational approach to fuzzy quantifiers in natural languages. Computers and Mathematics with Applications, 9, 149-184.

Zadrożny, S. (2005). Bipolar queries revisited. In V. Torra, Y. Narukawa, & S. Miyamoto (Eds.), Modelling decisions for artificial intelligence (MDAI 2005) (LNAI 3558, pp. 387-398). Berlin: Springer-Verlag.

Zadrożny, S., & Kacprzyk, J. (1998). Implementing fuzzy querying via the Internet/WWW: Java applets, ActiveX controls and cookies. In Flexible query answering systems (pp. 382-392). Heidelberg: Springer-Verlag.

Zadrożny, S., & Kacprzyk, J. (2002). Fuzzy querying of relational databases: A fuzzy logic view. In Proceedings of the EUROFUSE Workshop on Information Systems (pp. 153-158), Varenna, Italy.

Zemankova-Leech, M., & Kandel, A. (1984). Fuzzy relational databases: A key to expert systems. Cologne, Germany: Verlag TÜV Rheinland.

Key Terms

Database: A collection of persistent data. In a database, data are modeled in accordance with a database model. This model defines the structure of the data, the constraints for integrity and security, and the behavior of the data.

Fuzzy Database: In a regular database, only crisp (perfectly described) data are stored. However, because of imprecision, vagueness, uncertainty, incompleteness, or ambiguity, much real-world data is available only in an imperfect form. Fuzzy databases aim to capture imperfect information about a modeled part of the world and represent it directly, as accurately as possible, in a database. The two leading approaches to the representation of imperfect information in databases are the possibilistic approach and the similarity relation based approach.

Fuzzy Preferences Between Query Conditions: In fuzzy querying, fuzzy preferences can be introduced between query conditions. These preferences are expressed via grades of importance, usually called weights. Different weights are assigned to particular conditions to indicate that the satisfaction of some query conditions is more desirable than the satisfaction of others.

Fuzzy Preferences Inside Query Conditions: In fuzzy querying, fuzzy preferences can be introduced inside the query conditions via flexible search criteria, which make it possible to express, in a gradual way, that some values are more desirable than others.

Fuzzy Querying: Searching for data in a database is called querying. Modern database systems provide a query language to support querying. Relational databases are usually queried using SQL (Structured Query Language), and object-oriented ODMG databases are queried using OQL (Object Query Language). Traditional database querying can be enhanced by introducing fuzzy preferences and/or fuzzy conditions in the queries. This is called fuzzy querying.

Object-Oriented Database: An object-oriented database is a database that is modeled in accordance with an object-oriented database model. In an object-oriented database model, the data are structured in classes, which also embody the behavior of the data. Classes are constructed in the spirit of the object-oriented programming paradigm and are as such closely connected to an object-oriented programming language. The best known object-oriented database model is the ODMG model.

Relational Database: A relational database is a database that is modeled in accordance with the relational database model. In the relational database model, the data are structured in relations that are represented by tables. The behavior of the data is defined in terms of the relational algebra, which originally consists of eight operators (union, intersection, difference, cross product, join, selection, projection, and division), or in terms of the relational calculus, which is of a declarative nature.

Possibilistic Fuzzy Database Approach: In the possibilistic fuzzy database approach, imprecision in the value of an attribute is modeled via a possibility distribution on the domain of this attribute. This calls for the use of necessity and possibility measures in database querying.
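As an illustration of the two measures, the following minimal Python sketch evaluates a fuzzy condition against an imprecisely known attribute value. It assumes a finite domain and the usual max-min definitions, Poss = max over x of min(pi(x), mu(x)) and Nec = min over x of max(1 - pi(x), mu(x)). The attribute (an age known only as "about 30"), the condition "young", and all degrees are hypothetical and chosen only for the example.

```python
def possibility(pi, mu):
    """Degree to which it is possible that the stored value satisfies mu:
    max over x of min(pi(x), mu(x))."""
    return max(min(p, mu(x)) for x, p in pi.items())


def necessity(pi, mu):
    """Degree to which it is certain that the stored value satisfies mu:
    min over x of max(1 - pi(x), mu(x)).
    Values missing from pi have possibility 0, contribute 1 to the max,
    and therefore cannot lower the result."""
    return min(max(1.0 - p, mu(x)) for x, p in pi.items())


# Hypothetical possibility distribution for an age known as "about 30".
age_about_30 = {28: 0.5, 29: 0.8, 30: 1.0, 31: 0.8, 32: 0.5}


def young(age):
    """Hypothetical fuzzy condition: fully satisfied up to 25,
    not satisfied from 35 on, linear in between."""
    if age <= 25:
        return 1.0
    if age >= 35:
        return 0.0
    return (35 - age) / 10.0


poss = possibility(age_about_30, young)
nec = necessity(age_about_30, young)
print(poss, nec)
```

With these invented degrees, the possibility (about 0.6) exceeds the necessity (about 0.4): it is fairly possible, but far from certain, that the person is young. A query processor could use the pair to rank or filter tuples.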

Similarity Relation Based Fuzzy Database Approach: In the similarity relation based fuzzy database approach, query results are allowed to contain not only data that exactly satisfy the search conditions but also data that are similar to these data. For this reason, the attribute domains have to be equipped with a similarity relation. (A similarity relation is a fuzzy binary relation whose membership function expresses the similarity degree between the pairs of the domain elements.)
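The idea can be sketched in Python as follows. This is only an illustrative assumption of how a similarity-based selection might look, not the mechanism of any particular system: the colour domain, the similarity degrees, the threshold, and the helper names `similarity` and `similar_select` are all hypothetical.

```python
# Hypothetical similarity degrees on a colour domain. The relation is
# treated as reflexive and symmetric, so each unordered pair is stored once.
SIM = {
    ("red", "orange"): 0.8,
    ("red", "pink"): 0.6,
    ("orange", "pink"): 0.6,
    ("red", "yellow"): 0.3,
    ("orange", "yellow"): 0.3,
    ("pink", "yellow"): 0.3,
}


def similarity(a, b):
    """Similarity degree between two domain elements (1.0 for identical ones,
    0.0 for pairs that are not listed)."""
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), 0.0))


def similar_select(rows, attribute, target, threshold):
    """Return (row, degree) pairs whose attribute value is similar to target
    to at least the given threshold, ordered by decreasing degree."""
    scored = [(row, similarity(row[attribute], target)) for row in rows]
    kept = [(row, degree) for row, degree in scored if degree >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])


cars = [
    {"id": 1, "colour": "red"},
    {"id": 2, "colour": "orange"},
    {"id": 3, "colour": "pink"},
    {"id": 4, "colour": "yellow"},
]

result = similar_select(cars, "colour", "red", 0.6)
for row, degree in result:
    print(row["id"], row["colour"], degree)
```

Here a query for red cars with threshold 0.6 also returns the orange and pink cars, each with its similarity degree, while the yellow car is excluded. Raising the threshold to 1.0 would reproduce the crisp query.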
