Fuzzy Dimension To Databases

Punam Bedi, Harmeet Kaur, Ankit Malhotra

Abstract

Traditional databases handle data that is crisp, deterministic and precise in nature. However, our reasoning and decision-making processes are uncertain and vague in nature. This paper gives an insight into the world of uncertainty. It explains the concept of fuzziness in databases and the ways of handling fuzzy queries to crisp databases and to fuzzy databases. We propose two models by which uncertainty can be handled in databases. The first model deals with fuzzy queries to a crisp database, while the second deals with the storage and retrieval of fuzzy information in a database. A prototype of both models has been implemented in JAVA.

Keywords: Fuzzy Logic, Fuzzy Sets, Fuzzy Relational Databases, Fuzzy SQL

1. Uncertainty: A Modern Outlook

Most of our traditional tools for formal modeling, reasoning and computing are crisp, deterministic and precise in nature. Precision assumes that the parameters of a model represent exactly either our perception of the phenomenon modeled or the features of the real system that has been modeled. Certainty indicates that we assume the structures and parameters of the model to be definitely known. However, if the model or theory asserts factuality, then the modeling language has to be suited to model the characteristics of the situation under study appropriately. For factual models or modeling languages, two major complications arise:

1. Real situations are very often not crisp and deterministic and cannot be described precisely, i.e. real situations are very often uncertain or vague in a number of ways.
2. A complete description of a real system would require far more detailed data than a human being could ever recognize and process simultaneously.

Hence, among the various paradigmatic changes in science and mathematics in the last century, one has concerned the concept of uncertainty. In science this change is manifested by a gradual transition from a view that uncertainty is undesirable to an alternative view that accepts uncertainty as an integral part of the whole system, essential to modeling the real world.

There are three basic types of uncertainty discussed in the literature:

1. Fuzziness: Lack of definite or sharp distinctions. Alternate terms for it are vagueness, cloudiness and haziness.
2. Discord: Disagreement in choosing among several alternatives. Synonyms for it are dissonance, incongruity and discrepancy.
3. Nonspecificity: Two or more alternatives are left unspecified. Synonyms for it are variety, generality and diversity.

The last two types of uncertainty can be classified under a higher uncertainty type, ambiguity, which means any situation in which it remains unclear which of several alternatives should be accepted as the genuine one. In general, ambiguity results from a lack of certain distinctions characterizing an object, from conflicting distinctions, or from both. Our paper deals with the implementation of fuzziness in databases.

2. Fuzzy Sets : Basic Concepts

2.1 Introduction

An important point in the evolution of the modern concept of uncertainty was the publication of a seminal paper by Lotfi A. Zadeh [8]. In it Zadeh introduced a theory whose objects, fuzzy sets, are sets with boundaries that are not precise; membership in a fuzzy set is not a matter of true or false, but rather a matter of degree. This concept was called Fuzziness and the theory was called Fuzzy Set Theory.

Fuzziness can be defined as the vagueness concerning the semantic meaning of events, phenomena or statements themselves. It is particularly frequent in all areas in which human judgment, evaluation and decisions are important. As an example, consider a student record database system. Suppose we want to find the bright and young students in the whole batch. For a crisp system we would specify the query as

PROJECT (Student_Name) WHERE 19 ≤ AGE ≤ 23 and 3 ≤ GPA ≤ 4

But this system has a major flaw. Consider a student, Krishna, whose age is 24 and who has a GPA of 4 out of 4. He should have been selected but is not, because of the rigid boundary conditions set by crisp logic. In fuzzy logic we would do the same by specifying two fuzzy sets, YOUNG and GPA (Fig. 1), and each student will have some membership grade associated with the two sets.

[Figure: membership grade, from 0 to 1, plotted against (a) age, with axis marks at 17, 19, 23 and 25, and (b) GPA, with axis marks at 3, 3.5 and 4.]

Fig. 1 : (a) Age ; (b) GPA

So according to our definition Krishna will have a non-zero membership grade in YOUNG, although it will be less than that of students in the age group 19-23. Hence Krishna will also be included in the result set, since he satisfies the query to some extent, which is represented by his membership grade.

Definition: When A is a fuzzy set and x is a relevant object, the proposition “x is a member of A” is not necessarily either true or false, as required by two-valued logic, but may be true only to some degree. The degree to which x is actually a member of A is a real number in the interval [0, 1].
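The membership grades in this example can be made concrete with a short Java sketch. It assumes that the axis marks of Fig. 1 are the breakpoints of trapezoidal membership functions (17, 19, 23, 25 for age; 3, 3.5, 4 for GPA), which the text does not state explicitly. The class and method names are hypothetical and are not taken from the authors' prototype.

```java
// Sketch of the membership functions suggested by Fig. 1.
// Assumption: the axis marks in Fig. 1 are trapezoid breakpoints.
class FuzzyMembership {

    /**
     * Trapezoidal membership grade: 0 outside [a, d], 1 on [b, c],
     * linear on the rising edge [a, b] and on the falling edge [c, d].
     */
    static double trapezoid(double x, double a, double b, double c, double d) {
        if (x < a || x > d) return 0.0;
        if (x < b) return (x - a) / (b - a);   // rising edge
        if (x <= c) return 1.0;                // plateau
        return (d - x) / (d - c);              // falling edge
    }

    /** Grade of an age in the fuzzy set YOUNG (breakpoints assumed from Fig. 1a). */
    static double young(double age) {
        return trapezoid(age, 17, 19, 23, 25);
    }

    /** Grade of a GPA in the GPA fuzzy set (breakpoints assumed from Fig. 1b). */
    static double goodGpa(double gpa) {
        return trapezoid(gpa, 3.0, 3.5, 4.0, 4.0);
    }

    public static void main(String[] args) {
        // Krishna: age 24, GPA 4. The crisp query rejects him outright.
        System.out.println("YOUNG(24) = " + young(24));
        System.out.println("GPA(4.0)  = " + goodGpa(4.0));
    }
}
```

Under these assumed breakpoints Krishna's grade in YOUNG is 0.5 and his GPA grade is 1.0, so he stays in the result set with a reduced grade instead of being dropped.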

Theoretically, if X is a collection of objects denoted generically by x, then a fuzzy set F in X is a set of ordered pairs,

F = {(x, µF(x)) | x ∈ X},

where µF(x) is called the membership function (or grade of membership) of x in F, which maps X to the membership space M. The range of the membership function is a subset of the nonnegative real numbers whose supremum is finite [10].

2.2 Fuzzy Set Operators and Fuzzy Logic

For crisp sets, the basic operations are:

• Union, OR
• Intersection, AND
• Complement, NOT

By analogy, for fuzzy sets we define fuzzy operators that allow us to manipulate the fuzzy sets. We similarly have fuzzy complement, intersection and union operators, but they are not uniquely defined, i.e. like membership functions, they are context-dependent [5]. However, an important dissimilarity exists between traditional set theory and logic on the one hand and fuzzy set theory on the other. Traditionally there is a distinction between the union operation of sets and the OR of logic, as is the case with intersection and AND. In fuzzy theory there is no such distinction between the logical and set operators [10], i.e.

Fuzzy union ≡ Fuzzy OR
Fuzzy intersection ≡ Fuzzy AND
Fuzzy complement ≡ Fuzzy NOT

We define some standard fuzzy operations as:

• Fuzzy Complement, ~A(x) = 1 - A(x)
• Fuzzy Union, (A∪B)(x) = max[A(x), B(x)]
• Fuzzy Intersection, (A∩B)(x) = min[A(x), B(x)]
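The three standard operations above translate directly into code. The following Java sketch applies them to membership grades in [0, 1]; the class and method names are hypothetical, and the grades used in main are illustrative values only.

```java
// Sketch of the standard fuzzy operators on membership grades in [0, 1].
class FuzzyOps {

    /** Fuzzy complement (NOT): 1 - A(x). */
    static double not(double a) {
        return 1.0 - a;
    }

    /** Fuzzy union (OR): max[A(x), B(x)]. */
    static double or(double a, double b) {
        return Math.max(a, b);
    }

    /** Fuzzy intersection (AND): min[A(x), B(x)]. */
    static double and(double a, double b) {
        return Math.min(a, b);
    }

    public static void main(String[] args) {
        double young = 0.5;   // illustrative grade in one fuzzy set
        double bright = 1.0;  // illustrative grade in another fuzzy set
        System.out.println("young AND bright = " + and(young, bright));
        System.out.println("young OR bright  = " + or(young, bright));
        System.out.println("NOT young        = " + not(young));
    }
}
```

With these operators a proposition and its complement can both hold to a non-zero degree: for a grade of 0.5, `and(a, not(a))` evaluates to 0.5 rather than 0.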

More information regarding fuzzy operators and their properties can be found in [5], [10].

3. Fuzzy Databases

3.1 Need For Fuzzy Databases

As the application of database technology moves outside the realm of a crisp mathematical world to the realm of the real world, the need to handle imprecise information becomes important. A database that can handle imprecise information stores not only raw data but also related information that allows us to interpret the data in a much deeper context. For example, the query “Which student is young and has sufficiently good grades?” captures the real intention of the user better than a crisp query such as

SELECT * FROM STUDENT WHERE AGE < 19 AND GPA > 3.5

Such a technology has wide applications in areas such as medical diagnosis, employment and investment, because in such areas subjective and uncertain information is not only common but also very important.

3.2 Techniques for implementation of Fuzziness in Databases

One of the major concerns in the design and implementation of fuzzy databases is efficiency, i.e. these systems must be fast enough to make interaction with human users feasible. In general, we have two feasible ways to incorporate fuzziness in databases:

1. Making fuzzy queries to classical databases (discussed in section 4).
2. Adding fuzzy information to the system (discussed in section 5).

3.3 Classification of Data

The information data can be classified as follows:

1. Crisp: There is no vagueness in the information, e.g.,
X = 13
Temperature = 90°

2. Fuzzy: There is vagueness in the information. This can be further divided into two types:

a. Approximate Value: The information is not totally vague; some approximate value is known and the data lies near that value, e.g.,
10 ≤ X ≤ 15
Temperature ≈ 85°

These are considered to have a triangular possibility distribution, as shown in Fig. 2.

[Figure: triangular possibility distribution for APPROX X, rising from 0 at X - d to 1 at X and falling back to 0 at X + d.]

Fig. 2: Possibility Distribution for an approximate value

The parameter d gives the range around X within which the information value lies.

b. Linguistic Variable: A linguistic variable is a variable that, apart from representing a fuzzy number, also represents linguistic concepts interpreted in a particular context. Each linguistic variable is defined in terms of a base variable, which either has a physical interpretation (speed, weight etc.) or is any other numerical variable (salary, absences, GPA etc.). A linguistic variable is fully characterized by a quintuple (v, T, X, g, m) where

v - is the name of the linguistic variable.
T - is the set of linguistic terms that apply to this variable.
X - is the universal set of values of the base variable.
g - is a grammar for generating the linguistic terms.
m - is a semantic rule that assigns to each term t ∈ T a fuzzy set on X.

The information in this case is totally vague and we associate a fuzzy set with the information. A linguistic term is the name given to the fuzzy set, e.g.,
X is SMALL
Temperature is HOT

These are considered to have a trapezoidal possibility distribution, as shown in Fig. 3.

[Figure: trapezoidal possibility distribution for the term SMALL, with the four parameters α, β, γ and δ marked in that order along the horizontal axis.]

Fig. 3: Possibility Distribution for a Linguistic Term SMALL for the Linguistic Variable HEIGHT

There are four parameters associated with a linguistic term, α, β, γ and δ, as shown in Fig. 3. For the range [β, γ] the membership value is 1.0, while for the ranges [α, β] and [γ, δ] the membership value lies between 0.0 and 1.0.

4. Fuzzy Querying to Relational Databases

4.1 The proposed model

The easiest way of introducing fuzziness in the database model is to use classical relational databases and formulate a front end to it that shall allow fuzzy querying to the database. A limitation imposed on the system is that because we are not extending the database model nor are we defining a new model in any way, the underlying database model is crisp and hence the fuzziness can only be incorporated in the query. To incorporate fuzziness we introduce fuzzy sets / linguistic terms on the attribute domains / linguistic variables e.g. on the attribute domain AGE we may define fuzzy sets as YOUNG, MIDDLE and OLD. These are defined as the following:

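[Figure: three overlapping trapezoidal fuzzy sets on AGE, with membership from 0 to 1. Young has parameters αY, βY, γY, δY; Middle has αM, βM, γM, δM; Old has αO, βO, γO, δO. Adjacent sets share breakpoints: γY = αM, δY = βM, γM = αO and δM = βO.]

Fig. 4 : Age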
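The arrangement in Fig. 4 can be sketched in Java as three trapezoids that share breakpoints. All numeric values below are hypothetical, since the paper gives no concrete parameters for YOUNG, MIDDLE and OLD; only the sharing pattern (γY = αM, δY = βM, γM = αO, δM = βO) comes from the figure.

```java
// Sketch of the three linguistic terms on AGE from Fig. 4.
// Assumption: every numeric breakpoint below is hypothetical.
class AgeTerms {

    // Each array holds {alpha, beta, gamma, delta}; neighbours share breakpoints as in Fig. 4.
    static final double[] YOUNG  = {14, 15, 17, 18};
    static final double[] MIDDLE = {17, 18, 20, 21};   // alphaM = gammaY, betaM = deltaY
    static final double[] OLD    = {20, 21, 24, 25};   // alphaO = gammaM, betaO = deltaM

    /** Trapezoidal membership: 1 on [beta, gamma], linear on the two slopes, 0 elsewhere. */
    static double grade(double[] term, double age) {
        double alpha = term[0], beta = term[1], gamma = term[2], delta = term[3];
        if (age < alpha || age > delta) return 0.0;
        if (age < beta) return (age - alpha) / (beta - alpha);
        if (age <= gamma) return 1.0;
        return (delta - age) / (delta - gamma);
    }

    public static void main(String[] args) {
        double age = 17.5;  // lies in the overlap between YOUNG and MIDDLE
        System.out.println("YOUNG  = " + grade(YOUNG, age));
        System.out.println("MIDDLE = " + grade(MIDDLE, age));
        System.out.println("OLD    = " + grade(OLD, age));
    }
}
```

An age inside an overlap region belongs partly to both neighbouring terms: with these assumed values, 17.5 has grade 0.5 in YOUNG, 0.5 in MIDDLE and 0 in OLD.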

For this we take the example of a student database which has a table STUDENTS with the following attributes: a. Name b. Age c. Course d. Percentage e. Absences

Name       Age   Course   Percentage   Absences
Ankit      19    12       83           13
Anuj       17    10       80           9
Sumit      18    11       83           6
Rahul      19    12       56           12
Bishop     19    12       65           32
Neha       18    11       77           23
Malini     17    10       69           10
Rocky      16    9        79           13
Sandeep    19    12       75           6
Nagesh     19    12       83           6

Fig. 5 : A snapshot of the data existing in the database.

4.2 Meta Knowledge

At the level of meta knowledge we need to add only a single table, LABELS, with the following structure:

LABELS
Label   Column_Name   Alpha   Beta   Gamma   Delta

Fig. 6 : Meta Knowledge
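The structure in Fig. 6 could be mirrored in memory roughly as follows. This is a minimal Java sketch, not the authors' implementation: the class names are hypothetical, the sample rows for LOW and HIGH are invented for illustration, and a real front end would read the rows from the LABELS table instead of hard-coding them.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the LABELS meta knowledge held in memory.
// Assumption: the sample rows are hypothetical; the paper gives no parameter values.
class LabelsTable {

    /** One row of LABELS: a linguistic term, its column and its four trapezoid parameters. */
    static final class LabelRow {
        final String label;
        final String columnName;
        final double alpha, beta, gamma, delta;

        LabelRow(String label, String columnName,
                 double alpha, double beta, double gamma, double delta) {
            this.label = label;
            this.columnName = columnName;
            this.alpha = alpha;
            this.beta = beta;
            this.gamma = gamma;
            this.delta = delta;
        }

        /** Membership of a crisp value x in the fuzzy set described by this row (shape of Fig. 3). */
        double membership(double x) {
            if (x < alpha || x > delta) return 0.0;
            if (x < beta) return (x - alpha) / (beta - alpha);
            if (x <= gamma) return 1.0;
            return (delta - x) / (delta - gamma);
        }
    }

    // Keyed by Label, which the text names as the primary key of LABELS.
    private final Map<String, LabelRow> rows = new HashMap<>();

    void add(LabelRow row) {
        rows.put(row.label, row);
    }

    double membership(String label, double x) {
        LabelRow row = rows.get(label);
        if (row == null) {
            throw new IllegalArgumentException("Unknown linguistic term: " + label);
        }
        return row.membership(x);
    }

    /** Hypothetical rows for the ABSENCES column. */
    static LabelsTable sample() {
        LabelsTable table = new LabelsTable();
        table.add(new LabelRow("LOW", "ABSENCES", 0, 0, 8, 12));
        table.add(new LabelRow("HIGH", "ABSENCES", 20, 30, 100, 100));
        return table;
    }

    public static void main(String[] args) {
        LabelsTable labels = sample();
        int[] absences = {13, 9, 6};  // first three rows of Fig. 5
        for (int a : absences) {
            System.out.println("LOW(" + a + ") = " + labels.membership("LOW", a));
        }
    }
}
```

Under these assumed parameters, 6 absences are fully LOW (grade 1.0), 9 absences are LOW to degree 0.75, and 13 absences are not LOW at all (grade 0.0).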

The LABELS table is used to store the information of all the fuzzy sets defined on all the attribute domains. A description of each column in this table is as follows:

• Label: This is the primary key of this table and stores the linguistic term associated with the fuzzy set.
• Column_Name: Stores the linguistic variable associated with the given linguistic term.
• Alpha, Beta, Gamma, Delta: Store the four parameters of the fuzzy set, as shown in Fig. 3 above.

4.3 Implementation:

The main issue in the implementation of this system is the parsing of the input fuzzy query.

As the underlying database is crisp, i.e. no fuzzy data is stored in the database, the INSERT query does not change and need not be parsed; it can be presented to the database as it is. During parsing, any other query is divided into the following parts:

1. Query Type: Whether the query is a SELECT, DELETE or UPDATE.
2. Result Attributes: The attributes that are to be displayed; used only in the case of the SELECT query.
3. Source Tables: The tables on which the query is to be applied.
4. Conditions: The conditions that have to be satisfied before the operation is performed. Each condition is further sub-divided into Query Attributes (i.e. the attributes on which the query is to be applied) and the linguistic term. If the condition is not fuzzy, i.e. it does not contain a linguistic term, then it need not be subdivided.

The implementation of the proposed system has been done in JAVA using a MySQL database as the backend and the mm.mysql.jdbc-1.2c type 3 JDBC driver.

5. Fuzzy Extension to Relational Databases

5.1 The proposed model

In the previous section, we have discussed how vague queries can be used on relational databases. We now present the design of a Fuzzy Relational Database in which not only can fuzzy queries be applied, but fuzzy information can also be stored. We consider the same database as given in section 4.1, with the difference that the attributes AGE, PERCENTAGE and ABSENCES can now hold fuzzy information, while the remaining attributes are considered to be crisp. Based on the classification of information data, the attributes in the database are defined to be of two types:

• Type 1 : The attribute can store only crisp values.
• Type 2 : The attribute is fuzzy and can take either a crisp value, an approximate value or a linguistic term.

Name       Age         Course   Percentage   Absences
Ankit      OLD         12       GOOD         13
Anuj       MIDDLE      10       80           9
Sumit      18          11       83           LOW
Rahul      OLD         12       BAD          12
Bishop     19          12       65           HIGH
Neha       18          11       AVERAGE      23
Malini     17          10       69           10
Rocky      MIDDLE      9        79           13
Sandeep    APPROX 19   12       APPROX 75    APPROX 6
Nagesh     APPROX 19   12       83           APPROX 6

Fig. 7: Snapshot of the data existing in the database.
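The three kinds of value that a Type 2 attribute can hold in Fig. 7 can be told apart with a small classifier. The Java sketch below is illustrative only: the class, enum and method names are hypothetical, the textual forms ("18", "APPROX 19", "OLD") are taken from Fig. 7, and the triangular function assumes the shape of Fig. 2 with a margin d greater than 0.

```java
// Sketch of classifying and matching Type 2 attribute values as they appear in Fig. 7.
// Assumption: all names here are hypothetical, as is the margin used in main.
class StoredValue {

    enum Kind { CRISP, APPROX, TERM }

    final Kind kind;
    final double number;  // meaningful for CRISP and APPROX only
    final String term;    // meaningful for TERM only

    private StoredValue(Kind kind, double number, String term) {
        this.kind = kind;
        this.number = number;
        this.term = term;
    }

    /** Classifies a textual value such as "18", "APPROX 19" or "OLD". */
    static StoredValue parse(String text) {
        String s = text.trim();
        // Case-insensitive check for the APPROX prefix.
        if (s.regionMatches(true, 0, "APPROX ", 0, 7)) {
            double center = Double.parseDouble(s.substring(7).trim());
            return new StoredValue(Kind.APPROX, center, null);
        }
        try {
            return new StoredValue(Kind.CRISP, Double.parseDouble(s), null);
        } catch (NumberFormatException e) {
            // Anything that is not a number is treated as a linguistic term.
            return new StoredValue(Kind.TERM, Double.NaN, s);
        }
    }

    /** Triangular possibility (Fig. 2) that an approximate value centred at center equals x; requires d > 0. */
    static double triangular(double x, double center, double d) {
        return Math.max(0.0, 1.0 - Math.abs(x - center) / d);
    }

    public static void main(String[] args) {
        String[] samples = {"18", "APPROX 19", "OLD"};
        for (String s : samples) {
            System.out.println(s + " -> " + parse(s).kind);
        }
        // Possibility that "APPROX 19" matches a crisp age of 20, for a hypothetical margin of 2.
        System.out.println(triangular(20, 19, 2));
    }
}
```

With a hypothetical margin of 2, the value APPROX 19 matches a crisp age of 20 to degree 0.5 and does not match 22 at all.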

5.2 Metadata

In this case, at the level of meta knowledge we require three tables, as discussed below.

1. COLUMNS_IN_DB
Column_Name   Type

Fig. 8 (a): Part of Meta Knowledge

The COLUMNS_IN_DB table stores the types (section 5.1) of all the attributes in the table. A description of each column in this table is as follows:

• Column_Name: This is the primary key of this table and its tuples correspond to the attributes in the table STUDENTS.
• Type: This stores the type of the corresponding attribute and can have two values, namely 1 and 2, for the two types mentioned in section 5.1 (crisp and fuzzy).

2. APPROXIMATE_VALUES_TABLE
Column_Name   Margin

Fig. 8 (b): Part of Meta Knowledge

The APPROXIMATE_VALUES_TABLE stores the parameter d shown in Fig. 2. A description of each column in this table is as follows:

• Column_Name: This is a foreign key here and corresponds to COLUMNS_IN_DB.
• Margin: This corresponds to the parameter d.

3. LABELS
Label   Column_Name   Alpha   Beta   Gamma   Delta

Fig. 8 (c): Part of Meta Knowledge

The LABELS table is used to store the information of all the fuzzy sets defined on all the attribute domains, along with their parameters α, β, γ and δ, as shown in Fig. 3. A description of each column in this table is as follows:

• Label: This is the primary key of this table and stores the linguistic term associated with the fuzzy set.
• Column_Name: This is a foreign key here and corresponds to COLUMNS_IN_DB.
• Alpha / Beta / Gamma / Delta: These correspond to the parameters α, β, γ and δ.

5.3 Implementation

Here again the main issue in the implementation of this system is the parsing of the input fuzzy query. During parsing, the query is divided into the following parts:

1. Query Type: Whether the query is an INSERT, SELECT, DELETE or UPDATE.
2. Result Attributes: The attributes that are to be displayed; used only in the case of the SELECT query.
3. Source Tables: The tables on which the query is to be applied.
4. Conditions: The conditions that have to be satisfied before the operation is performed. Each condition is further sub-divided into Query Attributes (i.e. the attributes on which the query is to be applied) and the linguistic term or the approximate value. If the condition is not fuzzy, i.e. it contains neither a linguistic term nor an approximate value, then it need not be subdivided.

The implementation of the proposed system has been done in JAVA using a MySQL database as the backend and the mm.mysql.jdbc-1.2c type 3 JDBC driver.

6. Query Language:

The syntax of the query language remains the same for both the models and is defined as follows:

SELECT

The syntax of the SELECT statement is as follows:

SELECT <attribute> [, <attribute> …]
FROM <table> [, <table> …]
[WHERE <CONDITION> [<CON> <CONDITION> …]]
[THOLD relational operator x | #x]

where CONDITION is defined as one of the following:

1. ATTRIBUTE relational operator CONSTANT
2. ATTRIBUTE1 relational operator ATTRIBUTE2 (both attributes should be compatible).
3. ATTRIBUTE is|are TERM (where the attribute and the linguistic term, defined in section 3.3, should be compatible).

CON is a connective that is used to combine two conditions, e.g. OR, AND etc. For example:

SELECT NAME FROM STUDENTS WHERE PERCENTAGE > 85 AND ABSENCES ARE LOW

THOLD specifies the alpha cut [5][10] that is to be applied to the result set. #x specifies the number of records that need to be returned. If x is equal to 0, all entries are considered to be a part of the result and their corresponding membership values are returned along with the entries.

7. Data Manipulation Language

7.1 Data Manipulation Language for the first model

The types of operations that we allow on the database, and their syntax, are:

a. INSERT
This is the same as specified in SQL and has the following structure:

INSERT INTO <table> VALUES (<value>, ….)

e.g. INSERT INTO STUDENTS VALUES (“Ankit”, 19,12,85,10)

b. DELETE
The structure of the DELETE statement is

DELETE FROM <table>
[WHERE <CONDITION> [<CON> <CONDITION> …]]

where CONDITION and CON are defined as in section 6 above.

e.g. DELETE FROM STUDENTS WHERE PERCENTAGE > 85 AND ABSENCES ARE LOW

c. UPDATE
The structure of the UPDATE statement is

UPDATE <table>
SET VALUES <attribute> = <value> [, <attribute> = <value> …]
[WHERE <CONDITION> [<CON> <CONDITION> …]]

where CONDITION and CON are defined as in section 6.

e.g. UPDATE STUDENTS SET PERCENTAGE = 85 WHERE PERCENTAGE < 85 AND ABSENCES ARE LOW

This is a minimal set of operators and more can be added if the need arises.

7.2 Data Manipulation Language for the second model

The types of operations that we allow on the database, and their syntax, are:

a. INSERT
In the INSERT statement an approximate value can also be inserted into the database. The syntax of the INSERT statement is given below:

INSERT INTO <table> VALUES (<expression>, ...)

where expression can be an approximate value as well as a linguistic term.

e.g. INSERT INTO STUDENTS VALUES (“Ankit”, APPROX 19,12,85,about 10)

b. DELETE
The structure of the DELETE statement is

DELETE FROM <table>
[WHERE <CONDITION> [<CON> <CONDITION> …]]

where CONDITION and CON are defined as in section 6.

e.g. DELETE FROM STUDENTS WHERE PERCENTAGE > APPROX 85 AND ABSENCES ARE LOW

c. UPDATE
The structure of the UPDATE statement is

UPDATE <table>
SET VALUES <attribute> = <expression> [, <attribute> = <expression> …]
[WHERE <CONDITION> [<CON> <CONDITION> …]]

where expression can be an approximate value as well as a linguistic term, and CONDITION and CON are defined as in section 6.

e.g. UPDATE STUDENTS SET (PERCENTAGE = GOOD) WHERE PERCENTAGE < 85 AND ABSENCES ARE LOW

This is a minimal set of operators and more can be added if the need arises.

8. Conclusion and Future Research

8.1 Conclusion

We have designed and implemented two models for incorporating fuzziness in databases. Fuzzy querying of a crisp database is discussed in the first model (section 4) and storage of fuzzy information is discussed in the second model (section 5). The fuzziness takes the form of approximate values or linguistic variables, which can be used only in queries in the first model but can also be used as attribute values in the second model. We have successfully implemented both the models in JAVA. A few screen shots are available at www.tsucorp.net/ankit. Although the second model is more flexible, fuzzy databases are still not much in use because people are reluctant to replace their crisp data by fuzzy data before they are convinced that it is worthwhile or necessary to do so. From this point of view, the first model scores over the second model, as it can be used with crisp data while still making use of the power of fuzzy theory.

8.2 Future Research

A limitation of the model proposed in section 5 is that the handling of linguistic variables is not complete. We have yet to implement linguistic hedges, i.e. modifiers such as ‘very’, which on being applied to a linguistic term change its semantic meaning. In future we will work to overcome this limitation.

References

[1] Bosc P., Liétard L. and Pivert O. “Evaluation of flexible queries : The quantified statement case”, Technologies for Constructing Intelligent Systems I, Physica-Verlag Heidelberg New York, pp 337-350 (2002)

[2] Chiang D., Chow L. R. and Hsien N. “Fuzzy information in extended fuzzy relational databases”, Fuzzy Sets and Systems 92, pp. 1-10 (1997)

[3] Cao T. H., Rossiter J. M., Martin T. P. and Baldwin J. F. “On the implementation of Fril++ for object-oriented logic programming with uncertainty and fuzziness”, Technologies for Constructing Intelligent Systems II, Physica-Verlag Heidelberg New York, pp 393-406 (2002)

[4] Kaushik S. and Nanda H. “Web Based Access of Relational Databases Using Fuzzy Natural Language Queries”, International Conference on Cognitive Systems, Delhi, India (1999)

[5] Klir G. J. and Yuan B. [2001], “Fuzzy Sets and Fuzzy Logic : Theory and Applications”, Prentice Hall, Inc. Englewood Cliffs, N. J., U.S.A.

[6] Medina J. M., Pons O. and Vila M. A. “GEFRED, A Generalized Model of Fuzzy Relational Databases”, Information Sciences, 76, 1-2, pp 87-109 (1994)

[7] Yang Q., Zhang W., Liu C., Wu J., Nakajima H. and Rishe N. D. “Efficient Processing of Nested Fuzzy SQL Queries in a Fuzzy Database”, IEEE Trans. on Knowledge and Data Eng., vol. 13, no. 6, pp. 884-901, Nov/Dec 2001

[8] Zadeh L. A. “Fuzzy Sets”, Information and Control, 8(3), pp. 338-353 (1965)

[9] Zadeh L. A. “A new direction in AI : Toward a computational theory of perceptions”, Technologies for Constructing Intelligent Systems I, Physica-Verlag Heidelberg New York, pp 3-20 (2002)

[10] Zimmermann H.-J. [2001], “Fuzzy Set Theory – And Its Applications”, Kluwer Academic Publishers, Norwell, Massachusetts, U.S.A.

Profile of the authors

Name : Dr. (Ms.) Punam Bedi
Qualification : M.Tech.(Computer Science), Ph.D (Computer Science)
Age : 40 years
Gender : Female
Institution : Department of Computer Science, Delhi University – 110 007.
Email : [email protected]
Address : J - 35, Kirti Nagar, New Delhi - 110 015
Contact Number : 011-7667591, 011-5446576 (R)

Name : Ms. Harmeet Kaur
Qualification : M.C.A.
Age : 29 years
Gender : Female
Institution : Department of Computer Science, Hans Raj College, Delhi University, Delhi – 110 007
Email : [email protected]
Address : 19, M. S. Flats, Type – III, Timarpur, Delhi - 110 054
Contact Number : 011-7667545/46, 011-3817894 (R)

Name : Mr. Ankit Malhotra
Qualification : Student of Bachelor of Information Science
Age : 20 years
Gender : Male
Institution : Department of Computer Science, Hans Raj College, Delhi University, Delhi - 110007
Email : [email protected]
Address : 102, Priya Enclave, Near Karkardooma Courts, Delhi – 110 092
Contact Number : 011-2370840 (R), 011-2379871 (R)