A method to build similarity relations into extended Rough Set Theory

Yaima Filiberto¹, Rafael Bello², Yaile Caballero¹, Rafael Larrua¹

¹ Artificial Intelligence Investigations Group, Universidad de Camagüey, Camagüey, Cuba, [email protected], yailec@yahoo, [email protected]
² Center of Studies on Informatics, Universidad Central de Las Villas, Villa Clara, Cuba, [email protected]

Abstract— In this paper we propose a method to build similarity relations into extended Rough Set Theory. Similarity is estimated using ideas from Granular Computing and Case-Based Reasoning. A new measure is introduced in order to compute the quality of a similarity relation. The paper studies the case of a similarity relation based on a global similarity function between two objects; this function combines the weight of each feature with local functions that calculate how similar the values of a given feature are. The approach was tested on the function approximation problem, and promising results were obtained in several experiments.

Index Terms— Rough set theory, similarity relations, function approximation

I. INTRODUCTION

The philosophy of rough sets is based on the supposition that every object of a universe U has associated a certain amount of information (data and knowledge), expressed by means of some attributes (features) that describe the object. Rough sets were introduced by Professor Zdzislaw Pawlak in a seminal paper published in 1982 [1]. The basic concepts of Rough Set Theory (RST) are the information system and the approximations of sets. The whole knowledge about the domain is contained in a set of objects called an Information System. Formally, let A = {A1, A2, ..., An} be a set of attributes and U a non-empty set of objects (examples, cases, entities, situations, and the like), called the universe. Each object is described by means of the attributes Ai in A. The pair (U, A) is called an information system (IS). A decision system (U, A ∪ {d}), where d ∉ A, is obtained when a new attribute d, called the decision attribute, is added for each object in U.

A simple idea underlies rough sets: objects having exactly the same values of the condition attributes are indiscernible using these attributes. This indiscernibility relation is the mathematical basis of RST. Such a relation induces a partition of the universe U into equivalence classes. Any subset X ⊆ U can be expressed exactly or approximately in terms of these classes by using two crisp sets called the lower approximation and the upper approximation. Indiscernibility is an essential concept and also the starting point of RST. Two objects in a decision table are indiscernible if we cannot discriminate between them using a given set of attributes. With any subset B of A (B ⊆ A) there is associated a binary indiscernibility relation IB which holds for all pairs of objects indiscernible according to B.

In the classical RST this B-indiscernibility relation is an equivalence relation, where B ⊆ A; that is, a binary relation R ⊆ U × U which is reflexive (an object x is related to itself, xRx), symmetric (if xRy then yRx) and transitive (if xRy and yRz then xRz). RST can be generalized in several directions. One alternative is to use non-equivalence binary relations, which yields extensions of the classical RST (extended RST) such as [2], [3], [4], [5] and [6].

In this paper we propose a method to build similarity relations; it is also useful in the case of decision systems whose decision feature is continuous. A new measure is introduced in order to compute the quality of the relation. The case of a similarity relation based on a global similarity function is studied; this function includes a weight for each feature and local functions to calculate how similar the values of a given feature are. A method to calculate the weights is proposed based on this approach. We discuss the approximation of a continuous function when some data about the function are given, and show that this set of weights improves the performance of the k-NN method and the Multi-Layer Perceptron when they are used to implement the function approximation. The set of weights is used to build the similarity function in the k-NN method, and to initialize the weights in Multi-Layer Perceptrons (MLP) trained with back propagation (BP), taking into account that BP training is very sensitive to initial conditions.

II. EXTENDED ROUGH SET THEORY

The early definition of rough sets focuses on a relation R that defines the inseparability of objects having the same values for a subset B of features. Defined in this way, R is an equivalence relation. A generalization of the classical rough set approach is obtained by replacing the binary inseparability relation with a weaker binary similarity relation. The equivalence relation R is very restrictive in many cases. Slight differences among objects are not meaningful when building the equivalence classes; this is especially important for numerical attributes, where small differences in the attribute values may not matter when solving problems. Using a similarity relation instead of an equivalence relation is pertinent in this situation. The purpose is to extend the inseparability relation R so that objects which are not identical but close (similar) enough are placed in the same group. Similarity relations do not generate partitions of the universe U, but they do yield similarity classes for any object x ∈ U.
The similarity class of x, according to the similarity relation R, is denoted by R(x) and defined as R(x) = {y ∈ U : yRx}. It reads as "the set of elements in U that are similar to x according to R". This is a reflexive relation. Another alternative is to use a tolerance relation [7], [8]. When a relation R ⊆ U × U is reflexive (xRx for any x ∈ U) and symmetric (xRy ⇒ yRx for any pair x, y ∈ U), it is called a "tolerance relation". An example of an extension of RST based on a similarity relation was presented by R. Slowinski and D. Vanderpooten [2]; also see [3].

Equivalence relations induce partitions of the universe U, while similarity relations generate a covering of the universe. A covering of the universe U is a family of nonempty subsets of U whose union is equal to U, and in which two subsets may have a nonempty overlap. A partition of U is a covering of U, so the concept of a covering is an extension (generalization) of the concept of a partition. Many studies have investigated covering approximation spaces in rough set theory, such as [4], [9] and [5], some of them oriented to building a covering-based generalized RST.

Therefore, while in classic RST building the indiscernibility relation R is not a problem, because any subset B of A (B ⊆ A) induces a relation (xRy iff f(x, Ai) = f(y, Ai) for all Ai ∈ B, where f(x, Ai) denotes the value of feature Ai in object x), it is more complex to find an adequate similarity relation for each decision system. However, there is a limited number of studies on this subject. In [10], the author proposes to induce a similarity relation from a covering; in this case, given a covering C = {C1, C2, ..., Cn}, the similarity relation R is defined by expression 1. This relation is reflexive and symmetric, but in general it is not transitive.

xRy if ∃Ci ∈ C : x, y ∈ Ci    (1)

Next, a method to build similarity relations is presented; the objective is that the similarity between two objects according to the features in A approaches as much as possible the similarity between the objects according to the decision feature d.

III. THE QUALITY OF SIMILARITY MEASURE IN THE ROUGH SET THEORY

Let the decision system be DS = (U, A ∪ {d}), where the domains of the features in A and the domain of the decision feature d are discrete or continuous. The method proposed in this paper is based on a new measure called the quality of similarity; this measure computes the relation between the similarity according to the features in A and the similarity according to the decision feature d. The measure rests on two tenets about the universe of objects, the first coming from Granular Computing and the second from Case-Based Reasoning (CBR). Case-based problem solving can be seen as exploiting the relationship between two different types of similarity; these types of similarity apply to two different spaces, the space of problem descriptions and the space of problem solutions [11], [12]. Given a suitable way of describing problems, similar problems will have similar solutions. The CBR principle can be expressed as "the more similar the problem descriptions according to the features in A, the more similar the outcomes according to the decision feature d".

The main concept of Granular Computing is the granule [13]; a granule is a chunk of knowledge made of different objects "drawn together by indistinguishability, similarity or proximity" [14]. The universe can be divided into a series of granules, each granule being a set of objects assembled via an indiscernibility relation (a similarity relation) R [15]. The quotient set U/R contains groups of similar objects which are granules of knowledge representation; they form the basic granules of knowledge about the universe. A family of granules that contains every object in the universe is called a granulation of the universe. Granulation of a universe of discourse is one of the important aspects whose solution has a significant bearing on granular computing. The granulation depends on the indiscernibility relation. Two commonly used granulations of a universe are partitions and coverings, usually yielded by equivalence relations and similarity relations respectively. The concepts of Conditional Granules (CG) and Decision Granules (DG) are introduced to name the granulations of the universe according to the features in A and according to the decision feature respectively; the relations between these sets of granules are studied in [16]. Our proposal joins the granulations of the universe (CG and DG) with the CBR principle in order to formulate a heuristic to build the similarity relations which induce these granulations. This heuristic is called the quality of similarity.

Let R1 and R2 be similarity relations defined on U. R1 generates granules of objects according to the features in A, and R2 yields granules according to the decision feature. For all objects x and y in U:

xR1y if and only if F1(x, y) ≥ e1    (2)

xR2y if and only if F2(x, y) ≥ e2    (3)
where F1 and F2 are similarity functions used to compare objects in U; F1 considers the features in A, and F2 computes the similarity degree between two objects according to the value of the decision feature d; e1 and e2 are thresholds. Let the sets N1(x) and N2(x) for all x in U be defined by expressions 4 and 5; N1(x) and N2(x) are the neighbourhoods of object x according to the relations R1 and R2 respectively:

N1(x) = {y ∈ U : xR1y}    (4)

N2(x) = {y ∈ U : xR2y}    (5)
Taking into account the CBR idea that "similar problems have similar solutions", the granules N1(x) and N2(x) must be similar, denoted by N1(x) ≈ N2(x). The degree of similarity between these granules is defined by expression 6, which relates the number of common objects to the number of objects in N1(x) plus the number of objects in N2(x); if N1(x) = N2(x) then γ(x) is equal to 1, and if N1(x) and N2(x) are completely different then γ(x) is equal to 0.

γ(x) = |N1(x) ∩ N2(x)| / (0.5 ∗ |N1(x)| + 0.5 ∗ |N2(x)|),  with 0 ≤ γ(x) ≤ 1    (6)

The degree of similarity between the conditional granules and the decision granules generated by R1 and R2 respectively is defined by expression 7:

θ(DS) = ( Σ_{x∈U} γ(x) ) / |U|    (7)

This measure represents the degree to which the similarity among objects using the features in A approaches the similarity according to the decision feature d. The problem, then, is to find the relations R1 and R2 that maximize expression 7, that is:

Max → ( Σ_{x∈U} γ(x) ) / |U|    (8)
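To make the measure concrete, the following minimal sketch computes γ(x) and θ(DS). It assumes (our assumption, for illustration) that the universe is a list of object identifiers and that r1 and r2 are boolean predicates implementing the relations R1 and R2 of expressions 2 and 3:

    # Minimal sketch of the quality-of-similarity measure (expressions 6-7).
    # ASSUMPTION: universe is a list of object ids; r1, r2 are boolean
    # predicates over pairs of objects implementing R1 and R2.

    def neighbourhood(x, universe, relation):
        # N(x) = {y in U : x R y}, as in expressions 4 and 5.
        return {y for y in universe if relation(x, y)}

    def gamma(x, universe, r1, r2):
        # Similarity degree between the granules N1(x) and N2(x) (expression 6).
        n1 = neighbourhood(x, universe, r1)
        n2 = neighbourhood(x, universe, r2)
        return len(n1 & n2) / (0.5 * len(n1) + 0.5 * len(n2))

    def quality_of_similarity(universe, r1, r2):
        # theta(DS): average of gamma(x) over the universe (expression 7).
        return sum(gamma(x, universe, r1, r2) for x in universe) / len(universe)

Because R1 and R2 are reflexive, x itself belongs to both neighbourhoods, so the denominator of expression 6 is never zero; the search described next can use quality_of_similarity directly as the value to maximize in expression 8.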
IV. A METHOD TO CALCULATE THE SIMILARITY DEGREE AMONG OBJECTS

According to expressions 2 and 3, the similarity relations depend on similarity functions used to compare objects in U. The similarity degree between objects is computed using the following components:
• Local similarity measures used to compare the values of single features (called the comparison function of the feature).
• A set of feature weights representing the relative importance of each attribute.
• A global similarity measure to calculate the similarity degree from the local similarities and the feature weights.

Let the similarity functions F1 and F2 be defined by expressions 9 and 10:

F1(x, y) = Σ_{i=1..n} wi ∗ ∂i(xi, yi)    (9)

F2(x, y) = ∂(d(x), d(y))    (10)

where n is the number of features, wi is the weight of feature i, xi and yi are the values of feature i in objects x and y respectively, and ∂i is a function that calculates how similar the values of a given feature are (the comparison function of feature i). A simple comparison function is defined by expression 11:

∂(x, y) = 1, if x and y are real and |x − y| ≤ ε, or x and y are discrete and x = y; 0, otherwise    (11)

If the decision feature d has a discrete domain, the decision granules are the decision classes, using expression 11 to implement the function F2. If F1 is defined by expression 9, the similarity relation R1 depends on the set of weights W. The problem is thus reduced to finding the set of weights W = {w1, w2, ..., wn}, where n is the number of features in the set A, that maximizes expression 7. Several search methods could be used to calculate W. The method proposed here is based on a heuristic search in which the quality of similarity of the decision system is used as the heuristic value. Among the alternatives to perform the heuristic search are Particle Swarm Optimization (PSO) [17] and Genetic Algorithms (GA) [18], [19]. We use PSO to find the best set W; this method has shown good performance in solving optimization problems [20], [21], and the use of PSO with RST for the feature selection problem was shown in [22]. In this problem each particle represents a set of weights W, so each particle is a vector with n components (one for each feature in A). The quality of a particle is calculated using expression 7. At the end of the PSO search, the best particle is the best weight set W for building the function F1 defined by expression 9; the similarity relation R1 established by expression 2 can then be implemented using this F1 (a sketch of this search appears below, after the description of the databases).

V. EXPERIMENTAL STUDY

This approach was applied to the function approximation problem. We calculated the feature weights in order to improve the quality of the approximation; the k-NN method and the Multilayer Perceptron (MLP) were considered. Five databases were used in this study, three from Civil Engineering and two from Meteorology; they are described next. In the Civil Engineering case, the problem is to calculate the resistant capacity of three types of connectors (stud, crestbond and canals); this is an important parameter because it is responsible for ensuring the connection between structures. However, it has been proven that the expressions for the resistant capacity established in the main international norms, such as AISC-LRFD (2005) [23] and Eurocode 4 (EN 1994-1-1:2004) [24], in some cases underestimate it excessively and in other cases overestimate it. For this reason, it is appropriate to treat it as a function approximation problem. In the Meteorology case, the problem is to calculate the minimum temperatures of two Cuban regions (Camagüey and Las Tunas). The databases are described in Tables I and II. The stud database has eight input variables and one output variable with a total of 66 instances; the crestbond database has eleven input variables with a total of 36 instances; the canals database has five input variables with a total of 43 instances; the database of minimum temperatures of Camagüey has 26 input variables and 1783 instances; and the database of minimum temperatures of Las Tunas has 23 input variables and 2557 instances. The output for each instance in the connector databases is the value of the resistant capacity (Q), and in the temperature databases it is the value of the minimum temperature.
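The weight search of Section IV can be made concrete with the following minimal sketch. Everything below is illustrative rather than the authors' implementation: the tolerance eps, the thresholds e1 and e2, and the PSO hyper-parameters (inertia 0.72, acceleration 1.49, swarm size, iterations) are assumed values.

    import random

    # Illustrative sketch of the PSO+RST weight search (Sections III-IV).
    # ASSUMPTIONS, not values from the paper: eps, e1, e2 and all PSO
    # hyper-parameters. Objects are tuples of n condition values;
    # decisions[i] is the decision value of object i.

    def delta(u, v, eps=0.1):
        # Comparison function of expression 11 (eps is an assumed tolerance).
        if isinstance(u, float) or isinstance(v, float):
            return 1 if abs(u - v) <= eps else 0
        return 1 if u == v else 0

    def f1(x, y, weights, eps=0.1):
        # Global similarity F1 of expression 9.
        return sum(w * delta(a, b, eps) for w, a, b in zip(weights, x, y))

    def fitness(weights, objects, decisions, e1=0.85, e2=0.85, eps=0.1):
        # theta(DS) of expression 7, with R1 and R2 built as in expressions 2-3.
        s = sum(weights) or 1.0
        ws = [w / s for w in weights]  # normalize so F1(x, x) = 1 (R1 stays reflexive)
        total = 0.0
        for i, x in enumerate(objects):
            n1 = {j for j, y in enumerate(objects) if f1(x, y, ws, eps) >= e1}
            n2 = {j for j in range(len(objects))
                  if delta(decisions[i], decisions[j], eps) >= e2}
            total += len(n1 & n2) / (0.5 * len(n1) + 0.5 * len(n2))
        return total / len(objects)

    def pso_search(objects, decisions, n, iters=50, swarm=20):
        # Minimal global-best PSO over weight vectors in [0, 1]^n.
        parts = [[random.random() for _ in range(n)] for _ in range(swarm)]
        vels = [[0.0] * n for _ in range(swarm)]
        pbest = [p[:] for p in parts]
        pfit = [fitness(p, objects, decisions) for p in parts]
        gi = max(range(swarm), key=lambda i: pfit[i])
        gbest, gfit = pbest[gi][:], pfit[gi]
        for _ in range(iters):
            for i, p in enumerate(parts):
                for k in range(n):
                    vels[i][k] = (0.72 * vels[i][k]
                                  + 1.49 * random.random() * (pbest[i][k] - p[k])
                                  + 1.49 * random.random() * (gbest[k] - p[k]))
                    p[k] = min(1.0, max(0.0, p[k] + vels[i][k]))
                f = fitness(p, objects, decisions)
                if f > pfit[i]:
                    pfit[i], pbest[i] = f, p[:]
                    if f > gfit:
                        gfit, gbest = f, p[:]
        return gbest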
TABLE I
INPUT FEATURES IN THE TRAINING SETS.

Input variables of the Crestbond connector:
• Diameter of the trapezoidal opening (Lsc)
• Total height of the connector (hsc)
• Total thickness of the slab, including the precast slab if present (tc)
• Thickness of the precast slab (tpl)
• Thickness of the connector plate (tsc)
• Compressive strength of the concrete (fc)
• Number of holes (n)
• Diameter of the trapezoidal teeth (D)
• Diameter of the reinforcement steel bars (ø)
• Number of bars (# barras)
• Length of the connector (Lc)

Input variables of the Canals connector:
• Thickness of the web (w)
• Thickness of the flange (t)
• Length of the connector (L)
• Height of the connector (H)
• Compressive strength of the concrete (fc)

Input variables of the Stud connector:
• Area of the connector (area X 10´r² (m²))
• Number of connectors (nr)
• Position of the connector (Pos)
• Average width of the Steel Deck channel (bo)
• Depth of the Steel Deck channel (hp)
• Height of the connector (hsc)
• Compressive strength of the concrete (fc)
• Yield stress of the connector (Fu)
TABLE II
DESCRIPTION OF THE ATTRIBUTES OF THE TEMPERATURES DATABASES.

Input variables of the temperatures databases:
• Date for which the temperature values and the other associated variables were obtained (Year, Month).
• Seven attributes corresponding to the real values of the maximum temperatures of the 7 previous days, measured between 14:00 and 16:00 hours (Tx-1, ..., Tx-7).
• Seven attributes corresponding to the real values of the minimum temperatures of the 7 previous days, measured between 6:00 and 8:00 hours (Tn-1, ..., Tn-7).
• Sunshine duration (Hluz).
• Wolf number, determined from the total number of sunspot groups on the solar disk and the total number of spots and pores in those groups (W).
• Quantitative indicator of solar activity in terms of irradiated energy (Solar flow).
• Area covered by sunspots (Area).
• Number of newly emerged sunspots (New stains).
• Wind direction at 19:00 hours (dd7).
• Wind speed at 19:00 hours (ff7).
• Wind direction at 7:00 hours (dd00).
• Wind speed at 7:00 hours (ff00).
• Relative humidity (HR).
• Cloudiness (Nub).
• Precipitation (Prec).
The k-NN method was used to calculate the resistant capacity and the minimum temperature. The key idea of the k-NN method is that similar input vectors have similar output values; that is, if Xi is near Xj then di is near dj, where Xi and Xj are the condition parts, and di and dj the decision values, of objects i and j. To compute the output approximation it is necessary to find a certain number of nearest neighbours and their output values; the output can then be calculated as the average of the outputs of the neighbours in the neighbourhood N(Xh) of a new vector Xh. In order to build the neighbourhood N(Xh), the similarity degree between two vectors X and Y was calculated using the function F1 defined by expression 9 and the feature comparison function defined by expression 12, where Di denotes the domain of feature i:

∂(Xi, Yi) = 1 − |Xi − Yi| / (Max(Di) − Min(Di)), if feature i is continuous; 1, if i is discrete and Xi = Yi; 0, if i is discrete and Xi ≠ Yi    (12)
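A minimal sketch of this weighted k-NN prediction follows. It handles continuous features only, and assumes (for illustration) that k, the weights and the feature bounds Min(Di), Max(Di) are given:

    # Sketch of the weighted k-NN approximation described above.
    # ASSUMPTIONS: continuous features only; weights and per-feature
    # bounds (lo, hi) are given; expression 12 is the comparison function.

    def delta12(u, v, lo, hi):
        # Expression 12 for a continuous feature with domain [lo, hi].
        return 1.0 - abs(u - v) / (hi - lo)

    def knn_predict(xh, objects, outputs, weights, bounds, k=2):
        # Average output of the k objects most similar to xh under F1 (expr. 9).
        def sim(x, y):
            return sum(w * delta12(a, b, lo, hi)
                       for w, a, b, (lo, hi) in zip(weights, x, y, bounds))
        ranked = sorted(range(len(objects)),
                        key=lambda i: sim(xh, objects[i]), reverse=True)
        return sum(outputs[i] for i in ranked[:k]) / k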
Three alternative methods to calculate the weights in expression 9 and three values of k were employed in the experimentation. The variants for calculating the weights are: (i) the method proposed in this paper (called PSO+RST), (ii) the weights obtained by the Conjugate Gradient method (k-NN_VSM) [25], and (iii) assigning the same weight to each feature (called Standard).

The back propagation (BP) algorithm [26] is the most widely used method to train the feedforward multilayer perceptron (MLP), one of the most popular artificial neural network models. The basic idea of the BP algorithm is that weights are updated in the direction that reduces the error between the desired output and the actual output. For MLPs trained with BP, the conventionally employed rule for weight initialization is to use small random values. One of the main reasons for the slow convergence and the suboptimal generalization results of MLPs based on gradient descent training is the lack of a proper initialization of the weights to be adjusted [27]. Even sophisticated learning procedures are not able to compensate for bad initial weight values, while a good initial guess leads to fast convergence and/or better generalization capability even with simple gradient-based error minimization techniques. Although the initial weight space of MLPs seems to be a critical aspect, there is so far no study addressing this in real-world problems. It has been shown that the choice of an initialization method influences the convergence of the optimization [28]. In order to decrease the influence of parameters such as the initial weights on the performance of the networks, evolving the architecture and the initial connection weights of the MLP has been studied [29], [30].

When the MLP is used to calculate the resistant capacity and the minimum temperature, the topology of the network includes one hidden layer. Two alternatives were studied for the initial values of the weights: (i) the weights are initialized randomly, and (ii) the weights are initialized with the proposed method (PSO+RST). In the latter case, the weights associated with the links from the neurons in the input layer to the neurons in the hidden layer are calculated using the PSO+RST method, while the links from the hidden layer to the output neuron are initialized randomly; this alternative is called MLP-PSO-RST (a sketch of this initialization appears after Table VI).

K-fold cross-validation was employed. K-fold cross-validation divides the original dataset into k subsets of equal size, where one is used as the test set while the others are used as the training set. The overall accuracy of the learner is then calculated as the average precision obtained over all test subsets. This technique eliminates the problem of overlapping test sets and makes effective use of all available data. The recommended value k = 10 was used [31]. The experiments were implemented using the Weka tool [32]. The MLP includes three layers: one neuron in the first layer for each input feature, (attribs + classes)/2 neurons in the hidden layer, and one neuron in the output layer; the learning process runs for 500 cycles, and the learning rate used in the MLP training is 0.3. The results obtained by these variants in both cases were compared with the real values of the resistant capacity and the minimum temperature according to the experiments described below.

Experiment 1: Comparing the accuracy of each alternative using the k-NN and MLP methods according to the measures: (i) Mean Absolute Percentage Error (MAPE), (ii) Root Mean Square Error (RMSE), and (iii) the average magnitude of the difference between the desired value and that obtained by the prediction (PMD). These measures are defined by expressions 13, 14 and 15.
MAPE = ( Σ_{i=1..N} |(ai − yi) / ai| / N ) ∗ 100%    (13)

RMSE = sqrt( Σ_{i=1..N} ((ai − yi) / ai)² / N ) ∗ 100%    (14)

PMD = Σ_{i=1..N} |ai − yi| / N    (15)

where ai is the desired output value, yi is the calculated output value, and N is the number of instances.

The results of MAPE, RMSE and PMD for each variant are summarized in Table III for the k-NN method and in Table IV for the MLP method. The errors (MAPE and RMSE) are expressed as percentages and the average (PMD) in absolute values.

TABLE III
RESULTS OF MAPE, RMSE AND PMD FOR EACH VARIANT FOR THE K-NN METHOD.

Data Base   Weight     K    MAPE      RMSE      PMD
Stud        PSO+RST    K1   13.1326   16.4214   7.1419
Stud        PSO+RST    K2   13.0241   17.8245   6.7495
Stud        PSO+RST    K3   14.1742   20.7115   6.9543
Stud        k-NN_VSM   K1   15.7224   20.5588   8.959
Stud        k-NN_VSM   K2   14.5631   21.3190   8.0257
Stud        k-NN_VSM   K3   17.0934   24.5153   8.6130
Stud        Standard   K1   14.7994   19.3214   8.3485
Stud        Standard   K2   14.0107   19.8518   7.7888
Stud        Standard   K3   16.6067   22.7934   8.1555
Crestbond   PSO+RST    K1   10.9659   16.6827   3.3766
Crestbond   PSO+RST    K2   10.5954   15.8714   3.2577
Crestbond   PSO+RST    K3   11.9569   17.6045   3.6633
Crestbond   k-NN_VSM   K1   13.6104   22.6871   3.8840
Crestbond   k-NN_VSM   K2   13.8163   24.5203   3.8265
Crestbond   k-NN_VSM   K3   13.8222   22.9153   4.0121
Crestbond   Standard   K1   13.6104   22.6871   3.8840
Crestbond   Standard   K2   13.8163   24.5203   3.8265
Crestbond   Standard   K3   13.8222   22.9153   4.0121
Canals      PSO+RST    K1   10.1026   13.7852   3.5873
Canals      PSO+RST    K2   8.7106    11.4749   3.2276
Canals      PSO+RST    K3   8.9400    11.3708   3.3716
Canals      k-NN_VSM   K1   16.4154   22.7779   5.9885
Canals      k-NN_VSM   K2   10.5895   14.1518   4.1333
Canals      k-NN_VSM   K3   15.5640   18.9854   5.5974
Canals      Standard   K1   16.4154   22.7779   5.9885
Canals      Standard   K2   10.5895   14.1518   4.1333
Canals      Standard   K3   15.5640   18.9854   5.5974
MTemCmg     PSO+RST    K1   4.78738   0.47270   0.9508
MTemCmg     PSO+RST    K2   4.15614   0.35143   0.8252
MTemCmg     PSO+RST    K3   4.08164   0.34810   0.8055
MTemCmg     k-NN_VSM   K1   5.07814   0.52487   1.0090
MTemCmg     k-NN_VSM   K2   4.49281   0.41145   0.8911
MTemCmg     k-NN_VSM   K3   4.31848   0.38160   0.8536
MTemCmg     Standard   K1   5.00665   0.50858   0.9947
MTemCmg     Standard   K2   4.53346   0.43203   0.8972
MTemCmg     Standard   K3   4.27272   0.37194   0.8455
MTemLT      PSO+RST    K1   4.31151   0.35373   0.8887
MTemLT      PSO+RST    K2   3.80401   0.27554   0.7837
MTemLT      PSO+RST    K3   3.60515   0.25250   0.7414
MTemLT      k-NN_VSM   K1   4.45170   0.36506   0.9166
MTemLT      k-NN_VSM   K2   3.86624   0.28639   0.7963
MTemLT      k-NN_VSM   K3   3.74032   0.27386   0.7671
MTemLT      Standard   K1   4.45170   0.36506   0.9166
MTemLT      Standard   K2   3.86624   0.28639   0.7963
MTemLT      Standard   K3   3.74032   0.27386   0.7671

In Table III we can observe that the values of MAPE, RMSE and PMD for the first variant (PSO+RST) are smaller than those of the other variants; the difference with respect to the other variants is largest with k = 2 for the connector databases and with k = 3 for the temperature databases, which are therefore the most effective settings. Table IV shows that the values of MAPE, RMSE and PMD for the first variant (MLP+PSO+RST) are smaller than those of the second variant, so the conclusion is that using PSO+RST to initialize the weights of the MLP is the most effective variant.

Experiment 2: Comparing the accuracy of each alternative using the k-NN and MLP methods, in order to determine whether there are significant differences in accuracy with respect to the real value by means of the R² coefficient, the correlation coefficient and the standard error; in the case of the
k-NN method, the best combination (PSO+RST with k = 2 and k = 3) was used for the comparison. Table V summarizes the test values for the best values of k, since these gave the best results. The statistical analysis yielded the following results: with PSO+RST, in the five databases the standard error is significantly lower than with the other variants, the R² coefficient is relatively high, above 72%, and the correlation coefficient is above 0.85. For Table VI, the statistical analysis yielded the following results: with the MLP using PSO+RST, in the three databases the standard error is significantly lower than with the other variant, the R² coefficient is relatively high, above 56%, and the correlation coefficient is above 0.75.
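For completeness, the error measures of expressions 13-15 reported in Tables III and IV can be computed as in this minimal sketch (a and y are the lists of desired and predicted outputs; nonzero desired values are assumed, since they divide MAPE and RMSE):

    from math import sqrt

    def mape(a, y):
        # Expression 13: mean absolute percentage error.
        return sum(abs((ai - yi) / ai) for ai, yi in zip(a, y)) / len(a) * 100.0

    def rmse_pct(a, y):
        # Expression 14: root mean square of the relative errors, in percent.
        return sqrt(sum(((ai - yi) / ai) ** 2
                        for ai, yi in zip(a, y)) / len(a)) * 100.0

    def pmd(a, y):
        # Expression 15: average magnitude of the difference.
        return sum(abs(ai - yi) for ai, yi in zip(a, y)) / len(a)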
TABLE IV
RESULTS OF MAPE, RMSE AND PMD FOR EACH VARIANT FOR THE MLP METHOD.

            MLP+PSO+RST                  MLP
DB          MAPE     RMSE     PMD        MAPE     RMSE     PMD
Stud        16.191   22.699   8.447      22.144   35.559   10.27
Crestbond   7.5700   10.447   2.385      7.8076   11.346   2.437
Canals      6.8552   10.400   2.370      11.354   14.738   4.223
MTemCmg     6.3998   0.7739   1.312      6.7632   0.9709   1.38
MTemLT      4.6592   0.4111   0.992      4.7979   0.4741   1.017
TABLE V
STATISTICAL ANALYSIS OF THE RESULTS FOR THE K-NN METHOD.

Data Base         Measure          PSO+RST   k-NN_VSM   Standard
Stud (k=2)        Stand. Error     8.5259    9.7889     9.530
                  R² Coeff.        0.8564    0.8107     0.8205
                  Correl. Coeff.   0.9253    0.9004     0.9058
Crestbond (k=2)   Stand. Error     1.4023    5.5938     5.5938
                  R² Coeff.        0.7281    0.5599     0.5599
                  Correl. Coeff.   0.8510    0.7483     0.7483
Canals (k=2)      Stand. Error     4.2181    5.5225     5.5225
                  R² Coeff.        0.9018    0.8377     0.8377
                  Correl. Coeff.   0.9494    0.9153     0.9153
MTemCmg (k=3)     Stand. Error     1.0653    1.1227     1.1105
                  R² Coeff.        0.8004    0.7783     0.7830
                  Correl. Coeff.   0.8946    0.8822     0.8849
MTemLT (k=3)      Stand. Error     0.9703    1.0059     1.0059
                  R² Coeff.        0.7737    0.7568     0.7568
                  Correl. Coeff.   0.8796    0.8600     0.8600
TABLE VI
STATISTICAL ANALYSIS OF THE RESULTS FOR THE MLP METHOD.

Data Base    Measure          MLP+PSO+RST   MLP
Stud         Stand. Error     10.036        12.406
             R² Coeff.        0.8005        0.6957
             Correl. Coeff.   0.8947        0.8341
Crestbond    Stand. Error     2.9190        3.0091
             R² Coeff.        0.8801        0.8727
             Correl. Coeff.   0.9382        0.9342
Canals       Stand. Error     3.1323        5.1911
             R² Coeff.        0.9457        0.8566
             Correl. Coeff.   0.9724        0.9255
MTemCmg      Stand. Error     1.4954        1.5799
             R² Coeff.        0.5662        0.5158
             Correl. Coeff.   0.7525        0.7182
MTemLT       Stand. Error     1.2383        1.2463
             R² Coeff.        0.6183        0.6134
             Correl. Coeff.   0.7863        0.7832
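As noted in Section V, the MLP-PSO-RST weight initialization can be sketched as follows. This is illustrative only: the links from input neuron i to every hidden neuron start from the PSO+RST feature weight wi, while hidden-to-output links are random; the jitter factor and the random range are assumptions, not values reported in the paper.

    import random

    # Illustrative sketch of the MLP-PSO-RST initialization (Section V).
    # ASSUMPTIONS: the 0.9-1.1 jitter and the [-0.5, 0.5] range.

    def init_mlp_weights(feature_weights, n_hidden):
        # Input-to-hidden links seeded from the PSO+RST feature weights.
        w_in_hidden = [[w * random.uniform(0.9, 1.1) for _ in range(n_hidden)]
                       for w in feature_weights]
        # Hidden-to-output links initialized randomly, as usual for BP.
        w_hidden_out = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        return w_in_hidden, w_hidden_out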
VI. CONCLUSIONS

A method to support the building of similarity relations in extended Rough Set Theory was presented. It includes a new measure called the quality of similarity of a decision system. The method can be applied to decision systems with either discrete or continuous decision features. Its application to a function approximation problem showed satisfactory results.

REFERENCES
[1] Z. Pawlak. Rough sets. International Journal of Information & Computer Sciences, 11:145–172, 1982.
[2] R. Slowinski and D. Vanderpooten. A generalized definition of rough approximations based on similarity. IEEE Transactions on Knowledge and Data Engineering, 12(2):331–336, 2000.
[3] S. Greco. Rough sets theory for multicriteria decision analysis. European Journal of Operational Research, 129:1–47, 2001.
[4] Y.Y. Yao. On generalizing rough set theory. Lecture Notes in Artificial Intelligence 2639, pages 44–51, 2003.
[5] K. Qin et al. On covering rough sets. Lecture Notes in Artificial Intelligence 4481, pages 34–41, 2007.
[6] Ming-Wen Shao. A complete method to incomplete information systems. Lecture Notes in Artificial Intelligence 4481, pages 50–59, 2007.
[7] A. Skowron and J. Stepaniuk. Tolerance approximation spaces. Fundamenta Informaticae, 27:245–253, 1996.
[8] S.K. Pal and A. Skowron (Eds.). Rough Fuzzy Hybridization: A New Trend in Decision-Making. Springer-Verlag, 1999.
[9] W. Zhu and F. Wang. Relations among three types of covering rough sets. In IEEE GrC 2006, pages 43–48, Atlanta, USA, 2006.
[10] D. Bianucci. Entropies and co-entropies of coverings with applications to incomplete information systems. Fundamenta Informaticae, 77:77–105, 2007.
[11] D.B. Leake. CBR in context: The present and future. In Case-Based Reasoning: Experiences, Lessons, and Future Directions, D. Leake (Ed.). Menlo Park: AAAI Press/MIT Press, 1996.
[12] R. M. Lopez and E. Armengol. Machine learning from examples: Inductive and lazy methods. Data & Knowledge Engineering, 25:99–123, 1998.
[13] T. Y. Lin. Granular computing: From rough sets and neighbourhood systems to information granulation and computing in words. In Proc. European Congress on Intelligent Techniques and Soft Computing, pages 1602–1606, 1997.
[14] L. Zadeh. Is there a need for fuzzy logic? Information Sciences, 178:2751–2779, 2008.
[15] J. M. Ma et al. Granular computing and dual Galois connection. Information Sciences, 177:5365–5377, 2007.
[16] B. Chen et al. Granular rough theory: A representation semantics oriented theory of roughness. Applied Soft Computing, 9:786–805, 2009.
[17] J. Kennedy and R. Eberhart. Swarm Intelligence. Morgan Kaufmann Publishers, 2001.
[18] F. Herrera, M. Lozano and A. Sanchez. A taxonomy for the crossover operator for real coded genetic algorithms: An experimental study. International Journal of Intelligent Systems, 18:309–338, 2003.
[19] García-Martínez et al. Global and local real-coded genetic algorithms based on parent-centric crossover operators. European Journal of Operational Research, 185:1088–1113, 2008.
[20] K.E. Parsopoulos and M.N. Vrahatis. Recent approaches to global optimization problems through particle swarm optimization. Natural Computing, 1:235–306, 2002.
[21] M. Reyes-Sierra and C.C. Coello. Multi-objective particle swarm optimizers: A survey of the state-of-the-art. International Journal of Computational Intelligence Research, 2(3):287–308, 2006.
[22] X. Wang et al. Feature selection based on rough sets and particle swarm optimization. Pattern Recognition Letters, 28:459–471, 2007.
[23] Load and Resistance Factor Design (LRFD) Specification for Structural Steel Buildings (2005). American Institute of Steel Construction (AISC), Inc., Chicago, IL.
[24] Eurocode 4 (EN 1994-1-1). Design of Composite Steel and Concrete Structures, Part 1.1 (2004). European Committee for Standardization, Brussels.
[25] D. Wettschereck. A description of the mutual information approach and the variable similarity metric. Technical report, Artificial Intelligence Research Division, German National Research Center for Computer Science, Sankt Augustin, Germany, 1995.
[26] D.E. Rumelhart and J.L. McClelland. Parallel Distributed Processing, volume 1. MIT Press, 1986.
[27] S. Adam et al. Revisiting the problem of weight initialization for multilayer perceptrons trained with back propagation. Lecture Notes in Computer Science, 5507:308–315, 2009.
[28] G. Thimm and E. Fiesler. High-order and multilayer perceptron initialization. IEEE Transactions on Neural Networks, 2:349–359, 1997.
[29] L. Almeida and T. Ludermir. An evolutionary approach for tuning artificial neural network parameters. Lecture Notes in Artificial Intelligence, 5271:156–163, 2008.
[30] X. Fu et al. A resource limited immune approach for evolving architecture and weights of multilayer neural network. Lecture Notes in Computer Science, 6145:328–337, 2010.
[31] J. Demsar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006.
[32] Weka: open-source tool written in Java. Available under the GNU public license at http://www.cs.waikato.ac.nz/ml/weka/.