Test-Cost-Sensitive Attribute Reduction Based on Neighborhood Rough Set

Hong Zhao, Fan Min∗, and William Zhu
Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363000, China
Email: [email protected], [email protected], [email protected]

∗ Corresponding author. Tel.: +86 133 7690 8359

Abstract—Recent research in machine learning and data mining has produced a wide variety of algorithms for cost-sensitive learning. Most existing rough set methods on this issue deal with nominal attributes, because nominal attributes produce equivalence relations and are therefore easy to process. In real applications, however, datasets often contain numerical attributes, which are more complex than nominal ones and require more computational resources; the respective learning tasks are accordingly more challenging. This paper deals with test-cost-sensitive attribute reduction for numerical valued decision systems. Since the neighborhood rough set model has been successful in numerical data processing, we adopt it to define the minimal test cost reduct problem. Due to the complexity of the new problem, heuristic algorithms are needed to find a sub-optimal solution. We propose a heuristic based on the sum of the positive region and the weighted test cost. When the test cost is not considered, this heuristic degrades to the positive region, the most commonly used heuristic in classical rough set theory. Three metrics are adopted to evaluate the performance of reduction algorithms from a statistical viewpoint. Experimental results show that the proposed method takes advantage of test costs and therefore produces satisfactory results.

Keywords—Cost-sensitive learning, neighborhood, rough set, reduction, heuristic algorithm.

I. INTRODUCTION

In practical data mining applications, it is well known that redundant data make the mining task difficult. Attribute reduction is a successful technique for removing such data and facilitating the mining task, and it has attracted much attention in recent years [1], [2], [3], [4]. Different definitions of reducts and respective optimality metrics are applicable to different fields. When the test cost is not considered, attribute reduction algorithms have been proposed to deal with nominal data [5], [6], [7]; attribute reduction algorithms based on the neighborhood rough set model, on the other hand, have been proposed to deal with numerical valued decision systems [8], [9], [10].

Recently, the test-cost-sensitive attribute reduction problem was proposed in [11], [12], [13]. This problem has wide application since the collection of data is not free: there is a test cost for each data item [14], [15]. The test-cost-sensitive attribute reduction algorithm framework in [16] is devoted to this problem. The algorithm of [11] employs a user-specified factor λ to adjust the heuristic information function based on the test cost of each attribute. The performance of that algorithm is satisfactory; however, it is limited to nominal data. Since numerical data are widespread in the real world, there is a strong need to consider them.

In this paper we define the test-cost-sensitive attribute reduction problem on numerical data. Because the neighborhood rough set model is successful in dealing with numerical data, we adopt it for our problem definition. To facilitate neighborhood threshold settings, data items are normalized first; test costs are also normalized to facilitate the definition of the heuristic function. In most existing works, the positive region alone is used as the heuristic information in the neighborhood model. Because of test costs, we have a more complex data model: we use the sum of the positive region and the weighted test cost as the heuristic information, and design a new heuristic algorithm with it. The value of the positive region is a number in the range [0, 1], so the weighted test cost, after normalization, adjusts the positive-region heuristic only within a small range. To ensure that the positive region keeps the leading role, the adjustment uses addition instead of the more commonly used multiplication.

The Iris dataset with various test-cost settings is employed to study the performance of our algorithm. Since the dataset has no test cost settings of its own, we use three distribution functions, corresponding to different applications, to generate test costs. Moreover, we adopt three metrics to evaluate the performance of the reduction algorithms from a statistical viewpoint. Experimental results show that our algorithm generates a minimal test cost reduct in most cases, because the proposed method takes advantage of test costs. Experiments are undertaken using an open-source software package called COSER (cost-sensitive rough sets) [17].

The rest of the paper is organized as follows: Section II presents the preliminaries needed in the other parts. Section III presents the attribute reduction algorithm based on the neighborhood rough set model. Experimental analysis is given in Section IV. Conclusions come in Section V.

II. PRELIMINARIES

This section introduces the preliminary knowledge of the paper. First, the neighborhood rough set decision system is reviewed; then, the test-cost-sensitive decision system is discussed.

A. Neighborhood rough set decision systems

Formally, a decision system can be written as a 5-tuple S = (U, C, D, {Va}, {Ia}), where U is a nonempty set called the universe, and C and D are nonempty sets of variables called conditional attributes and decision attributes, respectively. Va is the set of values for each a ∈ C ∪ D, and Ia : U → Va is an information function for each a ∈ C ∪ D. We often denote {Va | a ∈ C ∪ D} and {Ia | a ∈ C ∪ D} by V and I, respectively.

Definition 1: [8] Given arbitrary xi ∈ U and B ⊆ C, the neighborhood δB(xi) of xi is defined as:

δB(xi) = {xj | xj ∈ U, ∆B(xi, xj) ≤ δ},   (1)

where ∆ is a distance function satisfying, for all x1, x2, x3 ∈ U:
(1) ∆(x1, x2) ≥ 0;
(2) ∆(x1, x2) = 0 if and only if x1 = x2;
(3) ∆(x1, x2) = ∆(x2, x1);
(4) ∆(x1, x3) ≤ ∆(x1, x2) + ∆(x2, x3).

A detailed survey of distance functions can be found in [18]. If the attributes generate a neighborhood relation over the universe, the decision system is called a neighborhood decision system, denoted by NDS = (U, C, D, V, I).

Definition 2: [10] Given a neighborhood decision system NDS, let X1, X2, ..., XN be the object subsets with decisions 1 to N, and let δB(xi) be the neighborhood information granule containing xi and generated by the attributes B ⊆ C. The positive region (POS) of the decision is defined as

POSB(D) = {xi ∈ U | δB(xi) ⊆ Xd},   (2)

where Xd is the decision class containing xi.

The size of the neighborhood depends on the threshold δ. When δ = 0, the samples in the same neighborhood granule are equivalent to each other; in this case, neighborhood rough sets are a natural generalization of Pawlak rough sets.
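For concreteness, the following sketch illustrates Definitions 1 and 2 on normalized numerical data. It assumes the Euclidean distance (the paper leaves the metric open; see the survey [18]), and the function names are illustrative rather than COSER's API.

```python
# A minimal sketch of Definitions 1 and 2, assuming Euclidean distance.
import numpy as np

def neighborhood(X, i, B, delta):
    """delta_B(x_i): indices j with Delta_B(x_i, x_j) <= delta (Equation (1))."""
    dist = np.sqrt(((X[:, B] - X[i, B]) ** 2).sum(axis=1))
    return np.where(dist <= delta)[0]

def positive_region(X, y, B, delta):
    """POS_B(D): objects whose neighborhood lies in a single decision class (Eq. (2))."""
    return [i for i in range(len(X))
            if np.all(y[neighborhood(X, i, B, delta)] == y[i])]

def dependency(X, y, B, delta):
    """gamma_B(D) = |POS_B(D)| / |U|, the dependency degree of Equation (3) below."""
    return len(positive_region(X, y, B, delta)) / len(X)
```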

B. Attribute significance and reduction with the neighborhood model

The dependency degree of D on B is defined as the ratio of consistent objects:

γB(D) = |POSB(D)| / |U|.   (3)

A number of definitions of relative reducts exist for different rough set models [19], [20], [21]. This paper employs the definition based on the positive region.

Definition 3: [8] Given a neighborhood decision system NDS = (U, C, D, V, I) and B ⊆ C, the attribute subset B is a relative reduct if
(1) γB(D) = γC(D), and
(2) ∀a ∈ B, γB(D) > γB−{a}(D).

C. Test-cost-sensitive decision systems

Since we assume that tests are undertaken in parallel, we consider the most widely used cost model, as follows:

Definition 4: [16] A test-cost-independent decision system (TCI-DS) S is the 6-tuple

S = (U, C, D, {Va}, {Ia}, c),   (4)

where U, C, D, {Va}, and {Ia} have the same meanings as in a decision system, and c : C → R+ ∪ {0} is the test cost function. Test costs are independent of one another; that is, c(B) = Σa∈B c(a) for any B ⊆ C.

III. TEST-COST-SENSITIVE ATTRIBUTE REDUCTION BASED ON NEIGHBORHOOD DECISION SYSTEM

In this section, we first discuss attribute value and test cost normalization; then the problem of test-cost-sensitive attribute reduction based on the neighborhood decision system is proposed.

A. Attribute value normalization

To design test-cost-sensitive attribute reduction based on the neighborhood decision system, we need to set the threshold δ, which determines the size of the neighborhood. Setting this threshold is the most important problem in neighborhood-based classification. To facilitate neighborhood threshold settings, the values of the attributes are normalized first.

Example 1: Table I presents a decision system built from the Iris dataset, whose conditional attributes are numerical, where U = {x1, x2, x3, x4, ..., x149, x150}, C = {Sepal-length, Sepal-width, Petal-length, Petal-width}, and D = {Class}. To compute distances between objects on the conditional attributes, every attribute is first normalized from its original range into [0, 1].

Table I
AN EXAMPLE NUMERICAL VALUE ATTRIBUTE DECISION TABLE (IRIS)

Patient   Sepal-length   Sepal-width   Petal-length   Petal-width   Class
x1        0.23529        0.77273       0.14286        0.04762       0
x2        0.29412        0.72727       0.11905        0.04762       0
x3        0.35294        0.09091       0.38095        0.42857       0.5
x4        0.64706        0.31818       0.52381        0.52381       0.5
x5        0.41176        0.31818       0.50000        0.42857       0.5
...       ...            ...           ...            ...           ...
x149      0.58824        0.54545       0.85714        1.00000       1
x150      0.44118        0.27273       0.64286        0.71429       1
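The normalization of Example 1 is a plain min-max rescaling per attribute. The sketch below illustrates it; the sample values are made up and are not rows of the actual Iris dataset.

```python
# A minimal sketch of the min-max normalization behind Table I.
import numpy as np

def normalize_columns(X):
    """Linearly rescale every attribute into [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.2, 2.2]])
print(normalize_columns(X))   # each column now spans [0, 1]
```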

B. Test cost normalization

For statistical purposes, three different schemes for producing random test costs are adopted: the uniform distribution, the normal distribution, and the Pareto distribution. For simplicity, test costs are integers ranging from M to N and are evaluated independently [16]. To facilitate the definition of the heuristic function, we normalize the test cost values. Let B ⊆ C and ai ∈ B. Then

c∗i = (ci − mincost)/(maxcost − mincost)   (5)

is the normalized cost of attribute ai, where ci is the test cost of attribute ai, and mincost and maxcost are the minimum and maximum costs over all conditional attributes, respectively.
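The following sketch illustrates this cost setup: integer costs drawn under one of the three schemes and then normalized by Equation (5). The distribution parameters are our own illustrative assumptions; the paper fixes only the integer range [M, N].

```python
# A minimal sketch of test-cost generation and Equation (5); parameters assumed.
import numpy as np

rng = np.random.default_rng(0)

def generate_costs(k, M, N, scheme="uniform"):
    if scheme == "uniform":
        c = rng.integers(M, N + 1, size=k)
    elif scheme == "normal":    # centered in the range, clipped back into [M, N]
        c = np.clip(np.round(rng.normal((M + N) / 2, (N - M) / 6, size=k)), M, N)
    else:                       # "pareto": heavy-tailed, shifted to M, clipped
        c = np.clip(np.round(M + rng.pareto(2.0, size=k) * M), M, N)
    return c.astype(int)

def normalize_costs(c):
    """Equation (5): c*_i = (c_i - mincost) / (maxcost - mincost)."""
    return (c - c.min()) / (c.max() - c.min())

# Costs are additive over attribute subsets: c(B) = sum of c(a), a in B (Def. 4).
c_star = normalize_costs(generate_costs(4, 1, 100, "pareto"))
```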

C. Test-cost-sensitive attribute reduction based on neighborhood decision system

Most heuristic attribute reduction algorithms have the same structure; their differences lie in the heuristic function [2]. We now define a heuristic function based on the positive region and the weighted test cost.

Definition 5: Let S∗ = (U, C, D, {Va}, {Ia}, c∗) be a test-cost-sensitive neighborhood decision system, where c∗ is the normalized version of c. Let B ⊆ C and ai ∈ (C − B), with c∗i defined in Equation (5). Our positive region and weighted test cost function is

SIGtc(ai, B, D, c∗i) = SIG(ai, B, D) + (1 − c∗i) · ρ,   (6)

where ρ is the regulatory factor of the test cost, and

SIG(ai, B, D) = γB∪{ai}(D).   (7)

If ρ = 0, SIGtc and SIG are equivalent.
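A minimal sketch of Definition 5 follows, assuming a callable gamma(B) that returns the dependency degree γB(D) (for instance, dependency() from the sketch in Section II) and an array c_star of normalized test costs.

```python
# A minimal sketch of Equations (6) and (7); gamma and c_star are assumed inputs.
def sig(gamma, a_i, B):
    """Equation (7): SIG(a_i, B, D) = gamma_{B ∪ {a_i}}(D)."""
    return gamma(B + [a_i])

def sig_tc(gamma, a_i, B, c_star, rho=0.01):
    """Equation (6): the positive region plus the weighted normalized test cost."""
    return sig(gamma, a_i, B) + (1.0 - c_star[a_i]) * rho
```

Since c∗i ∈ [0, 1] and ρ is small (ρ = 0.01 in Section IV), the cost term perturbs SIG by at most ρ, so the positive region keeps the leading role and cheaper attributes act essentially as a tie-breaker.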

SIGtc is the heuristic function with test costs, and SIG is the one that does not take test costs into account. We now propose a heuristic algorithm based on the positive region and the weighted test cost to find a reduct with minimal test cost. A framework of our heuristic method is shown in Algorithm 1. In Algorithm 1, if sigm = 1, the algorithm succeeds in finding a reduct. The proposed algorithm performs stably from a statistical perspective. According to Equation (1), if we let δ = 0, the neighborhood rough set degenerates to the classical one; in this case, the proposed test-cost reduction method can deal with both nominal attributes and numerical ones without discretization.

IV. EXPERIMENTS

The complexity of classification depends not only on the given feature space but also on the granularity level [8]. In our experiments, the user can set the granularity level through the parameter δ; we let δ = 0.005, 0.008, 0.011, ..., 0.029.

Algorithm 1 Test-cost-sensitive attribute reduction based on neighborhood decision system
Input: (U, C, D, {Va}, {Ia}, c∗) and δ, the threshold controlling the size of the neighborhood
Output: A reduct red with minimal test cost
Method:
 1: red = ∅;
 2: sigm = −1, sigt = 0;
 3: while (sigm ≠ 1) ∧ (sigm ≠ sigt) do
 4:   sigt = sigm;
 5:   for each ai ∈ (C − red) do
 6:     compute SIGtc(ai, red, D, c∗i);
 7:   end for
 8:   select am with the maximal SIGtc(am, red, D, c∗m);
 9:   compute sigm = SIG(am, red, D);
10:   red = red ∪ {am};
11: end while
12: if sigm = 1 then
13:   return red;
14: end if
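The sketch below mirrors Algorithm 1 under the same assumptions as before (a callable gamma(B) for the dependency degree, an array c_star of normalized costs); attribute indices stand in for attributes, and the line comments refer to the steps of Algorithm 1.

```python
# A minimal sketch of Algorithm 1; gamma and c_star are assumed inputs.
def min_cost_reduct(gamma, c_star, rho=0.01):
    n = len(c_star)
    red = []                                   # line 1: red = ∅
    sig_m, sig_t = -1.0, 0.0                   # line 2
    while sig_m != 1.0 and sig_m != sig_t and len(red) < n:   # line 3 (guarded)
        sig_t = sig_m                          # line 4
        # lines 5-8: pick a_m maximizing SIG_tc(a_i, red, D, c*_i)
        best = max((a for a in range(n) if a not in red),
                   key=lambda a: gamma(red + [a]) + (1.0 - c_star[a]) * rho)
        sig_m = gamma(red + [best])            # line 9: sig_m = SIG(a_m, red, D)
        red.append(best)                       # line 10
    return red if sig_m == 1.0 else None       # lines 12-14

# Usage: red = min_cost_reduct(lambda B: dependency(X, y, B, 0.02), c_star)
```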

In Equation (6), ρ is the regulatory factor of the test cost. We let ρ = 0.01, which keeps the influence of the test cost small but non-negligible; in other words, it lets SIG play the major role in the function. For each distribution, we generate 100 sets of test costs, and for each test cost setting there are 9 settings of δ.

A. Evaluation metrics

To remove the influence of subjective and chance factors, three evaluation metrics are adopted to compare performance: the finding optimal factor (FOF), the maximal exceeding factor (MEF), and the average exceeding factor (AEF). Detailed definitions can be found in [16]. Running the algorithm with different test cost settings yields reducts that are assessed with these metrics.

B. Statistical results

The performance of the algorithm differs across test cost distributions. Figure 1 shows the finding optimal factors; this metric is both qualitative and quantitative: it counts only optimal solutions, and it is computed statistically. Figure 2 shows the maximal exceeding factors, which capture the worst case of the algorithm and should be viewed as a statistical metric. Figure 3 shows the average exceeding factors, which display the overall performance of the algorithm from a statistical perspective. On the whole, the algorithm performs best with the Normal test cost distribution.
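Assuming the exceeding-factor definition of [16], ef(R) = (c(R) − c(R∗))/c(R∗) for an optimal reduct R∗, the three metrics over the test-cost settings can be computed as in the sketch below; this is our reading of [16], not code from the paper.

```python
# A minimal sketch of FOF, MEF, and AEF, assuming the definitions of [16].
def evaluate(found_costs, optimal_costs):
    efs = [(f - o) / o for f, o in zip(found_costs, optimal_costs)]
    fof = sum(e == 0 for e in efs) / len(efs)  # finding optimal factor
    mef = max(efs)                             # maximal exceeding factor
    aef = sum(efs) / len(efs)                  # average exceeding factor
    return fof, mef, aef
```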

Figure 1. Finding optimal factor (FOF) for the Uniform, Normal, and Pareto test cost distributions (x-axis: delta).

Figure 2. Maximal exceeding factor (MEF) for the three distributions (x-axis: delta).

Figure 3. Average exceeding factor (AEF) for the three distributions (x-axis: delta).

C. Performance comparison

If ρ = 0, Equation (6) degrades to Equation (7); here we consider only ρ ≠ 0. Figures 4, 5 and 6 compare the heuristic information function with test costs against the one without. Experimental results on the Iris dataset with various test-cost settings show the performance improvement of the information function SIGtc over SIG.

V. CONCLUSION

This study has proposed a new test-cost-sensitive attribute reduction problem. We formally defined the minimal test cost reduct problem for numerical valued decision systems. The new problem has practical areas of application because real-world datasets often contain numerical attributes. The proposed solution to this problem is based on the neighborhood rough set model.

Figure 4. Finding optimal factor with the Uniform distribution (with cost vs. without cost; x-axis: delta).

Figure 5. Finding optimal factor with the Normal distribution (with cost vs. without cost; x-axis: delta).

Figure 6. Finding optimal factor with the Pareto distribution (with cost vs. without cost; x-axis: delta).

We also design a heuristic information function based on the positive region and the weighted test costs to obtain effective results, and with this function a new heuristic algorithm is designed. Experimental results show that the proposed method is able to find a low-cost test set.

ACKNOWLEDGMENTS

This work is supported in part by the Fujian Province Foundation of Higher Education under Grant No. JK2010036, the Fujian Province Foundation of Serving the Construction of the Economic Zone on the West Side of the Straits, the National Science Foundation of China under Grant Nos. 60873077 and 61170128, the Natural Science Foundation of Fujian Province, China under Grant No. 2011J01374, and the Education Department of Fujian Province under Grant No. JA11176.

REFERENCES

[1] M. Dash and H. Liu, "Consistency-based search in feature selection," Artificial Intelligence, vol. 151, pp. 155–176, 2003.
[2] Y. Yao, Y. Zhao, and J. Wang, "On reduct construction algorithms," in Rough Set and Knowledge Technology, 2006, pp. 297–304.
[3] W. Zhu and F. Wang, "Reduction and axiomization of covering generalized rough sets," Information Sciences, vol. 152, no. 1, pp. 217–230, 2003.
[4] Y. Yao and Y. Zhao, "Attribute reduction in decision-theoretic rough set models," Information Sciences, vol. 178, no. 17, pp. 3356–3373, 2008.
[5] H. Li, W. Zhang, and H. Wang, "Classification and reduction of attributes in concept lattices," in Granular Computing, 2006, pp. 142–147.
[6] Q. Liu, F. Li, F. Min, M. Ye, and G. Yang, "An efficient reduction algorithm based on new conditional information entropy," Control and Decision (in Chinese), vol. 20, no. 8, pp. 878–882, 2005.
[7] G. Wang, H. Yu, and D. Yang, "Decision table reduction based on conditional information entropy," Chinese Journal of Computers, vol. 2, no. 7, pp. 759–766, 2002.
[8] Q. Hu, D. Yu, J. Liu, and C. Wu, "Neighborhood rough set based heterogeneous feature subset selection," Information Sciences, vol. 178, no. 18, pp. 3577–3594, 2008.
[9] Q. Hu, D. Yu, and Z. Xie, "Numerical attribute reduction based on neighborhood granulation and rough approximation (in Chinese)," Journal of Software, vol. 19, no. 3, pp. 640–649, March 2008.
[10] Q. Hu, D. Yu, and Z. Xie, "Neighborhood classifiers," Expert Systems with Applications, vol. 34, pp. 866–876, 2008.
[11] F. Min, H. He, Y. Qian, and W. Zhu, "Test-cost-sensitive attribute reduction," Information Sciences, vol. 181, pp. 4928–4942, November 2011.
[12] H. He, F. Min, and W. Zhu, "Attribute reduction in test-cost-sensitive decision systems with common-test-costs," in ICMLC, vol. 1, 2011, pp. 432–436.
[13] H. He and F. Min, "Accumulated cost based test-cost-sensitive attribute reduction," in RSFDGrC, ser. LNAI, vol. 6743, 2011, pp. 244–247.
[14] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery in databases," AI Magazine, vol. 17, pp. 37–54, 1996.
[15] P. D. Turney, "Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm," Journal of Artificial Intelligence Research, vol. 2, pp. 369–409, 1995.
[16] F. Min and Q. Liu, "A hierarchical model for test-cost-sensitive decision systems," Information Sciences, vol. 179, no. 14, pp. 2442–2452, 2009.
[17] F. Min, W. Zhu, and H. Zhao, "COSER: Cost-sensitive rough sets," http://grc.fjzs.edu.cn/~fmin/coser/index.html, 2011.
[18] D. R. Wilson and T. R. Martinez, "Improved heterogeneous distance functions," Journal of Artificial Intelligence Research, vol. 6, pp. 1–34, 1997.
[19] Z. Pawlak, "Rough sets," International Journal of Computer and Information Sciences, vol. 11, pp. 341–356, 1982.
[20] D. Slezak, "Approximate entropy reducts," Fundamenta Informaticae, vol. 53, no. 3-4, pp. 365–390, 2002.
[21] Y. Qian, J. Liang, W. Pedrycz, and C. Dang, "Positive approximation: An accelerator for attribute reduction in rough set theory," Artificial Intelligence, vol. 174, no. 9-10, pp. 597–618, 2010.
