International Journal of Approximate Reasoning ••• (••••) •••–•••

Contents lists available at ScienceDirect
www.elsevier.com/locate/ijar

Multi-objective optimization method for learning thresholds in a decision-theoretic rough set model

Ruilin Pan*, Zhanchao Zhang, Yanglong Fan, Jianhua Cao, Ke Lu, Tingsheng Yang

School of Management Science and Engineering, Anhui University of Technology, Maanshan 243032, China

Article history: Received 24 July 2015; Received in revised form 17 January 2016; Accepted 18 January 2016; Available online xxxx

Keywords: Decision-theoretic rough set model; F_measure; Multi-objective optimization; Pareto optimal set; Threshold learning

Abstract: For decision-theoretic rough sets, a key issue is determining the thresholds for the probabilistic rough set model by setting appropriate cost functions. However, it is not easy to obtain correct cost functions because of a lack of prior knowledge, and few previous studies have addressed the learning of thresholds and cost functions from datasets. In the present study, a multi-objective optimization model is proposed for threshold learning. In our model, we integrate an objective function that minimizes the decision cost with another that decreases the size of the boundary region. The ranges of the thresholds and two types of F_measure are used as constraints. In addition, a multi-objective genetic algorithm is employed to obtain the Pareto optimal set. We used 12 UCI datasets to validate the performance of our method, where the experimental results demonstrated the trade-off between the two objectives, as well as showing that the thresholds obtained by our method were more intuitive than those obtained using other methods. The classification abilities of the solutions were improved by the F_measure constraints. © 2016 Elsevier Inc. All rights reserved.

1. Introduction


The Pawlak rough set proposed in the early 1980s [1] has been applied in many research fields such as data mining [2,3] and machine learning [4,5]. However, the traditional model is too strict to include objects in the approximation regions. Thus, probabilistic rough set models were introduced to loosen the extreme membership requirements of the equivalence classes in the object set [6]. As a special type of probabilistic rough set model, decision-theoretic rough set (DTRS) models [7] can be used to derive several probabilistic rough set models, e.g., the 0.5 probabilistic rough set model [8] and variable precision rough set model [9]. Previous research into DTRS models can be summarized briefly as follows. First, a series of studies addressed the extension of the DTRS model. Yao [10] studied the derivation of other probabilistic rough set models from the DTRS. Liu et al. [11] introduced three-way decision discriminant analysis into the DTRS model, while Yao and Zhou [12] introduced naive Bayesian classification into DTRSs. Based on the current status of the DTRS model, Yang and Yao [13], Zhou [14], and Li and Zhou [15] proposed multi-agent, multi-class, and multi-view DTRS models, respectively, as different DTRS model extensions.


* Corresponding author. E-mail addresses: alltimefi[email protected] (R. Pan), [email protected] (Z. Zhang), [email protected] (Y. Fan), [email protected] (J. Cao), [email protected] (K. Lu), [email protected] (T. Yang).
http://dx.doi.org/10.1016/j.ijar.2016.01.002
0888-613X/© 2016 Elsevier Inc. All rights reserved.


Using a Bayesian decision procedure and graded rough set, Li and Xu [16] proposed a new framework for DTRS called the double-quantitative-DTRS. By considering the new expression of evaluation information using hesitant fuzzy sets (HFSs), Liang and Liu [17] introduced HFSs into DTRSs and explored their decision mechanisms. Other studies have concentrated on the methodology of DTRS theory, such as attribute reduction and rule acquisition. In particular, Yao and Zhao [18], Min et al. [19], and Jia et al. [20] studied attribute reduction with respect to DTRS theory from different viewpoints. In [18], different classification properties, such as coverage, cost, confidence, decision-monotonicity, and generality, were regarded as important factors for attribute reduction. In [19], the new problem of minimal test cost reduction was proposed, where three metrics for evaluating the performance of the reduction algorithm were defined from a statistical viewpoint. Furthermore, in [20], a new definition was proposed for attribute reduction for DTRS models by formulating an optimization problem that aims to minimize the cost of decisions. In addition, Li et al. [21] defined a new attribute reduction method based on further investigations of its monotonicity property. Grzymala-Busse et al. [22] analyzed positive and boundary regions as well as comparing possible rules using the Modified Learning from Examples Module Version 2 algorithm. Finally, several studies have considered the application of DTRS models. Thus, Zhou et al. [23] proposed a three-way decision method for filtering spam based on a Bayesian decision procedure. Li et al. [24] proposed an instance-centric hierarchical classification framework using the three-way decision method. Yu et al. [25] used the DTRS model to formulate an efficient automatic clustering method. Liu et al. [26] applied the three-way decision method to a policy-making procedure to reduce the decision risk. 
The key feature of the DTRS is a sound mathematical interpretation of thresholds based on the Bayesian decision procedure. Using learned thresholds, three pairwise disjoint regions can be defined in the probabilistic rough set: positive, boundary, and negative regions. As a new semantic interpretation of the three regions, Yao [27,28] introduced the concept of a three-way decision comprising positive, negative, and boundary rules. However, it is not easy to obtain effective decision cost functions for the DTRS model because of a lack of prior knowledge. To overcome this problem, only a few studies have addressed the problem of learning the decision cost functions and thresholds from datasets automatically. In particular, Deng and Yao [29] and Jia et al. [30] proposed different single objective optimization models for automatically learning thresholds from datasets, where the former determined the optimal thresholds by aiming to minimize the uncertainty induced by the three regions, whereas the latter focused on minimizing the decision cost for learning optimal thresholds. However, a major challenge regarding probabilistic rough set models was ignored in these models because they did not formulate a method for decreasing the size of the boundary region by further exploration of the data [31]. In particular, in the model proposed by Jia et al. [30], they used penalties to control the size of the boundary region, but the penalties were provided by users and they could not be selected easily. Herbert and Yao [31] proposed a game-theoretic rough set (GTRS) model to decrease the size of the boundary region and to calculate the required thresholds within a game-theoretic environment. In a related study [32], the configuration of probabilistic thresholds was interpreted as a decision-making problem in a competitive game involving multiple criteria, such as accuracy, generality, confidence, and coverage. 
In a recent study of GTRS theory, Azam and Yao [33] constructed a mechanism for analyzing the uncertainties of rough set regions with the aim of determining effective threshold values. A competitive game was formulated between the regions to modify the thresholds in order to improve their respective uncertainty levels. By playing the game repeatedly and utilizing its results to update the thresholds, a learning mechanism was proposed to automatically tune the thresholds based on the data. The games based on accuracy and generality consider the three regions, and the uncertainty-based games consider the individual regions. However, in the GTRS model, users have to provide initial possible increases/decreases in the threshold values to set up the game. In addition, the games between decision cost and boundary regions were not investigated further. In the present study, we propose a multi-objective optimization model for automatic threshold learning. We consider two significant problems regarding DTRS theory: decreasing the size of the boundary region and decreasing the overall decision cost for three types of rules. Using the model proposed by Jia et al. [30], we modify the formulae irrespective of the penalties in our first objective and constraint, before adding a simple but very meaningful objective, α − β, which intuitively represents the goal of decreasing the size of the boundary region. In contrast to our method, Li and Zhou [15] proposed a three-way view decision model where optimistic, pessimistic, and equable decisions are made according to the cost of misclassification. The thresholds for probabilistic inclusion are calculated based on the minimal risk cost under the respective decision bias. Similarly, Min et al. [19] posited a minimal test cost reduction problem, which constitutes a new but more general problem than the classical reduction one. It is also quite different from our method.
The multi-objective problem is regarded as a game in our method to investigate the trade-off that exists between these two objectives. This game gives rise to a set of Pareto optimal solutions, no one of which can be said to be better than another: for a Pareto optimal solution, no other outcome makes each player (objective) at least as well off while making at least one objective strictly better off. In addition, two types of F_measure constraints are used to improve the classification ability of the selected solutions in our model. The first type of F_measure, which is called the F1_measure, is used at the end of the algorithm to preserve the solutions with better classification performance. The second, which is called the F2_measure, is applied during the iteration procedure. Using the self-correcting mechanism in the F2_measure, we modify α/β in non-feasible individuals to satisfy the F2_measure constraint. We used 12 representative UCI datasets [34] to validate the performance of our model, where the experimental results demonstrated several advantages of our method. First, compared with the other methods mentioned above, a set of Pareto optimal thresholds is learned automatically and the penalties in Jia et al.'s model [30] can be neglected. Second, the newly added objective function to decrease the size of the boundary region is expressed simply and is easy to understand.
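The Pareto-optimality notion used above can be sketched concretely. In the following minimal Python sketch, the candidate solutions and their objective values are purely hypothetical, and both objectives are written as minimizations (decision cost, and boundary size α − β):

```python
# Minimal sketch of extracting the Pareto optimal set from candidate solutions
# evaluated on two objectives, both minimized. The candidate list is a
# hypothetical illustration, not data from the paper.

def dominates(u, v):
    """u dominates v if u is at least as good in every objective and strictly better in one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def pareto_front(solutions):
    """solutions: list of (label, (f1, f2)); keep the non-dominated ones."""
    return [s for s in solutions
            if not any(dominates(t[1], s[1]) for t in solutions if t is not s)]

candidates = [
    ("A", (3.1, 0.40)),  # (decision cost, boundary size alpha - beta)
    ("B", (2.5, 0.55)),
    ("C", (4.0, 0.20)),
    ("D", (3.3, 0.45)),  # dominated by A in both objectives
]
print([label for label, _ in pareto_front(candidates)])  # ['A', 'B', 'C']
```

The surviving solutions A, B, and C illustrate the trade-off: none improves one objective without worsening the other.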



Furthermore, solutions with better classification abilities can be retained using the F1_measure constraint, and we fit an equation using multiple linear regression (MLR) to correct the non-feasible individuals by considering the F2_measure constraint. Finally, the Pareto optimal solutions can be presented to allow users to select the final solution based on their own preferences. The remainder of this paper is organized as follows. In Section 2, the basic concepts of the DTRS model are introduced. In Section 3, a multi-objective optimization model for learning the thresholds of the DTRS model is proposed, and the model is solved by a multi-objective genetic algorithm (MOGA). In Section 4, the experimental results are presented and discussed. In Section 5, we give our conclusions.

2. Basic concepts of DTRS models

In this section, we present the basic notions of the DTRS model and three-way decision theory. In addition, Jia et al.'s single-objective optimization model is introduced. First, let us consider a simple knowledge representation scheme within a finite set of objects, which are described by a finite set of attributes. This scheme can be defined by a decision table S, which is expressed as the tuple [26]:

S = (U, At = C ∪ {D}, {V_a | a ∈ At}, {I_a | a ∈ At}),   (1)

where U, At, and C are finite non-empty sets of objects, attributes, and condition attributes that describe the objects, respectively, and D is a decision attribute indicating the classes of the objects. Furthermore, V_a is a non-empty set of values for attribute a ∈ At, and I_a : U → V_a is an information function that maps an object in U to exactly one value in V_a. In the decision table, given a subset of attributes A ⊆ At, an indiscernibility relation ind(A) is defined as follows [26]:

x ind(A) y ↔ ∀a ∈ A [I_a(x) = I_a(y)],   (2)

where two objects x and y are indiscernible with respect to A if and only if they have exactly the same value for each attribute in A. The indiscernibility relation ind(A) is an equivalence relation. The equivalence class containing object x is denoted by [x]_A, or for simplicity by [x]:

[x]_A = [x] = {y ∈ U | ∀a ∈ A (I_a(x) = I_a(y))}.   (3)

Next, we review the DTRS theory in detail. Suppose that Ω = {ω_1, ..., ω_s} is a finite set of s states, A = {a_1, ..., a_m} is a finite set of m possible actions, and P(ω_j | x) is the conditional probability of object x being in state ω_j given that the object is described by x. Let λ(a_i | ω_j) denote the cost or loss for taking action a_i when the state is ω_j. For an object with the description x, we suppose that action a_i is taken. The expected cost associated with taking action a_i can then be calculated by [7]:

R(a_i | x) = Σ_{j=1}^{s} λ(a_i | ω_j) · P(ω_j | x),   (4)

where the quantity R(a_i | x) is also called the conditional cost. In the DTRS model, the object classification problem is considered with approximation operators. The states given by the set Ω = {C, C^C} indicate that an object is in a decision class C or not in C, respectively. The set of actions A = {a_P, a_B, a_N} represents the three actions for classifying an object, i.e., deciding POS(C), BND(C), or NEG(C), respectively. The positive region POS(C), boundary region BND(C), and negative region NEG(C) correspond to the decisions of acceptance, deferment, and rejection, respectively. Let λ_PP, λ_BP, and λ_NP denote the costs incurred for taking actions a_P, a_B, and a_N, respectively, when an object belongs to C, and let λ_PN, λ_BN, and λ_NN, respectively, denote the costs incurred for taking the same actions when the object belongs to C^C. The conditional probabilities that an object x is in C or C^C can be defined, respectively, as P(C | [x]) = |C ∩ [x]| / |[x]| and P(C^C | [x]) = 1 − P(C | [x]). Given the cost functions, the expected costs R(a_P | [x]), R(a_B | [x]), and R(a_N | [x]) can be computed as follows:

R(a_P | [x]) = λ_PP · P(C | [x]) + λ_PN · P(C^C | [x]),   (5)
R(a_B | [x]) = λ_BP · P(C | [x]) + λ_BN · P(C^C | [x]),   (6)
R(a_N | [x]) = λ_NP · P(C | [x]) + λ_NN · P(C^C | [x]).   (7)

The Bayesian decision procedure for Eq. (4) leads to the following three minimum-cost decision rules [28]:

(P) if R(a_P | [x]) ≤ R(a_B | [x]) and R(a_P | [x]) ≤ R(a_N | [x]), decide x ∈ POS(C),   (8)
(B) if R(a_B | [x]) ≤ R(a_P | [x]) and R(a_B | [x]) ≤ R(a_N | [x]), decide x ∈ BND(C),   (9)
(N) if R(a_N | [x]) ≤ R(a_P | [x]) and R(a_N | [x]) ≤ R(a_B | [x]), decide x ∈ NEG(C).   (10)
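The expected-cost rules of Eqs. (5)–(10) can be sketched directly in code. In the following Python sketch, the cost values are illustrative assumptions (they are not taken from the paper); ties are broken in the order a_P, a_B, a_N, as the paper specifies:

```python
# Sketch of the Bayesian decision procedure of Eqs. (5)-(10).
# The cost values below are illustrative assumptions, not values from the paper.

def expected_costs(p, costs):
    """p = P(C | [x]); costs maps (action, state) -> loss, with states
    'P' (object in C) and 'N' (object in C^C)."""
    q = 1.0 - p  # P(C^C | [x])
    return {
        "aP": costs[("P", "P")] * p + costs[("P", "N")] * q,  # Eq. (5)
        "aB": costs[("B", "P")] * p + costs[("B", "N")] * q,  # Eq. (6)
        "aN": costs[("N", "P")] * p + costs[("N", "N")] * q,  # Eq. (7)
    }

def classify(p, costs):
    """Rules (P), (B), (N); ties broken in the order aP, aB, aN."""
    r = expected_costs(p, costs)
    for action, region in (("aP", "POS"), ("aB", "BND"), ("aN", "NEG")):
        if all(r[action] <= r[other] for other in r):
            return region

# Illustrative costs satisfying 0 <= lPP <= lBP < lNP and 0 <= lNN <= lBN < lPN.
costs = {("P", "P"): 0, ("B", "P"): 2, ("N", "P"): 6,
         ("N", "N"): 0, ("B", "N"): 2, ("P", "N"): 8}

print(classify(0.9, costs))  # high P(C|[x]) -> POS
print(classify(0.5, costs))  # intermediate -> BND
print(classify(0.1, costs))  # low -> NEG
```

With these assumed costs, acceptance is cheapest for high conditional probabilities, rejection for low ones, and deferment in between, which is exactly the three-region behavior formalized next.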

When any two or all three actions have the same cost, our proposed approach breaks the tie by taking an action according to the order of a_P, a_B, and a_N. In this study, we assume the cost functions satisfy 0 ≤ λ_PP ≤ λ_BP < λ_NP and 0 ≤ λ_NN ≤ λ_BN < λ_PN. The decision rules (P), (B), and (N) can then be simply rewritten as:

(P) if P(C | [x]) ≥ α and P(C | [x]) ≥ γ, decide x ∈ POS(C),   (11)
(B) if P(C | [x]) ≤ α and P(C | [x]) ≥ β, decide x ∈ BND(C),   (12)
(N) if P(C | [x]) ≤ β and P(C | [x]) ≤ γ, decide x ∈ NEG(C),   (13)

where the parameters α, β, and γ are defined as:

α = (λ_PN − λ_BN) / ((λ_PN − λ_BN) + (λ_BP − λ_PP)),   (14)
β = (λ_BN − λ_NN) / ((λ_BN − λ_NN) + (λ_NP − λ_BP)),   (15)
γ = (λ_PN − λ_NN) / ((λ_PN − λ_NN) + (λ_NP − λ_PP)).   (16)

When (λ_PN − λ_BN)(λ_NP − λ_BP) > (λ_BP − λ_PP)(λ_BN − λ_NN), we have α > β. Thus, α > γ > β and the following decision rules are induced:

(P1) if P(C | [x]) ≥ α, decide x ∈ POS(C),   (17)
(B1) if β < P(C | [x]) < α, decide x ∈ BND(C),   (18)
(N1) if P(C | [x]) ≤ β, decide x ∈ NEG(C).   (19)

According to the three probabilistic regions, we make a three-way decision based on the following positive, boundary, and negative rules [28]:

Des([x]) →_P Des(C), for [x] ⊆ POS_(α,β)(C),   (20)
Des([x]) →_B Des(C), for [x] ⊆ BND_(α,β)(C),   (21)
Des([x]) →_N Des(C), for [x] ⊆ NEG_(α,β)(C).   (22)

Unlike the rules in classical rough set theory, all three types of rules in the DTRS model may be uncertain, where they represent the levels of tolerance to making incorrect decisions. Each rule incurs a corresponding cost based on its own error rate. In Jia et al.'s method [30], after assuming that the cost of correct classification is zero, i.e., λ_PP = λ_NN = 0, the costs of the three types of rules are defined as follows:

Positive rule: (1 − p) · λ_PN,   (23)
Boundary rule: p · λ_BP + (1 − p) · λ_BN,   (24)
Negative rule: p · λ_NP,   (25)

where p = P(C | [x]) for rules Des([x]) →_K Des(C), K ∈ {P, B, N}. Based on Eqs. (23)–(25), the overall decision cost of a three-way decision is defined as follows [30]:

COST = Σ_{x_i ∈ POS(A)} (1 − p_i) · λ_PN + Σ_{x_j ∈ NEG(A)} p_j · λ_NP + Σ_{x_k ∈ BND(A)} ((1 − p_k) · λ_BN + p_k · λ_BP).   (26)

According to the Bayesian decision procedure, the optimization problem of minimizing the decision cost is expressed as the minimization of

Σ_{p_i ≥ α} (1 − p_i) · λ_PN + Σ_{p_j ≤ β} p_j · λ_NP + σ Σ_{β < p_k < α} ((1 − p_k) · λ_BN + p_k · λ_BP),   (27)

s.t. 0 ≤ β < γ < α ≤ 1, σ ≥ 1.
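The threshold formulae of Eqs. (14)–(16) and the overall decision cost of Eq. (26) (with λ_PP = λ_NN = 0) can be sketched as follows; the cost values and the list of conditional probabilities are illustrative assumptions, not data from the paper:

```python
# Sketch of Eqs. (14)-(16) and the overall decision cost of Eq. (26).
# Cost values and the probability list are illustrative assumptions.

def thresholds(lPP, lBP, lNP, lNN, lBN, lPN):
    alpha = (lPN - lBN) / ((lPN - lBN) + (lBP - lPP))  # Eq. (14)
    beta = (lBN - lNN) / ((lBN - lNN) + (lNP - lBP))   # Eq. (15)
    gamma = (lPN - lNN) / ((lPN - lNN) + (lNP - lPP))  # Eq. (16)
    return alpha, beta, gamma

def overall_cost(ps, alpha, beta, lBP, lNP, lBN, lPN):
    """Eq. (26) with lPP = lNN = 0; regions induced by rules (P1), (B1), (N1)."""
    cost = 0.0
    for p in ps:                # p = P(C | [x]) for each equivalence class
        if p >= alpha:          # positive rule: accept, pay for error rate 1 - p
            cost += (1 - p) * lPN
        elif p <= beta:         # negative rule: reject, pay for error rate p
            cost += p * lNP
        else:                   # boundary rule: defer, pay on both sides
            cost += (1 - p) * lBN + p * lBP
    return cost

alpha, beta, gamma = thresholds(lPP=0, lBP=2, lNP=6, lNN=0, lBN=2, lPN=8)
print(alpha, beta, gamma)  # alpha = 0.75, beta ~ 0.333, gamma ~ 0.571 here
ps = [0.9, 0.8, 0.5, 0.2, 0.1]
print(overall_cost(ps, alpha, beta, lBP=2, lNP=6, lBN=2, lPN=8))
```

For these assumed costs, the condition (λ_PN − λ_BN)(λ_NP − λ_BP) = 24 > (λ_BP − λ_PP)(λ_BN − λ_NN) = 4 holds, so α > γ > β as required by rules (17)–(19).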