INFORMATION CLASSIFICATION USING FUZZY KNOWLEDGE BASED AGENTS

DAVID CAMACHO, CÉSAR HERNÁNDEZ, JOSÉ M. MOLINA
Computer Science Department, Universidad Carlos III Madrid
Avda. Universidad nº 30, 28911 Leganés (Spain)
{dcamacho, molina}@ia.uc3m.es [email protected]


Abstract Almost any kind of useful information can be found on the Web. However, the very vastness of the Web makes that information hard to retrieve, manage and use. Different approaches have been developed to address these problems (search engines, MetaSearch engines, Spiders, Softbots, Intelligent Agents or Web Agents). This paper builds on one such system, which uses a set of heterogeneous intelligent software agents to perform these tasks. Two kinds of agents compose the system: WebAgents, developed to retrieve information from a specific Web source, and MetaWebAgents, developed to select the appropriate WebAgent to search for the required information. Each WebAgent retrieves, filters and stores the information retrieved from the Web in order to improve system performance. The MetaWebAgents need to represent and classify the behavior of the different WebAgents. This work presents a fuzzy system that helps to classify the behavior of the WebAgents. The MetaWebAgent calculates the appropriateness of each existing agent's behavior using different distances, which are analyzed in the paper. The behavior classification is then used to select which WebAgent the MetaWebAgent asks for information.

Keywords Fuzzy Systems, Fuzzy Distance, Intelligent Software Agents, Multiagent Systems.

1 Introduction

Nowadays an enormous number of companies use the Web to exhibit and sell products. Travel agencies and transportation, hotel and car companies offer different products on the Web; some bargain prices are even available only to users who buy through the Web. Any connected user can consult prices and timetables, buy tickets and make reservations (flight tickets, train tickets, car and/or hotel reservations, booking a room, etc.). Therefore, since the foundation and development of the World Wide Web (Web), any user has been able to find any kind of useful information. However, there are serious problems in retrieving and using this information, due mainly to the vast amount of it. These problems can be summarized as follows: the number of "sites" (Web addresses) that offer potentially useful information grows exponentially; the number of documents (data) found can be enormous; and different representations are used for the same kind of information.

Several approaches have been taken to solve these problems. Among the most popular are the Search Engines, which use software systems such as "softbots", "spiders" or "worms" to collect all the possible information stored in the Web. These systems build a complete and dynamic database that is used to answer users' queries [5]. MetaSearch Engines use a set of search engines and different techniques to improve the retrieval task [8, 9]. Intelligent Software Agents are approaches that use the agent concept. Agents are a new paradigm for developing software applications [2, 8]. An agent has incomplete information or lacks the abilities needed to solve the whole problem, and there is no global system control. Finally, MultiAgent Systems (MAS) are applications that deal with the interaction of groups of intelligent agents attempting to cooperate to solve problems [1, 2, 9]. MAS are successful because they are able to solve big problems, since the agents can cooperate and share skills and knowledge. All of these systems try to manage the information overload by analyzing and classifying the information.

These systems need to retrieve, filter, represent and store the information obtained from the Web, and give this stored information to the user or to other agents in the system [7]. Therefore, it is possible to develop systems with specialized agents that retrieve information from different sites on the Web (MAPWeb [4]). However, if a set of different agents exists, those specialized agents must be classified in order to decide which of them should be queried for a particular problem. In general, a MultiAgent system that deals with information stored in the Web should be composed of two different types of agents. WebAgent: this kind of agent specializes in retrieving, filtering and storing information from a particular site on the Web. MetaWebAgent: this kind of agent can reuse the knowledge retrieved by the WebAgents and reason with it (using classical Artificial Intelligence techniques such as planning or learning) to solve a problem.

The basic objective of this paper is to show that fuzzy systems are suitable for comparing, evaluating and classifying the behavior of WebAgents. A Fuzzy Knowledge Based prototype has been developed to evaluate the distance between the behaviors of different WebAgents. This fuzzy system can help any MetaWebAgent to decide which is the most convenient set of WebAgents for retrieving information (documents, data, etc.) from the Web. The system used in this paper is MAPWeb, a MultiAgent framework developed to solve complex problems in a travel domain on the Web [4].

This paper is divided into five sections: section 2 defines the problem; section 3 briefly describes how the distance is evaluated for a WebAgent; section 4 presents several experimental results; and, finally, section 5 presents the conclusions.

Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference Copyright 2001

[Figure: diagram with the labels User 1, User 2, ..., User k; Queries/Answers; MetaWebAgent; WebAgent 1, WebAgent 2, WebAgent 3, WebAgent 4, ..., WebAgent J; Internet.]

Fig. 1: Relationship between a MetaWebAgent and specialized WebAgents in the MAPWeb architecture.

2 Problem Definition and Agents Behavior Representation

Information classification techniques usually rely on measuring distances between two vectors that represent the information. These distances are used to decide whether one vector is close to another. This paper shows how the distance concept can be used to evaluate a set of different agents instead of the retrieved information, as is traditionally done. Fig. 1 shows how a MetaWebAgent that needs information from other agents must evaluate which of the possible agents to query.

In principle it would be possible to query all the WebAgents in the system, but this would degrade system performance because of the high number of queries that the different MetaWebAgents could issue. Instead, the MetaWebAgent can analyze the information returned by the WebAgents in previous problems to decide which of them are suitable for new queries. As the nature of the problem is often inexact, a conventional distance such as the Euclidean distance does not give successful results, because such distances do not take into account all the semantic information that is known about the data. Therefore, non-traditional distances such as fuzzy distances should improve on the results obtained with traditional techniques and, in turn, improve the efficiency of the MetaWebAgents.

The information about an agent's behavior is represented by an n-dimensional vector $v_j$ composed of n characteristics $(x_{1j}, x_{2j}, \ldots, x_{nj})$. The Euclidean and absolute distances use this definition and give every component the same importance. Fuzzy systems can be applied to evaluate the distance with some domain information, giving a different level of importance to each component. At present we use only two characteristics: [reply-time, nº solutions]. The reply-time characteristic measures the time a WebAgent takes to answer when a MetaWebAgent requests information. The nº solutions (number of solutions) characteristic is used to measure the performance of the WebAgent. Together, these characteristics are used to evaluate the performance (or simply, the behavior) of a particular WebAgent when it is queried by a MetaWebAgent.

In previous work [3], a fuzzy distance was used to classify the information retrieved by the WebAgents; a fuzzy distance is well suited to the nature of the problem. The fuzzy distance allows the system to modify the importance of the different values used to classify the data retrieved by the agents. That work concluded that the fuzzy distance is the most appropriate one for this kind of problem.
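The claim that the Euclidean and absolute distances give every component the same importance can be illustrated with a small sketch. The behavior vectors below are hypothetical values chosen for the illustration, not measurements from the experiments.

```python
import math

def euclidean(u, v):
    """Euclidean distance: every component contributes with the same weight."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def absolute(u, v):
    """Absolute (city-block) distance: sum of component-wise differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Hypothetical behavior vectors [reply-time, number of solutions].
target = [0.0, 20.0]        # desired behavior
slow_agent = [30.0, 20.0]   # right number of solutions, but 30 time units late
sparse_agent = [0.0, 50.0]  # immediate reply, but 30 solutions off the target

for name, agent in [("slow", slow_agent), ("sparse", sparse_agent)]:
    print(name, euclidean(target, agent), absolute(target, agent))
```

Both agents are at distance 30 from the target under both measures, so neither distance can express that a late reply should count for more (or less) than a difference in the number of solutions.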

3 Evaluation of WebAgent Behavior

We are developing heterogeneous software agents (WebAgents) that specialize in retrieving tourism and travel information from the Web [6, 7]. Once this information has been retrieved and filtered, it is sent to the MetaWebAgent that requested it. When a WebAgent sends the information it has found, a vector is built to characterize the agent's answer and the current needs of the MetaWebAgent. This vector, V, represents a set of characteristics, grouped into general features of the WebAgent's performance, such as the time taken to answer the MetaWebAgent's request or the number of records retrieved from the Web. This information can be represented by $(F_1, F_2, \ldots, F_n)$, a set of values that characterize the WebAgent's performance in the system, with each $f_i \in F_i$, where $F_i$ is the range of possible numerical values of $f_i$. WebAgent performance can then be related to a metric distance in the space of possible values. Given a numerical distance $d_i$ between the values of $f_i$ in two different vectors $k$ and $l$, $D_i$ is defined as the range of all possible values of $d_i$.

To cope better with the intrinsic uncertainty of the perceived features, the numerical values of the distance $d_i$ are mapped into qualitative symbolic descriptions. Thus, the computed distances are transformed, through a fuzzification process, into linguistic variables. For each $d_i$, a linguistic variable $Ld_i$ is introduced together with its set of values $\{ld_{i1}, ld_{i2}, \ldots, ld_{ic_i}\}$, each one labeling a fuzzy subset of $D_i$ (its meaning). The fuzzification operation applied to the numerical distance $d_i$ transforms it into a fuzzy singleton [10], a fuzzy subset of $D_i$ whose membership function is the Kronecker delta $\delta(d - d_i)$.

A Fuzzy Relational Algorithm (FRA) [10] stores the knowledge required to obtain the appropriateness of the WebAgent's answer to the travel that the MetaWebAgent is looking for, based on the linguistic similarities between the features of each one. The FRA is composed of a finite set of fuzzy conditional statements of the form IF $\mathrm{AND}_{i=1}^{n}(Ld_i \text{ is } ld_{ij_i})$ THEN (APPR is $ls_k$), whose antecedents are conjunctions of fuzzy statements about the linguistic variables $Ld_i$ ($i = 1, 2, \ldots, n$) and whose consequents are fuzzy statements about APPR, the linguistic appropriateness of the vector, whose set of values is $\{ls_1, ls_2, \ldots, ls_p\}$. The well-known Mamdani implication [10] has been chosen to give meaning to these fuzzy conditional statements. The compositional rule of inference (CRI), an approximate extension of the modus ponens rule, has been adopted as the inference mechanism to obtain the fuzzy subset induced in APPR, through each conditional statement of the FRA, by the fuzzy statement of the form $\mathrm{AND}_{i=1}^{n}(Ld_i \text{ is } ld_{ir_i})$, the linguistic description of the distance between any travel features. The meaning of APPR is the intersection of the intermediate meanings resulting from the application of the CRI to each conditional statement of the FRA (the min of all the induced consequent membership functions). Finally, the defuzzification process applied to the final meaning of APPR is the traditional Center of Gravity procedure [11].
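The inference chain described in this section (singleton fuzzification, conjunctive antecedents, Mamdani implication, Center of Gravity) can be sketched in a few lines. Everything concrete here is hypothetical: the two labels, their triangular shapes, the four rules and the sampling resolution are chosen only to keep the example small. The sketch also combines the clipped consequents with max, the usual companion of Mamdani (min) implication; this is an assumption of the sketch, since the text above describes the combination as an intersection.

```python
def tri(a, b, c):
    """Return a triangular membership function with feet a, c and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# Hypothetical two-label partition, reused for both inputs and for APPR.
LABELS = {"near": tri(-1.0, 0.0, 1.0), "far": tri(0.0, 1.0, 2.0)}

# Hypothetical rule base: (label of Ld_1, label of Ld_2) -> label of APPR.
RULES = {
    ("near", "near"): "near",
    ("near", "far"): "far",
    ("far", "near"): "far",
    ("far", "far"): "far",
}

def infer(d1, d2, steps=300):
    """Crisp appropriateness for two crisp distances d1, d2."""
    # Singleton fuzzification: antecedents are evaluated at the crisp values.
    strengths = {}
    for (l1, l2), out in RULES.items():
        w = min(LABELS[l1](d1), LABELS[l2](d2))  # AND interpreted as min
        strengths[out] = max(strengths.get(out, 0.0), w)
    # Mamdani implication clips each consequent at its firing strength;
    # clipped sets are combined with max (assumption of this sketch).
    lo, hi = -1.0, 2.0
    num = den = 0.0
    for k in range(steps + 1):
        y = lo + (hi - lo) * k / steps
        m = max(min(w, LABELS[out](y)) for out, w in strengths.items())
        num += y * m
        den += m
    # Center of Gravity over the sampled output universe.
    return num / den if den else None

print(infer(0.0, 0.0), infer(0.5, 0.5), infer(1.0, 1.0))
```

With these hypothetical sets the output grows gradually as the two input distances grow, which is the kind of smooth behavior the fuzzy distance is meant to provide.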

4 Experiments

Each possible WebAgent answer is characterized by a vector that is compared with a target pattern given by the MetaWebAgent. This pattern ([time, nº sols]) represents the desired, or target, behavior for the MetaWebAgent. The input and output variables used by the MetaWebAgent are:

Input Variables: As stated above, the two variables considered correspond to the distances between two characteristics of the WebAgents' behavior. Each variable is defined by five membership functions (VeryNear, Near, Medium, Far, and VeryFar).

Output Variables: One variable is considered, which corresponds to the distance between two possible behaviors. It is also defined by five membership functions (VeryNear, Near, Medium, Far, and VeryFar).
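As an illustration of how a crisp distance is mapped onto the five labels, the following sketch builds five evenly spaced triangular sets over a range and returns the membership degree of a value in each of them. The even spacing, the triangular shape and the range used in the example are assumptions made for the illustration; the actual sets are those drawn in the figures.

```python
LABELS = ["VeryNear", "Near", "Medium", "Far", "VeryFar"]

def fuzzify(d, lo, hi):
    """Membership of distance d in five evenly spaced triangular sets.

    The peaks run from lo (VeryNear) to hi (VeryFar); values outside
    [lo, hi] are clamped to the nearest end of the range.
    """
    step = (hi - lo) / (len(LABELS) - 1)
    d = min(max(d, lo), hi)
    return {
        name: max(0.0, 1.0 - abs(d - (lo + i * step)) / step)
        for i, name in enumerate(LABELS)
    }

# Hypothetical range for a time distance: 0 to 240 time units.
print(fuzzify(90.0, 0.0, 240.0))
```

A distance of 90 lies halfway between the peaks of Near (60) and Medium (120), so it belongs to each of those two sets with degree 0.5 and to the others with degree 0.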

Different rules and fuzzy sets can be applied to obtain different behaviors of the fuzzy distance. To evaluate the fuzzy system we have used two different kinds of problems, with different target patterns, and we show how the fuzzy system classifies the behavior of the different agents. The following sections describe those experiments and the sets of rules used in detail. Our experiments with the Euclidean and absolute distances confirm that, because of the characteristics used, there are large changes in distance between similar vectors. When a fuzzy distance is used the changes become smoother, and the importance of a characteristic can be defined through different rules. It is therefore necessary to check experimentally whether a fuzzy distance is useful in helping the MetaWebAgent to select among its known WebAgents.

In the experiments, MAPWeb obtains flights from different airlines (Iberia Airlines¹, Avianca Airlines²) and meta-search flight companies (Amadeus³ and 4Airlines⁴). We considered only two kinds of flights: solutions with 0 and with 1 transfers. The MetaWebAgent target pattern was varied, and the patterns [0,20], [0,40] and [0,100] were tested, where Time = 0 means that the MetaWebAgent wants the information as soon as possible. Only a subset of the experiments is shown; those experiments illustrate the different behavior of the WebAgents. To classify each agent's behavior we use different distances:

Master: the evaluation by an expert of the target request for a particular problem.
Normalized Euclidean distance (Local): the evaluation using a Euclidean distance for a particular kind of problem.
Normalized Euclidean distance (Global): this distance uses all the experiments, for any kind of problem.
Fuzzy Distance: the classification produced by the fuzzy system for the experiment shown.
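The difference between the local and global variants of the normalized Euclidean distance can be sketched as follows. The normalization itself is an assumption: here each component is divided by its largest absolute value in a reference set, which is either the answers for one kind of problem (local) or the answers from all the experiments (global). All the vectors are hypothetical.

```python
import math

def normalized_euclidean(v, target, reference):
    """Euclidean distance after scaling each component by its largest
    absolute value in `reference` (a list of vectors)."""
    scales = [max(abs(r[i]) for r in reference) or 1.0 for i in range(len(v))]
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(v, target, scales)))

# Hypothetical [time, number of solutions] answers.
local_answers = [[10.0, 20.0], [40.0, 10.0]]       # one kind of problem
global_answers = local_answers + [[400.0, 100.0]]  # all experiments

target = [0.0, 20.0]
answer = [40.0, 10.0]
local_d = normalized_euclidean(answer, target, local_answers)
global_d = normalized_euclidean(answer, target, global_answers)
print(local_d, global_d)
```

Under this assumption the same answer looks much closer to the target when it is scaled against all the experiments than when it is scaled against its own kind of problem, which is one reason why the two variants can rank agents differently.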

4.1 Experiment 1

This experiment uses a set of queries that search for flights with 0 transfers; the desired pattern is [0,20]. Figs. 2a, 2b and 2c and Table 1 show the fuzzy sets and the rules used by the MetaWebAgent.

[Fig. 2a: Fuzzy sets for the request time. Variable TIME (0-Transfers); sets VN, N, M, F, VF; membership from 0 to 1; horizontal axis from -60 to 390.]

[Fig. 2b: Fuzzy sets for the number of solutions found. Variable NUM_SOLS; sets VN, N, M, F, VF; membership from 0 to 1; horizontal axis from -10 to 170.]

[Fig. 2c: Fuzzy sets for the system output. Variable OUTPUT; sets VN, N, M, F, VF; membership from 0 to 1; horizontal axis from -24 to 136.]

Table 1: Fuzzy Rules. Experiment 1 (rows: TIME; columns: NUM_SOLS)

TIME \ NUM_SOLS   VN   N    M    F    VF
VN                VN   VN   N    M    M
N                 VN   VN   N    M    F
M                 N    N    M    F    VF
F                 M    F    F    VF   VF
VF                F    F    VF   VF   VF

Figs. 3, 4, 5 and 6 show the results for the WebAgents specialized in each company (Iberia Airlines, Avianca Airlines, Amadeus, 4Airlines). The first WebAgent behaves best in our experiments; this is due to the nature of the experiments, since most of the flight queries had their departure and arrival cities in Spain and Europe.

[Chart "0 Transfer [0-20] IBERIA": series Master, Normalized Euclidean (global), Normalized Euclidean (local), Fuzzy Distance; vertical axis from 0 to 120; horizontal axis from 1 to 85.]
Fig. 3: Fuzzy, Normalized Euclidean (local and global) distance to classify the behavior of WebAgent:Iberia.

[Chart "0 Transfer [0-20] AVIANCA": series Master, Normalized Euclidean (global), Normalized Euclidean (local), Fuzzy Distance; vertical axis from 0 to 120; horizontal axis from 1 to 15.]
Fig. 4: Fuzzy, Normalized Euclidean (local and global) distance to classify the behavior of WebAgent:Avianca.
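The rule base of Table 1 can be encoded as a lookup table and fired against fuzzified inputs, as in the sketch below. Reading the rows of Table 1 as TIME labels and the columns as NUM_SOLS labels is an assumption about the table layout, and the use of min for the conjunction and max to merge rules that share a consequent is the usual Mamdani-style choice rather than something stated with the table. The membership degrees in the example are hypothetical.

```python
ORDER = ["VN", "N", "M", "F", "VF"]

# Rows: TIME label. Columns: NUM_SOLS label, in the order given by ORDER.
TABLE_1 = {
    "VN": ["VN", "VN", "N", "M", "M"],
    "N":  ["VN", "VN", "N", "M", "F"],
    "M":  ["N", "N", "M", "F", "VF"],
    "F":  ["M", "F", "F", "VF", "VF"],
    "VF": ["F", "F", "VF", "VF", "VF"],
}

def fire(time_mu, sols_mu):
    """Firing strength of every output label for fuzzified inputs.

    time_mu and sols_mu map labels to membership degrees; missing
    labels are treated as degree 0.
    """
    out = {label: 0.0 for label in ORDER}
    for t, row in TABLE_1.items():
        for s, consequent in zip(ORDER, row):
            w = min(time_mu.get(t, 0.0), sols_mu.get(s, 0.0))
            out[consequent] = max(out[consequent], w)
    return out

# Hypothetical input: time distance between N and M, solutions distance VN.
print(fire({"N": 0.5, "M": 0.5}, {"VN": 1.0}))
```

In this example two rules fire with strength 0.5, one concluding VN and one concluding N, so the output lies between those two labels once it is defuzzified.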

¹ Iberia Airlines: http://www.iberia.com
² Avianca Airlines: http://www.avianca.com
³ Amadeus: http://www.amadeus.net
⁴ 4Airlines: http://www.4airlines.com

[Chart "0 Transfer [0-20] AMADEUS": series Master, Normalized Euclidean (global), Normalized Euclidean (local), Fuzzy Distance; vertical axis from 0 to 120; horizontal axis from 1 to 73.]
Fig. 5: Fuzzy, Normalized Euclidean (local and global) distance to classify the behavior of WebAgent:Amadeus.

[Chart "0 Transfer [0-20] 4 Air Lines": series Master, Normalized Euclidean (global), Normalized Euclidean (local), Fuzzy Distance; vertical axis from 0 to 120; horizontal axis from 1 to 14.]
Fig. 6: Fuzzy, Normalized Euclidean (local and global) distance to classify the behavior of WebAgent:4Airlines.

4.2 Experiment 2

This experiment uses a set of queries that search for flights with 1 transfer; the desired pattern is [0,20]. Figs. 7a, 7b and 7c and Table 2 show the fuzzy sets and the rules used by the MetaWebAgent.

[Fig. 7a: Fuzzy sets for the request time. Variable TIME (1-Transfer); sets VN, N, M, F, VF; membership from 0 to 1; horizontal axis from -240 to 2520.]

[Fig. 7b: Fuzzy sets for the number of solutions found. Variable NUM_SOLS; sets VN, N, M, F, VF; membership from 0 to 1; horizontal axis from -10 to 170.]

[Fig. 7c: Fuzzy sets for the system output. Variable OUTPUT; sets VN, N, M, F, VF; membership from 0 to 1; horizontal axis from -24 to 156.]

Table 2: Fuzzy Rules. Experiment 2 (rows: TIME; columns: NUM_SOLS)

TIME \ NUM_SOLS   VN   N    M    F    VF
VN                VN   VN   N    M    M
N                 VN   VN   N    M    F
M                 N    N    M    F    VF
F                 M    F    F    VF   VF
VF                F    F    VF   VF   VF

Figs. 8, 9, 10 and 11 show the results for the WebAgents specialized in each company (Iberia Airlines, Avianca Airlines, Amadeus, 4Airlines).

[Chart "1 Transfer [0-20] IBERIA": series Master, Normalized Euclidean (global), Normalized Euclidean (local), Fuzzy Distance; vertical axis from 0 to 120; horizontal axis from 1 to 40.]
Fig. 8: Fuzzy, Normalized Euclidean (local and global) distance to classify the behavior of WebAgent:Iberia.

[Chart "1 Transfer [0-20] AVIANCA": series Master, Normalized Euclidean (global), Normalized Euclidean (local), Fuzzy Distance; vertical axis from 0 to 120; horizontal axis from 1 to 34.]
Fig. 9: Fuzzy, Normalized and Expert distance to classify information retrieved by WebAgent:Avianca.

[Chart "1 Transfer [0-20] AMADEUS": series Master, Normalized Euclidean (global), Normalized Euclidean (local), Fuzzy Distance; vertical axis from 0 to 120; horizontal axis from 1 to 61.]
Fig. 10: Fuzzy, Normalized Euclidean (local and global) distance to classify the behavior of WebAgent:Amadeus.

[Chart "1 Transfer [0-20] 4 Air Lines": series Master, Normalized Euclidean (global), Normalized Euclidean (local), Fuzzy Distance; vertical axis from 0 to 120; horizontal axis from 1 to 77.]
Fig. 11: Fuzzy, Normalized Euclidean (local and global) distance to classify the behavior of WebAgent:4Airlines.

4.3 Results and Discussion

The results in Figures 3, 4, 5, 6 and 8, 9, 10, 11 show that the fuzzy distance is closer to the Master (expert) opinion and therefore allows the MetaWebAgent to classify the WebAgents' behavior better. These figures show the interpolation capabilities of the fuzzy system, which yields a gradual distance. The gradual distance values allow the WebAgents to be discriminated more accurately.

5 Conclusions

Euclidean distances do not seem very appropriate for our domain, because they give the same importance to all the components, whereas in our domain it is necessary to assign different importance to different components. Fuzzy systems are especially useful because they rely on the different evaluation that can be made of each concept. The experiments show how a MetaWebAgent classifies the WebAgents' behavior using a set of rules that can be tuned by an expert. This characteristic is very useful for gaining efficiency in a multiagent system, because it makes it possible to optimize the number of queries exchanged among the different agents and thus to minimize the number of accesses to the Web, which have a very high computational cost. Finally, this fuzzy distance has been incorporated as a new skill into different software agents that need to classify retrieved information or the behavior of other agents in the system.

Acknowledgments

The research reported here was carried out as part of the research project funded by CICYT TAP-99-0535-C02. (http://decsai.ugr.es/~lcv/SEPIA/tap990535-c02-01.html)

References

1. Bond A.H., Gasser L. Readings in Distributed Artificial Intelligence. San Francisco, California, Morgan Kaufmann, 1988.
2. Brenner W., Zarnekow R., Wittig H. Intelligent Software Agents. Foundations and Applications. Springer-Verlag, 1998. ISBN: 3540-63411-8.
3. Camacho D., Hernández C., Molina J.M. Information Classification in Web Agents using Fuzzy Knowledge for Distance Evaluation. WSES International Conference on Fuzzy Sets & Fuzzy Systems (FSFS-01). Puerto De La Cruz, Tenerife, Canary Islands (Spain). February 2001.
4. Camacho D., Molina J.M., Borrajo D., Aler R. MAPWEB: Cooperation between Planning Agents and Web Agents. Information & Security: An International Journal. Volume 7. 2001.
5. Etzioni O. Moving Up the Information Food Chain. AI Magazine, volume 18, nº 2, pages 11-18, summer 1997.
6. Gasser L. An Overview of DAI. Distributed Artificial Intelligence: Theory and Praxis. Kluwer Academic Publishers. 1992.
7. Jennings N.R., Wooldridge M. Agent Technology: Foundations, Applications, and Markets. ISBN: 3-540-63591-2. Springer-Verlag. 1998, pp. 3-28.
8. Pinkerton B. Finding What People Want: Experiences with the WebCrawler. The Second International WWW Conference, Chicago, USA, October 17-20, 1994.
9. Selberg E., Etzioni O. The MetaCrawler Architecture for Resource Aggregation. IEEE Expert, January/February 1997, Volume 12, No. 1, pp. 8-14.
10. Zimmermann H.J. Fuzzy Sets, Uncertainty and Information. Kluwer, 1985.