World Wide Web (2012) 15:285–323 DOI 10.1007/s11280-011-0134-4
Automating user reviews using ontologies: an agent-based approach Murat Şensoy · Pınar Yolum
Received: 18 June 2009 / Revised: 17 May 2011 / Accepted: 19 May 2011 / Published online: 9 June 2011 © Springer Science+Business Media, LLC 2011
Abstract The Web is becoming a global marketplace, where the same services and products are offered by different providers. When obtaining a service, consumers have to select one provider among many alternatives. In real life, many consumers depend on user reviews when making this choice. User reviews—presumably written by other consumers—provide details on the consumers’ experiences and thus are more informative than ratings. The downside is that such reviews are written in natural language, making them extremely difficult for computers to interpret. Therefore, current technologies do not allow automation of user reviews and require too much human effort for tasks such as writing and reading reviews of the providers, aggregating existing information, and finally choosing among the possible candidates. In this paper, we represent consumers’ reviews as machine-processable structures using ontologies and develop a layered multiagent framework that enables consumers to find satisfactory service providers for their needs automatically. The framework continues to function successfully when consumers evolve their language and when deceptive reviewers enter the system. We show the flexibility of the framework by employing different algorithms for various tasks and evaluate them under different circumstances. Keywords e-commerce · service selection · ontologies · trust
M. Şensoy (B) Department of Computing Science, University of Aberdeen, AB24 3FX, Aberdeen, Scotland, UK e-mail: [email protected]
P. Yolum Department of Computer Engineering, Boğaziçi University, Bebek, 34342, Istanbul, Turkey e-mail: [email protected]
1 Introduction What would you do if you needed to buy a camcorder? You would probably visit one of the electronic marketplaces such as eBay and browse through the available camcorder sellers. Since the selection is large (and your knowledge of camcorder sellers is mediocre), you would start looking through the ratings of the sellers. However, ratings actually do not say much about what kind of service a seller provides. Therefore, you would try to read the reviews that have been written for the sellers. Some reviews are far too general and would not give enough detail for you to judge the seller. Better reviews are those that explain explicitly what the reviewer was expecting to receive and what she actually got. You would try to base your decision on the reviews that are written by people who have service demands similar to yours. What if a review contains some jargon that you are not familiar with? You would search the Web, find out what it means, and reevaluate the review in the light of this information. Overall, you would use the information provided by other consumers in your shoes to find a suitable seller for your own situation. The above scenario describes how user reviews are currently used by consumers to select appropriate service providers. Consumers depend on each other to find useful service providers to handle their service needs. However, current technologies limit the usability of the reviews and require humans to participate actively in the interactions. Alternatively, we envision a multiagent system that consists of consumer agents that represent human users. Agents act as automated assistants for their users and help them find the right service providers to satisfy their service needs. The interesting question is how these agents should be built and what kind of framework they should be situated in to be able to automate the activities of humans.
The main activity of a consumer agent is to request reviews about the sellers from different consumers and to read thousands of reviews for each seller within minutes. The main outcome of this activity is to generate models of the sellers for different service requests. For example, based on others’ reviews, the user agent should be able to draw conclusions such as: service provider Bob delivers a product three days later than he promises, or service provider Rick delivers products on time but the products are sometimes damaged. If the consumer agent knows its user’s expectations, it can successfully estimate which seller is the most satisfactory for the user. However, there are several challenges with this scenario: 1. Reviews are written in natural language and are meant to be understood by humans, not by machines. 2. Some reviewers are inherently more trustworthy than others. 3. Some reviewers may use a jargon that is not familiar to all users. To tackle the first challenge, experiences were previously proposed as a machine-understandable form of reviews [30]. An experience states the story between a consumer and a provider regarding a specific service demand using an ontology. It explicitly defines the service demand of the consumer, the service supplied in response to this demand by the provider, as well as the contracts, commitments, and fulfillments of the parties regarding this specific transaction. Since experiences make the context of service dealings explicit (as opposed to ratings), experience-based service selection approaches make context-aware service selection possible. Previous
studies on experience-based approaches have shown that they outperform rating-based approaches in terms of accuracy in finding suitable service providers [30]. Trust has been widely studied in the multiagent literature [14, 24]. However, most of these approaches operate on ratings rather than experiences. We have previously shown how experience-based approaches can employ a trust-based information filtering approach to differentiate between trustworthy and untrustworthy experiences [34]. Traditionally, approaches to service selection [18, 21, 45] assume that all agents share the same service semantics. That is, agents share a common representation, such as an ontology, through which they can represent their service needs in the best way possible. However, in real life consumers’ needs evolve over time and new concepts may become necessary for them to describe their evolving service needs. The main contribution of this paper is the design and evaluation of a layered agent environment for service selection. The core of the framework is the representation of experiences. Various modules for trust, decision making, and so on exist to improve the success of the environment. For these modules, the environment flexibly supports many existing approaches from the literature. Using this agent environment enables consumers to cooperatively evolve the semantics of their service needs. Further, using this framework, agents can learn to differentiate between real and fraudulent experiences. Careful treatment of these problems enables us to automate user reviews realistically and correctly. We evaluate the framework over simulations. The rest of the paper is organized as follows. In Section 2, we summarize the proposed architecture and the integrated approach for the agent-based automation of user reviews. In Section 3, we describe how user reviews are represented semantically and shared by agents.
In Section 4, we explain how the shared information is used to select a service provider for a specific service demand. In Section 5, we describe mechanisms that let agents evolve their ontologies in order to express their service needs more concisely during their interactions with others. In Section 6, we integrate trust mechanisms into the proposed approach so that deceptive information can be filtered out during service selection. In Section 7, we evaluate our approach in detail under various settings. In Section 8, we discuss the proposed approach and compare it with related work. Finally, we present our conclusions in Section 9.
2 Agent-based automation of user reviews Our proposed framework is based on automated consumer agents that help their users find useful service providers. In doing so, the agents’ main task is to maintain, store, and exchange experiences about providers with other service consumers. Unlike traditional approaches in the literature [32, 34, 38, 40, 41, 46], this framework allows agents to evolve service semantics and trust relations among themselves. Figure 1 outlines the agent architecture that we propose to enable fully automated service selection. This architecture is composed of different modules. Each module is a software component with a well-defined interface, so that modules can interact only through their interfaces. This lets each module in the architecture be implemented independently of the other modules [36]. In a typical e-commerce setting, a human user has some service demand, such as buying a book or finding exchange rates for some currencies. The user interacts with
Figure 1 Architecture of consumer agents.
its agent through a user interface. The User Modeler module of the agent is responsible for grasping the current service demand of the user. Ideally, the user provides as little information to the agent as possible, while the agent still understands the user’s service demand precisely. Using the user modeler, the agent also models its user over repeated interactions and learns the user’s expectations and her satisfaction criteria (taste) for the specific service demand. For this purpose, conjoint analysis techniques can be used [27]. During its interactions with the user, the user modeler tries to represent the demand of the user using the concepts from the local ontology of the agent. If the concepts in the local ontology are not sufficient, a new concept is created by the Semantics module to represent the demand properly. After understanding the user’s demand, the agent uses its past experiences or contacts other consumer agents to gather information about possible service providers. To communicate with the necessary agents, a common communication Protocol is required. The protocol takes care of tasks related to contacting others, collecting experiences, and so forth. The experiences are then stored in an Information Repository. If the collected experiences are defined using a common vocabulary, the agent can understand and use them. However, if the experiences refer to unknown concepts or services, then the agent needs to find out the meanings of these concepts and update its ontology appropriately. These tasks are also handled by the Semantics module, which looks for unknown concepts within these experiences. If an unknown concept is encountered, the concept is learned from the
owner of the experience. The learned concept is then added to the local ontology by the semantics module. Next, the agent uses the Decision Maker and the experiences within the Information Repository to select the service provider that is expected to satisfy the user, and requests the service. Finally, the agent requests feedback about the supplied service from the user and stores this information internally (i.e., as a personal experience). This feedback is used (in the Trust module) to identify trustworthy agents that can be contacted in subsequent interactions and to filter out untrustworthy experiences and agents.
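As a rough illustration of this control flow, the module interactions can be sketched in code. Every class and method name below is a hypothetical stand-in, not part of the framework's specified interface; the sketch only mirrors the sequence of steps described above.

```python
# Minimal sketch of the consumer-agent control flow of Figure 1.
# All names here are hypothetical stand-ins, not the framework's interface.

class ConsumerAgent:
    def __init__(self, user_modeler, semantics, protocol,
                 repository, decision_maker, trust):
        self.user_modeler = user_modeler
        self.semantics = semantics
        self.protocol = protocol
        self.repository = repository
        self.decision_maker = decision_maker
        self.trust = trust

    def handle_demand(self, raw_demand):
        # 1. Grasp the user's demand; the User Modeler may ask the
        #    Semantics module for a new concept if the local ontology
        #    cannot express the demand.
        demand = self.user_modeler.model(raw_demand)
        # 2. Collect experiences, but only from trustworthy acquaintances.
        peers = self.trust.filter(self.protocol.acquaintances())
        for exp in self.protocol.collect_experiences(demand, peers):
            # 3. Resolve unknown concepts before storing each experience.
            self.repository.store(self.semantics.resolve(exp))
        # 4. Select the provider expected to satisfy the user.
        return self.decision_maker.select(demand, self.repository.experiences)

    def record_feedback(self, experience, satisfied):
        # 5. Store the personal experience and update trust relations.
        self.repository.store(experience)
        self.trust.update(experience, satisfied)
```

Because each dependency is injected through a well-defined interface, any module (e.g., the decision maker or the trust model) can be swapped out independently, which is the design property the architecture aims for.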
3 Representing and sharing user experiences The key component of our architecture is the experience metaphor, which captures an instance of a consumer’s dealings with a provider over a single service. We have previously studied how experiences can be captured computationally [30]. In order to express their experiences with the service providers, service consumers use a common OWL ontology for a specified service domain. This ontology covers the fundamental concepts (such as demand, service, commitment and experience), which exist in the base level ontology, and domain-specific concepts and properties, which exist in the domain level ontology. Using these concepts and properties, a service consumer can express its service demands and experiences. The base level ontology (Figure 2) consists of the domain-independent infrastructure of the experience ontology. The main class in the base level ontology is Experience. Instances of this class represent the experiences of service consumers in the system. As in real life, an experience in the ontology contains information about what a service consumer has requested from a service provider and what the service consumer has received at the end. To conceptualize the service demand
Figure 2 Base level ontology [30].
and the received service of the consumer, the Demand and Service classes are included in the base level ontology. Both the demand and the supplied service are descriptions of a service for a specific domain and hence share a number of properties. These shared properties are captured in the Description class in the base level ontology; the domain level ontology contains extensions to this class. The Description class contains an owner and a date field. For a demand, the owner is a service consumer; for a service, the owner is a service provider. The date value records the date of the demanded or provided service. While the base level ontology deals only with domain-independent concepts, the domain level ontology deals with domain-dependent ones. The core class of the domain level ontology is the Description class, which refers to the same Description class in the base level ontology. Domain-specific properties of the Description class are used to describe service demands, supplied services, responsibilities, and fulfillments of the parties during transactions. A domain level ontology for online shopping is shown in Figure 3. This ontology contains domain-specific concepts such as ShoppingItem, Location, DeliveryType and Quality, as well as domain-specific properties such as hasShoppingItem, toLocation, hasDeliveryType, hasDeliveryDuration, hasShipmentCost, and hasPrice. These concepts and properties are used to describe consumers’ experiences in online shopping. Service consumers maintain, exchange, and interpret experiences related to the providers. Since these experiences are expressed in OWL, they can be interpreted easily by the agents using an OWL reasoner such as Pellet [35]. An example experience is shown in Figure 4.
This experience is explained in Example 1.
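Outside an OWL toolchain, the shape of such an experience can be illustrated with a plain data structure. The field names below mirror properties of the experience ontology (owner and date on Description; demand and service on Experience), but the rendering itself, including the sample values, is only an illustrative assumption, not the paper's OWL representation.

```python
# Illustrative, non-normative rendering of an Experience instance as a
# plain data structure; sample values are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Description:
    owner: str                 # a service consumer or a service provider
    date: str                  # the xsd:date value
    properties: dict = field(default_factory=dict)  # domain-level fields

@dataclass
class Experience:
    demand: Description        # what the consumer requested
    service: Description       # what the provider actually supplied
    commitments: list = field(default_factory=list)
    fulfillments: list = field(default_factory=list)

exp = Experience(
    demand=Description("buyer42", "2007-10-15",
                       {"hasShoppingItem": "Xel T60 notebook",
                        "toLocation": "New York",
                        "hasDeliveryDuration": 14}),
    service=Description("TechnoShop", "2007-10-22",
                        {"hasPrice": 700.0,
                         "hasDeliveryDuration": 7,
                         "hasShipmentCost": 0.0,
                         "isRefundable": False,
                         "hasCustomerSupport": False}),
)
```

Note that the structure records objective facts of the transaction only; no rating or opinion field is present, which is what lets each recipient evaluate the experience against its own criteria.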
Figure 3 Domain level ontology for online shopping [30].
Figure 4 An experience that is about buying a notebook from a seller named TechnoShop.
Example 1 In her experience (represented in Figure 4), the buyer states that she ordered a Xel T60 notebook from a seller named TechnoShop on 15 October 2007. She requested the merchandise to be delivered to New York within 14 days. The seller received $700 for the product and delivered the merchandise within 7 days without requesting any extra money for shipping. However, the delivered product was not refundable and TechnoShop did not provide any customer support. When a consumer has a new service demand and only a few or no direct previous interactions with the service providers, it needs to collect information about the service providers from other consumer agents. This information is used to compute the expected behavior of the providers for the current service demand of the consumer. The behavior of the providers may change considerably in different contexts. For example, while a provider delivers bicycles on time, it may deliver cars with a delay. This implies that a consumer should model the behavior of service providers with respect to its specific service demand. Therefore, the consumers that have had similar service demands in the past may provide more useful information about the providers. Those consumers are contacted to provide the information related to the service providers. In rating-based service selection approaches, the collected information is the ratings of the providers. Ratings reflect the subjective opinions of the raters. Therefore,
ratings may mislead consumers in cases where the satisfaction criteria of the consumers using these ratings are different from the satisfaction criteria of those that provided the ratings (as shown in Example 2). Unlike ratings, experiences do not reflect the subjective opinion of their creators. Therefore, any consumer receiving an experience can evaluate the service provider according to its own criteria, using the objective data in the experience. Example 2 Consider the experience in Figure 4 and assume that there are two different consumers (Bob and Lucy) who receive this experience. For Bob, delivery duration and price are crucial, whereas customer support or being refundable are not important. On the other hand, for Lucy, being refundable and having customer support are indispensable. Therefore, for Bob, TechnoShop is a very good provider and deserves a good rating, because it delivers products within one week without requesting any extra money for shipping. However, for Lucy, TechnoShop is not preferable. Thus, with plain ratings, Bob’s positive ratings of TechnoShop would have misled Lucy. When an agent formulates an experience, it can share it with other agents. This is done in a “pull” manner, such that an agent requests the experiences of other agents based on some criteria. The communication for this is handled by the Protocol module. This module maintains a model of other agents in an acquaintance list. An agent’s model contains the identifier of the related agent and its service interests (known service demands). This model is an estimate and may (and probably will) change over time. Initially, each agent is aware of some other agents through directory services but does not know the interests of these agents. However, as a result of the interactions between agents, the service interests of these agents are learned over time.
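Example 2 can be made concrete with two hypothetical taste functions evaluating the same objective experience data. The field names and thresholds below are illustrative assumptions, not values from the paper.

```python
# Hedged illustration of Example 2: one objective experience,
# two different (hypothetical) taste functions.

experience = {            # objective fields, as in Figure 4
    "price": 700.0,
    "delivery_days": 7,
    "shipment_cost": 0.0,
    "is_refundable": False,
    "has_customer_support": False,
}

def bob_taste(exp):
    # Bob cares mainly about fast delivery and total cost
    # (thresholds are made up for illustration).
    return (exp["delivery_days"] <= 10
            and exp["price"] + exp["shipment_cost"] <= 750.0)

def lucy_taste(exp):
    # For Lucy, refundability and customer support are indispensable.
    return exp["is_refundable"] and exp["has_customer_support"]
```

Here bob_taste accepts the experience while lucy_taste rejects it: a single positive rating from Bob would have hidden exactly the information Lucy needs.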
When a service consumer has a service demand, it uses experiences related to similar demands to estimate which of the providers are more likely to produce a satisfactory service for the demand. If the consumer does not have enough experiences related to similar service demands, it sends an Experience Request message to other agents through its protocol module. This message contains a description of similar demands. To allow a consumer to express its description of similar demands, the SimilarDemand concept is included in the experience ontology. This concept is a subclass of the Demand concept. A service consumer can express what a similar demand is with respect to its similarity criteria using the Semantic Web Rule Language1 (SWRL). A simple rule for similarity is shown in Figure 5. In this rule, the consumer states that a demand is a similar demand only if it concerns a notebook and requires delivery within two weeks. SWRL was introduced as a way to integrate rules with OWL-DL ontologies. Unlike other rule languages such as RuleML,2 SWRL is purposely constrained to make automated reasoning more tractable. Hence, using SWRL rules, consumers can represent logical axioms, and reasoning on those axioms can be done in a tractable manner. That is, if a consumer has a particular service demand and a list of others’
1 http://www.w3.org/Submission/SWRL
2 http://www.ruleml.org/0.91/
Figure 5 Example SWRL rule for similar demands.
service demands, then it can apply the SWRL rule representing its similar-demand definition to select those demands which are similar to its own. If the consumer makes its SWRL rule for similar demands public, other consumers can also use this expression of similarity to reason about whether their past service demands are similar to the demand of the consumer. When the protocol module gets an experience request message, it reasons about the known demands of its acquaintances to figure out which of these agents have similar service demands. This is achieved using the SWRL rule in the message and a reasoner. Then, the protocol module forwards the message to the agents with similar service demands. Messages are received by an agent through its protocol module. When the protocol module of another agent receives an experience request message, it passes the message to the information repository through the semantics module. The information repository examines the personal experiences using the SWRL rule within the message and returns the related personal experiences to the message originator through the protocol module. In our previous example, the information repository responds with the personal experiences that are related to buying a notebook and having it delivered within two weeks. The protocol module adds new agents to its acquaintance list or updates the service interests of the agents within the list based on two message types: Peer Discovery Message (PDM) and Request for Acquaintances Message (RAM). Both PDM and RAM messages contain a SWRL rule that describes the similar-demand criteria of the originator of the message. When a consumer Y receives a PDM message, it checks whether its service demands are similar to those of the originator X. If so, it notifies X, and X adds Y as a new acquaintance entry in its acquaintance list. This entry contains the identity of Y and its demands classified as similar by the similarity criteria of X.
The consumer Y also forwards the request to a set of service consumers in its acquaintance list. Y selects consumers having demands similar to X’s demand to forward the request to. If there is no such consumer, Y randomly selects consumers from its acquaintance list. How long the request is forwarded is controlled using a time-to-live field. All other agents that receive the request act the same way Y does. When Y receives a RAM message from the originator X, it checks its acquaintance list for entries of consumers having demands similar to the demand of X. Then Y sends these entries to X, so that X can add them to its acquaintance list.
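The request-handling step can be sketched as follows. For illustration, the SWRL rule of Figure 5 is approximated by a plain predicate; a real agent would instead run the rule through an OWL/SWRL reasoner, and the field names below are assumptions.

```python
# Sketch of answering an Experience Request message. The SWRL rule is
# approximated by a Python predicate; field names are illustrative.

def is_similar(demand):
    # Mirrors the rule of Figure 5: a notebook, delivered within two weeks.
    return (demand.get("item") == "notebook"
            and demand.get("delivery_days", 99) <= 14)

def answer_experience_request(personal_experiences, similar):
    # Return the personal experiences whose demand matches the
    # requester's similarity criteria.
    return [e for e in personal_experiences if similar(e["demand"])]

past = [
    {"demand": {"item": "notebook", "delivery_days": 10},
     "provider": "TechnoShop"},
    {"demand": {"item": "camcorder", "delivery_days": 5},
     "provider": "Rick"},
]
matches = answer_experience_request(past, is_similar)
```

Only the notebook experience is returned; the camcorder experience falls outside the requester's similarity criteria and is withheld, exactly as the information repository does with the SWRL rule.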
4 Decision maker The decision maker selects service providers that are expected to satisfy the user. For this purpose, it learns the service characteristics of each provider using the consumers’ past experiences with the providers. Deciding whether a provider will produce a satisfactory service for a specific service demand can be considered a two-class classification problem. Once a training set is composed, classifiers such as decision trees or Naive Bayes can be trained for this purpose. 4.1 Decision making based on reviews As stated above, the decision maker module depends on the consumers’ past experiences to decide on a provider among many others. Therefore, it requests experiences related to the current service demand from the information repository. For example, if the consumer needs to buy a notebook, the decision maker asks for the experiences that are related to “buying notebooks”. After getting the related experiences, the decision maker module evaluates each experience using the satisfaction criteria of the consumer. Each consumer has an internal taste function Ftaste (namely its satisfaction criteria) to evaluate its transactions with the service providers in the context of its service demands. In real life, the taste of a customer may change over time. Hence, this function should be time dependent and updated regularly by the user modeler. Once the user modeler elicits the taste function from the customer through an interface, the decision maker can easily compute the expected level of satisfaction for a specific experience. In other words, the consumer can produce its expected level of satisfaction for the experience by asking itself how satisfied it would be, had it lived the experience under consideration. Using the experiences about service providers, the decision maker estimates which of the providers would produce a satisfactory service for a specific service demand. For this purpose, it can use classifiers as follows.
Demand and service specifications within experiences are received in the form of ontologies, but they are then converted into the internal representation of the service consumer. The demand information in each experience is represented as a vector. Each field in this vector is extracted from the experience ontology. These fields correspond to property values in the experience ontology, such as service price. Then, the service supplied for this demand is classified as satisfied or dissatisfied with respect to the satisfaction criteria of the consumer, using the taste function and ontological reasoning [22]. Lastly, the (vector, class) pairs are used as a training set to train a classifier, where the possible classes are satisfied and dissatisfied. The trained classifier is used to predict whether the current demand will be satisfied or not. Various classifiers can be used by the decision-making module. In this work, a parametric classifier based on a Gaussian model [2] is used for decision making as follows. First, for each class, the covariance and mean are extracted from the training set. Then, a discriminant function is defined to compute the probability of satisfaction [8]. The service consumer performs this computation for every service provider and chooses the provider with the highest satisfaction probability. Equations (1), (2) and (3) formulate this computation. In these equations, Ci refers to the ith class and d represents the number of dimensions of the demand vector X. Note that there are
two classes: the first class is satisfied and the second class is dissatisfied. For the ith class, the mean and covariance are represented by µi and Σi, respectively. The mean µi is a vector with d dimensions and refers to the mean of the demand vectors in Ci. That is, each element of µi refers to the mean of the corresponding dimension of the demand vectors in Ci. The covariance Σi is a d × d matrix and each of its elements refers to the correlation between the corresponding dimensions of the demand vectors in Ci. Equation (1) formulates the class likelihood p(X|Ci): the probability that the demand X is observed in class Ci. In the equation, T is the transpose operator. Using Bayes’ rule, (2) formulates the posterior probability p(Ci|X): the probability that the demand X is in class Ci. In (2), p(X) refers to the probability that demand X is observed, and it is computed as p(X) = p(X|C1)p(C1) + p(X|C2)p(C2) in this case. Similarly, p(Ci) refers to the prior probability of the class Ci. Lastly, the discriminant function for the ith class, gi(X), is formulated as in (3) [8]. In this way, we perform maximum a posteriori estimation. The higher the computed g1(X) value, the more likely the provider under consideration satisfies demand X.
p(X|C_i) = \frac{\exp\left(-\frac{1}{2}(X-\mu_i)^T \Sigma_i^{-1} (X-\mu_i)\right)}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}}   (1)

p(C_i|X) = \frac{p(X|C_i)\,p(C_i)}{p(X)}   (2)

g_i(X) = \log p(C_i|X) + \log p(X) = \log p(X|C_i) + \log p(C_i)   (3)
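A minimal two-dimensional rendering of (1)–(3) can be sketched with the standard library only. The demand attributes (price, delivery duration) and all numeric values are illustrative assumptions; a real implementation would handle arbitrary d with a linear algebra library.

```python
# Two-dimensional sketch of the parametric classifier of (1)-(3).
# Demand vectors here are hypothetical [price, delivery_days] pairs.
import math

def mean_and_cov(vectors):
    """Sample mean and (biased) 2x2 covariance of 2-d demand vectors."""
    n = len(vectors)
    mu = [sum(v[i] for v in vectors) / n for i in range(2)]
    cov = [[sum((v[i] - mu[i]) * (v[j] - mu[j]) for v in vectors) / n
            for j in range(2)] for i in range(2)]
    return mu, cov

def likelihood(x, mu, cov):
    """Equation (1): Gaussian class likelihood p(X|Ci), d = 2."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Mahalanobis term (X - mu)^T Sigma^{-1} (X - mu)
    maha = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    # Denominator (2*pi)^(d/2) |Sigma|^(1/2) with d = 2.
    return math.exp(-0.5 * maha) / (2 * math.pi * math.sqrt(det))

def discriminant(x, mu, cov, prior):
    """Equation (3): g_i(X) = log p(X|Ci) + log p(Ci)."""
    return math.log(likelihood(x, mu, cov)) + math.log(prior)
```

Given training vectors for the satisfied and dissatisfied classes, a consumer computes both discriminants for a new demand and favors the provider whenever the satisfied-class value is higher.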
Parametric classifiers, like the one described above, assume a probability density function for each class and try to estimate the parameters of this function. However, in some settings, it may be neither realistic to assume a probability density function for each class nor possible to determine its parameters correctly. On the other hand, non-parametric classifiers (e.g., k-Nearest Neighbor, decision trees, and so on) do not make any assumption about the probability density function of each class. In this work, we mostly use parametric classifiers for decision making. However, in Section 7.3.5, C4.5 decision tree classifiers are used for analysis. Hence, below we briefly describe how a C4.5 decision tree classifier [8] can be used for decision making. Using the C4.5 algorithm [23], a decision tree classifier is constructed as follows. As described above, each demand has a set of attributes (e.g., price, delivery duration, shopping item, and so on). While building a decision tree, we first determine the best attribute to make the root node of the decision tree. This attribute is the one that best classifies the training examples and is determined by the C4.5 algorithm using information theory. That is, the attribute having the highest information gain is selected. The information gain of an attribute is computed based on information content calculations [8], as described briefly below. Assume that using an attribute A as the root of the tree partitions the set of training examples S into disjoint subsets {S1, S2, . . . , St}. Let RF(Ci, S) denote the relative frequency of cases in S that belong to class Ci. The information content of
S is then computed using (4). Based on (4), the information gain for A is computed using (5).

I(S) = -\sum_{i=1}^{2} RF(C_i, S) \times \log(RF(C_i, S))   (4)

G(S, A) = I(S) - \sum_{i=1}^{t} \frac{|S_i|}{|S|} \times I(S_i)   (5)
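Equations (4) and (5) can be sketched directly. The example below uses base-2 logarithms, as is standard in C4.5 descriptions, although the equations above leave the base unspecified; the attribute names are illustrative.

```python
# Stdlib sketch of equations (4) and (5): information content of a
# labeled training set and the information gain of a candidate attribute.
import math
from collections import Counter

def info(labels):
    """Equation (4): I(S) = -sum_i RF(Ci, S) * log2 RF(Ci, S)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def gain(examples, attribute):
    """Equation (5): G(S, A) = I(S) - sum_i (|Si| / |S|) * I(Si).

    Each example is a (attribute_values_dict, class_label) pair; the
    subsets Si are the partitions induced by the attribute's values.
    """
    labels = [label for _, label in examples]
    partitions = {}
    for values, label in examples:
        partitions.setdefault(values[attribute], []).append(label)
    remainder = sum(len(part) / len(examples) * info(part)
                    for part in partitions.values())
    return info(labels) - remainder
```

An attribute that splits the examples into pure satisfied/dissatisfied subsets attains the maximal gain and would be chosen as the root node.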
Once the attribute representing the root node is selected based on its information gain, each value of the attribute leads to a branch of the node. These branches divide the training set used to create the node into disjoint subsets {S1, S2, . . . , St}. Then, we recursively create new nodes of the tree using these subsets. If Si contains training examples from a single class Ci, we create a leaf node labeled with the class Ci; otherwise, we recursively build a child node by selecting another attribute based on Si. This recursive process stops either when the tree perfectly classifies all training examples or when no unused attribute remains. The C4.5 algorithm [23] operates on both continuous and discrete attributes. It can build a decision tree when the training data contains missing attribute values and provides pruning techniques to reduce the size of the resulting tree. The C4.5 algorithm is further extended in [44, 48] and [15] by exploiting domain knowledge during tree induction. That is, nominal attribute values are generalized using a taxonomy of attribute values, which can easily be derived from domain ontologies [48]. Figure 6 shows an example decision tree built for a service provider based on users’ past experiences with this provider, using the C4.5 algorithm with domain knowledge. Given a service demand, this decision tree can be used to determine whether the service provider is expected to supply a satisfactory service. 4.2 Missing information in reviews While sharing his experiences with others, a service consumer may not always expose all details of these experiences. That is, he may omit or hide some information before sharing them with others. Missing information in the shared experiences may have various causes. First, the consumer may believe that certain information is insignificant and simply omit it while sharing his experiences.
Figure 6 A decision tree built for a service provider based on users' past experiences with the provider.

Second, certain information in his experiences may be considered confidential by the consumer and is hidden while sharing these experiences (e.g., the price he is
willing to pay for a specific service). Lastly, some information about an experience may be unavailable to or forgotten by the consumer, so the experience is shared with some information missing. Missing information in shared experiences may prevent consumers from using these experiences during their decision making. In particular, an experience cannot be used by a consumer if the consumer cannot evaluate it with his taste function. For example, assume that the satisfaction of a consumer depends mainly on delivery duration. In this case, the consumer cannot compute how satisfactory an experience is if the experience is missing delivery duration information. Such experiences cannot be evaluated, so they are disregarded during decision making. On the other hand, for another consumer, these experiences may still be of high value if this consumer's taste function does not depend on the missing information. If the missing information in an experience does not prevent its evaluation by the consumer's taste function, the consumer can classify the experience and use it while creating a training set as described in Section 4.1. This training set is then used to train a classifier, as explained before, to make decisions about the service providers. Note that the resulting training set may contain some missing attribute values. Classifiers such as the C4.5 decision tree classifier have integrated mechanisms to handle missing values, while others (e.g., the parametric classifier described above) simply eliminate the training examples with missing values [8]. Training classifiers with incomplete data is an active research area and detailed information can be found elsewhere [9, 43].
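The keep-or-discard rule described above can be sketched in a few lines. This is an illustrative sketch, not the paper's code; the attribute names and the convention of marking hidden values as None are our assumptions.

```python
# Hypothetical sketch: an experience is usable only if none of the attributes
# the consumer's taste function depends on are missing (represented as None).
def usable(experience, taste_attributes):
    return all(experience.get(a) is not None for a in taste_attributes)

# A shared experience whose owner hid the delivery duration before sharing.
shared = {"price": 12.5, "deliveryDuration": None, "quality": 8}

# Consumer A's satisfaction depends mainly on delivery duration: must discard.
print(usable(shared, ["deliveryDuration"]))   # -> False
# Consumer B only cares about price and quality: the experience is still usable.
print(usable(shared, ["price", "quality"]))   # -> True
```

The same shared experience is thus worthless to one consumer and valuable to another, exactly as the text describes.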
5 Semantics

Agents use ontologies in their communications to understand each other. Although agents share a common ontology to represent their experiences, this common ontology will become limited over time for several reasons. In many realistic settings, new concepts emerge over time. Either the common ontology needs to be updated centrally, or the agents should update their ontologies locally when they see fit. The first option is costly, since it requires a central body to monitor the world in order to update the common ontology. Therefore, an agent should be able to enrich its ontology by creating or learning new concepts to represent its service needs better. However, this may prevent others from understanding the agent if the new concepts are not known to them. The semantics module is responsible for enriching an agent's local ontology as well as coordinating with others about the semantics of new concepts that are added to the ontology. If the agent (through its user modeler) realizes that the known concepts are not enough to represent the user's current service need, it generates a description of the required new service concept and sends it to the semantics module. There are two approaches for describing concepts. First, an agent can use a shared meta-ontology that is composed of primitive concepts and properties. In this case, each concept is described in detail using the meta-ontology [33] and Description Logics (DL) [3]. Using DL to describe new concepts enables us to compare concepts using off-the-shelf DL reasoners such as KAON2 [16]. That is, using DL reasoning, we can test whether one concept subsumes another or whether two concepts are semantically equivalent, based on their descriptions [3]. Example 3 demonstrates a simple case. Second, an agent
can use instances (positive examples) and non-instances (negative examples) of a concept to describe the concept [7, 28, 31].

Example 3 A user wants to buy a small car with five doors. However, his agent does not have a concept corresponding to this type of car in its ontology. Hence, it cannot gather and interpret experiences highly specialized for the user's demand. The agent creates a semantic description of the desired concept by defining its necessary and sufficient properties: a car with cardinality restrictions on its size and number of doors. This description is made using only the primitive concepts and properties that exist in the common meta-ontology. Therefore, the agents sharing this ontology can understand the description clearly.

Upon receiving the description of a new concept, the semantics module composes a message containing the description. Then, using the protocol module, it sends this message to other agents having service interests related to a similar concept (called neighbors hereafter). The agents receiving this message examine their ontologies for a semantically equivalent concept [7, 33] using their own semantics modules, and send the names of any matching concepts back to the sender. A returned name means that the requested concept is already known in the society. If the responding agents return one common name for the concept (e.g., Hatchback), the agent adds the new concept to its ontology using the received name. If more than one name is received, it notes that all of these names are synonyms (they refer to the same concept) and informs the neighbors about these synonyms. If the sender does not receive any name for its desired new concept, this means that the new concept is not yet known by its neighbors.
In this case, the sender adds the new concept into its ontology using a name from its unique name-space [33] (e.g., a URI) and teaches this new concept to its neighbors by providing the description and the name of the new concept. This approach has two important advantages. First, by informing neighbors about the learned synonyms, the overhead of mapping concepts from different ontologies is significantly reduced. Second, by teaching newly created concepts to its neighbors, the agent ensures common semantics for these new concepts and avoids possible future communication problems related to them. Exchanged concept descriptions are used to teach concepts to other agents. If a shared meta-ontology is used to describe a concept, learning the concept reduces to placing it in the right place in the local ontology. If two concepts have the same descriptions, then they are semantically equivalent. The underlying intuition of this approach is that a concept in an ontology can be described using its parent concept and some extra features that its parent does not have [3]. If instances are used to describe a concept, then learning the concept reduces to a classification problem [7, 28]. The learner agent uses the negative and positive examples within the description and trains a classifier for the concept. This classifier can decide whether a given object is an instance of the concept or not. In this approach, two concepts are similar to the extent that their classifiers classify the same instances similarly. Therefore, if two classifiers classify each other's positive examples as positive and negative examples as negative, then their corresponding concepts are semantically equivalent. In the proposed architecture, the semantics module examines all incoming messages. If an unknown concept is encountered within a message, the semantics module
requests the description of this concept from the sender of the message. Then, it learns the concept using this description. Although we introduce two methods in this section, the semantics module in our framework is flexible enough to transparently use other concept learning, ontology mapping, and alignment methods from the literature if desired.

6 Trust

The decision maker module prioritizes the agent's personal experiences when deciding on a service provider. However, in many practical settings, transactions with providers will be expensive, prohibiting the agent from trying various providers liberally. When this is the case, the agent should be able to use the experiences of others that it trusts. Using the experiences of only the trustworthy agents is crucial, because open multiagent systems cannot impose restrictions on the correctness of the experiences provided by the agents. That is, agents may lie about the details of their experiences or fabricate fake experiences if they wish. The use of such incorrect information may result in wrong service decisions. Hence, it is mandatory to filter out untrusted experiences during decision making. For a consumer, an experience is considered trustworthy if it leads the consumer to satisfactory service decisions. As usually assumed in the literature [14, 24], untrustworthy users are responsible for the dissemination of untrustworthy experiences, while trustworthy experiences are shared by trustworthy users. Therefore, there is a one-to-one relationship between the trustworthiness of experiences and that of their owners. For the sake of clarity, hereafter we refer to the owner of a shared experience as a reviewer. In this section, we propose a way of computing the trustworthiness of reviewers based on the experiences they share. An experience is then considered trustworthy only if it belongs to a trustworthy reviewer. Trust is a personal and context-dependent concept [24].
A reviewer can be considered untrustworthy by one consumer, while another consumer regards the same reviewer as trustworthy. More surprisingly, a consumer can consider a reviewer trustworthy in one context, while it regards the same reviewer as untrustworthy in another context [45]. Examples 4 and 5 briefly describe these two cases, respectively. These examples imply that a consumer's trust in a reviewer depends on the context-dependent personal satisfaction criteria of the consumer.

Example 4 Consider two consumers. For the first consumer, price is the most important factor while shopping online, but delivery duration is not significant. However, for the second consumer, delivery duration is far more important than price. Assume that a reviewer shares his experiences but lies about delivery duration, while giving correct information about the price. The information provided by this reviewer is useful for the first consumer, because delivery duration has no effect on this consumer's satisfaction. However, this reviewer is highly misleading for the second consumer, whose decisions highly depend on delivery duration. As a result, unlike the second consumer, the first consumer considers the reviewer trustworthy.

Example 5 For a consumer, price is far more important than delivery duration while shopping for himself. However, while buying a birthday present for someone else,
delivery duration may be crucial. The present should not be delivered after the birthday even if its price is very low. Hence, the experiences of the reviewer in Example 4 are useful for the consumer only if he is not buying a birthday present; otherwise, the consumer should consider these experiences deceptive.

The majority of service selection approaches in the literature depend on plain ratings instead of semantically described experiences. It has been shown that unfair ratings may negatively affect the service decisions of consumers [14]. In order to deal with unfair and deceptive ratings, different trust approaches have been proposed [40, 41, 46, 47]. These approaches have strong theoretical backgrounds and have been analyzed well by researchers. However, they are limited by the deficiencies of ratings. First, ratings do not carry any contextual information; hence, rating-based approaches cannot compute trust in a context-aware way. Second, ratings reflect the subjective opinions of the raters. Two honest consumers may give conflicting ratings to the same service, just because they have different satisfaction criteria (as shown in Example 2). Therefore, rating-based trust approaches consider only one of these two consumers trustworthy while regarding the other one untrustworthy. Unlike ratings, experiences are context-aware and can be interpreted personally by consumers depending on their own satisfaction criteria. In this section, we propose a method to adapt existing rating-based trust approaches so that a consumer can employ them to determine the trustworthiness of reviewers in a context-aware way, using its own satisfaction criteria. In rating-based approaches, it is usually assumed that for each transaction with a provider P, a consumer K produces a rating. This rating is a simple numerical value that reflects the satisfaction criteria (taste) of K for the produced service S in response to its service demand D.
Assume that each consumer has an internal taste function (introduced as F_taste in Section 4) to evaluate its transactions with the service providers in the context of its service demands. Using this function, consumers produce their ratings of service providers for a specific transaction. Equation (6) summarizes how K produces a rating for its transaction with P at time t, where D_K and S_P are the service demand of K and the service produced by P for this demand, respectively. When another consumer X receives this rating, it cannot decode the rationale behind the rating, because it does not know D_K, S_P, or the satisfaction criteria of K.

r_t^{K,P} = F_taste^K(D_K, S_P, t)        (6)
In the case of experiences, consumers record the details of their transactions with the providers using an ontology, instead of recording numerical ratings for those transactions. An experience of K about P after a transaction at time t can be described as E_K^{P,t} = (D_K, S_P, t), where D_K is the service demand of K related to this transaction, and S_P is the service produced for D_K. When another consumer X receives this experience, it can interpret the experience and produce its own rating using its own satisfaction criteria, as in (7). In other words, X can produce a rating for P by asking itself how satisfied it would have been, had it lived the experience under consideration (as demonstrated shortly in Example 6).

r_t^{X,P} ≅ F_taste^X(D_K, S_P, t)        (7)
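This interpretation step can be sketched concretely. The sketch below is hypothetical: the two taste functions, attribute names, and thresholds are invented to mirror the price-versus-delivery contrast used in the examples, and a boolean stands in for the rating value.

```python
# Hypothetical sketch of Equation (7): consumer X re-rates a shared
# experience (D_K, S_P, t) with its OWN taste function, ignoring K's opinion.
def taste_bob(demand, service):
    # Bob is satisfied as long as the price stays within his budget.
    return service["price"] <= demand["maxPrice"]

def taste_lucy(demand, service):
    # Lucy is satisfied only by fast delivery.
    return service["deliveryDuration"] <= demand["maxDuration"]

# Experience shared by consumer K: its demand and the service P produced.
demand_k = {"maxPrice": 100, "maxDuration": 3}
service_p = {"price": 80, "deliveryDuration": 10}

print(taste_bob(demand_k, service_p))   # -> True  (positive rating)
print(taste_lucy(demand_k, service_p))  # -> False (negative rating)
```

The same experience thus yields opposite ratings for the two consumers, with no subjectivity imported from K.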
Example 6 Consider the consumers Bob and Lucy in Example 2. After receiving and interpreting the experience in Example 1, Bob gives a positive rating to TechnoShop, whereas Lucy gives a negative rating.

In this way, X produces a rating for each experience received from K and other reviewers. In other words, X does not receive any ratings; it receives experiences, evaluates them on its own terms, and produces ratings for itself considering its own context. Hence, there is no subjectivity involved in these self-produced ratings, because all of the ratings are produced by X using its own satisfaction criteria. In the literature, many methods have been proposed to eliminate untrustworthy or unfair ratings [40, 41, 46, 47]. Once ontology-based experiences are converted into ratings as explained above, the existing methods for filtering unfair ratings can be used to filter untrustworthy experiences. That is, after an experience is converted into a rating, the reliability of this rating is evaluated by a chosen trust method. If the rating is classified as unfair, the corresponding experience is also regarded as untrustworthy. Therefore, using the experience-based ratings and existing rating-based trust methods, the consumer X determines and filters out untrustworthy experiences during service selection.
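The overall pipeline of this section can be sketched as follows. The code is an illustrative sketch, not the paper's implementation: the reviewer names and taste function are invented, and the simple majority filter at the end is only a stand-in for the published BRS/TRAVOS filtering mechanisms.

```python
# Hypothetical sketch: convert each shared experience into a self-produced
# rating, then hand the ratings to a rating-based filter. The majority
# filter below is a toy stand-in for BRS/TRAVOS-style filtering.
def experiences_to_ratings(experiences, my_taste):
    return {reviewer: my_taste(d, s)
            for reviewer, (d, s, t) in experiences.items()}

def majority_filter(ratings):
    """Keep reviewers whose rating agrees with the majority opinion."""
    positives = sum(1 for v in ratings.values() if v)
    majority = positives >= len(ratings) / 2
    return {r for r, v in ratings.items() if v == majority}

my_taste = lambda demand, service: service["price"] <= demand["maxPrice"]
shared = {
    "alice":   ({"maxPrice": 50}, {"price": 40}, 1),
    "bob":     ({"maxPrice": 50}, {"price": 45}, 2),
    "mallory": ({"maxPrice": 50}, {"price": 90}, 3),  # outlier experience
}
trusted = majority_filter(experiences_to_ratings(shared, my_taste))
print(sorted(trusted))  # -> ['alice', 'bob']
```

Only the experiences of reviewers surviving the filter would then feed the decision maker module.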
7 Evaluation

In order to demonstrate the performance of the proposed methods, we implement a simulator and conduct simulations on it. The simulator is implemented in Java; KAON2³ is used as the OWL-DL reasoner. In the following subsections, we first describe the other approaches used as benchmarks and the simulation environment, and then we present our results.

7.1 Service selection approaches as benchmarks

There are many rating-based service selection approaches in the literature. We use two of them for benchmark comparisons with our approach; they are explained briefly below. In order to make the comparisons more reliable, the rating-based approaches and the proposed approach use the same information sources in our experiments: while the proposed approach uses experiences, the rating-based approaches use ratings from the same sources (reviewers).

Beta reputation system The beta reputation system (BRS) is proposed by Jøsang and Ismail [13]. It estimates the reputations of service providers using a probabilistic model. This model is based on the beta probability density function, which can be used to represent probability distributions of binary events. In this approach, consumers propagate their ratings about providers. A rating from consumer c about provider p has the form r = [g, b], where g is the number of c's good interactions with p and b is the number of c's bad interactions with p. Ratings from
3 http://kaon2.semanticweb.org
different consumers about the same provider are combined by simply computing the total number of good interactions and the total number of bad interactions with the provider. These two numbers are used to compute the parameters of a beta distribution function that represents the reputation of the provider. To handle unfair ratings provided by other consumers (advisors), Whitby et al. extend BRS to filter out ratings that do not comply with the significant majority of the ratings, using an iterated filtering approach [41]. Hence, this approach assumes that a significant majority of the advisors honestly share their ratings.

TRAVOS This approach is proposed by Teacy et al. [40]. Similar to BRS, it uses beta probability density functions to compute consumers' trust in service providers. The main difference between BRS and TRAVOS is the way they filter out unfair ratings. While BRS uses the majority of ratings to filter out unfair ratings about the providers, TRAVOS uses personal observations about those providers to detect and filter out unfair ratings. Hence, unlike BRS, TRAVOS does not assume that the majority of ratings are fair.

7.2 Simulation environment

In real life, different restaurants have different service offerings, usually called food menus. Each menu concept in this domain is described by a list of foods and beverages that are served or delivered to a consumer together for a meal. For example, KFC Hot Wings Menu is an instance of the Chicken Wing Menu concept that contains a number of fried chicken wings, a bunch of fried potatoes, and a cup of drink. In our simulations, we use the online shopping domain, where the shopping items are food menus. Therefore, we have extended the experience ontology in figure with another ontology that contains food menus. This ontology is called the "Food-Menu" ontology and is composed by extending the W3C food ontology [19].
It has more than 200 primitive concepts, 1,020 individuals, and 60 properties, but contains only 10 menu concepts. The food-menu ontology is shared among the consumers and used as a meta-ontology to create and describe new menu concepts as in Section 5.

7.2.1 Service needs

To facilitate meaningful generation of service needs, we have designed roles for agents. These roles are similar to real-life roles such as student, parent, and so on. They define the behaviors of consumers by specifying their service needs and characteristics. For example, for dinner, the consumers playing the vegetarian role demand a cup of vegetable soup, some pasta or rice, and a salad. In our simulations, there are ten distinct roles and each agent is assigned exactly one role. Each agent has a personality ontology that contains information about the role that the agent plays. The consumer agent can reason about its service needs using this ontology and shape its behaviors and preferences appropriately. In other words, the roles enable the generation of service needs during the simulation. In our experiments, the roles of the consumers are set at the beginning and the properties of the roles do not change during a simulation. This means that consumers playing the same role continuously have the same service needs. At the beginning of the simulations, each consumer agent has the same food-menu ontology. This ontology contains only ten menu concepts such as pizza menu, salad
menu, and so on. Each of these service concepts contains only one type of food. For example, an instance of pizza menu contains an instance of pizza and nothing else. However, each service need imposed by the roles may be composed of several food types. If this is the case, none of these shared menu concepts is enough to represent a service need on its own. As a result, a consumer having such a service need must request (demand) several food menus together to satisfy it. Let the complexity of a service need be defined as the number of different food types required to describe the service need. For example, vegetarians' service need contains soup, pasta or rice, and salad, so its complexity is three. If they only use the menu concepts from the shared ontology, consumers playing the vegetarian role must order several food menus such as a soup menu, a pasta or rice menu, and a salad menu, since the shared food-menu ontology does not contain a food-menu concept with a soup, rice or pasta, and a salad together. This may result in problems in real life. For example, a consumer may want to have some soup and pasta. However, there does not exist a food-menu concept that contains soup and pasta together. Instead, there is a soup menu offered by one restaurant and a pasta menu offered by another restaurant. Hence, the consumer has to make two different orders from two different restaurants. If the pasta arrives much earlier than the soup, the consumer must either eat the pasta before the soup or wait for the soup and let the pasta get cold.

7.2.2 Demands, services and satisfaction criteria

In our simulations, the service characteristics of a service provider are generated as follows. First, a service space is defined so that all possible services are represented within this space. The dimensions of the service space and their ranges are tabulated in Table 1.
Each service provider has a multidimensional region, called its service region, in this service space. This region is randomly generated. The service space and the service regions have 15 dimensions. A service region covers all of the services produced by the service provider. If a consumer located in Union Street/Aberdeen/UK orders two specific pizza menus from the service provider, the
Table 1 Dimensions of service space and their ranges.

Dimension name          Type      Range
hasShoppingItem         Integer   1–1,000
toLocation              Integer   1–100
hasDeliveryType         Integer   1–6
hasDeliveryDuration     Integer   1–60
hasShipmentCost         Double    0–250
hasPrice                Double    10–11,000
hasUnitPrice            Double    1–100
hasQuantity             Integer   1–100
hasQuality              Integer   1–10
isRefundable            Boolean   0–1
hasConsumerSupport      Boolean   0–1
didRecieveMerchandise   Boolean   0–1
hasStockInconsistency   Boolean   0–1
isAsDescribed           Boolean   0–1
isDamaged               Boolean   0–1
service that the provider delivers will be constructed as follows. The properties that are specified (shopping item id, quantity, and location) are fixed. For the remaining attributes, the service provider chooses random values, making sure that the values stay within the range of its service region. So, for this example, the number of degrees of freedom for generating services is reduced to 12. Given the service constraints, the simulation environment generates the demand of a service consumer as follows. A demand space is constructed for the consumer by removing the dimensions of the service space that do not belong to the Demand class. Then a random region in this demand space is chosen. The center of this region represents the demanded service. In response to a service demand, the chosen provider supplies a service. In real-life settings, as demonstrated in Example 2, only some attributes of a supplied service may play a significant role in the satisfaction of a consumer, while other attributes have no effect at all on the level of the consumer's satisfaction. To imitate this in our experiments, we associate each demanded service with a satisfaction region, which is computed by removing some dimensions from the related demand region. We configured the system so that, on average, 2/3 of the dimensions are removed randomly. If a service supplied for a demand stays within the margins of the related satisfaction region, the service consumer having this demand is satisfied; otherwise, she is dissatisfied. This is the implementation of the F_taste function in our evaluations. The simulation environment guarantees that each demand can be satisfied by exactly one service provider. Next, the simulator creates the similar-demand criteria for the demand of the service consumer. This is again done by creating a new region (the similar demand region): essentially, the demand region after some dimensions have been removed.
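The satisfaction check described above (the simulator's region-membership implementation of F_taste) can be sketched as follows. The dimension names are taken from Table 1, but the region bounds and service values are illustrative assumptions.

```python
# Hypothetical sketch of the simulator's F_taste: a consumer is satisfied
# iff the supplied service stays within the margins of every dimension kept
# in the satisfaction region (dropped dimensions are the attributes that do
# not affect this consumer's satisfaction).
def satisfied(service, satisfaction_region):
    return all(lo <= service[dim] <= hi
               for dim, (lo, hi) in satisfaction_region.items())

# Only two of the fifteen dimensions matter for this (illustrative) demand.
region = {"hasPrice": (10.0, 50.0), "hasDeliveryDuration": (1, 5)}

service = {"hasPrice": 30.0, "hasDeliveryDuration": 3, "hasQuality": 2}
print(satisfied(service, region))  # -> True

service["hasDeliveryDuration"] = 20  # slow delivery falls outside the region
print(satisfied(service, region))  # -> False
```

Note that hasQuality plays no role here because its dimension was removed from the satisfaction region.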
The number of dimensions to be removed, and which dimensions they are, are chosen randomly. Service demands staying within the margins of the similar demand region are classified as similar demands by the consumer. The simulation environment is set up with ten service providers and 200 service consumers. Only one of the service providers can satisfy a given service demand. Simulations are run for 50 epochs, where an epoch refers to a discrete time slot during which each consumer may request at most one service. When the simulations start, agents do not have any prior experiences with service providers. At each epoch, with a probability of 0.5, a consumer requests a service for its current service demand. It then collects experiences related to similar service demands from other consumers, to use for service selection. The proposed approach enables consumers to represent their past experiences with service providers and share these experiences with others. Using the proposed approach, an agent makes its service selection based on both its own experiences and the experiences of others. The experiences received from other agents are the reviews of these agents about the service providers. If the agent has a sufficient number of personal experiences with service providers, it does not need to use these reviews to decide on a service provider. On the other hand, if the agent has no or too few personal experiences, it uses the reviews while selecting a service provider. In a distributed environment like the one described in this paper, discovering and gathering reviews about service providers is not trivial. Furthermore, unlike personal experiences, reviews may contain missing or deceptive information. Therefore, the main focus of this work is the automation of user reviews and service selection based on them. That is why, to measure the success of our
approach, we force agents in our experiments to make service decisions based on reviews (shared experiences of others) rather than their own previous experiences. This restriction has been made only for the sake of performance evaluation and does not imply any limitation on the proposed approach; i.e., the proposed approach supports service selection based on both personal experiences and reviews.

7.2.3 Simulation parameters

In our simulations, we try to mimic real-life scenarios. Therefore, we have parameterized our simulation environment considering some of the important factors in real life. These factors are deception, subjectivity, variations on context, complexity of service needs, and missing information. We briefly explain our parameters related to these factors below.

– Deception: An important parameter in the simulations is R_liar, which defines the ratio of liars in the consumer society. Liars modify their experiences before sharing them, so as to mislead the other consumers the most. This is achieved by disseminating bad experiences (or ratings) about the good providers and good experiences (or ratings) about the bad providers. Details about the behaviors of these liars can be found in [32, 40, 41, 46].
– Subjectivity: Consumers having similar demands may have different satisfaction criteria. This means that for the same demand and the same supplied service, two consumers may have different degrees of satisfaction (e.g., ratings) depending on their satisfaction criteria. This is the subjectivity of the consumers. In the experiments, we define subjectivity as a parameter (R_subj), which determines the ratio of consumers having similar demands but conflicting satisfaction criteria. For example, if R_subj = 0.5, half of the consumers having the same or similar demands have conflicting satisfaction criteria (tastes).
– Variation on context: As frequently seen in the real world, each service consumer changes its service demand after receiving a service. This is done with a predefined probability (P_CD). After changing its demand, the service consumer collects information for its new service demand. This parameter is introduced to mimic variations in the context of service demands in real life.
– Complexity of service needs: Let C_avg denote the average complexity of consumers' service needs. If C_avg > 1, the shopping items (food menus) in the common food-menu ontology are not enough to express consumers' needs concisely. During the simulation, each consumer tries to express its service need as concisely as possible (i.e., using a single menu concept). To do so, at each epoch, the consumer tries to create a new menu concept with a small probability P_new = 0.001. A new menu concept is created either by combining two existing menu concepts or by adding a new property to an existing menu concept. If the menu concept to be created has been learned from others, the consumer uses the learned menu concept rather than recreating it.
– Missing information: With a probability P_h, a service consumer hides each attribute (e.g., price, delivery duration, and so on) necessary to describe an experience while sharing it with others. No information about the shared experiences is hidden if P_h = 0, whereas the attribute values of the shared experiences are completely hidden if P_h = 1.0.
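The missing-information parameter P_h can be sketched concretely. This is an illustrative sketch, not the simulator's code; the experience attributes are invented, and hidden values are represented as None.

```python
# Hypothetical sketch of the P_h parameter: before an experience is shared,
# each attribute is independently hidden with probability P_h
# (P_h = 0 shares everything; P_h = 1 hides everything).
import random

def share(experience, p_hide, rng=random):
    return {k: (None if rng.random() < p_hide else v)
            for k, v in experience.items()}

experience = {"price": 25.0, "deliveryDuration": 4, "quality": 9}
print(share(experience, 0.0))  # identical to the original experience
print(share(experience, 1.0))  # every attribute value replaced by None
```

The recipient then applies the usability check of Section 4.2 to decide whether the partially hidden experience can still be evaluated by its own taste function.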
7.3 Experimental results

In this section, we evaluate our approach in five steps. First, we examine how deception affects the overall service selection performance under different settings. Second, we extend our analysis to settings where subjectivity exists together with deception. Third, we analyze service selection performance when the context of service demands changes over time. Fourth, we evaluate the proposed approach when the consumers evolve their vocabulary over time. Lastly, we examine the effect of missing information on the system. In our experiments, there are various settings, and for each setting the simulations are repeated ten times in order to increase reliability. We average the performance of the various approaches throughout the simulations, and their mean values are reported in the figures unless otherwise stated. The main purpose of our simulations is to measure the performance of our approach in selecting an appropriate service provider in different settings. Although we estimate and report mean values, these may not reflect the true means, because the estimated mean values vary from sample to sample. Hence, we compute a confidence interval that provides a lower and an upper limit for the mean values. This interval estimate gives an indication of how much uncertainty there is in our estimate of the true mean values; the narrower the interval, the more precise our estimate. To compute confidence intervals of the mean values, a t-test can be used when the number of samples is small (e.g., ten samples). Therefore, our simulation results are analyzed with a t-test for a 95% confidence interval, as suggested in [20]. Our tests show that, with 95% probability, the mean values deviate by at most 3%; therefore our results are statistically significant.

7.3.1 Deception

Different trust methods can be used as the trust module in our architecture to filter deceptive experiences.
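As an aside, the confidence-interval analysis used for the reported means in Section 7.3 can be sketched as follows. The ten run values are invented for illustration; the sketch hard-codes the standard two-sided 95% critical value of Student's t distribution for df = 9 (2.262) rather than using a statistics library.

```python
# Hypothetical sketch of the t-based 95% confidence interval for the mean
# of ten repeated simulation runs (df = 9, two-sided t critical value 2.262).
import statistics

def confidence_interval_95(samples):
    t_crit = 2.262  # t_{0.975, df=9}
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / (len(samples) ** 0.5)  # std. error
    return mean - t_crit * sem, mean + t_crit * sem

# Success percentages from ten repeated simulations (illustrative numbers).
runs = [91, 93, 90, 92, 94, 91, 92, 93, 90, 92]
low, high = confidence_interval_95(runs)
print(round(low, 2), round(high, 2))  # interval centered on the mean 91.8
```

A narrow interval like this one indicates that the reported mean is a precise estimate of the true mean.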
In this section, we integrate two trust models into our approach in order to filter deceptive experiences: the deceptive information filtering mechanisms proposed in BRS and in TRAVOS. The integrated approaches are referred to as ExpBRS and ExpTRAVOS, respectively. In this setting, as in the other settings, each consumer has one service demand (Cavg = 1.0). Figure 7 shows the performances of BRS, TRAVOS, ExpBRS, and ExpTRAVOS in terms of service selection success for varying values of Rliar. During the experiments, there is no subjectivity or variation on context (Rsub = 0.0 and PCD = 0.0) and the shared experiences do not contain any missing information (Ph = 0.0). In this setting, all approaches have similarly good service selection performance when Rliar ≤ 0.4. However, when Rliar > 0.4, the performances of BRS and ExpBRS decrease sharply, whereas the performances of TRAVOS and ExpTRAVOS decrease only slightly as Rliar increases. We further demonstrate the performance of these approaches in terms of trust evaluations in Figures 8 and 9, which show the percentages of false negatives and false positives during trust evaluation, respectively. Figure 8 indicates that false negatives increase sharply for BRS and ExpBRS when Rliar ≥ 0.6. When Rliar = 0.8, the agents using these approaches classify more than 70% of the raters as honest while they are actually liars. Similarly, when Rliar = 0.8, Figure 9 indicates that they classify around 20% of the raters as liars while they are actually honest. This high ratio
Figure 7 Average percentage of successful service selections (Cavg = 1.0, Rsub = 0.0, PCD = 0.0, Ph = 0.0, and Rliar ≥ 0.0).
of misclassification occurs simply because BRS classifies agents who do not comply with the majority as liars. That is, when the ratio of liars is high, BRS classifies liars as honest and honest agents as liars. This leads to the elimination of genuine experiences or ratings, so only the deceptive ones are used during service selection. As a result, agents using BRS or ExpBRS almost always make wrong service decisions at high values of Rliar. Note that every honest agent is labeled as a liar by BRS and ExpBRS when Rliar ≥ 0.6. Hence, the percentage of false positives for BRS and ExpBRS
Figure 8 Average percentage of false negatives during trust evaluation (Cavg = 1.0, Rsub = 0.0, PCD = 0.0, Ph = 0.0, and Rliar ≥ 0.0).
Figure 9 Average percentage of false positives during trust evaluation (Cavg = 1.0, Rsub = 0.0, PCD = 0.0, Ph = 0.0, and Rliar ≥ 0.0).
decreases when Rliar is increased from 0.6 to 0.8, simply because of the decrease in the number of honest agents. TRAVOS, on the other hand, makes trust evaluations based on personal interactions: an agent can evaluate the trustworthiness of its peers after having some direct interactions with the providers. Hence, the false positives and false negatives during trust evaluation are significantly lower for TRAVOS and ExpTRAVOS. That is, these approaches can usually eliminate deceptive ratings or experiences during service selection.

7.3.2 Deception and subjectivity

In many real-life settings, deception and subjectivity exist together. In this setting, we again set Cavg = 1.0, PCD = 0.0, and Ph = 0.0 as in the previous setting. However, this time, half of the consumers having similar service demands have conflicting satisfaction criteria (Rsub = 0.5). Vulnerability of rating-based approaches to subjectivity is expected, because rating-based approaches assume that there is no subjectivity among the consumers [14]. That is, they assume that every honest consumer gives good ratings to "good" providers and bad ratings to "bad" providers. However, in the case of subjectivity (Rsub = 0.5), the definitions of "good" and "bad" depend on each consumer and may change significantly from consumer to consumer, as in real life. Figure 10 demonstrates the service selection performances of the competing approaches as Rliar increases. The performance of BRS decreases much further in this setting, because of the combined effect of subjectivity and deception. However, the performance of ExpBRS is similar to its performance when there is no subjectivity. Similarly, the performance of ExpTRAVOS does not change when there is subjectivity together with deception, while TRAVOS has a significantly lower performance in this setting.
That is, by using experiences instead of ratings, the proposed approach makes the deceptive information filtering mechanisms of BRS and TRAVOS robust to subjectivity.
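The key mechanism here, rating an experience with the reader's own satisfaction criteria rather than trusting the writer's verdict, can be sketched as follows. The attribute names and taste functions are hypothetical illustrations, not the paper's actual ontology.

```python
# Sketch: the same shared experience yields different personalized ratings
# when each consumer applies its OWN taste function (names are illustrative).
def taste_of_alice(exp):
    # Alice is satisfied by fast delivery, regardless of price.
    return 1 if exp["delivery_days"] <= 2 else 0

def taste_of_bob(exp):
    # Bob is satisfied only by cheap offers.
    return 1 if exp["price"] <= 10 else 0

# A shared experience records objective details of the transaction,
# not a subjective verdict.
experience = {"provider": "SoupHeaven", "price": 15, "delivery_days": 1}

print(taste_of_alice(experience))  # 1: fast enough for Alice
print(taste_of_bob(experience))    # 0: too expensive for Bob
```

Because each consumer re-rates shared experiences with its own criteria, honest but different-minded reviewers no longer look like liars to the filtering mechanism.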
Figure 10 Average percentage of successful service selections (Cavg = 1.0, Rsub = 0.5, PCD = 0.0, Ph = 0.0, and Rliar ≥ 0.0).
To see this better, we show the percentages of false negatives and false positives during trust evaluation in Figures 11 and 12, respectively. These figures show that having subjectivity in addition to deception does not significantly affect the false negatives. However, it significantly increases the percentage of false positives for BRS and TRAVOS. For rating-based approaches like TRAVOS, trust is also used as a way of filtering out ratings from different-minded consumers. Hence, subjectivity is handled by using ratings only from like-minded consumers and considering all other consumers as liars. That is why TRAVOS has a good performance in terms of service selection when Rsub > 0.0.
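The "different-minded honest consumers counted as liars" effect can be made concrete with a small sketch. This is a simplified TRAVOS-style check (compare a rater's reports against one's own outcomes for the same providers); the provider names, outcome values, and threshold are illustrative assumptions.

```python
# Sketch: a rater is flagged when its reports disagree with one's OWN
# direct outcomes for shared providers (simplified; threshold illustrative).
def agreement(own_outcomes, rater_reports):
    shared = own_outcomes.keys() & rater_reports.keys()
    agree = sum(own_outcomes[p] == rater_reports[p] for p in shared)
    return agree / len(shared)

own = {"P1": 1, "P2": 0, "P3": 1, "P4": 1}                   # my direct results
honest_like_minded = {"P1": 1, "P2": 0, "P3": 1, "P4": 1}
honest_but_different = {"P1": 0, "P2": 1, "P3": 0, "P4": 0}  # different taste

LIAR_THRESHOLD = 0.5
print(agreement(own, honest_like_minded) >= LIAR_THRESHOLD)    # True: kept
print(agreement(own, honest_but_different) >= LIAR_THRESHOLD)  # False: filtered as "liar"
```

The second rater is perfectly honest, yet a rating-based filter discards it; this is exactly the source of the extra false positives in Figure 12.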
Figure 11 Average percentage of false negatives during trust evaluation (Cavg = 1.0, Rsub = 0.5, PCD = 0.0, Ph = 0.0, and Rliar ≥ 0.0).
Figure 12 Average percentage of false positives during trust evaluation (Cavg = 1.0, Rsub = 0.5, PCD = 0.0, Ph = 0.0, and Rliar ≥ 0.0).
7.3.3 Variation on context

In this setting, as in the other settings, each consumer has one service demand (Cavg = 1.0) and there is no missing information in the shared experiences (Ph = 0.0). However, unlike the other settings, consumers change their service demands with probability PCD after receiving a service. Moreover, all of the consumers are honest (Rliar = 0.0), and their satisfaction criteria are similar if their service demands are also similar (Rsub = 0.0). Figure 13 shows the average percentage of successful service selections when the context is allowed to vary during service selection. When PCD = 0.0, all approaches make equally good service selections. However, the performances of the rating-based approaches, BRS and TRAVOS, decrease sharply when PCD > 0.0. The reasons behind this performance decrease are explained in Example 7. On the other hand, the performances of ExpBRS and ExpTRAVOS are equally good (i.e., around 97% of service selections are satisfactory) and do not change as PCD increases.

Example 7 Ratings of a consumer reflect the aggregation of its past transactions with the providers. Assume that a provider SoupHeaven is an expert on soups, but not competent in pizza menus. Assume that Bob recently made five transactions for five items from SoupHeaven: two soup menus and three pizza menus. Because SoupHeaven is an expert on soup menus, the transactions related to the soup menus were successful, but the transactions related to the pizza menus were not. In this case, Bob's overall rating for SoupHeaven is bad, because the number of unsuccessful transactions is higher than the number of successful ones. If another consumer wants to buy a soup menu, Bob's rating for SoupHeaven will be misleading.

In other words, as consumers change their demands, their ratings about the providers become more misleading, depending on the variation in the expertise of the
Figure 13 Average percentage of successful service selections (Cavg = 1.0, Rsub = 0.5, Ph = 0.0, and PCD ≥ 0.0).
providers. However, the proposed approach differentiates between experiences belonging to different contexts. It can easily recognize that SoupHeaven can provide a satisfactory service if a soup menu is demanded, but not if a pizza menu is asked for. In this specific setting, there are no liars; hence, the number of false negatives during trust evaluation is zero. However, as shown in Figure 14, BRS and TRAVOS have almost the same percentage of false positives, around 45%, which means that
Figure 14 Average percentage of false positives during trust evaluation (Cavg = 1.0, Rsub = 0.5, Ph = 0.0, and PCD ≥ 0.0).
they classify around 45% of the raters as liars. As a result, they filter out ratings received from these raters. On the other hand, ExpBRS and ExpTRAVOS convert experiences into personalized ratings based on the context and the taste function. That is, they evaluate an experience within its context and produce ratings that are later used to filter deceptive information. In this setting, ExpBRS looks more successful, i.e., it does not have any false positives during trust evaluation, whereas ExpTRAVOS classifies around 10% of the reviewers as liars for PCD > 0. This is simply because, unlike that of BRS, the information filtering mechanism of TRAVOS is based on personal interactions. Assume a consumer x had experiences with the provider p in the context c1. The first time x changes the context of its service demands to c2, it does not have any personal experience with p within the context c2. Hence, it uses its past experiences created within the context c1 to evaluate the information about p in context c2, which is misleading. Fortunately, early in the experiments, x gains some personal experiences within the context c2 and starts making better trust evaluations. Figure 15 shows the change in the percentage of false positives during trust evaluation for ExpTRAVOS in a single experiment over time when PCD = 0.2.

7.3.4 Complexity of service needs

In this part of our evaluation, we let consumers create new service concepts (menu concepts). Once a new menu concept is created, other consumers learn it through their interactions, as explained in Section 5. If the consumers find the new menu concept useful, they start demanding services related to it. Creation of new menu concepts is directly related to the complexity of service needs. That is, the more complex the service needs are, the more new menu concepts are created to represent these service needs concisely.
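The context sensitivity illustrated earlier in Example 7 can be sketched in a few lines. The outcome values follow the example (1 = satisfactory, 0 = unsatisfactory); the data structure is an illustrative simplification of the ontology-based experiences.

```python
# Sketch of Example 7: Bob's five transactions with SoupHeaven, tagged
# with their context (menu type).
transactions = [("soup", 1), ("soup", 1), ("pizza", 0), ("pizza", 0), ("pizza", 0)]

# Context-blind aggregate, as a rating-based approach would compute it:
overall = sum(o for _, o in transactions) / len(transactions)
print(overall)  # 0.4 -> SoupHeaven looks "bad" overall

# Context-aware view, as enabled by sharing experiences with their context:
def rating_for(context):
    outcomes = [o for c, o in transactions if c == context]
    return sum(outcomes) / len(outcomes)

print(rating_for("soup"))   # 1.0 -> satisfactory for soup demands
print(rating_for("pizza"))  # 0.0 -> unsatisfactory for pizza demands
```

The single aggregate rating hides the fact that SoupHeaven is excellent in one context and poor in another; keeping the context with each experience recovers that distinction.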
Figure 15 Change in the percentage of false positives during trust evaluation for ExpTRAVOS in a single experiment over time when PCD = 0.2 (Cavg = 1.0, Rsub = 0.5, and Ph = 0.0).
Figure 16 demonstrates the average percentage of successful service selections at the end of 50 epochs while the complexity of service needs Cavg varies from 1 to 5, given Rliar = 0.0, Rsub = 0.5, and Ph = 0.0. The figure shows that the service selection performance decreases slightly as the complexity of service needs increases. In this section, we examine the reasons for this decrease by analyzing three individual experiments. In the first one, the initial complexity of service needs is two (Cavg = 2). This means that, in the beginning, consumers must demand two menus on average from the initial common ontology at the same time to satisfy their service needs, because there is no single food menu that satisfies them. Our simulation results for the first 30 epochs of this setting are shown in Figure 17, which contains two synchronized sub-figures. The sub-figure on the top shows the average decrease in the complexity of service needs over time as new menu concepts are created to represent service needs more concisely. The sub-figure on the bottom shows how the percentage of satisfactory service selections changes over time. Figure 17 implies that as new menu concepts are introduced, consumers can represent their service needs more concisely with fewer concepts. New menu concepts result in a temporary decrease in the percentage of satisfactory service selections. This is intuitive, because when a consumer starts to demand a new menu, it should first collect experiences about this new menu to select a satisfactory service provider. However, there may not be enough experiences about the brand-new menu, so service selection performance decreases while new menu concepts are introduced. At epoch 12, the complexity of service needs decreases to one, which means that each consumer can represent its service needs using only one new menu concept.
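The drop in complexity from two concepts to one can be sketched as follows. The concept names and the set-based representation are hypothetical illustrations of the mechanism, not the paper's ontology machinery.

```python
# Sketch (illustrative concept names): a service need that initially requires
# two menu concepts can be expressed with one newly created concept
# that covers both.
need = {"SoupMenu", "VegetarianPizzaMenu"}   # complexity 2 for this consumer

# A consumer coins a new concept and shares its definition with others.
new_concepts = {"SoupAndVeggiePizzaMenu": {"SoupMenu", "VegetarianPizzaMenu"}}

def complexity(need, learned_concepts):
    # If a single learned concept covers the whole need, one concept suffices.
    for definition in learned_concepts.values():
        if need <= definition:
            return 1
    return len(need)

print(complexity(need, {}))            # 2: before the new concept is learned
print(complexity(need, new_concepts))  # 1: after learning it
```

This mirrors the top sub-figure of Figure 17: as new concepts spread through the society, the average number of concepts per demand falls toward one.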
After a while, experiences about the new menu concepts accumulate, and the percentage of satisfactory service selections reaches 100% at epoch 18. In our second setting, the initial complexity of service needs is three (Cavg = 3). Our simulation results for this setting are shown in Figure 18. At epoch 19, the complexity of service needs decreases from 3 to 1. Then, the percentage of satisfactory
Figure 16 Average percentage of successful service selections for various values of Cavg where Rliar = 0.0, Rsub = 0.5, and Ph = 0.0.
Figure 17 Simulation results for Cavg = 2.
service selections reaches 100% at epoch 24. In our third setting, the initial complexity of service needs is five (Cavg = 5). Figure 19 shows our simulation results for this setting; the consumers start making satisfactory service decisions only 4 epochs after the complexity of service needs decreases to 1 at epoch 20.
Figure 18 Simulation results for Cavg = 3.
Figure 19 Simulation results for Cavg = 5.
These results imply that the proposed approach enables consumers to cooperatively create and share new menu concepts over time to represent their service needs more concisely. Moreover, the system is not affected considerably by the newly introduced menu concepts. Although the service selection performance slightly decreases while the new menu concepts are introduced, eventually consumers collect reviews (experiences) about the new menus and make satisfactory service decisions.

7.3.5 Missing information

The high expressive power of OWL enables service consumers to express and share their personal experiences with others. Using a language more expressive than ratings to represent past experiences enables us to differentiate between the contexts of service demands and to significantly outperform the rating-based approaches [34]. However, it may be tedious for some consumers to describe their past experiences in detail using such a language. As in the user reviews on the Web, some consumers may prefer to highlight only some attributes of their past experiences while completely omitting (or hiding) the others. In this section, by varying the Ph parameter, we demonstrate how much the proposed approach's performance is affected if the consumers hide some attributes of their experiences before sharing them. Figure 20 shows the change in the average percentage of successful service selections as the probability of hiding an attribute (Ph) increases. The parametric classifier used in this work cannot handle missing information in training examples, i.e., it cannot use an example for training if the example contains a missing attribute value. For small values of Ph, our experiments show that the performance of the proposed approach does not decrease significantly. That is, for Ph ≤ 0.3, the average percentage of successful service selections does not fall below 89%. Even though the experiences with
Figure 20 Average percentage of successful service selections for various values of Ph when parametric classification is used. There is no subjectivity or variation on context during the experiments (Cavg = 1.0, Rsub = 0.0, and PCD = 0.0) and all of the consumers are honest (Rliar = 0.0).
missing attribute values are eliminated during decision making by the classifier, the remaining experiences are enough to make satisfactory service selections when Ph ≤ 0.3. On the other hand, for Ph > 0.3, the performance of the proposed approach decreases dramatically and drops to around 10% when Ph = 0.7. This means that,
Figure 21 Average percentage of successful service selections for various values of Ph when a C4.5 decision tree classifier is used instead of parametric classification. There is no subjectivity or variation on context during the experiments (Cavg = 1.0, Rsub = 0.0, and PCD = 0.0) and all of the consumers are honest (Rliar = 0.0).
for Ph ≥ 0.7, almost all of the shared experiences are eliminated, and service selection is then made randomly, because there are not enough experiences to train a classifier. Here, we have shown that the proposed approach is sensitive to missing information in the shared experiences when the parametric classifier is used during decision making. In a real-life application, missing information in user reviews (i.e., shared experiences) is expected to be common. Hence, it is important to extend our approach to make it robust to missing information. As stated in Section 4.2, machine learning approaches like C4.5 decision tree classifiers have inherent mechanisms for handling missing information. To see the effect of using such a machine learning approach within the proposed approach, we replaced the parametric classification with a C4.5 decision tree classifier. The results of our experiments for this setting are shown in Figure 21. It is clear that the C4.5 decision tree classifier improves the performance of the proposed approach significantly in the presence of missing information: the service selection performance decreases only gradually as Ph increases when a C4.5 decision tree classifier is used. Handling missing information in data mining and pattern classification is a challenging and active research area. That is why we leave further analysis of missing information in service selection as future work.
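The elimination effect described above can be quantified with a short sketch, under the illustrative assumption that each of d attributes of an experience is hidden independently with probability Ph, so only a (1 − Ph)^d fraction of experiences remains fully specified and usable by the parametric classifier. The attribute count d = 5 is an assumption for illustration.

```python
# Sketch: fraction of shared experiences that remain complete, and hence
# usable by a classifier that discards examples with missing values.
def usable_fraction(p_hide, num_attributes):
    return (1 - p_hide) ** num_attributes

d = 5  # illustrative number of attributes per experience
print(f"Ph = 0.3: {usable_fraction(0.3, d):.3f} of experiences usable")  # ~0.168
print(f"Ph = 0.7: {usable_fraction(0.7, d):.4f} of experiences usable")  # ~0.0024
```

Even at Ph = 0.3 only a modest fraction of experiences survives intact, which is consistent with the gradual decline in Figure 20; by Ph = 0.7 virtually none do, matching the collapse to near-random selection.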
8 Discussion

In this section, we first summarize our experimental results and then discuss our work with references to the literature.

8.1 Summary of results

This research aims to provide an integrated approach for the automation of user reviews using multiagent systems and Semantic Web technologies. To evaluate the proposed approach, we conducted experiments in different settings. Our experimental results are presented in Section 7 and can be summarized as follows:

1. The selection of the trust module significantly affects the performance of the proposed approach. For example, with a trust module based on TRAVOS, the proposed approach (ExpTRAVOS) is capable of making satisfactory service selections under deception. On the other hand, with a trust module based on BRS, the proposed approach (ExpBRS) fails as the number of liars in the society increases.
2. Using experiences instead of ratings improves the performance of the information filtering mechanisms in BRS and TRAVOS. These filtering mechanisms become robust to subjectivity and variation on context when experiences are used instead of ratings.
3. Unlike the rating-based approaches, the proposed approach is not sensitive to subjectivity and variation on context. It achieves satisfactory service selections when consumers are allowed to have different tastes and to change the context of their service demands. However, rating-based service selection approaches like TRAVOS and BRS classify honest consumers as liars to handle subjectivity and context change. As a result, they decrease the number of information resources they can use for service selection.
4. When consumers are allowed to evolve their ontologies by creating new service concepts, the overall ratio of satisfactory service selections decreases for a while because of the lack of experiences about these new service concepts in the society. However, service selection performance recovers promptly as experiences related to the new service concepts rapidly accumulate.
5. Depending on the classification method used during decision making, the proposed approach may be highly sensitive to missing information in the shared experiences, i.e., it may fail if most of the shared experiences contain missing information. This weakness can be resolved by using machine learning mechanisms capable of handling missing information.

8.2 Related work

In this paper, we have proposed an architecture and developed an integrated approach based on this architecture for the agent-based automation of user reviews. This approach uses the representation and reasoning mechanisms proposed in [30] for context-aware service selection, i.e., it enables consumers to record their past experiences semantically using an ontology, instead of plain subjective ratings. This representation of past experiences handles subjectivity and enables consumer-oriented service selections. Here, we extend [30] by describing how various classifiers can be used for decision making, by allowing agents to cooperatively evolve their vocabulary, and by analyzing the effect of missing information during service selection. To the best of our knowledge, this work is the first in the literature to combine service selection with an ontology evolution approach, enabling consumers to express their service needs more concisely during their interactions. We use the approach proposed in our previous work [33] for ontology evolution and carefully analyze the impact of new service concepts on the overall service selection performance.
Lastly, based on the methods proposed in POYRAZ [34], this paper integrates various trust mechanisms into the proposed approach in order to enable service consumers to explicitly reason about the reliability of information resources during service selection. Unlike POYRAZ, this paper analyzes the effects of subjectivity and context variations on trust evaluations and trust-based information filtering. Classical service provider selection strategies are mainly based on ratings and do not allow more expressive representations. Rating-based approaches [14] assume that the ratings are given and taken in similar contexts (e.g., in response to similar service demands), but this assumption does not hold in many real-life settings. In this paper, we argue that making contextual information explicit allows agents to evaluate others' experiences based on their own needs and improves the satisfaction rate of the consumers. FIRE is a trust and reputation model consisting of four components [12]: interaction trust, witness reputation, role-based trust, and certified reputation. The role-based trust and certified reputation components are not related to our work. The interaction trust component models a consumer's trust in a provider using only the direct interactions between the consumer and the provider; here, FIRE uses the direct trust component of another well-known trust and reputation system, REGRET [26]. The witness reputation component, on the other hand, uses only the ratings from other consumers to compute the reputation of the provider. In FIRE, each rating is a tuple of the form r = (c, p, i, t, v), where c and p are the
consumer and the provider that participated in the interaction i, respectively, and v is the rating c gave p for the term t (e.g., price, quality, and delivery). The range of v is [−1, +1], where −1 means absolutely negative, +1 means absolutely positive, and 0 means neutral or uncertain. In this way, FIRE enables consumers to rate each attribute of a service independently. Unfortunately, FIRE does not have any mechanism for filtering out unfair ratings. After computing the direct trust and the witness reputation, FIRE calculates the overall trust of the provider as a weighted sum of those values. The infinite relational trust model of Rettinger et al. [25] also takes contextual information into account when modeling trust between interacting agents, but it focuses only on learning initial trust for unknown agents. Their model makes use of only direct interactions between two agents, whereas we allow experiences to be shared among agents. Moreover, unlike their model, we describe contextual information using an ontology in a flexible manner. Sen and Sajja [29] develop a reputation-based trust model that is used for selecting processor agents for processor tasks. Each processor agent can vary its performance over time. Agents look for processor agents to send their tasks to, using only evidence from others. Sen and Sajja propose a probabilistic algorithm that guarantees finding a trustworthy processor. In our framework, service demands among agents are not equivalent; a provider that is trustworthy for one consumer need not be so for a different consumer. Hence, each consumer may have to select a different provider for its needs. Yolum and Singh study properties of referral networks for service selection, where referrals are used among service consumers to locate service providers [45]. Current applications of referral networks rely on exchanging ratings and thus suffer from the circulation of subjective information.
However, it would be interesting to combine referral networks with the ontology representation that we propose, so that agents can exploit the power of ontologies for knowledge representation as well as referrals for accurate routing. Cornelli et al. [6] propose an approach for selecting reputable servents in a P2P network. Each servent maintains the reputation and credibility of other servents in the network. Reputation represents the "trustworthiness" of a servent in providing files, while credibility represents the "trustworthiness" of a servent in providing votes. Each servent keeps track of, and shares with others, information about the reputation of its peers. Before initiating a download, requesters can assess the reliability of sources (servents) by polling peers for their votes. Votes are analogous to ratings, i.e., they are values expressing opinions on other peers. A vote for a servent is either positive (1) or negative (0) and is computed based on the number of successful and unsuccessful downloads from the servent. This approach has the same weaknesses as rating-based systems: (i) votes are subjective, i.e., the success measure of one servent may differ from that of others, and (ii) it is not possible to express the context of interactions while sharing votes, e.g., a servent may be reliable for downloading documents but unreliable for downloading videos. Caverlee et al. [4, 5] propose the SocialTrust framework for tamper-resilient trust establishment in online social networks. In this framework, all users initially have the same level of trust. Then, SocialTrust dynamically revises trust ratings based on three components: (i) the current quality of trust, (ii) the history, and (iii) the adaptation to change. The quality component evaluates the trustworthiness of the
user based on the current state of the social network, without any consideration of the user's past behavior. The history component considers the integral of the trust value over the lifetime of the user in the network, so it provides an incentive for all users in the network to behave well over time. Lastly, the change component considers sudden changes in the user's behavior. Similar to the other trust and reputation approaches, SocialTrust is based on users' feedback ratings about others; hence, it suffers from the aforementioned shortcomings of rating-based systems. Srivatsa et al. [37] propose TrustGuard, a framework for building distributed reputation management systems that are robust to malicious activities: (i) strategic oscillations, (ii) fake transactions, and (iii) dishonest feedback. Like TRAVOS and BRS, this framework is based on ratings when determining the trustworthiness of nodes in a network. It develops methods to detect malicious nodes that flood numerous ratings on another node through fake transactions and submit dishonest feedback about its transactions. These methods can be integrated into our approach, as described in Section 6, to filter out dishonest feedback, especially in the presence of collusive malicious agents. Stephens et al. [39] propose an approach for the reconciliation of independent ontologies. They argue that if two ontologies share no concept in common, they cannot be reconciled; however, if they share concepts with a third ontology, the third ontology might provide a semantic bridge to relate them. Their approach makes use of different techniques, such as string matching and lexical databases, to measure the semantic distance between two concepts. However, they do not allow ontologies to evolve cooperatively as we have done here. Williams [42] proposes a methodology and algorithms for improving the mutual understanding of two agents.
In this approach, agents develop a common feature description of a particular concept using knowledge sharing and machine learning techniques in a peer-to-peer setting. Thus, they gradually arrive at a consensus on the concepts and develop mappings between the concepts in their ontologies. However, Williams' approach does not support cooperative ontology evolution. Aberer et al. [1] propose an approach for global semantic agreements. They assume that mappings between two different ontologies have already been made by skilled human experts. These mappings are exchanged by the agents, and global semantic agreements are reached using the properties of the exchanged mappings. Laera et al. [17] use argumentation over concept mappings to reach global semantic agreements. As in Aberer et al.'s work, mappings between concepts are assumed to be made beforehand, in this case by a mapping engine. Each mapping has a confidence value for different agents. By using argumentation theory together with these mappings, an agreement over heterogeneous ontologies is reached dynamically. The proposed approach assumes that the agents share a common ontology. In open systems, one way to enable this is to allow the agents to download the ontology from a well-defined resource. The base level ontology will be the same for all the domains, whereas the domain level ontology will differ based on the domain. For different domains, we expect domain experts to come up with ontologies that capture the specifics of those domains. In the literature, various methodologies have been proposed and examined for the development of domain ontologies, such as METHONTOLOGY [10]. Furthermore, there are significant efforts to develop
comprehensive ontologies for specific domains. For example, the GoodRelations project4 builds a Web ontology for e-commerce: a standardized vocabulary for product, price, and company data that can be embedded into existing Web pages and processed by intelligent software agents [11]. Therefore, our expectation about the development of specific domain ontologies does not make our approach impractical.

9 Conclusions

Service selection is becoming more important as the volume of e-commerce increases dramatically day by day. Current approaches to service selection mainly focus on ratings. However, ratings are subjective and do not contain any contextual information, which is crucial when evaluating ratings. This paper proposes a multiagent system in which each human consumer is represented by a consumer agent. The proposed approach enables consumers to share their past experiences using Semantic Web technologies. The ontology-based representation of consumers' experiences can be regarded as a machine-understandable form of real-life user reviews. Shared experiences are interpreted by each consumer agent and used to select the most satisfactory service providers. In order to handle deception, we integrate trust into our framework so that consumers can evaluate the trustworthiness of others and eliminate experiences from untrustworthy consumers. Similarly, we enable consumers to evolve their ontologies, so that they can express their service needs more precisely using new service concepts. Our experiments confirm that the proposed approach enables consumers to make satisfactory service decisions even in the presence of deception, subjectivity, context change, and ontology evolution. We have also demonstrated that the proposed approach fails when consumers frequently hide information while sharing their experiences with others. We set enhancement of the proposed approach in this direction as future work.
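As a minimal illustration of the trust-based filtering described above, a consumer agent might discard experiences reported by reviewers whose estimated trustworthiness falls below a threshold before aggregating the remaining evidence. This is only a sketch: the class and function names and the 0.5 threshold are hypothetical, not the exact mechanism used in our framework.

```python
# Sketch of trust-based filtering of shared experiences (illustrative names).
from dataclasses import dataclass

@dataclass
class Experience:
    reviewer: str    # consumer agent that shared the experience
    provider: str    # service provider the experience is about
    satisfied: bool  # outcome recorded in the (ontology-based) experience

def filter_by_trust(experiences, trust, threshold=0.5):
    """Keep only experiences whose reviewer's trust meets the threshold."""
    return [e for e in experiences if trust.get(e.reviewer, 0.0) >= threshold]

def satisfaction_rate(experiences, provider):
    """Fraction of retained experiences reporting satisfaction with a provider."""
    relevant = [e for e in experiences if e.provider == provider]
    if not relevant:
        return None  # no trusted evidence about this provider
    return sum(e.satisfied for e in relevant) / len(relevant)

# Example: bob's experience is dropped because his trust is below 0.5.
trust = {"alice": 0.9, "bob": 0.2}
shared = [
    Experience("alice", "hotelA", True),
    Experience("bob", "hotelA", False),
]
trusted = filter_by_trust(shared, trust)
print(satisfaction_rate(trusted, "hotelA"))  # -> 1.0
```

In the actual framework, the retained experiences are interpreted against the consumer's ontology rather than reduced to a single score, but the filtering step follows the same principle.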
We also plan to evaluate the proposed approach with human users in the loop in the future.

Acknowledgement This research has been supported by Boğaziçi University Research Fund under grant BAP5694.
References

1. Aberer, K., Cudre-Mauroux, P., Hauswirth, M.: Start making sense: the chatty web approach for global semantic agreements. Journal of Web Semantics 1(1), 89–114 (2003)
2. Alpaydin, E.: Introduction to Machine Learning. MIT Press, Cambridge (2001)
3. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, Cambridge (2003)
4. Caverlee, J., Liu, L., Webb, S.: Towards robust trust establishment in web-based social networks with SocialTrust. In: WWW '08: Proceedings of the 17th International Conference on World Wide Web, pp. 1163–1164. ACM, New York (2008)
4 http://www.heppnetz.de/projects/goodrelations
5. Caverlee, J., Liu, L., Webb, S.: The SocialTrust framework for trusted social information management: architecture and algorithms. Inf. Sci. 180(1), 95–112 (2010)
6. Cornelli, F., Damiani, E., di Vimercati, S.D.C., Paraboschi, S., Samarati, P.: Choosing reputable servents in a P2P network. In: WWW '02: Proceedings of the 11th International Conference on World Wide Web, pp. 376–386. ACM, New York (2002)
7. Doan, A., Madhavan, J., Domingos, P., Halevy, A.: Learning to map between ontologies on the semantic web. In: Proceedings of the 11th International WWW Conference, pp. 662–673 (2002)
8. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, West Sussex (2001)
9. Garcia-Laencina, P.J., Sancho-Gomez, J.-L., Figueiras-Vidal, A.R.: Pattern classification with missing data: a review. Neural Comput. Appl. 19(2), 263–282 (2010)
10. Gómez-Pérez, A., Fernández-López, M., Corcho, O.: Ontological Engineering: with Examples from the Areas of Knowledge Management, e-Commerce and the Semantic Web (Advanced Information and Knowledge Processing). Springer, Secaucus (2007)
11. Hepp, M.: GoodRelations: an ontology for describing products and services offers on the web. In: EKAW '08: Proceedings of the 16th International Conference on Knowledge Engineering, pp. 329–346. Springer, Berlin (2008)
12. Huynh, T.D., Jennings, N.R., Shadbolt, N.: FIRE: an integrated trust and reputation model for open multi-agent systems. In: Proceedings of the 16th European Conference on Artificial Intelligence, pp. 18–22 (2004)
13. Jøsang, A., Ismail, R.: The beta reputation system. In: Proceedings of the Fifteenth Bled Electronic Commerce Conference e-Reality: Constructing the e-Economy, pp. 48–64 (2002)
14. Jøsang, A., Ismail, R., Boyd, C.: A survey of trust and reputation systems for online service provision. Decis. Support Syst. 43(2), 618–644 (2007)
15. Kang, D.-K., Sohn, K.: Learning decision trees with taxonomy of propositionalized attributes. Pattern Recogn. 42(1), 84–92 (2009)
16. KAON2: KAON2 Web Site: http://kaon2.semanticweb.org (2005)
17. Laera, L., Blacoe, I., Tamma, V., Payne, T., Euzenat, J., Bench-Capon, T.: Argumentation over ontology correspondences in MAS. In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 1285–1292 (2007)
18. Maximilien, E.M., Singh, M.P.: A framework and ontology for dynamic web services selection. IEEE Internet Computing 8(5), 84–93 (2004)
19. McGuinness, D.L., van Harmelen, F.: OWL Web Ontology Language Overview (2003)
20. Montgomery, D.C.: Design and Analysis of Experiments. Wiley, West Sussex (2001)
21. Oldham, N., Verma, K., Sheth, A., Hakimpour, F.: Semantic WS-Agreement partner selection. In: Proceedings of WWW '06, pp. 697–706 (2006)
22. Pan, J.Z.: A flexible ontology reasoning architecture for the semantic web. IEEE Trans. Knowl. Data Eng. 19(2), 246–260 (2007)
23. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
24. Ramchurn, S.D., Huynh, D., Jennings, N.R.: Trust in multi-agent systems. Knowl. Eng. Rev. 19(1), 1–25 (2004)
25. Rettinger, A., Nickles, M., Tresp, V.: A statistical relational model for trust learning. In: AAMAS '08: Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 763–770 (2008)
26. Sabater, J., Sierra, C.: REGRET: reputation in gregarious societies. In: Proceedings of the Fifth International Conference on Autonomous Agents, pp. 194–195 (2001)
27. Schaupp, L.C., Belanger, F.: A conjoint analysis of online consumer satisfaction. J. Electron. Commer. Res. 6, 95–111 (2005)
28. Sen, S., Kar, P.: Sharing a concept. In: Working Notes of the AAAI-02 Spring Symposium on Collaborative Learning Agents (2002)
29. Sen, S., Sajja, N.: Robustness of reputation-based trust: Boolean case. In: Proceedings of the 1st International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 288–293 (2002)
30. Şensoy, M., Yolum, P.: Ontology-based service representation and selection. IEEE Trans. Knowl. Data Eng. 19(8), 1102–1115 (2007)
31. Şensoy, M., Yolum, P.: Active concept learning for ontology evolution. In: Proceedings of the 18th European Conference on Artificial Intelligence (ECAI 2008), pp. 773–774 (2008)
32. Şensoy, M., Yolum, P.: Experimental evaluation of deceptive information filtering in context-aware service selection. Lect. Notes Artif. Intell. 5396, 326–347 (2008)
33. Şensoy, M., Yolum, P.: Evolving service semantics cooperatively: a consumer-driven approach. Journal of Autonomous Agents and Multi-Agent Systems 18(3), 526–555 (2009)
34. Şensoy, M., Zhang, J., Yolum, P., Cohen, R.: Poyraz: context-aware service selection under deception. Comput. Intell. 25(4), 335–366 (2009)
35. Sirin, E., Parsia, B., Grau, B.C., Kalyanpur, A., Katz, Y.: Pellet: a practical OWL-DL reasoner. Web Semantics 5(2), 51–53 (2007)
36. Sommerville, I.: Software Engineering. Addison Wesley, Redwood (1995)
37. Srivatsa, M., Xiong, L., Liu, L.: TrustGuard: countering vulnerabilities in reputation management for decentralized overlay networks. In: WWW '05: Proceedings of the 14th International Conference on World Wide Web, pp. 422–431. ACM, New York (2005)
38. Staab, E., Engel, T.: Combining cognitive and computational concepts for experience-based trust reasoning. In: Falcone, R., Barber, S., Sabater, J., Singh, M. (eds.) Proceedings of the 11th International Workshop on Trust in Agent Societies (TRUST '09), pp. 41–45. IFAAMAS (2008)
39. Stephens, L.M., Gangam, A.K., Huhns, M.N.: Constructing consensus ontologies for the semantic web: a conceptual approach. World Wide Web 7(4), 421–442 (2004)
40. Teacy, W., Patel, J., Jennings, N., Luck, M.: TRAVOS: trust and reputation in the context of inaccurate information sources. Journal of Autonomous Agents and Multi-Agent Systems 12(2), 183–198 (2006)
41. Whitby, A., Jøsang, A., Indulska, J.: Filtering out unfair ratings in Bayesian reputation systems. The ICFAIN Journal of Management Research 4(2), 48–64 (2005)
42. Williams, A.B.: Learning to share meaning in a multi-agent system. Auton. Agents Multi-Agent Syst. 8(2), 165–193 (2004)
43. Williams, D., Liao, X., Xue, Y., Carin, L., Krishnapuram, B.: On classification with incomplete data. IEEE Trans. Pattern Anal. Mach. Intell. 29(3), 427–436 (2007)
44. Wu, F., Zhang, J., Honavar, V.: Learning classifiers using hierarchically structured class taxonomies. In: Proceedings of the Symposium on Abstraction, Reformulation, and Approximation (SARA 2005), vol. 3607, pp. 313–320. Springer, Edinburgh (2005)
45. Yolum, P., Singh, M.P.: Engineering self-organizing referral networks for trustworthy service selection. IEEE Trans. Syst. Man Cybern. A 35(3), 396–407 (2005)
46. Yu, B., Singh, M.: Detecting deception in reputation management. In: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 73–80 (2003)
47. Zhang, J., Cohen, R.: A trust model for sharing ratings of information providers on the semantic web. In: Canadian Semantic Web. Semantic Web and Beyond: Computing for Human Experience, pp. 45–61. Springer, New York (2006)
48. Zhang, J., Honavar, V.: Learning decision tree classifiers from attribute value taxonomies and partially specified data. In: Proceedings of the International Conference on Machine Learning (2003)