Appl Intell (2015) 42:481–500 DOI 10.1007/s10489-014-0609-y
Type-2 fuzzy ontology-based opinion mining and information extraction: A proposal to automate the hotel reservation system Farman Ali · Eun Kyoung Kim · Yong-Gi Kim
Published online: 14 November 2014 © Springer Science+Business Media New York 2014
Abstract The number of travel websites is rapidly increasing, which makes the extraction of relevant information more challenging. Several fuzzy ontology-based systems have been proposed to reduce the manual work involved in full-text query search and opinion mining. However, most search engines are keyword-based, and available full-text search engines are still imperfect at extracting precise information from different types of user queries. In opinion mining, travelers rarely state an overall opinion of a hotel; instead, they express opinions about individual features in their reviews. Hotel reviews contain numerous uncertainties, and most feature opinions are expressed in complex linguistic terms (small, big, very good and very bad). Available ontology-based systems cannot extract such blurred information from reviews to provide better solutions. To solve these problems, this paper proposes a new information extraction and opinion mining system based on a type-2 fuzzy ontology, called T2FOBOMIE. The system reformulates the user’s full-text query to extract the user requirement and converts it into the format of a proper classical full-text search engine query. The proposed system retrieves targeted hotel reviews and extracts feature opinions from the reviews using a fuzzy domain ontology. The fuzzy domain ontology, user information and hotel information are integrated to form a type-2 fuzzy merged ontology for retrieving feature polarity and individual hotel polarity. The Protégé OWL-2 (Web Ontology Language) tool is used to develop the type-2 fuzzy ontology. A series of experiments was designed, and the results demonstrate that T2FOBOMIE is highly effective for analyzing reviews and for accurate opinion mining.
Keywords Type-2 fuzzy ontology · Opinion mining · Information extraction · Ontology merging and ontology evaluation
F. Ali · E. K. Kim · Y.-G. Kim, Department of Computer Science and Engineering Research Institute (ERI), Gyeongsang National University, Jinju, Kyungnam, 660-701, Republic of Korea
1 Introduction
The heterogeneity of hotel industry information on the internet is rapidly increasing. A huge number of web pages (such as travel and hotel booking sites) are launched on the internet and browsed by numerous users, who leave their individual views on hotels, features and organizations. However, the continual increase in dynamic web pages creates confusion for users wishing to extract relevant information. The full-text query is a well-known form of search-engine query for extracting precise information from extraneous data on the internet. However, a full-text query is inefficient for extracting relevant data about a specific topic because most search engines use conventional technologies such as keyword-matching mechanisms [18]. Users’ opinions about different organizations, expressed in natural language sentences, pose another problem for the existing search engine architecture. Existing travel and hotel booking websites provide a rating score for organizations. The rating score does not provide precise information, whereas the reviews are meaningful because they help the user to decide about the hotel. The reviews usually contain many users’ opinions in the form of natural language sentences. However, it is difficult for users to read all the reviews and obtain a meaningful opinion regarding their
requirements of the hotel. Generally, travelers do not state an overall opinion of the hotel; instead, they discuss it in terms of hotel features, for example, “the hotel location is good, but the rooms are small and without internet facility”. The main idea of the proposed system is to reformulate the user query for hotel searching, retrieve the hotel reviews and then compare the extracted hotel feature opinions with the user’s requirement. The system provides a summary in the form of polarity (strong positive, positive, neutral, negative and strong negative) along with the hotel and travel agency name.
One recent useful method is opinion mining, which helps the user judge the success of the hotel by determining its popularity and particular features. Information extraction and opinion mining systems are mostly based on classical ontology. Classical ontology addresses crisp data and cannot retrieve desirable results from the blurred source of internet data. Therefore, fuzzy logic is integrated with classical ontology. The combination of both technologies has been proposed in several systems in recent years to better answer the user’s queries [1]. However, many heterogeneous systems share information, and the volume of raw data on the internet is rapidly increasing. Type-1 fuzzy ontology-based systems are inadequate and can extract relevant documents from the internet only to a limited extent. Therefore, a type-2 fuzzy logic system with ontology is considered a useful technology for extracting precise information from the hazy data environment of the internet.
To resolve these problems, this paper proposes a system called type-2 fuzzy ontology-based opinion mining and information extraction (T2FOBOMIE). The overall process of T2FOBOMIE contains three phases: user query reformulation, feature opinion extraction, and comparison of the reformulated query with feature opinions. The query reformulation phase is completed in the following four steps.
• Tags the parts of speech in the user query.
• Eliminates prepositions, articles and pronouns.
• Extracts the noun, adjective and adverb words.
• Converts the natural-language text into a proper searching query.
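The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: step 1 is taken as already done, so the (word, tag) pairs are hard-coded Penn Treebank-style tags copied from a fragment of the tagged example query in Section 4.2.3, and the names `TAGGED_QUERY`, `KEEP_PREFIXES` and `reformulate` are hypothetical.

```python
# Hypothetical sketch of the query reformulation steps. Tags are hard-coded
# (step 1 assumed done) and copied from the tagged example in Sect. 4.2.3.
TAGGED_QUERY = [
    ("So", "IN"), ("we", "PRP"), ("need", "VBP"), ("centrally", "RB"),
    ("located", "JJ"), ("hotel", "NN"), ("in", "IN"), ("Seoul", "NNP"),
    ("with", "IN"), ("good", "JJ"), ("vegetarian", "JJ"), ("restaurant", "NN"),
]

# Steps 2-3: keeping only nouns (NN*), adjectives (JJ*) and adverbs (RB*)
# drops prepositions (IN) and pronouns (PRP); in this sketch verbs go too.
KEEP_PREFIXES = ("NN", "JJ", "RB")


def reformulate(tagged):
    """Return a keyword query built from the content words of a tagged query."""
    kept = [word for word, tag in tagged if tag.startswith(KEEP_PREFIXES)]
    # Step 4: join the surviving words into a search-engine query string.
    return " ".join(kept)


print(reformulate(TAGGED_QUERY))
```

A prefix test on the tag keeps the filter short while still covering plural and proper nouns (NNS, NNP) and comparative forms (JJR, RBR).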
In the feature opinion extraction phase, the system downloads the target hotel’s reviews, extracts the features and opinion words from the reviews, and then applies a type-2 fuzzy opinion ontology to find hotel polarity information. Similarly, in the phase that compares the reformulated query with feature opinions, the type-2 fuzzy merged ontology is used to find individual feature polarity and hotel polarity. Furthermore, input query nouns are compared with hotel features to find semantic similarities and the polarity of similar features.
The rest of this paper is organized as follows: Section 2 presents related work. Type-2 fuzzy ontology is explained in Section 3. Section 4 briefly explains the overall scenario and internal process of the proposed system. Finally, Section 5 presents the experiments and their results.
2 Related work
Information extraction and opinion mining from traveler-generated reviews are hot topics in natural language processing and information engineering research. The increase in travel websites on the internet has made the information extraction problem more challenging. Presently, people use a full-text query in an available search engine to acquire information about a specific hotel. Most of the available search engines are still based on keyword matching and are unable to extract the meaning of the data from their servers. As a solution, extensive methodological work is required to classify the input query and acquire the needed information from the internet. Researchers have been working on this issue and have proposed several solutions to extract relevant information [1, 7, 9]. A system to extract information from the internet using a semantic full-text search query is presented in [3]. That work explains the advantages of full-text query search and describes how it works well when the extracted documents contain the keywords of the query because the primary concern is precision, not recall. The main problem with the method is that current search engines are based on keyword-matching algorithms. Consequently, a full-text query approach is not useful for extracting relevant information from heterogeneous sources of internet data.
Recently, opinion mining has gained focus in two research directions: feature-level opinion mining and sentiment classification. In sentiment classification methods, the document is classified (positive, negative or neutral), and the knowledge is designated manually or semi-manually to represent the polarity words [17, 37]. In feature-level opinion mining, the features are identified to extract the opinions about features from reviews [27, 36]. In a previous study [27], the authors presented a method that accesses HTML documents and manually tags the polarity (-, +) for each opinion sentence. The mentioned opinion words are recognized as positive or negative and are easily used to tag the polarity manually. However, some opinion words are used as a verb, objective, adjective, or adverb; it is difficult for a system to understand these and to assign the correct polarity value automatically. In current works, features are normally identified as nouns or noun phrases in reviews. The idea of opinion mining and summarization is proposed in [10], which determines the product feature opinions (positive or negative) using a lexicon-based method.
During the past few years, ontology has been widely used in the field of information extraction and opinion mining.
An ontology is a formal specification of a shared conceptualization of a specific domain in a machine-readable and human-understandable format [44]. A crisp ontology is used to solve the feature identification problem in the domain of movie reviews [46]. A crisp ontology is useful for extracting data from structured information. However, most reviews on the internet are in blurred format. Therefore, the mentioned crisp ontology is insufficient to define fuzzy feature terms (e.g., Room {small, normal and large}). Fuzzy logic works well with classical ontology when processing uncertain input. A fuzzy domain sentiment ontology is proposed for product-review sentiment analysis in [31]. The method in [31] operates at the product feature level to provide an in-depth analysis of the target product and uses an ontology to construct relationships among product attributes and equivalent sentiments through fuzzy sets and to assign polarity to sentiment words. A multilingual ontology system that presents a framework for the accommodation sector is described in [10]. This framework receives a set of online reviews and annotates them in graphical format for the decision-making system. However, this framework is limited in its use of fuzzy logic with ontology to represent linguistic variables of hotel features and intelligent knowledge for decision-making. A fine-grained sentiment classification approach for online product reviews, which accounts for the effect of linguistic hedges on opinion descriptors, is proposed in [12]. This sentiment classification system classifies reviews into multiple output classes as positive or negative and automatically extracts opinion phrases from user reviews. Moreover, every linguistic variable stores sentiment scores and presents them in the form of a table. An ontology can act as a knowledge base to store all linguistic variables with sentiment scores and provide a common platform for feature polarity classification.
It is notable that a type-1 fuzzy ontology-based system can precisely determine hotel polarity to some extent. However, it cannot perfectly address reviews when the feature information is intensively blurred. Type-2 fuzzy logic with classical ontology is the solution to this problem. The three-dimensional structure of type-2 fuzzy logic can handle blurred data easily. A type-2 fuzzy ontology based on fuzzy markup language (FML) is proposed to represent the ‘computer Go knowledge’, including type-2 fuzzy logic, ontology and transformation of FML [30]. This FML-based system uses type-2 fuzzy logic to handle uncertainties during intelligent decision making and to provide more degrees of freedom than type-1 fuzzy-based systems do. A classical ontology with type-2 fuzzy logic for a meeting scheduling system is presented in [29]; it shows that T2FO made an organization’s meeting-scheduling process easy and recommended some references from attendees to the meeting hosts. This system uses a type-2 fuzzy ontology with a decision-support multi-agent system to provide facilities for the organization during the meeting-scheduling process.
Additionally, the system uses fuzzy markup language (FML) to represent the knowledge and rule bases for the proposed system. Our proposed system uses an Extensible Markup Language (XML)-based ontology, which creates an intelligent knowledge base to store all linguistic variables with sentiment scores and provides a common platform for feature-polarity classification. A multi-agent system with a type-2 fuzzy ontology that combines the strengths of these approaches is proposed for a personalized flight ticket booking domain in [7]. In that approach, the authors proposed a general type-2 fuzzy ontology to reduce the manual work of air ticket booking. However, the proposed ontology is only concerned with a multi-agent system (MAS) to address information security challenges and to exchange transparent information among different agents of the distributed system.
The discussion of previous studies shows that most existing research in the areas of opinion mining and information extraction has limitations. Most proposed information extraction systems are based on a crisp ontology or a type-1 fuzzy ontology. Unfortunately, classical ontology cannot achieve the desired result from blurred data resources. Additionally, a type-1 fuzzy ontology-based system can extract data from hazy information only to a limited extent and cannot perfectly address intensively blurred data. The proposed type-2 fuzzy ontology-based opinion mining and information extraction system is a novel effort to design an automatic hotel reservation system. In the proposed system, the user query is reformulated to extract the user’s desires and convert the query to a proper format for hotel searching. Furthermore, the proposed type-2 fuzzy ontology presents domain knowledge for hotel feature opinions. The type-2 fuzzy ontology is merged with the hotel ontology, user ontology, provider ontology, and fuzzy domain ontology to easily extract information for hotel reservations.
3 Type-2 fuzzy ontology
The concept of a fuzzy set was proposed by Lotfi Zadeh in 1965 to represent vague concepts [47]. Fuzzy set theory generalizes crisp set theory and extends its capabilities. It has been used in different areas to handle linguistic uncertainties. However, it can capture uncertainty only to a certain level. Type-2 fuzzy logic overcomes this problem because its membership grade is itself a fuzzy set rather than a crisp value [33, 43]. Mathematically, a type-2 fuzzy set is defined by the following equations [7].
$$\tilde{A} = \{((x, \mu), \mu_{\tilde{A}}(x, \mu)) \mid \forall x \in X,\ \forall \mu \in J_x \subseteq [0, 1]\}, \quad \text{where } 0 \le \mu_{\tilde{A}}(x, \mu) \le 1 \tag{1}$$
If $x = x'$, then for each value of $x'$, we have the following.
$$\mu_{\tilde{A}}(x') = \int_{\mu \in J_{x'}} f_{x'}(\mu)/\mu, \quad \text{for } \mu \in J_{x'} \subseteq [0, 1] \text{ and } x' \in X \tag{2}$$
A type-2 fuzzy set $\tilde{A}$ is defined by the membership function $\mu_{\tilde{A}}(x, \mu)$, where $x \in X$, $\mu \in J_x \subseteq [0, 1]$ and $\mu_{\tilde{A}}(x')$ is a secondary membership function. Figure 1 shows a type-2 fuzzy set $\tilde{A}$. The shaded region is fully enclosed by the upper membership function (UMF) and lower membership function (LMF). The shaded region is the collection of all primary membership functions and is called the footprint of uncertainty (FoU) [4, 45]. The FoU of $\tilde{A}$ is defined in Eq. (3).
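To make the roles of the UMF, LMF and FoU concrete, the following minimal Python sketch builds an interval type-2 set, a special case in which every secondary grade equals 1. The Gaussian shape, the parameter values and the function names are assumptions chosen for illustration; the paper does not prescribe them.

```python
import math


def gaussian(x, mean, sigma):
    """Type-1 Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def interval_type2_membership(x, mean, sigma_low, sigma_high):
    """Return (LMF(x), UMF(x)) for a Gaussian whose width is uncertain.

    Assumes sigma_low <= sigma_high, so the narrower curve is the lower bound.
    """
    return gaussian(x, mean, sigma_low), gaussian(x, mean, sigma_high)


def in_footprint(x, grade, mean, sigma_low, sigma_high):
    """True if the point (x, grade) lies inside the footprint of uncertainty."""
    lower, upper = interval_type2_membership(x, mean, sigma_low, sigma_high)
    return lower <= grade <= upper


# Hypothetical linguistic term "normal room size" centred at 25 square metres.
lower, upper = interval_type2_membership(30.0, mean=25.0, sigma_low=3.0, sigma_high=6.0)
print(f"Primary membership interval at x=30: [{lower:.3f}, {upper:.3f}]")
```

For each x the pair (LMF(x), UMF(x)) bounds the primary membership interval J_x, and the union of these intervals over all x is the shaded FoU region described above.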
$$\mathrm{FoU}(\tilde{A}) = \bigcup_{x \in X} J_x = \mathrm{DoU}(\tilde{A}) \tag{3}$$
where DoU denotes the degree of uncertainty.
Fig. 1 Type-2 fuzzy set
An ontology is used to share domain knowledge among people and software to reuse domain classes instead of remodeling them. It characterizes knowledge in a single domain as a set of concepts and relationships between concepts. Mathematically, an ontology is defined in the following equation [7].
$$\tilde{O} = (C, P, R, V, V_c) \tag{4}$$
In the above equation, $C$ stands for concepts, $P$ represents the properties of concepts, $R$ represents the relationships between concepts, $V$ represents the values of concepts and $V_c$ represents the constraint values of properties.
4 Development of a type-2 fuzzy ontology-based opinion mining and information extraction system
In this section, the T2FOBOMIE system is introduced along with the architecture and internal process of the proposed system.
4.1 The design architecture of the proposed system
The architecture of the proposed type-2 fuzzy ontology-based opinion mining and information extraction (T2FOBOMIE) system is shown in Fig. 2. The system architecture (Fig. 2) is divided into nine parts for simplicity. These parts are as follows.
1. Query processing and reformulation (QPR).
2. Assigning to the web crawler (WC) and Google search engine (GSE).
3. Query storage (QS).
4. Consumers review database (CRD).
5. Pre-processing of reviews, part-of-speech (POS) tagging, stemming, and feature extraction of consumer reviews.
6. Opinion mining (OM).
7. Type-2 fuzzy ontology (T2FO).
8. Fuzzy domain ontology (FDO).
9. Classified review merged ontology (CRMO).
Fig. 2 T2FOBOMIE system architecture
A user first generates a query and specifies their requirement for hotel reservation accordingly (Task 1 in Fig. 2). The QPR module expands the user query and submits it to the Web crawler (www) and APIs (GSE and e-commerce site) (Task 2 in Fig. 2). The T2FOBOMIE system uses the
search engines (e.g., Google) and e-commerce sites (e.g., Booking.com, hotels.com and tripadvisor.com) that provide APIs to retrieve consumer reviews for an appropriate hotel. The expanded user query is stored in query storage (QS) so that it can be assigned to the type-2 fuzzy ontology for further processing (Task 3 in Fig. 2). The crawler of our system also retrieves information about hotel features (Rooms, the gallery, Eat & Drink, Stay Fit and Free Wi-Fi) and downloads consumer reviews (Task 4 in Fig. 2). A set of consumer reviews is downloaded, and the following pre-processing procedures are applied: stop-word removal, sentence splitting, part-of-speech (POS) tagging, stemming and feature extraction (Task 5 in Fig. 2). The WordNet API POS tagger, which is based on the WordNet lexicon, is used for part-of-speech tagging [42]. Opinion mining is the next phase after preprocessing; this takes the features of the reviews, i.e., positivity/neutrality/negativity, and generates a set of hotel features and their cumulative polarities (Task 6 in Fig. 2). The core of the proposed mechanism is based on the type-2 fuzzy ontology (T2FO) (Task 7 in Fig. 2) to handle any type of real-world scenario related to the hotel-booking domain. T2FO is executed offline and must be performed with the opinion-mining task by fuzzy domain ontology concepts. T2FO captures user information, hotel descriptions and provider descriptions. The hotel-booking support system information is gathered from the internet and is classified manually. Collection of data from the internet is
a primary task and can help to expedite the design process of the fuzzy domain ontology. The fuzzy domain ontology captures analyzed information (Task 8 in Fig. 2) such as
• “price” (sub-class) “is-a” cost of “room” (super class of price),
• “rating” (sub-class) “is-a” standard score of “hotel” (super class of rating),
• “Wi-Fi” (room-feature) is “associated with” “room” (super class)
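One simple way to hold such relations in memory is as subject-relation-object triples. The sketch below is only an illustration: the three triples are copied from the examples above, while the triple layout and the helper name `related_to` are assumptions and do not reflect how the OWL ontology itself is stored.

```python
# Relations copied from the three examples in the text. The triple layout and
# the helper name are assumptions made for illustration only.
RELATIONS = [
    ("price", "is-a", "room"),
    ("rating", "is-a", "hotel"),
    ("Wi-Fi", "associated-with", "room"),
]


def related_to(concept):
    """Return (subject, relation) pairs whose target is the given concept."""
    return [(subj, rel) for subj, rel, obj in RELATIONS if obj == concept]


print(related_to("room"))
```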
Consumer reviews, hotel information, user data and provider information can be retrieved from e-commerce sites or search engines. We gathered all information and delivered it to the type-2 fuzzy ontology module to routinely build a merged ontology. The type-2 fuzzy ontology is described in detail in the following section. The opinion-mining unit is based on the fuzzy domain ontology, which can evaluate each pair of hotel features and verify its polarity. The hotel feature polarity score is gathered from all the reviews, so the final sentiment result and polarity values can be obtained for the target hotel. The classified review-merged ontology is responsible for representing the opinion mining result as well as incorporating the type-2 fuzzy ontology and fuzzy domain ontology (Task 9 in Fig. 2). The classified review-merged ontology merges a type-2 fuzzy ontology with an opinion-mining score and a fuzzy domain ontology. The user uses DL query [15] and SPARQL query [39]
to retrieve exact hotel information and a high sentiment score from the classified review merged ontology and then to connect with transaction processing. The transaction module will connect with the travel agency that has the required hotel in its contacts or directly connect to the hotel website for hotel reservations. An ontology is used to share domain knowledge among systems and people and is written in a specific language called OWL (Web Ontology Language) [34]. The Protégé tool is used to develop an ontology with OWL-2 [32, 41]. The Fuzzy OWL plug-in of Protégé, a semi-automatic tool, is used to convert a classical ontology into a fuzzy ontology. The fuzzy domain ontology is a knowledge representation for the automatically generated opinion lexicon [28], which is established in OWL. It is mainly employed to exchange knowledge between a type-2 fuzzy ontology and opinion-mining systems on the web. Therefore, the representation of the opinion lexicon as a fuzzy domain ontology expedites the type-2 fuzzy ontology and reuses opinion knowledge to construct a sentiment lexicon view in a “classified review merged ontology”. Finally, all the results are viewed together with the polarity terms (neutral, positive, negative, strong positive and strong negative) in a classified merged ontology and optimized for user retrieval and interaction requirements. In the next sections, the internal working of each part is elaborated one by one.
4.2 Query processing and reformulation (QPR)
To achieve a more appropriate and exact query, the user query is reformulated. The reformulation process is divided into four steps: morphological and semantic analysis, sentence-splitting process, word-tagging process and query enrichment. All these steps are performed in series. A Nearly-New Information Extraction system (ANNIE) with WordNet is used to build a natural language processing module. ANNIE is a sub-module of the General Architecture for Text Engineering (GATE) [11], which is used to retrieve meaningful information from scattered data for further processing. GATE is a development environment for information engineering systems that provides sentence splitting, part-of-speech tagging and human language processing facilities [8]. To precisely understand the query reformulation process, a natural language query is quoted
Fig. 3 A sample user query in natural language format
from the web user, which is shown in Fig. 3, and is applied to the following parts to find a more relevant query for a hotel reservation.
4.2.1 Morphological and semantic analysis
Morphological analysis is the main component of natural language processing, dealing with the identification of the numerous forms of words using a lexicon. The lemmatization algorithm with morphological analysis is employed to determine the lemma of the mentioned words in a query, for example, “we need to find a centrally located hotel in Seoul with best restaurant”. In this query, the words ‘centrally’, ‘located’ and ‘best’ have the basic forms ‘central’, ‘locate’ and ‘good’ as their lemmas. The lemmas of the words are then applied in the new query. The Tree Tagger of Document Object Model (DOM) confirmed the lemmatization of the input query in the T2FOBOMIE system [23]. After lemmatization, the semantic analysis uses WordNet to resolve the ambiguities of word meaning. It provides an opportunity to select relevant meanings of the keywords included in an input query.
4.2.2 Sentence-splitting process
Sentence splitting is a process of segregating a compound text into small chunks or tokens. The compound sentence or user query may contain delimiters such as ‘;’, ‘.’, ‘:’, ‘ ’, and white spaces. The splitting function removes the delimiters from the user query. We took a user query, such as “We would like to stay in Seoul for one week at the end of June. My partner is vegetarian. So we need centrally located hotel in Seoul with good vegetarian restaurant and cost no more than 100 dollar per night”. After applying the splitting process, the system gets the result in the form of tokens: ‘we’, ‘would’, ‘like’, ‘per’, ‘night’ and so on. The generated result of the splitting process is stored in the form of an array and sent to the next function for further processing.
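The splitting step can be approximated with a single regular expression. The sketch below is a minimal illustration rather than the GATE splitter used by the system: the delimiter set follows the text (‘;’, ‘.’, ‘:’ and white space), the comma is an added assumption, and lower-casing is assumed because the example tokens are shown in lower case. The names `DELIMITERS` and `split_query` are hypothetical.

```python
import re

# Delimiters named in the text (';', '.', ':' and white space); the comma is
# an added assumption.
DELIMITERS = re.compile(r"[;.:,\s]+")


def split_query(query):
    """Split a query into lower-case tokens, discarding the delimiters."""
    return [token.lower() for token in DELIMITERS.split(query) if token]


QUERY = ("We would like to stay in Seoul for one week at the end of June. "
         "My partner is vegetarian. So we need centrally located hotel in "
         "Seoul with good vegetarian restaurant and cost no more than 100 "
         "dollar per night")
tokens = split_query(QUERY)
print(tokens[:3], tokens[-2:])
```

The resulting list plays the role of the token array that is handed to the word-tagging step.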
4.2.3 Word-tagging process
The word-tagging process parses and tags each word of the query according to its grammatical category as follows: “We PRP would MD like VB to TO stay VB in IN Seoul NNP for IN one CD week NN at IN the DT end NN of IN June NNP . . My PRP$ partner NN is VBZ vegetarian JJ . . So IN we PRP need VBP centrally RB located JJ hotel NN in IN Seoul NNP with IN good JJ vegetarian JJ restaurant NN and CC cost NN no RB more RBR than IN 100 CD dollar NN per IN night NN . .”. The word-tagging process is used to tag every word of a sentence as a noun, pronoun, verb, adverb, preposition, adjective, conjunction or interjection. The tagging and parsing of a
compound sentence is a very difficult task because words can change their meaning in different positions of a sentence. However, the GATE functionality can perform this task very conveniently.
4.2.4 Query enrichment
The query enrichment process uses the TagCrowd algorithm to eliminate prepositions, pronouns and articles (a, an, the, of, as) and to generate a proper search query from a natural language text query [40]. The outputs of TagCrowd are ‘central’, ‘hotel’, ‘locate’, ‘night’, ‘restaurant’, ‘Seoul’, ‘stay’ and ‘vegetarian’. The query enrichment algorithm finds the nouns and verbs in the generated query array and grammatically organizes them to form a new query for hotel reservations. The words ‘hotel’, ‘central’ and ‘Seoul’ have a unique sense in WordNet. Therefore, the semantic analysis enriches and consults with the ontology to regenerate the first query and reorders the query words, for example “central hotel Seoul”. The organized query is assigned to the API and web crawlers to find specific hotels and retrieve consumer reviews. The organized query is also stored in query storage to provide extra information about user requirements for the type-2 fuzzy ontology. The ontology is used to provide the user budget detail, personal information and other related information about hotel reservations.
4.3 The fuzzy ontology web crawler (WC) and Google search engine (GSE)
After the query enrichment process, the query is assigned to a web search engine and web crawler. The Google Search Engine (GSE) provides a ranked list of popular web pages for the input query. The resulting documents, excluding Word and PDF files, are retrieved from their URLs in the form of HTML documents. Every URL represents the HTML documents as a parent class. The fuzzy ontology web crawler retrieves relevant information from fuzzy resources. The fuzzy ontology web crawler defines target URL patterns for links to follow, which hold hotel review pages, and checks whether the target pages have changed. The fuzzy web crawler handles HTML pages and frequently updates information using Really Simple Syndication (RSS). Most of the time, pages contain many links that are irrelevant for retrieving information about hotels (e.g., advertisements and other links). Therefore, the developed fuzzy web crawler is restricted to pursue only appropriate links. The crawler normally goes through many intermediate pages, but selects alternative pages to retrieve consumer reviews. It also retrieves metadata such as scores and information about the reviewers and stores these data in the consumer review database (CRD).
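The idea of restricting the crawler to target URL patterns can be illustrated with a small link filter. This is a sketch under stated assumptions: the path patterns, the example URLs and the names `REVIEW_PATH_PATTERNS` and `should_follow` are hypothetical, and a real crawler would need rules written for each site it visits.

```python
import re
from urllib.parse import urlparse

# Hypothetical path patterns for review pages; real sites need their own rules.
REVIEW_PATH_PATTERNS = [
    re.compile(r"^/hotel/[^/]+/reviews/?$"),
    re.compile(r"^/reviews/hotel-[^/]+/?$"),
]


def should_follow(url):
    """Return True if the URL path looks like a hotel review page."""
    path = urlparse(url).path
    return any(pattern.match(path) for pattern in REVIEW_PATH_PATTERNS)


links = [
    "https://example.com/hotel/seoul-plaza/reviews?page=2",
    "https://example.com/ads/banner-42",
    "https://example.com/reviews/hotel-myeongdong",
]
print([link for link in links if should_follow(link)])
```

Matching on the parsed path rather than the raw URL keeps query strings such as page numbers from affecting the decision.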
4.4 Preprocessing
A set of consumer reviews is retrieved from the CRD. It is pre-processed to remove needless content (such as dates, reviewers’ names and tags). Review sentences are tagged and split to easily identify the feature opinions, and then stemmed to reduce words such as ‘cleaner’, ‘cleaning’ and ‘cleaned’ to the root word ‘clean’. Furthermore, features are extracted to find polarity and directly connect these polarities with the type-2 fuzzy ontology instance. In the initial step, the review is tagged to recognize the words and identify their part of speech. NLP is used to tag the consumer reviews [38]. For example, the POS tagging for the consumer review, “Stayed here as a transit hotel. Rooms are small but clean and it is in the middle of shopping district Myung-dong and near to downtown. A restaurant is just okay, but lots of variety”, is shown in Fig. 4. In the figure, VBN is used for a verb past participle, NNS for plural nouns, JJ for an adjective and CC for a coordinating conjunction. The other part-of-speech tags are explained in [35]. After POS tagging, the reviews are stemmed to improve the accuracy of feature search, and stop-words, prepositions (‘on’, ‘in’, ‘of’) and articles (‘the’, ‘a’, ‘an’) are removed. The consumer reviews are split to identify complete clauses, each of which contains a noun phrase and a verb phrase. At this stage, every sentence of the review is checked to confirm whether it is a complete clause with a noun phrase and a verb phrase. In the abovementioned review, connective words (‘and’, ‘but’) and the comma (‘,’) are used to delimit clauses in the review sentence. The first sentence, “Stayed here as a transit hotel”, is a complete clause; therefore, there is no need to split it. The second sentence, “Rooms are small but clean and it is in the middle of shopping district Myung-dong and near to downtown”, has conjunctions and verbs. Therefore, it requires splitting to identify a complete clause, which comprises one noun, conjunctions, and verbs [26].
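The stop-word removal and stemming steps can be sketched as follows. This is a deliberately naive illustration: the suffix stripper is a stand-in for a real stemmer (for example Porter's algorithm) and is not the stemmer used by the system, the stop-word set contains only the prepositions and articles listed above, and the names `STOP_WORDS`, `SUFFIXES`, `naive_stem` and `preprocess` are hypothetical.

```python
import re

# Stop words listed in the text: prepositions and articles.
STOP_WORDS = {"on", "in", "of", "the", "a", "an"}
# Naive suffix list; a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "er", "s")


def naive_stem(word):
    """Strip one known suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Lower-case, tokenize, drop stop words and stem a review sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]


print([naive_stem(w) for w in ("cleaner", "cleaning", "cleaned")])
print(preprocess("The rooms of the hotel are cleaned"))
```

Suffix stripping of this kind over-stems some words (‘shopping’ becomes ‘shopp’), which is one reason a dictionary-backed stemmer or lemmatizer is preferable in practice.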
The sentence “Rooms are small but clean” is a complete clause because it is composed of the noun ‘room’ and the verb ‘are’. The next split sentences are “the middle of shopping district Myung-dong near downtown” and “restaurant just okay”. However, the last sentence, “lots of variety”, has no noun phrase and cannot be treated as a complete sentence; therefore, it is attached to the previous sentence to form a new sentence, “A restaurant just okay but lots of variety”.
4.4.1 Feature extraction based on fuzzy domain ontology
Feature extraction from a single sentence amounts to selecting a noun phrase. For example, in the sentence “Rooms are small but clean”, ‘room’ is a noun phrase and can be easily identified. However, it is difficult to apply these rules if the sentence does not include any adjective or data that can be used as an opinion, such as “Stayed here as a transit hotel”.
Fig. 4 POS tagging for consumer review
The previously mentioned sentence does not contain any opinion data for the hotel, so it must be eliminated. The fuzzy domain ontology describes the concepts of a hotel feature and its relationship with type-2 fuzzy ontology concepts. The domain ontology holds hotel database information that is available online. It is used to identify features in a sentence such as {‘hotel’, ‘room’ and ‘restaurant’}. The ‘hotel’, ‘room’ and ‘restaurant’ are already described in the domain ontology with their individuals. Every extracted feature from the consumer review sentence is checked against domain ontology classes. A feature that matches the specified classes is validated to predict its polarity; otherwise, it is discarded. The fuzzy domain ontology with the opinion-mining algorithm determines the power and polarity for each pair of features; this is explained in the next section.
Fig. 5 Relationship between ontology terms of fuzzy domain ontology
4.4.2 Fuzzy domain ontology-based opinion mining
The fuzzy domain ontology describes domain knowledge and creates a relationship between feature polarity and
type-2 fuzzy ontology concepts. Seven steps for developing an ontology, which cover all aspects of ontology modeling, are presented in [8]. The fuzzy domain ontology is developed using Protégé 4.3 with OWL-2. Every development phase of the ontology is discussed with domain experts to achieve accuracy and precision in the experiment. Figure 5 shows the relationship of fuzzy domain ontology terms.
Fig. 6 Graphical architecture of type-2 fuzzy ontology
The Protégé OntoGraf tab is used to develop the graph. The feature and opinion words have been identified in the ontology. For example, in the abovementioned example, the feature ‘room’ has two opinion words {‘clean’ and ‘small’} and the feature ‘restaurant’ has {‘okay’ and ‘lots of variety’}. Here, ‘clean’ and ‘lots of variety’ express a positive opinion, ‘small’ expresses a negative opinion and ‘okay’ represents a neutral
opinion. If the NLP rules cannot identify the polarity of opinion words, a default sentiment opinion lexicon such as SentiWordNet is invoked to replace the opinion words with synonyms [41]. For example, “restaurant is okay, but lots of variety” is converted to “restaurant is okay but great quantity”. The word ‘great’ is already declared as a positive opinion word in the domain ontology, so it represents a positive opinion of the restaurant. Sometimes, a consumer review sentence comprises many product features (f1, ..., fm) and opinion words (w1, ..., wn). The opinion aggregation formula is used to calculate the opinion word score of feature fi in sentence sk [14]. This article proposes a semantic score of 1 for positive opinion words and −1 for negative opinion words, which is assigned to a feature sentence during the processing of reviews.
Type-2 fuzzy ontology attributes are used as features in the fuzzy domain ontology; these are the hotel features {room, restaurant, free Wi-Fi, wine cellar, air conditioning, luggage room, concierge service, dry cleaning service, in room breakfast, non-smoking room, swimming pool, nightclub, business room, health and laptop computer available}. Each feature has its own polarity words. For example, room polarity is {‘clean’, ‘big’ and ‘good’ are opinion words with polarity value p = 1, ‘dirty’ and ‘small’ are opinion words with polarity value p = −1, and ‘medium’ and ‘normal’ are opinion words with polarity value p = 0}. Similarly, restaurant polarity is {‘good’, ‘excellent’, ‘great’ and ‘satisfactory’ with polarity value p = 1, ‘bad’ and ‘poor’ with polarity value p = −1, and ‘okay’ with polarity value p = 0}.
SentiWordNet is employed to determine the initial sentiment value of the corresponding opinion words [2, 12, 25]. For example, the SentiWordNet score for the opinion word “good” (used to describe the quality of the room) is 0.5 as an adjective, 0.531 as a noun and 0.188 as an adverb [20].
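The per-feature polarity words listed above lend themselves to a simple lookup. The following is a minimal sketch that validates features against the two polarity sets given in the text and scores each feature. Averaging the word polarities is an assumption made for illustration and is not the opinion aggregation formula of [14]; the names `POLARITY` and `score_features` are hypothetical, and skipping unknown opinion words stands in for the SentiWordNet fallback.

```python
# Polarity words per feature, copied from the room and restaurant sets in the text.
POLARITY = {
    "room": {
        "clean": 1, "big": 1, "good": 1,
        "dirty": -1, "small": -1,
        "medium": 0, "normal": 0,
    },
    "restaurant": {
        "good": 1, "excellent": 1, "great": 1, "satisfactory": 1,
        "bad": -1, "poor": -1,
        "okay": 0,
    },
}


def score_features(pairs):
    """Map each known feature to the mean polarity of its opinion words.

    pairs: iterable of (feature, opinion_word) tuples.
    Features without a polarity set are discarded; unknown opinion words
    are skipped (assumption: a real system would consult SentiWordNet).
    """
    collected = {}
    for feature, word in pairs:
        lexicon = POLARITY.get(feature)
        if lexicon is None or word not in lexicon:
            continue
        collected.setdefault(feature, []).append(lexicon[word])
    # Mean polarity per feature: an illustrative choice, not the formula of [14].
    return {f: sum(values) / len(values) for f, values in collected.items()}


pairs = [
    ("room", "clean"),
    ("room", "small"),
    ("restaurant", "okay"),
    ("restaurant", "great"),   # 'lots of variety' after synonym replacement
    ("lobby", "nice"),         # not a known feature, so it is discarded
]
print(score_features(pairs))
```

Under these assumptions the room's positive and negative words cancel out, while the restaurant ends up mildly positive.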
These SentiWordNet scores are used to label opinion words as positive, negative or neutral. The opinion word “good” is used as an adjective, and its sentiment intensity value, 0.531, is used as a linguistic value in the ontology. The fuzzy sentiment variable of the reviews is computed using the Fuzzy OWL plugin of Protégé 4.3. This semi-automatic tool creates a fuzzy data type, fuzzy concepts and a fuzzy modifier. The Fuzzy OWL plugin also uses annotations to define linguistic polarity terms. The polarity terms are strong positive, positive, neutral, negative and strong negative; their range is [−1, 1]. The polarity range is normalized using min-max normalization and mapped to a range of [0, 1] as described in [21]. The polarity terms are computed using the following rules.
If (feature polarity > 0.75 and feature polarity ≤ 1) then opinion = “strong positive”
Else
If (feature polarity > 0.5 and feature polarity ≤ 0.75) then opinion = “positive”
Else
If (feature polarity = 0.5) then opinion = “neutral”
Else
If (feature polarity > 0.25 and feature polarity