WWW-2008 Tutorial Proposal

Opinion Mining and Summarization on the Web

Presenter contact information:

Bing Liu
Department of Computer Science
University of Illinois at Chicago
851 S. Morgan (M/C 152)
Chicago, IL 60607-7053
Tel: 312-355-1318
Email: [email protected]

Objective:

The objective is to introduce the main tasks and techniques of opinion mining, and to encourage further research and development in this important area.

Duration:

3 hours

Scope:

1. Sentiment classification
2. Feature-based opinion mining and summarization
3. Mining opinions from comparative and superlative constructs
4. Opinion spam analysis

Relevance to WWW-2008 attendees: Discovering useful knowledge from user-generated content (reviews, forum posts and blogs) on the Web is an important problem. Opinions are a key type of knowledge in these sources. This tutorial topic is thus related to the following WWW-08 areas: data mining, search, semantic Web, and social networks.

Keywords: Sentiment analysis, opinion mining, opinion summarization, opinion spam.

Target audience: Researchers, practitioners and graduate students who are interested in extracting and mining opinions and sentiments from the user-generated content on the Web.

Prerequisite knowledge of audience: Basic knowledge of the Web, background in Computer Science or equivalent.

Will tutorial materials be provided to attendees?

Yes. There is no copyright issue.

Tutorial history (previous offerings of tutorial, if any): The tutorial has not been given at any conference. In my ACL-2007 tutorial on Web data mining, I spent 20 minutes giving a short introduction to the topic.

Presenter biography:

Bing Liu is a full professor at the Department of Computer Science, University of Illinois at Chicago (UIC). He obtained his PhD in Artificial Intelligence from the University of Edinburgh. He has published extensively in data mining, Web mining and text mining in leading conferences and journals, e.g., KDD, WWW, AAAI, SIGIR, ICML and TKDE. He recently wrote a textbook titled “Web Data Mining: Exploring Hyperlinks, Contents and Usage Data”, one chapter of which is on opinion mining. Liu also has extensive research experience in opinion mining, and has pioneered the research direction of feature-based opinion mining and summarization. As a result of this research, he has given many invited talks on the topic in academia and industry. Liu has served (or serves) as the deputy vice chair of the data mining track of WWW-2005, an associate editor of IEEE Transactions on Knowledge and Data Engineering, and an associate editor of SIGKDD Explorations. Currently, he serves as a program co-chair of KDD-2008 (14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). Further information about him can be found at http://www.cs.uic.edu/~liub.

Abstract:

Opinion mining, also known as sentiment analysis, has become an important research area in recent years because of its many interesting research problems and practical applications. To suit the WWW audience, I will focus on opinion mining from user-generated content on the Web, e.g., customer reviews, forum posts and blogs. It is now well recognized that user-generated content contains valuable information that can be exploited for many applications. Take customer reviews as an example. A potential buyer of a product usually wants to know the opinions of existing users before deciding to purchase it. A product manufacturer also wants to find consumer opinions about its products and those of its competitors. Such information can be used for marketing and product improvement. However, for many products, the number of reviews can be large. Some popular products get hundreds of reviews or more at some merchant sites. It is thus highly desirable to mine such opinions and produce a summary of them. The same can be said about forum posts and blogs. In this tutorial, I will introduce four main topics of opinion mining, i.e., sentiment classification, feature-based opinion mining and summarization, mining comparative and superlative sentences, and opinion spam. All parts of the tutorial will have a mix of research and industry flavor, addressing seminal research concepts and looking at the technology from an industry angle. Apart from researchers and graduate students, I particularly encourage practitioners from industry to participate because of the many important applications.

Description:

Textual information in the world can be broadly categorized into two types: facts and opinions. Existing research on text information processing has focused on mining and retrieval of facts, e.g., information retrieval, Web search, and many other text mining or natural language processing tasks. Little work was done on the processing of opinions until recently. Yet opinions are so important that whenever we need to make a decision, we want to hear others’ opinions. This is true not only for individuals but also for organizations.

One of the main reasons for the lack of study on opinions is that there was little opinionated text available before the Web. Before the Web, when one needed to make a decision, one typically asked friends and family for opinions. When an organization wanted to find the opinions of the general public about its products or services, it conducted opinion polls, surveys, and focus groups. With the Web, especially with the explosive growth of user-generated content in the past few years, the world has changed. Now, if you want to buy a product, you no longer need to ask friends and family because there are plenty of product reviews on the Web that give the opinions of existing users. A company no longer needs to conduct surveys, organize focus groups or employ external consultants in order to find consumer opinions or sentiments about its products and those of its competitors.

Finding such opinion sources and monitoring them, however, can still be very hard because of the large number of sources available on the Web, the huge amount of evaluative text, and the diversity of their content. In many cases, the opinions are hidden in long forum posts and blogs. It is very difficult for a human reader to find relevant sources, download them, read them, summarize them and organize them into some usable form. Thus, an automated opinion mining and summarization system is highly desirable. Opinion mining grows out of this need. Opinion mining is also a challenging text mining, Web mining and natural language processing problem. In this tutorial, I will focus on the following important topics:

1. Sentiment classification at the document level and the sentence level: Sentiment classification at the document level studies ways to classify an evaluative document (e.g., a product review) as expressing a positive or negative opinion of its author on an object or topic. Existing methods are mainly based on machine learning techniques and custom score functions. This task is useful when one wants to obtain a general feeling about a particular topic or object. At the sentence level, sentiment classification tries to classify each sentence as expressing a positive or negative opinion.

2. Feature-based opinion mining and summarization: This research goes further to identify what has been commented on by the author and whether the comments are positive or negative. Using product reviews as an example, this model performs the following tasks: (1) identify the product features (parts or attributes) that have been commented on by reviewers, and (2) determine whether the comments are positive or negative. The results of mining can be used to produce a structured summary from unstructured text reviews. This is a general model that also covers sentiment classification at both the document and the sentence level. I will introduce this model and also several techniques to perform the tasks.

3. Opinion mining from comparative and superlative sentences: Most opinion mining methods have focused on mining direct opinion statements, e.g., “the picture quality of camera X is not good”. However, there is another type of evaluation expressed as comparative or superlative sentences, e.g., “the picture quality of camera X is better than that of camera Y”, and “the picture quality of camera X is the best”. In this part of the tutorial, I will discuss some methods for mining opinions from comparative and superlative sentences.

4. Opinion spam analysis: This part discusses the trustworthiness of opinions on the Web. Because there is no quality control, anyone can write anything on the Web. This results in many low-quality reviews and, worse still, review spam. For example, if one wants to buy a product and sees that the reviews of the product are mostly positive, one is very likely to buy the product. If the reviews are mostly negative, one is very likely to choose another product. Positive opinions can thus result in significant financial gain and/or fame for organizations and individuals. This gives a strong incentive for opinion spam. I will discuss some analysis of spam activities in reviews.

All these tasks present major research challenges, and their solutions also have immediate real-life applications. The tutorial will start with a short motivation for opinion mining, followed by a presentation of the above problems and the current state-of-the-art techniques. Various examples will be given to help participants better understand how this technology can be deployed to help businesses. All parts of the tutorial will have a mix of research and industry flavor, addressing seminal research concepts and looking at the technology from an industry angle.
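To make the two tasks of the feature-based model (topic 2 above) more concrete, the following is a minimal Python sketch. It is only an illustration under strong assumptions and is not one of the techniques covered in the tutorial: the word lists, the feature set and the function names are all hypothetical, the product features are assumed to be known in advance rather than mined, and negation is handled by a very crude rule. It assigns each sentence a polarity with a simple lexicon-based score function and then counts positive and negative sentences per feature to form a structured summary.

```python
from collections import defaultdict

# Hypothetical toy lexicons; a real system would need far larger resources.
POSITIVE = {"good", "great", "excellent", "sharp", "amazing"}
NEGATIVE = {"bad", "poor", "terrible", "blurry", "short"}
NEGATIONS = {"not", "never", "no"}
# Hypothetical product features, assumed to be given rather than mined.
FEATURES = {"picture", "battery", "lens"}


def sentence_score(sentence):
    """Sum opinion-word polarities; a negation flips the next opinion word."""
    score, negate = 0, False
    for token in sentence.lower().replace(",", " ").split():
        word = token.strip(".!?\"'")
        if word in NEGATIONS:
            negate = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1 or 0
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score


def summarize(reviews):
    """Count positive and negative sentences for each mentioned feature."""
    summary = defaultdict(lambda: {"positive": 0, "negative": 0})
    for review in reviews:
        for sentence in review.split("."):
            score = sentence_score(sentence)
            if score == 0:
                continue  # no opinion detected in this sentence
            label = "positive" if score > 0 else "negative"
            words = {w.strip(".,!?\"'") for w in sentence.lower().split()}
            for feature in FEATURES & words:
                summary[feature][label] += 1
    return {feature: dict(counts) for feature, counts in summary.items()}


reviews = [
    "The picture quality is not good. The battery is great.",
    "Excellent lens. The battery life is short.",
]
print(summarize(reviews))
```

For the two sample reviews, the summary records one negative sentence for "picture", one positive and one negative sentence for "battery", and one positive sentence for "lens". A sketch like this cannot handle comparative sentences such as “camera X is better than camera Y”, which is one reason they are treated as a separate topic.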
