A Graph Clustering Approach to Product Attribute Extraction

Santosh Raju, Praneeth Shishtla, and Vasudeva Varma

In: 4th Indian International Conference on Artificial Intelligence, Tumkur (near Bangalore), India
Report No: IIIT/TR/2009/252
Centre for Search and Information Extraction Lab, International Institute of Information Technology, Hyderabad - 500 032, India
December 2009

Search and Information Extraction Lab, International Institute of Information Technology, Hyderabad, India
[email protected], [email protected], [email protected]
http://search.iiit.ac.in
Abstract. This work focuses on attribute extraction from product descriptions. We propose a novel solution to extract the attributes of a product from a set of text documents. A graph is constructed from the text using word co-occurrence statistics. We compute word clusters and extract attributes from these clusters using graph-based methods. Our solution achieves nearly 80% precision and 45% recall. Experiments show that the methods employed are effective in identifying attributes across different dataset sizes.
1 Introduction
Recent trends on the web show a rapid expansion of e-commerce, with millions of transactions taking place on a wide range of products. An online shopper willing to buy a product has to go through its description on the website to learn its features. Often there are many varieties, and it is painful for a consumer to manually read all the descriptions to select a product. Manually creating feature lists is also a difficult and time-consuming task for e-commerce websites and search engines, with new products emerging every day. One solution to this problem is to automatically extract useful information from the product descriptions. In this paper, we deal with the problem of automatically extracting product attributes. We define an attribute as a tangible or intangible property or feature of the product. Given a set of descriptions for a product, we extract a set of attributes. A sample product description of a Digital SLR camera is shown in Fig. 1.

Fig. 1. A sample product description of a Digital SLR camera

As can be seen from the sample, descriptions contain snippets which are incomplete sentences or long phrases, such as power-up time of approximately 0.2 seconds, RAW and JPEG capture, and Includes 18-135mm AF-S DX Zoom-Nikkor lens. This makes the extraction task difficult. The challenge is to learn the characteristics of a product class from a small dataset that is sparse. We propose a graph clustering based approach to identify the attributes. Our solution is completely unsupervised and does not use any domain knowledge.

The rest of the paper is organized as follows. In Section 2, we discuss the related work. Section 3 explains the attribute extraction algorithm. We discuss our experiments and results in Section 4. Finally, we present future work and conclude in Section 5.
2 Related Work
Product attribute extraction is closely related to keyphrase extraction. Both tasks involve extracting phrases from a single input document or a set of documents. Keyphrase extraction is a more general problem which tries to identify the important concepts discussed in the documents. [1, 2] describe work on domain-specific keyphrase extraction, while [3] propose domain-independent and unsupervised approaches to the problem. Product attribute extraction is a more specific problem in which the input documents describe one or more products. It has the additional constraint that the extracted phrases should define a property of the product being discussed, which makes product attribute extraction a special and difficult problem.

Sentiment analysis deals with the classification of opinion text as positive or negative. [4, 5] present various solutions for sentiment analysis from customer reviews, which are a huge resource of opinion content. [6-8] present techniques to extract product features from customer reviews. Product feature extraction from descriptions poses different challenges: numerous reviews are available for each product, whereas product descriptions are few in number and the text is sparse. [6] mine frequently co-occurring words in phrases to find product features using association rule mining. [7] present techniques based on frequently occurring patterns in reviews to extract product features. Patterns of this kind are rare in product descriptions, which makes the task challenging.

Recently, a few approaches [9, 10] were proposed to extract attributes from product descriptions. A semi-supervised algorithm is presented in [9] which tags attributes and values in sentences from the text descriptions of multiple products of a domain. We, in contrast, focus on extracting attributes specific to a product. Moreover, our unsupervised technique can extract attributes from small datasets containing fewer than 50 descriptions, whereas their approach requires a relatively large dataset belonging to a domain. [10] proposed a method based on clustering of noun phrases to extract product attributes. They cluster noun phrases from the descriptions and extract
an attribute from each of these clusters, but reported only a very small-scale experiment. We explore a new approach in which we group words from the descriptions using graph clustering techniques.
3 Attribute Extraction
Our approach to attribute extraction is based on the following hypothesis: in the description collection of a product, since attribute terms repeat across multiple descriptions, they are more likely to occur than other terms. We try to exploit this redundancy to capture the attributes. The simplest way to select attributes would thus be to take the most frequent terms in the collection. However, this method has a drawback: it finds only frequent attributes and is likely to miss rare attributes that appear in only a few products. To overcome this problem, we propose a two-stage method. In the first stage, we cluster all the words found in the documents such that all the words close to an attribute are grouped together in a single cluster. This results in word clusters of different sizes. In the second stage, which is explained in Section 3.4, we extract an attribute from each cluster.

A preliminary observation of the product descriptions showed that an attribute and its corresponding values usually co-occur in noun compounds. So we represent the documents as a word co-occurrence graph, which exhibits the small world property; we give more details about this property in Section 3.1. This motivates us to use a graph clustering algorithm to group all the words related to an attribute into a single cluster. We use the Chinese Whispers algorithm, which has been used to cluster graphs exhibiting the small world property, and explain it in Section 3.3. We then extract an attribute from each of these clusters. We explain the graph clustering and attribute extraction methods in the following sub-sections.
3.1 Small World Property
A graph which is characterized by the presence of densely connected sub-graphs, and in which there exists a path between most pairs of nodes, is said to possess the small world property. Most of the nodes need not be neighbors of one another, but each can be reached from every other node by a small number of hops. The densely connected nodes share a common property; when mapped to a social network, they represent the communities formed by people. In social networks, two people may not know each other directly, but it is possible that both are connected through common acquaintances [11]. Many other graphs are found to exhibit the small world property; examples include road maps, food chains, electric power grids, neural networks, voter networks, telephone call graphs, and social influence networks. We refer the reader to [11] for more details on the dynamics and structural properties of small world graphs. According to Ferrer i Cancho and Solé, word co-occurrence graphs also possess the small world property [12]. The graph built from the product
descriptions is such a co-occurrence graph, so it too can be expected to possess the small world property. We now describe how the text is modelled as a graph in Section 3.2.
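As a quick empirical sanity check (not part of the paper's method), one can compare the clustering coefficient of such a graph against a size-matched random graph, for example with the networkx library; small world graphs cluster far more strongly while keeping short average path lengths:

    import networkx as nx

    def looks_small_world(G):
        # Compare clustering against a random graph with the same number
        # of nodes and edges.
        n, m = G.number_of_nodes(), G.number_of_edges()
        cc = nx.average_clustering(G)
        cc_random = nx.average_clustering(nx.gnm_random_graph(n, m, seed=0))
        # Path length is defined only on a connected graph, so use the
        # largest connected component.
        giant = G.subgraph(max(nx.connected_components(G), key=len))
        path_len = nx.average_shortest_path_length(giant)
        return cc, cc_random, path_len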
3.2 Graph
Let D be a set of descriptions describing different varieties of a product. We tag these descriptions with POS tags using Brill's tagger and obtain all the noun phrases. We represent these phrases in a weighted, undirected graph G = (V, E), where each vertex v_i ∈ V represents a distinct word in the document collection D and each edge (v_i, v_j, w_{i,j}) ∈ E represents the co-occurrences between a pair of words. Since a noun phrase typically describes a single attribute, we limit the context to the boundaries of the noun phrase: we say that two words co-occur if they occur within the same noun phrase. Using complete noun phrases in this way helps us capture the context better than a fixed-window approach. The weight w_{i,j} of an edge is the number of co-occurrences between the pair of words represented by vertices v_i and v_j. The neighborhood N(v_i) of a vertex v_i is defined as the set of all nodes v_j ∈ V connected to v_i, i.e., (v_i, v_j, w_{i,j}) ∈ E or (v_j, v_i, w_{i,j}) ∈ E. We build an adjacency matrix A from the graph G and identify the densely connected nodes in the graph using the Chinese Whispers algorithm.
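As an illustration, the following is a minimal Python sketch of this construction, assuming POS tagging and noun phrase chunking have already been done (the function and variable names are ours, not from the paper):

    from collections import defaultdict
    from itertools import combinations

    def build_cooccurrence_graph(noun_phrases):
        # noun_phrases: iterable of token lists, one per chunked noun phrase.
        # Returns a dict mapping each word to {co-occurring word: count}.
        graph = defaultdict(lambda: defaultdict(int))
        for phrase in noun_phrases:
            for wi, wj in combinations(sorted(set(phrase)), 2):
                graph[wi][wj] += 1  # edge weight = number of noun phrases
                graph[wj][wi] += 1  # in which the two words co-occur

        return graph

    # Example with toy noun phrases:
    g = build_cooccurrence_graph([["lcd", "display"], ["cmos", "sensor"],
                                  ["lcd", "monitor"]])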
3.3 Chinese Whispers
Chinese Whispers (CW) [13] is an algorithm for partitioning the nodes of a weighted, undirected graph. It is motivated by a children's game in which children whisper words to each other. Though the goal of the game is to derive a funny message from the original one, CW finds the groups of nodes that share a common property: all the nodes that broadcast the same message fall into a single cluster. Chinese Whispers is an iterative algorithm which works in a bottom-up fashion. It starts by assigning a separate class to each node. In each iteration, every node is assigned the strongest class in its neighborhood, i.e., the class whose edges to the current node have the highest total weight. This process continues until no further assignments are possible for any node in the graph. The pseudo code of the algorithm is given below.

Algorithm 1. Pseudo code for the Chinese Whispers algorithm

    initialize:
        for each v_i in V: class(v_i) = i
    while clusters change:
        for each v_i in V:
            class(v_i) = strongest class in N(v_i)
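The pseudo code can be turned into a short runnable sketch. The following Python implementation is ours, not the authors'; the function name chinese_whispers and the dict-of-dicts graph format are assumptions, and classes are scored by summed edge weight as in [13]:

    import random
    from collections import defaultdict

    def chinese_whispers(graph, iterations=80, seed=0):
        # graph: dict mapping node -> {neighbor: edge weight}, undirected.
        # Returns a dict mapping node -> class label (hard partition).
        rng = random.Random(seed)
        labels = {node: node for node in graph}  # every node in its own class
        nodes = list(graph)
        for _ in range(iterations):
            rng.shuffle(nodes)  # randomized update order, as in [13]
            changed = False
            for node in nodes:
                # Score each class in the neighborhood by total edge weight.
                scores = defaultdict(float)
                for neighbor, weight in graph[node].items():
                    scores[labels[neighbor]] += weight
                if scores:
                    best = max(scores, key=scores.get)
                    if labels[node] != best:
                        labels[node] = best
                        changed = True
            if not changed:  # almost-convergence: no label changed this pass
                break
        return labels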
Generally, the CW algorithm can result in either a soft or a hard partition. In our task, we use the hard partitioning variant of CW, i.e., each node
is assigned exactly one class. After obtaining the clusters, we proceed to the next step, where we extract the attributes represented by the clusters. Since the CW algorithm does not formally converge, it is important to define a stop criterion or to set the number of iterations. To show that only a few iterations are needed until almost-convergence, we conducted an experiment to see how the number of iterations affects the clusters formed by the CW algorithm. We chose four products, namely iPods, Violins, Dome Cameras and Digital SLRs, with 50 descriptions each. We ran the CW algorithm for 1 to 100 iterations and recorded the number of clusters formed. Figure 2 plots the number of clusters against the number of iterations. From the graph, we observe that after the first iteration the number of clusters is very high, equal to the number of unique tokens in the product descriptions (as the CW algorithm starts by assigning a different class to each token). During the initial few iterations, we see an exponential decrease in the number of clusters formed. For higher numbers of iterations, the difference between the cluster counts of consecutive iterations is minimal, so the algorithm reaches an almost-converged state. For all subsequent experiments, we fixed the number of iterations at 80.
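This experiment can be replicated with the sketches above; the snippet below assumes noun_phrases holds the chunked noun phrases of one dataset and reuses the hypothetical build_cooccurrence_graph and chinese_whispers helpers:

    # Run CW with an increasing iteration budget and record how many
    # clusters remain after each run.
    graph = build_cooccurrence_graph(noun_phrases)
    cluster_counts = []
    for iters in range(1, 101):
        labels = chinese_whispers(graph, iterations=iters, seed=0)
        cluster_counts.append((iters, len(set(labels.values()))))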
Fig. 2. No. of clusters formed with varying iterations
3.4 Extraction
An attribute can be a single word (monitor, zoom) or a multi-word phrase (water resistant, shutter speed). A preliminary observation of the descriptions revealed that attributes are usually composed of at most three words, so we consider only n-grams up to length 3 as candidate attributes. In English, concepts are often expressed not as single words but as noun compounds, and this behavior is also noticeable in product descriptions. Moreover, attribute-value pairs tend to occur together in a single noun compound, with the value occurring first, followed by the attribute at the head noun. For example, in the phrases "LCD display" and "CMOS sensor", the attributes occur at the head noun (display, sensor) and are immediately preceded by values (LCD, CMOS). So the chance of finding an attribute decreases with its distance from the head noun.
Fig. 3. Sample sub-graphs
In order to capture these patterns, we construct a directed graph G_d = (V_d, E_d) from all the noun phrases found in the descriptions. Each distinct token t_i found in these phrases constitutes a node i ∈ V_d in the graph, and for each token t_i preceding t_j in a noun phrase, we draw an edge (i, j) ∈ E_d from i to j, i.e., an outlink from i and an inlink to j. Since a head noun is not followed by any other tokens, as shown in Fig. 3, an attribute node should have more inlinks and fewer outlinks. From each word cluster C, we pick the node a with the maximum difference between inlinks and outlinks (Equation 1). The token t_a represented by this node a is selected as the attribute if it has a minimum support S_a of 0.5, where support is defined in Equation 2. We do not pick any attribute from cluster C if S_a < 0.5.

    a = \arg\max_{i \in C} \left( \mathrm{inlinks}(i) - \mathrm{outlinks}(i) \right)    (1)

    S_a = \frac{\mathrm{inlinks}(a) - \mathrm{outlinks}(a)}{\mathrm{inlinks}(a)}    (2)
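For concreteness, the following is a minimal Python sketch of the selection rule in Equations 1 and 2; the helper names link_counts and pick_attribute are illustrative and not from the paper:

    from collections import defaultdict

    def link_counts(noun_phrases):
        # Count inlinks/outlinks in the directed graph G_d: for each token
        # preceding another inside a noun phrase, add an edge left -> right.
        inlinks, outlinks = defaultdict(int), defaultdict(int)
        for phrase in noun_phrases:
            for left, right in zip(phrase, phrase[1:]):
                outlinks[left] += 1
                inlinks[right] += 1
        return inlinks, outlinks

    def pick_attribute(cluster, inlinks, outlinks, min_support=0.5):
        # Equation 1: the node with the largest inlink-outlink difference.
        a = max(cluster, key=lambda w: inlinks[w] - outlinks[w])
        if inlinks[a] == 0:
            return None  # support is undefined without inlinks
        # Equation 2: support of the candidate attribute.
        support = (inlinks[a] - outlinks[a]) / inlinks[a]
        return a if support >= min_support else None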
If all the inlinks to the node a come from a single node b, we take the bigram t_b t_a as the attribute instead of t_a; similarly, we take the trigram t_c t_b t_a if b in turn receives all its inlinks from a single node c. This helps us extract multi-word attributes like wood construction and pitch pipe (a sketch of this extension follows Table 1).

| Digital SLRs            | Acoustic Guitars    | Kettles           |
|-------------------------|---------------------|-------------------|
| accessories             | tuning machines     | limited warranty* |
| sensor                  | chord chart         | spout             |
| resolution              | deluxe semirigid    | switch            |
| improved autofocus*     | rosewood fretboard* | design            |
| style settings          | pitch pipe          | interior          |
| lcd monitor*            | finish              | housing           |
| screen protectors       | length              | shutoff           |
| display                 | strings             | gauge             |
| image stabilization     | wood construction*  | quarts            |
| image retouching        | bag                 | lid               |
| image processor         | strap button        | pouring           |
| optimization functions  | zipper closure      | windows           |
| lens care               | nut width           | indicators        |
| hdmi output             | fingerboard         | capacity          |
| noise reduction         | top                 | filter            |
| ccd                     | neck                | handle            |
| gadget bag              | package             | plastic           |
| tripod wase             | frets               | cord storage      |

* indicates a partial match

Table 1. Sample attributes extracted for 3 products
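Here is a sketch of the bigram/trigram extension described before Table 1, under the same assumptions; the predecessors and tokens mappings are hypothetical helpers, not from the paper:

    def expand_attribute(a, predecessors, tokens):
        # predecessors: dict mapping node -> set of nodes with an edge into it.
        # tokens: dict mapping node -> its surface token.
        phrase, node = [tokens[a]], a
        for _ in range(2):  # extend to at most a trigram
            preds = predecessors.get(node, set())
            if len(preds) != 1:  # extend only when all inlinks come
                break            # from a single node
            node = next(iter(preds))
            phrase.insert(0, tokens[node])
        return " ".join(phrase)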
4 Experiments
The data used in our experiments was collected from the Amazon website (http://www.amazon.com). We obtained datasets for 12 products, including Acoustic Guitars, iPods, Binoculars, Ceiling Fans, and Kettles. Each dataset consists of 50 text descriptions of the product. A product description is a text document representing a product belonging to that product class. A description typically contains 6 to 10 incomplete sentences, which vary in length from long to very short; sometimes a sentence is just a single noun phrase. These incomplete sentences explain different features of the product represented by the description. We evaluate the performance of attribute extraction based on the
precision and recall of the attributes extracted by our system. For this purpose we manually created a list of attributes for each product and use it as the gold standard for our evaluation.

Precision and Recall. Measuring precision and recall is not a straightforward job. Consider the phrase 3x optical zoom: both zoom and optical zoom could be considered attributes, and people often do not agree on what the correct attribute is. We therefore use the paradigm of full match and partial match presented by [9]. A match is considered fully correct if the extracted phrase completely matches a phrase in the gold standard list. A match is partially correct if the automatically extracted attribute completely contains one of the manually listed attributes. Any gold attribute that forms a full match or a partial match is considered recalled. (A sketch of this matching scheme follows Table 2.)

Table 1 shows sample attributes extracted from the product descriptions; attributes marked with * are partial matches with the gold standard attribute list. It is evident that our algorithm is effective at extracting both single-word attributes (resolution, sensor, strings, etc.) and multi-word attributes (image stabilization, noise reduction, strap button). Table 2 shows the precision and recall values of the system for the 12 products. We observe that the precision of the system is decent for most of the products. Our system was able to achieve a precision of 79% and a recall of 45% for 50 input descriptions.

| Products           | Full Precision | Partial Precision | Recall |
|--------------------|----------------|-------------------|--------|
| Acoustic Guitars   | 81.4           | 90.9              | 60.8   |
| iPods              | 37.5           | 65.2              | 45.0   |
| Binoculars         | 36.6           | 92.8              | 36.3   |
| Camcorders         | 40.0           | 68.7              | 38.0   |
| Ceiling Fans       | 48.1           | 92.3              | 62.0   |
| Deep Fryers        | 57.1           | 93.3              | 47.9   |
| Digital SLRs       | 41.8           | 77.2              | 39.4   |
| Dome Cameras       | 73.3           | 100.0             | 34.0   |
| Ice-Cream Machines | 43.7           | 63.6              | 32.0   |
| Kettles            | 56.6           | 84.6              | 47.2   |
| Violins            | 51.8           | 80.9              | 58.8   |
| Electric Guitars   | 42.0           | 50.0              | 53.5   |
| Overall            | 48.2           | 78.7              | 44.9   |

Table 2. Precision and recall values for 12 products
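A minimal Python sketch of the matching scheme described above, assuming attributes are plain strings (the function is illustrative, not the authors' evaluation code):

    def evaluate(extracted, gold):
        # extracted, gold: sets of attribute phrases (strings).
        full = {e for e in extracted if e in gold}
        # Partial match: the extracted phrase completely contains a
        # gold attribute without matching it exactly.
        partial = {e for e in extracted - full if any(g in e for g in gold)}
        full_precision = len(full) / len(extracted)
        partial_precision = (len(full) + len(partial)) / len(extracted)
        # A gold attribute is recalled if some extracted phrase fully or
        # partially matches it.
        recalled = {g for g in gold if any(g == e or g in e for e in extracted)}
        recall = len(recalled) / len(gold)
        return full_precision, partial_precision, recall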
Table 3 shows the performance of the system for various dataset sizes. The system performs well in terms of partial precision (70.79%) even for a small dataset containing only 10 documents. The recall for 10 documents is very low (10.69%) because the evidence provided by such a small dataset is
insufficient. As the dataset size is increased from 10 to 50, we see a considerable increase in the recall of the system: as the number of descriptions grows, more evidence becomes available, resulting in the extraction of more attributes. Our system is scalable and can be expected to perform even better on larger datasets.
| Dataset size                             | 10    | 20    | 30    | 40    | 50    |
|------------------------------------------|-------|-------|-------|-------|-------|
| Full Precision (Full Match)              | 45.83 | 50.0  | 47.58 | 47.73 | 48.26 |
| Partial Precision (Full & Partial Match) | 70.79 | 70.99 | 76.04 | 73.60 | 78.73 |
| Recall                                   | 10.69 | 20.37 | 30.05 | 35.99 | 44.99 |
| Total Attributes Extracted               | 73    | 137   | 198   | 235   | 286   |

Table 3. Average precision, recall values for various dataset sizes
5 Conclusions
In this paper, we presented a novel approach to the product attribute extraction problem using graph-based methods. We introduced a graph representation of the descriptions and used graph clustering methods to find the attributes. Our experiments show that the proposed techniques are capable of achieving good accuracy. A preliminary error analysis of the results indicated that they could be significantly improved by applying simple pruning techniques on top of the current methods. We plan to extend this work to extract values along with the attributes. The graph-based representation of product descriptions can also be useful in deriving relationships between attributes, thus facilitating the analysis of various products. As part of future work, we plan to perform experiments on different textual genres in order to check the effectiveness of the proposed methods. Finally, we would like to study the advantages and shortcomings of our extraction method more thoroughly by comparing its performance with existing supervised methods.
References

1. Turney, P.D.: Learning algorithms for keyphrase extraction. Inf. Retr. 2(4) (2000) 303-336
2. Wu, Y.f.B., Li, Q., Bot, R.S., Chen, X.: Domain-specific keyphrase extraction. In: CIKM '05: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, New York, NY, USA, ACM (2005) 283-284
3. Tomokiyo, T., Hurst, M.: A language model approach to keyphrase extraction. In: Proceedings of the ACL 2003 Workshop on Multiword Expressions, Morristown, NJ, USA, Association for Computational Linguistics (2003) 33-40
4. Wiebe, J., Wilson, T., Cardie, C.: Annotating expressions of opinions and emotions in language. Language Resources and Evaluation 39(2-3) (2005) 165-210
5. Balahur, A., Montoyo, A.: Multilingual feature-driven opinion extraction and summarization from customer reviews. In: NLDB '08: Proceedings of the 13th International Conference on Natural Language and Information Systems, Berlin, Heidelberg, Springer-Verlag (2008) 345-346
6. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: KDD '04: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM (2004) 168-177
7. Popescu, A.M., Etzioni, O.: Extracting product features and opinions from reviews. In: HLT '05: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Association for Computational Linguistics (2005)
8. Scaffidi, C., Bierhoff, K., Chang, E., Felker, M., Ng, H., Jin, C.: Red Opal: product-feature scoring from reviews. In: EC '07: Proceedings of the 8th ACM Conference on Electronic Commerce, New York, NY, USA, ACM (2007) 182-191
9. Ghani, R., Probst, K., Liu, Y., Krema, M., Fano, A.: Text mining for product attribute extraction. SIGKDD Explor. Newsl. 8(1) (2006) 41-48
10. Raju, S., Pingali, P., Varma, V.: An unsupervised approach to product attribute extraction. In: ECIR '09: Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, Berlin, Heidelberg, Springer-Verlag (2009) 796-800
11. Watts, D.J.: Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton University Press (1999)
12. Ferrer i Cancho, R., Solé, R.V.: The small world of human language. Proceedings of the Royal Society of London. Series B, Biological Sciences 268 (2001) 2261-2266
13. Biemann, C.: Chinese Whispers - an efficient graph clustering algorithm and its application to natural language processing problems. In: Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing, New York City, Association for Computational Linguistics (2006) 73-80