2009 Ninth IEEE International Conference on Advanced Learning Technologies
A Practical Model For Conceptual Comparison Using A Wiki

David Webster, Jie Xu
University of Leeds
{dwebster/jxu}@comp.leeds.ac.uk

Darren Mundy, Paul Warren
University of Hull
{d.mundy/p.j.warren}@hull.ac.uk
Abstract

One of the key concerns in the conceptualisation of a single object is understanding the context under which that object exists (or can exist). This contextual understanding should provide us with a clear conceptual identification of an object, including implicit situational information and detail of surrounding objects. For example, in learning terms, a learner should be aware of concepts related to the context of their field of study and of a surrounding cloud of contextually related concepts. This paper explores the use of an evolving, community-maintained knowledge base (that of Wikipedia) in order to prioritise concepts that are semantically relevant to the user's interest space.
1. Introduction - Wikipedia Structure

Wikipedia, launched in 2001, is an online wiki-based encyclopedia. Wikipedia allows web users to create pages for concepts, to edit those pages, and to interlink them, enabling the democratic, social creation and editing of content. Whilst the accuracy of Wikipedia articles may be disputed [1], there is at least a social process for creating a socially challenged account of the concept being written about in the page. An important focus for our work is that Wikipedia provides an information space of articles, each based on a concept; for example, the article for telephones is called Telephone. This pattern can be described as "one URI per concept", whereby concepts are consistently represented by a URI. It is important to note that this pattern bears a close resemblance to the way Semantic Web languages, for example RDF, identify resources. Wikipedia provides a consistent access pattern for concept pages that allows the URI to be dereferenced. Taking the telephone example, the resource URI for the concept would be "http://en.wikipedia.org/wiki/Telephone". In addition to this pattern, Wikipedia provides the ability to create redirects for synonyms or items that have been renamed. Wikipedia therefore provides a controlled vocabulary in addition to the one URI per concept. Using Wikipedia, we can develop a method to access and probe Wikipedia concept pages.
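As an illustration of this access pattern, the sketch below shows one way a concept URI could be dereferenced from Java, with redirects (for synonyms or renamed items) followed automatically. The class and method names, and the direct use of the /wiki/ URL pattern, are our own illustrative assumptions rather than part of Wikipedia's interface.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative sketch: dereference a Wikipedia concept URI ("one URI per concept").
public class ConceptPageFetcher {

    private final HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)   // resolves redirect pages (synonyms, renames)
            .build();

    // Build the concept URI, e.g. "Telephone" -> http://en.wikipedia.org/wiki/Telephone
    public String fetchConceptPage(String concept) throws Exception {
        String url = "http://en.wikipedia.org/wiki/" + concept.replace(' ', '_');
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();   // page content to be probed for linked concepts
    }
}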
2. Existing Work in Wiki Concept Relatedness

In recent years there has been a small body of work that has utilized Wikipedia in order to measure concept similarity. Our first example of this work comes from Strube and Ponzetto [2], who use Wikipedia to aid word sense relatedness calculation. To achieve this they used the category taxonomy provided by Wikipedia together with standard tree-based semantic similarity measures. A limitation of this method is that it relies on the taxonomy structure of the Wikipedia categories. A second method in wiki-based semantic similarity comes from Gabrilovich and Markovitch [3]. Their work is based upon text categorization, assigning a document to the concept label(s) represented by that document; the motivation behind the technique is that related concepts can then be inferred from a given concept. The technique performs a statistical analysis of the words used within a Wikipedia page and uses an inverted index to map words to the articles in which each word appears. Their work pays limited attention to the linked structure of Wikipedia, in that the number of incoming links is used only to express a preference for an article. The work proposes spidering articles related to the source article in order to enrich it, acknowledges that relations between concepts can be derived by leveraging the cross-linking between wiki concepts, and suggests this as an avenue for further work.
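To make the inverted-index idea concrete, the following minimal sketch (our own illustration, not the implementation described in [3]) maps each word to the set of article titles in which it occurs.

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal illustration of an inverted index from words to article titles,
// in the spirit of the approach described in [3]; not their implementation.
public class InvertedIndex {

    private final Map<String, Set<String>> wordToArticles = new HashMap<>();

    // Record every word of an article's text against the article title.
    public void indexArticle(String title, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                wordToArticles.computeIfAbsent(word, w -> new HashSet<>()).add(title);
            }
        }
    }

    // Articles whose text contains the given word; empty set if the word is unseen.
    public Set<String> articlesFor(String word) {
        return wordToArticles.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }
}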
3. Our Proposed Solution

Our method is based upon the notion, introduced in Section 2, of spidering a graph of related concepts from a Wikipedia URI. The theory behind this method is that the set of concepts immediately linked to by a focus concept is more contextually related to that focus concept than concepts further away in the concept graph. We can, therefore, aggregate these concepts and apply them, in addition to the focus concept, as a set of keywords used in information filtering in order to identify related concepts. A limitation of this method, when viewed in the context of Wikipedia articles, is that all linked articles are aggregated without discrimination. The method therefore captures a broad and shallow collection of related concepts, which we demonstrate later in the paper to be useful for contextual filtering; a minimal sketch of this one-level expansion is given below.
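The sketch below illustrates, under our own naming assumptions, how the one-level expansion could be realised: the titles of the concepts linked to from the focus concept's page are collected and, together with the focus concept itself, form the keyword set used for filtering. Link extraction is shown against raw wiki markup ([[...]]); this is an illustrative sketch rather than the code of our testbed.

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of one-level concept expansion and keyword filtering.
public class ConceptExpander {

    // Matches wiki links of the form [[Target]] or [[Target|label]].
    private static final Pattern WIKI_LINK = Pattern.compile("\\[\\[([^\\]|#]+)");

    // The focus concept plus every concept its page links to become keywords.
    public Set<String> expand(String focusConcept, String pageMarkup) {
        Set<String> keywords = new LinkedHashSet<>();
        keywords.add(focusConcept);
        Matcher m = WIKI_LINK.matcher(pageMarkup);
        while (m.find()) {
            keywords.add(m.group(1).trim());   // each linked concept title becomes a keyword
        }
        return keywords;
    }

    // A news title passes the filter if it mentions the focus concept or any linked concept.
    public boolean matches(String newsTitle, Set<String> keywords) {
        String title = newsTitle.toLowerCase();
        return keywords.stream().anyMatch(k -> title.contains(k.toLowerCase()));
    }
}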
4. Evaluation of Method

A simple web feed aggregator, which we call DAVe's Rss Organization System, has been built using the Java programming language as a testbed environment for the specific tests used within this investigation. The idea behind the experiments is to demonstrate, through news feed items, that the wiki tool can provide a framework to aid concept comparison. In order to evaluate the effectiveness of the method we focus on the recall of the system, "the proportion of relevant material actually retrieved in answer to a search request" [4], and the fallout, "an estimate of the conditional probability that an item will be retrieved given that it is non-relevant" [4]. Four experiments were designed to test the solution.
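For concreteness, the two measures can be computed from simple counts over the filtered news titles, as in the sketch below; the class and parameter names are our own illustrative choices.

// Illustrative computation of the two evaluation measures from [4].
public final class FilterMeasures {

    // Recall: proportion of relevant news titles that the filter actually retrieved.
    public static double recall(int relevantRetrieved, int totalRelevant) {
        return totalRelevant == 0 ? 0.0 : (double) relevantRetrieved / totalRelevant;
    }

    // Fallout: proportion of non-relevant news titles that the filter retrieved anyway.
    public static double fallout(int nonRelevantRetrieved, int totalNonRelevant) {
        return totalNonRelevant == 0 ? 0.0 : (double) nonRelevantRetrieved / totalNonRelevant;
    }
}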
Experiment A - The control test was constructed by selecting a news feed for a given focus topic/concept. To create a baseline measure, we applied a keyword filter that discards news titles not containing the focus concept keyword. The number of news titles returned forms the baseline against which subsequent experiments are compared.

Experiment B - Our second experiment involves parsing links from the Wikipedia page for a given topic. All the links contained within the Wikipedia page are captured from the wiki concept markup and converted into keywords. Note that for this experiment the linked pages are not spidered and redirects are ignored. Due to the low resource usage of this experiment, we refer to it as a shallow filter.

Experiment C - In this experiment we use the experimental template from Experiment B, but instead of performing a shallow text-based extraction of concepts from the Wikipedia markup, we attempt to resolve (dereference) the concepts and extract the title from the resolved pages. For each link, the target page is read and, if needed, redirects are resolved so that the page name can be retrieved (a minimal resolution sketch is given after Experiment D).

Experiment D - This experiment builds on Experiment C and explores the effect of spidering two levels deep on the retrieved concepts. Due to computational resources it is not feasible to spider more than one layer deep over the whole page; instead, the concept scanning is limited to the introduction section.
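The following sketch shows one way the per-link resolution of Experiment C could be performed: each captured link is dereferenced with redirects followed, and the resolved page name is recovered from the final URI. The class name and URL handling are our own assumptions and not the code of the testbed.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative per-link resolution: dereference a linked concept, follow any
// redirect, and recover the resolved page name from the final URI.
public class ConceptResolver {

    private final HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    public String resolveTitle(String linkedConcept) throws Exception {
        String url = "http://en.wikipedia.org/wiki/" + linkedConcept.replace(' ', '_');
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        String path = response.uri().getPath();        // e.g. "/wiki/Telephone" after redirects
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.replace('_', ' ');                 // resolved page name, usable as a keyword
    }
}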
5. Findings and Further Work

On analysis of the recall and fallout averages across filtering methods, we observed the following:

• All methods tested improved the recall of news-topic filtering over the baseline.
• Spidering the Wikipedia page introduction section to two levels deep produced an improvement in recall over the one-level equivalents. However, this method also produced a higher average fallout than all other methods tested.
• Partitioning the Wikipedia page improved the fallout of the filter over the full-page equivalent filter. However, this partitioning produced a recall value only marginally above the baseline keyword filter and below that of the full-page equivalents.

Our findings demonstrate the effectiveness of our wiki-based concept expansion method. There is scope for improvement through further work on a Wikipedia page partitioning optimisation strategy, and on weighting concepts based on their spider depth.
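As a purely illustrative note on the final point, one possible depth-weighting scheme (our own example, not an evaluated design) would simply discount a concept's contribution to a match by how far from the focus concept it was found:

// Purely illustrative depth weighting for the further work described above:
// concepts found further from the focus concept contribute less to a match score.
public final class DepthWeighting {
    public static double weight(int spiderDepth) {
        return 1.0 / (1 + spiderDepth);   // depth 0 = 1.0, depth 1 = 0.5, depth 2 ≈ 0.33, ...
    }
}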
6. References

[1] M. Hepp, K. Siorpaes, and D. Bachlechner. Harvesting wiki consensus: Using Wikipedia entries as vocabulary for knowledge management. IEEE Internet Computing, 11(5):54–65, 2007.
[2] M. Strube and S. P. Ponzetto. WikiRelate! Computing semantic relatedness using Wikipedia. In Proceedings of AAAI '06, pages 1419–1424, July 2006.
[3] E. Gabrilovich and S. Markovitch. Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge. In Twenty-First AAAI Conference on Artificial Intelligence, 2006.
[4] C. J. Van Rijsbergen. Information Retrieval, 2nd edition. Dept. of Computer Science, University of Glasgow, 1979.