Sep 30, 2008
The BOEMIE Semantic Browser: An Application Exploiting Rich Semantic Metadata

Sofia Espinosa Peraldi, Atila Kaya, Ralf Möller

Abstract: Recent advances in the standardization of knowledge representation languages for the Semantic Web, together with advances in natural language processing techniques, have led to increased commercial efforts to create so-called annotation services, which annotate web content with metadata. Description Logics-based metadata are symbolic descriptions of some content, e.g., web content, whose semantics are defined through so-called ontologies. Ontologies play a central role in defining semantics in the context of the Semantic Web. Currently, annotation services can be exploited to build content-driven applications. However, for such applications to offer more valuable functionality, annotation services need to provide richer metadata. Rich metadata describe a deeper and more abstract level of information through complex relational structures. In this paper we describe use cases that require richer metadata than annotation services currently provide, and we describe a content-driven application that implements them. The use cases focus on geography-aware multimedia navigation, content activation, and retrieval of semantically related multimedia. Furthermore, a declarative process for extracting richer metadata is introduced.
1 Introduction
The Semantic Web vision promises to pave the way for the development of content-driven applications and services. To this end, the semantics of content in the Semantic Web should be available. Nowadays the majority of information resides in web pages. Typically a web page has rich media content and includes visual and/or audio information in addition to textual information. In the last decade, information retrieval from the Web has become the underlying basis of daily information access. At the same time, natural language processing techniques have reached a level of maturity at which text analysis tools with good performance are widely used in practical applications. Analysis tools such as Ellogon¹ [FPT+08] provide a comprehensive infrastructure for natural language processing (NLP), including support for principal tasks such as information extraction (IE) and machine learning (ML). Facilitated by these advances, commercial interest has arisen in the creation of annotation services. Companies such as ClearForest² and Reuters³ provide annotation services, e.g., OpenCalais, that annotate textual content with various entities such as persons, organizations, and locations, as well as facts, events, and generic relations. Some information portals already contain web pages that exploit metadata to highlight particular words in the text or to present advertisements. These can be considered forerunners of upcoming, more ‘intelligent’ content-driven applications that can exploit rich metadata to offer more valuable services.

The success of current annotation services is a positive indicator of progress towards the objectives of the Semantic Web. However, the metadata obtained by current annotation services is not sufficient for content-driven applications to provide more valuable functionality. For this, richer metadata is required. Rich metadata means symbolic descriptions that describe a deeper level of information, i.e., more abstract entities and the relations among them. Rich metadata is also multimodal; that is, it describes the content of multimedia documents.

This paper has two main contributions. First, we describe use cases for ‘intelligent’ content-driven applications that exploit richer metadata automatically extracted from multimedia documents. Furthermore, we describe the BOEMIE Semantic Browser (BSB), a content-driven application that implements the use cases as a proof of concept for the practical exploitation of rich semantic metadata. Second, to clarify what is meant by rich semantic metadata, we describe a framework for its extraction.

The rest of this paper is organized as follows. In Section 2, the approach followed for the generation of rich semantic metadata is introduced. In Section 3, use cases that demonstrate the advantages of exploiting rich semantic metadata are discussed together with examples and their implementation in BSB. The architecture underlying the BSB is the topic of Section 4. Finally, we conclude this work in Section 5.

¹ www.ellogon.org
² www.clearforest.com
³ www.reuters.com
2 An Ontology-based Framework for the Extraction of Rich Semantic Metadata
The framework for the extraction of rich semantic metadata described here is called the BOEMIE framework. The acronym BOEMIE stands for Bootstrapping Ontology Evolution with Multimedia Information Extraction. The framework integrates a set of modality-specific analysis tools and a modality-independent interpretation and fusion engine. In this framework, a multimedia document is processed in three phases of metadata extraction. Since extraction techniques are not the central topic of this paper, references are given where necessary.

First phase: Analysis

In the first phase, modality-specific analysis takes place. Here the structure of the multimedia document is analyzed to identify the different modalities of content that it contains, such that the document is divided into modality-specific content items. For example, a web page containing text and two images is divided into three content items: one text item and two image items. Each content item is analyzed with the corresponding modality-specific analysis tool. The analysis tools generate symbolic descriptions as metadata, whose semantics are defined through ontologies. The framework integrates three OWL DL ontologies: two domain-specific ones, the Athletics Event Ontology (AEO)⁴ [DEDT07] and the Geographic Information Ontology (GIO)⁵ [DEDT07], and a third that addresses the structural aspects of a multimedia document, the Multimedia Content Ontology (MCO)⁶ [DEDT07].

The resulting metadata describes surface-level information. Surface-level information consists of observable entities, e.g., objects in visual content or words in non-visual content, and observable relations between the entities, e.g., spatial relations between objects in an image or domain-specific relations between words in text. For illustration, consider the captioned image in Figure 1. Its analysis results consist of two sets of symbolic descriptions, one from the image content item and one from the textual content item. The results are shown in the upper part of the graphic in Figure 2.

⁴ http://repository.boemie.org/ontology repository tbox/aeo-1.owl
⁵ http://repository.boemie.org/ontology repository tbox/gio-1.owl
Figure 1: A captioned image from a web page about athletics news
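As a minimal sketch of the analysis phase described above, the following Python fragment divides a document into modality-specific content items and attaches surface-level symbolic descriptions to one of them. It is an illustration only and not the BOEMIE implementation: the names ContentItem and split_into_content_items, the individual names face1 and body1, and the relation name adjacentTo are assumptions made for this example. Only the concept names PersonFace and PersonBody are taken from the athletics example in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """One modality-specific part of a multimedia document (hypothetical type)."""
    item_id: str
    modality: str  # e.g. "text" or "image"
    concept_assertions: list = field(default_factory=list)   # (individual, concept)
    relation_assertions: list = field(default_factory=list)  # (subject, relation, object)


def split_into_content_items(doc_id, modalities):
    """Create one content item per modality-specific part of the document."""
    return [
        ContentItem(item_id=f"{doc_id}-{index}", modality=modality)
        for index, modality in enumerate(modalities, start=1)
    ]


# A web page with one text block and two images yields three content items.
items = split_into_content_items("page", ["text", "image", "image"])

# Assumed analysis output for the first image: two observable objects and
# one observable spatial relation between them (relation name is made up).
image_item = items[1]
image_item.concept_assertions += [("face1", "PersonFace"), ("body1", "PersonBody")]
image_item.relation_assertions.append(("face1", "adjacentTo", "body1"))

for item in items:
    print(item.item_id, item.modality, len(item.concept_assertions))
```

Keeping concept assertions and relation assertions in separate lists mirrors the distinction drawn above between observable entities and observable relations between them.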
Notice that the term observable refers to the extraction capabilities of the analysis tools. In this example, image analysis can extract a person’s face and a person’s body, but more sophisticated image analysis techniques with face recognition could also extract the name of the person. The analysis tools are discussed in more detail in [PTK+06].

Second phase: Interpretation

In the second phase, modality-specific interpretation takes place. The interpretation process, which is described in more detail in [PKM+07], is a generic and declarative process that can be used for various domains of interest. The domain of interest is described through ontologies and rules, which constitute the background knowledge of the interpretation process. The process uses the Description Logics (DLs) reasoner RacerPro⁷ to perform abductive and deductive reasoning over the results of the previous phase, such that metadata describing more abstract information is obtained. We call this abstract information deep-level information. The metadata is expressed through symbolic descriptions that we call aggregates. Aggregates have been defined by Neumann and Möller [NM06] in the context of DLs as representational units for object configurations consisting of a set of parts tied together to form a concept and satisfying certain constraints w.r.t. background knowledge.

⁶ http://repository.boemie.org/ontology repository tbox/mco-1.owl
⁷ www.racer-systems.com

Figure 2: A graphic representation of rich semantic metadata resulting from analysis, interpretation and fusion extraction phases of Figure 1

In our athletics example (see the central-right part of the graphic in Figure 2), image interpretation extracts two aggregates: a PoleVault trial and a Person. The aggregate PoleVault has three parts: the aggregate Person and two pieces of sports equipment, a HorizontalBar and a Pole. The aggregate Person has two parts, a PersonFace and a PersonBody. Thus the result of interpretation is a set of relational structures representing aggregates that have other aggregates as parts. This configuration of relational structures has a specific advantage: it allows information to flow from aggregate to aggregate according to the semantics described in the background knowledge. Such information flow helps to increase the precision of the extracted metadata. For example, consider the following formula from the AEO ontology: PoleVault ⊑ SportsTrial ⊓ ∀