Creating portals using light-weight ontologies: a transformational approach

Marta Sabou
Vrije Universiteit Amsterdam

Abstract. Creating web-portals based on semantic representations is an active field of development these days. However, it is expensive to create high-quality semantic data, and it is difficult to create portals that satisfy a variety of user profiles. We propose a transformational method for building portals based on cheaper, light-weight ontologies. Our transformations use frequent ontological patterns to organise the content of portals supporting a variety of user needs and tasks. This paper presents the first results of experimenting with the method in an industrial scenario.

1 Introduction

With the advent of the information age, finding data in large and heterogeneous digital collections, especially on the web, is increasingly difficult. Several initiatives address this issue. The Semantic Web proposes to push the load of processing and finding information to computers by making web-data machine accessible [1]. Ontologies, defined as formal, explicit specifications of shared conceptualizations [2], play a key role in producing the meta-data. However, it is difficult and time-consuming to create high-quality (rich and rigorously formal) meta-data. It is unrealistic to expect that semantic data will be created manually, therefore significant effort is invested in (semi-)automatic ontology building. Such ontologies will be light-weight: restricted to hierarchies of terms, with very few formal definitions of the classes and very few properties between the classes. See [6] for a survey of AI methods that support automatic extraction of ontologies.

Portals, as logically organized gateways to large information sources, are a fashionable solution to information retrieval on intranets and on the web. The main issues in portal creation concern organizing the data and choosing navigation so that a variety of user profiles can be satisfied. Ontological meta-data can help in providing the content of the portals and deciding upon their structure and navigation [7, 10].

In the light of these two issues, we propose a portal building methodology that tolerates low ontology quality. This is achieved by transformations that use meta-data patterns to create several portal types for different user tasks. This paper reports on the first results obtained when integrating the idea in an industrial environment provided by Aidministrator¹, a leading company in Semantic Web development.

Section 2 describes the targeted scenario and the proposed method, which relies on the use of light-weight ontologies (section 3). Section 4 briefly describes the technology used for implementing a set of transformations (section 5). The last three sections present the user test, discuss related work and offer a summary of the paper.

¹ http://www.aidministrator.nl


2 The Method

Our target organizations possess large collections of electronic documents ranging from web pages describing their activity and products to internal reports and administration files. They need an easy-to-integrate solution that provides the right user with the right information in the proper presentation form. They also have light-weight knowledge structures or legacy information usable for automatic ontology extraction [6]. This cheap meta-data can be the basis for creating portals with different content and navigation that give access to already existing resources. For example, a theater, a holiday agency, a job agency and a bank all have both their data items and light-weight meta-data readily available.

As a solution to this scenario, we have developed a transformational approach for building portals using light-weight ontologies. We aim at developing a set of transformations that exploit frequently occurring ontological characteristics and patterns in order to organize portal content. It is important that the portals support a variety of user tasks.

We took an empirical approach in implementing the method by integrating it in a realistic application (section 4). First we analyzed some light-weight ontologies and identified common characteristics (section 3), then defined a set of transformations (section 5). By applying them to different ontologies and evaluating the usability of the portals in user tests, we gathered new ideas for other transformations and about ontology patterns.

The longer term goal is, on the one hand, to collect and group ontology patterns. On the other hand, we wish to enlarge and enhance the transformations so that the method covers a large set of ontologies and offers a variety of portal types. We strive to identify essential modules that can be combined so that new transformations are created. This paper presents the results of our experimental work in a structured way. This is a first stage in structuring our knowledge about the main components of our approach: ontology patterns and transformations.

3 Light-weight Ontologies

Uschold and Jasper [11] classify ontologies along two dimensions defined by the amount of contained meaning and the formality of the representation. According to this framework our ontologies are light-weight, containing a reduced meaning: a set of concepts related by hierarchical relations (taxonomies). The relationship usually denotes specialization, but is often a conglomerate of part-of and similar-subject-matter [11], making the ontologies sloppy, i.e. unfit for inference. Ambiguity is increased by the fact that the ontology mediates human to human communication, therefore a common understanding of concepts is assumed. These taxonomies are semi-formal, being expressed in an artificial, formally defined language (often based on XML). We have scraped such a light-weight ontology from the site of a holiday agency. It classifies the offers of the agency. We will use it as an example in this article.

Even in such light-weight ontologies, we can distinguish a number of frequently occurring patterns. We work with instantiated taxonomies that contain information both about the domain in general and about the particular data set. Vocabulary and hierarchy information is available about the domain.


For our example, in Figure 1, the classes of the ontology provide the terms used by the holiday agency, while their place in the hierarchy adds extra meaning to their common sense semantics: here "*" does not denote multiplication but indicates the quality of a room.

Figure 1: The example ontology.

Some ontologies accommodate several orthogonal classification dimensions. These are usually modeled by stating the classification criterion and then arranging the values of the dimension as its descendants. A criterion term is a modeling artifact and does not reflect any characteristics of an item. In our example, four dimensions are modeled, each introducing a criterion term. The values under Naar land (By country) denote the destination of the offers. The other dimensions refer to the characteristics of the accommodation: the number of rooms (Naar kamers), the quality (Naar klasse) and the number of persons (Naar personen).

Ontology classes can be incomplete and overlapping. A class is incomplete when it is not covered by the union of its subclasses. This means that not all its data items could be further classified. Classes that share items, but are not in a hierarchical relationship, overlap. We refer to a set of data items all belonging to the same set of classes as a cluster. The size of these clusters can be determined from the ontology. The next section illustrates these terms.

We group the terms discussed above in:
- ontology information (vocabulary V, hierarchy H, membership M, overlap O, size S), contained in all ontologies;
- ontology patterns (classification dimensions, criterion term), found only in certain ontologies.
We use them as a start-up set in our developments for mapping them to portals and describing new patterns.

4 The Technology

The transformations are embedded in a tool designed for content managers, persons that are familiar with the data of an organization and have to provide the right user with the right information. Using this tool, a light-weight ontology classifying documents can be visually represented and analyzed. The content manager can build up his own transformation by combining predefined modules (section 5). This determines the way information is structured. Different navigation mechanisms are also available. The resulting portal is automatically generated: it renders a data set according to the predefined transformation and uses the selected navigation. The portal generation and the visualization software modules are developed at Aidministrator.

Our portals consist of hierarchically organized web-pages, called sections. Each section has a name and contains various data items (text fragments, images or hypertext links). A textual tree (such as in Figure 1) is the default navigational aid. Three levels of information abstraction coexist in our portals: a first (global) level accommodates the navigational structure and allows access to the second (intermediate) level. The intermediate level contains collections of pointers to data items that share certain characteristics (sections). Finally, the links of the second level lead to the third level, the data itself. We are concerned with the creation of the first two levels, while the information provider supplies the data items (web-pages, documents) and the light-weight semantic data describing them.


The Cluster Map visualizes the objects of a number of selected classes from a hierarchy, organized by their classifications. The maps were originally developed for data analysis, but support other user tasks as well [5]. The main elements of the graphs are illustrated in Figure 2, which shows a collection of job offers organized according to a very simple ontology. Each small sphere represents an offer. The big spheres represent ontology classes, with an attached label stating their name and cardinality. Directed edges connect classes, pointing from specific to generic (e.g. IT is a subclass of Job Vacancies). Balloon-shaped edges connect objects to their most specific class(es).

Figure 2: A Cluster Map.

All five information types discussed in section 3 are present on the map. The vocabulary terms of the domain and their hierarchy are easily visible. Class characteristics such as incompleteness and overlap are shown: Job Vacancies is incomplete, IT overlaps with Technology but not with Management. The cardinality of the classes is attached to their name, and is also visually shown for all six data clusters.

5 The Transformations

A transformation divides the data of the ontology among the sections of a portal. It consists of three modules, each concerned with a separate design issue:

1. Focus. How to partition the data? Two partitioning criteria co-exist in any light-weight ontology: by class and by cluster. The first classifies the original data set among the classes of the ontology, so that overlapping sets are obtained. The second criterion groups together all items that belong to the same classes, so that non-overlapping sets are created. Accordingly, we have implemented two sets of transformations focusing on classes (section 5.1) and clusters (section 5.2), respectively.

2. Content. What is the relation between the content of hierarchically related sections? We distinguish two possible relations. In an inclusive relation a section includes all the elements of its subsections. Conversely, an exclusive relation excludes all data that is contained in any of the subsections, retaining only those data elements for which the current section is the most specific.

3. Navigation. What navigation structure(s) best express the content of the portal? A navigation structure reflects the structure of the data. It has to be intuitive and easy to use. We have considered both textual and graphical navigation solutions.

We describe the transformations grouped by their main focus. For each we explain the different content variants and the available textual navigation. Graphical navigation is described in a separate section. We discuss which patterns are problematic, and which portals support which user tasks (as a result of experimenting with real-life data and conducting a user test).
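The Focus and Content modules can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the toy hierarchy and the item names (`offer1` ... `offer5`) are hypothetical, loosely modelled on the job-offer example of Figure 2, and the function names `by_class` and `by_cluster` are not taken from the tool described in section 4. It assumes that membership is given as a mapping from each item to its most specific classes.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child class -> parent class.
PARENT = {
    "IT": "Job Vacancies",
    "Technology": "Job Vacancies",
    "Management": "Job Vacancies",
}

# Hypothetical membership: item -> set of its most specific classes.
MEMBERSHIP = {
    "offer1": {"IT"},
    "offer2": {"IT", "Technology"},   # IT and Technology overlap here
    "offer3": {"Technology"},
    "offer4": {"Management"},
    "offer5": {"Job Vacancies"},      # could not be classified further
}


def ancestors(cls):
    """Yield all superclasses of cls, nearest first."""
    while cls in PARENT:
        cls = PARENT[cls]
        yield cls


def by_class(membership, inclusive):
    """Focus on classes: one section per class.

    Exclusive: a section keeps only the items for which it is the most
    specific class. Inclusive: a section also contains the items of all
    its subsections.
    """
    sections = defaultdict(set)
    for item, classes in membership.items():
        for cls in classes:
            sections[cls].add(item)
            if inclusive:
                for sup in ancestors(cls):
                    sections[sup].add(item)
    return dict(sections)


def by_cluster(membership):
    """Focus on clusters: group items that belong to exactly the same classes."""
    clusters = defaultdict(set)
    for item, classes in membership.items():
        clusters[frozenset(classes)].add(item)
    return dict(clusters)
```

In this toy data the exclusive section for Job Vacancies is non-empty, which corresponds to an incomplete class, while a complete class would simply have no exclusive section. The clusters returned by `by_cluster` are disjoint by construction, and their sizes give the size information S.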


5.1 Focusing on Classes

The class-to-section method implements a straightforward strategy: it establishes a one-to-one correspondence between ontology classes and portal sections, both by name and by hierarchy relationship. A textual navigational tree reflecting the structure of the ontology (Figure 1) allows access to the data. This approach is taken by many web portals (e.g. Yahoo!², ODP³).

Through the inclusive variant of the method, a portal section reflects all the elements of the corresponding class, even if they can be further classified in sub-classes. The redundancy introduced is useful if the user is not very familiar with the domain and wants to hierarchically focus on specific instances by gradually descending the type hierarchy (unfolding the tree on the left until the screen on the right contains the right objects). Criterion terms are problematic. Usually being high in the hierarchy, they cover a large set of the data but possess little self-contained semantics (Naar land). The corresponding portal sections contain a large set of items with no specific meaning. The larger the data set, the more irritating the presence of criterion terms. When we applied an inclusive class-to-section transformation to our example data set, the sections corresponding to the four criterion terms in the resulting portal each contained the whole data set (100 data items).

The exclusive variant locates data items only in their most specific class. This is useful when the user already knows which class he is interested in, so he can quickly navigate to that point. This is different from the first scenario, in which the user's browsing behavior in the tree on the left is regulated by the instances on the right. A characteristic is that complete classes produce empty sections. In this case the user needs to navigate deep into the structure before finding any data items.

5.2 Focusing on Clusters

The following methods map clusters to portal sections.
Of particular interest are clusters whose elements belong to more than one class (overlaps). Building textual navigation for such cases is not trivial. Creating section names by simply joining the names of the overlapping classes would lead to long, complex names.

The hierarchical access method uses the hierarchy to express the overlaps. A cluster with elements belonging to n classes will be accessible through all the paths that can be constructed from the n! permutations of the n class names. For three classes A, B, C that overlap, the content of the sections in the hierarchy branch A/B/C varies depending on the relation between the content of hierarchical sections (Table 1).

Table 1: Section content for inclusive and exclusive relations (rows: Section, Exclusive, Inclusive).
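The hierarchical access method can be sketched as follows. This is a hedged illustration, not the implementation used in the tool: the function names and the example items are hypothetical, class hierarchy is ignored (membership is a flat item-to-classes mapping), and the section contents follow one plausible reading of the definitions in section 5 rather than the cells of Table 1. Under that reading, an inclusive section on a path contains every item that belongs to all classes on the path, and an exclusive section contains only the cluster whose class set is exactly the classes on the path.

```python
from itertools import permutations

# Hypothetical items and the classes they belong to.
EXAMPLE = {
    "a": {"A"},
    "ab": {"A", "B"},
    "abc": {"A", "B", "C"},
    "b": {"B"},
}


def hierarchical_paths(cluster_classes):
    """All n! navigation paths leading to a cluster with n classes."""
    return [tuple(p) for p in permutations(sorted(cluster_classes))]


def section_content(path, membership, inclusive):
    """Items shown in the section reached through `path` (tuple of class names).

    Assumed reading of the content relations:
    - inclusive: items belonging to at least all classes on the path,
      i.e. the section also holds the items of its subsections;
    - exclusive: items belonging to exactly the classes on the path.
    """
    wanted = set(path)
    if inclusive:
        return {item for item, classes in membership.items() if wanted <= classes}
    return {item for item, classes in membership.items() if wanted == classes}
```

For the branch A/B/C, `section_content(("A",), EXAMPLE, inclusive=True)` collects the items `a`, `ab` and `abc`, while the exclusive variant keeps only `a`; both variants agree on the deepest section A/B/C. A cluster with three classes is reachable through six paths, which illustrates how quickly the navigation tree can grow.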

The attractive feature is that the same information can be accessed through different paths, depending on the priorities of the user. Different content relations suit different users: exclusive relations for users familiar with the domain and inclusive relations for users not familiar with the domain.

² http://www.yahoo.com
³ http://dmoz.org

Because the hierarchical access method builds on data features, it is very sensitive to them. The number of overlaps and their degree of overlap influence the depth and the breadth of the navigation tree, which easily exceeds the optimal sizes for easy navigation (3 levels deep/50 nodes broad) predicted by [9]. Class names whose meaning depends on their context ("1" has significance as a subclass of Naar kamers) make the path names ambiguous (e.g. Vakancies/1/* denotes either all holiday offers with one first class room or all offers for one person in a first class accommodation).

The dimensional transformation method populates a predefined structure that reflects the nesting of classification dimensions. It is built for ontologies that contain a set of dimensions, each specified by a criterion term and having a single level of values. Items are classified in a predefined order of the dimensions. The generated structure lists the name of the first dimension and, at the next level, all its values. The items for each value are classified according to the values of the second dimension, then all the values of the second dimension by the values of the third, etc. Figure 3 shows the structure created for two dimensions of the example, Naar klasse and Naar land. The users are guided in their search through the ordered and easy to follow path structure.

Figure 3: A dimensional transformation.

By ordering the dimensions in a certain way, users can be forced to follow a certain path. For example, placing the location criterion before the quality lets users choose attractive holiday destinations before thinking about the price. Intersecting too many or too broad dimensions can cause large hierarchies.

5.3 Adding Graphical Navigation

Besides textual navigation, we have also experimented with graphical navigation using the Cluster Map technique.
Two basic approaches were developed and applied for different portals: Site Map and Leveled Navigator.

The principle of the Site Map (SM) is to show an overview of the data contained in a portal and to allow access to entities at the intermediate level. It is a secondary navigational aid, co-existing with a textual tree. It is only shown at the user's request and its main goal is to let him analyze and understand the domain. In portals based on our class-to-section method, a "Site Map" section contains a map of the data, with concept names being hyperlinks to their corresponding sections. When accessing a section through the Site Map, the textual tree also unfolds, showing the context of the current position. As data items are connected to their most specific class, maps used for exclusive type portals can intuitively explain why certain sections are empty: they also have no items linked on the image. For portals created with our cluster-based methods, Site Maps are implemented in the same way, the clickable entities being not concepts but data clusters.

If there is no one-to-one correspondence between the elements on a map and the portal sections, it is not possible to use that map as a Site Map: the consistency between data architecture and navigation would break. This is the case for the dimensional mapping and also for the inclusive hierarchical mapping. Also, because all the data is represented on a single map, data size and complexity can easily cause cluttered images.


The Leveled Navigator (LN) does not show the whole domain at once but allows a gradual zoom in (Figure 4). It is used as a primary navigational mechanism. This strategy applies to portals focusing on classes, where it allows the user to hierarchically focus on more specific classes. The top page shows a top view of the ontology. Each section contains a map of its neighborhood, showing its parent (except for the root) and its subclasses (if any). A textual version of this strategy is frequently used in portals. Too many sub-sections at one level are difficult to visualize.

Figure 4: A Leveled Navigator.

By using graphical navigation all aspects of the domain are easy to depict visually. Maps support browsing, searching and analysis, giving insight into the domain. However, they are sensitive to data features (size and complexity) and are not immediately intuitive to novice users.

5.4 Overview

Table 2 gives a structured summary of sections 5.1-5.3. It lists all possible portal types obtained by combining the available modules. The +/- signs show whether or not the portals preserve the five ontology information types. We point out the best-supported user profile. A set of conclusions for each module follows from sections 5.1-5.3:

1. Focus. While all portals reflect most of the ontological information, membership and overlap are complementary: it is hard to build a portal that focuses on both aspects.

2. Content. Inclusive and exclusive content relations support different navigational behavior. In the inclusive case the user's navigation is conditioned by the data set: a user unfamiliar with the domain decides his next step based on the currently viewed items. In the exclusive case, an expert user performs a more goal-oriented search, already knowing the section he aims for. He does not need redundant information to decide his path.

3. Navigation. Textual navigation imposes some difficulties in expressing overlaps. Maps show data intuitively but are not usable for certain portal structures, nor for large and complex data sets. However, they allow analysis of the domain.

Focus    Method          Content    Navigation  V  H  M  O  S  User task
Class    ClassToSection  Inclusive  TX, SM, LN  +  +  +  -  +  Browse (beginners), Hierarchical zoom, Analyze/browse
Class    ClassToSection  Exclusive  TX, SM, LN  +  +  +  -  +  Browse, Hierarchical zoom, Analyze/browse
Cluster  Hierarchical    Inclusive  TX          +  +  -  +  +  Search
Cluster  Hierarchical    Exclusive  TX, SM      +  +  -  +  +  Search/analyze
Cluster  Dimensional     Inclusive  TX          +  +  -  +  +  Directed search
Cluster  Dimensional     Exclusive  TX          +  +  -  +  +  Directed search

Table 2: Combinations of modules express different ontological information and support different user tasks. (TX = textual navigation, SM = Site Map, LN = Leveled Navigator; V, H, M, O, S = vocabulary, hierarchy, membership, overlap, size.)

Tests on several real-life data sets showed that certain ontological patterns cause unpleasant results. Usually, discovering these patterns led to the development of new transformations. Table 3 gives a structured, but still partial, overview of pattern-module relationships.

Module                   Pattern                               Effect             Use instead
Inclusive (class-based)  Criterion concepts                    Large sections     Exclusive
Exclusive (class-based)  Complete classes                      Empty sections     Use Site Map
Hierarchical             Unclear ontology terms                Ambiguous names    Dimensional
Hierarchical             Many clusters/high degree of overlap  Complex structure  Dimensional
Dimensional              Too many/broad dimensions             Complex structure  -
Graphical navigation     Large/complex data                    Cluttered image    Textual navigation

Table 3: Different modules are efficient for different ontology patterns.

6 User Tests

A user test involving 10 students provided useful input for further development. We used a set of portals created on two different data sets, with a range of transformations and navigational methods. We interviewed the users after they performed browse, search and analysis tasks.

Our experience confirmed the opinion of usability expert J. Nielsen: users are biased by the way the web looks nowadays [8]. They admitted feeling uncomfortable with the new technology because it is very different from what they are used to, especially in two respects. First, most users found it hard to work at the meta-level before actually seeing a data item. They tended to look for a self-contained meaning in each term, instead of looking for it in the context. Therefore terms like "1" were hard to interpret. Also, it is less common on the web to understand the structure of a site, therefore they tried to find information from the data items themselves and not at a global level. Second, they did not expect to read the text on the graphic. On the web graphical navigation is reduced to symbols with simple meaning, not requiring analysis. It took some time before they tried to click something other than underlined text.

After getting used to the portals, users found them easy to navigate and enjoyed using graphics. However, they admitted they would prefer to use textual portals simply because they are used to them.

Several reasons for their discomfort were revealed. Some ontology terms were unclear. Replacing 3 with Three Rooms, or * with One star room, would ease their job. Also, the data item presentation at the intermediate level is very functional and dry. The lack of details makes it hard to understand what the items represent. Another issue relates to presenting data types such as months and countries in a non-conventional way (e.g. alphabetically instead of chronologically). This disturbed the users.

Maps caused problems. Some users thought that they contained too much information and needed effort to be understood. The navigational conventions were not clear enough, and the placement of the map above, rather than to the left of, the displayed data items was unusual. Furthermore, the graphical quality of the maps dropped when they were scaled to fit in one window.

7 Related Work

Portals on light-weight ontologies. Most web directories (Yahoo!, ODP) are based on a large, light-weight, informal ontology that classifies web-sites. The corresponding web-portals simply display the structure of the ontology. In our terminology, a class-focused exclusive transformation is performed on the light-weight ontology to build the portal. The majority of portals use a leveled navigator principle, in a textual or a graphical form. In the textual Yahoo!, the immediate neighborhood of a category is always accessible. Two graphical portals exist for ODP. The first one⁴ uses The Brain⁵ technology, which scales very well but fails to convey size and overlap information (only hierarchy and vocabulary information is used). The second one⁶ is more sensitive to data features (it shows size), but ignores overlaps.

Portals on heavy-weight ontologies. The SEAL [7] portal generator framework uses rich, formal ontologies as a basis for portal creation. It relies on the inference power of OntoBroker [3] to provide a range of services and to generate the data level. The transformation from data to portal is done dynamically, by inferring the data item based on user request. For navigation only the top level of the ontology is used. The main difficulty with SEAL resides in acquiring the rich ontology.

Solutions for data-intensive web-sites. A pioneering approach to implementing data-intensive web-sites was taken by the Strudel system [4], which proposed the separation of the site's data management, the specification of content and structure, and the visual presentation. Strudel provides a declarative query language to specify a site's content and structure and a template language to define the HTML presentation of the site. Therefore the transformations from data to a target web-site have to be coded by the site builder. The experience with Strudel showed that "programmers resist to learn a new language" and that GUI based tools are preferred to coding, especially for tasks such as web-site building. Note that in our case the provided library of transformation modules, accessible through a GUI, significantly lowers the effort of getting used to the system.

⁴ http://www.webbrain.com
⁵ http://www.thebrain.com
⁶ http://maps.map.net


8 Summary

We have pointed out two facts that motivate our work. First, because high-quality, manually built ontologies are expensive, light-weight ontologies are more often created automatically [6]. Second, portals, as gateways to large and heterogeneous data sources, could benefit from ontological meta-data for structuring their content and building their navigation.

In this context, our method uses light-weight ontologies for portal creation. The idea is to discover patterns that frequently appear in these ontologies and to provide a set of transformations that map them to portals. We developed a variety of transformation modules and studied how their combinations behave for different data and how they support user tasks. These basic modules can be selected and combined through a GUI based tool.

To our knowledge, there are no initiatives that build on transformations from light-weight ontology patterns to portals. Web directories only partially exploit a light-weight ontology to create their portals, by always using the same mapping. Meanwhile, systems based on rich ontologies use them for complex services but encounter problems in acquiring them. Their transformations from data to portal rely on inference. In general, web-site management systems suffer from their usage complexity, as the user needs to code his own transformations. Our system reduces these costs by (1) using cheap meta-data and (2) providing a predefined set of easy to use transformations.

Our future work will concentrate on two directions. First, more attention is needed for the aesthetic presentation of the portals, mainly for a richer intermediate-level presentation and a better adaptation of maps for navigation. Second, we wish to apply the transformations to more ontologies so that new patterns and transformations are discovered.

Acknowledgments

Many thanks to M. Marcos, C. Fluit and F. van Harmelen for their help with this paper, and to all employees of Aidministrator for supporting this project.

References

[1] T. Berners-Lee. Weaving the Web. Harper, San Francisco, 1999.
[2] W. Borst. Construction of Engineering Ontologies. PhD thesis, University of Twente, Enschede, 1997.
[3] S. Decker, M. Erdmann, D. Fensel, and R. Studer. Ontobroker: Ontology Based Access to Distributed and Semi-Structured Information. In R. Meersman, editor, Database Semantics: Semantic Issues in Multimedia Systems, pages 351–369. Kluwer Academic Publisher, 1999.
[4] M. Fernandez, D. Florescu, A. Levy, and D. Suciu. Declarative Specification of Web Sites with Strudel. VLDB Journal, 9(1):38–55, 2000.
[5] C. Fluit, M. Sabou, and F. van Harmelen. Ontology-based Information Visualisation. In Vladimir Geroimenko, editor, Visualising the Semantic Web. Springer Verlag, 2002.
[6] A. Maedche and S. Staab. Ontology Learning for the Semantic Web. IEEE Intelligent Systems, 16(2):72–79, 2001.
[7] A. Maedche, S. Staab, N. Stojanovic, R. Studer, and Y. Sure. SEAL - A Framework for Developing SEmantic Web PortALs. In British National Conference on Databases, pages 1–22, 2001.
[8] J. Nielsen. Designing Web Usability. New Riders Publishing, 2000.
[9] R. Spence. Information Visualisation. ACM Press, 2001.
[10] S. Staab and A. Maedche. Knowledge portals: ontologies at work. AI Magazine, 21(2), 2001.
[11] M. Uschold and R. Jasper. A Framework for Understanding and Classifying Ontology Applications. In Proceedings of the IJCAI-99 Workshop on Ontologies and Problem-Solving Methods (KRR5), Stockholm, Sweden, August 1999.