Knowledge Modelling in Support of Knowledge Management
Mach, Marian; Sabol, Tomas; Paralic, Jan; Kende, Robert
Dept. of Cybernetics and Artificial Intelligence, Technical University of Kosice, Letna 9, 041 20 Kosice, Slovakia
{machm,sabol,paralic,kende}@tuke.sk
1. INTRODUCTION
A considerable amount of knowledge is scattered throughout various documents within organisations. Quite often this knowledge is stored somewhere with no possibility of being retrieved and reused. As a result, most knowledge is not shared and is forgotten a relatively short time after it has been invented or discovered. It has therefore become very important for “learning organisations” to make the best use of information gathered from various document sources inside the organisation and from external sources such as the Internet. Organisations are concerned with preserving knowledge within an organisational setting.
There are three stages in the information life cycle: finding (acquisition), organising and sharing. Many technologies have been developed for the finding stage. On the other hand, there is a lack of efficient technologies focused on organising and sharing existing knowledge. One possibility is to use an organisational memory mechanism that stores knowledge and makes it retrievable. At a macro-level, organisational memory corresponds to organisational knowledge with persistence. At a micro-level, information seeking by a member of the organisation within an organisational context can be considered as the process of finding the right ‘piece’ of organisational memory [1].
The KnowWeb project (a joint research project funded by the European Commission within the ESPRIT programme) aims at providing support for knowledge management (including capture, update and retrieval of knowledge) within an organisation, fostering efficient communication and supporting distributed groups in sharing knowledge and exchanging information efficiently. In larger organisations these processes are thoroughly designed, but in small and medium-sized enterprises (SMEs) they are far less supported and explored.
Therefore, the priority application areas for KnowWeb are SMEs, especially those for which sharing knowledge and rapidly identifying and accessing relevant knowledge is a critical factor of success on the market, and/or those which have geographically distributed offices.
Two distinct levels of representing knowledge in an organisation can be distinguished. On one side there is a repository of physical documents (reports, tables, etc.). These documents can be retrieved using the existing search facilities of a file system. On the other side, background knowledge is attached to each document. It defines the document’s context, its relation to other documents, and its relation to the organisation’s activities. Thus, this background knowledge interprets documents and gives them a deeper meaning within the context of a particular organisation, because originally such knowledge is not explicit but tacit [8]. KnowWeb aims at making at least a part of this knowledge explicit and usable for retrieval and sharing.
One type of background knowledge can have the form of a well-structured ‘common vocabulary’. Such a vocabulary is known as an ontology [2], [4]. We assume that there is a conceptual description of an application domain in the form of a domain model.
2. DOMAIN MODELLING
Theoretical foundations for research on domain modelling can be found in the works of Chandrasekaran [3], Gruber [4], Wielinga [9], and others on ontologies and knowledge modelling. Ontology is a term borrowed from philosophy, where it stands for a systematic theory of the entities that exist. In the context of knowledge modelling, Gruber introduced the term ontology as a set of definitions of content-specific knowledge representation primitives consisting of domain-dependent classes, relations, functions, and object constants. The ontology provides the formal terms with which knowledge can be represented and individual problems within a particular domain can be described. An ontology in this sense determines what can 'exist' in a knowledge base. Chandrasekaran understands an ontology as a representation vocabulary typically specialised to some domain. He suggests basically two purposes for which ontologies may be used:
- to define the most commonly used terms in a specific domain, thus building a skeleton,
- to enable knowledge sharing and re-use both spatially and temporally (see also [6]).
An ontology with syntax and semantic rules provides the 'language' by which KnowWeb(-like) systems can interact at the knowledge level [7]. An ontology allows a group of people to agree on the meaning of a few basic terms, from which many different individual instantiations, assertions and queries may be constructed. Once there is a consensus on what particular ‘words’ mean, knowledge represented by these words can be adapted for particular purposes. Knowledge must be defined unambiguously because different people within an organisation need to use it with the same meaning. Thus, knowledge can be reused and shared because its representation is understood.
A common understanding of the meaning of the notions used in a given domain (the understanding may be domain-specific) results in the definition of concepts. Concepts are more or less abstract constructs on which a relevant part of the world is built, or better, which can be used to describe this relevant part of the world. Since concepts can differ in their character, several types of concepts can be distinguished, namely classes, relations, functions or procedures, objects, variables or constants. These primitive constructs can be represented differently in different applications, but they have the same meaning in all applications; i.e. when someone wants to communicate with somebody else, he/she can do so using constructs from the ontology of shared concepts.
The concepts usually form a very complicated hierarchical, network-like (or tree-like) structure. However, even a complex structure covers only a specific part of the world, e.g. the narrow world of an organisation and its activities. This structure models the world from a certain point of view. This is where the notion in the title of this section arises: conceptual modelling, or domain modelling. The two terms are almost equivalent because, as mentioned above, concepts are usually highly domain-dependent or subject-dependent, and can be used meaningfully only within the frame of the particular domain. In other words, what is acceptable and important for, say, a property management company may not be suitable for a company dealing with distance delivery of educational courses.
Based on a needs analysis of several pilot applications, only two types of concepts are used. They can be either generic (type class) or specific (type instance). Both have attributes. Relations among them are used to construct the domain model for the current version of the KnowWeb system. Formally, a relation in KnowWeb is an oriented link between two concepts. Two basic types of relations can be distinguished: subclass_of for relations between classes and instance_of for relations between classes and their instances. These two relation types enable inheritance of attributes and their values. Inheritance is an important mechanism for the development of a hierarchical ontology. Multiple inheritance is also supported, i.e. a class concept can inherit its attributes from several parent class concepts. Figure 1 shows an example: a very small part of a domain model.
Figure 1: Sample of a domain model.
What should be included in the domain model? The simple but vague answer is: everything that is relevant and important for describing a particular domain. In the case of SMEs, such a model may conceptually describe company-specific concepts, such as its activities, projects, customers, employees etc., as well as the relations among these concepts.
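The domain model structure described in this section (class and instance concepts with attributes, oriented subclass_of and instance_of links, and multiple inheritance) can be illustrated with a minimal Python sketch. This is not the KnowWeb implementation: the `Concept` class, its method names, the example concept names and the rule that the first parent wins when two parents define the same attribute are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A domain-model concept: generic ("class") or specific ("instance")."""
    name: str
    kind: str                                       # "class" or "instance"
    attributes: dict = field(default_factory=dict)  # attributes set locally
    parents: list = field(default_factory=list)     # targets of oriented links

    def link(self, relation: str, target: "Concept") -> None:
        """Add an oriented subclass_of or instance_of link to a class concept."""
        if relation not in ("subclass_of", "instance_of"):
            raise ValueError(f"unknown relation: {relation}")
        if target.kind != "class":
            raise ValueError("links must point to a class concept")
        if relation == "subclass_of" and self.kind != "class":
            raise ValueError("subclass_of connects two classes")
        if relation == "instance_of" and self.kind != "instance":
            raise ValueError("instance_of connects an instance to a class")
        self.parents.append(target)

    def all_attributes(self) -> dict:
        """Collect inherited attributes, then apply local values on top."""
        merged = {}
        for parent in self.parents:
            for key, value in parent.all_attributes().items():
                # Assumption: on a clash, the first parent listed wins.
                merged.setdefault(key, value)
        merged.update(self.attributes)
        return merged


# Hypothetical fragment of a domain model (names are illustrative only).
project = Concept("Project", "class", {"status": "unknown"})
funded = Concept("FundedActivity", "class", {"funder": None})
research = Concept("ResearchProject", "class")
research.link("subclass_of", project)   # multiple inheritance:
research.link("subclass_of", funded)    # two parent classes
knowweb = Concept("KnowWeb", "instance", {"funder": "European Commission"})
knowweb.link("instance_of", research)

print(knowweb.all_attributes())
```

The instance inherits `status` from one parent class and overrides the inherited `funder` value with its own, which is the kind of attribute and value inheritance the two relation types are meant to enable.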
Each organisation has some knowledge already gathered in the form of various databases and/or documents containing information about various technologies, products, customers, suppliers, projects or employees. Each company usually has some internal procedures for performing specific tasks. Simply speaking, knowledge exists in an established environment. This knowledge is traditionally called the organisation’s goals and know-how. From the knowledge modelling perspective, a repository of know-how, goals etc. may be referred to as an organisational memory or a corporate memory.
3. BASIC KNOWWEB SYSTEM FUNCTIONALITY
The KnowWeb system makes it possible to store knowledge in the form of documents with attached concepts from ontologies, i.e. documents with their conceptual descriptions. The main purpose of using an ontology is to define concepts that are useful for expressing knowledge about a particular document in domain-specific terms. Subsequently, the concepts can also be used for search and retrieval of relevant documents. In this case the domain model serves as a ‘reference vocabulary’ of company-specific terms [9]. Such an approach supports searching not only in the physical document discourse but also in the document context. It also supports ‘soft’ techniques in which a search engine can use the domain model to find concepts related to those specified by the user.
3.1 Enriching documents
Documents are hierarchically structured from the syntactic point of view. They are divided into chapters (sections); chapters (sections) further into subchapters (subsections), these into paragraphs, and paragraphs into sentences. Sentences consist of words, the smallest syntactic units. Syntactic units (in general, document fragments) are addressable: the result of some operation over a document (e.g. retrieval) will refer to a specific syntactic unit (or several units) within the document.
First, in order to insert a document into the organisational memory, it is necessary to attach contextual knowledge to it. This context takes the form of a conceptual description (CD) of the document. The conceptual description of a document consists of the conceptual descriptions of the text fragments within the document. By the conceptual description of a text fragment we mean a set of association links between the text fragment and concepts in the domain model. The set of links may be empty if there is no conceptual description for a particular text fragment. Conceptual descriptions consist of references to the knowledge context contained in a particular document and will, in future, make this knowledge accessible in an easy and efficient way.
The descriptions are usually created manually by the authors of the document. The authors can select a text fragment (see Figure 2) and link it to the domain model. The linking can be done directly to ontology concepts or to a template representing a particular subset of ontology concepts. Templates allow users to link automatically those types of documents that have a typical and repetitive structure. Links to concepts are of the many-to-many type: it is legal to link a document fragment to several concepts and, vice versa, a single concept to several fragments of one or more documents.
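The many-to-many association links between fragments and concepts can be sketched as a pair of indexes. The following Python sketch is only an illustration under stated assumptions: the class name `ConceptualDescriptions`, its methods, the fragment addresses such as "sec1/par2" and the file names are hypothetical, and a template is modelled simply as a list of concept names.

```python
from collections import defaultdict


class ConceptualDescriptions:
    """Many-to-many association links between text fragments and concepts."""

    def __init__(self):
        self._concepts_of = defaultdict(set)   # (doc, fragment) -> concepts
        self._fragments_of = defaultdict(set)  # concept -> (doc, fragment)

    def link(self, doc_id, fragment_id, concept):
        """Link one fragment of a document to one concept."""
        fragment = (doc_id, fragment_id)
        self._concepts_of[fragment].add(concept)
        self._fragments_of[concept].add(fragment)

    def link_template(self, doc_id, fragment_id, template):
        """Link a fragment to every concept in a template (a concept subset)."""
        for concept in template:
            self.link(doc_id, fragment_id, concept)

    def concepts_of(self, doc_id, fragment_id):
        """CD of a fragment; empty if the fragment has no links."""
        return set(self._concepts_of.get((doc_id, fragment_id), set()))

    def fragments_of(self, concept):
        """All fragments, across documents, linked to a concept."""
        return set(self._fragments_of.get(concept, set()))

    def description_of(self, doc_id):
        """CD of a document: the CDs of its fragments, keyed by fragment."""
        return {frag: set(concepts)
                for (doc, frag), concepts in self._concepts_of.items()
                if doc == doc_id}


# Hypothetical documents, fragment addresses and concept names.
cd = ConceptualDescriptions()
cd.link("report.html", "sec1/par2", "Project")
cd.link("report.html", "sec1/par2", "Customer")
cd.link("minutes.doc", "sec3", "Project")
cd.link_template("report.html", "sec2", ["Employee", "Project"])

print(cd.concepts_of("report.html", "sec1/par2"))
print(cd.fragments_of("Project"))
```

Keeping both directions indexed is a design choice for the sketch: it makes it equally cheap to ask which concepts describe a fragment and which fragments a concept points to, the two lookups that enrichment and retrieval respectively rely on.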
Figure 2: Tool for viewing, browsing and enriching documents
When a document with its description is available for storage and publishing (after manual or semi-automatic linking within the KnowWeb system), it can be incorporated into the organisational memory represented by a KnowWeb server. The document and its conceptual description are stored in the KnowWeb server data repository. Another possibility is to store the conceptual description of a document without storing the document itself (the document can be located on another KnowWeb server or somewhere else). In the latter case only a link to the document will be available.
If a user is not satisfied with the conceptual description of a document stored in the organisational memory, he/she can modify it and subsequently upload the document with the modified description into the organisational memory. Generally, only the authors of a document and administrators are entitled to modify the content and context of an existing document. Other users can only browse and view the documents.
3.2 Document retrieval
The aim of storing documents in the organisational memory is to access the right knowledge at the right time and/or in the right situation. In order to express requirements on the documents that should be retrieved from the organisational memory, users formulate search queries. When formulating queries they can take advantage of the existing concepts (or their attributes) from the domain model. A query for knowledge retrieval is therefore given in company-specific conceptual terms instead of traditional keywords. Queries can be composed into more complex structures using various logical operators and/or the options supported in the KnowWeb toolkit. In general, the concepts used in the query, together with the concepts related to them in the domain model, are used to search the conceptual descriptions of documents. The level determining how many ‘neighbouring’ concepts should be added to the query is given as an option in the user’s query.
The Retrieval function retrieves the document(s) that are contextually relevant to a given query. Two basic modes of retrieval can be distinguished. ‘Exact’ retrieval tries to find documents whose conceptual descriptions (CDs) match the query exactly. In other words, the neighbourhood size is set to zero and only the concepts given in the query are taken into account. In contrast, ‘approximate’ retrieval returns every document from the organisational memory that is connected to a concept close enough to the concepts mentioned in the user’s query. In order to control how proximity is interpreted, users can attach their preferences to the query formulation. An example of a query and the results of its application is depicted in Figure 3.
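The two retrieval modes can be sketched as one function with a neighbourhood level. The sketch below makes several assumptions that the text does not fix: proximity is measured as the number of links between concepts in the domain model, oriented links are followed in both directions, and a document counts as a match when its conceptual description shares at least one concept with the expanded query. The function names and the sample data are hypothetical.

```python
from collections import deque


def expand_query(concepts, relations, level):
    """Return the query concepts plus every concept within `level` links."""
    neighbours = {}
    for source, target in relations:
        # Assumption: oriented links are traversed in both directions.
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)
    seen = set(concepts)
    queue = deque((concept, 0) for concept in concepts)
    while queue:                      # breadth-first search, depth-limited
        concept, distance = queue.popleft()
        if distance == level:
            continue
        for other in neighbours.get(concept, ()):
            if other not in seen:
                seen.add(other)
                queue.append((other, distance + 1))
    return seen


def retrieve(query, descriptions, relations, level=0):
    """Return sorted ids of documents whose CD overlaps the expanded query.

    level=0 corresponds to 'exact' retrieval (query concepts only);
    level>0 corresponds to 'approximate' retrieval.
    """
    wanted = expand_query(query, relations, level)
    return sorted(doc for doc, cd in descriptions.items() if wanted & cd)


# Hypothetical domain-model links and conceptual descriptions.
relations = [("ResearchProject", "Project"),
             ("KnowWeb", "ResearchProject"),
             ("Customer", "Organisation")]
descriptions = {"a.html": {"KnowWeb"},
                "b.doc": {"Project"},
                "c.html": {"Customer"}}

print(retrieve({"ResearchProject"}, descriptions, relations, level=0))
print(retrieve({"ResearchProject"}, descriptions, relations, level=1))
```

With level 0 no document is returned, because none is linked to the query concept itself; raising the level to 1 brings in the documents linked to its direct neighbours. User preferences for interpreting proximity, mentioned above, are not modelled here.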
Figure 3: Query management interface
Processing a query may yield a very limited amount of knowledge, or it may yield a large number of documents with an overwhelming amount of knowledge. In both cases users will not be particularly happy with the results, and may wish to modify their original query in order to extract more or less detail, i.e. a larger or smaller set of relevant documents. To enable a more sophisticated way of formulating a refined query, some additional information must be available.
Another issue addressed by the KnowWeb system is the simplification of the post-processing phase after document retrieval: document viewing and browsing. The focus is on the retrieval of relevant knowledge chunks from a document repository. With traditional search engines users have trouble finding relevant information in a retrieved document, especially if the document is extensive or the information is not explicit but hidden ‘between the lines’. Sometimes the post-processing phase is a more time-consuming task than the search and retrieval. A domain model can simplify this activity too, because relevant parts/sections of a document are marked using different colours. Since these parts/sections of a document are linked to concepts from an existing domain model, finding the required information becomes straightforward.
4. SYSTEM STRUCTURE
Within the project, a conceptual framework and a generic architecture for computational support of organisational memory have been developed. The organisational memory makes it possible to store documents together with their context knowledge and to access them in a user-friendly way. The organisational memory is accompanied by a set of tools for:
- preparing the documents to be stored,
- defining, viewing, browsing, and editing domain model,
- enriching documents by association links with relevant knowledge concepts,
- defining a context attached to the stored documents,
- storing and retrieving documents, browsing stored knowledge, etc.
All these tools together with an organisational memory constitute the KnowWeb system. From the practical point of view, the system is implemented using a client-server architecture.
4.1 KnowWeb server
The server represents the heart of the KnowWeb system: it is the unit responsible for storing, maintaining, and accessing knowledge. The server consists of the following main modules (see Figure 4):
Figure 4: KnowWeb server structure (diagram components: Organisational Memory, Server Front-End Interface, Document Store, Association File Management System, Domain Model, WWW server, SQL server)
1. Document store: where physical documents are stored,
2. Association file management system: for maintenance of background knowledge, document contexts (e.g. association links from a document or its parts to the concepts within the domain model),
3. Domain model: a core of the system,
4. Server interface: for an efficient and user-friendly communication with KnowWeb clients.
Document store
The document store provides a platform for storing and accessing the documents that can be processed by the KnowWeb system; it supports storage of documents and an access control mechanism. Documents stored on the server are called internal documents with respect to the KnowWeb system. The term ‘document’ denotes any collection of data, information and knowledge that can be stored as a file in a computer. Thus, any new documents (e.g. created in word-processing or database packages) currently used in an organisation can be included in the store.
Domain model
The core of the KnowWeb approach is domain conceptualisation. A domain model identifies and describes the concepts that may exist in a particular domain, and the relations among these concepts. The concepts from the domain model can be used to create conceptual descriptions of documents. The concepts (or their attributes) from the domain model can also be used to formulate a query to retrieve relevant documents.
Association file management system
Associations represent conceptual descriptions of documents (or their parts). Both internal and external documents (documents addressed by their URLs and stored on a remote server) can be linked to a domain model. In the case of internal documents (currently only HTML and MS Word documents are supported), users benefit from the full functionality for document fragment definition and retrieval. External documents cannot be retrieved using the full-text search capability, and users cannot define fragments and link them to the domain model concepts. The only possibility is to link an external document as a whole. The concepts from the conceptual descriptions of documents (or their fragments) are used when a search for relevant documents is performed.
Server front-end interface
The interface handles all communication between the KnowWeb server and specialised clients. The server front-end separates the application logic (usually called business logic) from the low-level implementation layer. On one side, it has to provide access to stored documents and communicate with the database storing information about the domain model, association files and data sources. On the other side, it has to provide methods for high-level communication with the specialised clients that access the KnowWeb server in order to perform the activities necessary to administer and use the server.
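The difference between internal and external documents in the association file management system described above can be made concrete with a short sketch. It is illustrative only: `DocumentRef`, `link_to_concept`, the identifiers and the example URL are hypothetical names, not part of KnowWeb, and the sketch models a single rule from the text, namely that an external document can be linked to the domain model only as a whole.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocumentRef:
    """Reference to a document known to the server (hypothetical type)."""
    doc_id: str
    url: Optional[str] = None   # set only for external documents

    @property
    def is_external(self) -> bool:
        return self.url is not None


def link_to_concept(links, doc, concept, fragment=None):
    """Record an association link as (doc_id, fragment, concept).

    fragment=None means the document is linked as a whole.
    """
    if doc.is_external and fragment is not None:
        raise ValueError("external documents can only be linked as a whole")
    links.append((doc.doc_id, fragment, concept))
    return links


# Hypothetical documents; the URL is a placeholder.
internal = DocumentRef("report.html")
external = DocumentRef("spec", url="http://example.org/spec.html")

links = []
link_to_concept(links, internal, "Project", fragment="sec1/par2")
link_to_concept(links, external, "Customer")   # whole-document link
print(links)
```

Attempting `link_to_concept(links, external, "Customer", fragment="sec1")` raises a `ValueError`, mirroring the restriction that fragments cannot be defined for external documents.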
4.2 KnowWeb toolkit
As mentioned above, the KnowWeb toolkit is equipped with several tools that enable users to create and handle ‘enriched documents’. The tools differ in their functionality and target user, but together they help to manage knowledge in an organisation in an easy and user-friendly way.
The main activity supported by the toolkit is the maintenance of document contexts. In other words, it helps users to link a new document (as a whole or its parts) to concepts from a domain model, to modify this context within the domain model, etc. In order to link a document to a domain model, the user must be able to view, browse and navigate through both the document and the domain model (with the possibility of synchronising snapshots of the document and the domain model). These functions present the document in an enriched view that uses different colours to distinguish the fragments of the document that are linked to the domain model (see Figure 2). This view shows which parts of the document are significant from the contextual and conceptual point of view (or, more exactly, which parts the authors considered significant enough to select and link to some concepts). Similarly, it is easy to see which parts of the domain model apply to a specific fragment of the document or to the document as a whole. Since the documents and/or the domain model can be rather extensive, it is easy to get lost despite the available layout and navigation tools. In such a situation a search tool is handy.
The domain model stored within the KnowWeb system is not rigid. Since it reflects the current state of human understanding of the domain, it is expected to evolve and grow during the use of the system. The conceptual description of an application domain should be kept up to date by defining new concepts and/or relations.
Authorised users are allowed to modify the domain model in a straightforward way: they can insert new entities, modify existing ones, and/or delete those entities which are obsolete or do not reflect the state of the domain appropriately (see Figure 1). The modifications can be performed manually and/or automatically. The automatic mode can be restricted to some parts of the domain model only, for example to inserting factual knowledge, which means inserting only instances of general classes.
The aim of storing documents in the organisational memory is to access the right knowledge at the right time or in the right situation. This is supported by tools for query formulation, query refinement and viewing retrieved documents. The KnowWeb system provides three types of retrieval: full-text search, attribute search, and concept search. In order to retrieve relevant documents using concept search, the user has to formulate a query based on a domain model. The concepts from the domain model can be composed into a more complex structure using various logical operators and/or relations between concepts (see Figure 3). The retrieval exploits the similarity of concepts as defined by the domain model in use. The attribute search is based on the fact that documents can have (possibly empty) headers. A header holds some of the key attributes of a document, for example the document name, date and time, authors of the document, comments, and so on. The set of applicable attributes included in a header is application-dependent. Some attributes can be compulsory; others are optional.
The retrieval can operate not only on the organisational memory but on external knowledge resources as well (only one external knowledge source is considered within the current KnowWeb version: the Internet). External resources contain documents designed by external authors. These documents are published without associated conceptual descriptions, so the retrieval has to accommodate this difference.
The described functionality of the KnowWeb system has to be complemented by some additional features necessary to run and maintain the system (e.g. management of the organisational memory, statistical functions, maintaining the consistency of descriptions of documents with the domain model or with the documents themselves, etc.).
The available tools can be grouped together to form various KnowWeb clients specialised for particular activities and intended for particular kinds of users. Three basic attributes characterise these clients: interactivity, user friendliness, and simplicity. In this way it is possible to form a client for domain model administrators, a client for ordinary users that allows them to formulate a query and retrieve relevant documents, a client for advanced users that allows them to conceptualise documents and upload them to the server, a client for server administrators, etc.
A client can be created as a standalone application or as a plug-in for software already used within an organisation. The latter solution enables users to extend the functionality of their software with features provided by the KnowWeb system. It is possible to add a ‘piece of intelligence’ to an existing system, with the result that it manipulates and interprets knowledge instead of data. In this way users can benefit from an organisational memory with no (or only a small) change in their work practices.
An example of this approach is the extension of CONTACT 2000, a powerful data storage and management system focused on project and contact management, electronic document flow control, personal agenda maintenance, group scheduling and co-ordination (see Figure 5). It is important to offer users the functionality of the KnowWeb system without adding too much work to their daily job load. For example, two or three simple steps can be enough to enrich a document and upload it to the KnowWeb server. First, in addition to the standard behaviour, the user just checks the “Into KW” option. This signals that the attached document should be uploaded into KnowWeb. Next, to link the document automatically, the user can (but need not) choose a “Template” and/or “Prescript”. Finally, the user just clicks the “OK” button to confirm the operation.
Figure 5: CONTACT 2000 / KnowWeb interface.
5. CONCLUSIONS
The aim of the KnowWeb system is to help organisations to manage the knowledge they currently possess, and to focus their operations towards knowledge-centred and knowledge-intensive business practices. To fulfil this goal, a specific architecture dedicated to supporting knowledge management within organisations (including capture, update and retrieval of knowledge) has been designed and implemented. The priority application areas of the implemented system are those organisations which require large volumes of data to be processed in a satisfactory way during decision-making processes. ‘Satisfactory’ means that a competitive organisation has to provide its current employees with effective access to its knowledge repositories, built on the previous experience gained throughout (possibly) geographically distributed offices. The focus is on the storage of knowledge in a so-called organisational memory and on the retrieval of relevant knowledge chunks within documents.
In general, there are two main approaches to information retrieval: keyword-oriented search and concept-oriented (notion-oriented) search. The keyword-oriented approach relies on methods such as search engines, which take users’ queries and compare them with an existing document collection to find the most likely match. The concept-oriented approach, on the other hand, relies on the organisation of information (documents) into a hierarchical structure represented by a domain model.
The KnowWeb system supports the attachment of contextual knowledge to documents. A query to the implemented system is defined in company-specific conceptual terms instead of traditional keywords. The knowledge-based approach employed enables searching not only in the physical document discourse but also in the document context space. The system enables users to find those particular documents (or those particular parts of these documents) that are most relevant to their current needs without having to browse large volumes of retrieved data.
6. ACKNOWLEDGEMENTS
This work is supported by the European Commission within the ESPRIT 29065 project “Web in Support of Knowledge Management in Company (KnowWeb)” and partly by the Ministry of Education of the Slovak Republic within the VEGA grant No. 1/5032/98 from the Commission No. 4.
7. REFERENCES
[1] Ackerman, M.S. Augmenting the Organizational Memory: A Field Study of Answer Garden. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work, 1994, pp 243-252.
[2] Chandrasekaran, B., Josephson, J.R., Benjamins, V.R. Ontology of Tasks and Methods. In: Proceedings of the 11th Knowledge Acquisition for KBS Workshop, Banff, Canada, April 1998.
[3] Chandrasekaran, B., Josephson, J.R., Benjamins, V.R. What Are Ontologies and Why Do We Need Them. IEEE Intelligent Systems, 14 (1), 1999, pp 20-26.
[4] Gruber, T.R. A Translation approach to Portable Ontology Specifications. Knowledge Acquisition, 5 (2), 1993.
[5] Malhotra, Y. Knowledge Management in Inquiring Organisations. In: Proceedings of the 3rd Americas Conference on Information Systems, 1997, pp 293-295.
[6] Motta, E., Zdrahal, Z. A principled approach to the construction of a task-specific library of problem solving components. In: Proceedings of the 11th Knowledge Acquisition for KBS Workshop, Banff, Canada, April 1998.
[7] Newell, A. The Knowledge Level. Artificial Intelligence, 18 (1), 1982, pp 87-127.
[8] Sveiby, K.E. Tacit Knowledge. [WWW page]. Available at [referenced 7/4/1999].
[9] van Heijst, G., Schreiber, A.T., Wielinga, B.J. Using explicit ontologies in KBS development. International Journal of Human-Computer Studies, 46 (2/3), 1997, pp 183-292.