Knowlets: Components for Knowledge Management

Franz J. Kurfess, Leon Jololian and Donald H. Sebastian
New Jersey Institute of Technology
Computer and Information Sciences Department
University Heights, Newark, NJ 07102, USA
Email: [email protected] Phone: ++1-973-596-5767 Fax: ++1-973-596-5777

Abstract The support we get from computers for dealing with large bodies of knowledge and immense quantities of information is rather mediocre: we can use databases, file systems, or proprietary tools for organizing knowledge, and have to rely on text-based, syntax-oriented access methods like keyword search to locate items of interest. This paper presents knowledge components, or knowlets, as basic building blocks and a framework for constructing knowledge networks. In addition to typical component features like compositionality and encapsulation, knowlets are also mobile, and provide semantic and pragmatic access methods based on their contents and contexts.

Keywords: Knowledge Management, Intelligent Systems, Syntax-Oriented Access, Content-Based Access, Semantic Access, Context, Process Modeling, Content Delivery, Courseware Development, Intelligent Agent Design

Introduction The limited interoperability of information systems has long been recognized as an obstacle to the integration of information. Database management systems, whether based on the relational or the object-oriented model, have worked well in organizing data semantically and providing query-based access to the data. However, it is not uncommon for different information systems or databases within the same domain to be incompatible in their data, requiring considerable manipulation before the data can be unified into a single view. Information systems need to be designed for interoperability. This calls for a common architecture for interoperability between systems, and a component-based software methodology to support it.

System developers are increasingly recognizing the critical importance of software architecture in the design of systems. Software architecture is important not only as the basis for low-level design and implementation, but also as a significant factor in determining the interoperability between systems in support of the business process. Using software components as a basis for developing software systems has always been a goal of developers: writing all the code from scratch every time a system is built is simply not efficient. Current abstractions in programming languages are limited to programming in-the-small. They do not scale up sufficiently to describe the higher-level structures of the system or the interactions among its components.

Many component models have been proposed, such as Sun Microsystems’ JavaBeans, Microsoft’s COM/DCOM, and OMG’s CORBA. To be successful, components must possess characteristics that make them viable for building systems. Components should allow developers to build a system by composition. A market for components should exist where third parties can develop components according to standards [8]. The components should be executable units with support for dynamic loading. Currently there is a lack of standardization between component models, although a basic level of interoperability is possible (i.e. through CORBA).

In the current environment, where requirements can change dynamically, there is a need for software to be quickly developed, easily configured, and highly interoperable with other systems. The process of developing this type of software should preferably take place at the user level, which means that the software must be built from prefabricated components that can be easily configured and assembled by the end user. An architecture will be automatically configured to meet the user’s demands of the software within a family of domain-specific applications [3].

Increasingly, information is becoming the lifeblood of many organizations. To put information to use, we need a way to access heterogeneous information from various sources that are managed independently (frequently belonging to different organizations) and represented by different semantics. In this contribution, we expand the notion of components to knowledge-based components, or knowlets.

Knowlets A knowlet is a specific type of component. Its purpose is to encapsulate information and provide access to its contents in a way that is consistent with its purpose and usage. Knowlets can be structured hierarchically to reflect the structure of the information they steward: composite knowlets are formed by encapsulating smaller knowlets to represent larger amounts of information. Hyperlinks across knowlets enable collaboration between knowlets and reflect the often distributed nature of information. The granularity of a knowlet depends on the amount of information we choose to associate with it, and is determined largely by the knowlet creator’s perception of the independent usefulness of that information. The situation is similar to determining the scope of an object in the object-oriented paradigm. For example, a knowlet may represent information as small as a record within a table of an RDBMS, a whole table of records, or even an entire database. On the other hand, knowlets also reflect the conceptual entities that domain experts and users are accustomed to. This leads to a larger granularity for knowlets than for typical objects. From a user’s perspective, a knowlet may contain a scientific paper, a book composed of chapter knowlets, a talk, or a movie. Again in contrast to objects, knowlets are designed to be largely self-contained, which makes them sufficiently mobile to be easily distributed through various communication channels, such as the Internet or CD-ROMs. As indicated above, knowlets can be viewed as knowledge components, and share most of the conceptual and technical underpinnings of components.

Access to Knowledge Since the purpose of knowlets is the representation and organization of knowledge, they have additional features not typically available with components. A crucial one is the ability to access and organize knowlets according to their contents, instead of syntactical features like component names or text strings they contain. Content, however, is not easily accessible for computer-based processing since it requires an interpretation of the data contained in a knowlet. Since knowlets may contain information of different types in various formats, this content-based access must rely on a variety of methods. For text-based information, thesauri and related approaches like latent semantic indexing [7] offer an extension of syntactic search by incorporating similar terms into the search. For images or sounds the situation is more difficult, albeit not quite hopeless. Image recognition and speech understanding may be used to convert information into text, or to extract important features that can be used for categorization and thus for the identification of similar documents. In many cases, knowlets will contain related information of different types about a specific topic: a news story, for example, may consist of a video segment including sound and subtitles. Even though image processing and speech recognition may not work perfectly, the combination of all the available information will often be sufficient to provide a reasonable characterization of the contents in the form of keywords or characteristic phrases [11]. These keywords, together with other meta-information about knowlets, can be stored in an associative memory, thus enabling very quick access to knowlets on the basis of specific features. In addition to content-based or semantic information, usage-based or pragmatic information can be used to identify related knowlets.

Knowledge Organization The techniques described in the previous paragraphs are not only useful for identifying and retrieving individual knowlets. They also serve as the basis for organizing knowledge into knowledge networks, or conceptual maps. On an abstract level, knowlets form a graph, with arcs representing relationships between individual knowlets. Such a graph, however, can become rather complex. First, knowlets themselves are not atomic items, but may contain other knowlets. In addition, different types of relationships exist between knowlets: links explicitly specified by domain experts or users, relationships based on the similarity of their contents (depending on the similarity measure used), and relationships based on the usage of knowlets. Furthermore, knowlets are designed to be mobile, and may change their logical or physical location in a system. Finally, knowlets may be cloned easily, leading to problems of coherency and consistency. Our intention here is to find a practical solution based on currently available technology; a knowlet registry, possibly implemented as a distributed service, seems to be a reasonable compromise. This registry contains essential information about all the knowlets in its domain, and is the access point for all queries by users or other system components. The associative memory mentioned above serves as a very fast content-based identification mechanism for knowlets matching a specific request.
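The graph view of a knowledge network can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the class `KnowledgeNetwork`, the relation labels, the sample knowlet names, and the use of Jaccard similarity over keyword sets are all hypothetical choices, not part of the framework described in this paper.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical keyword sets standing in for content-related meta-information.
keywords: Dict[str, Set[str]] = {
    "paper-A": {"components", "software", "reuse"},
    "paper-B": {"components", "reuse", "architecture"},
    "talk-C": {"neural", "memory"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """One possible similarity measure: shared keywords over all keywords."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class KnowledgeNetwork:
    """Graph of knowlets whose arcs carry a relation type."""

    def __init__(self) -> None:
        # knowlet id -> list of (relation type, neighbouring knowlet id)
        self.arcs: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def link(self, source: str, target: str, relation: str) -> None:
        """Add an undirected arc of the given relation type."""
        self.arcs[source].append((relation, target))
        self.arcs[target].append((relation, source))

    def add_similarity_links(self, kw: Dict[str, Set[str]], threshold: float) -> None:
        """Link every pair of knowlets whose keyword similarity reaches the threshold."""
        ids = sorted(kw)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                if jaccard(kw[a], kw[b]) >= threshold:
                    self.link(a, b, "similar-content")

    def neighbours(self, knowlet_id: str, relation: str) -> List[str]:
        """Return the knowlets connected to knowlet_id by the given relation type."""
        return sorted(t for r, t in self.arcs[knowlet_id] if r == relation)


network = KnowledgeNetwork()
network.link("paper-A", "talk-C", "explicit")      # link specified by a user
network.add_similarity_links(keywords, threshold=0.5)
print(network.neighbours("paper-A", "similar-content"))
print(network.neighbours("paper-A", "explicit"))
```

Keeping the relation type on each arc lets explicit, similarity-based, and usage-based relationships coexist in one graph while remaining separately queryable.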

Knowlet Architecture Each knowlet conceptually consists of three parts: the actual content, meta-information about the knowlet, and specific methods used to access and modify the content. The content itself may contain other knowlets, or related atomic items such as text documents, sound, and images. In a similar way, the methods may be simple functions for the manipulation of the content, or full-fledged software components in themselves. This design is consistent with component-based system architectures, such as those described in [8,9].
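As a rough illustration of this three-part structure, a knowlet could be modeled as follows. This is a minimal sketch, not the prototype's design: the class name `Knowlet`, its field names, and the helper `word_count` are assumptions made for the example, and atomic content items are reduced to plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Knowlet:
    """Hypothetical knowlet: content, meta-information, and access methods."""
    name: str
    # Content items are atomic payloads (plain strings here) or nested knowlets.
    content: List[Union[str, "Knowlet"]] = field(default_factory=list)
    # Meta-information, e.g. keywords or access permissions.
    meta: Dict[str, object] = field(default_factory=dict)
    # Named methods that operate on this knowlet.
    methods: Dict[str, Callable[["Knowlet"], object]] = field(default_factory=dict)

    def call(self, method_name: str) -> object:
        """Invoke one of the knowlet's own methods by name."""
        return self.methods[method_name](self)

    def atomic_items(self) -> List[str]:
        """Collect all atomic items, descending into nested knowlets."""
        items: List[str] = []
        for part in self.content:
            if isinstance(part, Knowlet):
                items.extend(part.atomic_items())
            else:
                items.append(part)
        return items


def word_count(k: Knowlet) -> int:
    """Example method: count the words in all atomic text items."""
    return sum(len(item.split()) for item in k.atomic_items())


chapter1 = Knowlet("chapter-1", content=["Components encapsulate code."])
chapter2 = Knowlet("chapter-2", content=["Knowlets encapsulate knowledge."])
book = Knowlet(
    "book",
    content=["Preface text.", chapter1, chapter2],
    meta={"keywords": ["components", "knowlets"]},
    methods={"word_count": word_count},
)
print(book.atomic_items())
print(book.call("word_count"))
```

Here the book knowlet is a composite of chapter knowlets, and its method is a simple function; in a fuller system a method entry could equally refer to an external software component.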

Knowlet Structure Figure 1 shows an overview of the internal structure of a knowlet. The oval indicates that a knowlet is an encapsulated, self-contained entity, accessible only by authorized users through pre-defined functions. Internally, a knowlet consists of two main parts, the knowlet content and the knowlet meta-information. The content of the knowlet may consist of data and information of various types and in different formats, and may be retrieved or modified through the respective interface methods. If necessary, references to more complex methods capable of performing certain functions on that particular type or format are included instead of the respective code. An example could be text-based information in the Adobe Portable Document Format (PDF), with a URL providing a reference to Adobe Acrobat Reader as the viewer for that format. The knowlet meta-information contains three parts: administrative, usage-related, and content-related information. Listings of names, types, formats, sizes, or access permissions are examples of administrative information. Creation and modification dates, users, utilization of specific parts or methods, and similar data refer to the usage of the knowlet. Finally, content-specific aspects are keywords, descriptors like meta-tags, and references to ontologies [1] or to related knowlets.

[Figure 1: Knowlet Diagram. Labels: Knowlet Meta-Information (administrative, usage, content); Knowlet Content; Usage-Oriented Access; Content-Oriented Access; Knowledge Modification; Knowledge Retrieval]

The content-oriented information in particular, but also the usage information, is provided to the knowlet registry, which uses it for fast identification of and access to specific knowlets. Since content-oriented information may be difficult to capture, this can be done off-line, either during periods of low system utilization such as at night or over weekends, or by specific high-performance computer systems.

Knowlet Registry A diagram of the knowlet registry is shown in Figure 2. The registry collects content- and usage-related information about knowlets in its domain, and uses this information both for the organization of the knowlets within its domain, and for access requests from outside.

[Figure 2: Knowlet Registry. Labels: Knowlet Registry; remote knowlets]

The knowlet registry is responsible for operations that require information about sets of knowlets, such as grouping knowlets according to similar contents or usage patterns. Access to knowlets can also be greatly accelerated in cases when the exact name or location of a knowlet is unknown; without the registry, all knowlets might have to be queried in order to identify the one that is requested. Especially for the purpose of quick access, a neural associative memory [6] can be incorporated into the knowlet registry. In addition to quick access, which is also available through conventional indices as used by search engines and related tools, a neural associative memory offers the benefit of retrieving the entry that shares the largest number of properties with the item described in the query. For the identification of similar knowlets and the categorization of knowlets, various techniques from information retrieval, statistics, data mining and neural networks can be utilized. Since many of them rely on global information across a set of knowlets, the registry is a suitable place to collect relevant information. Placing all this information in a central location of course creates the danger of a bottleneck. What we describe here, and what is shown in Figure 2, however, is the conceptual architecture of the registry system, not necessarily its actual implementation. The implementation can be of a distributed nature, either by partitioning or by mirroring the registry, and many of the computationally intensive operations are performed off-line.

Implementation A prototypical implementation of knowlets is currently under development. It uses database technology both for the storage of individual knowlets and for the registry. A knowlet is implemented as a record, and its different parts correspond to fields or groups of fields in the record. Since some items, especially in the content part of a knowlet, may be quite large, they may reside in the file system, with a pointer from the respective field in the database. For performance reasons, the registry is implemented as a separate database, to be hosted on a different computer. Again a record corresponds to a knowlet, although the registry contains only a subset of the fields. In addition, a registry record contains information about the organization of knowlets, such as categories, or knowlets that are frequently used together. In essence, all the information in the registry either duplicates information contained in the individual knowlets, or can be derived by analyzing the overall collection of knowlets. Thus, from an implementation perspective, the registry can also be viewed as a cache for the actual knowlet database. One important part of the registry, the associative memory, is implemented as a separate entity, and may be hosted on a different system for performance reasons. An associative memory is essentially a large, sparse binary matrix, and can be implemented with reasonable efficiency on a conventional computer system with sufficient main memory. Special-purpose architectures may be used if fast access is critical.
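The idea of an associative memory as a sparse binary matrix can be sketched in a few lines of Python. The sketch below is a deliberately simplified stand-in and not the neural associative memory of [6]: it stores a knowlet-by-feature binary matrix (keeping only the 1-entries, column by column) and retrieves the entries with the largest overlap with the query. The class name, method names, and sample features are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


class BinaryAssociativeMemory:
    """Sparse binary matrix: entry (knowlet, feature) is 1 if the knowlet has the feature."""

    def __init__(self) -> None:
        # Only the 1-entries are stored, column by column: feature -> knowlet ids.
        self.columns: Dict[str, Set[str]] = {}

    def store(self, knowlet_id: str, features: Iterable[str]) -> None:
        """Set the matrix entries for one knowlet."""
        for feature in features:
            self.columns.setdefault(feature, set()).add(knowlet_id)

    def recall(self, query: Iterable[str]) -> List[str]:
        """Return the knowlet ids that share the most features with the query."""
        overlap: Counter = Counter()
        for feature in set(query):
            for knowlet_id in self.columns.get(feature, ()):
                overlap[knowlet_id] += 1
        if not overlap:
            return []
        best = max(overlap.values())
        return sorted(k for k, n in overlap.items() if n == best)


memory = BinaryAssociativeMemory()
memory.store("video-1", {"comedy", "1990s", "director:smith"})
memory.store("video-2", {"drama", "1990s"})
memory.store("paper-3", {"components", "reuse"})

# The query need not match any entry exactly; the best partial match is returned.
print(memory.recall(["comedy", "1990s", "musical"]))
```

Because only the non-zero entries are kept, memory use grows with the number of stored features rather than with the full matrix size, which is what makes a conventional machine with sufficient main memory a plausible host.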

Applications Secure Content Delivery For content providers as well as for users, the storage and transmission of content in digital form has some attractive features, such as lossless transmission, convenience, and high presentation quality. A critical issue, however, especially for the owners of intellectual property, is the restriction of usage to authorized customers. Within the framework of the New Jersey Multimedia Research Center, with Panasonic as industrial partner, we are currently working on the use of knowlets as a vehicle for the secure delivery of digital content. At the content provider’s side, the digital material, such as text or multimedia documents, is stored as a collection of knowlets, where one knowlet roughly corresponds to one document. The provider uses existing information and meta-information about the material to establish the content registry. If necessary or desired, the existing material may be enriched by an analysis of the contents of the individual items, and by the evaluation of usage patterns over time. From the user’s perspective, the registry acts as a catalog, listing the available items either as presented by the provider, or according to the individual preferences of the user. In a video collection, for example, the user might initially browse the collection in the categories offered by the provider, or on the basis of search criteria like names of actors or directors. Then the user might select a few titles, and request the system to identify similar ones. Again, the similarity measure can be predefined, defined explicitly by the user, or derived from previous activities of the user. Then the user narrows down the search to a few titles, and requests more detailed information, possibly including a preview. At this point, a knowlet will be extracted for each requested title, and sent to the user. Depending on circumstances like transmission speed or available memory, either the full document or a part of it will be sent to the user. In addition to conventional security techniques like Secure Sockets Layer (SSL), specific encryption techniques may be used to prevent unauthorized use. Besides the actual content, the knowlet may also contain the software required to view the contents. The user may then browse the information and watch the preview at her leisure, but will be able to watch the full version only after obtaining the respective authorization by paying a usage fee. At any point, typically after the preview or after watching the full version, the user can express her satisfaction or displeasure about the requested item. This feedback is used to update the user’s own preferences, and also contributes to the overall usage statistics for that particular item.
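The preview, authorization, and feedback steps of this scenario can be outlined in code. The following Python sketch is purely illustrative: the class `DeliveredKnowlet` and its methods are hypothetical, payment is reduced to a call to `authorize`, and encryption and transport security are left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DeliveredKnowlet:
    """Hypothetical knowlet sent to a user: preview is open, full content is gated."""
    title: str
    preview: str
    full_content: str
    authorized_users: Set[str] = field(default_factory=set)
    ratings: List[int] = field(default_factory=list)

    def authorize(self, user: str) -> None:
        """Record that the user has obtained authorization, e.g. by paying a fee."""
        self.authorized_users.add(user)

    def view(self, user: str, full: bool = False) -> str:
        """Return the preview for anyone, the full content only for authorized users."""
        if not full:
            return self.preview
        if user not in self.authorized_users:
            raise PermissionError(f"{user} is not authorized for '{self.title}'")
        return self.full_content

    def rate(self, rating: int) -> float:
        """Record feedback (+1 satisfied, -1 displeased) and return the running mean."""
        self.ratings.append(rating)
        return sum(self.ratings) / len(self.ratings)


movie = DeliveredKnowlet("Example Movie", preview="trailer", full_content="full film")
print(movie.view("alice"))             # the preview is always available
movie.authorize("alice")               # stands in for the payment step
print(movie.view("alice", full=True))  # now the full version can be viewed
```

In this sketch the usage statistics are just a list of ratings kept with the knowlet; in the scenario above they would be reported back to the registry.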

Multi-Lifecycle Engineering The design and development of products for multiple reuse throughout their lifecycles is the objective of the Multi-Lifecycle Engineering Research Center at NJIT. A crucial aspect here is the availability of relevant information when the product developer needs to make a decision. Many of these decisions become very complicated if multi-lifecycle criteria have to be taken into account in addition to the usual technical aspects. Since it is practically impossible for developers to learn about all the additional aspects, it is very important to provide them with the right information at the right time: access to a database or knowledge base, or a search on the World Wide Web, may yield the information, but the developer might not be willing to devote a lot of time and energy to such searches. Our approach is to make relevant information easily available by “wrapping” it into knowlets, and using the knowlet registry in combination with knowlet retrieval and transmission techniques to provide the user with that knowledge. For example, a user might design a particular part of a product, e.g. the case of a cellular phone. Critical criteria for reuse here are the choice of material, and the selection of fasteners that hold the different parts together. From within the design tool, or through a plug-in that extends its functionality, the designer sends a query to the knowlet registry about possible choices for material and fastener. This query combines information explicitly provided by the designer with information already present in the design document, and may contain a specific weighting of features for the selection of relevant knowlets. Based on these criteria, the knowlet registry identifies suitable candidate knowlets, and returns either the full knowlets, or a condensed version for perusal by the user. The user is presented with an overview of the candidate materials according to the criteria from the query, and can either make an immediate decision, or request further information. In this case, the use of component-based techniques is very important since it is the basis for the interaction between knowlets and the tools the developers use for their work. The above scenario can be implemented with conventional tools and technology, but the use of components and knowlets allows for far more flexibility and better efficiency across different tools.
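A query that merges designer input with design-document information and applies feature weights might look like the sketch below. Everything in it is an assumption for illustration: the registry entries, feature names, weights, and the function `rank_knowlets` are hypothetical, and the scoring rule (sum of the weights of matching features) is just one simple choice.

```python
from typing import Dict, List, Tuple

# Hypothetical registry entries: knowlet id -> content-related features.
registry: Dict[str, Dict[str, str]] = {
    "abs-plastic": {"material": "plastic", "recyclable": "yes", "cost": "low"},
    "magnesium-alloy": {"material": "metal", "recyclable": "yes", "cost": "high"},
    "glued-composite": {"material": "composite", "recyclable": "no", "cost": "low"},
}


def rank_knowlets(
    entries: Dict[str, Dict[str, str]],
    query: Dict[str, str],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Score each knowlet by the summed weights of the query features it matches."""
    ranked: List[Tuple[str, float]] = []
    for knowlet_id, features in entries.items():
        score = sum(
            weights.get(name, 1.0)
            for name, wanted in query.items()
            if features.get(name) == wanted
        )
        if score > 0:
            ranked.append((knowlet_id, score))
    # Highest score first; ties are broken alphabetically for a stable result.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


explicit = {"recyclable": "yes"}          # provided by the designer
from_design_document = {"cost": "low"}    # already present in the design document
query = {**from_design_document, **explicit}
weights = {"recyclable": 3.0, "cost": 1.0}  # reuse criteria weighted more heavily

candidates = rank_knowlets(registry, query, weights)
print(candidates)
```

The ranked list corresponds to the overview of candidate materials presented to the user, who can then decide immediately or request the full knowlets.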

Conclusions This paper describes a framework for knowledge management based on the methods and techniques used in component-based software development. In addition to advantages like modularity, encapsulation, compositionality, and reusability, knowledge components, or knowlets, incorporate access and organization features based on content-oriented and usage-based methods for knowledge management. The practical use of knowlets is illustrated in their application to secure content delivery and multi-lifecycle engineering.

Acknowledgments The projects mentioned in this paper, and related ones, have been funded by the State of New Jersey, the New Jersey Institute of Technology, the US Environmental Protection Agency, and the US Department of Defense. The sections of this paper on component-based design in particular have greatly benefited from the input and guidance provided by Dr. Murat M. Tanik.

References
1. B. Chandrasekaran, John R. Josephson and R. Richard Benjamins, What Are Ontologies, and Why Do We Need Them? IEEE Intelligent Systems, Vol. 14, No. 1, 1999, pp. 20-26.
2. M.A. Hearst, A.Y. Levy, C. Knoblock, S. Minton and W. Cohen, Information Integration, IEEE Intelligent Systems, Vol. 13, No. 5, 1998, pp. 12-24.
3. E. Mettala and M. Graham, The Domain Specific Architecture Program, Proceedings Software Technology Conference, Los Angeles, 1992.
4. A. Paepcke, S.B. Cousins, H. Garcia-Molina, S.W. Hassan, S.P. Ketchpel, M. Röscheisen and T. Winograd, Using Distributed Objects for Digital Library Interoperability, IEEE Computer, Vol. 29, No. 5, 1996, pp. 61-68.
5. A. Paepcke, M.Q. Wang Baldonado, C.K. Chang, S. Cousins and H. Garcia-Molina, Using Distributed Objects to Build the Stanford Digital Library Infobus, IEEE Computer, Vol. 32, No. 2, 1999, pp. 80-87.
6. G. Palm, On Associative Memory, Biol. Cybernetics, Vol. 36, 1980, pp. 19-31.
7. B. Schatz, W. Mischo, T. Cole, A. Bishop, S. Harum, E. Johnson, L. Neumann, H. Chen and D. Ng, Federated Search of Scientific Literature, IEEE Computer, Vol. 32, No. 2, 1999, pp. 51-59.
8. C. Szyperski, Component Software: Beyond Object-Oriented Programming, Addison-Wesley, 1998.
9. Y. Tang, A Methodology for Component-Based System Integration, Ph.D. Thesis, New Jersey Institute of Technology, Newark, NJ, 1998.
10. W. Van de Velde, A Constructivist View on Knowledge Engineering, in A. Cohn (editor), Proceedings of the 11th European Conference on Artificial Intelligence, Wiley, 1994, pp. 727-73
11. H.D. Wactlar, T. Kanade, M.A. Smith and S.M. Stevens, Intelligent Access to Digital Video: Informedia Project, IEEE Computer, Vol. 29, No. 5, 1996, pp. 46-52.
