On Representing Knowledge in the Dependability Domain - IEEE Xplore

1 downloads 23 Views 314KB Size Report
A thesaurus (list of important terms with related terms for each entry) is .... Ravishankar K. Iyer, Director, Coordinated Science. Laboratory, University of Illinois at ...
International Conference on Dependable Systems & Networks: Anchorage, Alaska, June 24-27 2008

On Representing Knowledge in the Dependability Domain: a Panel Discussion Algirdas Avizienis University of California, Los Angeles, USA and Vytautas Magnus University, Kaunas, Lithuania [email protected] Abstract

An example of a long-term “manual” effort to create a framework of dependability concepts is the effort within IEEE CS TC/DCFT and IFIP WG 10.4 that began with a special session at FTCS-12 in 1982, and since then has resulted in several papers, a six-language book (1992), and in 2004 the paper “Basic Concepts and Taxonomy of Dependable and Secure Computing” (IEEE TDSC, vol. 1, no.1, January-March 2004). Useful as it may be, this “Taxonomy” paper represents only one of the dependability communities. It seems apparent that today the purely “manual” (i.e., human) process of ontology building for dependability concepts is reaching its limits. The complementary solution is to augment the human effort by the use of automatic natural language processing tools that have been developed by computer linguists. The next step must be computeraided building of a consensus ontology.

The objective of the panel is to discuss the urgent need, the means, the progress, the obstacles, and the challenges in creating a structured representation of the contents of the large and very rapidly increasing set of documents that represent knowledge in the technical domain of dependability.

1. The Need: Sometimes We Don’t Know What We Are Talking About Dependability has concerned most disciplines of computer science and engineering since the early days. As a consequence, significantly different terminologies were developed by different communities to describe the same aspects of dependability. The terminologies became entrenched through usage at annual conferences, in books, journals, research reports, standards, industrial handbooks and manuals, patents, and other documents. As an illustration, we have the concepts of dependability, trustworthiness, survivability, high confidence, resilience, high assurance, robustness, and self-healing, whose definitions appear to coincide or to overlap extensively. In many cases the definitions have multiple versions that depend on a given author’s preference. The use of several synonyms or near-synonyms that lack well-defined distinctions often is a source of continuing confusion that leads to re-inventions and plagiarism, impairs the transfer of research results to practical use and blocks the recognition of related documents. The orderly progress of dependability research and its practical applications requires that past work as well as new results should be classified on the basis of a single ontology and thus made visible and accessible to all communities of the dependability discipline.. However, it is unreasonable to expect that a committee formed by the different communities would by volunteer effort create a taxonomy document from which a single consensus ontology could be generated.

1-4244-2398-9/08/$20.00 ©2008 IEEE

2. The Means and Progress: Natural Language Processing by Computers During the past decade much progress has been made in the development of computer tools for natural language processing. Such tools have been developed for the extraction of term candidates from a corpus, i.e., a set of texts from a domain of interest. A thesaurus (list of important terms with related terms for each entry) is constructed from the term candidates. The ontology for a given domain is a data model that represents those terms and their relationships. Automatic indexation of the texts is carried out using the thesaurus, followed by clustering analysis using statistical and linguistic techniques. A measure of similarity between texts is computed that serves as a basis for automatic classification. The applicability of those techniques to texts in the dependability domain is currently investigated at the Center for Computer Linguistics of Vytautas Magnus University in Kaunas, Lithuania, using the tools developed at the Institute for Applied Information Research at Saarland University in Germany. The effort is part of research supported by the European Network of

440

DSN 2008: Avizienis

International Conference on Dependable Systems & Networks: Anchorage, Alaska, June 24-27 2008

Excellence ReSIST (Resilience for Survivability in Information Society Technologies). The corpus is composed of nearly 2000 papers and other texts from the Proceedings of all 29 FTCS and 7 DSN conferences (1971-2006). Term extraction yielded over 20000 term candidates. When only the abstracts of the papers were processed, about 5700 term candidates were extracted for the computer-aided construction of a thesaurus and ontology. The thesaurus was employed to perform automatic indexing of the FTCS and DSN papers, followed by the identification of about 180 clusters by means of automatic clustering analysis. Currently automatic synonymy extraction experiments are conducted using four approaches. Next an ontology will be constructed and used to do automatic classification experiments with the FTCS and DSN papers. The encouraging result of the processing of texts from the FTCS/DSN community leads to the conjecture that similar processing of texts from other conferences, journals, books, industrial documents, etc., will produce other ontologies that can be merged into a consensus ontology for the entire discipline of dependability.

construction of a thesaurus and an ontology for the entire CS&E profession. However, we must put our own house in order first: a consensus dependability ontology with explicit synonymy relations must be available to the CCS builders. The prize to be gained is also grand: a “researcher’s assistant” (or “referee’s helper”) that uses the ontology to search the immense collection of past publications for relevant references in the dependability domain.

5. From the Dark Side: the Info-Skeptic's View A different view of the CS&E ontology problem is also possible. The info-skeptic says: all concepts, systems and theories that deal with information are human-made, in contrast to natural phenomena that exist as given facts to be investigated by the physical sciences. Therefore the disappearance of a CS&E concept is simply a case of survival of the fittest: if the concept’s originator was not able to assure its survival, then someone else will rediscover it in due time. Thus there is no need to keep track of what was done in the past, because the good stuff will reappear, with some luck… in my research.

3. The Obstacles: Classification Is a Problem for All of CS&E

Panel Members Algirdas Avizienis, Professor Emeritus, UCLA, and Principal Research Scientist, VMU, Kaunas, Lithuania. He is the founding chairman of IEEE-CS TC on FaultTolerant Computing and of IFIP WG 10.4, and originator of surviving concepts “fault tolerance” (1967), “Nversion programming” (1977), “design diversity” (1982) and a few more, and of some extinct ones as well. Ravishankar K. Iyer, Director, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, USA, and first Editor-in-Chief, IEEE Transactions on Dependable and Secure Computing (2004-2007). Jean-Claude Laprie, Directeur de Recherche at LAAS-CNRS, Toulouse, France and Coordinator of the three-year ReSIST (Resilience for Survivability in IST) European Network of Excellence project (2006-2008). He is the author of several concept papers and the editor of the six-language book “Dependability: Basic Concepts and Terminology” (Springer, 1992). Ruta Marcinkeviciene, Director, Center for Computer Linguistics at Vytautas Magnus University in Kaunas, Lithuania. She is in charge of the effort to use natural language processing tools in order to generate a thesaurus and an ontology from the corpus of the FTCS and DSN papers published from 1971 to 2006 and to conduct automatic classification experiments.

A dependability ontology is an integral part of an (still non-existent) ontology for all of computer science and engineering. The only existing and widely used taxonomy that could be used to build it is the ACM Computing Classification System (CCS). The CCS was created in 1988 and was last revised in 1998. It has fallen far behind the evolution of CS&E and information technology. The concepts of dependability are treated very inadequately, and many significant dependability terms are altogether missing in the 1998 ACM CCS taxonomy. Most documents that deal with dependability refer to “the dependability of X”, where X is: hardware, software, system architecture, database, etc. These upper-level terms of the CS&E ontology must be available when classifying dependability documents. The existing CCS is a severe handicap, but it must be used until a better one is available. At this time the ACM is initiating the next update of the CCS, with one goal being the development of a flexible incremental process of updating.

4. The Grand Challenge The coming update of the CCS is a grand challenge to the dependability community: we must take part in the process of creating an up-to-date and evolvable version of the CCS that adequately incorporates dependability concepts. The new CCS would allow the computer-aided

1-4244-2398-9/08/$20.00 ©2008 IEEE

441

DSN 2008: Avizienis

Suggest Documents