Kind Theory and Distributed Knowledge Capture

Joseph R. Kiniry
Computing Science Department, University of Nijmegen
Toernooiveld 1, 6525 ED Nijmegen, The Netherlands
[email protected]

DC-KCAP Workshop at K-CAP 2003, Sanibel, Florida

ABSTRACT
Kind theory is a logic for describing and reasoning about structured knowledge in communities. It provides a formal framework for describing, finding, customizing, composing, and reasoning about structured domains, such as those of software and mathematics. A wiki-style, web-based asset repository called the "Jiki" has been created that provides a simple interface for interacting with reusable artifacts without exposing their theoretical or technical underpinnings. This paper summarizes kind theory and its use in the Jiki for distributed knowledge capture.
1. INTRODUCTION
Kind theory is a logic for reasoning about reusable artifacts. Kinds are the structures within the logic used to classify instances. One can think of kinds as a sort of ontological first-class classifier, much like types in type theories and sorts in algebras. Kind theory's definition is the result of analyzing (a) several branches of mathematics, (b) several formal methods for software and knowledge engineering, and (c) the tools and technologies used in the software creation process, particularly those that help people collaborate. This analysis drove the definition of the core constructs and axioms of kind theory (the complete set of logical rules of kind theory is included on the final page of this paper). Kind theory includes a number of tertiary new ideas, but unfortunately there is insufficient space to discuss them here. Examples include an innovative syntax that combines aspects of syntax (well-formedness) and semantics; the use of canonicality for renaming; the classification of data and behavior by an interpretive context; and the explicit introduction of agents that are not necessarily human. The interested reader should look to the author's dissertation for more information [8].

Kind theory can be used to reason about reusable structured information in any domain. Thus, because the theory is quite generic, it must have numerous models (both theoretical and colloquial). A discussion of kind theory's models is found in Section 5. Software and mathematics are natural "first application targets", but only the former is discussed in this paper.
2. SOFTWARE AS KNOWLEDGE
Software systems are more complicated today than at any point in the history of computing. Projects consisting of millions of lines of program code written in a dozen languages, developed and maintained by hundreds of people across multiple companies, are commonplace. The development of software systems, realized by the practice of software engineering, is primarily characterized by its complexity and variety. There are hundreds of languages, tools, and technologies to choose from for any given domain. The field of software engineering research has similar variety, as there are an enormous number of formalisms, methodologies, and approaches.

Software engineering is a young discipline, so it is unsurprising that this state of affairs is similar to the one that more mature fields like manufacturing faced many years ago. Until the development of standard measures, tools, and terminology, manufacturing was a cottage industry. Independent manufacturers created their custom wares in their own manner and with their own tools and parts. The results often showed poor workmanship, irregularities, inconsistency, and incompatibility. The same can be said for software today.

The desire for an "à la carte" approach to software engineering is often expressed by researchers and practitioners. Developers want the ability to pick and choose formalisms, tools, and technologies for each new problem. But the current state of affairs, characterized by fiefdoms of theory and technology, does not lend itself to this approach.
One way to help realize this goal is to provide a kind of bridging formalism. Such a theory might help software engineers, in a collaborative and distributed fashion, express the commonalities and differences between the various options available. If this information were collected from the software community and made publicly available, perhaps via a Web site, then the community as a whole would benefit.

Describing the commonalities and differences between the various theories, tools, and technologies of software engineering can be accomplished in many ways. Several abstract branches of mathematics exist that can be used to describe and reason about such structures, but these options are overly general. They do not focus on the problems of representing structured, ambiguous knowledge, particularly in communities. What is needed is a new logical system that focuses on the structures specific to reusable knowledge as (re)used by communities.

This paper describes an integrated set of theories and software tools for the purpose of helping communities describe, construct, and reason about their complex reusable artifacts, such as software systems and mathematics. The underlying logic is called "kind theory", and the primary tool through which (non-mathematician) software engineers take advantage of it is called the "Jiki".
3. A CLASSIFICATION OF KIND THEORY
Kind theory is a non-classical, autoepistemic, paraconsistent, self-reflective logic. That is to say, kind theory has more than the two classical truth values (true and false), it can be used to describe the self-knowledge of individuals, a context need not be (classically) logically consistent in order to make sound judgments, and kind theory has been used to describe itself. As this is quite a mix of non-standard approaches to several problems, each aspect of kind theory is described in a bit more detail in what follows. The interested reader can consult the bibliography for more detailed information about kind theory [8] and its applications [7, 9].
3.1 Non-Classical
When asked whether something is absolutely true or absolutely false, most people say "I don't know" or "it is neither completely true nor completely false". This gray area of truth can be represented in logic and mathematics in any number of ways. Most of these representations deal with gradations of truth, sometimes encoded in a truth predicate, sometimes with probability theory, and, in logic, often with modal or non-classical logics. What all of these representations have in common is an underlying non-classical formal formulation: a basic logic that has more than two, and possibly infinitely many, truth values. As kind theory's primary purpose is to describe "what is thought about things" (in a very general sense), it is natural to choose a non-classical foundation.
But, as domains, researchers, and tools have different models for representing the "grayness" of truth, kind theory does not stipulate one particular representation. A generic theoretical structure is defined that is sufficient for all reasoning within the logic, leaving the choice of model up to the user, given her domain, goals, and tools.
3.2 Autoepistemic
The word autoepistemic captures the notion that kind theory lets one describe and reason about one's own (thus "auto") knowledge and beliefs (thus "epistemic"). There are a large number of autoepistemic logics available today, most of which are non-monotonic. Their purpose is primarily to help draw conclusions from incomplete, introspective information. They also focus on the fact that not only knowledge, but the absence of knowledge, enables certain judgments [13]. Kind theory is autoepistemic for the same reasons. It is a non-classical logic with three truth values, one of which represents the absence of knowledge. Additionally, because its semantics are (intentionally) logically weak, existing autoepistemic logics are among its models.
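As a purely illustrative sketch (and not the representation mandated by kind theory, which leaves the choice of truth model to the user), the following Java fragment shows one way to encode a three-valued truth type in which the third value stands for the absence of knowledge; the Kleene-style conjunction is just one conventional choice among many.

    // A minimal three-valued truth type: one illustrative model, not the representation
    // mandated by kind theory (which leaves the choice of model to the user).
    public enum TruthValue {
        TRUE, FALSE, UNKNOWN;   // UNKNOWN stands for the absence of knowledge

        /** Kleene-style conjunction: FALSE dominates; otherwise UNKNOWN propagates. */
        public TruthValue and(TruthValue other) {
            if (this == FALSE || other == FALSE) return FALSE;
            if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
            return TRUE;
        }
    }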
3.3 Paraconsistent
Paraconsistent logics are those that can deal with logical inconsistencies of various types [14]. These logics' primary purpose is formally modeling the messy real world, a world that contains contradictions of various sorts but in which humans can operate and reason without issue [2, 12]. Most paraconsistent logics are non-classical. Their development originates from artificial intelligence research in the fields of knowledge representation and engineering. As they are a relatively recent addition to the logician's toolbox, they are still outside the mainstream of logic. Thus, only a relatively small number of examples have been developed over the past twenty years.

Kind theory is a paraconsistent logic because, in formalizing and reasoning about a collaborating group of individuals' beliefs, inconsistency is frequent. Likewise, temporary, intentional inconsistency is a hallmark of software engineering: systems under development are necessarily inconsistent and incomplete because they are not fully elucidated from a requirements or an engineering point of view [3]. Thus, kind theory has the aspects of existing paraconsistent logics, and many such logics are models for those aspects of kind theory.
3.4 Self-Reflective
Finally, kind theory is self-reflective because it has been partially expressed within itself, as an experiment in representing reusable mathematical structures.
4. KNOWLEDGE THEORIES
There is no one theory of knowledge, or what is sometimes called the “philosophy of knowledge” or “epistemology”. There is a debate about the theory of knowledge, a debate that has been ongoing for over 2,000 years.
A historical study of the works of philosophers interested in this debate reveals the companion label theory of reality, usually termed metaphysics. This branch concerns itself with questions about the nature of the world and of man. While most view this branch as the more fundamental of the two, any explicit discussion of it will be avoided here. Implicitly, on the other hand, it must be recognized that kind theory engenders the process of metaphysics. Aristotle called it the study of "being qua being," that is, the study of what there is and of the properties of what there is. Collaboratively exploring knowledge, recording its structure, building ontologies, capturing experience: this is all collaborative constructive metaphysics. But without some philosophical and mathematical backing, the metaphysician creating and recording knowledge is without a foundation and does not command attention. Identification, authentication, and credentials are necessary to enable judgments of the worth of what has been said.

If Sara says, "I know that this algorithm is of computational complexity O(n)," she is advancing a claim for which you expect her to have reasons with which she should be able to defend and explain it. Such a claim is stronger than the statement "I believe that this algorithm is of computational complexity O(n)," though this belief will also be backed by some defense. The theory of knowledge is in part interested in the relations between claims and reasons for claiming. This aspect is often called the normative or justificatory aspect of the theory of knowledge. To know requires not only that one has reasons that justify one's knowing and one's claim; it requires an awareness that one has perceived, understood, inferred, weighed evidence, etc. The analysis of these various sorts of mental operations (of perceiving, understanding, inferring, etc.) comprises what is called descriptive epistemology. Kind theory's truth structures are exactly these relations between a claimant, called an agent, and a predicate.
4.1 Truth Structures
An agent is a person or system that interacts with a kind. Four examples of agents are the author of a paper and the tools that he uses to write the paper, read his email, or mechanically check a proof.
This definition might seem quite anthropomorphic, since artificial agents cannot discern any meaning in anything; they can only manipulate data in ways that have meaning for the person who programmed the agent (thanks to José Meseguer for pointing this out). To clarify, artificial agents do not have some magical oracular ability to discern truth or define semantics. Agents are necessary only to support the representation of external, often computational, entities within the theory. In this manner, the decisions of external agents, be they derived by a flip of a coin, a call to a function in a library, or the execution of a judgment within an automated proof system, are all equally representable.

Semantics are specified in kind theory using truth structures. Truth structures come in two forms: claims and beliefs.
4.1.1 Claims
A claim is a statement made by an agent that is accompanied by a proof or other validation data of some form. Example validation data, or what is called evidence, are a formal mathematical proof, a well-reasoned argument (or what Martin-Löf calls a demonstration [10]), or a set of consistent supportive data.

For example, the statement "a tree is a plant" is a claim. The operator of this sentence is the predicate relation "is-a" and the truth value associated with this claim is "true". The agent is whoever is making the claim. Thus, the expanded version of this claim is something of the form: "Joe Kiniry claims that the statement 'a tree is a plant' is true because it is supported by the following set of botany textbooks".

A mathematically proven statement that is widely accepted is a claim. This phrasing (i.e., "widely accepted") is used because, for example, there are theorems that have a preliminary (not widely verified, or partial) proof but are not yet recognized as being true by the mathematical community. For example, Andrew Wiles's proof of Fermat's Last Theorem is now broadly accepted by the mathematical community, but early versions of the proof were not because they had not yet been widely reviewed. Thus, early versions of this truth structure ("Andrew Wiles states that Fermat's Last Theorem is true because of the following proof") were not claims, but later versions are.

Various statements made during program development, both implicit and explicit in the development process, are also claims. For example, stating that a class C inherits from a parent class P is a claim. The inheritance relation is the behavior, the author of the class C is the agent involved, the context includes the definition of inheritance, classes, etc., the truth value associated with the claim is "true", and the evidence is the source code itself. The statement within documentation or a specification that a particular function f has complexity O(g), because of some detailed provided analysis, is another example of a claim.
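To make the ingredients of a claim concrete, the following Java sketch records a claimant, a statement, an asserted truth value, and the supporting evidence. The type names (Agent, Evidence, Claim) are hypothetical illustrations introduced here only for exposition; they are not part of kind theory's formal syntax or of any implementation described in this paper.

    import java.util.List;

    // Illustrative sketch only; Agent, Evidence, and Claim are hypothetical names,
    // not types from kind theory's realization or the Jiki's API.
    record Agent(String name) {}
    record Evidence(String description) {}     // e.g., a proof, a reasoned argument, or supporting data
    record Claim(Agent claimant,
                 String statement,             // e.g., "a tree is a plant"
                 String assertedTruthValue,    // e.g., "true"; the truth-value model is left open
                 List<Evidence> evidence) {}   // a claim must carry validation data

    class ClaimExample {
        // "Joe Kiniry claims that 'a tree is a plant' is true, supported by a set of botany textbooks."
        static final Claim TREE_CLAIM = new Claim(
                new Agent("Joe Kiniry"),
                "a tree is a plant",
                "true",
                List.of(new Evidence("a set of botany textbooks")));
    }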
4.1.2 Beliefs
A belief is a statement made by an agent that is accompanied by a verification structure and has some associated conviction metric. Unlike a claim, a belief has no formal evidence; it is instead supported by statements of experience, intuition, etc. Beliefs range in surety from "completely unsure" to "absolutely convinced" [1, 4]. Within kind theory, no specific metric is defined for the degree of conviction. The sole requirement placed on the associated belief logic is that the surety elements form a partial order. The reason that a partial order is chosen over a total order or lattice is that (a) a partial order is more general than the alternatives, as both a total order and a lattice are partial orders, and (b) it is widely recognized that some belief sureties are incomparable [1]. The fundamental distinction between claims and beliefs is a widely supported notion within the field of epistemology [1, 4, 5, 11, 15].
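The following Java sketch illustrates, under the assumption of one hypothetical conviction metric, how belief sureties can form a partial order containing incomparable elements; kind theory itself prescribes no such metric, and the names below are invented for illustration only.

    // Illustrative sketch only. Kind theory mandates no particular conviction metric;
    // this hypothetical one simply shows sureties forming a partial order in which
    // experience-based and intuition-based convictions are deliberately incomparable.
    enum Surety {
        COMPLETELY_UNSURE,
        CONVINCED_BY_EXPERIENCE,
        CONVINCED_BY_INTUITION,
        ABSOLUTELY_CONVINCED;

        /** True iff this surety is known to be no stronger than the other (a partial order). */
        boolean atMost(Surety other) {
            return this == other
                || this == COMPLETELY_UNSURE
                || other == ABSOLUTELY_CONVINCED;
            // CONVINCED_BY_EXPERIENCE and CONVINCED_BY_INTUITION remain incomparable.
        }
    }

    // A belief pairs an agent's statement with a conviction level, but carries no formal evidence.
    record Belief(String believer, String statement, Surety conviction) {}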
5. MODELS
Since kind theory is a logical system with an intentionally loose semantics, it has numerous theoretical and practical models. We discuss some of them in this section.
5.1 Theoretical Models
Information that describes kinds or instances must be structured so that computation can act upon it. Since a structural representation of data is necessary for computational realization, it seems natural to define a type system for kinds. The specific type system defined for kind theory is called TΓ. TΓ structures the information that is important to kind theory, in particular the specification of kinds and instances of kinds. It also provides a number of decidable, computable, efficient operations on that structure, operations chosen specifically because they capture aspects of kind theory's basic functions and relations. TΓ is not a model for the whole of kind theory; it only models the structural aspects of the theory. What is missing are the functional pieces, the aspects of kind theory that express and execute functions. To computationally realize TΓ, a many-sorted algebra called AT that models TΓ is defined. In this algebra, one can create, manipulate, and reason about types as models for kinds. The functional part of kind theory is realized as sets of equations in an equational logic and in a rewriting logic.
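Since the definitions of TΓ and AT are beyond the scope of this paper, the following Java sketch is only a loose illustration, under assumed names and structure, of the general idea: kinds modeled as structured types that support a decidable, efficient structural check.

    import java.util.Map;

    // Purely illustrative: the details of TGamma and A_T are not given in this paper, so the
    // names and operations below are assumptions. The sketch only suggests the flavor of
    // modeling kinds as structured types with decidable, efficient checks.
    record KindType(String name, Map<String, String> features) {

        /** Decidable structural check: this kind declares at least all of the other kind's features. */
        boolean conformsTo(KindType other) {
            return features.entrySet().containsAll(other.features().entrySet());
        }
    }

    class KindTypeExample {
        public static void main(String[] args) {
            KindType plant = new KindType("Plant", Map.of("hasCells", "true"));
            KindType tree  = new KindType("Tree",  Map.of("hasCells", "true", "hasTrunk", "true"));
            System.out.println(tree.conformsTo(plant)); // true: Tree has every Plant feature
            System.out.println(plant.conformsTo(tree)); // false
        }
    }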
5.2 Practical Models
Because this work can be characterized as a framework for collaboration as applied to software, it needs to be at least as capable as some of the systems that have been built and used in the past. Such systems come in three varieties.

The first are general-purpose collaboration media, particularly in a distributed setting. Examples include general-purpose discussion forums like Slashdot (http://www.slashdot.org/), knowledge archives like Everything2 (http://www.everything2.com/) and the AAAS's Signal Transduction Knowledge Environment (mySTKE), and collaborative software development sites like SourceForge (http://www.sf.net/). This work must be at least complete enough to model the capabilities of such collaborative systems. The intent is both to reason about their use and to incorporate some of their lessons and technologies.

The second variety comprises systems developed wholly within the software reuse community. For the most part, such technologies are used to classify and discover program code as reusable assets. Popular techniques include keyword searches, facet-based classification, and signature matching.

The systems used in the everyday experience of building software are the third, and most important, variety to consider. Developers are not likely to change the way that they work, so incorporating this technology into their development model, perhaps even in a surreptitious manner, is one of the only ways to enhance their capabilities and introduce a new formalism.
6. A KIND SYSTEM
A system that realizes kind theory is called a kind system. The types of questions that a developer puts to such a system are many and varied. Example questions include "Does this new thing fit the definition of any kinds I understand?" and "How closely related are the following two kinds from a structural point of view?". History and experience show that the majority of developers have no interest in learning such a formalism, regardless of how it might impact their work. Thus, the system must be exposed in both an explicit and an implicit fashion.

Explicitly, a user can directly interact with the formal system, manually introducing new kinds and instances, making claims and beliefs about structure, and evaluating judgments. Only an expert, or a person interested in extending the fundamentals of kind theory, would use such a direct, explicit interface.

The majority of users are meant to interact with the system implicitly. This interaction is facilitated by integrating the kind system with the major tools used by software engineers in their day-to-day activities. The editor, the compiler, the configuration and version management tools, and the integrated development environment are all legitimate targets for integration. Due to kind theory's general autoepistemic nature, more than program code can be formally characterized. A developer's tools, context, process, and more can be part of his explicit kind theoretic context, implicitly captured by these knowledgeable environments. Also, because kind theory exposes hooks to the non-expert user in the form of truth structures with intuitive semantics, the hope is that the feedback mechanism inherent in the theory is usable, almost accidentally, by the common software engineer.

Tools that have been augmented with a kind theoretic infrastructure are termed knowledgeable. Performing software engineering with such infrastructure is knowledgeable software engineering, and similarly, such software development environments are called knowledgeable development environments.
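The sketch below suggests, purely hypothetically, what the explicit programmatic face of such a kind system might look like for the questions just mentioned; none of these method names are taken from an actual implementation, and a real kind system would expose far richer judgments.

    import java.util.List;

    // Hypothetical interface only; these names are not from the actual kind system or the Jiki.
    interface KindSystem {
        /** "Does this new artifact fit the definition of any kinds I understand?" */
        List<String> classify(String artifactDescription, String askingAgent);

        /** "How closely related are these two kinds from a structural point of view?" (0.0 to 1.0) */
        double structuralSimilarity(String kindA, String kindB);

        /** Record a truth structure: a claim (with evidence) or a belief (with a conviction level). */
        void recordClaim(String agent, String statement, String evidence);
        void recordBelief(String agent, String statement, String conviction);
    }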
6.1 The Jiki
The integration of these environments with a reusable knowledge repository known as the Jiki [6] is underway. The Jiki is a read/write web architecture realized as a distributed component-based Java Wiki (a wiki in the sense of http://c2.com/cgi/wiki). All documents stored in the Jiki are represented as instances of kinds. Manipulating Jiki assets, including adding or deleting information or searching for reusable assets, is realized through a forms-based web interface as well as through a Java component-based API. Figure 1 is a mock-up of the user interface we have designed for manipulating truth structures and kinds.
7. CONCLUSION
Kind theory's first realization was created for the Maude algebraic/rewriting environment. The first version of the Jiki was written in 1997 at OOPSLA by a group of developers interested in distributed computing. Now armed with the experience gained from these two pieces of work, we are working toward integrating the two environments. A new realization of kind theory within a higher-order logic (HOL) is underway, and it will be coupled with one of the more recent, and thus more advanced, wikis available today.
Acknowledgments.
This work was partially performed while the author was a student at the California Institute of Technology in Pasadena, CA. This work was supported under ONR grant JJH1.MURI-1-CORNELL.MURI (via Cornell University), "Digital Libraries: Building Interactive Digital Libraries of Formal Algorithmic Knowledge", and AFOSR grant JCD.61404-1-AFOSR.614040, "High-Confidence Reconfigurable Distributed Control". This work was also supported by the Netherlands Organization for Scientific Research (NWO).

8. REFERENCES
[1] D. M. Armstrong. Belief, Truth and Knowledge. Cambridge University Press, 1973.
[2] Diderik Batens. Frontiers of Paraconsistent Logic. Taylor and Francis, Inc., 2000.
[3] Steve Easterbrook, John Callahan, and Virginie Wiels. V & V through inconsistency tracking and analysis. In Proceedings of the Ninth International Workshop on Software Specification and Design, 1998.
[4] Dov M. Gabbay and Philippe Smets, editors. Handbook of Defeasible Reasoning and Uncertainty Management Systems, Volume 3: Belief Change. Kluwer Academic Publishing, 1998.
[5] A. Phillips Griffiths, editor. Knowledge and Belief. Oxford University Press, 1967.
[6] Joseph R. Kiniry. The Jiki: A distributed component-based Java Wiki, 1998. Available via http://www.jiki.org/.
[7] Joseph R. Kiniry. A new construct for systems modeling and theory: The Kind. Technical Report CS-TR-98-14, Department of Computer Science, California Institute of Technology, October 1998.
[8] Joseph R. Kiniry. Kind Theory. PhD thesis, Department of Computer Science, California Institute of Technology, 2002.
[9] Joseph R. Kiniry. Semantic component composition. In Proceedings of the Third International Workshop on Composition Languages, ECOOP'03, 2003.
[10] Per Martin-Löf. Intuitionistic Type Theory. Bibliopolis, 1984.
[11] Jack Minker, editor. Logic-Based Artificial Intelligence, volume 597 of The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishing, 2000.
[12] G. Priest, R. Routley, and J. Norman, editors. Paraconsistent Logic: Essays on the Inconsistent. Philosophia, 1989.
[13] Raymond Turner. Logics for Artificial Intelligence. Ellis Horwood Series in Artificial Intelligence. Ellis Horwood Limited, 1984.
[14] Max Urchs. Essays on Non-Classical Logic, chapter Recent Trends in Paraconsistent Logic, pages 219–246. World Science Publishing, 1999.
[15] John W. Yolton, editor. Theory of Knowledge. The Macmillan Company, 1965.
Figure 1: Stating Truth Structures via a Web Interface
Figure 2: Rules of Kind Theory