
POPPER, a Simple Programming Language for Probabilistic Semantic Inference in Medicine. Barry Robson St. Matthew’s University School of Medicine, Grand Cayman, Department of Mathematics Statistics and Computer Science, University of Wisconsin-Stout, Wisconsin, US, and The Dirac Foundation, Oxfordshire UK. Tel. 1-345-945-3199x193, Fax 1-345-945-3130, [email protected]

Our previous reports described the use of the Hyperbolic Dirac Net (HDN) as a method for probabilistic inference from medical data, and a proposed probabilistic medical Semantic Web (SW) language, Q-UEL, to provide that data. Rather like a traditional Bayes Net, that HDN provided estimates of joint and conditional probabilities, and was static, with no need for evolution due to “reasoning”. Use of the SW will require, however, (a) at least the semantic triple with more elaborate relations than conditional ones, as seen in the use of most verbs and prepositions, and (b) rules for logical, grammatical, and definitional manipulation that can generate changes in the inference net. The simple POPPER language for medical inference is described here. It can be written automatically by Q-UEL, or by hand. Based on studies with our medical students, it is believed that a tool like this may help in medical education and that a physician unfamiliar with SW science can understand it. It is used here to explore the considerable challenges of assigning probabilities, and not least what the meaning and utility of inference net evolution would be for a physician.

Keywords: Medical Inference, Decision Support System, Expert System, Semantic Web, Dirac, hyperbolic, complex, Bayes Net, Popper.

1. Introduction

1.1. Background. Our suggested Q-UEL universal exchange language for healthcare [1] and for inference [2] is amongst many efforts aimed at a more probabilistic form [3] of the current Semantic Web (SW) [4]. The present report extends the basic Hyperbolic Dirac Net (HDN) [2], which relates to so-called Bayesian approaches [5, 6] that are confined to conditional probabilities, i.e., forms like P(A | B, C), which is essentially the probability of the statement “A if B and C” being true. The HDN is now extended to encompass relationships other than “if”, such as those in subject-verb-object, that the SW and Q-UEL provide. However, the methodological emphasis below is on a simple POPPER application and POPPER computer language. In its current form it is seen in part as a tool (a) to convert Q-UEL tags to a form that simplifies the building of inference networks and the inference engines that drive them, and, importantly, (b) as an interface by which a medical expert can construct Q-UEL tags (when data mining to obtain probabilistic information is not an easy option). In actuality, POPPER was originally developed in part as the simplest possible “toy system” with which medical students might express and solve inference problems in probabilistic semantics; the original form goes back to 2009 and is the parent of Q-UEL. The first motivation for Q-UEL was to provide a universal or at least uniform language to extract biomedical information, for automated reasoning systems like POPPER, from the web and the SW in particular. The second motivation was that Q-UEL seemed to meet US Federal demands in late 2010 for a universal exchange language for healthcare [1], including means of securely transporting patient data, and extracting further statistical biomedical information from many millions of consenting patients [1].

1.2. Probabilistic Conditional Inference. It is not obvious even to current SW developers how to assign plausible probability values to semantic statements and how a user should interpret them, and perhaps even less clear how one is supposed to interpret and utilize the changes in the inference net that can result from the action of the rules [3]. The diversity of proposed solutions suggests a lack of agreement [1]. The popularity of the Bayes Net (BN) [6] is almost certainly because it is easy to understand its quantitative meaning, utility, and mode of use, as what we call a probabilistic conditional inference (PCI) approach. PCI in general uses basic building blocks that are the probabilities that one or more aspects are the case given that one or more other aspects are the case. As noted above, a conditional probability like P(A | B, C) = P(A, B, C) / P(B, C) has the force of P(“A if B and C”). The aspects A, B, C, etc. are states, events, things, observations, measurements, descriptions, etc. that we call attributes by reference to the attributes of the XML-like Q-UEL language. A PCI net is an estimate of a joint or conditional probability with many attributes when there is insufficient data to calculate the probability directly. Primarily, conditional probabilities with fewer attributes such as P(A | B, C) are multiplied together.

For a traditional BN this is according to the rule that the attributes as nodes and the conditional relationships between them as connections must describe a directed acyclic graph (DAG) [6]. The HDN as described in ref [2] is also a PCI but uses dual probabilities such as (P(A | B, C), P(B, C | A)), considering two directions of conditionality at the same time, say (0.9, 0.6) [1,2]. This is the value of a POPPER tag <A | if | B, C>. The perhaps unobvious mathematical consequence is that the HDN is no longer constrained to a DAG. The dual itself is, nonetheless, a notion that should be obvious to physicians. For example, there is the risk factor that obesity presents for type 2 diabetes, and the different risk factor that type 2 diabetes presents for obesity. These are the conditional probabilities P(type 2 diabetes | obesity) and P(obesity | type 2 diabetes), though typically expressed on, e.g., a percentage or per mille basis.
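As a minimal sketch of the dual described above, the two directions of conditionality can be estimated from simple counts. The function name `probability_dual` and the counts used are hypothetical illustrations, not data or code from this report.

```python
from typing import Tuple


def probability_dual(n_ab: int, n_a: int, n_b: int) -> Tuple[float, float]:
    """Return the dual (P(A|B), P(B|A)) from counts.

    n_ab: records showing both A and B
    n_a:  records showing A
    n_b:  records showing B
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("n_a and n_b must be positive")
    if not 0 <= n_ab <= min(n_a, n_b):
        raise ValueError("n_ab must lie between 0 and min(n_a, n_b)")
    return n_ab / n_b, n_ab / n_a


# Hypothetical counts: A = type 2 diabetes, B = obesity.
p_a_given_b, p_b_given_a = probability_dual(n_ab=300, n_a=500, n_b=1000)
print(f"P(diabetes | obesity) = {p_a_given_b:.2f}")
print(f"P(obesity | diabetes) = {p_b_given_a:.2f}")
```

The two numbers differ whenever the counts of A and B differ, which is why the dual carries more information than a single conditional probability.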

1.3. Association Constants and Mutual Information. While emphasis here is on assigning probabilities to statements subjectively by human experts, it can sometimes be done more objectively and/or automatically by data mining and text analytics. By such means one may often obtain association constants such as K(A; B) = P(A, B) / [P(A)P(B)] = P(A|B)/P(A) = P(B|A)/P(B), or K(A; B; C) = P(A, B, C) / [P(A)P(B)P(C)] [2]. The effect of a verb as the relationship between subject and object expressions can often be quantified by an association constant. Association constants form the basis of our K method for building an HDN [2], and K(A; B) appears on Q-UEL tags along with forms such as P(A|B) and P(B|A) to allow many clinical and epidemiological metrics to be calculated [1]. It relates to Bayes’ equation [1,2] in that inference can be framed in terms of P(A) as prior probability and P(A|B) as posterior probability in P(A|B) = K(A; B)P(A). Reference will also be made below to information I(A; B) = ln K(A; B), which the author refers to as Fano mutual information, after Fano [7]. Many authors speak of Fano’s inequality mutual information but, strictly speaking, the only inequality for present purposes is that P(A|B) is not in general equal to P(A). It can be, but then I(A; B) = 0 and K(A; B) = 1, and A carries no information about B and vice versa. Note that, unlike some other kinds of mutual information, it is important to our use of I(A; B) that it may be greater or less than 0 (and hence K greater or less than 1).
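The relations in this subsection can be checked numerically. The following is a small sketch using the definitions K(A; B) = P(A, B) / [P(A)P(B)] and I(A; B) = ln K(A; B) given above; the function names and the probability values are hypothetical.

```python
import math


def association_constant(p_ab: float, p_a: float, p_b: float) -> float:
    """K(A; B) = P(A, B) / [P(A) P(B)]."""
    return p_ab / (p_a * p_b)


def fano_information(p_ab: float, p_a: float, p_b: float) -> float:
    """I(A; B) = ln K(A; B); may be positive, zero, or negative."""
    return math.log(association_constant(p_ab, p_a, p_b))


# Hypothetical probabilities for illustration only.
p_a, p_b, p_ab = 0.1, 0.2, 0.06

k = association_constant(p_ab, p_a, p_b)
posterior = k * p_a  # P(A|B) = K(A; B) P(A), with P(A) as the prior

print(f"K(A; B) = {k:.3f}")
print(f"I(A; B) = {fano_information(p_ab, p_a, p_b):.3f}")
print(f"P(A|B)  = {posterior:.3f}")
```

When P(A, B) = P(A)P(B), the constant is 1 and the information is 0; a joint probability below that product gives negative information, which is the signed behaviour the text relies on.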

1.4. Probabilistic Semantic Inference. Conditional probabilities such as P(A | B) are often capable of a number of other semantic interpretations, including the causal P(“A is caused by B”), the transformational P(“B becomes A”), the comparative P(“A is greater than B”), and not least the categorical P(“B are A”) and similar set-theoretic interpretations, or to imply relationships where something is propagated, or a chain of effect (Section 5.3). These remain important as valid probabilistic semantics, and are responsible for the remarkably broad range of applicability of a BN. Also, the role of a verb of action as in “physicians treat patients” can often be rendered categorically: “physicians are patient-treaters”. However, homely examples such as “dogs chase cats” and “cats chase mice” are often used below, because “chase” is a good example of a verb of action that is not fully interpretable in such conditional ways and is very difficult to use in chains of inference if its role is rendered categorical. For a probabilistic SW and probabilistic semantic inference (PSI), the vertical bar ‘|’ must also often be replaced by many other kinds of relationships, including verbs of action and prepositions, when perceived linguistically.

Traditionally there is no P(dogs | chase | cats) except as one possible canonical format for P(“dogs chase cats”), yet humans have no difficulty in seeing that a probability can be assigned, at least qualitatively. Few would bet a thousand dollars that you will never find a dog that does not chase cats, or even that you would never see a cat chase a dog. But <dogs | chase | cats> does have special meaning, encoding a probability dual in the Dirac notation and corresponding algebra [8] used in quantum mechanics (QM) [9,10], albeit that there words like dogs and cats are replaced by physical states, events, observations and measurements, and a verb like chase is replaced by an operator. Indeed, a verb is seen as a kind of operator in Q-UEL and POPPER, using the term “relationship operator” or relator to describe a relationship in general: hence the R in <A | R | B>. Note that POPPER does not simply see a verbal relationship as some kind of additional condition; rather, as with QM, R is an operator that can in principle be defined as a matrix.

As well as such a semantic triple, Q-UEL statements can be more elaborate semantic multiples [1], such as a nested form that can represent a tree graph as (the parsed structure of) a sentence. The simpler form relates to the physicist’s spinor or dual spinor, and the nested form is an extension of a physicist’s higher-level twistor in format [1,2,10]. POPPER is a simplified language that does not support or utilize all Q-UEL features; while it does not currently support the nested form explicitly, it also does not prohibit it. As described in Section 3.7, POPPER metastatements (essentially, match and edit instructions that act on statements to apply logical and definitional rules in inference) derive from such a nested “extended twistor” form. In addition, the POPPER method allows construction of statements with such nested forms, simply by using assignment statements to create them as it would create a new statement as a semantic triple in <A | R | B> format. However, the resulting forms have not as yet been manipulated in any way that would justify their use. In this report, statements are confined to the <A | R | B> format. It remains that A, R and B can each be strings of words that can be manipulated by metastatements, and these could have implied nested forms like that above.

1.5. Previous Work. Although the literature of probabilistic semantics goes back a long way, it has been overshadowed by the use of symbolic logic [11], and only recently has there been a significant increase in interest (see ref [12] for discussion and review). The problem of what probabilities to assign in POPPER goes back in part to the work of Karl Popper [13], which addressed more generally what it means to make statements about the world. Dirac himself felt that QM and his methods should also be applicable to many aspects of logical and probabilistic human thought [1, 2], but he did not say exactly how. Considering the way in which he extended Schrödinger’s wave mechanics to particle theory [10], it seems likely that he had in mind (albeit under other guises) the hyperbolic imaginary number h (i.e., hh = +1) that allows use of empirical classical probabilities [1,2]. Adding the semantic interpretation, however, draws on artificial intelligence [14], linguistic theory [15, 16], and relational data mining [17], as well as on Expert Systems [18, 19], which have a tradition of innovation for medical decision support [20, 21]. While our approach also appears to be such an innovation, h has been rediscovered in different guises in separate fields many times since Cockle first described it [22], including by Dirac [10]. A largely h-complex QM has only been described fairly recently [23]. Significantly, the relevance to mental function was quickly posited [24], and before that h-complex algebra had been of considerable interest to the neural network community (e.g. ref [25]). Considering the relevance to quantification of relationships, it is not too surprising that the algebra has also been explored for use in on-line dating recommender systems [26]. Such efforts are relatively recent and may yet converge to a unified field, but at present Dirac’s work remains the comprehensive resource.

1.6. POPPER as a Language. By also being miscible with Perl and potentially other programming languages [1], POPPER has some force as a programming language. Nonetheless, this is not POPPER’s primary purpose. A traditional example for a learner’s first entry into using a computer language, such as print “Hello World!\n”;, can be inserted, but this is not fundamental POPPER. In this report, the focus is on the basic features required to build inference nets and inference engines, using the following key commands. Using these actual commands relates to POPPER HELPER described in this report, but one could simply prepare a text file as input instead.

(1) Update. Enter a sequence of statements such as <type 2 diabetes | causes | obesity> = 85%,55%, although the values can be replaced by a computed expression involving other tags as described in this report. Briefly, 85% relates to the percentage probability of the statement as read, and 55% to that of the statement after subject and object are switched. If followed by, say, <obesity | causes | overeating> = 70%,85%, the user would already have a very simple (and bidirectional) inference net [2]. Metastatements, such as one in syllogistic form, can optionally be entered as well; these allow statements to evolve from others. These two kinds of statement are the only ones of concern in this report. As long as purely multiplicative (AND) logic is used between tags, and programming features (not discussed here) are avoided, the order of statements and metastatements is unimportant as “programming”, although metastatement order can affect the details of inference net evolution in a way that is not necessarily predictable in advance.

(2) Show. More like “Hello world!”, input so far can be displayed on request by the command show, along with the overall probabilities of the implied net computed and displayed at the same time.

(3) Think. On that command, metastatements evolve the network as automated reasoning, with the evolving overall probabilities computed and displayed at the same time (though the way metastatements are usually constructed means that probabilities may not change). The net usually contracts in the interest of reducing much data and information to knowledge, but another command, Ogden, discussed later below, can change that.

(4) Thesaurus. A kind of extension to show, this displays not the input statements but all the connections implied by vocabulary definitions in metastatements. For example, “pays” may be defined by the fact that if $A gives money, and the money is given to $B, then $A pays $B. The thesaurus may contain the relation to other tenses and persons of the verb, if provided in input, and notably the active-passive inverse (here, “is paid by”)1.
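The statement format of item (1) and the purely multiplicative (AND) logic mentioned there can be sketched as follows. This is an illustration in Python, not POPPER or POPPER HELPER code: the regular expression, the function names, and the choice to multiply the forward and backward percentages separately are assumptions made for the example.

```python
import re
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]
Dual = Tuple[float, float]

# Assumed surface syntax: <subject | relator | object> = p%,q%
STATEMENT = re.compile(
    r"<\s*(.+?)\s*\|\s*(.+?)\s*\|\s*(.+?)\s*>\s*=\s*([\d.]+)%\s*,\s*([\d.]+)%"
)


def parse_statement(line: str) -> Tuple[Triple, Dual]:
    """Split one statement into its triple and its probability dual."""
    match = STATEMENT.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a recognised statement: {line!r}")
    subject, relator, obj, forward, backward = match.groups()
    return (subject, relator, obj), (float(forward) / 100, float(backward) / 100)


def net_dual(duals: Iterable[Dual]) -> Dual:
    """Multiply forward values together and backward values together (assumed AND logic)."""
    forward, backward = 1.0, 1.0
    for f, b in duals:
        forward *= f
        backward *= b
    return forward, backward


lines = [
    "< type 2 diabetes | causes | obesity> = 85%,55%",
    "< obesity| causes | overeating> = 70%,85%",
]
parsed = [parse_statement(line) for line in lines]
for triple, dual in parsed:
    print(triple, dual)
print("net:", net_dual(dual for _, dual in parsed))
```

Because multiplication commutes, the result does not depend on the order of the statements, in line with the remark in item (1) about statement order.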

Footnote 1. Such connections of meaning are computed prior to, and independent of, any use of think, in which the job of metastatements is more specifically to “deduce” that, say, Jack pays Jill. There is sometimes a need to avoid ambiguities (e.g., did someone else really pay Jill?) that make POPPER a little more complicated than the account so far may suggest. The command thesaurus allows the user to check that the vocabulary and the logic it implies build up in the intended manner, and can make the action of metastatements more efficient by providing a direct reference to grammar and related words and phrases.

The above describes all the essential features of POPPER for routine use. Simplicity is seen as important even to the developer of a POPPER “program” as above, whereas SW representations have a quite complicated appearance. To that end also, POPPER can be considered as Q-UEL but with statements confined to the above tag format and stripped of features that are required for Q-UEL tags residing on the web. While metastatements allow considerable definitional power, a user usually builds POPPER tags that read like simple sentences in a natural language of the user’s choice. Attributes as arguments in expressions are, for example, English words, not potentially elaborate attribute metadata language structures as in Q-UEL [1]. All this is not least because the domain in which Q-UEL and POPPER must work is already complicated enough. Medical use is a natural priority for the SW, but until such time as a human can confer with it as one would with another, but extremely expert, human, simplicity and intuitive form of the underlying elements may help not only physicians but also developers avoid error.

1.7. POPPER’s Evolving Status, and Implementing the POPPER Method. Complete code will be made available when POPPER is more polished and the best embodiment for a release version becomes clear. From experience with early ancestral forms of the present approach, and notably the GOR method in bioinformatics [5], it is evident that making source code freely available is the best way to achieve widespread adoption. However, the nature and status of POPPER is currently rather different from such earlier efforts. POPPER is really a method, and a code development environment for research in the area, not yet a fixed application.

In some respects, this is necessarily so. First, the best algorithm is not yet clear, because the problems addressed overlap with many of the broader open issues in Artificial Intelligence. Compared with the epidemiological estimate of the incidence or prevalence of tuberculosis (TB) averaged over a large population, a preliminary clinical assessment of whether a specific baby has TB, using evidence-based medicine and a physician’s reasoning, is more complex. It depends on a variety of types of inference to solve little “puzzles” to do with the specific family members and their situation and past movements. More generally, how much to reduce a network of information and declare it as concise and coherent knowledge (say, in the manner of a deduced system of physical laws that can be briefly stated) is also not so clear, as discussed in this report. To such ends, POPPER replaces interpreter/compiler/executor code by user-definable input, not only including metastatements that define format, logic, grammar and vocabulary, but also by exploiting a fundamental relationship between the semantic triple and the dyadic function that can call a user-defined subroutine [1]. An example subroutine given in Section 3.1 appears as input in the current setup, but if it represented a persistently wanted action, it would be moved from input to represent a more permanent feature of the interpreter/compiler. Nonetheless, a developer should be able to produce a core code from the present text. An account is given of the most important variables and the most important subroutines in Section 3.1, with programming guidelines and a few code examples. An example is given of a toy subroutine to illustrate what a user might create and use in current POPPER input, or to build a similar application to POPPER anew. Example code is also given there for an important core subroutine, ProcessTwoTriple. This illustrates well that the action of metastatements is not simply to match and edit one or more statements but to “match” statements that are semantically equivalent, i.e. mean the same thing, even if they look different. However, most of the subroutines are really composed of “housekeeping” details that can be implemented in many ways, and interested readers are likely to have access to text analytic, web and semantic web, and/or decision support suites that also provide the established basics. More important is to follow the relatively simple mathematical theory, including the notion of semantic equivalence that determines the more novel aspects of algorithmic structure, and “insert” the algorithms into other code. Examples inevitably assist comprehension, and many input examples are given with associated theoretical explanation.
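The idea of matching statements that are semantically equivalent, rather than textually identical, can be sketched in a few lines. This is not the ProcessTwoTriple subroutine of Section 3.1; the `INVERSE` table, the `$`-prefixed variables, and the function names are hypothetical choices made for the illustration, which handles only the active-passive inverse.

```python
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical thesaurus entries: active relator -> passive inverse.
INVERSE: Dict[str, str] = {
    "pays": "is paid by",
    "chases": "is chased by",
}


def canonical(triple: Triple) -> Triple:
    """Rewrite a passive triple in active form so equivalent triples compare equal."""
    subject, relator, obj = triple
    for active, passive in INVERSE.items():
        if relator == passive:
            return (obj, active, subject)
    return triple


def match(pattern: Triple, statement: Triple) -> Optional[Dict[str, str]]:
    """Bind the pattern's $variables to the statement, up to active-passive equivalence.

    Returns the bindings on success, or None if the two do not match.
    """
    bindings: Dict[str, str] = {}
    for p, s in zip(canonical(pattern), canonical(statement)):
        if p.startswith("$"):
            # A variable must bind to the same text everywhere it occurs.
            if bindings.setdefault(p, s) != s:
                return None
        elif p != s:
            return None
    return bindings


print(match(("$A", "pays", "$B"), ("Jack", "pays", "Jill")))
print(match(("$A", "pays", "$B"), ("Jill", "is paid by", "Jack")))
print(match(("$A", "pays", "$B"), ("dogs", "chases", "cats")))
```

Both the active and the passive statement bind $A to Jack and $B to Jill, while the unrelated statement does not match. A fuller treatment would also cover tenses, persons, and definitional equivalences.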

2. Theory

2.1. Algebraic Development and Notes on Proofs.

Showing the nature of consistency between the probabilistic semantics described here and QM is important because we can borrow the mathematics. It is not obvious that it is applicable to the classical everyday world. There appear to be fundamental differences between standard textbook QM, Dirac’s spinor picture, and purely h-complex algebra:

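$$
\begin{aligned}
\text{Wave mechanics:}\quad & \Psi = c\,e^{i\theta} = c[\cos(\theta) + i\sin(\theta)] \\
\text{Spinor wave mechanics:}\quad & \Psi = c\,e^{hi\theta} = c[\cos(\theta) + hi\sin(\theta)] = c[\iota e^{+i\theta} + \iota^{*} e^{-i\theta}] \\
\text{Purely hyperbolic mechanics:}\quad & \Psi = c\,e^{h\theta} = c[\cosh(\theta) + h\sinh(\theta)] = c[\iota e^{+\theta} + \iota^{*} e^{-\theta}]
\end{aligned}
\tag{1}
$$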
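The purely hyperbolic line of Eqn. 1, and the projection-operator rules stated in the text that follows, can be checked numerically with a small h-complex number type. This is a sketch only: the class `HComplex` and the names `IOTA`, `IOTA_STAR`, `hexp`, `from_dual` and `to_dual` are hypothetical, and the code assumes only hh = +1 and the operators ½(1+h) and ½(1−h).

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HComplex:
    """A number real + h*hyp, where h*h = +1."""
    real: float
    hyp: float = 0.0

    def __add__(self, other: "HComplex") -> "HComplex":
        return HComplex(self.real + other.real, self.hyp + other.hyp)

    def __mul__(self, other: "HComplex") -> "HComplex":
        # (a + hb)(c + hd) = (ac + bd) + h(ad + bc), using hh = +1
        return HComplex(
            self.real * other.real + self.hyp * other.hyp,
            self.real * other.hyp + self.hyp * other.real,
        )

    def conj(self) -> "HComplex":
        """Conjugation changes the sign of h."""
        return HComplex(self.real, -self.hyp)


IOTA = HComplex(0.5, 0.5)   # (1 + h) / 2
IOTA_STAR = IOTA.conj()     # (1 - h) / 2


def hexp(theta: float) -> HComplex:
    """exp(h*theta) = cosh(theta) + h*sinh(theta)."""
    return HComplex(math.cosh(theta), math.sinh(theta))


def from_dual(p_forward: float, p_backward: float) -> HComplex:
    """Encode a dual as iota*p_forward + iota_star*p_backward."""
    return HComplex(p_forward) * IOTA + HComplex(p_backward) * IOTA_STAR


def to_dual(z: HComplex) -> Tuple[float, float]:
    """Recover (p_forward, p_backward) from the encoded value."""
    return z.real + z.hyp, z.real - z.hyp


print(IOTA * IOTA == IOTA, IOTA * IOTA_STAR == HComplex(0.0), IOTA + IOTA_STAR == HComplex(1.0))
print(hexp(0.7))
print(HComplex(math.exp(0.7)) * IOTA + HComplex(math.exp(-0.7)) * IOTA_STAR)
print(to_dual(from_dual(0.85, 0.55) * from_dual(0.70, 0.85)))
```

In this representation, multiplying two encoded duals multiplies their forward parts together and their backward parts together, so the two directions of conditionality never mix.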

From each description we can determine many corresponding $\langle A|B\rangle$ (see below). Constant $c$ is determined by the nature and scale of the system, relating to $P(A)$ and $P(B)$ prior to any observation, and $\theta = \theta(A, B)$ relates to $I(A; B)$ (see below). Note the spinor projection operators $\iota = \tfrac{1}{2}(1+h)$ and $\iota^{*} = \tfrac{1}{2}(1-h)$ [1], for which notation varies a great deal in physics. Space here does not permit rigorous proofs, but they are facilitated by use of the idempotent rule $\iota\iota = \iota$ and $\iota^{*}\iota^{*} = \iota^{*}$, the annihilation rule $\iota\iota^{*} = 0$ and $\iota^{*}\iota = 0$, and the normalization rule $\iota + \iota^{*} = 1$. As in QM, the asterisk * indicates complex conjugation, in the present case changing the sign of $h$.

The first two lines of Eqn. 1 describe wave mechanics. Stripped of the notoriously bizarre predictions that wave mechanics provides in the everyday world of human experience, it is the third that is of interest here, and the hyperbolic functions cosh and sinh give the hyperbolic number its name. All three cases in Eqn. 1 represent a primitive or “mother” form of symmetry called conjugate symmetry, in which A fully determines B and vice versa, save for the uncertainty principle as commutator $[AB - BA] = ih/2\pi$, where $h$ here is Planck’s constant [8-10]. Note that (A| and |B) in P(A|B) typically share this non-commutative AB ≠ BA property, because P(A|B) − P(B|A) ≠ 0 (unless P(A) = P(B)). Conjugate symmetry is thought of as broken in an asymmetric field, an interaction such as an observation to determine P(A|B) or P(B|A). Our resulting Hermitian commutator form is simple, the probability dual (P(A|B), P(B|A)) as

$$
\langle A|B\rangle = \iota P(A|B) + \iota^{*} P(B|A) = \tfrac{1}{2}[P(A|B) + P(B|A)] + \tfrac{1}{2}h[P(A|B) - P(B|A)].
$$

Though traditionally probability relates to $|\Psi|^{2}$ by QM definition, we avoid square roots of probabilities [1,2] when Dirac’s recipe for normalizing and extracting P(A|B) and P(B|A) [8] is applied to the h-complex case [2]. Now $\theta = I(A; B)$ exactly, and $\langle A|B\rangle = [\iota P(A) + \iota^{*} P(B)]e^{\theta} = [\iota P(A) + \iota^{*} P(B)]K(A; B)$, in which P(A) and P(B) relate to c after conjugate symmetry breaking. Dirac’s bra-operator-ket $\langle A|R|B\rangle$ with operator R is harder than $\langle A|B\rangle$, but the $\langle A|B\rangle$ remain the basic formal building blocks. For vectors we have a column form written as a transposed row, $[\ldots]^{T}$, where T indicates the transpose, interchanging rows and columns in a vector or matrix [1]. $\langle A|B\rangle$ is the inner product, and we also have the outer product |B>