CENTER FOR MACHINE PERCEPTION

CZECH TECHNICAL UNIVERSITY IN PRAGUE

Learning Boolean Functions in Incongruence Detection

Tomas Pajdla, Michal Havlena, Micaela Hartley {pajdla, havlem1, hartlmi1}@cmp.felk.cvut.cz

CTU–CMP–2010–23

Available at ftp://cmp.felk.cvut.cz/pub/cmp/articles/pajdla/Pajdla-TR-2010-23.pdf

The work was supported by the EC project FP6-IST-027787 DIRAC. Any opinions expressed in this paper do not necessarily reflect the views of the European Community. The Community is not liable for any use that may be made of the information contained herein.

Research Reports of CMP, Czech Technical University in Prague, No. 23, 2010, ISSN 1213-2365


December 17, 2010

Published by Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Technická 2, 166 27 Prague 6, Czech Republic, fax +420 2 2435 7385, phone +420 2 2435 7637, www: http://cmp.felk.cvut.cz

Learning Boolean Functions in Incongruence Detection

Tomas Pajdla, Michal Havlena, Micaela Hartley

December 17, 2010

Abstract

We study the possibility of discovering relations between concepts recovered from data by detectors, which transform measurements into concepts (e.g. human present, sound present, speech present, person walking, standing, sitting, talking, etc.) represented by boolean variables. We try to recover a relationship between the variables by predicting each variable from the other variables by a suitable boolean function. We show that the construction is feasible for four boolean variables, which may be a basis for constructing models of events in video and audio.

1 Introduction

Recovering dependencies between concepts, i.e. theories, is the basic problem of constructing models of the world. We follow the general scientific and machine learning paradigm [9, 1, 7] in which functions are constructed from a subset of observations (training data) and then evaluated on the remaining observations (test data) to assess their predictive ability. Learning models, and in particular boolean functions, has been studied in artificial intelligence [1, 10] as well as in machine learning [11] and optimization [2]. Here we approach learning boolean functions in its simplest, fully combinatorial form, without attempting to achieve optimality or efficiency. Our goal is to present a general paradigm; an efficient implementation is beyond the scope of this work. It is clear that this "combinatorial enumerative approach" is limited to a very small number of variables, but we believe that it is still interesting to investigate, since many theories in cognitive psychology suggest that human capabilities in remembering and mentally manipulating discrete concepts in short-term memory are rather limited (to seven plus or minus two items [6], or to four chunks [3]; see [12] and [13] for more on chunking and short-term memory capacity). The power of human combinatorial thinking could be explained by the ability to work in a hierarchical way, thus always combining only a very small number of concepts (binary variables) at a time [5].

Figure 1: An example of an audio-visual scene with several direct detectors of human presence, speaker sound presence, and human speaker presence in the scene. The outputs of the sound and appearance direct detectors, evaluated at different positions along the horizontal dimension, explain the output of the direct detector of a human speaker. The output of the direct human speaker detector can be predicted with high accuracy by the coincidence of human appearance and human sound at the same place and time.

2 An example of a model for audio-visual processing

Let us start with an example of the events we want to model, taken from [8]. Figure 1 shows several direct and composite detectors. Direct detectors transform the video and audio signals into concepts whose presence can be adequately modeled by two-state (boolean) variables, which equal one when the concept is present in the observations and zero otherwise. Direct detectors model concepts which are familiar and for which there are specialized means of detection.
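To make the notion of a direct detector concrete, here is a minimal sketch in which a detector is just a thresholded measurement score; the score values, the threshold, and the function name are our own illustrative assumptions, not taken from [8].

```python
# A minimal sketch of a direct detector (illustrative only): it maps a raw
# measurement score (e.g. an audio-energy or appearance-classifier output)
# to a boolean concept variable. The threshold 0.5 is a hypothetical value.

def direct_detector(score: float, threshold: float = 0.5) -> int:
    """Return 1 when the concept is present in the observation, 0 otherwise."""
    return 1 if score > threshold else 0

print(direct_detector(0.8))  # 1 -- e.g. "human present"
print(direct_detector(0.2))  # 0 -- concept absent
```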

x y z   f(x, y, z)
0 0 0   1
0 0 1   1
0 1 0   0
0 1 1   0
1 0 0   1
1 0 1   0
1 1 0   0
1 1 1   1

Figure 2: A boolean function f(x, y, z) of three boolean variables. There are 2^(2^3) = 256 different boolean functions of three variables.

Direct detectors are constructed by specialized processes, usually from a large number of observations with labels putting them into the desired classes. They are often efficient, but cannot easily be disassembled into pieces and reassembled into modified detectors with a desired functionality. Direct detectors may capture simple concepts (e.g. the presence of a sound or of a human appearance in the scene) but also more complex concepts (e.g. the presence of a human speaker in the scene). Composite detectors explain some of the direct detectors by the outcomes of other (potentially simpler) direct detectors; they provide "understanding" of the concepts defined by the direct detectors. Composite detectors are represented as boolean functions which take as input the outputs of a subset of direct detectors and return an outcome explaining another direct detector.
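As a concrete illustration, the following sketch implements the composite human-speaker detector suggested by Figure 1 as a conjunction of the appearance and sound direct detectors evaluated at the same position; the variable names and the five-position example are our own assumptions.

```python
# A composite detector is a boolean function of the outputs of direct
# detectors. Here the human-speaker concept is explained as the conjunction
# of human appearance and human sound at the same position in the scene.

def human_speaker(appearance: int, sound: int) -> int:
    """Composite detector: 1 iff both direct detectors fire at this position."""
    return appearance & sound

# Hypothetical outputs of the two direct detectors at five positions
# along the horizontal dimension of the scene.
appearance = [0, 1, 1, 0, 0]
sound      = [0, 0, 1, 0, 1]

print([human_speaker(a, s) for a, s in zip(appearance, sound)])
# [0, 0, 1, 0, 0] -- a speaker is detected only where both concepts coincide
```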

3 Modeling the world by boolean functions

Direct detectors are represented by boolean functions f : {0, 1}^n → {0, 1} of n boolean variables within the scope of propositional logic [4, 7]. Considering n variables, there are 2^(2^n) different boolean functions. Figure 2 shows an example of a boolean function f of three boolean variables x, y, z.

Boolean functions can be represented as formulas constructed from variables and the boolean algebra operations ¬, ∧, ∨. By the nature of the boolean algebra, formulas of n variables can be constructed from literals, which are variables and their negations. Literals combine into terms, which are conjunctions of sets of literals, e.g. x1, ¬xk, x1 ∧ ¬x3 ∧ xk. There are 3^n different terms for n variables, as each variable may appear directly, negated, or may be omitted. There are Σ_{i=1}^{k} C(n, i)·2^i terms of size not larger than k. A clause is a disjunction of a set of literals, e.g. x1 ∨ ¬x2 ∨ x3. By the same counting, there are 3^n different clauses for n variables. Many different boolean formulas may represent the same function. Two canonical forms of formulas are the disjunctive (DNF) and conjunctive (CNF) normal forms.
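The counts above are small enough to verify by brute-force enumeration. The sketch below represents a term as a vector in {−1, 0, +1}^n (negated, omitted, direct) and checks the 3^n count and the binomial sum for terms of size at most k; it is a verification aid under our own encoding, not part of the report's implementation.

```python
from itertools import product
from math import comb

n, k = 3, 2

# Encode a term as a vector in {-1, 0, +1}^n: each variable is negated (-1),
# omitted (0), or appears directly (+1), so there are 3^n terms in total
# (including the empty term, whose entries are all 0).
terms = list(product((-1, 0, 1), repeat=n))
assert len(terms) == 3 ** n

# Terms of size not larger than k: choose i of the n variables and a sign
# for each, summed over i = 1, ..., k.
small = [t for t in terms if 1 <= sum(v != 0 for v in t) <= k]
assert len(small) == sum(comb(n, i) * 2 ** i for i in range(1, k + 1))

print(len(terms), len(small))  # 27 18 for n = 3, k = 2
```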

A boolean formula f is in DNF when it can be written as a disjunction of terms, e.g. f = (x1) ∨ (x2 ∧ x3) ∨ ⋯ A boolean formula is in CNF when it can be written as a conjunction of clauses, e.g. f = (x1) ∧ (x2 ∨ x3) ∧ ⋯ If f is a term, then ¬f is a clause, and if f is in DNF, then ¬f is in CNF, and vice versa, by de Morgan's laws.

Having f given as a table, it is easy to generate the corresponding DNF of f. For each combination of values to which f assigns 1, a term is constructed which evaluates to 1 exactly on that combination. The DNF of f is the disjunction of all such terms. The DNF of the function from Figure 2 is

DNF(f) = (¬x ∧ ¬y ∧ ¬z) ∨ (¬x ∧ ¬y ∧ z) ∨ (x ∧ ¬y ∧ ¬z) ∨ (x ∧ y ∧ z)

To get the CNF of f, we use CNF(f) = ¬DNF(¬f):

DNF(¬f) = (¬x ∧ y ∧ ¬z) ∨ (¬x ∧ y ∧ z) ∨ (x ∧ ¬y ∧ z) ∨ (x ∧ y ∧ ¬z)

CNF(f) = (x ∨ ¬y ∨ z) ∧ (x ∨ ¬y ∨ ¬z) ∧ (¬x ∨ y ∨ ¬z) ∧ (¬x ∨ ¬y ∨ z)

We see that every boolean function can be represented in DNF as well as in CNF. In general, a boolean function can be represented by many (actually infinitely many) different formulas. Recall that there are 2^(2^n) different boolean functions. Considering n variables, we have 2n literals and 2^(2n) different terms, but we normally consider only the 3^n terms that are not identically zero (e.g. x1 ∧ ¬x1 is always zero) and one identically zero term, i.e. there are 3^n + 1 terms in total. Similarly, we can construct 2^(2n) clauses, but again we are interested only in the 3^n + 1 clauses that do not repeat tautologies.
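Since the DNF and CNF constructions above are purely mechanical, a short sketch can reproduce them for the function of Figure 2. The truth table is the one from the figure; the string rendering with ~ for negation is our own convention.

```python
from itertools import product

# Truth table of f(x, y, z) from Figure 2, keyed by the assignment.
f = {(0, 0, 0): 1, (0, 0, 1): 1, (0, 1, 0): 0, (0, 1, 1): 0,
     (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 0, (1, 1, 1): 1}
names = ("x", "y", "z")

def lit(name, value, negate=False):
    # The literal that is true on this assignment value (or its negation).
    return name if bool(value) ^ negate else "~" + name

# DNF: one term per assignment on which f is 1; the term is true exactly there.
dnf = " | ".join(
    "(" + " & ".join(lit(nm, v) for nm, v in zip(names, a)) + ")"
    for a in product((0, 1), repeat=3) if f[a])

# CNF via de Morgan: one clause per assignment on which f is 0; the clause
# is the negation of the term that is true exactly on that assignment.
cnf = " & ".join(
    "(" + " | ".join(lit(nm, v, negate=True) for nm, v in zip(names, a)) + ")"
    for a in product((0, 1), repeat=3) if not f[a])

print("DNF:", dnf)
print("CNF:", cnf)
```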

4 Learning boolean functions explaining observations

In order to explore the possibility of automatic classifier hierarchy creation, we use an enumeration of all possible boolean functions, i.e. formulas, built from a given number of logical variables, together with sets of positive and negative examples. In our example we use an enumeration of the boolean functions built from three logical variables. Any formula built from three logical variables

falls into one of the 2^(2^3) = 256 classes when evaluated on the 2^3 = 8 possible combinations of the values of the three variables, and our aim is to find "simple" representatives of each of these classes. We seek formulas which are simple when written in CNF or DNF, i.e. which contain a small number of disjunctive or conjunctive clauses. The tautology and the contradiction are excluded from consideration, and the remaining 254 classes are assigned representatives in a greedy manner by going through a list of all possible satisfiable non-tautology formulas sorted by formula simplicity. As a result, 46 formulas contain a single clause, 158 formulas contain two clauses, 48 formulas contain three clauses, and only 2 formulas need four clauses when written in CNF or DNF.

We are going to model an audio-visual speaker detector on our three logical variables, whose meanings are:

x1: audio detection exceeded a threshold
x2: visual detection exceeded a threshold
x3: the most significant audio and visual detections are co-located (but may be weaker than the respective thresholds)

and we build our synthetic positive and negative example sets in the following manner:

- 1,000 examples of [1, 1, 1] (speaker present) are put in the positive set, together with 20 random examples representing measurement and/or annotation error.
- 10,000 examples of [0, 0, 0] (empty scene), 1,000 examples of [1, 0, 0] (sound coming from outside the scene), and 1,000 examples of [0, 1, 0] (silent person), together with 200 random examples, are put in the negative set. A certain portion of the examples is modified from [∗, ∗, 0] to [∗, ∗, 1], as the most significant detections can be accidentally co-located while at least one of them is still under the threshold. This portion is 5% in our experiment, simulating co-location evaluated in 20 azimuthal bins.

The suitability of a given formula for separating the positive and negative sets is measured as the relative number of correctly decided examples minus a weighted regularization term. The regularization term describes the simplicity of the formula and reads

log(2n · (number of clauses − 1) + (total number of literals)),

n being the number of logical variables, i.e. 3 in our case. Thus x1 ∧ x2 yields log(2), x1 ∧ x2 ∧ x3 yields log(3), but (x1 ∧ x2) ∨ x3 yields log(9) because the number of clauses increased. We use the weight 0.01 in our experiments. Using this setup, the formula selected to describe the positive and negative sets is x1 ∧ x2, as the sketch below illustrates.
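The following sketch implements our reading of the selection criterion: the fraction of correctly decided examples minus 0.01 times the regularization term log(2n(c − 1) + l), where c is the number of clauses and l the total number of literals. The example counts follow the text, but the random noise examples and the 5% co-location modification are omitted for brevity, and all function names are our own.

```python
from math import log

n = 3  # number of logical variables

def score(formula, clauses, literals, positives, negatives, weight=0.01):
    """Fraction of correctly decided examples minus the weighted
    regularization term log(2n(clauses - 1) + literals)."""
    correct = sum(formula(*e) for e in positives) \
            + sum(1 - formula(*e) for e in negatives)
    regularizer = log(2 * n * (clauses - 1) + literals)
    return correct / (len(positives) + len(negatives)) - weight * regularizer

# Synthetic example sets as in the text (noise and co-location
# modifications omitted for brevity).
positives = [(1, 1, 1)] * 1000
negatives = [(0, 0, 0)] * 10000 + [(1, 0, 0)] * 1000 + [(0, 1, 0)] * 1000

# Two candidate formulas, each a single term (one clause).
f12  = lambda x1, x2, x3: x1 & x2        # 2 literals -> regularizer log(2)
f123 = lambda x1, x2, x3: x1 & x2 & x3   # 3 literals -> regularizer log(3)

print(score(f12,  1, 2, positives, negatives))  # ~0.9931, the winner
print(score(f123, 1, 3, positives, negatives))  # ~0.9890
```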

Let us now observe additional examples. They are of the form [1, 1, 0] (non-co-located sound source and person) and are put in the negative set. As more and more such examples are introduced, the score of x1 ∧ x2 falls. About 30 examples are needed for the correct formula, x1 ∧ x2 ∧ x3, to be selected. This formula is more complicated than the previously selected one, but it decides the presented examples much better.
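Continuing the sketch above (reusing its hypothetical score, positives, negatives, f12, and f123), one can watch the selection switch as [1, 1, 0] negatives are added; because the noise examples and co-location modification are omitted here, the exact tipping point differs from the roughly 30 examples reported in the text.

```python
# Add negative examples of a non-co-located sound source and person and
# observe which formula the criterion selects.
for extra in (0, 30, 60, 120):
    negs = negatives + [(1, 1, 0)] * extra
    if score(f12, 1, 2, positives, negs) > score(f123, 1, 3, positives, negs):
        best = "x1 & x2"
    else:
        best = "x1 & x2 & x3"
    print(extra, best)
# In this simplified setting the switch to x1 & x2 & x3 happens
# around 53 added examples.
```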

5 Conclusion

Recovering dependencies between concepts in the form of boolean functions is feasible. Our experiment shows that we may certainly think about finding relationships between triples and quadruples of variables in a naive enumerative way. With more advanced optimization techniques, one could probably also work with five variables. It is also true that living organisms are certainly not searching for optimal solutions; they are often satisfied with any feasible solution, or with a solution that gives them an advantage over their competitors.

References

[1] A. Barr, P. R. Cohen, and E. Feigenbaum. The Handbook of Artificial Intelligence. Addison-Wesley, 1990.
[2] E. Boros, P. L. Hammer, and J. N. Hooker. Boolean regression. Annals of Operations Research, 58:201–226, 1995.
[3] N. Cowan. The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24:87–185, 2001.
[4] M. Hazewinkel, editor. Encyclopaedia of Mathematics. Springer-Verlag, http://eom.springer.de/, 2002.
[5] K. S. Lashley. The problem of serial order in behavior. In L. A. Jeffress, editor, Cerebral Mechanisms in Behavior. Wiley, 1951.
[6] G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63:81–97, 1956.
[7] N. J. Nilsson. Introduction to Machine Learning. Draft of a proposed textbook, Stanford University, 1998.


[8] T. Pajdla, M. Havlena, J. Heller, H. Kayser, J.-H. Bach, and J. Anemueller. Incongruence detection for detecting, removing, and repairing incorrect functionality in low-level processing. Research Report CTU-CMP-2009-19, Czech Technical University in Prague, 2009.
[9] K. R. Popper. The Logic of Scientific Discovery (translation of Logik der Forschung). Hutchinson, London, 1959.
[10] L. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
[11] M. Viswanathan and C. S. Wallace. An optimal approach to mining boolean functions from noisy data. In IDEAL, pages 717–724, 2003. http://springerlink.metapress.com/openurl.asp?genre=article&issn=0302-9743&volume=2690&spage=717.
[12] Wikipedia. Chunking (psychology). http://en.wikipedia.org/wiki/Chunking_(psychology).
[13] Wikipedia. Working memory. http://en.wikipedia.org/wiki/Working_memory.

