Slide 1/45
Towards Relational Theory Formation from Undifferentiated Sensor Data
Marc Pickett, Department of Computer Science, University of Maryland Baltimore County
Advisor: Dr. Tim Oates
Committee: Dr. Tim Finin, Dr. Robert Goldstone, Dr. Matt Schmill, Dr. Sergei Nirenburg
Outline: Introduction · A Feature-Set Ontology · Application to Supervised Learning · Parameterized Concepts · Finding Useful Mappings · Apples to Angles · Conclusion · References
Motivation I Slide 2/45
For vision, people have the idea of angle of rotation
For sound, people have the ideas of pitch change and speed change
SPEED-CHANGE in sound is like SCALING in vision (people can see this relation)
Motivation II Slide 3/45
Computers start with Raw Sensors
Motivation III Slide 4/45
Computer sees this: {
S0002=ON, S0175=ON, S0341=ON, S0551=ON, S0763=ON, S0947=ON, S1188=ON, S1308=ON, S1579=ON, S1796=ON, S2129=ON, S2214=ON, S2325=ON,
S0017=ON, S0180=ON, S0351=ON, S0567=ON, S0779=ON, S0956=ON, S1214=ON, S1352=ON, S1582=ON, S1853=ON, S2137=ON, S2218=ON, S2344=ON,
S0048=ON, S0197=ON, S0404=ON, S0573=ON, S0798=ON, S1027=ON, S1239=ON, S1395=ON, S1631=ON, S1891=ON, S2161=ON, S2227=ON, S2355=ON,
S0055=ON, S0233=ON, S0444=ON, S0623=ON, S0827=ON, S1043=ON, S1244=ON, S1452=ON, S1651=ON, S1898=ON, S2186=ON, S2247=ON, S2363=ON,
S0056=ON, S0269=ON, S0483=ON, S0711=ON, S0833=ON, S1132=ON, S1275=ON, S1555=ON, S1655=ON, S1968=ON, S2195=ON, S2258=ON, S2398=ON,
S0117=ON, S0284=ON, S0490=ON, S0729=ON, S0859=ON, S1137=ON, S1305=ON, S1572=ON, S1771=ON, S2059=ON, S2206=ON, S2308=ON, S2406=ON,
S0000=OFF,
S0001=OFF,
S0003=OFF,
etc....
(all 2,419 other sensors are OFF)
}
Motivation IV: Correlations among sensors? (Not Quite) Slide 5/45
How to get grid structure? “Connect each sensor to top 8 most correlated others.” Pierce & Kuipers (1997)
Implicitly gives computer domain knowledge:
  That each node has 8 neighbors
  That domain is 2D
  That domain is spatial
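The Pierce & Kuipers heuristic quoted above can be sketched in a few lines. This is an illustration, not their implementation: the toy "strip world" data, the choice k=2, and the Pearson-correlation details are all assumptions made for the example.

```python
# Sketch: link each sensor to its k most correlated peers, so that spatial
# neighbors in a toy 1-D strip end up connected.
import random

def correlation(a, b):
    """Pearson correlation of two equal-length time series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0

def top_k_graph(readings, k=2):
    """readings[i] is the time series of sensor i; returns i -> its k nearest peers."""
    n = len(readings)
    return {i: sorted(range(n),
                      key=lambda j: -correlation(readings[i], readings[j]))[1:k + 1]
            for i in range(n)}

# Toy data: a bright "blob" sweeps along a strip of 6 sensors, so adjacent
# sensors tend to fire together.
rng = random.Random(0)
T, N = 200, 6
readings = [[0.0] * T for _ in range(N)]
for t in range(T):
    center = rng.randrange(N)
    for i in range(N):
        readings[i][t] = 1.0 if abs(i - center) <= 1 else 0.0

graph = top_k_graph(readings, k=2)
```

Recovering the strip's neighbor structure this way is exactly the implicit domain knowledge the bullet points warn about: the choice of k bakes in the expected number of neighbors.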
Motivation V Slide 6/45
Can’t rely on knowledge engineering. E.g., highway traffic speed sensors.
Problem, Claims, & Evaluation Criteria Slide 7/45
Problem Statement: How can a computer develop rich relational theories from raw sensor data?
Claims:
1 Partially implemented design for bridge from sensors to relational theory
2 1st link of bridge builds and uses conceptual structures
Evaluation:
1 Bridge story should be elegant. We rely on a few principles:
  Minimum Description Length (MDL)
  “Signatures” for recognizing patterns and binding (HMax idea)
  “Crunching” by finding big/frequent overlap to “explain” data
2 System should be independent of modality (vision, audio, etc.)
  Minimal innate knowledge
  Should work on wide range of domains (might even be in a 5-dimensional world)
3 Theory learned by the system should allow for compression of data (Wolff 2003) and contain concepts useful for tasks
Solution Overview: The Bridge Slide 8/45
All thesis work is building or testing parts of bridge
Test on multiple disparate domains
Concrete “side applications” along the way: Data Compression, Macros in RL, Semi-supervised Learning
A few recurring principles: MDL, Signatures, Crunching
Phase 1 is core of dissertation Other phases are bonus
Slide 9/45
Phase 1: Creating A Feature-Set Ontology
Related Work Slide 10/45
HMax Riesenhuber & Poggio (1999)
HTMs Hawkins & Blakeslee (2004)
Neither says how structure is built autonomously
Representation Slide 11/45
Like HMax & HTMs Can represent invariance Uses “signature” idea, like hash or checksum
Parsing and Inference with An Ontology Slide 12/45
“Inference” does bottom-up “abduction” & top-down unfolding like HTMs:
  AND nodes want all their children to be ON
  OR nodes want at least 1 child ON
  All nodes want to be “explained” from above
  (Can have inhibitory connections too)
“Parse”: minimal* set of ON/OFF node settings to re-infer inputs
Best parse minimizes the “Probabilistic MDL” function
  E_R(R_i) = −log2 P(D_i | R_i, Ω) − k Σ_{r∈R_i} log2 P(r | Ω)
Parse algorithm searches for this
Optimal Parsing is NP-Hard (proof in thesis)
[Figure: example parse network of OR and AND nodes over digit inputs 0–9.]
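The parse score above is easy to compute once a candidate parse is fixed. A minimal numeric sketch, in which the priors, the likelihood values, and k are invented purely for illustration (they are not from the thesis):

```python
# E_R(R_i) = -log2 P(D_i|R_i, Omega) - k * sum_r log2 P(r|Omega):
# a parse is charged for data it explains poorly and for improbable nodes.
import math

def parse_energy(likelihood, priors, k=1.0):
    """Energy of a parse: likelihood of the data given the parse,
    plus the cost of the nodes the parse turns ON (their priors)."""
    return -math.log2(likelihood) - k * sum(math.log2(p) for p in priors)

# Two candidate parses of the same input that fit the data equally well:
# one moderately likely node vs. two less likely nodes.
e_one = parse_energy(likelihood=0.9, priors=[0.25])
e_two = parse_energy(likelihood=0.9, priors=[0.125, 0.125])
best = min(e_one, e_two)
```

The single-node parse wins: fewer, more probable nodes cost fewer bits, which is the MDL preference the search exploits.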
Building An Ontology: Chunking Slide 13/45
Terminology note: Ontol is the system that builds and uses ontologies; The Cruncher is the part of Ontol that builds ANDs. How The Cruncher works:
Search to minimize
  E(Ω) = k|Ω| − Σ_i max_{R_i} [ log2 P(D_i | R_i, Ω) + k Σ_{r∈R_i} log2 P(r | Ω) ]
“The Cruncher” does this by recursively squeezing out feature-set overlap. E.g., if
S1 = {A, B, C, D, E}
S2 = {A, B, C, D, F, G}
S3 = {A, B, C, not D, F, G}        (DL = 17)
Then, new set N1 = S1 ∩ S2 = {A, B, C, D}. Then
S1 = {N1, E}, S2 = {N1, F, G}, S3 = {N1, not D, F, G}        (DL = 14)
Then, new set N2 = S2 ∩ S3 = {N1, F, G}. Then
S1 = {N1, E}, S2 = {N2}, S3 = {N2, not D}        (DL = 13)
Cruncher Algorithm:
// Returns an ontology that compactly expresses S
Cruncher(S)   // S is a set of attribute-value sets
    let B be a set of ConceptNodes such that foreach attribute-value A in S
        there is a corresponding ConceptNode c in B with A ∈ c.hasA and c.isA = ∅
    while we are still decreasing the description length of B
        candidates = findAllIntersections(B)
        foreach candidate in candidates
            compute score(B, candidate)   // score is the potential decrease in description length
        let best be the highest scoring candidate
        if score(B, best) > 0 then
            let B = replaceBest(B, best) + best
    return B
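A runnable sketch of the greedy crunch loop follows. It is not the thesis Cruncher: the description length here is a crude total symbol count, the score assumes each overlap is factored out of exactly two sets, and negated features ("not D") are omitted for simplicity.

```python
# Greedy "crunching": repeatedly factor the pairwise intersection that most
# reduces a simple description length (total symbols over all definitions).
from itertools import combinations

def description_length(sets):
    """DL here is just the total symbol count over all concept definitions."""
    return sum(len(s) for s in sets.values())

def crunch(sets):
    """Factor large pairwise overlaps into new concept nodes, in place."""
    counter = 0
    while True:
        best, best_gain = None, 0
        for a, b in combinations(list(sets), 2):
            overlap = sets[a] & sets[b]
            if len(overlap) < 2:
                continue
            # Crude gain: removing |overlap| symbols from two sets costs one
            # new definition of size |overlap| plus one reference in each set.
            gain = 2 * len(overlap) - (len(overlap) + 2)
            if gain > best_gain:
                best, best_gain = overlap, gain
        if best is None:
            return sets
        name = f"N{counter}"
        counter += 1
        sets[name] = best
        for key in [k for k in sets if k != name]:
            if best <= sets[key]:
                sets[key] = (sets[key] - best) | {name}

concepts = {"S1": {"A", "B", "C", "D", "E"},
            "S2": {"A", "B", "C", "D", "F", "G"},
            "S3": {"A", "B", "C", "F", "G"}}
before = description_length(concepts)
crunch(concepts)
after = description_length(concepts)
```

Note how the second pass crunches the *definition* of the first new node as well, which is the "recursively squeezing out overlap" behavior from the slide.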
Ontology Creation with The Cruncher Slide 14/45
Cruncher forms taxonomic categories naturally
Penguin in multiple classes
Octopus erroneously grouped with Amphibians:

              eggs  aquatic  predator  haslegs  hair  domestic  breathes  toothed  backbone
Octopus        yes    yes      yes       yes     no      no        no        no       no
Amphibians     yes    yes      yes       yes     no      no       yes       yes      yes
Invertebrates   ?      ?        ?         ?       ?       ?         ?        no       no
Crunching Patches from Natural Images Slide 15/45
In Zoo ontology, node102 = {toothed, hair, milk, 4-legs} (i.e., “Mammal”) What do concepts for other Crunched feature-set sets look like?
Input: 50x50 Image Patches (undifferentiated)
Concepts Learned
These concepts are useful for describing dataset
Contiguous chunks
Cruncher begins with no knowledge of which pixels are next to which
Evaluating Ontol/Cruncher Slide 16/45
How to evaluate?
  Can eyeball zoo dataset, but many others too complex
  Primary goal: help span gap between raw sensors and rich theory (difficult to gauge progress)
Applications: Ontol/Cruncher developed for main goal, but works on “side applications” too:
  Compression
  Macro-actions in Reinforcement Learning
  Semi-supervised Learning
Test on range of well-known UCI datasets:

Dataset           Description                      Prediction
connect-4         States of Connect 4 boards       win, loss, tie
house-votes-84    Congressional voting records     democrat, republican
kr-vs-kp          Chess endgame features           white win or nowin
mushroom          Mushroom features                poisonous or edible
nursery           Nursery schools features         recommendation: very, not, priority, etc.
SPECT             Features from cardiac images     normal or abnormal
tic-tac-toe       tic-tac-toe game states          x win or nowin
zoo               Features of animals              mammal, amphibian, fish, etc.
Compression Performance of Ontol/Cruncher Slide 17/45
[Figure: bar chart of compression performance on the eight UCI datasets, comparing Ontol, LZ, and Ontol & LZ. Y-axis: Percent of Original File, 0–18 (lower is better).]
Ontol not for all text files (just feature-set descriptions)
Lossless because files are sorted (otherwise add log2(|items|!) bits to specify ordering)
Cruncher doesn’t compress gensyms (so use LZ to do this)
Application: Creating Macro-actions in RL (Pickett & Barto (2002)) Slide 18/45
Crunch policies from many MDPs with same structure but different reward Use “crunched” subpolicies as policy building blocks or “macro-actions”
[Figure: graph of crunched subpolicies (NEW0–NEW11) factored out of primitive policies S01–S20.]
Application: Creating Useful Macro-actions Slide 19/45
Results for Gridworld (Higher is better)
Averaged over 100 runs. Does well on other domains too (see Thesis)
Application: Semi-supervised Learning Slide 20/45
An Adze
Related Work: ILP (Muggleton 1996) learns from a handful of positive training instances
How To Learn from a Few Positive Instances Slide 21/45
Bayesian Energy Function:
  E_ss(M) = |D| log2 N(M) + |M| log2 |U| − (1/2) Σ_{Mi∈M} log2 N(Mi)

Build ontology from unlabeled training set
Search for Boolean expression M (which may use nodes in ontology), s.t.
  M is True for all positives
  M is False for negatives (if any)
  M minimizes E_ss(M)
(Negatives unnecessary, but can be used)
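A toy sketch of this search, under assumed semantics: here N(M) is read as the number of unlabeled instances matching M, |D| as the number of labeled positives, and |U| as the number of available features, and M is restricted to conjunctions of raw features rather than full Boolean expressions over ontology nodes. These readings are guesses made for illustration, not the thesis definitions.

```python
# Learn a concept from a few positives: find the conjunction that covers all
# positives while minimizing the slide's Bayesian energy E_ss.
import math
from itertools import combinations

def matches(instance, conj):
    """A conjunction holds when all of its features appear in the instance."""
    return conj <= instance

def energy(conj, positives, unlabeled, n_features):
    """E_ss(M) = |D| log2 N(M) + |M| log2 |U| - (1/2) sum_i log2 N(M_i)."""
    n_m = sum(matches(x, conj) for x in unlabeled)
    if n_m == 0:
        return float("inf")
    return (len(positives) * math.log2(n_m)
            + len(conj) * math.log2(n_features)
            - 0.5 * sum(math.log2(sum(f in x for x in unlabeled)) for f in conj))

def learn(positives, unlabeled, features, max_terms=2):
    """Search small conjunctions that hold for every positive, minimizing E_ss."""
    best, best_e = None, float("inf")
    for k in range(1, max_terms + 1):
        for conj in map(frozenset, combinations(sorted(features), k)):
            if all(matches(x, conj) for x in positives):
                e = energy(conj, positives, unlabeled, len(features))
                if e < best_e:
                    best, best_e = conj, e
    return best

# Tiny zoo-like data: learn "mammal" from two positives and no negatives.
unlabeled = [frozenset(s) for s in [
    {"hair", "milk", "4legs"}, {"hair", "milk", "4legs"}, {"hair", "4legs"},
    {"eggs", "aquatic"}, {"eggs", "aquatic", "fins"}]]
positives = unlabeled[:2]
learned = learn(positives, unlabeled, set().union(*unlabeled))
```

The |D| log2 N(M) term is what penalizes overly general hypotheses: "milk" (matched by exactly the two positives) beats "4legs" (matched by a third animal too), so negatives are indeed unnecessary.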
Semi-supervised Experiment Setup Slide 22/45
Experiment: given P & N (# of positive & negative instances to use), an unlabeled training set, a labeled training set, & a labeled testing set:
1 Build ontology from unlabeled training set
2 Average over 100 trials:
  1 randomly pick class to learn from labeled training set
  2 randomly pick P positive instances from class (& N negatives)
  3 search for Boolean expression M to minimize E_ss(M)
  4 use M to classify testing set
Results from Semi-supervised Learning Slide 23/45
[Figure: bar chart of accuracy on the eight UCI datasets for P = 5 and N = 0, comparing Baseline, Ontol, and ILP. Y-axis: Accuracy, 40–100 (higher is better; red = 95% confidence).]
Ontol significantly outperforms ILP on zoo and connect-4
Ontol is never significantly worse than ILP
Underperform Baseline (“everything is most common class”)? How?
  Overspecialization: E.g., “Mammal”: gorilla, monkey, chimpanzee, orangutan, baboon
Results from Semi-supervised Learning: Increasing Training Size Slide 24/45
[Figure: accuracy of Ontol vs. ILP as the labeled training set grows (5, 10, or 20 positives + 0, 10, or 20 negatives). Y-axis: Accuracy, 75–100 (higher is better). X-axis: size of labeled training set (positives, negatives).]
Back to Ontology Building Slide 25/45
Crunching gives ANDs What about ORs?
Building ORs: Merging Slide 26/45
Merging finds “OR”s or Equivalence Classes Interchangeable concepts form OR Find ORs by Crunching context
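The interchangeability idea can be sketched directly: two features are candidates for the same OR/equivalence class when they occur in the same contexts, i.e., their rows in the cooccurrence structure match. This illustrates the principle only; the thesis finds the classes by Crunching the context rows themselves, and the bags below are invented.

```python
# Group features whose cooccurrence contexts are identical: such features are
# interchangeable, so they form an OR (equivalence class).
from collections import defaultdict

def context_rows(bags):
    """Map each feature to the set of other features it cooccurs with."""
    rows = defaultdict(set)
    for bag in bags:
        for f in bag:
            rows[f] |= (bag - {f})
    return rows

def equivalence_classes(bags):
    """Features with the same context row merge into one class."""
    rows = context_rows(bags)
    groups = defaultdict(set)
    for f, row in rows.items():
        groups[frozenset(row)].add(f)
    return [members for members in groups.values() if len(members) > 1]

# 1A and 1B never cooccur with each other but appear in identical contexts,
# so they merge -- like the grammar alternatives on the next slide.
bags = [{"aa", "2A", "1A"}, {"aa", "2A", "1B"}]
classes = equivalence_classes(bags)
```

Exact row equality is brittle on real data; Crunching the rows instead (as the slide proposes) tolerates partial overlap.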
Building ORs: Merging Slide 27/45
E.g., Artificial “grammar” for creating bags of features.
<S>       ::= {<doubles> <M>}
<M>       ::= <A> (with probability .4) | <B> (with probability .6)
<A>       ::= <4>, <5>, <6>, <7>
<B>       ::= <1>, <2>, <3>
<doubles> ::= <AA> <BB> <CC> <DD> <EE> <FF> <GG>
<1>       ::= 1A | 1B | 1C | 1D | 1E | 1F
<2>       ::= 2A | 2B | 2C | 2D
<3>       ::= 3A | 3B | 3C | 3D | 3E | 3F
<4>       ::= 4A | 4B | 4C
<5>       ::= 5A | 5B | 5C | 5D | 5E
<6>       ::= 6A | 6B | 6C | 6D | 6E | 6F
<7>       ::= 7A | 7B
<AA>      ::= aa, | ∅
<BB>      ::= bb, | ∅
<CC>      ::= cc, | ∅
<DD>      ::= dd, | ∅
<EE>      ::= ee, | ∅
<FF>      ::= ff, | ∅
<GG>      ::= gg, | ∅

Sample bags:
{cc, dd, gg, 4C, 5A, 6B, 7B}
{aa, bb, dd, 1F, 2A, 3A}
{cc, dd, ee, 4C, 5A, 6C, 7B}
“Context” as Cooccurrence Matrix
Each row is really just a bag of features
So, Crunch context to get candidate Equivalence Classes
[Figure: equivalence classes discovered by crunching the cooccurrence context: nodes N00000–N00011 grouping the interchangeable grammar features (1A–1F, 2A–2D, …, 7A–7B, aa–gg).]
Will integrate with crunching in future work.
Slide 28/45
Phase 2: Parameterized Concepts
Parameterized Concepts Slide 29/45
Merging alone won’t give us translation Need isomorphism or “analogy”
How to represent parameterized calls? How to efficiently find behaviorally similar areas?
Mechanics of Parameterized Calls Slide 30/45
Extract out overlap Parameterize differences Gate uses BIND nodes to control who calls region
Mechanics of Parameterized Calls Slide 31/45
Can also represent combinatorics
Behavioral Signatures Slide 32/45
To find similar regions, create signatures
Same core idea as representing “dog” shape (HMax)
To “sample” from inputs of graph (using sample size = 3):
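A sketch of behavioral signatures: characterize a black-box region by the tally of its input/output behavior over random probes, then compare regions by squared distance between tallies. The regions, trial count, and seeding below are invented for illustration; only the sampling-and-tally idea is from the slide.

```python
# Behaviorally identical regions get identical signatures (distance 0),
# even when they are wired differently internally.
import random
from collections import Counter

def signature(region, n_inputs, trials=2000):
    """Tally (probe, output) pairs over random boolean probes."""
    rng = random.Random(0)          # same probe sequence for every region
    tally = Counter()
    for _ in range(trials):
        probe = tuple(rng.random() < 0.5 for _ in range(n_inputs))
        tally[(probe, region(*probe))] += 1
    return tally

def sq_distance(sig_a, sig_b):
    """Squared L2 distance between two tally vectors."""
    keys = set(sig_a) | set(sig_b)
    return sum((sig_a[k] - sig_b[k]) ** 2 for k in keys)

xor_a = lambda x, y, z: (x ^ y) ^ z
xor_b = lambda x, y, z: x ^ (y ^ z)       # same behavior, different wiring
majority = lambda x, y, z: x + y + z >= 2

sig_xa, sig_xb, sig_maj = (signature(f, 3) for f in (xor_a, xor_b, majority))
```

This is the same trick as the HMax "dog" signature: a cheap summary that is invariant to internal structure but discriminates behavior.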
Behavioral Signatures: Results Slide 33/45
[Figure: tally histogram (0–2500) of sampled behavior patterns for four regions A, B, C, D.]

Squared distances between signatures:
‖A − B‖² =   926,354   (9.72%)
‖A − D‖² = 9,109,628  (30.49%)
‖A − C‖² = 9,546,816  (31.21%)
‖B − D‖² = 5,544,956  (23.79%)
‖B − C‖² = 5,843,940  (24.42%)
‖C − D‖² =   301,794   (5.55%)
Slide 34/45
Phase 3: Finding Useful Mappings
The Problem of finding Mappings Slide 35/45
The Problem of finding Mappings II Slide 36/45
We assume Phase 2 will give us something like this.
What Makes a Useful Mapping? MDL! Slide 37/45
Mapping is set of ordered pairs of features. E.g.: Mapping37 (Rotate15°)
000.05 → 015.05
000.10 → 015.10
000.15 → 015.15
000.25 → 015.25
005.10 → 020.10
005.15 → 020.15
...
090.05 → 105.05
090.10 → 105.10
090.15 → 105.15

Dog on right = Mapping37(Dog on left) + any residual
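Applying such a mapping is just substitution over a feature set, with the residual as whatever the mapping fails to account for. A minimal sketch; the angle.radius names mimic the slide's style, and "extra.feature" is a hypothetical unmapped feature added for illustration:

```python
# A mapping is a set of ordered feature pairs, here stored as a dict.
def apply_mapping(mapping, features):
    """Return (image, residual): the mapped features plus unmapped leftovers."""
    image = {mapping[f] for f in features if f in mapping}
    residual = {f for f in features if f not in mapping}
    return image, residual

# Rotate-by-15-degrees over angle.radius features (illustrative subset).
rotate15 = {"000.05": "015.05", "000.10": "015.10",
            "090.05": "105.05", "090.10": "105.10"}

dog_left = {"000.05", "090.10", "extra.feature"}
dog_right, residual = apply_mapping(rotate15, dog_left)
```

The MDL payoff is that storing the left dog, the mapping's name, and the (small) residual is cheaper than storing both dogs outright.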
Finding Useful Mappings: Algorithm Slide 38/45
Find mapping to minimize description length Search is 2-pass, like EM: Which features map to which? Which instances map to which?
Once mapping found, use to reduce DL, then repeat like Cruncher!
Results: Finding Rotations Slide 39/45
Compression Savings Using Discovered Mappings (Higher is better.)
[Figure: bar chart of compression savings (0–80,000) for discovered vs. hand-picked mappings, at rotation angles from 15° to 165° in 15° steps.]
Slide 40/45
Phase 4: Comparing Apples to Angles
Building & Using Graphs for how Mappings Behave Slide 41/45
[Figure: two mapping-behavior graphs of the same shape: an “Apples” graph whose states (have20apples … have45apples, nodes 1–6) are linked by add5apples (A) and add10apples (B) mappings, alongside an isomorphic “Angles” graph.]
Building & Using Graphs for how Mappings Behave Slide 42/45
Substantiation of Claims Slide 43/45
1 Design provides plausible story for getting rich theory from raw sensor data
  Phases 2 & 3 provide proof-of-concept for how core ideas can bridge gap
  Phase 2: Representing and Finding Behavioral Isomorphisms
  Phase 3: Finding and Using Generalized Mappings (e.g., Rotation)
2 Phase 1 creates useful structure from feature-sets
  Does better compression than Lempel-Ziv alone on feature sets
  Finds useful macro-actions for Reinforcement Learning
  Learns concepts from a handful of positive training instances
Future Work: Fill In Bridge Slide 44/45
The Speed Prior
The MacGlashan Transform: Representing Relational Structures as Feature-Sets
Meta-Cognition: Feeding the Dragon its Tail
Future work for Phase 1:
  Combined Chunking and Merging
  Splitting
  Incremental Learning
  Wide Signatures and Low Resolution
  Constraint Satisfaction Search
Future work for Phase 2:
  Segmentation
  Munching behavioral signatures
Future work for Phase 3:
  Finding Primitive Mappings and Minimal Mapping Set
Future Application: Using Mappings for Speaker Classification and Identification
References Slide 45/45
Hawkins, J., and Blakeslee, S. 2004. On Intelligence. Times Books.
Muggleton, S. 1996. Learning from Positive Data. In Muggleton, S., ed., Inductive Logic Programming Workshop, volume 1314 of Lecture Notes in Computer Science, 358–376. Springer.
Pickett, M., and Barto, A. 2002. PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning. In Proceedings of the International Conference on Machine Learning.
Pierce, D., and Kuipers, B. 1997. Map Learning with Uninterpreted Sensors and Effectors. Artificial Intelligence 92:169–229.
Riesenhuber, M., and Poggio, T. 1999. Hierarchical Models of Object Recognition in Cortex. Nature Neuroscience 2(11):1019–1025.
Wolff, J. G. 2003. Information Compression by Multiple Alignment, Unification and Search as a Unifying Principle in Computing and Cognition. Artificial Intelligence Review 19(3):193–230.