Pattern Recognition Prof. Christian Bauckhage
outline
- lecture 03 recap
- basic terms and definitions
- the BIG picture: machine learning for pattern recognition
- building an automatic pattern recognition system
- summary
recap what is pattern recognition ?
regression
classification
clustering
recap word counts from definitions in the literature

(bar chart of word frequencies, from most to least frequent)
- category, decision, class, name / measurement, signal, object, observation, data
- classification, assignment
- mathematics, algorithm, function
- automatic
- representation
basic terms and definitions
basic axioms
we live in a very large but finitely sized universe U

what we can perceive of U is our environment E

inside of E, there are a lot of objects or events o

to perceive any o requires recording physically measurable quantities q(o) such as
- sizes, densities, energies, ...
- acoustic properties, ...
- visual properties, ...
observe
no sensory system (biological or technical) can record every aspect of E or every property of o

for example
- the human eye only reacts to a tiny interval of the whole em-spectrum
- the human ear only reacts to a small interval of acoustic frequencies
- the human tongue only perceives 5 (or so) basic tastes
basic axioms
a problem domain Q consists of quantifiable objects from a specific application domain; the q ∈ Q are called patterns

pattern recognition deals with mathematical and technical aspects of processing and analyzing patterns

the usual goal is to map patterns to symbols or data structures in order to classify them, for instance

q → a class Ωi
q → a tuple of classes Ωi1, Ωi2, ...
q → a class, a location, and an orientation (Ωi, t, R)
q → a symbolic description D
example
what animal is this (Ωi )? vs. what is going on here (D)?
basic axioms
classes or categories result from decomposing the problem domain Q into k or k + 1 subsets such that

Ωi ≠ ∅   ∀ i
Ωi ∩ Ωj = ∅   ∀ i ≠ j

where either

⋃_{i=1}^{k} Ωi = Q   or   ⋃_{i=0}^{k} Ωi = Q
postulates
- patterns within a class Ωi are similar
- patterns from different classes Ωi, Ωj are dissimilar
- Ω0 is the rejection class used for ambiguous patterns
observe
classification applies to simple and complex patterns
example
classifying a simple pattern ⇔ assign a pattern to a class Ωi
example
classifying a complex pattern ⇔ assign (parts of) a pattern to classes Ωi1 , Ωi2 , . . .
postulate
patterns have features that are characteristic of the class(es) they belong to

for instance, pictures of cheetahs show spots
postulate
there is a function f that extracts features from patterns

f(q) = x = (x1, ..., xm)ᵀ

for very simple patterns, it may suffice to consider f = id
postulate

features of patterns of a class form a more or less compact region in the feature space

features of patterns from different classes reside in more or less well separated regions of the feature space
example increasingly less compact and less well separated regions

(figure: three scatter plots in the feature space (x1, x2), each showing classes Ω1 and Ω2; from left to right, the class regions become less compact and overlap more)
classifier
a classifier is a function

y : R^m → {Ω0, Ω1, ..., Ωk}

that maps features x = f(q) to classes Ωi
question how to obtain a classifier ?
answer let’s see . . .
the BIG picture
(diagram: pattern recognition at the intersection of mathematics, computer science, data science, data mining, and machine learning)
observe
in this course, we understand the problem of obtaining classifiers or predictors as a machine learning problem
machine learning
machine learning is the science of fitting models to data
⇔ given a sample of problem specific data

1) decide on a "suitable" model class ⇔ specify a class of mathematical functions which you expect to be able to solve the problem at hand

2) determine / learn "appropriate" model parameters ⇔ use optimization algorithms to fit a mathematical function to the data
example regression

data {(xi, yi)}, i = 1, ..., n, where xi, yi ∈ R

possible model

y(x) = w0 + w1 x

w0 ≡ offset, w1 ≡ slope

(figure: scatter plot of the data and the fitted line)
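for this one-variable linear model, the least-squares parameters have a closed-form solution; the following sketch illustrates it (the helper name `fit_line` and the toy data are made up, not from the lecture):

```python
# Least-squares fit of the linear model y(x) = w0 + w1*x.

def fit_line(xs, ys):
    """Return (w0, w1) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution of the normal equations for one input variable.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w1 = cov_xy / var_x          # slope
    w0 = mean_y - w1 * mean_x    # offset
    return w0, w1

# Toy data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w0, w1 = fit_line(xs, ys)
print(w0, w1)  # -> 1.0 2.0
```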
example classification

data {(xi, yi)}, i = 1, ..., n, where xi ∈ R², yi ∈ {−1, +1}

possible model

y(x) = +1 if f(x) > 0, −1 otherwise

e.g. using the quadratic discriminant

f(x) = (x − µ1)ᵀ C1⁻¹ (x − µ1) − (x − µ2)ᵀ C2⁻¹ (x − µ2)

(figure: 2D scatter plot of both classes and the resulting decision boundary)
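a minimal sketch of this decision rule, assuming diagonal covariance matrices so that the inverses are trivial; the parameter values and function names below are invented for illustration:

```python
# Two-class quadratic discriminant as on the slide:
# f(x) = (x - mu1)^T C1^{-1} (x - mu1) - (x - mu2)^T C2^{-1} (x - mu2),
# with the slide's convention y(x) = +1 if f(x) > 0 (i.e. +1 when x is
# closer, in Mahalanobis distance, to mu2).

def mahalanobis_sq(x, mu, diag_cov):
    """Squared Mahalanobis distance for a diagonal covariance matrix."""
    return sum((xi - mi) ** 2 / ci for xi, mi, ci in zip(x, mu, diag_cov))

def classify(x, mu1, cov1, mu2, cov2):
    """Return +1 if f(x) > 0, else -1."""
    f = mahalanobis_sq(x, mu1, cov1) - mahalanobis_sq(x, mu2, cov2)
    return +1 if f > 0 else -1

# Made-up class parameters: unit variances, means at (0,0) and (3,3).
print(classify((3.0, 3.0), (0.0, 0.0), (1.0, 1.0), (3.0, 3.0), (1.0, 1.0)))  # -> 1
```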
example density estimation

data {xi}, i = 1, ..., n, where xi ∈ R

possible model

p(x) = Σ_{j=1}^{2} wj N(x | µj, σj²)

(figure: data histogram and the fitted two-component Gaussian mixture density)
example hard clustering

data {xi}, i = 1, ..., n, where xi ∈ R²

possible model

C(x) = argmin_j ‖x − µj‖²

(figure: 2D scatter plot with points colored by cluster)
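the assignment rule C(x) = argmin_j ‖x − µj‖², plus one k-means-style centroid update, can be sketched as follows (the data points and initial means are invented):

```python
# Hard cluster assignment and one centroid update, k-means style.

def assign(x, means):
    """Index of the nearest mean under squared Euclidean distance."""
    dists = [sum((xi - mi) ** 2 for xi, mi in zip(x, m)) for m in means]
    return dists.index(min(dists))

def update_means(points, means):
    """Recompute each mean as the centroid of its assigned points."""
    clusters = [[] for _ in means]
    for p in points:
        clusters[assign(p, means)].append(p)
    return [tuple(sum(c) / len(c) for c in zip(*pts)) if pts else m
            for pts, m in zip(clusters, means)]

# Two made-up point clouds around (0,0) and (2,2).
points = [(0.0, 0.0), (0.2, 0.1), (2.0, 2.0), (2.1, 1.9)]
means = update_means(points, [(0.0, 0.0), (2.0, 2.0)])
print(assign((0.1, 0.0), means))  # -> 0
```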
example sequences of symbols

data

BBCBCAABCCCABCA BCACCAACABBCCAB CABCCCCCCABABBC BAABBBCACCCABCC CCCBABBABCACCAC CABCACABCCCABBB CACCCCCABCBBCCA BBBCCCBCCABCABB CBCABCABCABBABA BABAACABCBAABCA...

possible model

Markov chain over the states A, B, C

(figure: state-transition diagram over A, B, and C with the transition probabilities on the edges)
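under a first-order Markov model, the transition probabilities can be estimated by counting bigrams in the observed sequence; the snippet below does this for an excerpt of the data above:

```python
from collections import Counter, defaultdict

# Estimate first-order Markov chain transition probabilities from a
# symbol sequence by counting symbol bigrams.

def transition_probs(sequence):
    """Return nested dict p[a][b] = P(next = b | current = a)."""
    counts = defaultdict(Counter)
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

# First block of the data shown above.
p = transition_probs("BBCBCAABCCCABCA")
```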
applications
once a machine learning algorithm has fitted a model that generalizes well, it can be applied in practice ⇔ depending on the task at hand, the learned model can do inference reasoning predictions decision making .. .
based on novel, previously unseen data
building an automatic pattern recognition system
observe
in this course, we treat the problem of training classifiers or predictors as a supervised learning problem

⇔ we assume we are given a (large) representative, labeled sample of tuples

(q1, y1), (q2, y2), ..., (qn, yn)

containing information about the problem domain at hand

the qj denote patterns, the yj are (hopefully correct) labels
example

qj : picture of an animal
yj : name of its species or index i of its class Ωi
note
the process of building / implementing and using a pattern recognition system involves the following 3½ phases

1) training phase
   validation phase (optional)
2) test phase
3) application phase
training
collect a representative training set Strain of patterns and (manually) label them

if necessary or appropriate, determine suitable features xj = f(qj); otherwise let xj = qj

decide on a model class Y and train a classifier y ∈ Y, i.e. determine the parameters of y such that it maps the given training data to its labels

y(xj) ≈ yj
validating (optional)
collect a representative validation set Sval of patterns and (manually) label them

the validation set must be independent of the training set

Sval ∩ Strain = ∅
evaluate the performance of y(x) on the labeled set Sval

if the overall performance is not "good enough", readjust the parameters of y(x)
testing
collect a representative test set Stest of patterns and (manually) label them

this test set must be independent of the training set

Stest ∩ Strain = ∅
objectively evaluate y(x) on Stest by measuring, say
- accuracy
- precision
- recall
- ...
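these test-set metrics can be computed as follows, assuming the course's label convention y ∈ {−1, +1} with +1 as the positive class; the example labels and the function name are made up:

```python
# Accuracy, precision, and recall from true vs. predicted labels,
# with +1 treated as the positive class.

def evaluate(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == +1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == +1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == -1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# One true positive, one false negative, one true negative, one false positive.
m = evaluate([+1, +1, -1, -1], [+1, -1, -1, +1])
```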
application
if y(x) meets the desired, problem-specific quality requirements, apply it in practice or ship it to your customer
note
a (very) good performance of a trained model y on the training data means nothing

what really counts in pattern recognition and machine learning are the generalization capabilities of y

⇔ what really counts in pattern recognition and machine learning is how y performs on independent test data

in a few weeks, we will study details as to why this is
take home message
you must never ever in your life make the rookie mistake of improperly testing a machine learning or pattern recognition system

any statement you ever make about the quality or performance of your system must be derived from test data which is independent of the training data
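a minimal sketch of enforcing this independence: split the labeled sample once, train only on one part, and evaluate only on the held-out part (the function name, split fraction, and toy data are illustrative):

```python
import random

# Shuffle-and-split so that S_train and S_test are disjoint subsets
# of the labeled sample.

def train_test_split(samples, test_fraction=0.25, seed=0):
    """Return (train, test) with test holding test_fraction of the data."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Made-up labeled sample of 100 (pattern, label) tuples.
data = [(i, +1 if i % 2 == 0 else -1) for i in range(100)]
train, test = train_test_split(data)
```

any accuracy, precision, or recall figure reported for the system should then be computed on `test` only.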
summary
we now know about
- problem domains and patterns
- classes, features, and classifiers
- the fact that machine learning is nothing but model fitting
- general aspects of building a pattern recognition system