Image Processing & Communication, vol. 19, no. 1, pp. 5-12, DOI: 10.1515/ipc-2015-0001
COMPARISON OF THE EFFECTIVENESS OF 1D AND 2D HMM IN THE PATTERN RECOGNITION
JANUSZ BOBULSKI
Institute of Computer and Information Science, Faculty of Mechanical Engineering and Computer Science, Czestochowa University of Technology, 73 Dabrowskiego Str., 42-200 Czestochowa, Poland
[email protected]
Abstract. Hidden Markov Model (HMM) is a well-established technique for image recognition and has also been successfully applied in other domains such as speech recognition, signature verification and gesture recognition. HMM is a widely used mechanism for pattern recognition based on 1D data. For images, one dimension is not satisfactory, because the conversion of two-dimensional data into one-dimensional loses some information. This paper presents a solution to the problem of 2D data by developing the 2D HMM structure and the necessary algorithms.

1 Introduction

Hidden Markov models are an omnipresent mechanism for data classification. They are an established technique in pattern recognition, speech recognition, character recognition, biological sequence analysis, texture analysis, face recognition, financial data processing, etc. [1, 2, 3, 4, 5]. The wide application of HMM is a result of their effectiveness and of the advantages resulting from unsupervised learning. In theory, supervised and unsupervised learning differ only in the causal structure of the model. In supervised learning, the model defines the effect one set of observations, called inputs, has on another set of observations, called outputs. In unsupervised learning, all the observations are assumed to be caused by latent variables, that is, the observations are assumed to be at the end of the causal chain. In practice, models for supervised learning often leave the probability for inputs undefined. This model is not needed as long as the inputs are available, but if some of the input values are missing, it is not possible to infer anything about the outputs. If the inputs are also modelled, then missing inputs cause no problem, since they can be considered latent variables as in unsupervised learning. In unsupervised learning, the learning can proceed hierarchically from the observations into ever more abstract levels of representation. Each additional hierarchy needs to learn only one step, and therefore the learning time increases linearly in the number of levels in the model hierarchy [6]. In spite of these virtues, the 1D HMM is impractical in image processing, because images are two-dimensional. When we convert an image from 2D to 1D, we lose some information. So, if we process two-dimensional data, we
should apply a two-dimensional HMM, and this 2D HMM should work with 2D data. One solution is the pseudo 2D HMM [7, 8]. This model is an extension of the classic 1D HMM. There are super-states, which mask one-dimensional hidden Markov models. The topology of the super-states is a linear model, where only self-transitions and transitions to the following super-state are possible. Inside the super-states there are linear 1D HMMs. The state sequences in the rows are independent of the state sequences of neighbouring rows. Additionally, the input data are divided into vectors, so in practice we have a 1D model with 1D data. Interesting results were shown in paper [9], which presents an analytic solution and a proof of correctness for a two-dimensional HMM. But this 2D HMM is similar to an MRF [10, 11], which works with one-dimensional data and can be applied only to the left-right type of HMM. An extension of the HMM to work on two-dimensional data is the 2D HMM. A 2D HMM can be regarded as a combination of one state matrix and one observation matrix, where transitions between states take place according to a 2D Markovian probability and each observation is generated independently by the corresponding state at the same matrix position. It was noted that the complexity of estimating the parameters of 2D HMMs, or of using them to perform maximum a posteriori classification, is exponential in the size of the data. Similarly to the 1D HMM, the most important task for 2D HMMs is to solve the three basic problems, namely probability evaluation, optimal state matrix and parameter estimation. This article presents a real solution for the 2D problem in HMM: it shows a true 2D HMM which processes 2D data. Moreover, the presented algorithms concern ergodic models, rather than those of the "left-right" type [12]. This paper presents an automatic pattern recognition system which uses the two-dimensional wavelet transform at the second level of decomposition for feature extraction; the classification module is based on two-dimensional hidden Markov models, which work with two-dimensional data.

2 1D HMM

A Hidden Markov Model allows us to specify the probability of a sequence of events that is not directly observed. An HMM consists of two stochastic processes. The first stochastic process is a Markov chain that is characterized by states and transition probabilities. The states of the chain are externally not visible, therefore "hidden". The second stochastic process produces emissions observable at each moment, depending on a state-dependent probability distribution (Fig. 1). We can describe a Hidden Markov Model with the following parameters:

• $N$ is the number of states.

• $Q = \{q_1, \ldots, q_N\}$ is the set of states. Note that the Hidden Markov Model keeps no history; the only thing it can remember is which state it is in now. The states of a Hidden Markov Model are hidden; we never observe them directly.

• $M$ is the number of symbols.

• $O = \{O_1, \ldots, O_M\}$ is the set of symbols that may be emitted.

• $\pi = (\pi_1, \ldots, \pi_N)$, $\pi_i \in [0, 1]$, is the initial probability distribution on the states. It gives the probability of starting in each state. We expect that $\sum_{i=1}^{N} \pi_i = 1$. We should think of $\pi$ as a column vector.

• $A = (a_{ij})$, $1 \le i, j \le N$, is the transition probability matrix. If the "machine" is in state $j$, it may be in state $i$ on the next clock tick with probability $a_{ij}$. We expect that $a_{ij} \in [0, 1]$ for each $i$ and $j$, and that $\sum_{i} a_{ij} = 1$ for each $j$.

• $B = (b_{ij})$, $1 \le i \le M$, $1 \le j \le N$, is the emission probability matrix; if the "machine" is in state $j$, it may emit symbol $i$ with probability $b_{ij}$.
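To make these conventions concrete, below is a minimal numpy sketch, not taken from the paper, of how the triple $\lambda = (A, B, \pi)$ could be stored and randomly initialized; the function name and array shapes are assumptions of this sketch.

```python
import numpy as np

def init_1d_hmm(N: int, M: int, seed: int = 0):
    """Random initial lambda = (A, B, pi) under the conventions above:
    pi[i]   - probability of starting in state i (entries sum to 1)
    A[i, j] - P(state i at t+1 | state j at t)   (each column sums to 1)
    B[i, j] - P(emitting symbol i | state j)     (each column sums to 1)
    """
    rng = np.random.default_rng(seed)
    pi = rng.random(N)
    pi /= pi.sum()
    A = rng.random((N, N))
    A /= A.sum(axis=0, keepdims=True)
    B = rng.random((M, N))
    B /= B.sum(axis=0, keepdims=True)
    return A, B, pi
```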
There are two fundamental problems which must be solved in order to build the pattern recognition system:

1. Given the observation $O = (o_1, o_2, \ldots, o_T)$ and the model $\lambda = (A, B, \pi)$, efficiently compute $P(O|\lambda)$.

2. Given the observation $O = (o_1, o_2, \ldots, o_T)$, estimate the model parameters $\lambda = (A, B, \pi)$ that maximize $P(O|\lambda)$.

To solve problem 1 we can use the well-known forward-backward algorithm, and to solve problem 2 the Baum-Welch algorithm [1, 12]. An HMM requires three probability measures to be defined, $A$, $B$, $\pi$, and the notation $\lambda = (A, B, \pi)$ is often used to indicate this set of parameters of the model. The parameters of the model are generated at random at the beginning. Then they are estimated with the Baum-Welch algorithm, which is based on the forward-backward algorithm. The forward algorithm calculates the coefficient $\alpha_t(i)$, the probability of observing the partial sequence $(o_1, \ldots, o_t)$ such that state $q_t$ is $i$. The backward algorithm calculates the coefficient $\beta_t(i)$, the probability of observing the partial sequence $(o_{t+1}, \ldots, o_T)$ such that state $q_t$ is $i$. The parameters of the new model $\lambda$, based on $\lambda_0$ and the observation $O$, are estimated from the equations of the Baum-Welch algorithm [1], and then recorded to the database.

Fig. 1: One-dimensional HMM

3 2D HMM

The solutions presented so far are for Markov models of the "left-right" type, not ergodic ones. So, we present a solution to problems 1 and 2 for two-dimensional data, which is sufficient to build an image recognition system. The statistical parameters of the 2D model are (Fig. 2 and 3):

• The number of states of the model, $N^2$.

• $Q = \{q_1, \ldots, q_{N^2}\}$, the set of states.

• The number of data streams, $k_1 \times k_2 = K$.

• The number of symbols, $M$.

• The observation sequence $O = \{o_t\}$, $1 \le t \le T$, where each $o_t$ is a square matrix observation of size $k_1 \times k_2 = K$.

• The transition probabilities of the underlying Markov chain, $A = \{a_{ijl}\}$, $1 \le i, j \le N$, $1 \le l \le N^2$, where $a_{ijl}$ is the probability of transition from state $q_{ij}$ to state $q_l$.

• The observation probabilities, $B = \{b_{ijm}\}$, $1 \le i, j \le N$, $1 \le m \le M$, which represent the probability of generating the $m$-th symbol in the $(i,j)$-th state.

• The initial probabilities, $\Pi = \{\pi_{ijk}\}$, $1 \le i, j \le N$, $1 \le k \le K$.

3.1 Solution of problem 1 for 2D

Forward algorithm:

• Define the forward variable $\alpha_t(i, j, k)$ as:

$\alpha_t(i, j, k) = P(o_1, o_2, \ldots, o_t, q_t = q_{ij} \mid \lambda)$   (1)

• $\alpha_t(i, j, k)$ is the probability of observing the partial sequence $(o_1, o_2, \ldots, o_t)$ such that the state $q_t$ is $q_{ij}$, for each $k$-th stream of data.

• Induction:

1. Initialization:

$\alpha_1(i, j, k) = \pi_{ijk}\, b_{ij}(o_1)$   (2)

2. Induction:

$\alpha_{t+1}(i, j, k) = \left[ \sum_{l=1}^{N^2} \alpha_t(i, j, k)\, a_{ijl} \right] b_{ij}(o_{t+1})$   (3)

3. Termination:

$P(O \mid \lambda) = \sum_{i,j=1}^{N} \sum_{k=1}^{K} \alpha_T(i, j, k)$   (4)
Fig. 2: Two-dimensional ergodic HMM
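The following numpy sketch makes the parameter layout of Sec. 3 and the recursion of Eqs. (2)-(4) concrete. It is not the paper's implementation: the array shapes, the flattening of the state matrix, the (T, K) observation layout, and the reading of Eq. (3) as the usual forward recursion (a sum over the $N^2$ predecessor states) are all assumptions of this sketch.

```python
import numpy as np

def init_2d_hmm(N: int, M: int, K: int, seed: int = 0):
    """Random containers for the 2D parameters listed in Sec. 3.
    A[i, j, l]  ~ a_{ijl}:  transition from state (i, j) to state l
    B[i, j, m]  ~ b_{ijm}:  emission of symbol m in state (i, j)
    Pi[i, j, k] ~ pi_{ijk}: initial probability per data stream k
    """
    rng = np.random.default_rng(seed)
    A = rng.random((N, N, N * N))
    A /= A.sum(axis=2, keepdims=True)          # normalize over targets l
    B = rng.random((N, N, M))
    B /= B.sum(axis=2, keepdims=True)          # normalize over symbols m
    Pi = rng.random((N, N, K))
    Pi /= Pi.sum(axis=(0, 1), keepdims=True)   # states sum to 1 per stream
    return A, B, Pi

def forward_2d(A, B, Pi, obs):
    """Forward pass in the spirit of Eqs. (2)-(4).
    obs[t, k] is the symbol index emitted at time t in data stream k.
    Returns P(O | lambda) as a sum over states and streams (Eq. (4)).
    """
    T, K = obs.shape
    N = Pi.shape[0]
    S = N * N                          # the N^2 states q_{ij}, flattened
    A2 = A.reshape(S, S)               # A2[s, l]: from state s to state l
    B2 = B.reshape(S, -1)
    Pi2 = Pi.reshape(S, K)

    alpha = Pi2 * B2[:, obs[0]]        # Eq. (2), shape (S, K)
    for t in range(1, T):
        # Eq. (3): sum over predecessor states, weight by emission of o_t
        alpha = (A2.T @ alpha) * B2[:, obs[t]]
    return alpha.sum()                 # Eq. (4)
```

With $N = 5$, $K = 25$ and $M = 50$, as in Experiment 1, `forward_2d` evaluates $P(O|\lambda)$ for one observation sequence in $O(T \cdot N^4)$ operations per stream.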
3.2 Solution of problem 2 for 2D

Parameter re-estimation algorithm:

• Define $\xi_t(i, j, l)$ as the probability of being in state $q_{ij}$ at time $t$ and in state $q_l$ at time $t + 1$, for each $k$-th stream of data:

$\xi_t(i, j, l) = \dfrac{\alpha_t(i, j, k)\, a_{ijl}\, b_{ij}(o_{t+1})\, \beta_{t+1}(i, j, k)}{P(O \mid \lambda)} = \dfrac{\sum_{k=1}^{K} \alpha_t(i, j, k)\, a_{ijl}\, b_{ij}(o_{t+1})\, \beta_{t+1}(i, j, k)}{\sum_{l=1}^{N^2} \alpha_t(i, j, k)\, a_{ijl}\, b_{ij}(o_{t+1})\, \beta_{t+1}(i, j, k)}$   (5)

• Define $\gamma_t(i, j)$ as the probability of being in state $q_{ij}$ at time $t$, given the observation sequence:

$\gamma_t(i, j) = \sum_{l=1}^{N^2} \xi_t(i, j, l)$   (6)

- $\sum_{t=1}^{T} \gamma_t(i, j)$ is the expected number of times state $q_{ij}$ is visited.

- $\sum_{t=1}^{T-1} \xi_t(i, j, l)$ is the expected number of transitions from state $q_{ij}$ to state $q_l$.

Fig. 3: Two-dimensional HMM in subsequent time steps ($s_1$-$s_4$: observations)

Update rules:

- $\bar{\pi}_{ijk}$ = expected frequency in state $(i, j)$ at time $t = 1$, that is, $\bar{\pi}_{ijk} = \gamma_1(i, j)$.
- $\bar{a}_{ijl}$ = (expected number of transitions from state $q_{ij}$ to state $q_l$) / (expected number of transitions from state $q_{ij}$):

$\bar{a}_{ijl} = \dfrac{\sum_t \xi_t(i, j, l)}{\sum_t \gamma_t(i, j)}$   (7)

- $\bar{b}_{ij}(k)$ = (expected number of times in state $q_{ij}$ observing symbol $k$) / (expected number of times in state $q_{ij}$):

$\bar{b}_{ij}(k) = \dfrac{\sum_{t,\, o_t = k} \gamma_t(i, j)}{\sum_t \gamma_t(i, j)}$   (8)
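The update rules can be sketched in the same numpy setting as the forward pass above. This is a loose rendering of Eqs. (5)-(8), not the paper's implementation: `beta` is assumed to come from a matching backward pass, the emission term is attached to the successor state as in standard Baum-Welch, and the symbol occurrences in Eq. (8) are counted on the first data stream for brevity.

```python
import numpy as np

def reestimate(alpha, beta, A2, B2, obs):
    """One re-estimation step in the spirit of Eqs. (5)-(8).
    alpha, beta : (T, S, K) forward/backward variables, states flattened
    A2 : (S, S) transitions; B2 : (S, M) emissions; obs : (T, K) symbols
    """
    T, S, K = alpha.shape
    M = B2.shape[1]

    xi = np.zeros((T - 1, S, S))       # Eq. (5), summed over the K streams
    for t in range(T - 1):
        x = np.einsum('sk,sl,lk,lk->sl', alpha[t], A2,
                      B2[:, obs[t + 1]], beta[t + 1])
        xi[t] = x / x.sum()            # normalize in place of P(O | lambda)
    gamma = xi.sum(axis=2)             # Eq. (6), shape (T-1, S)

    new_pi = gamma[0]                                      # pi update
    new_A = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]    # Eq. (7)
    new_B = np.zeros((S, M))
    for m in range(M):                                     # Eq. (8)
        seen = obs[:-1, 0] == m        # when symbol m was observed
        new_B[:, m] = gamma[seen].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```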
4 Experiments

4.1 Experiment 1
The Amsterdam Library of Object Images (ALOI) was used as the image database in this experiment. The collection contains colour images of one thousand small objects (Fig. 4). Viewing angle, illumination angle and illumination colour were systematically varied for each object, in order to capture the sensory variation in object recordings. Over a hundred images of each object were recorded, yielding a total of 110,250 images for the collection [15, 16]. Fifty objects were selected for testing the method, and three images for learning and three for testing were chosen for each object. The 2D HMM was implemented with parameters $N = 5$, $N^2 = 25$, $K = 25$, $M = 50$. The wavelet transform was chosen as the feature extraction technique, with db10 as the wavelet function. The function selection was made experimentally. Tab. 1 presents the results of the experiments.
Tab. 1: Comparison of recognition rate - patterns

Method       Recognition rate [%]
Eigenvector  94
1D HMM       90
2D HMM       92
Fig. 4: Random representatives of the patterns in the ALOI dataset [16]
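The feature extraction step used in both experiments (a second-level 2D wavelet decomposition with db10) could look roughly as follows with the PyWavelets library; which sub-bands are kept and how they are arranged into observation streams is not specified in the paper, so the selection below is an assumption of this sketch.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_features(image: np.ndarray) -> np.ndarray:
    """db10 wavelet decomposition of a grayscale image at level 2;
    keeps the four level-2 sub-bands as a stacked feature array."""
    cA2, (cH2, cV2, cD2), _ = pywt.wavedec2(image, wavelet='db10', level=2)
    return np.stack([cA2, cH2, cV2, cD2])
```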
4.2 Experiment 2

The road sign image database German Traffic Sign Benchmark [17] was used in this experiment. There are over 1700 images of road signs in this database. The images show the signs under variable conditions of lighting, rotation and size. Fig. 5 shows examples of the traffic signs [18]. We chose 50 objects in order to verify the method, and for each object chose three images for learning and five for testing. The 2D HMM was implemented with parameters $N = 4$, $N^2 = 16$, $K = 16$, $M = 25$. The wavelet transform was chosen as the feature extraction technique, and db10 as the wavelet function. The function selection was made experimentally. Tab. 2 presents the results of the experiment.

Tab. 2: Comparison of recognition rate - road signs

Method        Recognition rate [%]
ESOM [19]     84
HMM [20]      49
1D HMM [our]  81
2D HMM [our]  83

Fig. 5: Random representatives of the traffic signs in the GTSRB dataset [17]

5 Conclusion

A new conception of two-dimensional hidden Markov models has been presented. The presented method allows for faster image processing and recognition, because the two-dimensional input data in image form does not have to be changed into one-dimensional data. We show solutions of the principal problems for the ergodic 2D HMM, which may be applied to 2D data. The recognition rates of the method are 92% and 83%, which is better than the 1D HMM. The advantage of this approach is that there is no need to convert the input two-dimensional image into one-dimensional data, so we do not lose information. The obtained results are satisfactory in comparison with other methods, and the proposed method may be an alternative to the others in unsupervised learning systems.
References

[1] Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257-285

[2] Samaria, F., Young, S. (1994). HMM-based architecture for face identification. Image and Vision Computing, 12(8), 537-583

[3] Hamilton, J. D. (1990). Analysis of time series subject to changes in regime. Journal of Econometrics, 45(1), 39-70

[4] Baldi, P., Chau, Y., Hunkapiller, T., McClure, M. A. (1994). Hidden Markov models of biological primary sequence information. Proceedings of the National Academy of Sciences, 91, 1059-1063

[5] Bobulski, J., Kubanek, M. (2012). Person identification system using sketch of the suspect. Optica Applicata, 4(42), 865-873
[6] Valpola, H. (2000). Bayesian Ensemble Learning for Nonlinear Factor Analysis. Finnish Academies of Technology

[7] Eickeler, S., Müller, S., Rigoll, G. (1999). High performance face recognition using pseudo 2-D hidden Markov models. Paper presented at the European Control Conference

[8] Bevilacqua, V., Cariello, L., Carro, G., Daleno, D., Mastronardi, G. (2008). A face recognition system based on pseudo 2D HMM applied to neural network coefficients. Soft Computing, 12(7), 615-621

[9] Yujian, L. (2007). An analytic solution for estimating two-dimensional hidden Markov models. Applied Mathematics and Computation, 185(2), 810-822

[10] Li, J., Najmi, A., Gray, R. M. (2000). Image classification by a two-dimensional hidden Markov model. IEEE Transactions on Signal Processing, 48(2), 517-533

[11] Joshi, D., Li, J., Wang, J. Z. (2006). A computationally efficient approach to the estimation of two- and three-dimensional hidden Markov models. IEEE Transactions on Image Processing, 15(7), 1871-1886

[12] Bobulski, J., Adrjanowicz, L. (2013). In Artificial Intelligence and Soft Computing, Part I (pp. 515-523). Springer

[13] Kanungo, T. (2014). Hidden Markov Model Tutorial. Retrieved from http://www.kanungo.com

[14] Forney, G. D. (1973). The Viterbi algorithm. Proceedings of the IEEE, 61(3), 268-278

[15] Geusebroek, J. M., Burghouts, G. J., Smeulders, A. W. M. (2005). The Amsterdam Library of Object Images. International Journal of Computer Vision, 61(1), 103-112

[16] Amsterdam Library of Object Images (2013). Retrieved from http://aloi.science.uva.nl/

[17] German Traffic Sign Benchmark database (2014). Retrieved from http://benchmark.ini.rub.de/Dataset/GTSRB-Final-Training-Images.zip

[18] Stallkamp, J., Schlipsing, M., Salmen, J., Igel, C. (2011). The German Traffic Sign Recognition Benchmark: A multi-class classification competition. Paper presented at the IEEE International Joint Conference on Neural Networks

[19] Nguwi, Y. Y., Cho, S. Y. (2010). Emergent self-organizing feature map for recognizing road sign images. Neural Computing and Applications, 19(4), 601-615

[20] Hsien, J. C., Liou, Y. S., Chen, S. Y. (2006). Road sign detection and recognition using Hidden Markov Model. Asian Journal of Health and Information Sciences, 1(1), 85-100