
Modeling Multidimensional Incomplete Sequences using Hidden Markov Models

Vadim E. Uvarov, Alexander A. Popov and Tatyana A. Gultyaeva

Novosibirsk State Technical University, Novosibirsk, Russia
e-mail: [email protected], [email protected], [email protected]

Abstract

This paper addresses the problem of multidimensional incomplete sequence modeling using hidden Markov models (HMMs). We propose a modified Baum-Welch algorithm that can be used for training an HMM on sequences that contain missing observations, and a modified forward-backward algorithm for classification of incomplete sequences. Additionally, we propose a modified Viterbi algorithm which can be used to decode and impute incomplete sequences using an HMM. It is shown that the proposed algorithms outperform other methods of handling missing observations, namely elimination of missing data and imputation of missing observations using the mean of the neighbouring observations.

Keywords: hidden Markov models, machine learning, sequences, Baum-Welch algorithm, missing observations, incomplete data, Viterbi algorithm, classification, decoding, imputation.

Introduction

The hidden Markov model (HMM) concept is a popular and powerful tool for sequence modeling. It was presented in the late 1960s and initially applied to speech recognition problems [1]. Despite the fact that this concept is relatively well studied, its usage in missing data scenarios has not been properly investigated yet. Traditional algorithms that are used with HMMs cannot process sequences that contain missing observations, i.e. incomplete sequences.

Some attempts to solve this problem have already been made. For example, in [2] the authors used marginalization and imputation approaches to classify incomplete noisy speech samples using HMMs. Such an approach proved to be more effective than the standard filtering methods. However, in that work the HMMs were trained on clean samples and the sequence decoding problem was not addressed. In [3] missing observations in a sequence of movements were used to represent a situation where the movement of a humanoid body part was occluded by some obstacle. The authors proposed a decoding algorithm that was used to infer the actual movements of a humanoid model from a sequence of data extracted from a video feed. However, once again the HMMs were trained on clean data.

In this work we address the problems of training an HMM on incomplete sequences, decoding incomplete sequences using an HMM, and classifying incomplete sequences using an HMM. This paper continues the studies of the HMM concept which are carried out at the chair of theoretical and applied informatics at Novosibirsk State Technical University [4, 5].

1 Hidden Markov Model

A thorough description of the hidden Markov model concept is out of the scope of this paper; a good overview can be found in [1]. However, we will introduce some notation to refer to the elements of an HMM and mention the basic algorithms.

We denote a sequence of $Z$-dimensional observations as $O = \{o_1, \ldots, o_T\}$, where $T$ is the total number of observations in the sequence and $o_t$ is the observation at time $t$. A hidden state of the HMM is denoted as $s_i$, $i = \overline{1,N}$, where $N$ is the total number of hidden states in the model, and the hidden state of the model at time $t$ is denoted as $q_t$, $t = \overline{1,T}$. We denote an HMM as $\lambda = \{\Pi, A, B\}$, where $\Pi = \{\pi_i = p(q_1 = s_i)\}$, $i = \overline{1,N}$, is the initial state distribution, $A = \{a_{ij} = p(q_{t+1} = s_j \mid q_t = s_i)\}$, $i, j = \overline{1,N}$, is the transition probabilities matrix, and $B = \{b_i(o) = f(o \mid q = s_i)\}$, $i = \overline{1,N}$, $o \in \mathbb{R}^Z$, is the set of conditional probability density functions of the multidimensional observations.

In this paper we use mixtures of Gaussian distributions to model the observation densities, hence
$$b_i(o) = \sum_{m=1}^{M} \tau_{im}\, g(o; \mu_{im}, \Sigma_{im}), \quad i = \overline{1,N}, \ o \in \mathbb{R}^Z,$$
where $M$ is the number of mixture components, $\tau_{im}$ is the weight of the $m$-th component in the $i$-th hidden state, $\mu_{im}$ is the $Z$-dimensional mean vector of the $m$-th component in the $i$-th hidden state, $\Sigma_{im}$ is the $Z \times Z$ covariance matrix of the $m$-th component in the $i$-th hidden state, and $g(o; \mu_{im}, \Sigma_{im})$, $o \in \mathbb{R}^Z$, is the Gaussian density function, i.e.
$$g(o; \mu_{im}, \Sigma_{im}) = \frac{1}{\sqrt{(2\pi)^Z |\Sigma_{im}|}}\, e^{-0.5\,(o - \mu_{im})^{T} \Sigma_{im}^{-1} (o - \mu_{im})}, \quad o \in \mathbb{R}^Z.$$
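For concreteness, the mixture emission density above can be evaluated as in the following sketch (Python with NumPy; the function names and the toy parameters are illustrative, not part of the paper):

```python
import numpy as np

def gaussian_pdf(o, mu, sigma):
    """Multivariate Gaussian density g(o; mu, Sigma) for a Z-dimensional point."""
    z = len(mu)
    diff = o - mu
    norm = np.sqrt((2 * np.pi) ** z * np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

def emission_density(o, tau_i, mu_i, sigma_i):
    """Mixture emission density b_i(o) = sum_m tau_im * g(o; mu_im, Sigma_im).

    tau_i: (M,) mixture weights of hidden state i,
    mu_i: (M, Z) mean vectors, sigma_i: (M, Z, Z) covariance matrices.
    """
    return sum(tau_i[m] * gaussian_pdf(o, mu_i[m], sigma_i[m])
               for m in range(len(tau_i)))

# Toy example: a two-component mixture in Z = 2 dimensions.
tau = np.array([0.4, 0.6])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])
sigma = np.stack([np.eye(2), np.eye(2)])
print(emission_density(np.array([0.5, 0.2]), tau, mu, sigma))
```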

To perform sequence classification, given a sequence $O = \{o_1, \ldots, o_T\}$ and a set of HMMs $\lambda_1, \ldots, \lambda_D$, one would usually use the maximum likelihood criterion. A sequence $O$ is assigned to the model for which the likelihood function is maximal, i.e. to the model $\lambda^* = \arg\max_{\lambda \in \{\lambda_1, \ldots, \lambda_D\}} p(O \mid \lambda)$. The efficient forward-backward algorithm is usually used to calculate the likelihood function $p(O \mid \lambda)$ [1]. The forward variable is defined as $\alpha_t(i) = p(\{o_1, o_2, \ldots, o_t\}, q_t = s_i \mid \lambda)$ and the backward variable is defined as $\beta_t(i) = p(\{o_{t+1}, o_{t+2}, \ldots, o_T\} \mid q_t = s_i, \lambda)$, $t = \overline{1,T}$, $i = \overline{1,N}$.

By decoding a sequence of observations $O = \{o_1, \ldots, o_T\}$ using a hidden Markov model $\lambda$ one usually means finding the most probable sequence of hidden states $\{\hat{q}_1, \ldots, \hat{q}_T\}$ that corresponds to the observations. The Viterbi algorithm is usually used for this purpose [1]. This algorithm relies on the calculation of the probabilities $\delta_t(j) = p(q_t = s_j \mid o_t, \lambda)$, $j = \overline{1,N}$, $t = \overline{1,T}$.

Training of hidden Markov models is usually performed with the Baum-Welch algorithm, which is essentially a modification of the EM algorithm [1]. This algorithm iteratively maximizes the likelihood function and requires an initial approximation. Since it converges to a local maximum, it is usually run with several different initial approximations. We denote the additional probabilities used in the Baum-Welch algorithm as follows: $\gamma_t(i) = p(q_t = s_i \mid O, \lambda)$, $\xi_t(i, j) = p(q_t = s_i, q_{t+1} = s_j \mid O, \lambda)$ and $\gamma_t(i, m) = p(q_t = s_i, \omega_{it} = m \mid O, \lambda)$, $t = \overline{1, T-1}$, $i, j = \overline{1,N}$, $m = \overline{1,M}$.


2 Incomplete Sequence Analysis

We define an incomplete sequence as a sequence in which the values of some observations are undefined. We call such observations 'missing' and denote them with the sign $\varnothing$. It is clear that none of the standard algorithms can handle missing values, since the calculation of the probabilities $\alpha_t(i)$, $\beta_t(i)$, $\delta_t(j)$, $\gamma_t(i)$, $\xi_t(i, j)$ and $\gamma_t(i, m)$ requires the calculation of $b_i(o_t)$, which in its turn requires knowledge of the actual value of $o_t$.

A standard way of dealing with missing observations is to delete them from the sequence completely and then glue the remaining parts together. We will call this method 'gluing'. The other standard way to eliminate the gaps is to fill them with the mean of the neighbouring observations. In the next two subsections we propose more accurate algorithms for dealing with incomplete sequences. The main idea of these algorithms is to set $b_i(o_t) = 1$ when $o_t = \varnothing$, since any observation could have been in place of $o_t$ and the probability of seeing some observation at time $t$ is one.

2.1 Modified Viterbi Algorithm

The proposed modified Viterbi algorithm extends the original Viterbi algorithm to the case of missing observations. Given an observation sequence $O = \{o_1, \ldots, o_T\}$ and an HMM $\lambda$, the steps of the modified Viterbi algorithm are as follows:

1) initialization:
$$\delta_1(i) = \begin{cases} \pi_i, & o_1 = \varnothing \\ \pi_i\, b_i(o_1), & \text{otherwise} \end{cases}, \quad i = \overline{1,N}, \qquad \psi_1(i) = 0;$$

2) induction:
$$\delta_t(j) = \begin{cases} \max\limits_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}], & o_t = \varnothing \\ \max\limits_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]\, b_j(o_t), & \text{otherwise} \end{cases}, \quad j = \overline{1,N},\ t = \overline{2,T},$$
$$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}], \quad j = \overline{1,N},\ t = \overline{2,T};$$

3) termination: $\hat{q}_T = \arg\max\limits_{1 \le i \le N} [\delta_T(i)]$;

4) backward inference of the most probable sequence of hidden states:
$$\hat{q}_t = \psi_{t+1}(\hat{q}_{t+1}), \quad t = \overline{T-1, 1}.$$

As a result we obtain the most probable sequence of hidden states $\hat{Q} = \{\hat{q}_1, \ldots, \hat{q}_T\}$.

Besides decoding, this algorithm can also be applied to impute the missing values. After decoding, a missing value can be replaced with the most probable observation $\hat{o}_t$ in the decoded hidden state $\hat{q}_t$ (the mean of the corresponding distribution) or sampled from the distribution which corresponds to the decoded hidden state $\hat{q}_t$. After imputation one can also classify the imputed sequence or train a model on it.
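A minimal sketch of the modified Viterbi recursion is given below (Python with NumPy). It assumes a generic emission callable `emis(i, o)` standing in for $b_i(o)$ and represents a missing observation as `None`; skipping the emission factor at missing positions is exactly the rule $b_i(\varnothing) = 1$ described above. The imputation helper replaces each missing value with the mean associated with its decoded state, as suggested in the text. This is an illustrative sketch, not the authors' reference implementation; in practice the recursion would be carried out with log probabilities to avoid underflow on long sequences.

```python
import numpy as np

def modified_viterbi(obs, pi, A, emis):
    """Modified Viterbi decoding for a sequence with missing observations.

    obs  : list of length T; each item is a Z-dim np.array or None (missing).
    pi   : (N,) initial state probabilities.
    A    : (N, N) transition matrix, A[i, j] = a_ij.
    emis : callable emis(i, o) returning b_i(o); skipped when o is None,
           which implements the convention b_i(missing) = 1.
    Returns the decoded state path and the delta values.
    """
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    b = lambda t: np.ones(N) if obs[t] is None else np.array(
        [emis(i, obs[t]) for i in range(N)])
    delta[0] = pi * b(0)                      # initialization
    for t in range(1, T):                     # induction
        trans = delta[t - 1][:, None] * A     # delta_{t-1}(i) * a_ij
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * b(t)
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()             # termination
    for t in range(T - 2, -1, -1):            # backward inference
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta

def impute_with_means(obs, path, state_means):
    """Replace each missing observation with the mean of its decoded state."""
    return [state_means[path[t]] if o is None else o for t, o in enumerate(obs)]
```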

2.2 Modified Forward-Backward Algorithm

Given an observation sequence $O = \{o_1, \ldots, o_T\}$ and an HMM $\lambda$, the steps of the modified forward variable computation algorithm are as follows:

1) initialization:
$$\alpha_1(i) = \begin{cases} \pi_i, & o_1 = \varnothing \\ \pi_i\, b_i(o_1), & \text{otherwise} \end{cases}, \quad i = \overline{1,N};$$

2) induction:
$$\alpha_{t+1}(i) = \begin{cases} \sum\limits_{j=1}^{N} \alpha_t(j)\, a_{ji}, & o_{t+1} = \varnothing \\ b_i(o_{t+1}) \left[ \sum\limits_{j=1}^{N} \alpha_t(j)\, a_{ji} \right], & \text{otherwise} \end{cases}, \quad i = \overline{1,N},\ t = \overline{1, T-1}.$$

Since $p(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$, this algorithm can be used for incomplete sequence classification using the maximum likelihood criterion.

The steps of the modified backward variable computation algorithm are as follows:

1) initialization: $\beta_T(i) = 1$, $i = \overline{1,N}$;

2) induction:
$$\beta_t(i) = \sum_{j=1}^{N} \beta_{t+1}(j)\, b_j(o_{t+1})\, a_{ij}, \quad i = \overline{1,N},\ t = \overline{1, T-1},$$
where $b_j(o_{t+1}) = 1$ if $o_{t+1} = \varnothing$.

The backward variables are used in the modified Baum-Welch algorithm.
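A sketch of the modified forward pass and of maximum-likelihood classification over a set of candidate models might look as follows (Python with NumPy; the `emis` callable, the model tuples and the function names are assumptions of this sketch, and a scaled or log-space version would be needed for long sequences):

```python
import numpy as np

def modified_forward_likelihood(obs, pi, A, emis):
    """Modified forward pass: p(O | lambda) for a sequence with missing values.

    Missing observations (None) contribute b_i = 1, so the update reduces to
    alpha_{t+1}(i) = sum_j alpha_t(j) a_ji at those positions.
    """
    N, T = len(pi), len(obs)
    b = lambda t: np.ones(N) if obs[t] is None else np.array(
        [emis(i, obs[t]) for i in range(N)])
    alpha = pi * b(0)                       # initialization
    for t in range(T - 1):                  # induction
        alpha = (alpha @ A) * b(t + 1)      # sum_j alpha_t(j) a_ji, times b_i
    return alpha.sum()                      # p(O | lambda)

def classify(obs, models):
    """Assign an incomplete sequence to the model with maximal likelihood.

    models is a list of (pi, A, emis) tuples, one per candidate HMM.
    """
    scores = [modified_forward_likelihood(obs, *m) for m in models]
    return int(np.argmax(scores))
```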

2.3 Modified Baum-Welch Algorithm

Given a set of training incomplete sequences $O^* = \{O^1, O^2, \ldots, O^K\}$ and some HMM approximation $\hat{\lambda}$, one iteration of the modified Baum-Welch algorithm consists of the following steps:

1) $\gamma_t(i) = \dfrac{\alpha_t(i)\,\beta_t(i)}{p(O \mid \hat{\lambda})}$, $i = \overline{1,N}$, $t = \overline{1, T-1}$;

2) $\xi_t(i, j) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{p(O \mid \hat{\lambda})}$, $i, j = \overline{1,N}$, $t = \overline{1, T-1}$;

3) $\gamma_t(i, m) = \begin{cases} \gamma_t(i)\,\tau_{im}, & o_t = \varnothing \\ \gamma_t(i)\,\dfrac{\tau_{im}\, g(o_t; \mu_{im}, \Sigma_{im})}{b_i(o_t)}, & \text{otherwise} \end{cases}$, $\quad i = \overline{1,N}$, $m = \overline{1,M}$, $t = \overline{1,T}$;

4) the new approximation of the HMM parameters $\hat{\lambda}'$ is calculated as follows:
$$\hat{\pi}'_i = \frac{1}{K} \sum_{k=1}^{K} \gamma_1^{(k)}(i), \qquad
\hat{a}'_{ij} = \frac{\sum\limits_{k=1}^{K} \sum\limits_{t=1}^{T_k - 1} \xi_t^{(k)}(i, j)}{\sum\limits_{k=1}^{K} \sum\limits_{t=1}^{T_k - 1} \gamma_t^{(k)}(i)}, \qquad
\hat{\tau}'_{im} = \frac{\sum\limits_{k=1}^{K} \sum\limits_{t=1}^{T_k - 1} \gamma_t^{(k)}(i, m)}{\sum\limits_{k=1}^{K} \sum\limits_{t=1}^{T_k - 1} \gamma_t^{(k)}(i)},$$
$$\hat{\mu}'_{im} = \frac{\sum\limits_{k=1}^{K} \sum\limits_{\substack{t=1 \\ o_t^k \neq \varnothing}}^{T_k - 1} \gamma_t^{(k)}(i, m)\, o_t^k}{\sum\limits_{k=1}^{K} \sum\limits_{\substack{t=1 \\ o_t^k \neq \varnothing}}^{T_k - 1} \gamma_t^{(k)}(i, m)}, \qquad
\hat{\Sigma}'_{im} = \frac{\sum\limits_{k=1}^{K} \sum\limits_{\substack{t=1 \\ o_t^k \neq \varnothing}}^{T_k - 1} \gamma_t^{(k)}(i, m)\, (o_t^k - \hat{\mu}'_{im})(o_t^k - \hat{\mu}'_{im})^T}{\sum\limits_{k=1}^{K} \sum\limits_{\substack{t=1 \\ o_t^k \neq \varnothing}}^{T_k - 1} \gamma_t^{(k)}(i, m)}.$$

After the iteration, $\hat{\lambda}'$ replaces $\hat{\lambda}$ and a new iteration begins. This iterative process continues until the likelihood function converges to a local maximum.
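As an illustration, the sketch below (Python with NumPy, single-sequence case, hypothetical helper names) shows the two places where missing observations change the computation: the mixture responsibility $\gamma_t(i, m)$, which reduces to $\gamma_t(i)\,\tau_{im}$ at missing positions, and the mean re-estimation, which averages only over observed time steps. It assumes that the state posteriors $\gamma_t(i)$ have already been obtained from the modified forward-backward pass; it is not the authors' full implementation.

```python
import numpy as np

def mixture_responsibilities(gamma, obs, tau, comp_pdf):
    """gamma_t(i, m) with the missing-data rule of the modified Baum-Welch step.

    gamma    : (T, N) state posteriors gamma_t(i).
    obs      : list of T observations, each a Z-dim array or None (missing).
    tau      : (N, M) mixture weights.
    comp_pdf : callable comp_pdf(i, m, o) = g(o; mu_im, Sigma_im).
    """
    T, N = gamma.shape
    M = tau.shape[1]
    g = np.zeros((T, N, M))
    for t, o in enumerate(obs):
        if o is None:
            g[t] = gamma[t][:, None] * tau           # gamma_t(i) * tau_im
        else:
            dens = np.array([[tau[i, m] * comp_pdf(i, m, o) for m in range(M)]
                             for i in range(N)])
            dens /= dens.sum(axis=1, keepdims=True)  # tau_im g / b_i(o_t)
            g[t] = gamma[t][:, None] * dens
    return g

def reestimate_means(g, obs, Z):
    """mu'_im as a responsibility-weighted average over observed steps only."""
    T, N, M = g.shape
    num = np.zeros((N, M, Z))
    den = np.zeros((N, M))
    for t, o in enumerate(obs):
        if o is not None:
            num += g[t][:, :, None] * o   # gamma_t(i, m) * o_t
            den += g[t]
    return num / den[:, :, None]
```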


3 Evaluation

3.1 Training on Incomplete Sequences

To evaluate the training algorithms for incomplete sequences we used an original HMM with the following parameters: $N = 3$, $M = 3$, $Z = 2$,
$$A = \begin{pmatrix} 0.1 & 0.7 & 0.2 \\ 0.2 & 0.2 & 0.6 \\ 0.8 & 0.1 & 0.1 \end{pmatrix}, \qquad
\left\{\tau_{im},\ i = \overline{1,N},\ m = \overline{1,M}\right\} = \begin{pmatrix} 0.3 & 0.4 & 0.3 \\ 0.3 & 0.4 & 0.3 \\ 0.3 & 0.4 & 0.3 \end{pmatrix},$$
$$\left\{\mu_{im},\ i = \overline{1,N},\ m = \overline{1,M}\right\} = \begin{pmatrix} (0\ 0)^T & (1\ 1)^T & (2\ 2)^T \\ (3\ 3)^T & (4\ 4)^T & (5\ 5)^T \\ (6\ 6)^T & (7\ 7)^T & (8\ 8)^T \end{pmatrix},$$
and all covariance matrices $\Sigma_{im}$, $i = \overline{1,N}$, $m = \overline{1,M}$, were diagonal with 1 on the diagonal. Using this HMM we generated $K = 100$ sequences of length $T = 100$. We varied the percentage of gaps in the sequences from 0% to 90%. We compared four different training approaches for incomplete sequences: the modified Baum-Welch algorithm (Baum-Welch), gluing of incomplete sequences (gluing), imputation using the modified Viterbi algorithm (Viterbi) and mean imputation (mean), where the mean of 10 neighbouring observations was taken. To measure training performance we used the log-likelihood of the original sequences and the distance between models proposed in [1]:
$$D_s = \frac{D(\lambda, \hat{\lambda}) + D(\hat{\lambda}, \lambda)}{2}, \qquad D(\lambda_1, \lambda_2) = \frac{1}{T}\left| \ln p(O_2 \mid \lambda_1) - \ln p(O_2 \mid \lambda_2) \right|,$$
where $O_2$ is a sequence generated by model $\lambda_2$. Average results of 50 runs with different random seeds are presented in Fig. 1.

Figure 1: a) Average Loglikelihood of Complete Sequences for HMM Trained on Incomplete Sequences b) Average Distance Between True and Estimated HMM
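For reference, the symmetrized distance $D_s$ could be computed as in the following sketch (Python; it assumes a `loglik(seq, model)` helper returning $\ln p(O \mid \lambda)$, e.g. a log-space version of the modified forward pass above, and is an illustration of the formula rather than the experiment code):

```python
def model_distance(loglik, lam1, lam2, seq_from_1, seq_from_2, T):
    """Symmetrized distance D_s between two HMMs, following [1].

    seq_from_k is a sequence generated by model lam_k; T is its length.
    """
    d12 = abs(loglik(seq_from_2, lam1) - loglik(seq_from_2, lam2)) / T
    d21 = abs(loglik(seq_from_1, lam2) - loglik(seq_from_1, lam1)) / T
    return (d12 + d21) / 2
```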


3.2 Classifying Incomplete Sequences

To evaluate the classification algorithms for incomplete sequences we used two HMMs with the same parameters as in the previous subsection, except for the transition probabilities matrix:
$$A = \begin{pmatrix} 0.1 + \Delta_A & 0.7 - \Delta_A & 0.2 \\ 0.2 & 0.2 + \Delta_A & 0.6 - \Delta_A \\ 0.8 - \Delta_A & 0.1 + \Delta_A & 0.1 \end{pmatrix}.$$
Using these HMMs we generated $K = 100$ sequences of length $T = 100$. We varied the percentage of gaps in the sequences from 0% to 90%. We classified these sequences using the original HMMs. We compared four different classification approaches for incomplete sequences: the modified forward-backward algorithm (forward-backward), gluing of incomplete sequences (gluing), imputation using the modified Viterbi algorithm (Viterbi) and mean imputation (mean), where the mean of 10 neighbouring observations was taken. To measure classification performance we used the accuracy metric. Average results of 50 runs with different random seeds are presented in Fig. 2.

Figure 2: Accuracy of Incomplete Sequence Classification

3.3 Imputation and Decoding of Incomplete Sequences

To evaluate the imputation and decoding algorithms for incomplete sequences we used an HMM with the same parameters as in the first subsection of this section. We generated $K = 100$ sequences of length $T = 100$. We varied the percentage of gaps in the sequences from 0% to 90%. We imputed and decoded these sequences using the original HMM. We compared two different imputation and decoding approaches for incomplete sequences: the modified Viterbi algorithm (Viterbi) and mean imputation (mean), where the mean of 10 neighbouring observations was taken. To measure decoding performance we used the percentage of correctly decoded states, and to measure imputation performance we used the mean of the norms of the differences between actual and imputed observations. Average results of 50 runs with different random seeds are presented in Fig. 3.


Figure 3: a) Percent of Correctly Decoded Observations b) Mean of Norms of Difference Between Actual and Imputed Observations

Conclusion

In this paper we proposed a method for training hidden Markov models on incomplete sequences, as well as methods for classification, decoding and imputation of incomplete sequences using hidden Markov models. Computer experiment results show that the proposed methods outperform the standard methods of dealing with missing observations.

References

[1] Rabiner L.R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE. Vol. 77, pp. 257-285.

[2] Cooke M., Green P., Josifovski L., Vizinho A. (2001). Robust automatic speech recognition with missing and unreliable acoustic data. Speech Communication. Vol. 34, pp. 267-285.

[3] Lee D., Kulic D., Nakamura Y. (2008). Missing motion data recovery using factorial hidden Markov models. IEEE International Conference on Robotics and Automation. pp. 1722-1728.

[4] Gultyaeva A., Popov A., Kokoreva V., Uvarov V. (2015). Classification of observation sequences described by Hidden Markov Models. Proceedings of the International Workshop Applied Methods of Statistical Analysis: Nonparametric Approach. pp. 136-143.

[5] Gultyaeva A., Popov A., Uvarov V. (2016). Training Hidden Markov Models on Incomplete Sequences. Proceedings of the 13th International Conference on Actual Problems of Electronic Instrument Engineering. Vol. 1, pp. 317-320.