Type-2 Fuzzy Hidden Markov Models and Their Application to Speech Recognition

Jia Zeng and Zhi-Qiang Liu
Abstract—This paper presents an extension of hidden Markov models (HMMs) based on the type-2 (T2) fuzzy set (FS), referred to as type-2 fuzzy HMMs (T2 FHMMs). Membership functions (MFs) of T2 FSs are three-dimensional, and this new third dimension offers additional degrees of freedom to evaluate the fuzziness of HMMs. Therefore, T2 FHMMs are able to handle both random and fuzzy uncertainties that exist universally in sequential data. We derive the T2 fuzzy forward-backward algorithm and Viterbi algorithm using T2 FS operations. In order to investigate the effectiveness of T2 FHMMs, we apply them to phoneme classification and recognition on the TIMIT speech database. Experimental results show that T2 FHMMs can effectively handle noise and dialect uncertainties in speech signals, in addition to achieving better classification performance than classical HMMs.

Index Terms—Hidden Markov models (HMMs), nonsingleton fuzzification, speech recognition, type-2 fuzzy sets.
I. INTRODUCTION
Hidden Markov models (HMMs) have been used successfully in many applications, notably in speech recognition. Rabiner [1] mentioned three inherent limitations of HMMs for practical use: 1) the statistical independence assumption that successive observations are independent, so that the joint probability of a sequence of observations can be written as a product of probabilities of individual observations; 2) the assumption that the observation distributions can be well represented as a mixture of Gaussian densities; and 3) the first-order Markov assumption that the probability of being in a given state at time $t$ depends only on the state at time $t-1$. The statistical independence assumption may be satisfied by the Mel-frequency cepstral coefficients (MFCC) front-end processing, so that each acoustic vector is uncorrelated with its neighbors [2], [3]. As far as the first-order Markov assumption is concerned, some schemes have explored a longer history of the process, but high computational complexity makes them intractable [4]. Thus, most efforts have been put into relaxing the second limitation, i.e., enhancing the "expressive power" of the HMM [5]-[8]. Some researchers have used artificial neural networks in each state as observation distributions [9]-[11]. Other extensions include factorial HMMs and coupled HMMs. Factorial HMMs can represent the combination of multiple signals using distinct Markov chains [12], and coupled HMMs
can model audio-visual signals simultaneously in noisy environments [13].

In this paper, we enhance the HMM's expressive power for uncertainty with the type-2 fuzzy set (T2 FS). Uncertainties exist in both the model and the data. A model may be interpreted as a set of elements and rules that map input variables onto output variables. Model uncertainty is uncertainty in the mapping induced by uncertain parameters in the model. Data uncertainty is uncertainty in the input variables. For example, speech data have the following uncertainties: 1) the same phoneme has different values in different contexts; 2) the same phoneme has different lengths or different numbers of frames; and 3) the beginning and the end of a phoneme are uncertain. The HMM characterizes these uncertainties by probability density functions. Given sufficient training data, the HMM can accurately represent the training data according to the maximum likelihood (ML) criterion [14]. In practice, however, the HMM generalizes poorly to test data because of noise, insufficient training data, and incomplete information. Therefore, modeling uncertainty is needed for both the HMM and the speech data. Fig. 1 illustrates the importance of modeling uncertainty to achieve better generalization to test data.

The recent success of the T2 FS in signal processing has largely been attributed to its three-dimensional membership functions (MFs), which handle uncertainties existing universally in video streams, communication channels, and time series [15]-[22]. The T2 FS represents uncertainty by two fundamental concepts: the secondary membership function and the footprint of uncertainty (FOU). Having its domain in the interval [0,1], the secondary MF is a function of the membership (not just a point value as in a type-1 (T1) MF) at each value of the input primary variable. The union of all the secondary MF domains composes a bounded region, the FOU, reflecting the degree of uncertainty of the model (the shaded region in Fig. 1). After specifying the uncertain bounds of the FOU and the corresponding secondary MFs, both model and data uncertainties can propagate through T2 FS operations, and their effects can be evaluated using the type-reduced and defuzzified value of the output T2 FS [22]. This strategy of modeling uncertainty has found its utility in a broad spectrum of signal processing applications, especially when the signal is nonstationary and cannot be expressed mathematically ahead of time [22].

In light of the T2 FS framework, let us reexamine the HMM's uncertainty: the parameters of the HMM are uncertain because of insufficient and corrupted training data. Meanwhile, large variations such as noise and dialect greatly degrade the HMM's expressive power to evaluate unseen test data. In view of this problem, we propose a novel extension of the HMM based on the T2 FS, referred to as the T2 fuzzy HMM (T2 FHMM).
Fig. 1. In (a) and (b), the distributions of the training data and the test data are shown as the solid line and the dotted line. Because of incomplete information and noise, the two distributions are not close. In (c) and (d), by introducing uncertainty into the model, i.e., letting the model vary in a certain way, one of the models (the thick solid line) is likely to approximate the test data distribution. The shaded region is the "footprint" of the model uncertainty.
Because the T2 MF is three-dimensional, we may use one dimension, the primary MF, to evaluate the likelihood of the random data, and use the other dimension, the secondary MF, to describe the fuzziness of that likelihood. Therefore, the T2 FS is a natural way to handle two kinds of uncertainty, namely, randomness and fuzziness; probability theory is associated with the former and FS theory with the latter [22]. From this perspective, the T2 FHMM is a hybrid scheme that allows some degree of fuzziness in the HMM's description. Just as the general T2 FS can be represented by a set of embedded T1 FSs [23], the T2 FHMM can be thought of as embedding a set of HMMs. On the other hand, T2 nonsingleton fuzzification can represent the uncertainty of data as T2 fuzzy vectors [15]-[17]. Therefore, both the uncertainty of the HMM and that of the speech data can be accounted for in the T2 FHMM framework by mapping uncertain input data to a T2 FS through set operations. Compared with the HMM, the T2 FHMM handles a sequence of T2 fuzzy vectors rather than a crisp numeric sequence, and the output of the T2 FHMM is an uncertain T1 FS rather than a crisp scalar. The linguistic hidden Markov model (LHMM) [24] and the linguistic fuzzy c-means [25] can also compute fuzzy vectors. However, the proposed T2 FHMM differs from the LHMM in three important aspects: 1) we represent the uncertainty of data as a sequence of T2 fuzzy vectors rather than T1 fuzzy vectors; 2) we use the T2 FS to model the HMM's fuzziness rather than the union of interval HMMs at different levels used by the LHMM; and 3) we extend the forward-backward algorithm and Viterbi algorithm
using T2 FS operations rather than the extension principle and decomposition theorem. Although both the T2 FHMM and the LHMM use interval arithmetic for computation in practice, the T2 FHMM is a T2 fuzzy extension of the classical HMM, whereas the LHMM is a T1 fuzzy extension. Generalized hidden Markov models (GHMMs) [26]-[28] can relax the statistical independence limitation and the additivity constraint in the probability measure by fuzzy measures and the fuzzy integral, whereas the T2 FS, with its solid theoretical background and practical success in signal processing, is likely to shed more light on building a theoretically well-founded framework that increases the HMM's expressive power for modeling uncertain sequential data.

In the following section we review the classical HMMs. Section III introduces basic concepts of T2 FSs. Section IV presents the general framework of the T2 FHMM, in which we formulate the T2 fuzzy forward-backward algorithm and Viterbi algorithm using T2 FS operations. Section V discusses the interval T2 (IT2) FHMM, whose lower computational complexity makes it more suitable for practical use than the general T2 FHMM. In Section VI, we apply the IT2 FHMM to phoneme classification and recognition on the TIMIT speech database. Finally, we draw conclusions in Section VII.

II. HMMS

An HMM consists of $N$ states $s_1, s_2, \ldots, s_N$. Each state $j$ has an observation probability density $b_j(\mathbf{o}_t)$ that determines the probability of emitting observation $\mathbf{o}_t$ at time $t$, and each pair of states $i$ and $j$ has a transition probability $a_{ij}$.
Fig. 2. Example of the left–right Gaussian mixture HMM, where no transitions are allowed to states whose indices are lower than the current state.
Fig. 3. A T2 MF is three-dimensional. All concepts are labeled here: vertical slice, secondary MF, secondary grade, primary MF, primary grade, and lower and upper MFs. The shaded area in the x-u plane is the footprint of uncertainty (FOU).
Then an $N$-state HMM is defined by the parameter set $\lambda = \{a_{ij}, b_j(\mathbf{o}_t), \pi_i\}$, where $\pi_i$ is the initial state distribution. For simplicity, we assume continuous density distributions. The entry state and the exit state are nonemitting. Fig. 2 shows an example of the emitting process where the five-state model moves through the state sequence 1, 2, 3, 4, 5 to emit the observation sequence $\mathbf{o}_1$ to $\mathbf{o}_T$. In HMM-based speech recognition, the left–right Gaussian mixture HMMs (see Fig. 2) are well suited to MFCC features [3], so we represent the observation distributions by Gaussian mixture densities

$$b_j(\mathbf{o}_t) = \sum_{m=1}^{M} c_{jm}\,\mathcal{N}(\mathbf{o}_t; \boldsymbol{\mu}_{jm}, \boldsymbol{\Sigma}_{jm}) \qquad (1)$$

where $M$ is the number of mixture components, $c_{jm}$ is the weight of the $m$th component, and $\mathcal{N}(\mathbf{o}_t; \boldsymbol{\mu}_{jm}, \boldsymbol{\Sigma}_{jm})$ is a multivariate Gaussian with mean vector $\boldsymbol{\mu}_{jm}$ and covariance matrix $\boldsymbol{\Sigma}_{jm}$:

$$\mathcal{N}(\mathbf{o}_t; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\!\Big[-\tfrac{1}{2}(\mathbf{o}_t-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{o}_t-\boldsymbol{\mu})\Big] \qquad (2)$$

where $D$ is the dimensionality of $\mathbf{o}_t$.
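For concreteness, the following is a minimal sketch of the Gaussian-mixture emission density of (1) and (2) with diagonal covariances, the common choice for MFCC features; the function name and argument layout are illustrative and are not taken from the paper.

```python
# Minimal sketch (illustrative): Gaussian-mixture emission density b_j(o_t)
# of (1)-(2), assuming diagonal covariance matrices.
import numpy as np

def gaussian_mixture_density(o, weights, means, variances):
    """o: (D,) observation; weights: (M,); means, variances: (M, D).
    Returns b_j(o) = sum_m c_m N(o; mu_m, diag(var_m))."""
    diff = o - means                                           # (M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_expo = -0.5 * np.sum(diff ** 2 / variances, axis=1)
    return float(np.sum(weights * np.exp(log_norm + log_expo)))
```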
The forward-backward or Baum–Welch algorithm and the Viterbi search algorithm are efficient algorithms for HMM-based training and recognition [1]. Given a set of training observation sequences, the Baum–Welch algorithm can iteratively and automatically adjust the parameters $c_{jm}$, $\boldsymbol{\mu}_{jm}$, and $\boldsymbol{\Sigma}_{jm}$ of the $m$th mixture component of the HMM. The Baum–Welch algorithm, an implementation of the expectation-maximization (EM) algorithm, guarantees that the model converges to a local maximum of the likelihood of the training observations [14]. Using the Viterbi algorithm, we can decode the maximum likelihood state sequence given an observation sequence and an HMM $\lambda$, where the best state sequence may represent words or phonemes in speech recognition.
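As a point of reference for the type-2 extensions derived later, the sketch below outlines a classical log-domain Viterbi decoder; the names, the use of an explicit initial distribution instead of nonemitting entry/exit states, and the array layout are my own illustrative assumptions.

```python
# Minimal sketch (illustrative): classical log-domain Viterbi decoding for an
# HMM whose per-frame emission log-likelihoods are already computed.
import numpy as np

def viterbi(log_A, log_B, log_pi):
    """log_A: (N, N) log transition matrix; log_B: (T, N) emission
    log-likelihoods log b_j(o_t); log_pi: (N,) initial log distribution.
    Returns the most likely state sequence and its log score."""
    T, N = log_B.shape
    delta = np.full((T, N), -np.inf)   # best partial-path scores
    psi = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] + log_A[:, j]
            psi[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[psi[t, j]] + log_B[t, j]
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):     # backtrack
        path[t] = psi[t + 1, path[t + 1]]
    return path, float(delta[-1, path[-1]])
```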
Fig. 4. Examples of a Gaussian primary MF with (a) uncertain mean and (b) uncertain standard deviation. The thick solid line and the dashed line denote the upper MF and the lower MF. The shaded regions are the FOUs.
III. TYPE-2 FUZZY SETS

Fig. 3 illustrates the terminology of T2 FSs [23]. A T2 FS, denoted $\tilde{A}$, is characterized by a T2 MF $\mu_{\tilde{A}}(x,u)$, where $x \in X$ and $u \in J_x \subseteq [0,1]$. Two important concepts distinguish a T2 MF from a T1 MF: the secondary MF and the FOU. The secondary MF is a vertical slice of $\mu_{\tilde{A}}(x,u)$ at each value of $x$, i.e., the function $\mu_{\tilde{A}}(x{=}x',u)$, $u \in J_{x'} \subseteq [0,1]$. The amplitude of the secondary MF is called the secondary grade. The domain $J_x$ of the secondary MF is called the primary membership of $x$, and $u \in J_x$ is the primary grade. The FOU is a bounded uncertain region in the primary memberships of a T2 FS $\tilde{A}$ and is the union of all primary memberships. An upper MF $\overline{\mu}_{\tilde{A}}(x)$ and a lower MF $\underline{\mu}_{\tilde{A}}(x)$ are two T1 MFs that bound the FOU. If the secondary grade equals one for all $u \in J_x \subseteq [0,1]$, the secondary MFs are interval sets, which reflect a uniform uncertainty at the primary memberships of $x$. Because all the secondary grades are unity, an IT2 FS can be denoted by the interval of its lower and upper MFs, i.e., $[\underline{\mu}_{\tilde{A}}(x), \overline{\mu}_{\tilde{A}}(x)]$. The membership grade of $x$ in a T2 FS is a T1 FS in [0,1] (see Fig. 3); similarly, in an IT2 FS it is an IT1 set. The T2 FS has two new operators, the meet (denoted by $\sqcap$) and the join (denoted by $\sqcup$), to account for the intersection and the union [29]. Details about the meet and join operations for T2 FSs can be found in [22].
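To make the interval representation used below concrete, here is a small sketch that stores an IT2 membership grade as a lower/upper pair and applies the meet under the product t-norm, the join under the bounded sum (as in the IT2 fuzzy forward-backward algorithm of Section V), and the join under the maximum (as in the IT2 fuzzy Viterbi algorithm). The class and method names are my own illustration.

```python
# Minimal sketch (illustrative): an interval membership grade [lo, hi] with
# the set operations used later for the IT2 FHMM.
from dataclasses import dataclass

@dataclass
class IT1Interval:
    lo: float  # lower membership grade
    hi: float  # upper membership grade

    def meet(self, other: "IT1Interval") -> "IT1Interval":
        # Meet under the product t-norm, applied end point by end point.
        return IT1Interval(self.lo * other.lo, self.hi * other.hi)

    def join_sum(self, other: "IT1Interval") -> "IT1Interval":
        # Join under the bounded-sum t-conorm (forward-backward algorithm).
        return IT1Interval(min(1.0, self.lo + other.lo),
                           min(1.0, self.hi + other.hi))

    def join_max(self, other: "IT1Interval") -> "IT1Interval":
        # Join under the maximum t-conorm (Viterbi algorithm).
        return IT1Interval(max(self.lo, other.lo), max(self.hi, other.hi))

    def defuzzify(self) -> float:
        # Center of the interval, the defuzzified value used by the IT2 FHMM.
        return 0.5 * (self.lo + self.hi)
```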
IV. FRAMEWORK OF GENERAL TYPE-2 FHMMS

A. Elements of a Type-2 FHMM

We denote each hidden fuzzy state by $\tilde{s}_j$. Instead of an observation probability density $b_j(\mathbf{o}_t)$, each fuzzy state is associated with a T2 MF $\tilde{b}_j(\mathbf{o}_t)$ that determines the membership of $\mathbf{o}_t$ to the fuzzy state $\tilde{s}_j$. We use the T1 fuzzy number $\tilde{a}_{ij}$ with support $[\underline{a}_{ij}, \overline{a}_{ij}]$ to model the uncertainty of the transition probability from state $i$ to $j$. The entry state and the exit state are nonemitting. Let us formally define the notation of the T2 FHMM as follows.
$\{\tilde{s}_1, \ldots, \tilde{s}_N\}$: set of hidden fuzzy states.
$\tilde{s}_t$: fuzzy state visited at time $t$.
$\tilde{a}_{ij}$: fuzzy transition probability from state $\tilde{s}_i$ to $\tilde{s}_j$.
$\tilde{b}_j(\mathbf{o}_t)$: membership of observation $\mathbf{o}_t$ at time $t$ to the fuzzy state $\tilde{s}_j$.
$\tilde{h}(\mathbf{o}_t)$: T2 MF of the nonsingleton fuzzified observation vector $\mathbf{o}_t$.
$c_{jm}$: weight of the $m$th mixture component in fuzzy state $\tilde{s}_j$.
$\boldsymbol{\mu}_{jm}$: vector of means for the $m$th mixture component of fuzzy state $\tilde{s}_j$.
$\boldsymbol{\Sigma}_{jm}$: covariance matrix for the $m$th mixture component of fuzzy state $\tilde{s}_j$.
$\lambda$: the set of all parameters defining a T2 FHMM.

The membership grade of $\tilde{a}_{ij}$ reflects the uncertainty of the transition probability from $\tilde{s}_i$ to $\tilde{s}_j$. The primary membership of $\mathbf{o}_t$ is the membership with which $\mathbf{o}_t$ belongs to the fuzzy hidden state, and the secondary grade reflects our belief in this membership. Let $\tilde{O}$ denote the space of fuzzy observation sequences from time slot 1 to slot $T$. The observation vector $\mathbf{o}_t$ is fuzzified as a vector of T2 fuzzy numbers denoted by the T2 MF $\tilde{h}(\mathbf{o}_t)$. Given an observation sequence $O = \mathbf{o}_1\mathbf{o}_2\cdots\mathbf{o}_T$ and a T2 FHMM $\lambda$, $\mu_\lambda(\tilde{O})$ denotes the membership grade with which the sequence belongs to the T2 FHMM; it is a T1 FS that contains the uncertain information conveyed by $\tilde{O}$ and $\lambda$.

B. Type-2 FHMMs

There are two implementations of $\tilde{b}_j(\mathbf{o}_t)$: the Gaussian primary MF with uncertain mean and the Gaussian primary MF with uncertain standard deviation, as illustrated in Fig. 4. Consider the case of a Gaussian primary MF having a fixed standard deviation $\sigma$ and an uncertain mean that takes on values in $[m_1, m_2]$:

$$\tilde{\mu}(x) = \exp\!\Big[-\tfrac{1}{2}\Big(\frac{x-m}{\sigma}\Big)^{2}\Big], \quad m \in [m_1, m_2]. \qquad (3)$$
The upper MF is

$$\overline{\mu}(x) = \begin{cases} N(m_1,\sigma;x), & x < m_1 \\ 1, & m_1 \le x \le m_2 \\ N(m_2,\sigma;x), & x > m_2 \end{cases} \qquad (4)$$

where

$$N(m,\sigma;x) \triangleq \exp\!\Big[-\tfrac{1}{2}\Big(\frac{x-m}{\sigma}\Big)^{2}\Big]. \qquad (5)$$

The lower MF is

$$\underline{\mu}(x) = \begin{cases} N(m_2,\sigma;x), & x \le \frac{m_1+m_2}{2} \\ N(m_1,\sigma;x), & x > \frac{m_1+m_2}{2}. \end{cases} \qquad (6)$$

Consider the case of a Gaussian primary MF having a fixed mean $m$ and an uncertain standard deviation that takes on values in $[\sigma_1, \sigma_2]$:

$$\tilde{\mu}(x) = \exp\!\Big[-\tfrac{1}{2}\Big(\frac{x-m}{\sigma}\Big)^{2}\Big], \quad \sigma \in [\sigma_1, \sigma_2]. \qquad (7)$$

The upper MF is

$$\overline{\mu}(x) = N(m,\sigma_2;x) \qquad (8)$$

and the lower MF is

$$\underline{\mu}(x) = N(m,\sigma_1;x). \qquad (9)$$

If $m_1 = m_2$ and $\sigma_1 = \sigma_2$, a T2 MF reduces to a T1 MF. Given a $D$-dimensional observation vector $\mathbf{o}_t = [o_{t1},\ldots,o_{tD}]^{\mathrm{T}}$, the corresponding mean vector $\boldsymbol{\mu}_{jm} = [\mu_{jm1},\ldots,\mu_{jmD}]^{\mathrm{T}}$, and covariance matrix $\boldsymbol{\Sigma}_{jm} = \mathrm{diag}(\sigma_{jm1}^2,\ldots,\sigma_{jmD}^2)$, we can rewrite (2) under the statistical independence assumption

$$\mathcal{N}(\mathbf{o}_t;\boldsymbol{\mu}_{jm},\boldsymbol{\Sigma}_{jm}) = \prod_{d=1}^{D}\frac{1}{\sqrt{2\pi}\,\sigma_{jmd}}\exp\!\Big[-\tfrac{1}{2}\Big(\frac{o_{td}-\mu_{jmd}}{\sigma_{jmd}}\Big)^{2}\Big] \qquad (10)$$

where $\prod$ is the product operator. If we represent each exponential factor in (10) by a T2 MF in (3) and (7), we obtain

$$\overline{h}_{jm}(\mathbf{o}_t) = \prod_{d=1}^{D}\frac{1}{\sqrt{2\pi}\,\sigma_{jmd}}\,\overline{\mu}_{jmd}(o_{td}) \qquad (11)$$

and

$$\underline{h}_{jm}(\mathbf{o}_t) = \prod_{d=1}^{D}\frac{1}{\sqrt{2\pi}\,\sigma_{jmd}}\,\underline{\mu}_{jmd}(o_{td}). \qquad (12)$$
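Before turning to the mixture form, here is a small sketch of the upper and lower MFs in (4)-(9) for a scalar input; the function names are illustrative, and the formulas follow the standard Gaussian primary MF with uncertain mean or uncertain standard deviation.

```python
# Minimal sketch (illustrative): upper and lower MFs of a Gaussian primary MF
# with uncertain mean (4)-(6) or uncertain standard deviation (8)-(9).
import numpy as np

def gauss(x, m, sigma):
    return float(np.exp(-0.5 * ((x - m) / sigma) ** 2))

def uncertain_mean_mf(x, m1, m2, sigma):
    """Returns (lower, upper) membership grades for a mean in [m1, m2]."""
    if x < m1:
        upper = gauss(x, m1, sigma)
    elif x <= m2:
        upper = 1.0
    else:
        upper = gauss(x, m2, sigma)
    lower = gauss(x, m2, sigma) if x <= 0.5 * (m1 + m2) else gauss(x, m1, sigma)
    return lower, upper

def uncertain_std_mf(x, m, s1, s2):
    """Returns (lower, upper) membership grades for a std dev in [s1, s2]."""
    return gauss(x, m, s1), gauss(x, m, s2)
```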
Suppose each hidden fuzzy state has $M$ mixtures; then

$$\tilde{b}_j(\mathbf{o}_t) = \bigsqcup_{m=1}^{M} c_{jm}\,\tilde{h}_{jm}(\mathbf{o}_t) \qquad (13)$$

where $c_{jm}$ is the mixture weight. Because the output of (11) and (12) is not always in the unit interval, we scale the output into the unit interval so that it is a valid T2 MF. The ranges of $[m_1, m_2]$ and $[\sigma_1, \sigma_2]$ specify the bounds of the FOU according to the uncertainty of the model. A greater uncertainty results in a larger spread of the FOU and increases the ranges of $[m_1, m_2]$ and $[\sigma_1, \sigma_2]$. We use the following formulas:

$$[m_1, m_2] = [\mu_{jmd} - k_1\sigma_{jmd},\; \mu_{jmd} + k_1\sigma_{jmd}] \qquad (14)$$

$$[\sigma_1, \sigma_2] = [k_2\,\sigma_{jmd},\; k_3\,\sigma_{jmd}] \qquad (15)$$

where $k_1$, $k_2$, and $k_3$ are uncertainty factors of the model, i.e., the larger the factor, the greater the uncertainty. Similarly, we can model the nonsingleton fuzzified observation vector $\tilde{h}(\mathbf{o}_t)$ by a T2 MF in (3) and (7), and describe the fuzzy transition probability $\tilde{a}_{ij}$ by the T1 fuzzy number with support $[\underline{a}_{ij}, \overline{a}_{ij}]$. So far we have implemented the T2 MF $\tilde{b}_j(\mathbf{o}_t)$ of the hidden fuzzy state $\tilde{s}_j$, the T2 nonsingleton fuzzified observation vector $\tilde{h}(\mathbf{o}_t)$, and the fuzzy transition probability $\tilde{a}_{ij}$. The uncertainty factors $k_1$, $k_2$, and $k_3$ are fixed according to prior knowledge before training and recognition. Therefore, in the T2 FHMM we have to estimate the following parameters: the mixture weights $c_{jm}$, the mean vectors $\boldsymbol{\mu}_{jm}$, the covariance matrices $\boldsymbol{\Sigma}_{jm}$ in (13), and the support of the T1 fuzzy number $\tilde{a}_{ij}$.

C. Type-2 Fuzzy Forward-Backward Algorithm

In this paper, the T2 nonsingleton fuzzified observation vectors $\tilde{h}(\mathbf{o}_t)$, $t = 1, \ldots, T$, are assumed independent. From the theory of the classical HMM, we can redefine the forward-backward algorithm using T2 FS operations and nonsingleton T2 fuzzification as follows.

• The T2 fuzzy forward variable

$$\tilde{\alpha}_t(j) = \mu_\lambda(\tilde{\mathbf{o}}_1\tilde{\mathbf{o}}_2\cdots\tilde{\mathbf{o}}_t,\; \tilde{s}_t = \tilde{s}_j) \qquad (16)$$

means the membership grade of the partial observation sequence $\tilde{\mathbf{o}}_1\tilde{\mathbf{o}}_2\cdots\tilde{\mathbf{o}}_t$ (until time $t$) and state $\tilde{s}_j$ at time $t$, given the T2 FHMM $\lambda$. This variable is a T1 FS that can be solved recursively.
1) Initial condition:

$$\tilde{\alpha}_1(1) = 1 \qquad (17)$$

$$\tilde{\alpha}_1(j) = \tilde{a}_{1j} \sqcap \tilde{b}_j(\tilde{\mathbf{o}}_1), \quad 1 < j < N. \qquad (18)$$

2) Recursion:

$$\tilde{\alpha}_t(j) = \Big[\bigsqcup_{i=2}^{N-1}\tilde{\alpha}_{t-1}(i)\sqcap\tilde{a}_{ij}\Big]\sqcap\tilde{b}_j(\tilde{\mathbf{o}}_t), \quad 1 < t \le T,\; 1 < j < N. \qquad (19)$$

3) Final condition:

$$\tilde{\alpha}_T(N) = \bigsqcup_{i=2}^{N-1}\tilde{\alpha}_T(i)\sqcap\tilde{a}_{iN}. \qquad (20)$$
From the definition of $\tilde{\alpha}_T(N)$, we have the total membership grade of $\tilde{O}$ to the T2 FHMM $\lambda$ as follows:

$$\mu_\lambda(\tilde{O}) = \tilde{\alpha}_T(N). \qquad (21)$$

• The T2 fuzzy backward variable

$$\tilde{\beta}_t(i) = \mu_\lambda(\tilde{\mathbf{o}}_{t+1}\tilde{\mathbf{o}}_{t+2}\cdots\tilde{\mathbf{o}}_T \mid \tilde{s}_t = \tilde{s}_i) \qquad (22)$$

means the membership grade of the partial observation sequence from $t+1$ to the end, given state $\tilde{s}_i$ at time $t$ and the T2 FHMM $\lambda$. This variable is a T1 FS that can be solved recursively.
1) Initial condition:

$$\tilde{\beta}_T(i) = \tilde{a}_{iN}, \quad 1 < i < N. \qquad (23)$$

2) Recursion:

$$\tilde{\beta}_t(i) = \bigsqcup_{j=2}^{N-1}\tilde{a}_{ij}\sqcap\tilde{b}_j(\tilde{\mathbf{o}}_{t+1})\sqcap\tilde{\beta}_{t+1}(j), \quad T > t \ge 1,\; 1 < i < N. \qquad (24)$$

3) Final condition:

$$\tilde{\beta}_1(1) = \bigsqcup_{j=2}^{N-1}\tilde{a}_{1j}\sqcap\tilde{b}_j(\tilde{\mathbf{o}}_1)\sqcap\tilde{\beta}_1(j). \qquad (25)$$

Note that the T2 fuzzy forward and backward variables allow the total membership grade of the observation sequence being in the $j$th state at time $t$ to be determined by taking the meet operation "$\sqcap$" of them

$$\tilde{L}_t(j) = \tilde{\alpha}_t(j)\sqcap\tilde{\beta}_t(j) \qquad (26)$$

because $\tilde{\alpha}_t(j)$ accounts for the partial observation sequence $\tilde{\mathbf{o}}_1\cdots\tilde{\mathbf{o}}_t$ and $\tilde{\beta}_t(j)$ accounts for the remainder of the observation $\tilde{\mathbf{o}}_{t+1}\cdots\tilde{\mathbf{o}}_T$ given state $\tilde{s}_j$ at $t$. Let $D(\cdot)$ denote the defuzzified value of a T1 FS, which is a mapping from a T1 FS to a crisp scalar. The defuzzified membership grade of observation $\tilde{\mathbf{o}}_t$ being in state $\tilde{s}_j$, denoted by $L_t(j)$, is

$$L_t(j) = \frac{1}{P}\,D[\tilde{\alpha}_t(j)\sqcap\tilde{\beta}_t(j)] \qquad (27)$$

where $P$ is a normalization factor.

D. Type-2 Fuzzy Viterbi Algorithm

The T2 fuzzy Viterbi algorithm is able to search the single best state sequence, $\tilde{s}_1\tilde{s}_2\cdots\tilde{s}_T$, which maximizes the membership grade $\mu_\lambda(\tilde{O})$. This maximum membership grade can be computed using almost the same algorithm as the T2 fuzzy forward-backward algorithm, except that the join operation "$\sqcup$" is chosen as the maximum t-conorm. For a given T2 FHMM $\lambda$, let $\tilde{\delta}_t(j)$ represent the maximum membership grade of the first $t$ observations to $\lambda$ that ends in state $\tilde{s}_j$. $\tilde{\delta}_t(j)$ is a T1 FS that can be computed as follows:

$$\tilde{\delta}_t(j) = \Big[\bigsqcup_{i=2}^{N-1}\tilde{\delta}_{t-1}(i)\sqcap\tilde{a}_{ij}\Big]\sqcap\tilde{b}_j(\tilde{\mathbf{o}}_t) \qquad (28)$$

where

$$\tilde{\delta}_1(1) = 1 \qquad (29)$$

$$\tilde{\delta}_1(j) = \tilde{a}_{1j}\sqcap\tilde{b}_j(\tilde{\mathbf{o}}_1) \qquad (30)$$

for $1 < j < N$, $1 < t \le T$. The maximum membership grade along the best state sequence is given by

$$\mu^{*}_\lambda(\tilde{O}) = \bigsqcup_{i=2}^{N-1}\tilde{\delta}_T(i)\sqcap\tilde{a}_{iN}. \qquad (31)$$

The best state sequence can be recovered by backtracking the defuzzified values of the maximizing predecessors

$$\psi_t(j) = \arg\max_{2\le i\le N-1} D[\tilde{\delta}_{t-1}(i)\sqcap\tilde{a}_{ij}]. \qquad (32)$$

E. Training for the Type-2 FHMM

Suppose a set of training observations $\tilde{\mathbf{o}}_t$, $t = 1, \ldots, T$, is used to estimate the parameters of a T2 FHMM with $M$ mixture components. The T2 fuzzy Viterbi algorithm can be used to initialize a T2 FHMM. If $A_{ij}$ represents the total number of transitions from $\tilde{s}_i$ to $\tilde{s}_j$, the support of the fuzzy transition probability $\tilde{a}_{ij}$ can be estimated from the relative transition frequency

$$\hat{a}_{ij} = \frac{A_{ij}}{\sum_{k} A_{ik}} \qquad (33)$$

with its spread fixed in advance by prior knowledge. The best state sequence implies an alignment of the training observation vectors with the fuzzy state sequence. Within each state, observations are further associated with the mixture component holding the highest defuzzified membership grade. The result is that every observation is associated with a single unique mixture component. This association can be represented by the indicator function

$$\chi_{jm}(\mathbf{o}_t) = \begin{cases} 1, & \text{if } \mathbf{o}_t \text{ is associated with the } m\text{th mixture component of state } \tilde{s}_j \\ 0, & \text{otherwise.} \end{cases} \qquad (34)$$

The means and variances are then estimated via simple averages

$$\hat{\boldsymbol{\mu}}_{jm} = \frac{\sum_t \chi_{jm}(\mathbf{o}_t)\,\mathbf{o}_t}{\sum_t \chi_{jm}(\mathbf{o}_t)} \qquad (35)$$

$$\hat{\boldsymbol{\Sigma}}_{jm} = \frac{\sum_t \chi_{jm}(\mathbf{o}_t)(\mathbf{o}_t-\hat{\boldsymbol{\mu}}_{jm})(\mathbf{o}_t-\hat{\boldsymbol{\mu}}_{jm})^{\mathrm{T}}}{\sum_t \chi_{jm}(\mathbf{o}_t)} \qquad (36)$$

$$\hat{c}_{jm} = \frac{\sum_t \chi_{jm}(\mathbf{o}_t)}{\sum_t \sum_{m'} \chi_{jm'}(\mathbf{o}_t)}. \qquad (37)$$

As far as the parameter re-estimation is concerned, we follow the T2 fuzzy forward-backward algorithm (16)-(27), and use the defuzzified values to update all parameters of the T2 FHMM,
as shown in (38)

$$\hat{a}_{ij} = \frac{\sum_{r=1}^{R}\frac{1}{P_r}\sum_{t=1}^{T_r-1} D\big[\tilde{\alpha}^{r}_{t}(i)\sqcap\tilde{a}_{ij}\sqcap\tilde{b}_j(\tilde{\mathbf{o}}^{r}_{t+1})\sqcap\tilde{\beta}^{r}_{t+1}(j)\big]}{\sum_{r=1}^{R}\frac{1}{P_r}\sum_{t=1}^{T_r} D\big[\tilde{\alpha}^{r}_{t}(i)\sqcap\tilde{\beta}^{r}_{t}(i)\big]} \qquad (38)$$

where $1 \le r \le R$ indexes the training observation sequences and $P_r$ is the total membership grade of the $r$th training observation. The transitions from the nonemitting entry state are re-estimated by

$$\hat{a}_{1j} = \frac{1}{R}\sum_{r=1}^{R}\frac{1}{P_r}\,D\big[\tilde{\alpha}^{r}_{1}(j)\sqcap\tilde{\beta}^{r}_{1}(j)\big] \qquad (39)$$

where $1 < j < N$. The transitions from the emitting states to the final nonemitting exit state are re-estimated by

$$\hat{a}_{iN} = \frac{\sum_{r=1}^{R}\frac{1}{P_r}\,D\big[\tilde{\alpha}^{r}_{T_r}(i)\sqcap\tilde{\beta}^{r}_{T_r}(i)\big]}{\sum_{r=1}^{R}\frac{1}{P_r}\sum_{t=1}^{T_r} D\big[\tilde{\alpha}^{r}_{t}(i)\sqcap\tilde{\beta}^{r}_{t}(i)\big]} \qquad (40)$$

where $1 < i < N$. Let $\tilde{\alpha}_t(j,m)$ and $\tilde{\beta}_t(j,m)$ denote the T2 fuzzy forward and backward variables at the $m$th mixture of state $\tilde{s}_j$. The re-estimation formulas can be expressed by the weight $L_t(j,m)$

$$\hat{\boldsymbol{\mu}}_{jm} = \frac{\sum_{r=1}^{R}\sum_{t=1}^{T_r} L^{r}_{t}(j,m)\,\mathbf{o}^{r}_{t}}{\sum_{r=1}^{R}\sum_{t=1}^{T_r} L^{r}_{t}(j,m)} \qquad (41)$$

$$\hat{\boldsymbol{\Sigma}}_{jm} = \frac{\sum_{r=1}^{R}\sum_{t=1}^{T_r} L^{r}_{t}(j,m)\,(\mathbf{o}^{r}_{t}-\hat{\boldsymbol{\mu}}_{jm})(\mathbf{o}^{r}_{t}-\hat{\boldsymbol{\mu}}_{jm})^{\mathrm{T}}}{\sum_{r=1}^{R}\sum_{t=1}^{T_r} L^{r}_{t}(j,m)} \qquad (42)$$

$$\hat{c}_{jm} = \frac{\sum_{r=1}^{R}\sum_{t=1}^{T_r} L^{r}_{t}(j,m)}{\sum_{r=1}^{R}\sum_{t=1}^{T_r} L^{r}_{t}(j)} \qquad (43)$$

where

$$L_t(j,m) = \frac{1}{P}\,D\big[\tilde{\alpha}_t(j,m)\sqcap\tilde{\beta}_t(j)\big] \qquad (44)$$

is the defuzzified membership grade of $\tilde{\mathbf{o}}_t$ to the $m$th mixture of state $\tilde{s}_j$.

V. INTERVAL TYPE-2 FHMMS

The computation of the general T2 FHMM is prohibitive because the general T2 FS operations are complex. The IT2 FS is a special case of the general T2 FS in which all secondary grades equal one, so that the set operations can be simplified to interval calculations [17], [24], [25]. Therefore, the IT2 FHMM is more suitable for practical use. In the IT2 FHMM, we can denote all variables by the intervals of their lower and upper MFs, such as $\tilde{b}_j(\mathbf{o}_t) = [\underline{b}_j(\mathbf{o}_t), \overline{b}_j(\mathbf{o}_t)]$, $\tilde{h}(\mathbf{o}_t) = [\underline{h}(\mathbf{o}_t), \overline{h}(\mathbf{o}_t)]$, $\tilde{\alpha}_t(j) = [\underline{\alpha}_t(j), \overline{\alpha}_t(j)]$, and $\tilde{\beta}_t(i) = [\underline{\beta}_t(i), \overline{\beta}_t(i)]$. Similarly, we represent the fuzzy transition probability as an IT1 set, i.e., $\tilde{a}_{ij} = [\underline{a}_{ij}, \overline{a}_{ij}]$, where the range is determined by prior knowledge. Furthermore, we choose the product t-norm "$\times$" and the bounded-sum t-conorm "$\oplus$" in the meet and join operations to implement the IT2 FHMM.

A. Interval Type-2 Fuzzy Forward-Backward Algorithm

• The IT2 fuzzy forward variable

$$\tilde{\alpha}_t(j) = [\underline{\alpha}_t(j),\; \overline{\alpha}_t(j)] \qquad (45)$$

is an IT1 set and can be solved recursively.
1) Initial condition:

$$\tilde{\alpha}_1(1) = [1,\,1] \qquad (46)$$

$$\underline{\alpha}_1(j) = \underline{a}_{1j}\,\underline{b}_j(\tilde{\mathbf{o}}_1) \qquad (47)$$

$$\overline{\alpha}_1(j) = \overline{a}_{1j}\,\overline{b}_j(\tilde{\mathbf{o}}_1) \qquad (48)$$

for $1 < j < N$.
2) Recursion:

$$\underline{\alpha}_t(j) = \Big[\bigoplus_{i=2}^{N-1}\underline{\alpha}_{t-1}(i)\,\underline{a}_{ij}\Big]\,\underline{b}_j(\tilde{\mathbf{o}}_t) \qquad (49)$$

$$\overline{\alpha}_t(j) = \Big[\bigoplus_{i=2}^{N-1}\overline{\alpha}_{t-1}(i)\,\overline{a}_{ij}\Big]\,\overline{b}_j(\tilde{\mathbf{o}}_t) \qquad (50)$$

for $1 < t \le T$, $1 < j < N$.
3) Final condition:

$$\underline{\alpha}_T(N) = \bigoplus_{i=2}^{N-1}\underline{\alpha}_T(i)\,\underline{a}_{iN} \qquad (51)$$

$$\overline{\alpha}_T(N) = \bigoplus_{i=2}^{N-1}\overline{\alpha}_T(i)\,\overline{a}_{iN}. \qquad (52)$$

The total membership grade $\mu_\lambda(\tilde{O})$ is

$$\mu_\lambda(\tilde{O}) = [\underline{P},\; \overline{P}] \qquad (53)$$

where

$$\underline{P} = \underline{\alpha}_T(N) \qquad (54)$$

$$\overline{P} = \overline{\alpha}_T(N). \qquad (55)$$
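A compact sketch of the interval forward pass in (45)-(55) follows, assuming the product t-norm and the bounded-sum t-conorm named above and the nonemitting entry/exit state convention; array shapes and function names are my own illustrative choices.

```python
# Minimal sketch (illustrative): IT2 fuzzy forward pass, run on the lower and
# upper bounds with the product t-norm and bounded-sum t-conorm.
import numpy as np

def bounded_sum(values):
    # Bounded-sum t-conorm applied across a vector: min(1, sum).
    return min(1.0, float(np.sum(values)))

def it2_forward(a_lo, a_hi, b_lo, b_hi):
    """a_lo/a_hi: (N, N) lower/upper transition supports; b_lo/b_hi: (T, N)
    lower/upper membership grades of each fuzzified observation in each state.
    States 0 and N-1 are the nonemitting entry and exit states.
    Returns the interval total membership grade [P_lo, P_hi]."""
    T, N = b_lo.shape
    alpha_lo = np.zeros((T, N))
    alpha_hi = np.zeros((T, N))
    alpha_lo[0, 1:N-1] = a_lo[0, 1:N-1] * b_lo[0, 1:N-1]   # initial condition
    alpha_hi[0, 1:N-1] = a_hi[0, 1:N-1] * b_hi[0, 1:N-1]
    for t in range(1, T):                                   # recursion
        for j in range(1, N - 1):
            alpha_lo[t, j] = bounded_sum(alpha_lo[t-1, 1:N-1] * a_lo[1:N-1, j]) * b_lo[t, j]
            alpha_hi[t, j] = bounded_sum(alpha_hi[t-1, 1:N-1] * a_hi[1:N-1, j]) * b_hi[t, j]
    p_lo = bounded_sum(alpha_lo[T-1, 1:N-1] * a_lo[1:N-1, N-1])   # final condition
    p_hi = bounded_sum(alpha_hi[T-1, 1:N-1] * a_hi[1:N-1, N-1])
    return p_lo, p_hi
```

In effect the pass runs two coupled "embedded" HMMs, one on the lower bounds and one on the upper bounds, which is the intuition formalized in Section V-D below.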
• The IT2 fuzzy backward variable

$$\tilde{\beta}_t(i) = [\underline{\beta}_t(i),\; \overline{\beta}_t(i)] \qquad (56)$$

is an IT1 set and can be solved recursively.
1) Initial condition:

$$\underline{\beta}_T(i) = \underline{a}_{iN} \qquad (57)$$

$$\overline{\beta}_T(i) = \overline{a}_{iN}. \qquad (58)$$

2) Recursion:

$$\underline{\beta}_t(i) = \bigoplus_{j=2}^{N-1}\underline{a}_{ij}\,\underline{b}_j(\tilde{\mathbf{o}}_{t+1})\,\underline{\beta}_{t+1}(j) \qquad (59)$$

$$\overline{\beta}_t(i) = \bigoplus_{j=2}^{N-1}\overline{a}_{ij}\,\overline{b}_j(\tilde{\mathbf{o}}_{t+1})\,\overline{\beta}_{t+1}(j). \qquad (60)$$

3) Final condition:

$$\underline{\beta}_1(1) = \bigoplus_{j=2}^{N-1}\underline{a}_{1j}\,\underline{b}_j(\tilde{\mathbf{o}}_1)\,\underline{\beta}_1(j) \qquad (61)$$

$$\overline{\beta}_1(1) = \bigoplus_{j=2}^{N-1}\overline{a}_{1j}\,\overline{b}_j(\tilde{\mathbf{o}}_1)\,\overline{\beta}_1(j). \qquad (62)$$

Obviously, $\tilde{\alpha}_t(j)$, $\tilde{\beta}_t(i)$, and $\mu_\lambda(\tilde{O})$ are all IT1 sets [17], [18]. From the previous definitions and (26), we have

$$\underline{L}_t(j) = \underline{\alpha}_t(j)\,\underline{\beta}_t(j) \qquad (63)$$

$$\overline{L}_t(j) = \overline{\alpha}_t(j)\,\overline{\beta}_t(j). \qquad (64)$$

Similarly, $\tilde{L}_t(j)$ and $\tilde{L}_t(j,m)$ are IT1 sets, too.

B. Interval Type-2 Fuzzy Viterbi Algorithm

From (28)-(31), we can derive the IT2 fuzzy Viterbi algorithm. We choose the product t-norm and the maximum t-conorm in the meet and join operations. The maximum membership grade $\tilde{\delta}_t(j) = [\underline{\delta}_t(j), \overline{\delta}_t(j)]$ of the first $t$ observations at state $\tilde{s}_j$ can be computed by the following recursion:

$$\underline{\delta}_t(j) = \Big[\max_{2\le i\le N-1}\underline{\delta}_{t-1}(i)\,\underline{a}_{ij}\Big]\,\underline{b}_j(\tilde{\mathbf{o}}_t) \qquad (65)$$

$$\overline{\delta}_t(j) = \Big[\max_{2\le i\le N-1}\overline{\delta}_{t-1}(i)\,\overline{a}_{ij}\Big]\,\overline{b}_j(\tilde{\mathbf{o}}_t) \qquad (66)$$

where

$$\underline{\delta}_1(1) = \overline{\delta}_1(1) = 1 \qquad (67)$$

$$\underline{\delta}_1(j) = \underline{a}_{1j}\,\underline{b}_j(\tilde{\mathbf{o}}_1) \qquad (68)$$

$$\overline{\delta}_1(j) = \overline{a}_{1j}\,\overline{b}_j(\tilde{\mathbf{o}}_1) \qquad (69)$$

for $1 < j < N$, $1 < t \le T$. The maximum membership grade along the best state sequence is then given by

$$\mu^{*}_\lambda(\tilde{O}) = [h_l,\; h_r] \qquad (70)$$

where

$$h_l = \max_{2\le i\le N-1}\underline{\delta}_T(i)\,\underline{a}_{iN} \qquad (71)$$

$$h_r = \max_{2\le i\le N-1}\overline{\delta}_T(i)\,\overline{a}_{iN}. \qquad (72)$$

If we use the center of the interval as the defuzzified value, then

$$D[\mu^{*}_\lambda(\tilde{O})] = \frac{h_l + h_r}{2}. \qquad (73)$$

C. Training and Recognition

The algorithms for parameter estimation of the IT2 FHMM are almost identical with (33)-(44), except that the defuzzified values are the centers of the intervals and the product t-norm "$\times$" replaces the meet operator "$\sqcap$". Compared with the classical Baum–Welch algorithm, the updating weight $L_t(j,m)$ in (44) has many choices because of the different defuzzification methods [30], [31]. As far as the center of the interval is concerned, $L_t(j,m)$ may be greater than that in the classical Baum–Welch when the training observation vector $\mathbf{o}_t$ deviates far from the center of the underlying density, as shown in Fig. 5. This phenomenon reflects a big difference from the classical Baum–Welch: though $\mathbf{o}_t$ may deviate from the center of the underlying density, it can still affect parameter re-estimation through a reasonable weight. In this sense, the IT2 fuzzy forward-backward algorithm is a softer kind of generalized EM algorithm according to the ML criterion [14].

In recognition, the IT2 fuzzy Viterbi algorithm classifies the observation sequence to the IT2 FHMM by the IT1 set $[h_l, h_r]$ in (70). In practice, we have found it necessary to use the two end points as well as the range of the interval for classification, so we fuzzify each output interval set into a T1 isosceles triangular fuzzy number. The decision rules for two outputs $\tilde{h}_1$ and $\tilde{h}_2$ are as follows.
1) If both end points of $\tilde{h}_1$ are smaller than the corresponding end points of $\tilde{h}_2$, then $\tilde{h}_1 < \tilde{h}_2$, and vice versa.
2) If the end-point comparisons disagree, we have to compare them further by the areas of the two shaded regions $S_1$ and $S_2$ in Fig. 6.
3) If $S_1 > S_2$, then $\tilde{h}_1 > \tilde{h}_2$, and vice versa, as illustrated in Fig. 6.
We classify the observation sequence to the IT2 FHMM producing the largest output. Usually, misclassification occurs when the output interval sets have a large overlap.
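The sketch below illustrates the interval-ordering decision just described. When the end-point comparisons disagree, the paper compares the areas of two shaded regions obtained from the triangular fuzzification (Fig. 6); here that step is stood in for by comparing interval centers, purely for illustration, and all names are hypothetical.

```python
# Minimal sketch (illustrative): ordering two interval outputs [lo, hi] from the
# IT2 fuzzy Viterbi algorithm and picking the best model.
def compare_outputs(h1, h2):
    """h1, h2: (lo, hi) intervals. Returns 1 if h1 > h2, -1 if h1 < h2, 0 if tied."""
    lo1, hi1 = h1
    lo2, hi2 = h2
    if lo1 < lo2 and hi1 < hi2:        # rule 1: both end points agree
        return -1
    if lo1 > lo2 and hi1 > hi2:
        return 1
    # Rules 2-3 stand-in: the paper compares shaded-region areas; a center
    # comparison is used here only to keep the sketch self-contained.
    c1, c2 = 0.5 * (lo1 + hi1), 0.5 * (lo2 + hi2)
    return (c1 > c2) - (c1 < c2)

def classify(outputs):
    """outputs: dict mapping model name -> (lo, hi). Returns the winning model."""
    best = None
    for name, interval in outputs.items():
        if best is None or compare_outputs(interval, outputs[best]) > 0:
            best = name
    return best
```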
Fig. 5. The dotted line and the thick solid line denote the center of an IT2 MF and the distribution without uncertainty, respectively. The intersections a and b divide the axis into three zones. The dotted line is lower than the thick solid line in the range [a, b] and vice versa outside it, which implies that the updating weight induced by the center of the interval is greater than that of the classical Baum–Welch when the input lies outside [a, b].

Fig. 6. Two fuzzified IT1 sets are compared. In (a), both end points of one set are smaller than those of the other, so their order is decided directly. In (b), we have to further compare the two shaded regions S1 and S2; if S1 > S2, the first output is larger, and vice versa.

D. Computational Complexity

The computational complexity of the classical forward-backward algorithm is $O(N^2 T)$ [1]. Similar to the view that an IT2 FS is a set of embedded T1 FSs [23], an IT2 FHMM can be considered as a union of embedded HMMs, as shown in Fig. 7. In this sense, the IT2 fuzzy forward-backward algorithm computes two embedded HMMs: the "lower" HMM and the "upper" HMM. Intuitively, the computational complexity of the IT2 FHMM is twice that of the HMM. Just as random uncertainties flow through the HMM and their effects can be evaluated using the mean vectors and covariance matrices, fuzzy and random uncertainties flow through the IT2 FHMM, and their effects can be evaluated using the type-reduced and defuzzified output of the IT2 FHMM. If we choose the product t-norm in the meet operation of the IT2 fuzzy forward-backward algorithm, the bounded-sum t-conorm in its join operation, and the maximum t-conorm in the join operation of the IT2 fuzzy Viterbi algorithm, the IT2 FHMM reduces to the classical HMM when all uncertainties disappear. Therefore, the incorporation of the IT2 FS greatly increases the expressive power of the HMM for uncertainty while retaining tractable training and recognition procedures.

VI. APPLICATION TO SPEECH RECOGNITION

A. Automatic Speech Recognition (ASR) System

An ASR system includes five components: a speech database with front-end acoustic processing, acoustic models, language models, a training algorithm, and a recognition algorithm, in which the acoustic models and the language models are crucial. The HMM reduces a nonstationary process to a piecewise-stationary process. Phonetic units, i.e., phonemes, can be divided into three stationary parts: initial, central, and final. Thus, the HMM is a good acoustic model for phonemes. Fig. 8 shows the hierarchical structure of HMM-based speech modeling. The within-HMM transitions are determined from the HMM parameters. The between-model transitions are constant. The word-end transitions are determined by the language model with word-level networks. The language model is used to compute the probability of a sequence of words.
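As an aside on the language-model component, here is a small sketch of bigram scoring and of the perplexity measure discussed in the next paragraph; the corpus handling is deliberately simplified (probabilities are assumed given, with no smoothing), and none of it is taken from the paper.

```python
# Minimal sketch (illustrative): bigram language-model probability of a word
# sequence and the corresponding perplexity (geometric-mean branching factor).
import math

def sequence_log_prob(words, bigram_prob, unigram_prob):
    """P(w_1 ... w_K) = P(w_1) * prod_k P(w_k | w_{k-1}), computed in log space."""
    logp = math.log(unigram_prob[words[0]])
    for prev, cur in zip(words, words[1:]):
        logp += math.log(bigram_prob[(prev, cur)])
    return logp

def perplexity(words, bigram_prob, unigram_prob):
    return math.exp(-sequence_log_prob(words, bigram_prob, unigram_prob) / len(words))
```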
Fig. 7. Example of the left–right IT2 FHMM, which is a set of embedded HMMs. Uncertain sequential data are represented by IT2 fuzzy vectors h̃(o_t). The IT2 MF b̃_j(o_t) reflects the HMM's uncertainty, and the IT1 set ã_ij describes the uncertainty in the transition probability.
Fig. 8. Hierarchical structure of speech modeling.
The bigram and trigram are two widely used language models. A useful measure for evaluating the impact of the language model on recognition accuracy is perplexity, defined as the geometric mean of the number of words that can follow a word after the language model has been applied [2].

TIMIT [32] is a widely used speech database that contains 6300 utterances produced by 630 speakers from eight major dialect divisions of the United States. For each speaker, there are: 1) two dialect sentences (the "SA" sentences), which are meant to expose the dialect variants of the speakers; 2) five phonetically compact sentences (the "SX" sentences), which offer a good coverage of phones; and 3) three phonetically diverse sentences (the "SI" sentences). Roughly 20% to 30% of the corpus is used for test purposes, leaving 70% to 80% for training. Two male speakers and one female speaker from each dialect region are selected, providing a core test set of 24 speakers. The two "SA" sentences have been excluded from the core test set in order to avoid overlap with the training material.
B. Training for the IT2 FHMMs

First we decide the number of states and mixtures of the IT2 FHMM. Then we choose the form of the IT2 MF and fix the uncertainty factors. At each recursion of the IT2 fuzzy forward-backward algorithm and the Viterbi algorithm, the IT2 MF generates a continuous IT1 set that reflects the uncertainty of the primary grade. We use the IT2 fuzzy Viterbi algorithm to initialize a prototype IT2 FHMM as follows.
1) Start by uniformly segmenting the data, associating each successive segment with successive states, and further clustering the data into mixtures by the fuzzy c-means algorithm [33].
2) Initialize the parameters from the data in each mixture by (33)-(37).
3) Produce a prototype IT2 FHMM by fixing the uncertainty factors.
4) Use the IT2 fuzzy Viterbi algorithm to search the best state sequence for all IT2 nonsingleton fuzzified observation sequences, which implies an alignment of the training sequences with the fuzzy states.
5) Further associate each observation vector with the mixture having the largest defuzzified membership grade.
6) Update the parameters from the (nonfuzzified) data in each mixture by (33)-(37).
7) If the average membership grade of all training observations for this iteration is not higher than the value at the previous iteration, then stop; otherwise repeat steps 4)-7) using the updated IT2 FHMM.

Finally, we use the IT2 fuzzy forward-backward algorithm to refine the parameters of the initialized IT2 FHMM. The steps to perform parameter re-estimation are summarized as follows.
1) For every parameter vector/matrix requiring re-estimation, allocate storage for the numerator and denominator summations of the form illustrated by (41)-(43). These storages are referred to as accumulators.
2) Calculate the IT2 fuzzy forward and backward variables of the IT2 nonsingleton fuzzified observation sequence for all states, mixtures, and times.
3) For each state, mixture, and time, use the weight and the current observation to update the accumulators for that mixture.
4) Use the final accumulator values to calculate new parameter values by (41)-(43) to produce a new IT2 FHMM.
5) If the average membership grade of all training observations for this iteration is not higher than the value at the previous iteration, then stop; otherwise repeat the previous steps using the new IT2 FHMM.

C. Experimental Results

1) Phoneme Classification: In phoneme classification, we used the phoneme boundaries during the test. We converted the TIMIT transcription files into broad-transcription files in which phonemes were broadly grouped as follows: C (consonant), V (vowel), N (nasal), L (liquid), and S (silence). We selected about 300-500 training utterances for each phoneme from ten speakers in the training set, and about 300-500 utterances for each phoneme from ten speakers in the core test set for classification. All these utterances were parameterized using 39 coefficients (one energy coefficient + 12 cepstrum coefficients + their first and second derivatives). To compare the classification performance, for simplicity, we used five states and three mixtures in both the HMM and the IT2 FHMM (Gaussian primary MF with uncertain mean), which are adequate to produce good results in this experiment. We implemented the T2 fuzzy forward-backward algorithm and the Viterbi algorithm in MATLAB. The uncertainty factor and the range of the fuzzy transition probability were fixed in advance. We corrupted the test data by white Gaussian noise with different signal-to-noise ratios (SNRs): 5, 10, 15, 20, 25, and 30 dB. Tables I and II show the classification results of the HMM and the IT2 FHMM.
The average classification rate of the IT2 FHMM is higher than that of the HMM at all SNR levels: 11.9% higher at 5 dB, 11.9% higher at 10 dB, 7.2% higher at 15 dB, 5.3% higher at 20 dB, 2.8% higher at 25 dB, and 3.1% higher at 30 dB. We can see that the IT2 FHMM has a classification ability comparable to the HMM on clean speech data. However, when the SNR is low, for example, lower than 20 dB, the IT2 FHMM has a much higher classification rate than the HMM. Fig. 9 compares the performance of the IT2 FHMM and the HMM at different SNR levels. From Fig. 9(c), we can see that the classification rate for the C (consonant) phonemes drops as the SNR rises. This phenomenon may be caused by the test samples with added noise becoming increasingly like consonant phonemes at low SNRs. As the experimental results suggest, the proposed IT2 FHMM still outperforms the HMM in this instance.

2) Phoneme Recognition: In phoneme recognition, we evaluated the recognition results by "percent correct" and "accuracy" [2]. We transcribed the 61 phonetic labels defined in TIMIT to 39 labels [32]. For comparison, we used the same number of states and mixtures in both the HMM and the IT2 FHMM (Gaussian primary MF with uncertain standard deviation) for each phoneme. We assigned the phonemes {b d g dx} four states, {ih ah uh l r y w m n ng ih dh p t k v hh} six states, {iy eh uw er ch z f th s sh sil} eight states, and {ae aa ey ay oy aw ow} ten states according to their lengths, and used 32 mixtures in each state. The uncertainty factors were fixed in advance. For benchmarking, we used the hidden Markov model toolkit (HTK) to implement the HMM [2]. The training set of the TIMIT database consists of 462 speakers from eight dialect regions. There are a total of 3696 utterances, excluding the "SA" sentences. We used the same training samples for both the HMM and the IT2 FHMM, and a bigram language model in this experiment. Finally, we evaluated the two models on the TIMIT core test set [32].

Table III shows a comparison of the IT2 FHMM with other TIMIT phoneme recognizers. Lee et al. reported the first results of phoneme recognition on the TIMIT database [34]. They selected 48 phonemes to model and used the bigram language model; the features were 12 cepstral coefficients after front-end processing, and they achieved a recognition accuracy of 53.27% in the context-independent condition. Young used 48 context-independent left-right HMMs of three states with diagonal covariance and obtained 52.7% recognition accuracy [35]. Glass et al. obtained 64.1% recognition accuracy on the core test set with 50 diagonal Gaussian mixtures [36]. Becchetti et al. used ten Gaussian mixtures and generated 48 models; they did not use the "SA" sentences and reported 62.91% accuracy on the core test set [8]. The fuzzy GHMM was applied not to phoneme recognition but to digit recognition, and had almost the same recognition rate as the HMM apart from a shorter training time [28]. From Table III, we can see that the IT2 FHMM has a recognition rate at least comparable to the above phoneme recognizers. We also evaluated the IT2 FHMM on the "SA" dialect sentences, with the uncertainty factors fixed for this experiment. Table IV shows that the recognition rate of the IT2 FHMM is 5.55% higher than that of the HMM, which demonstrates that the IT2 FHMM is more robust to large dialect variations in speech data. Phoneme recognition was conducted without boundary information, which means recognition was performed in the word layer illustrated in Fig. 8.
Fig. 9. Classification results of the HMM and IT2 FHMM for five broad phoneme classes: S (silence), C (consonant), V (vowel), L (liquid), and N (nasal). "Average" is the average classification rate of S, C, V, L, and N. Experimental results show that the IT2 FHMM is more robust to noise than the classical HMM under different SNRs.

TABLE I CLASSIFICATION RATE (%) OF THE HMM WITH DIFFERENT SNRS

TABLE II CLASSIFICATION RATE (%) OF THE IT2 FHMM WITH DIFFERENT SNRS

The between-model transitions are governed by the language model, which determines the boundary between phonemes in a word. The language model is the main reason why the IT2 FHMM does not achieve a much better recognition rate, as shown in Table III. However, from Table IV we can see that the IT2 FHMM still outperforms the classical HMM in dialect recognition.

VII. CONCLUSION
This paper has presented the T2 FHMM, which can effectively handle both randomness and fuzziness. In the general T2 FHMM, the primary grade evaluates the likelihood of the random data, and the secondary grade describes the fuzziness of that likelihood. The output of the T2 FHMM is an uncertain T1 FS rather than the crisp scalar of the classical HMM. In this way, both random and fuzzy uncertainties can be accounted for in a unified framework. In order to realize the T2 FHMM, we modeled each hidden fuzzy state by a T2 MF with two kinds of forms: a Gaussian primary MF with uncertain mean and a Gaussian primary MF with uncertain standard deviation. Meanwhile, the transition probability was modeled by a T1 fuzzy number. Following the classical forward-backward algorithm and Viterbi algorithm, we extended the training and recognition processes using T2 FS operations. The IT2 FHMM reduces the complex set operations to computing the two ends of an interval. We have also proposed a method to classify observation sequences into different IT2 FHMMs by ordering the output IT1 sets. Experimental results have shown that the IT2 FHMM has a higher classification rate than the classical HMM on clean speech data, and is more robust to speech signals with noise and large dialect variations.
TABLE III COMPARISON WITH OTHER TIMIT PHONEME RECOGNIZERS (CONTEXT-INDEPENDENT)
TABLE IV DIALECT RECOGNITION (CONTEXT-INDEPENDENT)
Future work includes applying the IT2 FHMM to more complicated sequential data modeling tasks, such as handwriting and gesture recognition.
REFERENCES

[1] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
[2] S. Young, G. Evermann, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book for HTK Version 3.2. Cambridge, U.K.: Eng. Dept., Cambridge Univ., 2002.
[3] S. Young, "A review of large-vocabulary continuous-speech recognition," IEEE Signal Process. Mag., vol. 13, no. 5, pp. 45-56, 1996.
[4] L. Saul and M. I. Jordan, "Exploiting tractable substructures in intractable networks," in Advances in Neural Information Processing Systems, vol. 8. Cambridge, MA: MIT Press, 1996, pp. 486-492.
[5] J. Bilmes, "What HMMs can do," Dept. Elect. Eng., Univ. Washington, Seattle, WA, UWEE Tech. Rep. UWEETR-2002-2003, Jan. 2002.
[6] Y. Bengio, "Markovian models for sequential data," 1999. [Online]. Available: http://www.icsi.berkeley.edu/~jagota/NCS
[7] S. Nakagawa, "A survey on automatic speech recognition," IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 465-486, Mar. 2002.
[8] C. Becchetti and L. P. Ricotti, Speech Recognition Theory and C++ Implementation. New York: Wiley, 1999.
[9] Y. Bengio, Neural Networks for Speech and Sequence Recognition. London, U.K.: International Thomson Computer Press, 1996.
[10] N. Morgan and H. Bourlard, "Continuous speech recognition using multilayer perceptrons with hidden Markov models," in Proc. IEEE ICASSP, 1990, pp. 413-416.
[11] A. J. Robinson, "An application of recurrent nets to phone probability estimation," IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 298-305, Mar. 1994.
[12] Z. Ghahramani and M. I. Jordan, "Factorial hidden Markov models," in Proc. Conf. Advances in Neural Information Processing Systems (NIPS), vol. 8, 1995, pp. 472-478.
[13] A. V. Nefian, L. H. Liang, X. X. Liu, X. Pi, C. Mao, and K. Murphy, "A coupled HMM for audio-visual speech recognition," in Proc. IEEE ICASSP, vol. 2, 2002, pp. 2013-2016.
[14] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[15] G. C. Mouzouris and J. M. Mendel, "Nonsingleton fuzzy logic systems: Theory and application," IEEE Trans. Fuzzy Syst., vol. 5, no. 1, pp. 56-71, Feb. 1997.
[16] N. N. Karnik, J. M. Mendel, and Q. Liang, "Type-2 fuzzy logic systems," IEEE Trans. Fuzzy Syst., vol. 7, no. 6, pp. 643-658, Dec. 1999.
[17] Q. Liang and J. M. Mendel, "Interval type-2 fuzzy logic systems: Theory and design," IEEE Trans. Fuzzy Syst., vol. 8, no. 5, pp. 535-549, Oct. 2000.
, “Equalization of nonlinear time-varying channels using type-2 fuzzy adaptive filters,” IEEE Trans. Fuzzy Syst., vol. 8, no. 5, pp. 551–563, Oct. 2000. , “Overcoming time-varying co-channel interference using type-2 fuzzy adaptive filters,” IEEE Trans. Circuits Syst. II, vol. 47, no. 12, pp. 1419–1429, Dec. 2000. J. M. Mendel, “Uncertainty, fuzzy logic, and signal processing,” Signal Process., vol. 80, no. 6, pp. 913–933, Jun. 2000. Q. Liang and J. M. Mendel, “MPEG VBR video traffic modeling and classification using fuzzy technique,” IEEE Trans. Fuzzy Syst., vol. 9, no. 1, pp. 183–193, Oct. 2001. J. M. Mendel, Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. Upper Saddle River, NJ: Prentice-Hall, 2001. J. M. Mendel and R. I. B. John, “Type-2 fuzzy sets made simple,” IEEE Trans. Fuzzy Syst., vol. 10, no. 2, pp. 117–127, Apr. 2002. M. Popescu, J. Keller, and P. Gader, “Linguistic hidden Markov models,” in Proc. FUZZ-IEEE, May 2003, pp. 796–801. S. Auephanwiriyakul and J. M. Keller, “Analysis and efficient implementation of a linguistic fuzzy c-means,” IEEE Trans. Fuzzy Syst., vol. 10, no. 5, pp. 563–582, Oct. 2002. M. A. Mohamed and P. Gader, “Generalized hidden Markov models— Part I: Theoretical frameworks,” IEEE Trans. Fuzzy Syst., vol. 8, no. 1, pp. 67–81, Feb. 2000. , “Generalized hidden Markov models—Part II: Application to handwritten word recognition,” IEEE Trans. Fuzzy Syst., vol. 8, no. 1, pp. 82–94, Feb. 2000. S. Chevalier, M. N. Kaynak, A. D. Cheok, and K. Sengupta, “Use of a novel nonlinear generalized fuzzy hidden Markov model for speech recognition,” Int. J. Control Intell. Syst., vol. 30, no. 2, pp. 68–82, 2002. N. N. Karnik and J. M. Mendel, “Operations on type-2 fuzzy sets,” Fuzzy Sets Syst., vol. 122, pp. 327–348, 2001. J. Zeng and Z.-Q. Liu, “Interval type-2 fuzzy hidden Markov models,” in Proc. FUZZ-IEEE, 2004, pp. 1123–1128. , “Type-2 fuzzy hidden Mrkov models to phoneme recognition,” in 17th Int. Conf. on Pattern Recognition, vol. 1, 2004, pp. 192–195. J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, “DRAPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM,”, Feb. 1992. J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981. K.-F. Lee and H.-W. Hon, “Speaker-independent phone recognition using hidden Markov models,” IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 11, pp. 1641–1648, Nov. 1989. S. Young, “The general use of tying in phoneme-based HMM speech recognizers,” in Proc. IEEE ICASSP, 1992, pp. 569–572. J. Glass, J. Chang, and M. McCandless, “A probabilistic framework for feature based speech recognition,” in Proc. IEEE ICASSP, 1996, pp. 2277–2280.
Jia Zeng (S'05) received the B.Eng. degree in automatic control from Wuhan University of Technology, China, in 2002. He is currently working toward the Ph.D. degree with the School of Creative Media, City University of Hong Kong, China. In 2003, he was a Research Assistant with the School of Creative Media, City University of Hong Kong. His research interests are type-2 fuzzy sets, type-2 fuzzy logic systems, machine learning, image processing, computer vision, and pattern recognition. Mr. Zeng was awarded First Place in the 2005 IEEE Region 10 Postgraduate Student Paper Competition.
Zhi-Qiang Liu (S’82–M’86–SM’91) received the M.A.Sc. degree in aerospace engineering from the Institute for Aerospace Studies, The University of Toronto, Toronto, ON, Canada, and the Ph.D. degree in electrical engineering from The University of Alberta, Alberta, Canada, in 1983 and 1986, respectively. He is a Professor with the City University of Hong Kong, China. Previously, he was with the Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia. His interests are neural-fuzzy systems, machine learning, human-media systems, media computing, computer vision, and computer networks.