CROSS-VALIDATION BASED DECISION TREE CLUSTERING FOR HMM-BASED TTS

Yu Zhang 1,2, Zhi-Jie Yan 1 and Frank K. Soong 1
1 Microsoft Research Asia, Beijing, China
2 Shanghai Jiao Tong University, Shanghai, China

Introduction

◮ Conventional HMM-based speech synthesis
 ⊲ spectrum, excitation, and duration features are modeled and generated in a unified HMM-based framework
 ⊲ a decision tree, together with the ML and MDL criteria, is used for parameter tying
◮ Conventional decision tree based context clustering
 ⊲ ML-based greedy tree-growing algorithm
 ⊲ MDL-based stopping criterion
◮ Cross-validation based decision tree
 ⊲ improves the conventional greedy splitting criterion
 ⊲ proposes a new stopping criterion for node splitting

Main problems
◮ Splitting criterion: the greedy search is sensitive to a biased training set
◮ Stopping criterion: not effective when the training data are not asymptotically large, and requires a manually tuned threshold

MDL-based decision tree clustering

[Figure: a node S_m is split by a context question q (e.g., "R-voiced?"); the "Yes" branch leads to S_mqy and the "No" branch to S_mqn]

◮ ML criterion for node splitting
 ⊲ δ^ML(D^m)_q = L(S_mqy) + L(S_mqn) − L(S_m)
◮ MDL criterion for stopping
 ⊲ δ^MDL(D^m)_q = δ^ML(D^m)_q − αL log G   (1)
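As a concrete illustration, the two criteria above can be sketched for single-Gaussian leaf models. This is a minimal sketch, not the authors' implementation: the diagonal-Gaussian model, the variance floor, and the reading of α as the penalty weight, L as the feature dimensionality, and G as the current number of leaf nodes are assumptions made here for illustration.

```python
import math

def node_log_likelihood(X):
    """Log-likelihood of data X (list of equal-length feature vectors) under
    its own ML diagonal Gaussian: -N/2 * sum_d (log(2*pi*var_d) + 1)."""
    N, L = len(X), len(X[0])
    ll = 0.0
    for d in range(L):
        col = [x[d] for x in X]
        mean = sum(col) / N
        var = sum((v - mean) ** 2 for v in col) / N + 1e-8  # ML variance + floor
        ll += -0.5 * N * (math.log(2.0 * math.pi * var) + 1.0)
    return ll

def ml_split_gain(X, yes):
    """delta_ML(D^m)_q = L(S_mqy) + L(S_mqn) - L(S_m) for one question q,
    where `yes` flags the vectors that answer the question with "yes"."""
    Xy = [x for x, f in zip(X, yes) if f]
    Xn = [x for x, f in zip(X, yes) if not f]
    return (node_log_likelihood(Xy) + node_log_likelihood(Xn)
            - node_log_likelihood(X))

def mdl_split_gain(X, yes, alpha, G):
    """delta_MDL = delta_ML - alpha * L * log G (Eq. (1)); split only if > 0."""
    return ml_split_gain(X, yes) - alpha * len(X[0]) * math.log(G)
```

A large α makes the penalty dominate, so splitting stops earlier; this is the manually tuned threshold the poster's "Main problems" refer to.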

Cross-validation based decision tree clustering

◮ Divide the training data D^m into K subsets D^m_1, D^m_2, ..., D^m_K at each node; for each fold k, a model Λ^m_k is trained on D^m \ D^m_k
◮ Node splitting criteria
 ⊲ evaluate the likelihood on the K validation sets:
  L^CV_k(D^m_k) = Σ_{x ∈ D^m_k} log P(x | Λ^m_k)
 ⊲ likelihood increase obtained by splitting with question q:
  δ^CV(D^m_k)_q = L^CV_k(D^mqy_k) + L^CV_k(D^mqn_k) − L^CV_k(D^m_k)
 ⊲ select the best question over all validation sets:
  q_m = arg max_q Σ_k δ^CV(D^m_k)_q
◮ Node stopping criteria
 ⊲ node splitting intuitively stops when
  Σ_k δ^CV(D^m_k)_{q_m} < 0
 ⊲ the MDL criterion (Eq. (1)) can also be used:
  Σ_k δ^CV(D^m_k)_{q_m} − αL log G < 0
 ⊲ in our experiments, the first criterion gives good results
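The fold-based question selection and the intuitive stopping rule can be sketched as follows. The helpers `train(data) -> model` and `loglik(model, data)` are hypothetical placeholders (any generative model works); the interleaved fold assignment in `kfold` is an implementation choice, not taken from the paper.

```python
def kfold(data, K):
    """Partition the node's data D^m into K disjoint subsets D^m_1..D^m_K."""
    return [data[k::K] for k in range(K)]

def cv_gain(folds, question, train, loglik):
    """sum_k delta_CV(D^m_k)_q: total held-out log-likelihood gain of q.

    For each fold k, the parent and child models are trained on the
    remaining folds (children on the yes/no partition induced by q) and
    evaluated on the held-out fold D^m_k.
    """
    total = 0.0
    for k, held_out in enumerate(folds):
        rest = [x for j, fold in enumerate(folds) if j != k for x in fold]
        parent = train(rest)
        child_yes = train([x for x in rest if question(x)])
        child_no = train([x for x in rest if not question(x)])
        total += (loglik(child_yes, [x for x in held_out if question(x)])
                  + loglik(child_no, [x for x in held_out if not question(x)])
                  - loglik(parent, held_out))
    return total

def best_question(folds, questions, train, loglik):
    """q_m = argmax_q sum_k delta_CV(D^m_k)_q; a negative best gain means
    the intuitive criterion says: stop splitting this node."""
    scored = [(cv_gain(folds, q, train, loglik), q) for q in questions]
    gain, q = max(scored, key=lambda t: t[0])
    return (q, gain) if gain >= 0.0 else (None, gain)
```

Because the gain is measured on held-out folds, an overfitting split scores negative on its own, which is why no hand-tuned threshold is needed.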

Experiments

Experimental Setup
◮ Training database
 ⊲ Mandarin corpus, 16 kHz, 1,000 sentences, female speaker
 ⊲ 40th-order LSP + gain, F0
 ⊲ first- and second-order dynamic features
 ⊲ 25,761 different rich-context phone models
◮ MDL-based decision tree
 ⊲ standard MDL method: α = 1.0
 ⊲ tuning α on the development set
◮ Cross-validation based decision tree
 ⊲ stop node splitting using the intuitive criterion
 ⊲ MDL-based criterion (Eq. (1))

◮ Determining the number of cross-validation folds K

Table: Log spectral distortion for different K on the development set

K        | 4    | 6    | 8    | 10   | 14
LSD (dB) | 5.32 | 5.33 | 5.32 | 5.32 | 5.31

◮ Objective Test Results

[Figure: log spectral distance (dB), RMSE of F0 (Hz/frame), and RMSE of duration (ms/phone) plotted against the number of states, comparing MDL trees (α = 1.0, 0.8, 0.5) with the cross-validation tree, which stops automatically]

◮ Subjective Test Results

[Figure: subjective preference test results]
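For reference, the log spectral distortion reported in the table can be computed, under a common convention (the paper's exact variant, e.g. LSP-derived spectra, may differ), as the per-frame RMS difference of log-magnitude spectra in dB:

```python
import math

def lsd_db(spec_ref, spec_test):
    """Log spectral distortion (dB) for one frame: RMS over frequency bins of
    the difference of 20*log10 magnitudes. Average over frames for a corpus
    score. Inputs are positive magnitude spectra of equal length."""
    diffs = [20.0 * math.log10(a / b) for a, b in zip(spec_ref, spec_test)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```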

Conclusions

◮ Use cross-validation in decision tree clustering for HMM-based TTS
◮ Propose a new splitting criterion and a new stopping criterion for tree building
◮ Compared with the conventional method, cross-validation yields better performance at a similar model size
