CROSS-VALIDATION BASED DECISION TREE CLUSTERING FOR HMM-BASED TTS

Yu Zhang¹,², Zhi-Jie Yan¹ and Frank K. Soong¹
¹ Microsoft Research Asia, Beijing, China
² Shanghai Jiao Tong University, Shanghai, China
Introduction

◮ Conventional HMM-based speech synthesis
⊲ spectrum, excitation, and duration features are modeled and generated in a unified HMM-based framework
⊲ a decision tree, together with the ML and MDL criteria, is used for parameter tying
◮ Conventional decision tree based context clustering
⊲ ML-based greedy tree growing algorithm
⊲ MDL-based stopping criterion
◮ Cross-validation based decision tree
⊲ improves the conventional greedy splitting criterion
⊲ proposes a new stopping criterion for node splitting

Main problems
◮ Splitting criterion: the greedy search is sensitive to a biased training set
◮ Stopping criterion: not effective when the training data is not asymptotically large, and it relies on a manually tuned threshold

MDL-based decision tree clustering

◮ ML criterion for node splitting: splitting node S_m with question q into S_mqy (yes) and S_mqn (no) gives the likelihood gain
  δ^ML(D^m)_q = L(S_mqy) + L(S_mqn) − L(S_m)
◮ MDL criterion for stopping
  δ^MDL(D^m)_q = δ^ML(D^m)_q − αL log G    (1)

Cross-validation based decision tree clustering

◮ Divide the training data D^m into K subsets at each node
[Figure: at node S_m, a question (e.g. "R-voiced?") splits the data into S_mqy (yes) and S_mqn (no); D^m is divided into folds D^m_1, D^m_2, …, D^m_K, a model Λ^m_k is trained on D^m \ D^m_k, and the likelihood increase δ(D^m_k)_q is evaluated on each held-out fold]
◮ Node splitting criteria
⊲ evaluate the likelihood on the K validation sets, where Λ^m_k is trained on D^m \ D^m_k:
  L^CV_k(D^m_k) = Σ_{x∈D^m_k} log P(x | Λ^m_k)
⊲ likelihood increased by node splitting:
  δ^CV(D^m_k)_q = L^CV_k(D^mqy_k) + L^CV_k(D^mqn_k) − L^CV_k(D^m_k)
⊲ select the best question over all validation sets:
  q_m = arg max_q Σ_k δ^CV(D^m_k)_q
◮ Node stopping criteria
⊲ node splitting intuitively stops when Σ_k δ^CV(D^m_k)_{q_m} < 0
⊲ the MDL criterion can also be used: Σ_k δ^CV(D^m_k)_{q_m} − αL log G < 0
⊲ in our experiments, the first criterion gives good results

Experiments

Experimental Setup
◮ Training database
⊲ Mandarin corpus, 16 kHz, 1,000 sentences, female speaker
⊲ 40th-order LSP + gain, F0
⊲ first- and second-order dynamic features
⊲ 25,761 different rich-context phone models
◮ MDL-based decision tree
⊲ standard MDL method: α = 1.0
⊲ tuning α on the development set
◮ Cross-validation based decision tree
⊲ stop node splitting using the intuitive criterion
⊲ or using the MDL-based criterion (Eq. (1))

◮ Determining the number of cross-validation folds K
Table: Log spectral distortion for different K on the development set

K        |  4   |  6   |  8   |  10  |  14
LSD (dB) | 5.32 | 5.33 | 5.32 | 5.32 | 5.31

◮ Objective Test Results
[Figures: log spectral distance (dB), RMSE of F0 (Hz/frame), and RMSE of duration (ms/phone) versus the number of states, comparing MDL-based trees (α = 0.5, 0.8, 1.0) against the cross-validation based tree, with the CV tree's automatic stopping point marked on each curve]
◮ Subjective Test Results
[Figure: subjective test results]
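The cross-validation splitting and stopping criteria above can be sketched in code. This is a minimal, illustrative sketch on toy scalar data with a single Gaussian per node; the names `gaussian_loglik`, `cv_split_gain`, and `select_question`, the toy contexts, and the variance floor are our own stand-ins, not the paper's HMM-state implementation:

```python
import math
import random

def gaussian_loglik(held_out, train):
    """Log-likelihood of held_out under a Gaussian fit by ML on train.
    A small variance floor guards against degenerate folds."""
    if not train or not held_out:
        return 0.0
    mu = sum(train) / len(train)
    var = max(sum((x - mu) ** 2 for x in train) / len(train), 1e-3)
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
               for x in held_out)

def cv_split_gain(data, question, K=4):
    """Sum over folds of delta^CV(D_k)_q = L_k(D_k^qy) + L_k(D_k^qn) - L_k(D_k):
    models are trained on the other K-1 folds and scored on the held-out fold."""
    folds = [data[k::K] for k in range(K)]
    gain = 0.0
    for k, held in enumerate(folds):
        train = [x for j, f in enumerate(folds) if j != k for x in f]
        for answer in (True, False):  # the S_mqy / S_mqn halves of the split
            h = [v for feats, v in held if question(feats) == answer]
            t = [v for feats, v in train if question(feats) == answer]
            gain += gaussian_loglik(h, t)
        gain -= gaussian_loglik([v for _, v in held], [v for _, v in train])
    return gain

def select_question(data, questions, K=4):
    """argmax_q of the summed CV gain; None means 'stop splitting'
    (the intuitive criterion: even the best question's gain is negative)."""
    best_gain, best_q = max(((cv_split_gain(data, q, K), q) for q in questions),
                            key=lambda s: s[0])
    return best_q if best_gain >= 0 else None

# Toy node data: a 'voiced' context separates two clear Gaussian clusters,
# while 'junk' is an uninformative context.
random.seed(0)
data = ([({"voiced": True, "junk": random.random() < 0.5}, random.gauss(5.0, 0.5))
         for _ in range(40)]
        + [({"voiced": False, "junk": random.random() < 0.5}, random.gauss(-5.0, 0.5))
           for _ in range(40)])
q_voiced = lambda f: f["voiced"]
q_junk = lambda f: f["junk"]

gain_voiced = cv_split_gain(data, q_voiced)
gain_junk = cv_split_gain(data, q_junk)
```

Because each candidate split is scored on held-out folds, an uninformative question scores near or below zero while the informative split scores strongly positive, which is how the CV criterion resists a biased training set.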
Conclusions

◮ Use cross-validation in decision tree clustering for HMM-based TTS
◮ Propose a new splitting criterion and a new stopping criterion for tree building
◮ Compared with the conventional method, cross-validation yields better performance at a similar model size
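As a companion sketch, the conventional MDL rule of Eq. (1) is just a fixed penalty on the ML gain; here `alpha`, `L` (a model-complexity term, e.g. the number of parameters added by a split), and `G` (the number of training frames) are illustrative values, not the poster's experimental settings:

```python
import math

def mdl_gain(ml_gain, alpha, L, G):
    """Eq. (1): delta^MDL = delta^ML - alpha * L * log(G)."""
    return ml_gain - alpha * L * math.log(G)

def should_split(ml_gain, alpha=1.0, L=3, G=100_000):
    """Conventional stopping: keep splitting only while the penalized gain is positive."""
    return mdl_gain(ml_gain, alpha, L, G) > 0

# With alpha=1.0, L=3, G=100000 the penalty is 3*log(1e5), roughly 34.5,
# so a likelihood gain of 100 passes while a gain of 10 does not.
```

Unlike the CV criterion, this threshold depends directly on α, which is why it must be tuned by hand on a development set.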