Detecting hyperfunctional female voices using inverse filtered aerodynamic measures

Víctor M. Espinoza1,2, Matías Zañartu1, Jarrad H. Van Stan3,4, Daryush D. Mehta3,4,5, Robert E. Hillman3,4,5

1 Department of Electronic Engineering, Universidad Técnica Federico Santa María, Chile.
2 Department of Music and Sonology, Faculty of Arts, Universidad de Chile, Chile.
3 Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, MA, USA.
4 Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA, USA.
5 Department of Surgery, Harvard Medical School, Boston, MA, USA.


Introduction
• Many of the most common voice disorders (e.g., nodules, polyps, muscle tension dysphonia) are chronic or recurring conditions that are believed to result from repeated detrimental patterns of vocal behavior, referred to as vocal hyperfunction (VH).
• Subjects with these disorders often attempt to compensate for their vocal deficits, which has been shown to produce increased aerodynamic energy transfer and high collision forces.
• With this in mind, the goal of the study is to extend early approaches for the objective detection of VH using inverse filtered aerodynamic measures and robust statistical methods in a database of female voices.

Methods
• Indirect measurements of vocal function (e.g., peak-to-peak glottal airflow and maximum flow declination rate, among others) were obtained through noninvasive inverse filtered approximations to glottal airflow [1].
• […] > 1.96 and p_n < 0.05 ∀ β_n, to avoid overfitting and collinearity [3].
• Both sets (control and pathological) are regressed using the model, i.e., the predicted response of the model is subtracted from both datasets. See Figure 1 for a graphical explanation.
• An independent-samples t-test for means is performed between the normal and pathological groups, in both the regressed and non-regressed (without the SPL and f0 covariates) versions. In the presence of outliers, we apply a conservative cleanup criterion of 3 standard deviations, using trimmed means (20%) and MADN [4].
• The vocal parameters (or features) used here are: Transglottal Pressure (TP), Maximum Flow Declination Rate (MFDR), Peak-to-Peak Airflow (ACFL), Open Quotient (OQ), Speed Quotient (SQ), First to Second Harmonic Ratio (H1H2), Cepstral Peak Prominence (CPP), and Normalized Amplitude Quotient (NAQ) [5, 6, 7].

Results
• All results are presented in Table 1.
• Using the regressed data, the results show statistically significant differences (p < […]
• The most salient feature is Transglottal Pressure, followed by Open Quotient.
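The group comparison described in the Methods and summarized in Table 1 can be illustrated with a minimal sketch. It assumes one reading of the cleanup rule (drop values farther than 3 MADN from the 20% trimmed mean) and a pooled-variance t statistic; the function names are hypothetical and are not taken from the authors' analysis code. The p-value is left out because it needs a t-distribution CDF, which the Python standard library does not provide.

```python
import math
import statistics


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping `proportion` of the sorted values from each tail."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))
    return statistics.fmean(ordered[k:len(ordered) - k])


def madn(values):
    """Median absolute deviation divided by 0.6745 (normal-consistent scale)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return mad / 0.6745


def remove_outliers(values, n_sd=3.0):
    """Assumed rule: keep values within n_sd * MADN of the 20% trimmed mean."""
    center = trimmed_mean(values)
    spread = madn(values)
    if spread == 0:
        return list(values)
    return [v for v in values if abs(v - center) <= n_sd * spread]


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return math.sqrt(((len(a) - 1) * va + (len(b) - 1) * vb) / (len(a) + len(b) - 2))


def t_statistic(a, b):
    """Independent-samples (pooled-variance) t statistic for mean(a) - mean(b)."""
    se = pooled_sd(a, b) * math.sqrt(1 / len(a) + 1 / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def cohens_d(a, b):
    """Effect size as the absolute mean difference in pooled-SD units."""
    return abs(statistics.fmean(a) - statistics.fmean(b)) / pooled_sd(a, b)


if __name__ == "__main__":
    # Made-up numbers for illustration only; not data from the study.
    control = remove_outliers([1.0, 2.0, 3.0, 4.0, 5.0])
    patient = remove_outliers([3.0, 4.0, 5.0, 6.0, 7.0])
    print(t_statistic(control, patient), cohens_d(control, patient))
```

With a Bonferroni correction over the 8 features, each per-feature p-value would be compared against the chosen significance level divided by 8.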

[Figure 1: four panels. SPL vs. log feature for the control and pathological groups, with the line fitted to the controls (y = 0.51x − 21.1, R² = 0.93); estimated p.d.f. of the log feature ("similar means and dispersion"); SPL vs. corrected log feature ("correction using model"); estimated p.d.f. of the corrected log feature ("greater separation after correction").]

Table 1: Features with significant differences (p < 0.1, Bonferroni corrected), i.e., for which the null hypothesis was rejected, in the regressed and non-regressed data versions. The t and p values from the test are tabulated, along with the effect size (Cohen's d).

Group               Feature  Data version   t value  p value  Cohen's d
Phonotraumatic      TP       Non-regressed  -4.546   0.0000   0.65 (medium)
Phonotraumatic      OQ       Non-regressed  -8.332   0.0000   0.94 (large)
Phonotraumatic      TP       Regressed      -6.188   0.0000   1.13 (large)
Phonotraumatic      ACFL     Regressed      -2.779   0.0058   0.41 (medium)
Phonotraumatic      OQ       Regressed      -8.091   0.0000   1.26 (large)
Phonotraumatic      H1H2     Regressed      -3.150   0.0018   0.48 (medium)
Phonotraumatic      CPP      Regressed       4.407   0.0000   0.57 (medium)
Non-phonotraumatic  TP       Regressed      -3.735   0.0002   0.61 (medium)

• For the phonotraumatic, regressed case, multiple features reject the null hypothesis of equal means.
• The non-regressed data yielded minimal findings when comparing the normative group to the phonotraumatic (1 of 8 features significantly different) and non-phonotraumatic (0 of 8 features significantly different) groups.

Conclusions
• We have presented a simple approach that applies a t-test to detect VH in female voices.
• Using regression to correct (adjust) for SPL and f0 increases the sensitivity of the inverse filtered measures in the independent-samples t-test.
• Our findings show which features are more useful for detecting hyperfunctional behavior in this kind of experiment.

Forthcoming Research
Further research will be conducted using multivariate models, discriminant analysis, and machine learning techniques to improve discrimination.

Figure 1: A conceptual example of correcting vocal parameters with SPL. Top-left: SPL vs. logged feature, with a line fitted to the controls, i.e., the model. Top-right: Estimated p.d.f. of the logged feature without any correction, showing similar central tendency and dispersion. Bottom-left: Both sets (control and pathological) are regressed using the model, i.e., the predicted response of the model is subtracted from both datasets. Bottom-right: Estimated p.d.f. showing greater separation after correction. Vertical gray lines indicate the transition from non-regressed to regressed data. The red and blue dash-dot lines mark the p.d.f. peaks.
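The correction illustrated in Figure 1 can be sketched in a few lines. This is a simplified illustration, assuming a single covariate (SPL) and an ordinary least-squares line fitted to the control group only; the study also adjusts for f0 and cites robust methods, which this sketch does not reproduce. The function names and the numbers in the example are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correct_for_spl(spl, log_feature, slope, intercept):
    """Subtract the control-group model prediction from each observation."""
    return [f - (slope * s + intercept) for s, f in zip(spl, log_feature)]


if __name__ == "__main__":
    # Made-up values for illustration only; inputs are already log-transformed.
    control_spl = [60.0, 70.0, 80.0, 90.0]
    control_log_feature = [10.0, 15.0, 20.0, 25.0]
    patient_spl = [60.0, 70.0, 80.0, 90.0]
    patient_log_feature = [13.0, 18.0, 23.0, 28.0]

    # The model is fitted on the controls and then applied to both groups.
    slope, intercept = fit_line(control_spl, control_log_feature)
    control_corrected = correct_for_spl(control_spl, control_log_feature, slope, intercept)
    patient_corrected = correct_for_spl(patient_spl, patient_log_feature, slope, intercept)
    print(control_corrected, patient_corrected)
```

Because the model is fitted on controls alone, the corrected control values scatter around zero, and any offset that remains in the pathological group reflects a difference not explained by SPL.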

Acknowledgements
Víctor M. Espinoza acknowledges the support of CONICYT and Universidad de Chile.

References
[1] J. S. Perkell, E. B. Holmberg, and R. E. Hillman, “A system for signal processing and data extraction from aerodynamic, acoustic, and electroglottographic signals in the study of voice production,” The Journal of the Acoustical Society of America, vol. 89, no. 4, pp. 1777–1781, 1991.
[2] R. Maronna, R. Martin, and V. Yohai, Robust Statistics: Theory and Methods. John Wiley & Sons, Ltd, 2006.
[3] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning, G. Casella, S. Fienberg, and I. Olkin, Eds. Springer, 2015.
[4] R. R. Wilcox, Fundamentals of Modern Statistical Methods: Substantially Improving Power and Accuracy. Springer, 2010.
[5] R. E. Hillman, E. B. Holmberg, J. S. Perkell, M. Walsh, and C. Vaughan, “Objective assessment of vocal hyperfunction: An experimental framework and initial results,” J. Speech Hear. Res., vol. 32, pp. 373–392, 1989.
[6] P. Alku, T. Bäckström, and E. Vilkman, “Normalized amplitude quotient for parametrization of the glottal flow,” The Journal of the Acoustical Society of America, vol. 112, no. 2, pp. 701–710, 2002. [Online]. Available: http://scitation.aip.org/content/asa/journal/jasa/112/2/10.1121/1.1490365
[7] E. B. Holmberg, R. E. Hillman, and J. S. Perkell, “Glottal air-flow and transglottal air-pressure measurements for male and female speakers in soft, normal, and loud voice,” J. Acoust. Soc. Am., vol. 84, pp. 511–529, 1988.