Estimating dysphonia severity in continuous speech

Clinical Linguistics & Phonetics, November 2009; 23(11): 825–841

Estimating dysphonia severity in continuous speech: Application of a multi-parameter spectral/cepstral model

SHAHEEN N. AWAN¹, NELSON ROY², & CHRISTOPHER DROMEY³

¹Bloomsburg University of Pennsylvania, Bloomsburg, PA, USA, ²The University of Utah, Salt Lake City, UT, USA, and ³Brigham Young University, Provo, UT, USA

(Received 21 May 2009; accepted 6 August 2009)

Abstract
The purpose of the study was to identify a sub-set of spectral/cepstral-based analysis methods that would most effectively predict dysphonia severity (as estimated via auditory-perceptual analysis) in samples of continuous speech. Acoustic estimates of dysphonia severity were used as an objective treatment outcomes measure in a set of pre- vs post-treatment speech samples. Pre- and post-treatment continuous speech samples from 104 females with primary muscle tension dysphonia (MTD) were rated by listeners using a 100-point visual analogue scale (VAS) and analysed acoustically with spectral/cepstral-based measures. Stepwise linear regression produced a three-factor model, consisting of the cepstral peak prominence (CPP), the mean ratio of low-to-high frequency spectral energy, and the standard deviation of that ratio, that was strongly correlated with perceived dysphonia severity ratings (R = .85; R² = .73). Mean differences between predicted and perceived ratings for pre- and post-treatment speech samples were < 6 points on the 100-point VAS; mean absolute differences between predicted and perceived ratings were < 16 points on the 100-point VAS (equivalent to within one scale value on commonly used 7-point equal-appearing interval rating scales). A multi-parameter acoustic model consisting of spectral/cepstral-based measures shows considerable promise as an objective measure of dysphonia severity in continuous speech, even across the diverse voice types and severities observed in pre- and post-treatment MTD speech samples.

Keywords: Cepstrum, spectral analysis, continuous speech analysis, muscle tension dysphonia, auditory-perceptual ratings, dysphonia

Correspondence: Shaheen N. Awan, PhD, Department of Audiology & Speech Pathology, Bloomsburg University of PA, Centennial Hall, 400 East Second St., Bloomsburg, PA 17815-1301, USA. ISSN 0269-9206 print/ISSN 1464-5076 online © 2009 Informa UK Ltd. DOI: 10.3109/02699200903242988

Introduction
The objective evaluation of normal and disordered voice often includes acoustic measurements of sustained vowel productions that are related to the quality of the voice. Voice quality disruptions such as breathiness, roughness, and hoarseness have been reported to relate to measures such as jitter, shimmer, and harmonic-to-noise ratio (HNR). Jitter and shimmer are measures of short-term instability that quantify cycle-to-cycle variations in fundamental frequency and amplitude, respectively (Hartelius, Buder, and Strand, 1997). The HNR detects cycle-to-cycle differences in the profile of the voice signal and was initially designed as a method to quantify spectral noise. While several investigators have reported significant correlations between acoustic perturbation measures and voice quality categories (Takahashi and Koike, 1975; Yumoto, Gould, and Baer, 1982; Wolfe and Steinfatt, 1987; Martin, Fitch, and Wolfe, 1995; Wolfe and Martin, 1997), traditional perturbation measures depend on accurately identifying cycle boundaries (i.e. where a cycle of vibration begins and ends) along the time axis of the sound wave (i.e. time-based analyses). Unfortunately, significant noise in the voice signal makes these cycle onsets and offsets more difficult to locate accurately. This introduces error in tracking the periodic vibration of the voice signal and thus contributes to inaccuracy in perturbation measurements (Hillenbrand, 1987). As a result, some researchers have questioned the appropriateness, validity, and clinical usefulness of specific perturbation measures, especially when applied to moderately or severely disordered voices (Rabinov, Kreiman, Gerratt, and Bielamowicz, 1995; Bielamowicz, Kreiman, Gerratt, Dauer, and Berke, 1996). In addition to being unreliable with highly perturbed signals, measures such as jitter, shimmer, and HNR are valid only for sustained vowels produced with steady pitch and loudness: any purposeful change in vocal pitch or loudness will be measured as an increase in vocal perturbation, even though it may not reflect vocal abnormality. Sustained vowels, moreover, may not fully reflect the vocal characteristics of the patient.
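To make the dependence on cycle boundaries concrete, the sketch below computes the common "local" variants of jitter and shimmer: the mean absolute difference between consecutive cycles, expressed as a percentage of the mean. It is an illustration only, not the algorithm of any particular analysis package. It assumes the per-cycle periods and peak amplitudes have already been extracted; that extraction is exactly the step that becomes unreliable in noisy or non-stationary signals. The function name and the example measurements are hypothetical.

```python
import numpy as np

def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycles, as a % of the mean.

    Applied to cycle periods this gives local jitter; applied to per-cycle peak
    amplitudes it gives local shimmer. Both assume cycle boundaries are known.
    """
    v = np.asarray(values, dtype=float)
    if v.size < 2:
        raise ValueError("at least two cycles are required")
    return 100.0 * np.mean(np.abs(np.diff(v))) / np.mean(v)

# Hypothetical per-cycle measurements from a sustained vowel near 200 Hz.
periods_ms = [5.00, 5.06, 4.97, 5.02, 5.08, 4.95]
peak_amplitudes = [0.82, 0.79, 0.84, 0.80, 0.83, 0.78]

print(f"local jitter:  {local_perturbation_percent(periods_ms):.2f} %")
print(f"local shimmer: {local_perturbation_percent(peak_amplitudes):.2f} %")
```

Because every consecutive difference counts as instability, a purposeful pitch glide would raise the "jitter" value even in a healthy voice, which is one reason these measures are restricted to steady sustained vowels.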
Several authors have indicated that continuous/running speech may provide a more valid assessment of the patient’s control of vocal parameters such as vocal quality, and may correlate better with perceptions of dysphonia (Qi, Hillman, and Milstein, 1999; Halberstam, 2004; Laflen, Lazarus, and Amin, 2008; Maryn, Corthals, Van Cauwenberge, Roy, and De Bodt, in press). In some cases, the signs and symptoms of dysphonia may be most prominent to both patient and clinician during conversational speech (Yiu, Worrall, Longland, and Mitchell, 2000). In addition, continuous speech incorporates important vocal attributes such as rapid voice onset/termination and variations in fundamental frequency and amplitude, which may be highly relevant to the perception of dysphonia in everyday situations and to clinical decisions regarding the voice quality of the patient (Hammarberg et al., 1980; Parsa and Jamieson, 2001). Furthermore, in certain voice disorders (e.g. spasmodic dysphonia), sustained vowel productions are less symptomatic than connected speech and may lead to an under-estimation of the impairment (Roy, Gouse, Mauszycki, Merrill, and Smith, 2005). While several researchers have applied traditional time-based measures to the analysis of running speech (Klingholz, 1990; Hall and Yairi, 1992; Zhang and Jiang, 2008), perturbation measures such as jitter and shimmer would be expected to show artificially inflated values due to possible interactions with unvoiced segments and intonation patterns. In addition, jitter and shimmer could be adversely affected by the relatively short vowel segments of normal continuous speech, compared with the relatively long vowel samples (generally 1 second or greater in duration) typically analysed with traditional perturbation measures (Zhang and Jiang, 2008).
The use of spectral/cepstral methods in voice analysis
Rather than relying on time-based acoustic methods, which have questionable validity when applied to non-stationary signals, several authors have used spectral-based acoustic methods to analyse normal and disordered voice quality in running speech. In particular, cepstral analysis has been reported to show considerable promise as a measure of dysphonia in both sustained vowel and running speech contexts. Cepstral analysis was originally described by Noll (1964) as a procedure for extracting the fundamental frequency from the spectrum of a sound wave. The cepstrum has been described as a Fourier transform of the log power spectrum (Baken, 1987), and it can be used to display graphically the extent to which the vocal fundamental frequency is well defined and stands out from the background noise level. In contrast to traditional time-based measures such as jitter and shimmer, the principal advantage of spectral analysis methods is their capacity to estimate aperiodicity and/or additive noise without identifying individual cycle boundaries.
A number of studies have demonstrated the effectiveness of cepstral measures in quantifying dysphonic voice characteristics in sustained vowel productions. Hillenbrand, Cleveland, and Erickson (1994) observed that the relative amplitude of the dominant cepstral peak (the cepstral peak prominence, CPP) was among the strongest correlates of the severity of breathy voice quality, and that the relationship was inverse: greater perceived breathiness was associated with a lower relative amplitude of the cepstral peak. De Krom (1995) described a method of quantifying spectral noise based on comb-liftering of the cepstrum of vowel segments obtained from normal and dysphonic speakers. The comb-liftering procedure separates the cepstral peak (‘rahmonic’) amplitudes from the non-rahmonic samples. Results indicated that harmonics-to-noise ratios (HNRs) derived from the cepstrum were strong single predictors of severity in both rough and breathy voice samples, and were also strong contributors to multiple regression predictions of vocal severity ratings.
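As a minimal NumPy illustration of the definition above (not Noll’s procedure or any published implementation), the function below computes the cepstrum of one Hamming-windowed frame as the inverse DFT of its log (dB) power spectrum; because that spectrum is real and symmetric, using the inverse rather than the forward DFT changes only the scaling. The fundamental period is then read from the largest cepstral value between 1/300 s and 1/60 s. The function names, the synthetic test signal (40 harmonics of 200 Hz with 1/k amplitudes plus low-level noise), and the 25 kHz, 1024-point settings are assumptions chosen for the example.

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse DFT of the log (dB) power spectrum of a Hamming-windowed frame."""
    windowed = frame * np.hamming(len(frame))
    log_power_db = 10.0 * np.log10(np.abs(np.fft.rfft(windowed)) ** 2 + 1e-12)
    return np.fft.irfft(log_power_db, n=len(frame))

def cepstral_f0(frame, fs, fmin=60.0, fmax=300.0):
    """Locate the dominant rahmonic between 1/fmax and 1/fmin seconds; return f0 in Hz."""
    ceps = np.abs(real_cepstrum(frame))
    q_lo, q_hi = int(np.ceil(fs / fmax)), int(np.floor(fs / fmin))
    q_peak = q_lo + int(np.argmax(ceps[q_lo:q_hi + 1]))   # quefrency in samples
    return fs / q_peak

# Synthetic "voice": 40 harmonics of 200 Hz with 1/k amplitudes, plus weak noise.
fs, n, f0 = 25_000, 1024, 200.0
t = np.arange(n) / fs
voice = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 41))
noisy = voice + 0.01 * np.random.default_rng(0).standard_normal(n)
print(round(cepstral_f0(noisy, fs), 1))
```

A 200 Hz fundamental has a 5 ms period, so at 25 kHz the dominant rahmonic should appear about 125 samples along the quefrency axis. No cycle boundaries are identified at any point.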
In a study of male and female dysphonic voice samples, Wolfe, Martin, and Palmer (2000) reported strong correlations between measures derived from the cepstral peak prominence and degree of abnormality in female voices. Wolfe et al.’s (2000) results indicated that increased abnormality was associated with a decrease in overall CPP (i.e. lower harmonic energy) and an increase in high-frequency (> 2500 Hz) spectral energy. Similar findings were reported by Hartl, Hans, Vaissiere, Riquet, and Brasnu (2001), who observed that measures of the relative amplitude of the cepstral peak and ratios of low-to-high frequency spectral energy were sensitive to voice change in two patients before and after the onset of iatrogenic (i.e. treatment-induced) unilateral vocal fold paralysis. Recent studies by Awan and Roy (2005; 2006) have also demonstrated the effectiveness of multi-parameter models incorporating automatic cepstral analysis in predicting both severity of dysphonia and vocal quality type (normal, breathy, hoarse, rough). In the Awan and Roy (2005) study, discriminant analysis produced a multi-parameter model that correctly classified voice quality type with 79.9% accuracy in a diverse set of normal and disordered voices. In a second study (Awan and Roy, 2006), stepwise multiple regression analysis indicated that a similar multi-parameter model strongly predicted perceived severity of dysphonia (mean R = .88). In both studies, a cepstral-based measure was the most significant contributor to the prediction of dysphonia severity and type. Most recently, Awan and Roy (2009) assessed the ability of an acoustic model composed of both time- and spectral-based measures to track change in the sustained vowel characteristics of females with primary muscle tension dysphonia treated using manual circumlaryngeal therapy.
Results showed that acoustically predicted severity scores were strongly associated with perceived dysphonia severity ratings for pre-treatment, post-treatment, and change in dysphonia severity. These authors concluded that their acoustic model and predicted dysphonia severity scores showed promise as a sensitive and objective outcomes measure, even with extremely perturbed pre-treatment voice samples that would be difficult to analyse using traditional time-based perturbation measures.


Acoustic analysis of continuous/running speech
Several studies have also applied spectral/cepstral methods to the analysis of continuous speech segments. Hillenbrand and Houde (1996) reported strong correlations between perceptual ratings of breathiness and cepstral measures obtained from samples of the second sentence of ‘The Rainbow Passage’ (Fairbanks, 1960) in 20 breathy and five non-disordered speakers; in particular, measures of the cepstral peak prominence (CPP) correlated strongly with perceived breathiness. Because their study focused on the characteristics of breathy voice, Hillenbrand and Houde (1996) could not determine whether the measures would predict overall dysphonia ratings in a set of voices with a broader range of dysphonic qualities. Qi et al. (1999) used linear prediction analysis to estimate the signal-to-noise ratio (SNR) of continuous speech. With this method, the speech signal was decomposed into correlated/predictable components (signal) and uncorrelated/unpredictable components (noise). The procedure was applied to samples of ‘The Rainbow Passage’ obtained from 87 voice-disordered speakers. The voice samples were also perceptually rated using both a categorical scale (7 points, from 1 = normal to 7 = aphonic) and continuous direct magnitude scaling (a standard voice sample was assigned the number 100, and judges were free to rate voices as high or low as necessary in relation to this standard). Correlations with SNR were r = .76 for the categorical ratings and r = .78 for the continuous ratings. Parsa and Jamieson (2001) calculated the traditional time-based perturbation measures jitter and shimmer, as well as spectral-based measures of spectral tilt (ratios of low vs high frequency spectral energy) and a frequency-domain harmonic-to-noise ratio obtained via cepstral comb-liftering, for samples of ‘The Rainbow Passage’ from 53 normal and 175 voice-disordered patients.
Results indicated that the overall spectral tilt, the frequency-domain HNR, and a measure of spectral flatness yielded 96% accuracy in classifying normal vs disordered samples; the severity of dysphonia for these samples was not reported. Heman-Ackah, Michael, and Goding (2002) also reported that measures derived from the cepstral peak (in both continuous speech and sustained vowel samples) were the strongest individual correlates of ratings of overall dysphonia and breathiness. Cepstral measures were also significantly related to ratings of roughness, although the authors felt that too little variance in roughness ratings was accounted for to make them clinically applicable. In a subsequent study, Heman-Ackah, Heuer, Michael, Ostrowski, Horman, Baroody, Hillenbrand, and Sataloff (2003) applied measures of the cepstral peak prominence to 281 running speech samples (176 female; 105 male). The samples had also been rated for severity on an undifferentiated 100 mm scale (0 = normal; 100 = most abnormal). Results indicated an overall sensitivity of 87% and a specificity of 90% in detecting overall dysphonia from running speech. These authors concluded that cepstral measures obtained from running speech had better sensitivity, specificity, and positive and negative predictive values than time-based measures of perturbation. Later, Halberstam (2004) applied Hillenbrand and Houde’s (1996) measures of the cepstral peak prominence to 60 normal and disordered samples of ‘The Rainbow Passage’, which were rated on a 7-point severity scale. A strong correlation between the CPP and perceived hoarseness was observed. In addition, correlations between cepstral measures from speech and perceived dysphonia were stronger than those observed between perceptual judgements and sustained vowel measurements.
Halberstam (2004) concluded that cepstral measures appeared to be more valid measures of perceived hoarseness than traditional acoustic measures. Recently, Laflen et al. (2008) used spectral and cepstral analyses to examine the relative deviation of the fundamental frequency and the intensity of the cepstral peak in vowel, CVC, and continuous speech samples (‘How are you?’) obtained from 10 normal and 31 voice-disordered speakers. Results indicated that spectral/cepstral measures from connected speech samples effectively discriminated between normal and disordered samples and showed promise as a tool for the analysis of dynamic voice.

Statement of purpose
Previous studies with both sustained vowel and continuous speech samples have indicated that measures of the cepstral peak may be strong indicators of dysphonia and indices of dysphonia severity. In addition, several studies have indicated that cepstral measures may be supplemented by other acoustic measures (such as ratios of low vs high frequency spectral energy) to account for more of the variance in perceived severity ratings (Wolfe et al., 2000; Hartl et al., 2001; Awan and Roy, 2006; 2009). To date, no study has assessed the ability of these measures to track voice improvement in continuous speech following treatment. Voice clinicians need a robust, largely automatic measure of dysphonia severity in continuous speech that relates well to listener ratings and is sensitive both to diverse voice qualities and to improvement following management. Accordingly, the purpose of this study was to identify a sub-set of spectral/cepstral-based methods that could be used to (1) most effectively predict dysphonia severity (as estimated via auditory-perceptual analysis) in a set of pre- vs post-treatment samples of continuous speech; and (2) serve as a potential objective treatment outcomes measure in voice disorders.

Methodology
Participants
Pre- and post-therapy voice recordings were selected from an archival database of patients with muscle tension dysphonia (MTD) collected by the second author during routine clinical practice.
Participants were included in the database based upon their positive response to a single voice therapy session. The University of Utah Institutional Review Board approved the use of these voice samples and waived the requirement to obtain new consent from the participants. From this database, pre- and post-treatment voice samples from 104 women with primary MTD were identified for analysis (208 voice samples in total; mean participant age = 46.4 years; SD = 13.7). Primary MTD refers to a voice disturbance that occurs in the absence of structural or neural pathology and is characterized by a variety of voice qualities and severities. MTD may account for 10–40% of cases referred to multidisciplinary voice clinics (Bridger and Epstein, 1983; Koufman and Blalock, 1991; Schalen and Andersson, 1992; Sama, Carding, and Price, 2001). All participants received the diagnosis of primary MTD after a speech-language pathologist and an otolaryngologist specializing in voice disorders completed a full medical and voice history, as well as a transoral and/or nasendoscopic laryngeal examination. The examination revealed (a) a voice disturbance in the absence of any visible structural vocal fold pathology or mucosal disease, (b) no discernible neurological pathology, including vocal fold paresis, paralysis, and/or motor speech disturbances, (c) no previous laryngeal surgery, and (d) no coexisting upper respiratory infections at the time of the examination. Following diagnosis, treatment was completed in a single extended session using manual circumlaryngeal techniques. A detailed description of the assessment and treatment procedures is given in Roy and Bless (1998). Briefly, each participant underwent a case history, a traditional voice evaluation, and an assessment of musculoskeletal tension. Manual laryngeal reposturing manoeuvres and/or circumlaryngeal massage were used to stimulate improved voice.

Speech sample and acoustic analyses
As part of a standard clinical test battery, each participant was asked to read ‘The Rainbow Passage’ at a comfortable pitch and loudness. Voice samples were recorded using a Shure Prologue Model 14H Dynamic microphone (Shure Inc., Niles, IL) and digitized at 25 kHz with 16 bits of resolution using the Computerized Speech Lab (CSL Model 4300, Kay Elemetrics, Pinebrook, NJ). The speech samples were later edited to include only the 2nd and 3rd sentences (‘The rainbow is a division of white light into many beautiful colours. These take the shape of a long round arch with its path high above and its two ends apparently beyond the horizon’). All samples were analysed using a Windows-based computer program developed by the first author. The program implemented spectral and cepstral analysis methods (Hillenbrand and Houde, 1996; Awan and Roy, 2005; 2006; 2009) that have been used to characterize voice quality type and predict dysphonia severity in normal and disordered voice samples. All of the spectral/cepstral measures used in this study were obtained in a single program with a common set of core algorithms.
Unlike the algorithms used in previous studies by Awan and Roy (2005; 2006; 2009), which combined spectral/cepstral acoustic methods with time-based measures such as shimmer and pitch sigma, the algorithms used in this study are solely spectral-based and do not depend on identifying cycle boundaries for any of the measurements obtained. The continuous speech samples were analysed as follows:

(1) The speech sample was divided into a series of 1024-point overlapping frames (75% overlap). For each analysis frame, a Hamming window was applied and a 1024-point discrete Fourier transform (DFT) was computed. As described by Baken (1987), the DFT was then converted to the log power spectrum, followed by a second DFT. This procedure yields the cepstrum (essentially a Fourier transform of a Fourier transform). The cepstrum of a highly periodic signal is characterized by a prominent peak corresponding to the dominant rahmonic (i.e. the fundamental period) of the signal; the prominence of this peak has been referred to as the cepstral peak prominence (CPP; Hillenbrand et al., 1994; Hillenbrand and Houde, 1996).

(2) As described by Hillenbrand and Houde (1996), a combination of time and frequency averaging can smooth the cepstrum before the CPP is identified. In this study, 7-frame cepstral averaging was carried out, with each smoothed cepstral frame calculated as the average of the current frame with the three previous and three subsequent cepstral frames. Averaging across time was followed by 11-bin frequency (quefrency) averaging, in which each cepstral coefficient (i.e. the value at each quefrency bin along the abscissa of the cepstrum) was replaced by the average of the current coefficient with the five previous and five subsequent cepstral coefficients. Figure 1 provides an example of a smoothed cepstral frame.

(3) For each frame, several acoustic measures were computed. From the original unsmoothed window, a ratio of low- to high-frequency energy was calculated as a measure of spectral tilt (referred to as the DFT Ratio (DFTR) in Awan and Roy, 2005; 2006) using a 4000 Hz cut-off, and reported in decibels. From the smoothed cepstral frames, the ratio of the CPP to the expected amplitude of the CPP, as estimated via linear regression analysis, was computed. For the purposes of this study, the search for the cepstral peak was restricted to quefrencies of 3.3–16.7 ms (corresponding to 300–60 Hz; Hillenbrand and Houde, 1996).

(4) Once analysis was completed for all analysis windows, the mean DFTR and CPP, as well as their respective standard deviations (SDs), were calculated for the entire signal. In addition, pilot work had indicated that the mean regression line slope and its standard deviation in the quefrency region delimited by 2 ms (i.e. 500 Hz) may also be indicative of the severity of dysphonia. For the purpose of this study, the regression line slope for each smoothed cepstral frame was normalized by dividing the slope by the amplitude of the cepstral peak within that frame. Since the magnitude of the slope is often quite small compared with the cepstral peak, this ratio was reported in two ways: (a) as a decibel value and (b) as a direct ratio multiplied by a constant (10⁴) to make the resulting value more manageable. Because the various spectral/cepstral measures were to be averaged across relatively long samples of non-stationary running speech affected by vowel-to-consonant transitions and intonation, it was reasoned that measures of the average variability (standard deviation) of each key variable would be important to collect. Previous studies have indicated that measures of variability may be effective in characterizing the severity of the voice (Wolfe and Steinfatt, 1987; Callan, Kent, Roy, and Tasko, 1999; Awan and Roy, 2006; 2009). The computer program then displayed the varying CPP over time for the speech sample (see Figure 1) and saved all computed statistics in a data file.

Figure 1. Application of spectral/cepstral analysis methods to a continuous speech sample (the 1st and 2nd sentences from ‘The Rainbow Passage’). The upper window (‘A’) shows the varying CPP (dB) over time; the lower window (‘B’) shows the raw fundamental frequency (Hz) contour. Window ‘C’ provides an example of a smoothed cepstral frame, in which the cepstral peak has been automatically identified and a linear regression line has been computed for normalization of the CPP.
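The analysis steps above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the first author’s program: the cepstral magnitude is expressed as 20·log10|c|, the regression line used to normalize the peak is fitted to all quefrencies from 1 ms upward, edge frames and bins are averaged over whatever neighbours exist, and the DFTR is taken as the ratio of summed power below vs at or above 4 kHz. All function and variable names are hypothetical. At the 25 kHz sampling rate, each 1024-point frame spans 40.96 ms, and the 75% overlap corresponds to a 256-sample (10.24 ms) hop.

```python
import numpy as np

FS = 25_000       # sampling rate of the recordings (Hz)
FRAME = 1024      # frame length (samples), about 40.96 ms at 25 kHz
HOP = FRAME // 4  # 75% overlap, i.e. a 256-sample hop

def centred_mean(x, half):
    """Moving average over +/- `half` points; edge points average only the neighbours available."""
    kernel = np.ones(2 * half + 1)
    return np.convolve(x, kernel, mode="same") / np.convolve(np.ones_like(x), kernel, mode="same")

def spectral_cepstral_summary(signal, fs=FS, split_hz=4000.0, fit_from_s=0.001):
    """Mean and SD of a CPP-style measure and of a low/high spectral energy ratio (dB)."""
    n_frames = 1 + (len(signal) - FRAME) // HOP
    window = np.hamming(FRAME)
    freqs = np.fft.rfftfreq(FRAME, d=1.0 / fs)
    low, high = freqs < split_hz, freqs >= split_hz
    ceps_db, dftr = [], []
    for i in range(n_frames):
        power = np.abs(np.fft.rfft(signal[i * HOP:i * HOP + FRAME] * window)) ** 2 + 1e-12
        # Spectral tilt from the unsmoothed spectrum: energy below vs above the cut-off, in dB.
        dftr.append(10.0 * np.log10(power[low].sum() / power[high].sum()))
        # Second transform of the dB power spectrum gives the cepstrum; keep its magnitude in dB.
        c = np.fft.irfft(10.0 * np.log10(power), n=FRAME)[: FRAME // 2]
        ceps_db.append(20.0 * np.log10(np.abs(c) + 1e-12))
    ceps_db = np.array(ceps_db)

    # Time smoothing: each frame averaged with up to 3 frames on either side (7 frames).
    smoothed = np.array([ceps_db[max(0, i - 3): i + 4].mean(axis=0) for i in range(n_frames)])
    # Quefrency smoothing: each bin averaged with up to 5 bins on either side (11 bins).
    smoothed = np.array([centred_mean(row, 5) for row in smoothed])

    quefrency = np.arange(FRAME // 2) / fs  # seconds
    search = np.flatnonzero((quefrency >= 1 / 300) & (quefrency <= 1 / 60))  # 3.3-16.7 ms
    fit = quefrency >= fit_from_s
    cpp = []
    for row in smoothed:
        slope, intercept = np.polyfit(quefrency[fit], row[fit], 1)
        k = search[np.argmax(row[search])]
        # Peak height above the regression line: a dB difference, i.e. a ratio in linear terms.
        cpp.append(row[k] - (slope * quefrency[k] + intercept))
    cpp, dftr = np.array(cpp), np.array(dftr)
    return {"CPP": cpp.mean(), "CPP_SD": cpp.std(ddof=1),
            "DFTR": dftr.mean(), "DFTR_SD": dftr.std(ddof=1)}

# Synthetic 0.5 s "voice": 40 harmonics of 200 Hz with 1/k amplitudes, plus noise.
t = np.arange(int(0.5 * FS)) / FS
voice = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 41))
rng = np.random.default_rng(1)
clean = spectral_cepstral_summary(voice + 0.01 * rng.standard_normal(t.size))
noisy = spectral_cepstral_summary(voice + 0.5 * rng.standard_normal(t.size))
print(clean)
print(noisy)
```

On these synthetic signals the cleaner one should show the higher mean CPP. The absolute values depend on the smoothing and regression choices assumed here and should not be expected to match values from the study’s program.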

As in Hillenbrand and Houde (1996), no purposeful attempt was made to separate unvoiced from voiced segments of the ‘The Rainbow Passage’ samples. However, pilot work indicated that removing analysis frames with normalized CPP values < 0 dB helped to eliminate low-amplitude signals often associated with breath sounds and/or portions of unvoiced consonants, without excessively removing segments of intended voicing in severely dysphonic samples.

Auditory-perceptual severity ratings
Judges were five master’s degree students in communication disorders who had been exposed to disordered voice samples during their coursework. They were not experienced in treating dysphonia and did not receive extensive training for the rating task. The judges rated the pre- and post-treatment continuous speech samples (the 2nd and 3rd sentences of ‘The Rainbow Passage’) from 104 speakers (208 samples in total) via headphones at self-selected comfortable loudness levels. All samples were presented in randomized order for each judge. Judges rated the severity of voice quality using a custom MATLAB (The MathWorks Inc., Natick, MA) routine that allowed the user to move an on-screen slider to a position on a 100-point visual analogue scale (VAS). One end of the scale was labelled ‘normal’ and the other ‘profoundly abnormal’, with higher numbers reflecting greater severity of dysphonia. Inter-judge reliability for the dysphonia severity ratings was assessed using the intra-class correlation coefficient (ICC; McGraw and Wong, 1996), a measure of the degree of consistency among judges. The ICC may be used to assess inter-judge reliability when more than two judges are involved and interval/ratio data are obtained (Sheskin, 2004). Results indicated an average-measures ICC = .97 (95% confidence interval .96–.98) and a single-measure ICC = .87 (95% confidence interval .84–.89) for the rating of dysphonia severity.
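The two ICC forms reported here can be computed from a samples × judges matrix of ratings. The sketch below assumes the two-way consistency definitions of McGraw and Wong (1996), ICC(C,1) for a single measure and ICC(C,k) for the average of k judges, and uses simulated ratings; the function name and data are hypothetical, and the software actually used for the study’s ICCs may differ in detail.

```python
import numpy as np

def icc_consistency(ratings):
    """Two-way consistency ICCs for an (n samples x k judges) matrix of ratings.

    Returns (single_measure, average_measures), i.e. ICC(C,1) and ICC(C,k)
    in McGraw and Wong's (1996) notation.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_samples = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between-sample variation
    ss_judges = n * np.sum((x.mean(axis=0) - grand) ** 2)   # judge offsets (ignored for consistency)
    ss_error = np.sum((x - grand) ** 2) - ss_samples - ss_judges
    ms_samples = ss_samples / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    single = (ms_samples - ms_error) / (ms_samples + (k - 1) * ms_error)
    average = (ms_samples - ms_error) / ms_samples
    return single, average

# Hypothetical VAS ratings: 8 samples rated by 5 judges with individual offsets and noise.
rng = np.random.default_rng(0)
true_severity = rng.uniform(0, 100, size=8)
ratings = true_severity[:, None] + rng.normal(0, 5, size=(8, 5)) + np.array([0, 4, -3, 2, -1])
print(icc_consistency(ratings))
```

The average-measures form is the Spearman–Brown step-up of the single-measure form, so for five judges a single-measure ICC of .87 implies 5 × .87 / (1 + 4 × .87) ≈ .97, consistent with the two values reported above.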
The ICC results indicated that the judges could differentiate between levels of dysphonia severity and that the average rating scores of the five judges were highly reliable, despite individual differences in rating severity. In addition to the ICC analyses, the mean inter-judge correlation (mean Pearson’s r = .87; SD = .03) was considered strong and acceptable for the purposes of this study.

Results
Statistical analyses were computed using SPSS v.15.0 (SPSS Inc., Chicago, IL) and SigmaPlot 10.0 for Windows (Systat Software Inc., San Jose, CA). Before the multiple regression analyses were computed, Pearson correlations were calculated among the independent variables to identify highly inter-correlated variables that could produce multicollinearity in the regression. Results indicated that the mean CPP, CPP SD, normalized slope (both in dB and as a direct ratio), and slope SD (in dB) were highly inter-correlated (r’s > .90). Although the CPP, CPP SD, normalized slope (in dB and as a direct ratio × 10⁴), and slope SD (in dB) were all strong individual correlates of perceived dysphonia severity (r’s ≥ .80), only the CPP was retained, in light of its extensive use and reporting in the extant literature. Across all 104 pairs of pre- and post-treatment voice samples (208 samples in total), the remaining variables combined in a three-factor model composed of CPP, DFTR SD, and DFTR that correlated with mean perceived severity at R = .85 (R² = .73; see Figure 2). Table I gives the step-wise order in which the three acoustic variables entered the multiple regression equation; each additional variable produced a significant change in R² and F. The standardized beta coefficients indicated that the CPP was the strongest contributor to the three-factor predictive model, and it was also the strongest individual correlate of listener-perceived dysphonia severity (r = .81; r² = .66). While previous studies have cross-validated the R² value via data splitting (Awan and Roy 2006; 2009), the adjusted R² calculated with Stein’s equation (Stevens, 1992; Field, 2005) also indicates how much variance would be accounted for if the model were applied to other samples. The adjusted R² was .72 for the predictive model across all 208 voice samples, indicating that the loss of predictive power when the model is applied to other samples would be minimal.
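Stein’s formula, in the form given by Field (2005), shrinks R² according to the number of cases n and the number of predictors k. A short sketch (function name hypothetical) with the values reported here, n = 208, k = 3, and R² = .73, reproduces the adjusted value of about .72:

```python
def stein_adjusted_r2(r2, n, k):
    """Stein's shrinkage estimate of R^2 for new samples, given n cases and k predictors."""
    shrink = ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n)
    return 1.0 - shrink * (1.0 - r2)

print(round(stein_adjusted_r2(0.73, n=208, k=3), 2))  # prints 0.72
```

With 208 cases and only three predictors, the shrinkage factor is close to 1 (about 1.035), which is why the loss of predictive power is small.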

[Figure 2 plot: vertical axis, Acoustically Predicted Rating (0.00–100.00); horizontal axis, Listener Perceived Rating (0.00–100.00); R² = 0.73]

Figure 2. Scatterplot of listener perceived dysphonia ratings vs acoustically predicted dysphonia ratings from continuous speech. Listener perceived ratings were obtained using a 100-point visual analogue scale.

Table I. The relative contribution of each acoustic variable to the multiple regression analysis. Multiple correlation R, R², and change in R² values are provided.

Acoustic variable   R     R²    R² change   F change   df       p        Standardized beta coefficient
CPP                 .81   .66   .66         402.95     1, 206   < .001   −.60
DFTR SD             .84   .71   .05         34.80      1, 205   < .001   −.21
DFTR                .85   .73   .02         11.72      1, 204   .001     −.17

CPP, the ratio of the amplitude of the cepstral peak prominence to the expected cepstral amplitude; DFTR, the discrete Fourier transformation ratio; SD, standard deviation; R, multiple correlation; df, degrees of freedom.
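The F-change column follows from the R² values at each step. A minimal sketch of the standard F test for an increment in R² (function name hypothetical) is shown below. Applied to the rounded R² values in Table I, it reproduces the degrees of freedom exactly, but gives F values of roughly 400, 35, and 15 rather than the published 402.95, 34.80, and 11.72, because the tabled R² values are rounded to two decimals.

```python
def f_change(r2_new, r2_old, n, k_new, q=1):
    """F statistic for the R^2 increase when q predictors are added (k_new predictors in total).

    Returns (F, (df1, df2)), with df2 = n - k_new - 1 residual degrees of freedom.
    """
    df2 = n - k_new - 1
    f = ((r2_new - r2_old) / q) / ((1.0 - r2_new) / df2)
    return f, (q, df2)

# Rounded R^2 values from Table I (n = 208 voice samples).
steps = [("CPP", 0.66, 0.00), ("DFTR SD", 0.71, 0.66), ("DFTR", 0.73, 0.71)]
for k, (name, r2_new, r2_old) in enumerate(steps, start=1):
    f, df = f_change(r2_new, r2_old, n=208, k_new=k)
    print(f"{name:8s} F change = {f:7.2f}, df = {df}")
```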

The following multiple regression equation was used to calculate a predicted dysphonia severity rating:

$$\text{Predicted continuous speech dysphonia severity} = 154.59 - (\text{CPP} \times 10.39) - (\text{DFTR SD} \times 3.71) - (\text{DFTR} \times 1.08)$$
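The equation can be written as a one-line function. The sketch below (function name hypothetical) evaluates it at the pre- and post-treatment group means of the three predictors from Table II. Because the equation is linear, the prediction at the group means equals the mean of the individual predictions, so the results (about 63.45 and 20.06) agree with the reported mean predicted ratings of 63.47 and 20.07 to within the rounding of the tabled means.

```python
def predicted_severity(cpp_db, dftr_sd_db, dftr_db):
    """Predicted continuous-speech dysphonia severity on the 100-point VAS metric.

    Higher CPP, DFTR SD, and DFTR all lower the predicted severity. The output is
    not clipped to 0-100.
    """
    return 154.59 - 10.39 * cpp_db - 3.71 * dftr_sd_db - 1.08 * dftr_db

pre = predicted_severity(2.73, 11.01, 20.30)   # Table II pre-treatment means
post = predicted_severity(5.86, 12.57, 25.01)  # Table II post-treatment means
print(f"pre = {pre:.2f}, post = {post:.2f}")
```

Nothing in the equation limits the output to the 0–100 VAS range, which is consistent with the maximum predicted rating of 116.05 reported in Table III.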

Using this formula, a predicted dysphonia severity was computed for each voice sample. The mean predicted dysphonia severity rating across all 208 voice samples was 41.77 (SD = 30.00) vs the mean perceived severity rating of 41.78 (SD = 35.20). A paired t-test indicated no significant difference between mean predicted and mean perceived dysphonia severity ratings (t = .01; df = 207; p = .97). The mean absolute difference between predicted and perceived ratings across all 208 voice samples was 14.79 (SD = 10.91).

Pre- and post-treatment severity estimation
Predicted dysphonia severity ratings were computed for all 104 pre-treatment voice samples. Results showed a mean pre-treatment predicted dysphonia severity rating of 63.47 (SD = 27.02) vs the mean pre-treatment perceived rating of 69.27 (SD = 26.83). A paired t-test indicated a significant difference between mean predicted and mean perceived pre-treatment dysphonia severity ratings (t = 3.24; df = 103; p = .002). The mean absolute difference between predicted and perceived ratings for the 104 pre-treatment voice samples was 15.12 (SD = 11.73). In a similar fashion, post-treatment severity ratings were computed for the 104 post-treatment samples. Results showed a mean post-treatment predicted dysphonia severity rating of 20.07 (SD = 11.31) vs the mean post-treatment perceived rating of 14.28 (SD = 15.61). A paired t-test indicated a significant difference between mean predicted and mean perceived post-treatment dysphonia severity ratings (t = 3.54; df = 103; p = .001). The mean absolute difference between predicted and perceived ratings for the 104 post-treatment voice samples was 14.46 (SD = 10.07).

Estimation of pre- vs post-treatment change
Paired t-tests were computed to determine whether significant changes in each of the three components (CPP, DFTR, and DFTR SD) of the predictive dysphonia severity model were observed between the pre- and post-treatment voice samples.
Results (see Table II) indicated significant differences in all pre- vs post-treatment comparisons, with significant increases in all variables following treatment.

Table II. Pre- vs post-treatment means and standard deviations (in parentheses) for the three acoustic variables included in the predictive dysphonia severity model for continuous speech.

Variable        Pre-treatment    Post-treatment    Paired t results       p-value
CPP (dB)        2.73 (1.61)      5.86 (0.87)       t = 21.50, df = 103    < .001
DFTR SD (dB)    11.01 (2.16)     12.57 (1.37)      t = 7.33, df = 103     < .001
DFTR (dB)       20.30 (5.45)     25.01 (4.19)      t = 9.13, df = 103     < .001

CPP, cepstral peak prominence, the ratio of the amplitude of the cepstral peak to the expected cepstral amplitude; DFTR, the discrete Fourier transformation ratio; SD, standard deviation.

Table III. Mean listener-perceived severity ratings and acoustically predicted severity ratings for pre- and post-treatment continuous speech samples. Pre- vs post-treatment change is also provided, with a negative number indicating a reduction in severity. Standard deviations are provided in parentheses.

                        Pre-treatment           Post-treatment          Paired t results        Pre- vs post-change**
Mean listener rating    69.27 (26.83)           14.28 (15.61)           t = 19.51, df = 103*    −54.99 (28.75)
                        Range: 8.62–100.00      Range: 0.00–72.86
Mean predicted rating   63.47 (27.03)           20.07 (11.31)           t = 16.44, df = 103*    −43.40 (26.92)
                        Range: 7.09–116.05      Range: 6.09–55.20

* p < .001. ** A paired t-test indicated a significant difference between listener-perceived pre-post change in dysphonia severity and acoustically predicted change (t = 6.18; df = 103; p < .001).

In addition, a series of paired t-tests was computed to
ascertain whether significant differences existed between (a) pre- vs post-treatment mean perceived severity ratings and (b) pre- vs post-treatment predicted severity ratings. Significant results were observed in both instances, with post-treatment mean perceived severity and post-treatment predicted values both significantly lower than pre-treatment observations (see Table III). In addition, treatment change scores were computed as the difference between post-treatment and pre-treatment ratings. Results indicated a mean treatment change of −54.99 (SD = 28.75) in perceptual dysphonia severity vs −43.40 (SD = 26.93) in predicted dysphonia severity (the negative sign indicates a reduction in dysphonia severity; see Table III). Figure 3 provides the distributions of perceived vs predicted dysphonia severity ratings for the pre- and post-treatment continuous speech samples. The distributions indicate a tendency for perceived ratings to be negatively skewed in the pre-treatment samples and positively skewed in the post-treatment samples, reflecting a possible ‘end effect’ in perceptual ratings. In contrast, the predicted ratings more closely approach normal distributions for both pre- and post-treatment samples.

Figure 3. Distributions of perceptual and predicted dysphonia severity ratings for 104 pre- vs 104 post-treatment continuous speech samples (panels: perceived severity pre-Tx, perceived severity post-Tx, predicted severity pre-Tx, predicted severity post-Tx; y-axis: dysphonia severity, 0–100).

Discussion

The application of spectral/cepstral measures via a multi-parameter model shows considerable promise as an objective measure of dysphonia severity in continuous speech samples. The results confirmed that robust predictions of listener-perceived dysphonia severity can be achieved across the range of voice qualities typically observed in pre- and post-treatment speech samples of primary MTD. The CPP, as a single measure, was a very strong correlate of perceived dysphonia severity (r = .81), accounting for ≈66% of the variance in listener-perceived dysphonia severity. However, adding other measures obtained via the same spectral/cepstral analysis procedures significantly increased the shared variance to 73%. Since all of these measures can be obtained efficiently via a common core of spectral/cepstral analysis procedures, it appears advisable to combine the CPP with measures such as the DFTR and the DFTR SD to strengthen dysphonia predictions across a wide range of dysphonia severities and types. As reported in a number of previous studies with both sustained vowels and continuous speech, the cepstral peak prominence (CPP) was the strongest single predictor of dysphonia severity and had an inverse relationship with perceived dysphonia severity. Post-treatment increases in CPP indicated an increase in the amplitude of the dominant rahmonic, corresponding to the fundamental frequency, relative to the other components of the cepstrum. In normal or near-normal speech samples, the cepstral peak amplitude is relatively large compared with the expected cepstral peak amplitude estimated via linear regression. However, as dysphonia severity increases, the cepstral peak amplitude decreases and the amplitudes of other cepstral coefficients increase; the result is a reduction in the CPP. In addition to pre-/post-treatment changes in the CPP, a ratio of low- vs high-frequency spectral energy (DFTR) was observed to contribute to the prediction of dysphonia
severity. This measure may be particularly important in predicting the severity of breathy voice, which has been characterized by spectral noise especially noticeable above 2–3 kHz (Hillenbrand and Houde, 1996; Wolfe et al., 2000; Hartl et al., 2001; Awan and Roy, 2006; 2009). As dysphonia severity decreased post-treatment, high-frequency spectral energy likely decreased, particularly for patients who originally presented with breathy/hoarse voice quality. Although all three variables in the final predictive model (CPP, DFTR SD, DFTR) differed significantly from pre- to post-treatment (see Table II), the direction of change for the DFTR SD variable was somewhat unexpected. Whereas variability in sustained voice signal characteristics such as fundamental frequency and amplitude is generally expected to be reduced in normal or near-normal productions (Callan et al., 1999; Awan and Roy, 2005; 2006), the DFTR SD increased significantly in post-treatment continuous speech samples. A possible explanation for the increased DFTR SD in post-treatment speech samples may relate to the effect of transitions from consonant to vowel and from vowel to consonant. A review of specific CV and VC transitions in many of our samples indicated that, in normal and near-normal voices, there was often a clear distinction and transition between aperiodic or mixed aperiodic/periodic signals (as observed in true consonant productions) and highly periodic vowel productions. These transitions increase the variability of the CPP and DFTR and, hence, the DFTR SD. In contrast, voices with greater dysphonia severity tend to be consistently unstable and, therefore, do not transition as effectively to or from quasi-periodic vowel production.
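The internals of the analysis program are not described in this section. The numpy sketch below therefore illustrates, under stated assumptions, how a per-frame CPP and a low/high spectral energy ratio in the spirit of the DFTR could be computed, and how the mean ratio and its SD across frames (analogous to the DFTR and DFTR SD discussed above) summarize a sample. The function names, the Hamming window, the 60–300 Hz F0 search range, the 1 ms start of the regression trend, the 1024-sample frames and the 4 kHz band boundary are all assumptions of this sketch. Its dB values are not calibrated to the authors' program, so the CPP ≥ 0 dB frame filter (mirroring the exclusion rule described in the Discussion) is only nominal here.

```python
import numpy as np


def frame_measures(frame, fs, f0_range=(60.0, 300.0), split_hz=4000.0):
    """Return (CPP in dB, low/high spectral energy ratio in dB) for one frame.

    All analysis settings are assumptions of this sketch, not the paper's.
    """
    if split_hz >= fs / 2:
        raise ValueError("split_hz must lie below the Nyquist frequency")
    frame = np.asarray(frame, dtype=float)
    n = frame.size
    power = np.abs(np.fft.rfft(frame * np.hamming(n))) ** 2 + 1e-12
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # Energy ratio: total power below split_hz vs at/above it, in dB.
    ratio_db = 10.0 * np.log10(power[freqs < split_hz].sum() / power[freqs >= split_hz].sum())

    # Real cepstrum of the dB log-power spectrum, itself expressed in dB.
    ceps = np.fft.irfft(10.0 * np.log10(power), n=n)
    ceps_db = 10.0 * np.log10(ceps ** 2 + 1e-12)
    quef = np.arange(n) / fs  # quefrency axis in seconds

    # Cepstral peak within the quefrency range implied by the F0 search range.
    search = np.flatnonzero((quef >= 1.0 / f0_range[1]) & (quef <= 1.0 / f0_range[0]))
    peak = search[np.argmax(ceps_db[search])]

    # Expected amplitude at the peak quefrency from a linear regression trend line.
    trend = (quef >= 0.001) & (np.arange(n) < n // 2)
    slope, intercept = np.polyfit(quef[trend], ceps_db[trend], 1)
    cpp_db = ceps_db[peak] - (slope * quef[peak] + intercept)
    return cpp_db, ratio_db


def summarize(signal, fs, frame_len=1024, cpp_floor_db=0.0):
    """Mean CPP, mean ratio and SD of the ratio over frames with CPP >= cpp_floor_db."""
    signal = np.asarray(signal, dtype=float)
    kept = []
    for start in range(0, signal.size - frame_len + 1, frame_len):
        cpp, ratio = frame_measures(signal[start:start + frame_len], fs)
        if cpp >= cpp_floor_db:  # nominal analogue of the CPP < 0 dB exclusion
            kept.append((cpp, ratio))
    if not kept:
        raise ValueError("no frames passed the CPP threshold")
    cpps, ratios = np.array(kept).T
    return {"cpp_mean": cpps.mean(), "ratio_mean": ratios.mean(),
            "ratio_sd": ratios.std(ddof=1) if ratios.size > 1 else 0.0,
            "frames": len(kept)}


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    t = np.arange(fs) / fs  # one second of signal
    # Synthetic "voiced" signal: harmonics of 200 Hz up to 3.8 kHz plus faint noise.
    voiced = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 20))
    voiced = voiced + 0.001 * rng.normal(size=t.size)
    noise = rng.normal(size=t.size)  # aperiodic, breath-like signal
    print(summarize(voiced, fs))
    print(summarize(noise, fs))
```

On these synthetic signals the periodic input yields a much higher CPP and energy ratio than white noise. Real speech alternates periodic vowels with aperiodic consonants, and it is that frame-to-frame alternation that would raise the SD of the ratio across frames.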
In a number of dysphonic voice samples, the production of ‘noise’ was relatively constant, regardless of whether the patient was attempting vowel or true consonant production. The result was a reduced DFTR SD for more severely dysphonic voices (as observed in the pre-treatment samples). While many traditional voice quality measures (jitter, shimmer, etc.) and previous applications of cepstral analysis have been applied to the central portions of sustained vowels, the transitions from consonant (particularly unvoiced) to vowel and vice versa may present a valuable area of investigation for characterizing normal vs disordered voice, one that is only possible with segments of continuous speech. The results of this study indicate that spectral/cepstral methods are applicable to highly dysphonic signals in which cycle boundaries are not clearly delineated, and can adequately predict dysphonia severity from the relatively short vowel segments characteristic of continuous speech. Because spectral/cepstral methods are applicable even to severely dysphonic speech samples, the current results indicate that these methods can be used as a reliable measure of pre-/post-treatment change. Methods that cannot measure pre-treatment dysphonia severity because of highly perturbed acoustic voice signals may not be effective as treatment outcomes measures, because a valid pre-treatment estimate of dysphonia is unobtainable. It should be noted that the acoustic predictions of dysphonia severity in this study tended to under-estimate pre-treatment perceived dysphonia severity and over-estimate post-treatment perceived severity (in both cases by an average of ≈6 points on the 100 point VAS), and produced a mean absolute difference of ≈15 points relative to perceptual ratings.
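The ≈6-point biases and the change scores in Table III follow directly from the group means, because for paired data the mean of the differences equals the difference of the means. The short Python check below uses only the Table III means; the variable names are illustrative.

```python
# Mean ratings from Table III (100-point VAS units).
perceived = {"pre": 69.27, "post": 14.28}
predicted = {"pre": 63.47, "post": 20.07}

# Signed bias of the acoustic model in each phase (predicted minus perceived):
# negative = under-estimation, positive = over-estimation.
bias = {phase: round(predicted[phase] - perceived[phase], 2) for phase in perceived}

# Treatment change scores (post minus pre); negative values mean reduced severity.
change = {
    "perceived": round(perceived["post"] - perceived["pre"], 2),
    "predicted": round(predicted["post"] - predicted["pre"], 2),
}

print(bias)    # {'pre': -5.8, 'post': 5.79}
print(change)  # {'perceived': -54.99, 'predicted': -43.4}
```

The opposite signs of the two biases are why the predicted change (−43.40) is smaller in magnitude than the perceived change (−54.99).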
Although acoustically predicted pre- and post-treatment severity estimates differed significantly from the pre- and post-treatment listener-perceived severity ratings, the mean observed differences would fall within one scale value if the 100 point VAS were converted to a 7-point equal-appearing interval (EAI) rating scale (incorporating six increments, each spanning 100/6 ≈ 16.7 VAS points) typical of perceptual ratings of voice.
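A 7-point EAI scale has six increments, so one scale step corresponds to 100/6 ≈ 16.7 VAS points. The sketch below assumes a simple linear mapping from the VAS to the EAI scale; the paper does not specify one, and `vas_to_eai` is a hypothetical helper. It also expresses the reported mean absolute differences in EAI increments.

```python
VAS_MAX = 100.0
EAI_POINTS = 7
STEP = VAS_MAX / (EAI_POINTS - 1)  # ≈ 16.67 VAS points per EAI increment


def vas_to_eai(vas: float) -> int:
    """Map a 0-100 VAS rating to the nearest point on a 1-7 EAI scale (assumed linear mapping)."""
    if not 0.0 <= vas <= VAS_MAX:
        raise ValueError("VAS rating must lie in [0, 100]")
    return 1 + round(vas / STEP)  # Python's round() sends exact ties to the even integer


# Mean absolute differences between predicted and perceived ratings from the Results.
for label, mad in [("all samples", 14.79), ("pre-treatment", 15.12), ("post-treatment", 14.46)]:
    print(f"{label}: {mad / STEP:.2f} EAI increments")
```

All three values come out just under one increment (about 0.87 to 0.91), which is the sense in which the differences fall "within one scale value".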

Perceptual vs predicted ratings of dysphonia

The results of this study show that strong predictions of dysphonia severity can be achieved from continuous speech samples across the wide range of severities observed in pre- and post-treatment voices. However, there are a number of reasons why auditory-perceptual ratings and acoustic predictions of dysphonia severity may not always coincide:

(1) Inspection of Figure 3 reveals that under-/over-estimations of perceived ratings may be due to an ‘end effect’ in listener ratings, with a tendency for judges to rate severely dysphonic samples with a score of 100 points and samples with very little, if any, discernible quality disruption with a score of 0 points. In contrast, acoustically predicted severity scores are more normally distributed and do not cluster at the extremes of the severity scale. These differences in distribution between perceived and predicted ratings may result in the reported under-/over-estimations of dysphonia severity, and contribute to the mean absolute differences between predicted and perceived severity ratings reported in this study.

(2) It is important to recall that the acoustically predicted ratings are not bounded by ‘normal’ (0 points) or ‘profoundly abnormal’ (100 points) anchors. The predictive equation generated via multiple regression may produce scores that are negative for highly periodic voice signals or above 100 points for essentially aphonic samples (see Table III). This difference in scaling may also influence the strength of correlation between perceived and predicted dysphonia severity ratings. However, because the acoustically predicted dysphonia ratings have no definitive boundaries, they may be effective in revealing very subtle but important changes in extreme voice signals (profoundly abnormal or near-normal voices) that some listeners may find difficult to perceive.

(3) Even though strong inter-judge correlation was obtained in the perceptual judgements, the mean severity ratings obtained from these judges have inherent variability (Kreiman, Gerratt, Kempster, Erman, and Berke, 1993; de Krom, 1994; Orlikoff, Dejonckere, Dembowski, Fitch, Gelfer, Gerratt, Haskell, Kreiman, Metz, Schiavetti, Watson, and Wolfe, 1999) that influences the strength of correlation with acoustic measures. When computing correlations, variability in one or both of the variables reduces the strength of correlation and, therefore, predictive accuracy.

(4) Acoustic analyses such as those described in this paper focus on the measurement of particular signal characteristics; in contrast, even experienced judges may be distracted by the intrusive effects of voice and speech characteristics other than the quality dimension that is meant to be judged (Kreiman et al., 1993; Orlikoff et al., 1999). This is another factor that may add variability to perceptual judgements and affect the strength of any computed correlation between perceptual and predicted ratings.

(5) As mentioned in the methodology section, signals with CPP values < 0 dB were omitted from analysis as a means of removing extraneous background noise and consonant production. However, this procedure also occasionally omitted periods of aphonia from analysis. While most voice signals may be analysed adequately using automatic procedures, the CPP threshold for certain signals may have to be adjusted manually to include all attempts at voicing in the speech sample under analysis.
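The ‘end effect’ described above shows up as skewness: ratings that pile up at the 100-point ceiling produce a negatively skewed distribution, and ratings that pile up at the 0-point floor produce a positively skewed one. A minimal moment-based skewness sketch follows, applied to hypothetical ratings (not study data). Statistics packages often report a small-sample-adjusted variant, which has the same sign.

```python
from statistics import fmean


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population form)."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical ratings: pre-treatment piling up at the 100-point ceiling,
# post-treatment piling up at the 0-point floor.
pre_ratings = [100, 100, 100, 100, 95, 85, 70, 50, 30]
post_ratings = [0, 0, 0, 0, 5, 10, 20, 35, 60]
print(f"pre skew:  {skewness(pre_ratings):+.2f}")   # negative
print(f"post skew: {skewness(post_ratings):+.2f}")  # positive
```

Because the regression equation is unbounded, predicted scores have no ceiling or floor at which to pile up, consistent with their closer-to-normal distributions in Figure 3.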

The current study focused on analysis of female voices because, as is the case in many multidisciplinary voice clinics, the majority of patients seeking help for voice difficulties are female. The high prevalence of voice disorders among females in treatment-seeking and non-treatment-seeking populations has been confirmed by Coyle, Weinrich, and Stemple (2001) and Roy, Merrill, Gray, and Smith (2005), respectively. However, future studies should further investigate the ability of spectral/cepstral analysis methods to provide estimates of dysphonia severity in other normal (male speakers and children) and disordered populations. In addition, the possible effects of speech sample elicitation parameters (e.g. sample type, vocal intensity, and speaking rate), as well as computer algorithm parameters such as the size of the spectral analysis window, should be explored so that the reliability of voice analysis from continuous speech segments can be confirmed.

Conclusion

A valid description of the disordered voice must include an assessment of the patient’s performance during continuous speech. While, to date, this assessment has been conducted primarily by auditory-perceptual means, the results of this study confirm that strong predictions of listener-perceived dysphonia severity can be achieved from continuous speech samples analysed with an acoustic model composed of spectral and cepstral measures (CPP, DFTR, DFTR SD). While any quantitative method of tracking severity will differ to some degree from the actual process of auditory perception, the automatic speech/voice analysis methods described and evaluated in this study can provide an objective method of documenting the effects of dysphonia in continuous speech pre- and post-therapy.
Because the procedures described in this study are compatible with most computer systems used in clinical settings, they can provide valuable, objective information for the clinician, patient, and other stakeholders regarding the effects of intervention. As such, the results of this study represent an important advance in the development of an objective treatment outcomes measure that is sensitive to change in voice during continuous speech.

Acknowledgements

The authors of this paper would like to thank Dr J. Hillenbrand for his helpful comments and suggestions on a previous version of this manuscript. Dr S. N. Awan has an agreement with KayPentax (Lincoln Park, NJ) regarding the future development of computer software including cepstral analysis of continuous speech algorithms. Please contact Dr S. N. Awan ([email protected]) regarding use of the computer program described in this manuscript.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this paper.

References

Awan, S. N., & Roy, N. (2005). Acoustic prediction of voice type in adult females with functional dysphonia. Journal of Voice, 19, 268–282.
Awan, S. N., & Roy, N. (2006). Toward the development of an objective index of dysphonia severity: a four-factor model. Clinical Linguistics & Phonetics, 20, 35–49.

Awan, S. N., & Roy, N. (2009). Outcomes measurement in voice disorders: application of an acoustic index of dysphonia severity. Journal of Speech, Language, and Hearing Research, 52, 482–499.
Baken, R. J. (1987). Clinical Measurement of Speech and Voice. Boston, MA: Little, Brown and Co.
Bielamowicz, S., Kreiman, J., Gerratt, B. R., Dauer, M. S., & Berke, G. S. (1996). Comparison of voice analysis systems for perturbation measurement. Journal of Speech and Hearing Research, 39, 126–134.
Bridger, M. M., & Epstein, R. (1983). Functional voice disorders: a review of 109 patients. Journal of Laryngology and Otology, 97, 1145–1148.
Callan, D. E., Kent, R. D., Roy, N., & Tasko, S. M. (1999). Self-organizing maps for the classification of normal and disordered female voices. Journal of Speech and Hearing Research, 42, 355–366.
Coyle, S. M., Weinrich, B. D., & Stemple, J. C. (2001). Shifts in relative prevalence of laryngeal pathology in a treatment-seeking population. Journal of Voice, 15, 424–440.
CSL Computerized Speech Lab [Computer program]. (1994). Pine Brook, NJ: Kay Elemetrics.
de Krom, G. (1994). Consistency and reliability of voice quality ratings for different types of speech fragments. Journal of Speech and Hearing Research, 37, 985–1000.
de Krom, G. (1995). Some spectral correlates of pathological breathy and rough voice quality for different types of vowel fragments. Journal of Speech and Hearing Research, 38, 794–811.
Fairbanks, G. (1960). Voice and Articulation Drillbook, 2nd Ed. New York: Harper & Row.
Field, A. P. (2005). Discovering Statistics Using SPSS. London: SAGE Publications.
Halberstam, B. (2004). Acoustic and perceptual parameters relating to connected speech are more reliable measures of hoarseness than parameters relating to sustained vowels. ORL, 60, 70–73.
Hall, K., & Yairi, E. (1992). Fundamental frequency, jitter, and shimmer in preschoolers who stutter. Journal of Speech, Language, and Hearing Research, 35, 1002–1008.
Hammarberg, B., Fritzell, B., Gauffin, J., Sundberg, J., & Wedin, L. (1980). Perceptual and acoustic correlates of abnormal voice qualities. Acta Otolaryngologica, 90, 441–451.
Hartelius, L., Buder, E. H., & Strand, E. A. (1997). Long-term phonatory instability in individuals with multiple sclerosis. Journal of Speech, Language, and Hearing Research, 40, 1056–1072.
Hartl, D., Hans, S., Vaissiere, J., Riquet, M., & Brasnu, D. (2001). Objective voice quality analysis before and after onset of unilateral vocal fold paralysis. Journal of Voice, 15, 351–361.
Heman-Ackah, Y., Heuer, R. J., Michael, D. D., Ostrowski, R., Horman, M., Baroody, M. M., Hillenbrand, J., & Sataloff, R. T. (2003). Cepstral peak prominence: a more reliable measure of dysphonia. Annals of Otology, Rhinology, and Laryngology, 112, 324–333.
Heman-Ackah, Y. D., Michael, D. D., & Goding, G. S. (2002). The relationship between cepstral peak prominence and selected parameters of dysphonia. Journal of Voice, 16, 20–27.
Hillenbrand, J. (1987). A methodological study of perturbation and additive noise in synthetically generated voice signals. Journal of Speech and Hearing Research, 30, 448–461.
Hillenbrand, J., & Houde, R. A. (1996). Acoustic correlates of breathy vocal quality: dysphonic voices and continuous speech. Journal of Speech, Language, and Hearing Research, 39, 298–310.
Hillenbrand, J., Cleveland, R. A., & Erickson, R. L. (1994). Acoustic correlates of breathy vocal quality. Journal of Speech and Hearing Research, 37, 769–778.
Klingholz, F. (1990). Acoustic recognition of voice disorders: a comparative study of running speech versus sustained vowels. Journal of the Acoustical Society of America, 87, 2218–2224.
Koufman, J. A., & Blalock, P. D. (1991). Functional voice disorders. Otolaryngology Clinics of North America, 4, 1059–1073.
Kreiman, J., Gerratt, B., Kempster, G. B., Erman, A., & Berke, G. S. (1993). Perceptual evaluation of voice quality: review, tutorial, and a framework for future research. Journal of Speech and Hearing Research, 36, 21–40.
Laflen, J. B., Lazarus, C. L., & Amin, M. R. (2008). Pitch deviation analysis of pathological voice in connected speech. Annals of Otology, Rhinology and Laryngology, 117, 90–97.
Martin, D., Fitch, J., & Wolfe, V. (1995). Pathologic voice type and the acoustic prediction of severity. Journal of Speech and Hearing Research, 38, 765–771.
Maryn, Y., Corthals, P., Van Cauwenberge, P., Roy, N., & De Bodt, M. (in press). Toward improved ecological validity in the acoustic measurement of overall voice quality: Combining continuous speech and sustained vowels. Journal of Voice.
MATLAB [Computer program]. (1994). Natick, MA: The MathWorks, Inc.
McGraw, K., & Wong, S. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30–46.
Noll, A. M. (1964). Short-term spectrum and ‘cepstrum’ techniques for vocal pitch detection. Journal of the Acoustical Society of America, 41, 293–309.

Orlikoff, R. F., Dejonckere, P. H., Dembowski, J., Fitch, J., Gelfer, M. P., Gerratt, B. R., Haskell, J. A., Kreiman, J., Metz, D. E., Schiavetti, N., Watson, B. C., & Wolfe, V. (1999). The perceived role of voice perception in clinical practice. Phonoscope, 2, 89–104.
Parsa, V., & Jamieson, D. G. (2001). Acoustic discrimination of pathological voice: Sustained vowels versus continuous speech. Journal of Speech, Language, and Hearing Research, 44, 327–339.
Qi, Y., Hillman, R. E., & Milstein, C. (1999). The estimation of signal to noise ratio in continuous speech for disordered voices. Journal of the Acoustical Society of America, 105, 2532–2535.
Rabinov, C. R., Kreiman, J., Gerratt, B., & Bielamowicz, S. (1995). Comparing reliability of perceptual ratings of roughness and acoustic measures of jitter. Journal of Speech and Hearing Research, 38, 26–32.
Roy, N., & Bless, D. M. (1998). Manual circumlaryngeal techniques in the assessment and treatment of voice disorders. Current Opinion in Otolaryngology Head and Neck Surgery, 6, 151–155.
Roy, N., Gouse, M., Mauszycki, S. C., Merrill, R. M., & Smith, M. E. (2005). Task specificity in adductor spasmodic dysphonia versus muscle tension dysphonia. Laryngoscope, 115, 311–316.
Roy, N., Merrill, R. M., Gray, S. D., & Smith, E. M. (2005). Voice disorders in the general population: prevalence, risk factors, and occupational impact. Laryngoscope, 115, 1988–1995.
Sama, A., Carding, P. N., & Price, S. (2001). The clinical features of functional dysphonia. Laryngoscope, 111, 458–463.
Schalen, L., & Andersson, K. (1992). Differential diagnosis and treatment of psychogenic voice disorder. Clinical Otolaryngology, 17, 225–230.
Sheskin, D. (2004). Handbook of Parametric and Nonparametric Statistical Procedures, 3rd Ed. Boca Raton: CRC Press.
SigmaPlot 10.0 for Windows [Computer program]. (2006). San Jose, CA: Systat Software, Inc.
SPSS 15.0 for Windows [Computer program]. (2006). Chicago, IL: SPSS, Inc.
Stevens, J. P. (1992). Applied Multivariate Statistics for the Social Sciences, 2nd Ed. Hillsdale, NJ: Erlbaum.
Takahashi, H., & Koike, Y. (1975). Some perceptual dimensions and acoustical correlates of pathologic voices. Acta Otolaryngologica (Stockholm), 338, 1–24.
Wolfe, V., & Martin, D. (1997). Acoustic correlates of dysphonia: type and severity. Journal of Communication Disorders, 30, 403–416.
Wolfe, V., & Steinfatt, T. M. (1987). Prediction of vocal severity within and across voice types. Journal of Speech and Hearing Research, 30, 230–240.
Wolfe, V. I., Martin, D. P., & Palmer, C. I. (2000). Perception of dysphonic voice quality by naïve listeners. Journal of Speech and Hearing Research, 43, 697–705.
Yiu, E., Worrall, L., Longland, J., & Mitchell, C. (2000). Analysing vocal quality of connected speech using Kay’s computerized speech lab: a preliminary finding. Clinical Linguistics & Phonetics, 14, 295–305.
Yumoto, E., Gould, W. J., & Baer, T. (1982). Harmonics-to-noise ratio as an index of the degree of hoarseness. Journal of the Acoustical Society of America, 71, 1544–1550.
Zhang, Y., & Jiang, J. J. (2008). Acoustic analyses of sustained and running voices from patients with laryngeal pathologies. Journal of Voice, 22, 1–9.

Copyright of Clinical Linguistics & Phonetics is the property of Taylor & Francis Ltd and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.