Draft of the paper published in Cogn Comput (2013) 5:399-425 © by Springer DOI 10.1007/s12559-013-9207-2
Characterizing Neurological Disease from Voice Quality Biomechanical Analysis
Pedro Gómez-Vilda1, Victoria Rodellar-Biarge1, Víctor Nieto-Lluis1, Cristina Muñoz-Mulas1, Luis Miguel Mazaira-Fernández1, Rafael Martínez-Olalla1, Agustín Álvarez-Marquina1, Carlos Ramírez-Calvo2, Mario Fernández-Fernández2
1 Grupo de Informática Aplicada al Tratamiento de Señal e Imagen, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo, s/n, 28660 Madrid, e-mail: [email protected]
2 Ear, Neck and Throat (ENT) and Neurology Services, Hospital del Henares, Avda. Marie Curie s/n, 28822 Coslada, Madrid, Spain
Corresponding author: Pedro Gómez-Vilda, e-mail: [email protected], Tel.: +34913367384, Fax: +34913366601
Abstract. The dramatic impact of neurodegenerative pathologies on quality of life is a growing concern. Many techniques have been designed for the detection, diagnosis and monitoring of neurological disease, but most of them are too expensive or complex to be used by primary care medical services. On the other hand, it is well known that many neurological diseases leave a signature in voice and speech. The present paper shows a new method to trace some neurological diseases at the level of phonation, so that the detection and grading of the neurological disease could be based on a simple voice test. This methodology benefits from the advances achieved in recent years in detecting and grading organic pathologies in phonation. The paper hypothesizes that some of the underlying neurological mechanisms affecting phonation produce observable correlates in vocal fold biomechanics, and that these correlates behave differently in neurological diseases than in organic pathologies. A general description of the main hypotheses involved, and of their validation by acoustic voice analysis based on biomechanical correlates of the neurological disease, is given. The validation is carried out on a balanced database of normal and organic dysphonic patients of both genders. Selected study cases are presented to illustrate the possibilities offered by this methodology.
Keywords: voice production, voice pathology grading and monitoring, e-health, Parkinson's Disease
1. Introduction
The early detection and monitoring of neurological diseases is of utmost importance in a world where progressive population aging demands substantial health care resources, a burden that may become unbearable in the near future. For instance, it is estimated that the prevalence of Parkinson's Disease (PD) is less than 0.4% among the population under 40 years, whereas it is around 2.5% in the population over 65 [1]. During the past decade, acoustic analysis has focused on detecting and grading organic pathology in phonation (see [2] for a review). The methods, tools and protocols developed for that
purpose are based on the estimation of phonation alterations induced by organic dysfunctions. Similarly,
altered phonation may report the progress of neurological diseases. It is known that these produce low tone, tremor, poor prosody, impaired fluency, slow vowel onsets, frequent pauses, excess of fillers, dysarthria, reduced sharp nasopharyngeal, lingual and bilabial transitions, clumsy or impaired articulation, elisions, metatheses, etc. (for a review of these phenomena in PD patients see [3]). Many of these effects are induced by neurological deterioration or lesions found in the language neuromotor cortex [4], in the descending neural pathways [5] or in the neuromuscular connections in pharynx, larynx, mouth and facial structures, affecting the dynamics of the voice production system [6, 7]. Among the observable correlates, voice tremor may be used to infer the etiology and progress of neural diseases affecting the production of voice [8], such as spasmodic dysphonia, stammering, Huntington's Chorea or PD [1]. For instance, the possibility of early detection during the first stages of PD may grant a better preventive treatment reducing the progress of illness [9]. Recognizing the importance of acoustic voice analysis Gamboa et al. [10] pioneered an early work in monitoring PD treatment using general voice perturbation parameters used at that time to detect organic dysphonia. Following this same line the aim of this exploratory study is to give some preliminary results in detecting the neurological disease using biomechanical correlates obtained in this case from the inverse filtering of voice [11]. The hypotheses sustaining the proposed methodology are the following: neurological disease found at any stage from the neuromotor cortex to the vocal fold innervating structures will induce abnormal vocal fold tension, either showing asymmetrical, hyper- or hypotonic function or tremor. These dysfunctions will alter the voicing correlates (glottal excitation or glottal source). 
The methodology relies on model inversion to estimate the following correlates: glottal source, biomechanical vocal fold tension, and tension perturbations (hypertonia, tremor, asymmetry [12]). Estimates of these correlates may be used to classify phonation patterns and to produce statistically validated control databases for detecting and quantifying the progression of neurological deterioration, or its regression in response to specific treatments (drugs and dosage), complementing subjective clinical evaluation.
The paper is organized as follows: a simplified neuro-physiological model of the phonation system is introduced in Section 2. A top-down chain from the neuromotor cortex to the lip radiation point is described in Section 3. This chain is inverted using conventional signal processing techniques to estimate the biomechanical tension related to the neural discharges in the muscles stretching and tensioning the vocal folds, as shown in Section 4. In Section 5 these estimates are modelled as an autoregressive process producing a set of cyclicality coefficients serving as descriptors of tremor. Section 6 describes the experimental framework used to validate the study, determining the statistical distributions of the parameters in normal and organic dysphonic subjects to support the contrast hypotheses. Section 7 presents results from eight study cases including non-tremor and tremor normophonic subjects, and non-tremor and tremor PD subjects. Section 8 presents a methodology to detect and quantify neurological correlates related to hyper- and hypo-tension and tremor in voice. Section 9 presents and discusses the results produced by the proposed methodology for the cases studied. Conclusions and future work are described in Section 10.
2. Summarized Description of the Neuro-Physiological Phonation Model
Speech production is planned and instantiated in the linguistic neuromotor cortex (see point 1 in Fig. 1).
The neuromotor activation sequence involved in speech production is transmitted to the pharynx (2), tongue (3), larynx (4), chest and diaphragm (5) through the subthalamic secondary units. Fine muscular control is provided by a sophisticated feedback control system (6).
Fig. 1 Simplified view of main neural pathways involved in the production of phonation: 1. Links from linguistic neuromotor cortex to basal ganglia. 2. Branch of the X nerve acting on the velo-pharyngeal switch (levator veli palatini, palatoglossus and palatopharyngeus). 3. Idem acting on the retro-lingual and epiglottal switches (superior, middle and inferior pharyngeal constrictors and stylopharyngeus). 4. Branch of the laryngeal nerve acting on the transversal and oblique arytenoid and cricothyroid muscles responsible for the vocal fold adduction and abduction (cricothyroid, transverse and oblique arytenoid and posterior cricoarytenoid). 5. Branch of the vagus nerve (phrenic) actuating on the diaphragmatic muscles (crural diaphragm). 6. Feedback loop in basal ganglia damping muscular tone (hypothalamic feedback loop, its malfunction causing PD). N: nasal cavity, V: velum, P: palate, A: alveoli, L: lips, T: teeth, G: tongue.
Neural speech activation patterns are transmitted through the jugular foramen from the hypothalamic system to the glossopharyngeal and vagus nerves (Cranial Nerves CN IX and X) in several derivations innervating the following muscular structures: levator veli palatini, palatoglossus and palatopharyngeus (2), acting on the naso-pharyngeal switch. These structures play a most relevant role in nasalization (hyper-, hypo- and modal). The superior, middle and inferior pharyngeal constrictors and the stylopharyngeus (3), muscles found in the mid-pharynx, are responsible for the swallowing function as well as for changes in the vocal tract during speech articulation. The cricothyroid, transverse and oblique arytenoid, as well as the posterior cricoarytenoid (4) muscles in the larynx are responsible for vocal fold stretching, adduction and abduction by acting on the cricoarytenoid joint as well as by raising and lowering the cricothyroid cartilage. The vagus nerve (5) is responsible for filling and depleting the lung cavity with air by contraction and relaxation of the crural diaphragm. Of these, only sections (2-5) need to be taken into account in sustained phonation.
The deepest causes of many neuro-degenerative pathologies with correlates in phonation are to be found in lesions on the neurological paths from the neuromotor linguistic cortex [4] through the sub-thalamic region [5] to the laryngeal nerve pathways [7] and in the innervations of the thyro-arytenoid muscle structure. Any alteration in the functionality of these structures will produce perturbations in the vocal fold biomechanical parameters (visco-elasticity) [13].
At this point some terms need further clarification. The term normophonic refers to a subject presenting phonation conditions free from irregularities such as roughness, airiness, asthenia, strain, tremor, etc.
The term organic dysphonic refers to subjects presenting irregular phonation due to some organic lesion, pathology or dysfunction (mainly affecting the larynx), and the term neurologic dysphonic refers to subjects presenting irregular phonation due solely to problems in the neuro-motor structures or neural transmission pathways involving the larynx (in a wider sense it could be extended also to problems in the neuro-motor cortex). The terms vocal fold stiffness and tension bear a strong relationship. Stiffness refers to parameters relating force and strain in biomechanical model springs. Tension refers to the longitudinal forces distributed along the vocal folds. Vocal fold body and cover stiffness parameters may be related to vocal fold tensions under certain assumptions.
3. Direct and inverse systems of the neuromotor and acoustic pathways of phonation
A crucial task before using voice correlates in neurological disease monitoring is to create a simplified and comprehensive model of the mechanisms involved in phonation from the upper brain cortex to the sound propagation level at the voice recording point. This model will comprise hierarchically organized neurophysiologic and biomechanical activity descriptions as shown in the left part of Fig. 2 (top-down chain). This is known as the 'direct model', defined as a chain of subsystems or 'black boxes', each one of them representing a well-defined transformation on a specific input, to convert it to a new output under a given functional relation. One condition is that the transformation must represent a process or set of
processes realistically, which must be formalized or represented by a functional associating input and
output in their representation spaces and in the time domain. Once the direct model (top-down) has been validated by estimating its capability to reproduce the observable output signals (for example, voice in our case), an inverse or 'bottom-up' model has to be devised to transform the observable signals into their non-observable sources (correlates of the activity in the upper levels of the analysis model). This 'bottom-up' model may be built by inverting each layer functional in the 'top-down' model using non-linear system theory [14]. The direct model would comprise the articulation planner in the neuromotor cortex, to be modeled by indirect inference from observable muscular activation in certain tasks (reading, speaking, etc.) as inferred from myoelectrography [15, 16], electroencephalography [17, 18], magnetoencephalography [19, 20] or functional magnetic resonance [21, 22]. The neuromotor cortex will activate the crico-arytenoid system of muscles and cartilages to order the voiced and voiceless sequence of sounds by means of a stream of neural discharges propagated through the innervation paths described above (temporal phonation activation). This stream is regulated by the hypothalamic subsystem (basal ganglia) by means of different feedback loops involving cerebral cortex and cerebellum. A malfunction of this subsystem due to different reasons may be among the causes behind PD syndrome [5, 23]. The resulting stream of firing spikes on the laryngeal nerves acts on the transverse and oblique arytenoid muscles, inducing vocal fold adduction and stretching. Different configurations of vocal fold adduction and stretching will result in different phonation modalities (pressed, modal, whispered, etc.). Streams of voiced and unvoiced intervals will result from the activation patterns of vocal folds following the neural discharge patterns in the laryngeal nerves.
Thus, phonated intervals will consist of trains of glottal pulses of different pitch, modality and duration. These glottal pulses, when acoustically filtered and modulated by the pharynx, vocal and nasal tracts, will be radiated through the lips as voiced speech sounds. These phonation bursts are not produced during voiceless speech intervals or silent segments, stops or pauses. Throughout the present study it is assumed that the sounds of interest are not influenced by the nasal tract, or that its filtering activity is embodied within a joint subsystem together with that of the vocal tract.
Fig. 2 Direct-inverse systemic view of the neuro-physiological model of phonation. Left: Direct model from the neuromotor cortex to the lip radiation output. Right: System model inversion.
The inverse or 'bottom-up' system is intended to characterize the behavior of the direct system by estimating the inverses of the functions composing the top-down model. The first subsystem inverted is the lip radiation function, followed by the vocal and nasal tracts. The result is a set of parameters related to vocal tract resonances and anti-resonances. These may be used to estimate vowel triangle aspect ratios, such as the vowel space area (VSA), the vowel articulation index (VAI) and the formant centralization ratio (FCR), used as descriptors of dysphonias produced by neurologic diseases [24]. The residual information after removing vocal and nasal tract models is a correlate of the glottal source. This signal is strongly related to vocal fold biomechanics. The glottal source may be used to estimate the parameters of a second-order vocal fold biomechanical model [25] at a low computational effort. The set of biomechanical parameters estimated are the dynamic mass, the longitudinal stiffness and the viscous losses of the vocal folds. Among these, the most relevant one for neurological disease studies may be the stiffness, as it is related to the stress that the vocal folds are supporting from longitudinal stretching. Vocal fold stiffness may serve to monitor the pattern of vibration related to neuromotor activity in the laryngeal nerves. Perturbations affecting stiffness, such as hypertonia, asymmetry or tremor, may serve to monitor anomalous neuromotor activity. Recent studies carried out by our group on asymmetric vibration of the vocal folds using independent component analysis (ICA) allow differentiating the vibration of both folds [12]. The cause of vibration asymmetry may be an anatomic or organic unbalance of the folds, or a differential neuromotor activity of each fold due to pathologic etiology, as would be the case in vocal fold lateral pareses of neurological origin.
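The ordering of the inversion can be illustrated with a toy example. The sketch below is not the adaptive procedure of [11]: it builds a direct chain made of crude glottal pulses, a single-resonance all-pole vocal tract and a first-difference lip radiation, and then undoes the two filters in reverse order. The sampling rate, resonance values, radiation coefficient and pulse shape are illustrative assumptions, and the vocal tract coefficients are taken as known here, whereas in practice they have to be estimated from the voice signal.

```python
import math

def all_pole(x, a):
    """Denominator 1 + sum_i a[i] z^-(i+1): y[n] = x[n] - sum_i a[i] * y[n-1-i]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i, ai in enumerate(a):
            if n - 1 - i >= 0:
                acc -= ai * y[n - 1 - i]
        y.append(acc)
    return y

def fir(x, b):
    """Numerator sum_i b[i] z^-i: y[n] = sum_i b[i] * x[n-i]."""
    return [sum(bi * x[n - i] for i, bi in enumerate(b) if n - i >= 0)
            for n in range(len(x))]

# Hypothetical settings (assumptions, not values from the paper).
FS = 8000.0            # sampling rate in Hz
R, F1 = 0.97, 700.0    # pole radius and frequency of one vocal tract resonance
MU = 0.98              # lip radiation coefficient (close to a differentiator)
tract = [-2.0 * R * math.cos(2.0 * math.pi * F1 / FS), R * R]

# Direct (top-down) chain: glottal pulses -> vocal tract -> lip radiation.
source = [0.0] * 400
for start in range(0, 400, 80):            # one crude pulse every 80 samples
    for k in range(20):
        source[start + k] = -math.sin(math.pi * k / 20.0)
voice = fir(all_pole(source, tract), [1.0, -MU])

# Inverse (bottom-up) chain: undo lip radiation first, then the vocal tract.
unradiated = all_pole(voice, [-MU])        # inverse of 1 - MU z^-1
glottal_estimate = fir(unradiated, [1.0] + tract)

max_error = max(abs(s - g) for s, g in zip(source, glottal_estimate))
```

With exactly known filters the recovered glottal correlate matches the original pulses up to rounding error; with estimated filters the residual also carries the modeling error.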
4. Inversion of the biomechanical system related to vocal fold adduction control
The glottal source correlate resulting from vocal tract inversion may be defined as a representation of the dynamic pressure wave in the supra-glottal ridge of the vocal folds. These structures are brought together by the action of the laryngeal muscles, activated by the laryngeal nerves referred to above, producing a glottal closure (adduction). Air pressure build-up in the lungs while the vocal folds are in contact will force their separation. The subsequent release of an air burst will relax lung pressure, and the muscular visco-elastic forces on the vocal folds will drive them to a new contact and the consequent closure of the glottis, completing a glottal cycle. There are good models, composed of masses and springs, to emulate the behavior of the vocal fold dynamic structures, as for example the one in [25]. Model inversion is carried out in the frequency domain using the glottal source power spectral density, detecting peaks and troughs induced by resonances and anti-resonances of the vocal fold dynamics. The chain of procedures implementing model inversion is given in Fig. 3.
Fig. 3 Inversion of the vocal fold biomechanical model.
The first block estimates the vocal and nasal tract model, cancels it by inverse adaptive filtering, and reconstructs the glottal source correlate. The second block estimates the glottal source power spectral density and matches it against the second-order electro-mechanical model of masses and springs. The modulus of the glottal source power spectral density is matched with the modulus of the electromechanical model transfer function, forcing an optimal estimate by adaptive parameter fitting [11]. The result is a set of biomechanical parameters associated with the dynamic mass, the longitudinal stiffness and the viscous losses of the vocal folds (see the results for a case of spasmodic dysphonia in Fig. 4). Vocal fold elastic parameters may be associated, under specific geometrical models, with the longitudinal elastic tension supported by the vocal folds. The perturbations (vocal fold dystonia, asymmetric tension, and tremor) found in these parameters may serve to monitor neurological diseases.
Fig. 4 Estimates of the vocal fold biomechanical parameters for Case 100308 (45-year-old female affected by spasmodic dysphonia). a) Vocal fold body dynamic mass per phonation cycle. b) Its statistical distribution given as a boxplot. c-d) idem for viscous losses. e-f) idem for stiffness. The variables for templates a), c) and e) are estimated for each glottal cycle.
The vocal fold body stiffness in (e), given in g·s⁻², is especially relevant, as it represents a fraction of the longitudinal tension acting on the musculus vocalis (the inner muscular structure of the vocal folds). The template on the right-hand side (f) gives its dispersion box plot, marking the median by the middle segment, and the first and third quartiles of the distribution by the lower and upper limits of the box. In this particular case, 55 phonation cycles of a voice segment 300 ms long from a patient suffering spasmodic dysphonia are represented. During this interval a little more than a cycle of tremor can be observed. It may be seen that the oscillation ranges approximately from 19,000 g·s⁻² to almost 31,000 g·s⁻², with a mean of 23,079 g·s⁻². This indicates strong fluctuations in the neuromotor activity during vocal fold adduction, of pathological origin. If voice tremor comprises at least 2 spasms per second, segments up to 0.5 s may be used in the analysis [26]. Similar fluctuations may be observed in the dynamic mass (a and b) and viscous losses (c and d), although their fluctuation ranges are smaller. During the hypertonic spasm (extremes of the interval), pitch, stiffness and mass experience an increment, contrary to what happens during the hypotonic episode (in the middle of the interval).
5. Adaptive Estimation of cyclicality parameters
From what has been said, tremor can be seen as a perturbation of phonation due to unexpected changes of vocal fold tension, and thus it becomes a main target of the present study. As it has been shown in the
example given in Fig. 4, a typical neurological disease produces correlates in the vocal fold body
stiffness, but not only biomechanical parameters show a cyclic perturbation due to tremor. Cyclical behavior may also be observed on other estimates from case 100308 as shown in Fig. 5 (refer to [11] for a more detailed description of each parameter nature and properties).
Fig. 5 Influence of tremor in other perturbation estimates from the glottal source. a) Absolute normalized area shimmer. b) Statistical dispersion of this parameter. c) Absolute normalized minimum sharpness of the glottal source negative peak. d) Its statistical dispersion. e) Glottal to noise energy ratio. f) Its statistical dispersion. Templates c) and e) show clear fluctuations associated to tremor. The variables in templates a), c) and e) are estimated for each glottal cycle and are in relative dimensionless units.
On one hand, it may be seen that classical perturbation parameters such as shimmer (variations in the amplitude of the glottal source from cycle to cycle) do not present a clear cyclical behaviour in this case. The same happens with jitter (although it is not shown). This may be so because the estimation of classical perturbation parameters such as jitter and shimmer is carried out on a number of neighbouring phonation cycles (typically from 1 to 10 at most), which are not enough to capture the cyclicality of tremor (typically from 2 to 8 Hz). On the other hand, template c) shows a clear cyclical component in the sharpness of the glottal source negative peaks at the closing point (Abs. Norm. Min. Sharp., refer to [11]). Glottal/Noise Energy given in e) is another parameter sensitive to cyclicality (MAE, refer again to [11]). This parameter evaluates the ratio of high to low frequency components in the glottal source, or the presence of a normal mucosal wave specific to proper phonation (in fact its value decays strongly during the hypotonic episode between cycles 25-35).
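A rough calculation makes the window argument concrete. The segment of Fig. 4 contains 55 phonation cycles in 300 ms, i.e. about 183 cycles per second. The helper below (a hypothetical name, introduced only for this illustration) counts how many phonation cycles fit in one tremor period at that rate, for the 2 to 8 Hz range quoted above.

```python
def cycles_per_tremor_period(phonation_rate_hz, tremor_hz):
    """Number of phonation cycles spanned by one full tremor oscillation."""
    return phonation_rate_hz / tremor_hz

phonation_rate = 55 / 0.300                             # about 183.3 cycles per second
slow = cycles_per_tremor_period(phonation_rate, 2.0)    # about 91.7 cycles
fast = cycles_per_tremor_period(phonation_rate, 8.0)    # about 22.9 cycles
coverage = 10 / fast     # share of the shortest tremor period seen by 10 cycles
```

At this phonation rate even a 10-cycle window covers less than half of the shortest tremor period and roughly a tenth of the longest, which is consistent with jitter and shimmer missing the cyclical pattern.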
The working hypothesis derived from these observations is that tremor may leave correlates in different
biomechanical as well as perturbation parameters. It is hypothesized that the influence of the neurological disease has to leave a mark in vocal fold stiffness, sometimes appearing as hypertension, sometimes as a cyclic alteration observable on estimates of vocal fold body and cover stiffness, and some other times as an asymmetric pattern. Now, the question is how to estimate cyclical behavior in a specific glottal variable. A possible approach may be autoregressive (AR) modeling by adaptive inverse filtering as shown in Fig. 6.
Fig. 6 Adaptive modeling of the biomechanical stiffness ξn by the minimization of the estimation error εKn. A set of pivoting coefficients ckn resulting from adaptation may be used as descriptors of the cyclical behavior.
If ξn is the stiffness estimate at phonation cycle n, its AR model may be described in terms of previous estimates as:
$$\xi_n = -\sum_{i=1}^{K} a_i\,\xi_{n-i} + \varepsilon_{Kn} \qquad (1)$$
where a={ai} are the regression coefficients and εKn is the modeling error. The estimation of the coefficients is carried out by the minimization of the modeling error εKn in terms of Least Mean Squares (LMS). A possible way to implement such estimation is by means of adaptive lattice filters [27]. An adaptive lattice filter may be defined as an operator ΦKn{·} of order K producing an output error εKn which is minimum in terms of LMS for a given time window WK sliding along the cycle index n using an adaptation factor β. A sequence of sub-optimal models characterized by a set of coefficients {ckn} will be produced as a side result:
$$\{\varepsilon_{Kn},\ \mathbf{c}_{Kn}\} = \Phi_{Kn}\{\xi_n,\ W_K,\ \beta\} \qquad (2)$$
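The operator in eq. (2) can be sketched as follows. This is a minimal illustration, not the filter of [27]: it uses one common adaptive lattice variant in which each pivoting coefficient is re-estimated at every cycle from exponentially weighted Burg-type sums. The forgetting factor `forget` stands in for the sliding window and adaptation factor, the global mean is removed before filtering, and the stiffness series is synthetic; all three choices are assumptions made for this example.

```python
import math
import random
from statistics import fmean

def adaptive_lattice(xi, order=3, forget=0.98, eps=1e-9):
    """Track pivoting coefficients (c_1n..c_Kn) and the order-K error per cycle."""
    mean = fmean(xi)
    x = [v - mean for v in xi]          # assumption: remove the static component
    num = [0.0] * order                 # weighted sums of f_k[n] * b_k[n-1]
    den = [eps] * order                 # weighted sums of f_k[n]^2 + b_k[n-1]^2
    c = [0.0] * order
    b_prev = [0.0] * order              # backward errors b_k[n-1] entering each stage
    track, errors = [], []
    for xn in x:
        f = xn                          # forward error f_0[n]
        b_in = xn                       # backward error b_0[n]
        b_new = [0.0] * order
        for k in range(order):
            b_new[k] = b_in             # keep b_k[n] for the next cycle
            bd = b_prev[k]
            num[k] = forget * num[k] + f * bd
            den[k] = forget * den[k] + f * f + bd * bd
            c[k] = -2.0 * num[k] / den[k]           # always within [-1, 1]
            f, b_in = f + c[k] * bd, bd + c[k] * f  # errors of the next stage
        b_prev = b_new
        track.append(tuple(c))
        errors.append(f)
    return track, errors

# Synthetic stiffness with a slow oscillation (invented values, for illustration only).
rng = random.Random(0)
stiffness = [25000.0 + 6000.0 * math.cos(2.0 * math.pi * n / 40.0) + rng.gauss(0.0, 300.0)
             for n in range(200)]
track, errors = adaptive_lattice(stiffness)
c1, c2, c3 = track[-1]
```

For a strong, slow oscillation like the synthetic one above, the first coefficient settles close to the lower end of its range, which is the behavior exploited later as a tremor indicator.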
Once the adaptive lattice model has been fitted to the input series, either its pivoting coefficients {cKn} or the equivalent transversal filter ones {aKn} may be used as model descriptors. Both sets of coefficients are related by the Levinson-Durbin iteration [26]:
$$\mathbf{a}_{kn} = \mathbf{a}_{k-1,n} + c_{kn}\,\tilde{\mathbf{a}}_{k-1,n} \qquad (3)$$
where ã is the order-reversal operation on vector a. In the present study pivoting coefficients will be preferred, as they are pre-normalized in the interval (-1, 1). The model in eq. (2) has an associated all-pole
transfer function in the frequency domain with a set of K poles {z_i = r_i e^{jφ_i}; 1≤i≤K; j=√(-1)}, which determine the response frequencies of the model according to the following expression:
$$H(z) = \frac{1}{1 + \sum_{i=1}^{K} a_i z^{-i}} = \prod_{i=1}^{K} \frac{z}{z - z_i} \qquad (4)$$
where ri are the moduli of the poles (ri>0), and φi are their respective phases (-π≤φi≤π). Usually, if the signal being modeled shows quasi-periodic fluctuations (cyclicality), some poles will appear as complex conjugate pairs, i.e.: zi=ri·e^(jφi); zi*=ri·e^(-jφi). The closer the poles are to the unit circle (ri→1), the stronger the sinusoidal response they represent. The remaining poles will be distributed on the real axis of the z plane (φi=0). For an order-3 model it is possible to establish a relationship between the transversal equivalent model coefficients {a1-3n} and their associated pivoting coefficients {c1-3n} according to the following expressions:
$$c_1 = \frac{a_1 - a_2 a_3}{1 + a_2 - a_1 a_3 - a_3^2}; \quad c_2 = \frac{a_2 - a_1 a_3}{1 - a_3^2}; \quad c_3 = a_3 \qquad (5)$$
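The order-3 relations can be checked numerically against the generic step-down (inverse Levinson-Durbin) recursion. The sketch below assumes the all-pole denominator is written as 1 + a1·z^-1 + a2·z^-2 + a3·z^-3, and builds the transversal coefficients from one complex-conjugate pole pair plus one real pole; the pole values and the function names are hypothetical.

```python
import math

def step_down(a):
    """Pivoting coefficients [c1..cK] from transversal coefficients [a1..aK]."""
    a = list(a)
    c = [0.0] * len(a)
    for k in range(len(a), 0, -1):
        ck = a[k - 1]                   # highest-order coefficient is the pivot
        c[k - 1] = ck
        if k > 1:                       # reduce the model order by one
            a = [(a[i] - ck * a[k - 2 - i]) / (1.0 - ck * ck) for i in range(k - 1)]
    return c

def order3_closed_form(a1, a2, a3):
    """Closed-form order-3 relations between {a1, a2, a3} and {c1, c2, c3}."""
    c3 = a3
    c2 = (a2 - a1 * a3) / (1.0 - a3 * a3)
    c1 = (a1 - a2 * a3) / (1.0 + a2 - a1 * a3 - a3 * a3)
    return [c1, c2, c3]

# Hypothetical poles: r*exp(+/- j*phi) and a real pole p, all inside the unit circle.
r, phi, p = 0.98, 0.3, 0.5
a1 = -(2.0 * r * math.cos(phi) + p)
a2 = r * r + 2.0 * r * p * math.cos(phi)
a3 = -p * r * r

c_generic = step_down([a1, a2, a3])
c_closed = order3_closed_form(a1, a2, a3)
```

Both routes give the same three coefficients, all inside (-1, 1), and for this low-frequency pole pair close to the unit circle the first one is strongly negative.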
It may be shown that when the oscillation being modeled is strong enough, the associated conjugate poles will approach the unit circle, i.e.: ri→1. In this case it is easy to show that the first pivoting coefficient will approach the lower limit of the interval (-1, 1), i.e.: c1→-1. This fact may be used to detect strong cyclical components in the modeled parameter (ξn). The frequency of the tremor fti and its relative relevance ρti, as well as the energy ratio ηt between the static and the cyclic components of the oscillating parameter considered, are other important estimates regarding cyclical behavior. These may be defined as follows:
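$$f_{ti} = \frac{\varphi_i}{2\pi} f_s; \quad \rho_{ti} = \frac{1}{1 - r_i}; \quad \eta_t = \frac{1}{\bar{\xi}_K}\left[\frac{1}{N_K}\sum_{n \in W_K}\left(\xi_{Kn} - \bar{\xi}_K\right)^2\right]^{1/2} \qquad (6)$$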
where ξ̄K is the parameter average, and NK is the number of samples in window WK. In the present study the three lowest-order pivoting coefficients {c1n, c2n, c3n} will be used as descriptors of the stiffness cyclical characteristics, or cyclicality parameters. For the case shown in Fig. 4, the average values of the cyclicality estimates over the interval are {ĉ1=-0.94, ĉ2=-0.17, ĉ3=0.02}. As voice cannot be considered a stationary signal, these values may vary slightly depending on the segment analyzed.
6. Statistical validation of normophonic and organic dysphonic populations
The purpose of the present study is to monitor the behaviour of vocal fold stiffness perturbation in PD patients, especially concerning hypo- and hypertension and tremor. Initially it is not assumed that all PD
patients may show tremor, or that tremor may be regarded as a behaviour pattern exclusive to PD patients.
Besides, tremor could be classified as a pathology-induced correlate or not. Within pathology-induced tremor, spasmodic dysphonia and PD have been the main targets in early studies [26, 28]. But tremor may also be present in normophonic subjects when there is emotional stress present in phonation, or when the subject intentionally produces vibrato as in singing, among other realistic situations. Table 1 gives possible scenarios to consider.
Table 1. Different situations regarding dysphonia and tremor. PD: Parkinson's Disease, AD: Alzheimer's Disease, LAS: Lateral Amyotrophic Sclerosis, HC: Huntington's Chorea, SD: Spasmodic Dysphonia.

                        Tremor-free                            Tremor-affected
Normophonic Patients    Baseline Control                       Essential (non-pathologic)
                                                               Emotional (stress induced)
                                                               Intentional (vibrato)
Dysphonic Patients      Organic Pathology                      Spasmodic Dysphonia, PD, others
                        Dystonic Pathology of different
                        etiology (hyper- and hypo-), as PD,
                        AD, LAS, HC, etc.
One of the difficulties in carrying out the present study was to differentiate the behaviour of the biomechanical estimates in PD from that in organic pathology. PD patients had to be evaluated first by the ENT services to check that they were free of organic pathology, discarding cases with organic dysphonias, to avoid confusion with phonation perturbations due to organic pathologies potentially being present. Of course, this protocol reduces the number of PD cases available and complicates the study, but it increases robustness. In this sense, the main hypothesis to be tested is that neurological diseases (ND), and specifically PD, leave a differential behavior in the vocal fold biomechanical estimates with respect to normophonic (control) cases. A classical test to validate this hypothesis would require the formulation of a null hypothesis: ND and specifically PD patients do not present alterations in the biomechanical correlates of interest with respect to normophonic subjects. This hypothesis has to be rejected for the methodology to be successfully applied to PD pathology characterization. A way to implement the evaluation test would be to estimate two probability density functions in the domain of biomechanical correlates: f(x|HN) for normophonic cases and f(x|HP) for PD cases. Bearing in mind the limited number of PD cases available for the study, as mentioned above, the possibility of validating the methodology following this assumption had to be discarded. Nevertheless, a possible way to circumvent this inconvenience, at the cost of reducing the accuracy of the test, is to produce a good estimate of the probability density function supporting the null hypothesis f(x|HN), for which there is enough supportive information in the control database, and to sustain the alternative hypothesis by the complementary pdf: 1-f(x|HN).
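One way to read this strategy is to model only the control population and to score each study case by how improbable it is under that model. The sketch below is an interpretation, not the procedure of the paper: it assumes a Gaussian fit to control values of a single cyclicality coefficient and reports a two-sided tail probability. The control values, the two test values and the function name are invented placeholders.

```python
from statistics import NormalDist

# Placeholder control values of one cyclicality coefficient (invented, illustration only).
control = [-0.35, -0.28, -0.41, -0.30, -0.22, -0.38, -0.33, -0.26, -0.44, -0.31]

def two_sided_tail(control_values, x):
    """Probability, under a Gaussian fitted to the controls, of a value at least
    as far from the control mean as x."""
    model = NormalDist.from_samples(control_values)
    z = abs(x - model.mean) / model.stdev
    return 2.0 * (1.0 - NormalDist().cdf(z))

typical = two_sided_tail(control, -0.33)    # hypothetical case close to the controls
atypical = two_sided_tail(control, -0.90)   # hypothetical case far from the controls
```

A small tail probability only says that the case is unlike the controls; attributing the deviation to neurological etiology still depends on the additional requirement discussed next.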
An important requirement for this strategy to be applicable is that normophonic and organic dysphonic cases behave similarly with regard to the biomechanical correlates used. If this second hypothesis holds, any deviation from a joint probability density function involving normophonic and organic dysphonic cases, given by f(x|HN,HD), could only be attributed to neurological etiology. Thus the main hypothesis of the present study could be reformulated as follows: tremor in PD patients is a special modality of vocal fold tension perturbation (deviating and fluctuating) that differs from tremor in subjects not affected by PD (either organic dysphonic, or normophonic showing emotional or intentional tremor). The line of reasoning is as follows: neurological pathology seems to alter parameters related to vocal fold tension. The key question is whether organic dysphonias affect these parameters in the same way as neurological dysphonias. The null hypothesis would be expressed as: organic pathology does not alter the cyclicality parameters relative to the normophonic condition, and therefore normophonic subjects could not be differentiated from organic dysphonic ones based only on the parameters of interest. Under this hypothesis, a different behavior of the study cases with respect to the joint pdf f(x|HN,HD) could only be attributed to neurological pathology, not to organic pathology. In this way, observing a differential behavior in the cyclicality parameters of a patient could indicate that the subject may be affected by neurological disease rather than by organic disease. Failing to differentiate the normophonic from the organic dysphonic distributions would support the null hypothesis, and therefore the validity of differential cyclicality parameter behavior for detecting neurological pathology while excluding organic pathology.
The experimental framework designed to test this main hypothesis was conceived with a double objective in mind. On the one hand, to obtain a reference baseline description of how the cyclic coefficients c1, c2 and c3 behave for normophonic and organic dysphonic subjects. On the other hand, to characterize a set of study cases by comparing estimates from PD patients affected and not affected by tremor against normophonic subjects showing and not showing tremor. Accordingly, two normophonic subjects, one of each gender, not showing tremor acoustically (cases 100508-male and 100040-female) and two others showing a slight tremor (cases 100503-male and 100350-female) served as the control group. Correspondingly, four PD-affected subjects (cases 223211-male and 333282-female not showing tremor, and 334866-male and 337523-female showing tremor) were used as case studies.
To build the reference baseline, a database of voice recordings from normal and organic-dysphonic speakers was assembled with the following distribution: 50 normal male speakers, 50 normal female speakers, 50 organic-dysphonic male speakers, and 50 organic-dysphonic female speakers. Ages ranged between 20 and 45 years. Most of the organic dysphonias presented defective closure pathologies as well as asymmetric behaviour. Mild hypo-tonic and hyper-tonic cases were also present in the dysphonic group. The records consisted of sustained phonations of vowel /a/ 0.2 s long. This is in contrast with the study in [26], where 0.5 s long records were used, as tremor in PD patients was expected to be in the range 5-10 Hz. Nevertheless, the parameter distributions did not differ substantially between the two cases. Glottal source correlates were obtained from the voice segments, and biomechanical stiffness was used to estimate the cyclic coefficients c1, c2 and c3 as explained before. The results from the baseline database are described in the present section (see Fig. 7 and Fig. 8), and the study cases will be discussed in Section 7.
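The relation between record length and the expected tremor band can be checked with simple arithmetic. The sketch below is only an illustration: the helper name is hypothetical, the tremor band and window lengths are those quoted in the text, and the two f0 values are the lowest and highest averages listed in Table 5.

```python
def cycles_in_window(frequency_hz, window_s):
    """Number of periods of a component at frequency_hz that fit in window_s."""
    return frequency_hz * window_s


# Tremor band quoted in the text (5-10 Hz) against both record lengths.
for window_s in (0.2, 0.5):
    low = cycles_in_window(5.0, window_s)
    high = cycles_in_window(10.0, window_s)
    print(f"{window_s} s record: {low:.1f} to {high:.1f} tremor cycles")

# Glottal cycles available in 0.2 s for the lowest and highest f0 in Table 5.
for f0_hz in (105.44, 248.63):
    n_cycles = cycles_in_window(f0_hz, 0.2)
    print(f"f0 = {f0_hz} Hz: about {n_cycles:.0f} glottal cycles")
```

A 0.2 s record therefore spans between one and two periods of a 5-10 Hz tremor, whereas a 0.5 s record spans between 2.5 and 5 periods.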
Fig. 7 Histograms of the coefficients c1 (top), c2 (middle) and c3 (bottom) for the male database (normophonic: green line with blue diamonds; organic pathologic: red line with dark squares). Vertical axes give subject counts.
It may be seen from the statistical distribution of the cyclic coefficients c1, c2 and c3 for the male cases given in Fig. 7 that c1 (top) is clearly skewed in the normophonic distribution, with most of its mass to the left and a median of -0.72, whereas the dysphonic set also presents a skewed, bimodal distribution with a median of -0.70. The coefficient c2 (middle) presents a median of 0.042 for the normophonic set and of 0.027 for the dysphonic set, with a relatively symmetrical distribution for the normophonic set and a bimodal distribution for the dysphonic set. Coefficient c3 (bottom) also presents a symmetrical distribution in the normophonic cases, with a median of 0.15, and a bimodal distribution in the dysphonic cases, with a median of 0.14.
Fig. 8 Histograms of the coefficients c1 (top), c2 (middle) and c3 (bottom) for the female database (normophonic: green line with blue diamonds; organic pathologic: red line with dark squares). Vertical axes give subject counts.
The statistical distributions of the cyclic coefficients c1, c2 and c3 for the female cases are given in Fig. 8.
It may be seen that c1 (top) in the normophonic distribution is slightly skewed, its median being -0.65, and the dysphonic set is also skewed, with a median of -0.63. The coefficient c2 (middle) presents a symmetric distribution in normophonics, with a median of -0.16, and the dysphonic set is tri-modal, with a median of -0.11. Coefficient c3 (bottom) is bimodal in both distributions, with medians of 0.07 and 0.08 respectively. Essentially, the distributions differ only very slightly from those evaluated for frames 0.5 s long as reported in [26]. Bearing in mind that coefficients c1, c2 and c3 are defined in the range (-1,1), it may be inferred from the data given above that c1 tends to lie on the negative side of the range, whereas c2 tends to be more centred, and c3 is more frequently found on the positive side. Multimodality may point either to a multi-located population in the study or to other unclear effects; this point needs further study. The three quartiles of each distribution (normophonics and dysphonics, male and female samples) are given in Table 2 and Table 3 respectively.
Table 2. Three quartiles of the male distribution cyclicality coefficients
Parameters                   c1        c2        c3
1st Quartile Normophonics   -0.8126   -0.0882    0.0298
2nd Quartile Normophonics   -0.7226    0.0426    0.1532
3rd Quartile Normophonics   -0.6170    0.1181    0.2777
1st Quartile Dysphonics     -0.8076   -0.1272   -0.0306
2nd Quartile Dysphonics     -0.7077    0.0272    0.1436
3rd Quartile Dysphonics     -0.5256    0.1759    0.2699
Table 3. Three quartiles of the female distribution cyclicality coefficients
Parameters                   c1        c2        c3
1st Quartile Normophonics   -0.7468   -0.2738   -0.0802
2nd Quartile Normophonics   -0.6521   -0.1654    0.0709
3rd Quartile Normophonics   -0.5099   -0.0519    0.2109
1st Quartile Dysphonics     -0.7553   -0.3067   -0.0455
2nd Quartile Dysphonics     -0.6326   -0.1131    0.0862
3rd Quartile Dysphonics     -0.4540    0.0579    0.2391
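As an illustration of how the quartiles in Table 2 and Table 3 might be used as a baseline, the following sketch stores them and locates a single coefficient value relative to the interquartile range of a given population. The data structure and function names are hypothetical, and treating the interquartile range as a reference band is an assumption made here for illustration, not a procedure stated in the text.

```python
# Quartiles (Q1, Q2, Q3) copied from Table 2 (male) and Table 3 (female).
QUARTILES = {
    ("male", "normophonic"): {
        "c1": (-0.8126, -0.7226, -0.6170),
        "c2": (-0.0882, 0.0426, 0.1181),
        "c3": (0.0298, 0.1532, 0.2777),
    },
    ("male", "dysphonic"): {
        "c1": (-0.8076, -0.7077, -0.5256),
        "c2": (-0.1272, 0.0272, 0.1759),
        "c3": (-0.0306, 0.1436, 0.2699),
    },
    ("female", "normophonic"): {
        "c1": (-0.7468, -0.6521, -0.5099),
        "c2": (-0.2738, -0.1654, -0.0519),
        "c3": (-0.0802, 0.0709, 0.2109),
    },
    ("female", "dysphonic"): {
        "c1": (-0.7553, -0.6326, -0.4540),
        "c2": (-0.3067, -0.1131, 0.0579),
        "c3": (-0.0455, 0.0862, 0.2391),
    },
}


def interquartile_range(quartiles):
    """Width of the central half of the distribution, Q3 - Q1."""
    q1, _, q3 = quartiles
    return q3 - q1


def locate(value, quartiles):
    """Say where a coefficient value falls relative to the baseline quartiles."""
    q1, _, q3 = quartiles
    if value < q1:
        return "below Q1"
    if value > q3:
        return "above Q3"
    return "within IQR"


for population, coefficients in QUARTILES.items():
    for name, quartiles in coefficients.items():
        print(population, name, round(interquartile_range(quartiles), 4))
```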
A complementary view may be provided by the scatter plots of c1 vs c2, and c1 vs c3 as given in Fig. 9. These results show that there is not a clear regression pattern among cyclicality coefficients, and that they are relatively uncorrelated. This is also manifested by the statistics for both distributions given in Table 4.
Fig. 9 Scatter plots of c2 and c3 vs c1 for male and female (normophonics in light green and dysphonics in dark red squares).
An important step in accepting that organic pathology does not have a specific influence on the cyclicality parameters c1-3 with respect to normophonics is to test the results from normophonic vs organic dysphonic subjects for these coefficients under the null hypothesis that the normophonic and organic pathologic parameter distributions given in Fig. 7 and Fig. 8 are similar, at a 5% significance level. The p-values for both male and female distributions, obtained from a battery of t-tests comparing the normophonic and dysphonic populations for each of c1, c2 and c3 (see [29]), strongly support not rejecting the null hypothesis; therefore the normophonic and organic pathologic populations can be assumed to have equivalent distributions as far as c1, c2 and c3 are concerned (see Table 4, three right-most columns). In other words, c1-3 are not found to be sensitive to organic dysphonic conditions. Thus, any differential behavior in these parameters will not be contaminated by organic etiology. If a differential behavior is observed in these parameters in subjects with neurological disease, it may be attributed solely to neurological etiology with a wide safety margin.
Table 4. Distribution characteristics and statistical dependence among c1, c2 and c3
                σc1     σc2      σc3      χc1      χc2      χc3      ρc2-c1   ρc3-c1   ρc3-c2   pc1NvsD  pc2NvsD  pc3NvsD
Male Norm.      1.621   0.356    0.041    2.909   -0.025   -0.326    0.208    0.019    0.063    0.382    0.884    0.493
Male Dysph.     0.932   0.289    0.051   -0.371   -0.691   -0.056    0.153    0.110    0.096    --       --       --
Female Norm.    1.029   0.317   -0.096    1.172    0.650   -0.654    0.259    0.144   -0.155    0.862    0.368    0.357
Female Dysph.   0.618  -0.304   -0.537   -0.214   -0.033    0.024    0.294   -0.070   -0.067    --       --       --
The contents of the table are interpreted as follows: σc1-3 give the skewness of each population distribution (male and female normophonics, and male and female dysphonics); χc1-3 give the kurtosis of each respective population; ρci-cj give Pearson's correlation coefficient between each pair of cyclicality coefficients ci and cj; finally, pc1-3NvsD give the p-values of the test on each coefficient between the normophonic and dysphonic populations. From the data in the table the following conclusions may be drawn:
Coefficient c1 is clearly skewed in normophonic subjects, and slightly less so in dysphonics. Coefficient c2 is moderately skewed, and coefficient c3 is barely skewed except in female dysphonics.
Coefficient c1 is clearly leptokurtic in male and female normophonic subjects, as is c2 in female normophonics. The rest of the population distributions behave as platykurtic.
The correlation between any two coefficients presents a low value, well under the limit of 0.7, therefore relative independence among estimates may be assumed under second order statistics.
The p-values for all cases considered are well above 0.05, therefore the null hypothesis cannot be rejected, i.e., the distributions from normophonic vs dysphonic populations can be considered equivalent as far as the cyclicality coefficients are concerned.
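The kinds of statistics reported in Table 4 can be sketched with the Python standard library alone. Several points here are assumptions rather than the authors' procedure: population (biased) moment estimators are used, kurtosis is taken as excess kurtosis (Table 4 contains negative values, which a raw kurtosis could not take), and the two-sample p-value uses a Welch-type statistic with a normal approximation instead of the exact t distribution, which is only reasonable for groups of about 50 subjects. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, pstdev, variance


def skewness(xs):
    """Population skewness: third standardized moment."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3.0


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )
    return cov / norm


def two_sample_p_value(a, b):
    """Two-sided p-value for equal means (Welch statistic, normal approximation)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

With such helpers, a p-value above 0.05 for a coefficient compared between the normophonic and dysphonic groups would correspond to not rejecting the null hypothesis, as in the last observation above.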
These results will be used as contrasts in the case studies presented in the next section.
7. Evaluation of study cases
The second part of this preliminary study was aimed at exploring the possibilities of the methodology proposed in Section 4 when working with real cases. This study has a phenomenological, exploratory character, and is based on four specific PD cases, two of male voice and two of female voice, drawn from a database currently being recorded by the ENT and Neurology Services of Hospital del Henares. These patients were examined to verify that they were free from organic dysphonia. Two of them showed audible tremor (cases 334866 and 337523) and the other two did not (cases 223211 and 333282). These patients were contrasted against a control group of four normophonic speakers selected from the baseline database, two of them not showing tremor (cases 100508 and 100040), and two showing a slight tremor that was barely perceptible acoustically (cases 100503 and 100350). The control speakers' ages ranged from 23 to 28 years, far from the age of risk for PD. The voice and speech tests recorded included the five cardinal vowels in Spanish [a, e, i, o, u], sustained by each speaker for as long as possible, target words to measure the velo-pharyngeal switch, and short sentences in which these words appeared in coarticulation. For the present study only segments of vowel [a] 0.2 s long were used. The protocol carried out was the following:
The recordings, initially taken at a sampling frequency of 44,100 Hz with 16-bit resolution, were down-sampled to 22,050 Hz for the reconstruction of the glottal source following [11]. Inverse filtering was used to estimate the glottal source, as mentioned before.
The biomechanical parameters μn, ξn and σn, corresponding to the vocal fold body and cover mass, stiffness and losses, were estimated as in [11] for each phonation cycle n.
The stiffness parameter (ξn) was unbiased and smoothed using low-pass filtering to remove fast noisy fluctuations.
The cyclicality descriptors {c1n, c2n, c3n} were estimated using adaptive inverse modeling as in (2)-(5).
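The third step of the protocol above (removing the bias of the per-cycle stiffness and smoothing it) can be sketched as follows. The protocol does not state which low-pass filter was used, so a centered moving average stands in for it here as an assumption; the function names, the window width and the sample values are hypothetical. The other steps (inverse filtering, biomechanical estimation and the adaptive estimation of the cyclicality descriptors) are not reproduced.

```python
from statistics import mean


def unbias(series):
    """Remove the mean so the series fluctuates around zero."""
    m = mean(series)
    return [x - m for x in series]


def moving_average(series, width=5):
    """Centered moving average; the window shrinks near both ends."""
    half = width // 2
    smoothed = []
    for i in range(len(series)):
        window = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical per-cycle stiffness estimates (illustrative values only).
stiffness = [10180.0, 10320.0, 10050.0, 10210.0, 10400.0, 9980.0, 10150.0]
processed = moving_average(unbias(stiffness), width=3)
print(processed)
```

A moving average is among the simplest low-pass filters; a longer window removes more of the fast cycle-to-cycle fluctuation at the cost of blurring slower variations such as tremor.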
A cycle of the glottal source reconstructed for each of the control and PD cases is shown in detail in Fig. 10 (a, c, e, g, i, k, m and o). Sequences of successive cycles are given in (b, d, f, h, j, l, n and p) for each case. The glottal source (LF-pattern [30]) is plotted in the upper part of each template in blue, and its associated glottal flow in green. The glottal cycles in a), c), i) and k) are closer to the ideal LF-pattern, whereas the cycles in e), g), m) and o) show a more deteriorated pattern. This is an interesting observation, suggesting that PD patients may show glottal sources that deviate more from the ideal LF-pattern than those of normophonic subjects. The estimates of the perturbation and biomechanical parameters are given in Table 5. The perturbation parameters are the average fundamental frequency (f0) and its standard deviation (σf0), as well as the jitter and shimmer. The biomechanical parameters are the average body stiffness (μKb) and its standard deviation (σKb), and the corresponding estimates for the cover stiffness (μKc and σKc).
Table 5. Some perturbation and biomechanical parameter estimations. Conditions: NPNT-normophonic, no tremor; NPYT-normophonic, tremor; PDNT-Parkinson's Disease, no tremor; PDYT-Parkinson's Disease, tremor.
Case    Condition  Gender  Age  Grade  f0 (Hz)  σf0 (Hz)  Jitter (rel.)  Shimmer (rel.)  μKb (g.s-2)  σKb (g.s-2)  μKc (g.s-2)  σKc (g.s-2)
100508  NPNT       M       23   0      105.44   0.69      0.007          0.027           10,180       134          5,393        135
100503  NPYT       M       28   0      125.79   0.93      0.006          0.020           12,134       156          7,010        399
223211  PDNT       M       65   1      142.52   1.79      0.013          0.016           14,145       407          19,498       1,573
334866  PDYT       M       74   2      135.84   2.20      0.006          0.017           13,777       576          14,498       2,380
100040  NPNT       F       28   0      200.67   0.98      0.005          0.053           19,227       154          14,023       942
100350  NPYT       F       24   0      211.70   1.31      0.006          0.014           20,247       192          20,373       544
333282  PDNT       F       70   2      162.83   5.31      0.038          0.022           17,276       2,172        12,815       1,945
337523  PDYT       F       72   2      248.63   4.31      0.008          0.015           25,314       1,098        30,274       2,525
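The exact formulas behind the perturbation columns of Table 5 are not given in this section. As a minimal sketch, the code below assumes the common relative definition of jitter and shimmer (mean absolute difference between consecutive cycles divided by the mean value) and derives f0 and its standard deviation from the cycle periods; the function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def relative_perturbation(values):
    """Mean absolute difference between consecutive values, relative to their mean."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def f0_statistics(periods_s):
    """Average f0 (Hz) and its standard deviation from per-cycle periods (s)."""
    f0 = [1.0 / t for t in periods_s]
    return mean(f0), pstdev(f0)


# Hypothetical cycle periods (s) and cycle amplitudes (illustrative only).
periods = [0.0095, 0.0096, 0.0094, 0.0095, 0.0097]
amplitudes = [1.00, 0.97, 1.02, 0.99, 1.01]

jitter = relative_perturbation(periods)       # cycle-to-cycle period variation
shimmer = relative_perturbation(amplitudes)   # cycle-to-cycle amplitude variation
print(jitter, shimmer, f0_statistics(periods))
```

Under this definition jitter only captures differences between neighboring cycles, whereas the standard deviation of f0 also reflects slower drifts across the whole segment.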
Fig. 10 Glottal source estimates of the cases studied. a) Case 100508, normophonic male subject, no tremor: one cycle of the glottal source seen in detail in dark blue, the glottal flow in green; b) consecutive glottal source cycles in 0.2 s for the same case (negative peaks marked by red stars). c-d) Respective signals for case 100503, normophonic male subject, tremor. e-f) Case 223211, PD male subject, no tremor. g-h) Case 334866, PD male subject, tremor. i-j) Case 100040, normophonic female subject, no tremor. k-l) Case 100350, normophonic female subject, tremor. m-n) Case 333282, PD female subject, no tremor. o-p) Case 337523, PD female subject, tremor.
Fig. 11 Cyclicality analysis of the study cases. Templates a), c), e), g), i), k), m) and o) give the corresponding body stiffness estimate per glottal cycle (top) and its smoothed and unbiased correlate (bottom). Templates b), d), f), h), j), l), n) and p) give the boxplots showing the statistical dispersion of the first-, second- and third-order cyclicality coefficients from adaptive estimation (c1-3). Description of cases: a-b) case 100508 (normophonic male voice not affected by tremor); c-d) case 100503 (normophonic male voice affected by tremor); e-f) case 223211 (PD male voice not affected by tremor); g-h) case 334866 (PD male voice affected by tremor); i-j) case 100040 (normophonic female voice not affected by tremor); k-l) case 100350 (normophonic female voice affected by tremor); m-n) case 333282 (PD female voice not affected by tremor); o-p) case 337523 (PD female voice affected by tremor).
The following observations may be highlighted from the data shown in Table 5:
In general, the statistical dispersion of the fundamental frequency f0, as given by its standard deviation (σf0), is larger in PD patients than in control subjects, regardless of whether the latter showed tremor in voice or not.
Jitter did not always reflect the statistical dispersion of f0, which implies that the f0 deviation was not necessarily present between neighboring phonation periods, but over longer intervals.
The average body stiffness (μKb) seemed to be slightly larger in PD patients than in the respective male and female control subjects. For instance, 100508 and 100503 (males, normophonic) show smaller body stiffness than 223211 and 334866 (males, PD). Similarly, 100040 and 100350 (females, normophonic) show smaller body stiffness than 337523 (female, PD). The case of 333282 is an exception, behaving more like a male than a female voice.
The standard deviation of the body stiffness (σKb) was much larger in PD patients than in the respective male and female control subjects, regardless of whether the patients showed tremor or not. This finding is to be checked on larger databases, as body stiffness dispersion could be a correlate of the phonation alterations associated with PD.
The average cover stiffness (μKc) was much larger in PD patients than in control subjects of both genders relative to their respective group means. Compare male cases 100508 and 100503 against 223211 and 334866 (5,393 g.s-2 and 7,010 g.s-2 vs 19,498 g.s-2 and 14,498 g.s-2) and female cases 100040 and 100350 against PD case 337523 (14,023 g.s-2 and 20,373 g.s-2 vs 30,274 g.s-2). Again, the case of 333282 seems to be an exception.
The standard deviation of the cover stiffness (σKc) was also much larger in PD patients than in control subjects, independently of gender and tremor condition. This finding is also to be checked on larger databases to confirm cover stiffness dispersion as a correlate of PD.
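The dispersion-related observations above can be checked directly against the values in Table 5. The sketch below is purely descriptive: with eight cases it carries no statistical weight, and the grouping into "control" and "PD", the field names and the separation criterion (every PD value above every control value) are choices made here for illustration.

```python
# (case, group, jitter, sigma_f0 in Hz, sigma_Kb, sigma_Kc in g.s-2) from Table 5.
CASES = [
    ("100508", "control", 0.007, 0.69, 134, 135),
    ("100503", "control", 0.006, 0.93, 156, 399),
    ("223211", "PD", 0.013, 1.79, 407, 1573),
    ("334866", "PD", 0.006, 2.20, 576, 2380),
    ("100040", "control", 0.005, 0.98, 154, 942),
    ("100350", "control", 0.006, 1.31, 192, 544),
    ("333282", "PD", 0.038, 5.31, 2172, 1945),
    ("337523", "PD", 0.008, 4.31, 1098, 2525),
]
FIELDS = {"jitter": 2, "sigma_f0": 3, "sigma_Kb": 4, "sigma_Kc": 5}


def group_range(field, group):
    """Minimum and maximum of one field within one group of cases."""
    idx = FIELDS[field]
    values = [row[idx] for row in CASES if row[1] == group]
    return min(values), max(values)


def groups_separated(field):
    """True when every PD value exceeds every control value for this field."""
    _, control_max = group_range(field, "control")
    pd_min, _ = group_range(field, "PD")
    return pd_min > control_max


for field in FIELDS:
    print(field, group_range(field, "control"), group_range(field, "PD"),
          groups_separated(field))
```

For these eight cases the three standard deviations separate the PD group from the controls completely, whereas jitter does not, which is in line with the observations listed above.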
The estimations of the three cyclicality coefficients c1-3 are shown in Fig. 11. A first observation to be drawn is that in cases with no tremor, the stability of the stiffness correlate is high, but the dispersion of the cyclicality coefficients is large, especially as far as c1 is concerned (see Fig. 11.b, f, j and n). On the contrary, the dispersion of c1 is low in cases with tremor (see Fig. 11.d, h, l and p). These results are compared against the validation database of normophonic and organic dysphonic subjects (males and females) as shown in the scatter plots of Fig. 12 and Fig. 13 respectively.
Fig. 12 Male cases: Intra- and inter-subject cyclicality coefficient scatter plots (c2 vs c1) for normophonics and organic dysphonics (green and red circles) vs normophonic non-tremor (down-pointing triangles), normophonic with tremor (right-pointing triangles) and tremor-affected PD (left-pointing triangles).
In the male cases shown in Fig. 12 it may be seen that normophonic and organic dysphonic inter-class distributions are widespread (light green and red circles, respectively). The normophonic non-tremor control subject distribution (100508: down-pointing green triangles) is also very widespread on the normophonic distribution, with some samples under c2=-0.4. The normophonic tremor-affected control subject distribution (100503: right-pointing dark-red triangles) is concentrated well to the left of c1=-0.8 as expected, and within the interval (-0.2