Diploma Thesis
Modeling transient behaviour in vocal fold vibration using bifurcating nonlinear ordinary differential equation systems

Fritz Menzer
Communication Systems, Swiss Federal Institute of Technology, Lausanne
[email protected]

Supervisor: Jonas Buchli
Responsible: Prof. Auke Jan Ijspeert
Bio-Inspired Robotics Group, Logic Systems Laboratory, Swiss Federal Institute of Technology, Lausanne
September 14th 2004

External Advisor: Prof. David M. Howard
Media Engineering Group, Department of Electronics, The University of York
Abstract
Transient behaviour in the human voice is discussed, with particular reference to pitch breaks. Several nonlinear models for vocal fold vibration are proposed and discussed. One particular pitch break, where a triple period emerges out of a double period (and vice versa), is modeled with a simple third-order system similar to the Rössler system. For a given parameter value, the proposed model system has two intertwined attractors, one with a double and the other with a triple period. A model system more closely related to the physics of the vocal folds and to the laryngograph Lx signal is also proposed. The possibilities of producing pitch breaks with this system are discussed, with particular attention to the hypothesis that pitch breaks are due to a constriction of the airflow above the vocal folds.
Contents
1 Introduction
1.1 A brief summary of speech production
1.2 Laryngographic analysis of vocal fold vibration
1.3 Acronyms related to voice measurement
2 Qualitative behaviour of Lx signals
2.1 Lx waveforms for speech
2.1.1 Waveform 1 (modal voice)
2.1.2 Waveform 2
2.1.3 Waveform 3
2.1.4 Waveform 4 (sinusoidal)
2.1.5 Waveform 5
2.1.6 Waveform 6
2.1.7 Waveform 7
2.2 Onset Transients in Lx Signals
2.2.1 Onset 1
2.2.2 Onset 2
2.2.3 Onset 3
2.2.4 Onset 4
2.2.5 Onset 5
2.2.6 Onset 6
2.3 Offset Transients in Lx Signals
2.3.1 Offset 1
2.3.2 Offset 2
2.3.3 Offset 3
2.3.4 Offset 4
2.4 Bifurcations in Lx Signals
2.4.1 Period doubling
2.4.2 From double to triple period
2.4.3 Increasing period multiples
3 Dynamic systems as models for Lx signals
3.1 A model for Lx signals with two state variables
3.1.1 Capabilities of the model
3.1.2 Mechanical analogy
3.1.3 Proof of the existence of a limit cycle in a two-dimensional system switching between two linear systems
3.2 A simple model for pitch breaks to double and triple periods
3.2.1 Construction of the model
3.2.2 Analysis
3.2.3 Controlling the model
3.2.4 Discussion of the nonlinearity
3.3 Airflow-driven model
3.3.1 Results
3.4 Airflow-driven model for Lx signals
3.4.1 Normalisation
3.4.2 Results
3.4.3 Improvements
3.4.4 And what about pitch breaks?
4 Conclusion
4.1 Acknowledgements
A Lx samples reference
B Model parameter values
B.1 2D model
C Colouring of the basins of attraction in section 3.2.2
D LF model fitting method
E Tools
E.1 Lx denoising
E.1.1 LxNoisereduction
E.2 Batch evaluation of ODE systems
E.3 Parameter interpolation for ODE systems
E.3.1 linearparam.m
E.3.2 splineparam.m
F Matlab code for airflow based Lx model
F.1 flowlx1.m
Chapter 1
Introduction
Nonlinear phenomena and chaos have received increasing interest in the field of voice research. While some researchers have found evidence for chaotic behaviour in the human voice [17], others have shown that changes in the nonlinearity of existing models can produce a variety of changes in behaviour that also occur in the real voice [3]. While the question of whether the human voice is chaotic is complicated for a variety of reasons¹, nonlinear elements have been identified in the human voice [15], [3]. Moreover, the human voice itself shows behaviours that strongly hint at nonlinear phenomena: aperiodicity; sudden pitch changes, so-called pitch breaks, that are reminiscent of the period doubling known from classic nonlinear systems; and also more “exotic” effects such as pitch breaks from the normal period to a shorter period, as well as pitch breaks where the ratio of the pitches is not an integer. It is not only the voice itself that suggests that nonlinear effects may be important for some of its features: all but the simplest models of the human voice also contain nonlinearities. That these nonlinearities are important has been shown in [2], where a bifurcation portrait of a model was computed and Hopf bifurcations, period doubling and the emergence of a quasi-periodic torus were observed.
In this thesis, much attention was directed to a phenomenon that occurs relatively often in speech: period doubling and pitch breaks to higher-order periods, in particular to a triple period, which is a non-integer pitch break when starting from an already doubled period (the ratio of the periods is 3/2). This case is particularly interesting from a dynamic systems point of view because it clearly differs from the classic period-doubling case. Particular attention was also devoted to the development of a new model of the vocal folds that takes into account their zipper-like movement and the fact that vocal fold tissue is not rigid.
This model has only three state variables, including contact area and glottal flow.
¹ Often the periods during which supposedly chaotic behaviour occurs are short, and one has to be very careful to define the voice-producing system well (e.g. it may behave chaotically because it is controlled by the brain, which itself is very likely to be chaotic).
1.1 A brief summary of speech production
The system that produces the human voice consists of three components: a power supply (the lungs), an oscillator (the vocal folds) and a resonator (the vocal tract, consisting of the larynx, the pharynx and the mouth). This is illustrated in figure 1.1 below. The lungs provide a pressure difference with respect to the pressure at the lips, which induces an airflow through the vocal folds. This airflow causes the vocal folds to oscillate, with the result that the airflow is “chopped up” at the frequency at which the vocal folds vibrate. This abrupt but recurring variation of the airflow produces a harmonic sound pressure wave called the voice source. The vocal tract acts as a linear filter which gives the harmonic spectrum an overall envelope in the frequency domain. The peaks in this envelope are called “formants” and play an important role in distinguishing vowel sounds.
Figure 1.1: Schematic representation of voice production: lungs (power supply), airstream, vibrating vocal folds (oscillator), voice source, vocal tract (resonator) and output sound; the spectra are shown as amplitude versus frequency. Image adapted from [21].
The system that produces the human voice is sometimes compared to an organ [21], as the three components have equivalents in an organ: the bellows of an organ have the same function as the lungs, the tongue or mouth of a pipe creates a harmonic sound just as the vocal folds do, and the body of the pipe shapes the spectrum of the sound in a way similar to the vocal tract. However, in the human voice these components are much more flexible and versatile than in an organ: in speech, for example, the vocal tract is changed continuously to produce different vowels and diphthongs. The vocal folds can produce a wide range of different behaviours, varying in spectral content as well as in regularity. An example of differences in spectral content can be found by comparing normal speech to shouting (or harsh voice). Irregular behaviour occurs when the vibration of the vocal folds shows a period doubling or even a period tripling, which is a topic of this thesis.
1.2 Laryngographic analysis of vocal fold vibration
Studying the behaviour of the vocal folds is not a straightforward task. Any recording of the human voice (i.e. a recording of the air pressure waveform made with a microphone) contains effects due to the vocal folds as well as to the vocal tract. Attempts have been made to recover the signal coming from the vocal folds from a voice recording (notably using linear predictive coding and inverse filtering), but there is no guarantee that these methods really undo the effect of the vocal tract. Placing a microphone inside the throat would of course be a way to obtain a signal from very near the vocal folds, but besides being impractical, the microphone would still measure the sound waves reflected inside the vocal tract, so that even if the vocal folds vibrated the same way all the time, changing the shape of the vocal tract would change the measured signal. Furthermore, the presence of a microphone inside the vocal tract would probably change its acoustic properties. However, there are ways to obtain data from the vocal folds that are not affected by the vocal tract, such as high-speed photography (using a camera inside the vocal tract, or even using x-rays) and electrolaryngography.
Figure 1.2: A laryngograph with the two electrodes (connected by a cable to “Lx. in”) and controls for high- and lowpass filtering as well as gain. The output signal comes from the connector at the bottom left (fitted with a phono jack adaptor in this picture).
The laryngograph (see figure 1.2) is a device that measures the electrical conductivity between two electrodes placed on either side of the neck. It produces an electrical signal closely related to the contact area between the vocal folds. This signal is called Lx, for Larynx excitation waveform. The principle on which the Lx signal is based is straightforward: as the contact area between the vocal folds increases, the conductivity between the electrodes also increases (i.e. the resistance decreases), in much the same way as the conductivity of a wire increases when its cross-section is increased.
In figure 1.3 the Lx signal is compared to the voice recorded simultaneously with a microphone. The Lx signal is clearly much more regular and simple, as it only reflects the movement of the vocal folds and is independent of the vocal tract (it is, for example, impossible to identify a vowel by looking only at an Lx signal).
Figure 1.3: The Lx signal has the same fundamental frequency as the speech signal, but is more regular and simple.
Besides being more regular than the speech signal, the Lx signal has a very clear meaning (vocal fold contact), which makes it possible to obtain useful information on the movement of the vocal folds from an Lx recording. Most importantly, one can determine when the folds are closed and when they are open. The fraction of time during which they are closed is believed to be closely related to the efficiency of the voice [14]. The folds close very rapidly, due to the Bernoulli effect of the airflow between them. This results in a very steep rising edge in each period of the Lx signal (phase I in figure 1.5). The opening is much slower, which makes it difficult to locate the transition from the opening phase (III) to the open phase (IV), but there is often a relatively sharp bend in the waveform at about the same amplitude at which the closing phase (I) started.
Figure 1.5: The main phases in one cycle of the Lx signal, plotted as vocal fold contact versus time (adapted from [1]): I closing phase, II maximum contact, III opening phase, IV open phase. The closed phase (CP) and the open phase (OP, identical to IV) are also marked.
If the vocal folds were electrically isolated from the rest of the body and in contact with the electrodes, perfect measurements could be achieved. Unfortunately, this is not the case. Even though the electrodes are fitted with a grounded outer ring to prevent a current on the skin (see figure 1.4), the vocal folds sit inside the larynx, which can move relative to the skin; this can cause large changes in conductivity, for example when the person swallows. Even in normal speech these baseline movements, referred to as Gx (Gross larynx movement), occur, and they are often greater in amplitude than the useful signal (Lx). Fortunately, the Gx and Lx signals differ considerably in frequency content, so a highpass filter with a cutoff frequency below the audible frequency range removes the Gx signal quite effectively, as can be seen in figure 1.6.
Figure 1.4: The electrodes of the laryngograph are fitted with a grounded outer ring to prevent a current from flowing superficially on the skin from one electrode to the other.
Figure 1.6: Top: the Lx signal as it comes from the laryngograph, plotted along with the Gx signal (estimated using a 22Hz lowpass filter), over 0 to 400 ms. Bottom: the Lx signal after removing the Gx signal (Lx−Gx).
1.3 Acronyms related to voice measurement
Lx: Larynx excitation (laryngograph output waveform)
Sp: Speech pressure waveform
Tx: Fundamental period (time) of excitation
Fx: Fundamental frequency of excitation
Gx: Gross low frequency larynx movement
OP: Vocal fold Open Phase time
OQ: Vocal fold Open Quotient, OQ = OP/Tx · 100%
CP: Vocal fold Closed Phase time
CQ: Vocal fold Closed Quotient, CQ = CP/Tx · 100%
Adapted from [1].
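The Gx removal described above (a Gx estimate obtained with a 22Hz lowpass filter and subtracted from the raw signal, as in figure 1.6) and the closed quotient CQ = CP/Tx · 100% can be illustrated with a short sketch. It is not the processing used for the thesis recordings: the first-order lowpass filter, the zero threshold used to decide when the folds count as “closed”, the helper names and the synthetic test signal are all assumptions made for this example.

```python
import math


def one_pole_lowpass(x, fs, fc):
    """First-order recursive lowpass filter with cutoff fc (Hz) at sample rate fs (Hz)."""
    a = math.exp(-2.0 * math.pi * fc / fs)  # pole of the recursive filter
    y, state = [], x[0]
    for sample in x:
        state = a * state + (1.0 - a) * sample
        y.append(state)
    return y


def remove_gx(lx, fs, fc=22.0):
    """Estimate the slow Gx baseline with a lowpass filter and subtract it (Lx - Gx)."""
    gx = one_pole_lowpass(lx, fs, fc)
    return [s - g for s, g in zip(lx, gx)]


def closed_quotient(samples, threshold=0.0):
    """CQ in percent: share of samples above the threshold (assumed to mean 'closed').

    Pass a whole number of periods so that the ratio approximates CP/Tx.
    """
    closed = sum(1 for s in samples if s > threshold)
    return 100.0 * closed / len(samples)


# Synthetic test signal (assumption): a 125 Hz pulse wave that is "closed" for
# 40% of each period, riding on a large, slow 0.5 Hz baseline drift (Gx).
fs, period = 10000, 80                      # 80 samples = 8 ms = 1/125 Hz
n = range(fs)                               # one second of signal
lx_true = [0.6 if (k % period) < 0.4 * period else -0.4 for k in n]
drift = [2.0 * math.sin(2.0 * math.pi * 0.5 * k / fs) for k in n]
raw = [a + b for a, b in zip(lx_true, drift)]

clean = remove_gx(raw, fs)
settled = clean[2000:]                      # skip the filter's start-up transient
print(f"CQ = {closed_quotient(settled):.1f} %")
```

The slice `clean[2000:]` covers exactly 100 periods, so the sample count directly approximates CP/Tx. With real recordings the threshold would need more care, since the opening phase is gradual and the transition to the open phase is hard to locate.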
Chapter 2
Qualitative behaviour of Lx signals
The goal of this chapter is to list the qualitative behaviours of Lx signals and to discuss the possibility of modeling them by means of a simple dynamic system. In particular, a notation for speech Lx waveforms, specific to this report, is introduced. The same is done for the onset and offset transients occurring in the Lx signal for speech. Besides onset and offset transients, transients within phonation periods can also be observed in Lx signals. In many cases this is a sudden change of fundamental frequency, a so-called pitch break. This behaviour resembles the period-doubling bifurcations which are a common feature of nonlinear systems.
2.1 Lx waveforms for speech
In this section the periodic Lx behaviour is discussed. For each type of behaviour, two periods of a typical example are shown.
2.1.1 Waveform 1 (modal voice)
This is probably the most common Lx waveform in speech. Analysis of 20 seconds of speech Lx data has shown that this waveform occurs during about 60% of the time in which the vocal folds are vibrating. Together with similar waveforms (1a, 1b and 5), it accounts for 88% of the vocal folds’ oscillating behaviour in the analysed data. A typical example is shown in figure 2.1.
Figure 2.1: wave1.wav Typical example of a modal voice Lx waveform (amplitude versus time [sec]). The signal includes a very steep rise marking the beginning of the closed phase (at 0.002s), followed by a rounded peak, two roughly linear descending parts (roughly 0.003s to 0.0045s and 0.0045s to 0.005s), and a wide negative peak, similar to the negative half-period of a sinusoid, corresponding to the open phase (0.005s to 0.009s).
In some cases, especially at high fundamental frequencies, the waveform becomes smoother and the negative peak flatter (see figure 2.2). In particular, the “kink” between the two descending parts becomes less evident. This waveform may be close to the one corresponding to falsetto voice.
Figure 2.2: wave1a.wav A smoother example of an Lx waveform at higher fundamental frequency.
In other cases the open phase ends at a higher level than it starts (see figure 2.3). The shape still looks like half a sinusoid, but rather than sin(τ) on the interval [π, 2π], it looks as if the interval had been shifted to the right by roughly π/4. This seems to happen particularly often at low frequencies.
Figure 2.3: wave1b.wav Example of an Lx waveform at lower fundamental frequency, introducing a different shape of the open phase.
The waveforms may vary considerably from one speaker to another. In figure 2.4 no sinusoidal behaviour is apparent and all parts look very linear.
Figure 2.4: wave1b3.wav Lx waveform similar to the one in figure 2.3, but from a different male speaker.
2.1.2 Waveform 2
This waveform often occurs when the vocal tract is not completely open, such as during the closing at the ‘d’ in “words” or the opening at the ‘b’ in “base”. The amplitude and the high-frequency content of the speech signal are therefore diminished when this waveform occurs. This waveform is shown in figure 2.5. The first peak could be modeled by a sine wave on the interval [π/4, 5π/4], while for the open phase the interval [π/2, 2π] would be suitable.
2.1.3 Waveform 3
A few cycles of this waveform often appear close to offset transients, between waveform 1 and a sinusoidal “tail”. However, this waveform may also appear during the oscillation. The waveform is characterised by a narrow peak emerging out of what appears to be almost a complete cycle of a sinusoid (approximate interval of the corresponding sine wave: [π/2, 9π/4]). See figure 2.6.
Figure 2.6: wave3.wav Lx waveform with a closed phase emerging out of a sinusoid.
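The sine-interval notation used in this section can be checked numerically. The sketch below samples sin(τ) over the intervals quoted for waveforms 1, 1b, 2 and 3 and reports where each segment starts and ends and what fraction of a full cycle it covers. For instance, shifting the open phase of waveform 1 from [π, 2π] to [5π/4, 9π/4] makes it end √2 ≈ 1.41 (in units of the sine amplitude) higher than it starts, which matches the description of waveform 1b, and the interval for waveform 3 covers 7/8 of a cycle. The helper name `sine_segment` is hypothetical, and the amplitude scaling and offsets that a real fit to an Lx recording would need are left out.

```python
import math


def sine_segment(start, stop, n):
    """Return n samples of sin(tau) for tau running from start to stop (inclusive)."""
    step = (stop - start) / (n - 1)
    return [math.sin(start + k * step) for k in range(n)]


PI = math.pi
# Intervals quoted in section 2.1, expressed in the phase tau of the sine wave.
INTERVALS = {
    "waveform 1, open phase": (PI, 2 * PI),
    "waveform 1b, open phase (shifted by pi/4)": (5 * PI / 4, 9 * PI / 4),
    "waveform 2, first peak": (PI / 4, 5 * PI / 4),
    "waveform 2, open phase": (PI / 2, 2 * PI),
    "waveform 3": (PI / 2, 9 * PI / 4),
}

for name, (a, b) in INTERVALS.items():
    seg = sine_segment(a, b, 101)
    cycle_fraction = (b - a) / (2 * PI)   # share of a full sine period
    print(f"{name}: start {seg[0]:+.3f}, end {seg[-1]:+.3f}, "
          f"{cycle_fraction:.3f} of a cycle")
```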
Figure 2.5: wave2.wav Lx waveform with a more symmetric positive peak and a descending open phase. The waveform is characterised by a steep rising phase (at 0.003s), a sinusoidal peak, a relatively steep but short descending phase (at 0.007s) and a sinusoidal phase including a negative peak.
2.1.4 Waveform 4 (sinusoidal)
It sometimes happens that the Lx signal is almost completely sinusoidal. I suppose that in this case no complete closure of the vocal folds occurs. See figure 2.7.
2.1.5 Waveform 5
This behaviour, similar to waveforms 1 and 1a, is characterised by smooth transitions, a rising phase that is not very steep, and a negative peak similar to the negative half-period of a sinusoid. See figure 2.8. It is in many ways similar to waveform 1, and transitions from one to the other are very common.
Figure 2.7: wave4.wav Sinusoidal Lx waveform.
Figure 2.8: wave5.wav Lx waveform with very smooth transitions and a sinusoidal negative peak.
2.1.6 Waveform 6
This waveform is interesting as it raises the question of how the open and closed phases should be defined. See figure 2.9.
Figure 2.9: wave6.wav Lx waveform with a very steep rising phase and a sinusoidal descending phase.
2.1.7 Waveform 7
Due to the high amount of noise, it is not clear whether this waveform is really sinusoidal. However, the speech waveform is very close to sinusoidal. This waveform differs from waveform 4 in its high pitch and in that it persists for a relatively long time. See figure 2.10.
Figure 2.10: wave7.wav Lx waveform from a female speaker at high frequency (around 430Hz).
2.2 Onset Transients in Lx Signals
2.2.1 Onset 1
Often the transition from the non-oscillating state to the steady oscillation is very fast, happening in much less time than one period of the oscillation (see figure 2.11). This also results in an almost immediate onset of the speech signal (a very short attack phase, in sound synthesis terms).
Figure 2.11: rest-osc 1.wav Very fast transition from rest to oscillation.
2.2.2 Onset 2
In some cases, the first positive peak of the oscillation is smaller than the peaks in the steady state and seems to emerge out of a sinusoidal oscillation, as in waveform 3 (see figure 2.12 and section 2.1.3). The corresponding speech signal has a longer transient than in the previous case. This is plausible, since there must be a relationship between the Lx signal and the airflow, which probably means that a smaller peak in Lx (with a less steep rising edge) corresponds to a less pronounced glottal pulse. This transient behaviour appears naturally in the piecewise linear model.
Figure 2.12: rest-osc 2.wav Transition with one small peak similar to waveform 3.
2.2.3 Onset 3
Another phenomenon occurring at the beginning of the vocal fold oscillation is that the steady state is preceded by a few sinusoidal oscillations (see figure 2.13). A possible interpretation of this behaviour is that in the beginning the vocal folds are open, and when the air starts to flow through them they first oscillate sinusoidally, with increasing amplitude as they draw energy from the airstream, until they start flapping together, which is the steady state. The corresponding speech signal shows the sinusoidal oscillation as well, leading to a very soft-sounding initial transient. This behaviour also occurs with the simple piecewise linear model.
Figure 2.13: rest-osc 3.wav Transition with some cycles of sinusoidal oscillation (waveform 4).
2.2.4 Onset 4
This is one of the more bizarre transitions (see figure 2.14). It may simply be a movement of the larynx between 0.01s and 0.015s, causing the DC offset to change. However, the short duration of this offset change somewhat contradicts this assumption. On the other hand, the corresponding speech signal does not show any trace of the broad peak, but looks very much like the speech signal corresponding to one of the first two types of transition. The peak could be due to a problem with the Lx recording.
Figure 2.14: rest-osc 4.wav Transition with a superimposed broad peak.
2.2.5 Onset 5
In some cases the transition between the equilibrium and the oscillation exhibits far more irregular behaviour than the oscillation itself (see figure 2.15). While during the oscillation a well-defined (though slowly drifting) period exists, here we have a seemingly random assembly of small and big, sinusoidal and nonsinusoidal peaks. The irregularity of the Lx signal can also be found in the speech signal, which sounds rather rough compared to normal speech. This contradicts the assumption that it is solely due to an Lx recording problem.
Figure 2.15: rest-osc 5.wav Irregular behaviour with small and big peaks as well as sinusoidal oscillation at the beginning of the oscillation.
2.2.6 Onset 6
Even though double period behaviour (an alternation of small and big peaks) seems to appear mainly in the middle of an oscillating phase, the double period may also be entered directly from the resting phase, as can be seen in figure 2.16. However, no transition from rest to triple or higher periods has been observed (which of course may be because triple periods are far rarer than double periods).
Figure 2.16: rest-osc 6.wav Transition directly to doubled period.
2.3 Offset Transients in Lx Signals
2.3.1 Offset 1
While for the onset transients a very short transition is the most common, for the oscillating-to-rest transition the most common behaviour is a “tail” of sinusoidal oscillation with decreasing amplitude. A typical example can be seen in figure 2.17. The sinusoidal behaviour also appears in the speech signal, but often fades away more quickly than in the Lx signal. This may be due to the vocal tract closing at the end of a word.
Figure 2.17: osc-rest 1.wav Transition with a sinusoidal fade-out.
2.3.2 Offset 2
A far less common behaviour is an almost immediate transition from oscillation to rest, as seen in figure 2.18. It is interesting to note that the last four periods of oscillation seem to show a transition to double period behaviour. The speech signal for this example also ends very abruptly. Listening to the speech, one may suppose that here the sound was ended by stopping the airflow at the height of the pharynx.
Figure 2.18: osc-rest 2.wav Almost immediate transition.
2.3.3 Offset 3
Similar to the irregular behaviour at the beginning of an oscillating phase (see section 2.2.5), there may also be irregular behaviour at the end (see figure 2.19). There is some irregularity in the corresponding speech signal, but as the irregular phase is very short, it is not possible to identify it by listening. A recording problem cannot be excluded.
Figure 2.19: osc-rest 3.wav Irregular behaviour at the end of the oscillation.
2.3.4 Offset 4
In some cases, the positive peaks tend to change their shape into that of a decreasing exponential curve, exp(−αt) (see figure 2.20). The speech signal seems to react especially to the sharp rising edge of the Lx peaks, with each edge triggering a decaying oscillation. Both the Lx signal and the speech sound very rough, more like a grunt than a human voice (if the sound is taken out of its context).
Figure 2.20: osc-rest 4.wav Series of long, exponential-like peaks at the end of the oscillation.
2.4 Bifurcations in Lx Signals
2.4.1 Period doubling
Depending on the speaker, period doubling was found to be very frequent. It is characterised by one peak out of two decreasing or increasing in amplitude. This behaviour is commonly found in well-studied nonlinear systems such as the Colpitts oscillator. A typical example of period doubling (and of the return to the single period) is shown in figure 2.21. Period doubling is mainly perceived as a change in fundamental frequency, both in the Lx and in the speech signal.
Figure 2.21: doubleper.wav Period doubling.
2.4.2 From double to triple period
Observing period doubling in a system may bring to mind the period-doubling cascade. However, this is not the way subharmonics are created in the vocal folds. Instead of creating lower subharmonics only in the series f0, f0/2, f0/4, f0/8, ..., where f0 is the fundamental frequency of the oscillation, the vocal folds are also able to create subharmonics at ratios that are not a power of 2: f0/2, f0/3, f0/4 and f0/5 have been observed.
It seems that the triple period emerges only out of double period behaviour (i.e. the triple period is preceded by a double period). Three different types of triple periods have been observed: a triple-length period consisting of three peaks with increasing amplitude (figure 2.22), one small peak followed by two big peaks (figure 2.23), and a series of three peaks with decreasing amplitude (figure 2.24). The increasing amplitude is the most common type. In all of the above-mentioned examples the perceived sound (of both the speech and the Lx signal) simply undergoes a change in pitch. The tripled period can clearly be seen in the speech signal.
Figure 2.22: tripleper.wav Triple period, increasing peak height.
Figure 2.23: tripleper2.wav Triple period, one small peak followed by two big peaks.
0.4
Amplitude
0.3
0.2
0.1
0
−0.1
0.01
0.02
0.03
0.04 time [sec]
0.05
0.06
Figure 2.24: tripleper3.wav Two triple periods with decreasing peak height, separated by a double period.
2.4.3
Increasing period multiples
Double and triple periods are not the only subharmonics that appear in speech Lx signals: quadruple and fivefold periods have also been observed. Figure 2.25 illustrates a transition from double to triple to fivefold period. In this example, the pitch change is not very evident in the speech signal.

Figure 2.25: period2 3 5.wav. Double, triple and fivefold period. (Plot: amplitude vs. time [sec].)
Chapter 3

Dynamic systems as models for Lx signals

In this project, different models for vocal fold vibration have been studied, each constructed with a different goal in mind. At the simple end is a two-dimensional model that merely switches between two linear systems; its primary goal was to be the simplest dynamic system capable of producing Lx-like signals. At the complex end is an airflow-driven, Lx-based model of the vocal folds. It accounts for the zipper-like opening and closing of the folds, the "squeezing" of the vocal fold tissue and a constriction in the vocal tract. Between these two extremes (in terms of complexity) lie a rather abstract model for pitch breaks (sudden changes in fundamental frequency) and a simple airflow-driven mass-and-spring model of the vocal fold movement. In the design of all these models, much emphasis was put on keeping the number of state variables as low as possible.

3.1 A model for Lx signals with two state variables
The idea behind this model was to produce signals having some of the qualities of Lx signals with a system that has only two state variables. The first, $x_1$, represents Lx. The second, $x_2$, corresponds to the velocity of the vocal folds, with positive $x_2$ meaning that the folds are moving apart. The state space is divided into two regions: $x_1 > 0$ corresponds to closed vocal folds, while $x_1 \le 0$ means that the vocal folds are open:

\begin{align}
x_1 \le 0:\quad \dot{x}_1 &= -b\,x_2 \tag{3.1}\\
\dot{x}_2 &= d + e\,x_1 + \alpha\,x_2 \tag{3.2}\\
x_1 > 0:\quad \dot{x}_1 &= -a\,x_2 \tag{3.3}\\
\dot{x}_2 &= c\,x_1 - \beta\,x_2 \tag{3.4}
\end{align}

A linear system is associated with each half of the state space, which makes the complete system a piecewise linear Filippov system, i.e. a piecewise linear system with a discontinuous right-hand side [7].

The right half of the state space corresponds to the closed phase. It follows the rules of a damped linear oscillator whose equilibrium would be at the origin ($x_1 = 0$, $x_2 = 0$). However, this point does not belong to the right half-plane, since $x_1 > 0$ is required there.

The left half of the state space models the open phase. It is a linear oscillator with an equilibrium point at ($x_1 = -d/e$, $x_2 = 0$) and damping factor $-\alpha$. For positive values of $\alpha$, this negative damping is the energy source of the oscillator. Changing the value of $\alpha$ changes the stability of the equilibrium point $(-d/e, 0)$ from stable ($\alpha$ negative) to unstable ($\alpha$ positive). The overall system thus undergoes a bifurcation that creates a limit cycle. For a proof of the existence of a limit cycle in a similar system, see section 3.1.3.

The proposed separation of the state space implies the hypothesis that the open and closed phases can be separated by thresholding Lx, which is wrong in general. Nevertheless, for some modes of vibration, notably waveforms 2 and 3, this hypothesis applies quite well (see also sections 2.1.2 and 2.1.3).

3.1.1 Capabilities of the model

To sum up, this model is capable of producing waveforms that consist of two sinusoidal parts, as well as exponentially damped (or growing) sinusoidal oscillations. Despite its simplicity, the proposed model produces qualitatively correct results in many respects:

• The model produces a stable oscillation for certain parameter values and a constant output for others. This corresponds to (normal) phonation and to adduction prior to phonation, respectively.
• The point of equilibrium is in the open phase. This corresponds to the vocal folds being separated when not producing sound.
• Waveforms 1a, 2 and 3 (see figures 2.2, 2.5 and 2.6) can be modeled quite accurately, as can sinusoidal waveforms (as in figures 2.7 and 2.10). With a minor modification of the system, the waveform in figure 2.8 can also be approximated reasonably well.
• Offset transient 1 (showing a "tail" of sinusoidal oscillation) is inherent to the model.
• Transients similar to the onset transients showing a small peak and a few sinusoidal cycles (see figures 2.12 and 2.13) are natural behaviours of the model.
• Onset transient 1 (see figure 2.11) can be approximated as a limit case in which the transition is so fast that the sinusoidal behaviour does not appear.

However, due to its simplicity, this model has some drawbacks:

• Some interesting behaviour found in Lx signals, such as period doubling or irregular oscillations, cannot be produced. In particular, it is not possible to produce chaotic behaviour with a two-dimensional model [20, page 203].
• The closed and open phases are separated by means of the Lx value alone. It is therefore not possible to model waveforms for which the opening and closing occur at considerably different values of the Lx signal¹ (see figures 2.3 and 2.9).
• It is impossible to produce a waveform with an opening phase having two roughly linear parts (such as waveform 1 and 1b3 in particular).
• The negative damping that drives the oscillator is not physically meaningful. It does not represent the assumption that the energy is transferred to the vocal folds by the Bernoulli effect of the airflow through the closing folds, which would mean that the energy gain occurs mainly at the end of the open phase.
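The switching structure of (3.1)-(3.4) is easy to reproduce numerically. The following Python sketch integrates the model with a fixed-step fourth-order Runge-Kutta scheme. The parameter values ($a = b = c = d = e = 1$, $\alpha = 0.1$, $\beta = 0.5$), the step size and the choice of integrator are assumptions for illustration. They are not the settings used for the figures in this chapter.

```python
# Minimal sketch of the piecewise linear two-state model (3.1)-(3.4).
# Assumed illustration values (not from the thesis): a = b = c = d = e = 1,
# alpha = 0.1, beta = 0.5; fixed-step RK4 with dt = 0.005.

PARAMS = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "e": 1.0,
          "alpha": 0.1, "beta": 0.5}


def rhs(x1, x2, p=PARAMS):
    """Vector field of (3.1)-(3.4); x1 > 0 is the closed phase."""
    if x1 <= 0.0:  # open phase, (3.1)-(3.2)
        return -p["b"] * x2, p["d"] + p["e"] * x1 + p["alpha"] * x2
    return -p["a"] * x2, p["c"] * x1 - p["beta"] * x2  # closed phase, (3.3)-(3.4)


def simulate(x1, x2, t_end, dt=0.005, p=PARAMS):
    """Classical RK4 integration; returns the lists of x1 (Lx) and x2 samples.

    RK4 assumes a smooth vector field, so steps that straddle the switching
    line x1 = 0 are only approximate; this is enough for a qualitative look.
    """
    xs1, xs2 = [x1], [x2]
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(x1, x2, p)
        k2 = rhs(x1 + 0.5 * dt * k1[0], x2 + 0.5 * dt * k1[1], p)
        k3 = rhs(x1 + 0.5 * dt * k2[0], x2 + 0.5 * dt * k2[1], p)
        k4 = rhs(x1 + dt * k3[0], x2 + dt * k3[1], p)
        x1 += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        x2 += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
        xs1.append(x1)
        xs2.append(x2)
    return xs1, xs2


if __name__ == "__main__":
    # Start near the unstable open-phase equilibrium (-d/e, 0) = (-1, 0).
    lx, _ = simulate(-1.05, 0.0, t_end=300.0)
    tail = lx[2 * len(lx) // 3:]  # last 100 time units, near the limit cycle
    print(f"Lx on the attractor: min {min(tail):.3f}, max {max(tail):.3f}")
```

On the attractor, the signal alternates between the open phase ($x_1 \le 0$), which winds around $(-1, 0)$, and short excursions into the closed phase ($x_1 > 0$). Making $\alpha$ negative in `PARAMS` should instead let the trajectory settle at the equilibrium $(-d/e, 0)$.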
Waveforms

Remark: the simulated waveforms have been scaled in amplitude and time to fit the recorded waveforms.

When waveform 1a is simulated using the system in (3.1)-(3.4), the result matches the original in two respects: it has a similar closed quotient, and its open phase is very flat. However, the shapes of the open and closed phases do not match the original. In figure 3.1, the negative peak of the open phase lies at around 6.4 ms for the simulated waveform, whereas it lies between 5.0 ms and 5.5 ms for the original waveform. Furthermore, the descending part of the closed-phase peak produced by the model is significantly below the original signal between 3 ms and 4 ms. The state-space trajectory in figure 3.2 clearly shows the distinct open and closed phases.

Figure 3.1: Waveform 1a (see 2.1.1) and model output. The closing phase is matched very well, while the minimum in the open phase occurs too late and the opening phase is significantly below the recorded waveform. (Plot: amplitude vs. time [s ×10⁻³]; curves: original, model.)

Figure 3.2: Limit cycle corresponding to the simulated waveform 1a. (Plot: x2 vs. x1.)

Figure 3.3: Waveform 2 (see 2.1.2) and model output. (Plot: amplitude vs. time [s ×10⁻³]; curves: original, model.)

¹ Localising the closing and opening by choosing the steepest positive or negative gradient of the Lx signal.
The two-dimensional system is capable of simulating waveform 2 (see also 2.1.2) quite well in a qualitative way. In particular, the shape of the open phase seems to match very well. However, the peak corresponding to the closed phase seems to lack some steepness towards the end of the closed phase (around 4.5 ms in figure 3.3).

Figure 3.4: Limit cycle corresponding to the simulated waveform 2. (Plot: x2 vs. x1.)

Waveform 3 seems to be the one with the most "sinusoidal" closed phase, which means that the proposed system provides a good model for it. The shape of the open phase also matches reasonably well. However, as can be seen in figure 3.5, the shape of the waveform can change slightly from one cycle to the next. In the first cycle, the closed-phase peak starts at a slightly lower Lx value than it ends: it starts around (3 ms, −0.02) and ends around (5 ms, 0.0). In the second cycle, however, the closed-phase peak starts above zero (at about 11 ms) and ends below zero (around 13 ms). Such behaviour cannot be modeled by a system that distinguishes between open and closed phase simply by a threshold on the Lx value.

Figure 3.5: Waveform 3 (see 2.1.3) and model output. The closed-phase peak is matched very well, but the open phase deviates a little, although the overall shape is reasonably well matched. (Plot: amplitude vs. time [s ×10⁻³]; curves: original, model.)

Figure 3.6: Limit cycle corresponding to the simulated waveform 3. (Plot: x2 vs. x1.)
In order to model waveform 5, the model had to be modified slightly. The rising part of the closed-phase peak is much steeper than the descending part, so a high damping factor $\beta$ has to be used. This brings the state-space trajectory close to the origin. As a result, the open phase starts out like a cosine function at zero: very flat at the beginning and then decreasing to a minimum, much like in waveform 2 (see figure 3.3). Waveform 5, however, is not at all like that, as its open phase starts with a rather steep descending slope.

The idea was therefore to shift the equilibrium point of the linear system modeling the closed phase, so that the trajectory would not have to move almost vertically when entering the open phase (as in figure 3.4). A new parameter $f$ was introduced, which denotes the $x_2$-coordinate of this equilibrium point. Equations (3.3) and (3.4) then become

\begin{align}
\dot{x}_1 &= -a\,(x_2 - f) \tag{3.5}\\
\dot{x}_2 &= c\,x_1 - \beta\,(x_2 - f) \tag{3.6}
\end{align}

The effect of this modification can be seen in the state-space trajectory in figure 3.8. When it enters the open phase around (0, 1), the trajectory does not change its direction as abruptly as in the other cases (figures 3.2, 3.4 and 3.6). This translates into a much smoother transition from closed phase to open phase in the simulated Lx signal in figure 3.7.

Figure 3.7: Waveform 5 (see 2.1.5) and model output. The opening as well as the open phase show a slight mismatch. (Plot: amplitude vs. time [s ×10⁻³]; curves: original, model.)

Figure 3.8: Limit cycle corresponding to the simulated waveform 5. (Plot: x2 vs. x1.)
Offset transients

When the stability of the equilibrium at $(-d/e, 0)$ is changed, the system undergoes a bifurcation that creates a limit cycle. Offset transients can therefore be produced as follows: start from a set of parameters for which the system has a stable limit cycle, then decrease the bifurcation parameter $\alpha$ until the system has only one stable equilibrium. The result can be seen in figure 3.9. At a certain point, the trajectory stays in the open phase (see figure 3.10), leading to an exponentially damped sinusoidal oscillation (after sample 1000 in figure 3.9). This behaviour is very common and is described as oscillation-to-rest transition 1 in section 2.3.1.

Figure 3.9: Typical simulated final transient, where the closed-phase peaks suddenly disappear and a sinusoidal trail follows. (Plot: amplitude vs. time [samples].)

Figure 3.10: State-space trajectory corresponding to the signal in figure 3.9. (Plot: x2 vs. x1.)

Onset transients

Analogously to the final transient described above, initial transients can be produced by changing the stability of the equilibrium from stable to unstable. The resulting initial transient can be seen in figure 3.11. It resembles rest-to-oscillation transitions 2 and 3 in that it shows some sinusoidal oscillation as well as a small peak (see also sections 2.2.2 and 2.2.3). However, this behaviour does not match the most common initial transient (see section 2.2.1), which involves a very fast transition from a non-oscillating signal to a stable oscillation. Such a fast transition can be approximated by choosing very high values for $\alpha$ and $\beta$ (see figure 3.13).

Figure 3.11: Typical simulated onset transient. (Plot: amplitude vs. time [samples].)

Figure 3.12: State-space trajectory corresponding to the signal in figure 3.11. (Plot: x2 vs. x1.)
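An offset transient can be mimicked by letting $\alpha$ vary slowly in time while the rest of (3.1)-(3.4) stays fixed. This Python sketch uses assumed values $a = b = c = d = e = 1$ and $\beta = 0.5$. It keeps $\alpha = 0.1$ until $t = 100$, then ramps it linearly down to $-0.3$ at $t = 200$. The ramp schedule, the parameter values and the RK4 integrator are illustrative assumptions, not the settings behind figure 3.9.

```python
# Sketch: offset transient of model (3.1)-(3.4) obtained by slowly decreasing
# the bifurcation parameter alpha. Assumed values (not from the thesis):
# a = b = c = d = e = 1, beta = 0.5; alpha = 0.1 until t = 100, then a linear
# ramp down to -0.3 at t = 200, constant afterwards.


def alpha_of_t(t, t_start=100.0, t_stop=200.0, a_hi=0.1, a_lo=-0.3):
    """Piecewise linear schedule for alpha."""
    if t <= t_start:
        return a_hi
    if t >= t_stop:
        return a_lo
    return a_hi + (a_lo - a_hi) * (t - t_start) / (t_stop - t_start)


def rhs(t, x1, x2, beta=0.5):
    """(3.1)-(3.4) with a = b = c = d = e = 1 and time-dependent alpha."""
    if x1 <= 0.0:  # open phase
        return -x2, 1.0 + x1 + alpha_of_t(t) * x2
    return -x2, x1 - beta * x2  # closed phase


def simulate(x1, x2, t_end, dt=0.005):
    """Fixed-step RK4; returns sample times, x1 (Lx) samples and the final x2."""
    t, ts, lx = 0.0, [0.0], [x1]
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(t, x1, x2)
        k2 = rhs(t + dt / 2, x1 + dt / 2 * k1[0], x2 + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, x1 + dt / 2 * k2[0], x2 + dt / 2 * k2[1])
        k4 = rhs(t + dt, x1 + dt * k3[0], x2 + dt * k3[1])
        x1 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        x2 += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        ts.append(t)
        lx.append(x1)
    return ts, lx, x2


if __name__ == "__main__":
    ts, lx, _ = simulate(-1.5, 0.0, t_end=400.0)
    peak = max(v for t, v in zip(ts, lx) if 50.0 <= t <= 100.0)
    late = max(v for t, v in zip(ts, lx) if t >= 300.0)
    print(f"max Lx while oscillating: {peak:.3f}, after the ramp: {late:.3f}")
```

Before the ramp, the signal repeatedly enters the closed phase ($x_1 > 0$). Once $\alpha$ is negative, the oscillation stays in the open phase and decays towards $(-1, 0)$, in the spirit of figure 3.9. Reversing the schedule and starting slightly off the equilibrium would give the corresponding onset transient.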
Figure 3.13: Typical simulated initial transient. (Plot: amplitude vs. time [samples].)

Figure 3.14: State-space trajectory corresponding to the signal in figure 3.13. (Plot: x2 vs. x1.)

3.1.2 Mechanical analogy

As the vocal folds are elastic and have a mass, relating them to a mass-spring system is not too far-fetched [5]. However, this "mechanical" model is unusual in one respect: the position of the mass represents the Lx amplitude value, not the position of the vocal folds.

Figure 3.15 shows a mechanical system that is equivalent to the system described by equations (3.1)-(3.4). Element "1" is a kind of "bumper" that holds the spring $k_1$ in place when that spring is uncoupled from the mass. This uncoupling happens as soon as the massless plate "2" touches the right-hand wall. The equations of this system are given in (3.7)-(3.10).

Figure 3.15: Mechanical equivalent of the two-dimensional vocal fold model. (Diagram labels: 1, 2, k1, k2, m, η1, η2, w, y1.)

\begin{align}
y_1 \le 0:\quad \dot{y}_1 &= -y_2 \tag{3.7}\\
\dot{y}_2 &= \frac{k_1}{m}\,(y_1 + w) - \frac{\eta_1}{m}\,y_2 \tag{3.8}\\
y_1 > 0:\quad \dot{y}_1 &= -y_2 \tag{3.9}\\
\dot{y}_2 &= \frac{k_2}{m}\,y_1 - \frac{\eta_1 + \eta_2}{m}\,y_2 \tag{3.10}
\end{align}

The relationship between the state variables of the original system and those of the mechanical equivalent is given in (3.11) and (3.12):

\begin{align}
x_1 &= \begin{cases} a\,y_1 & y_1 > 0\\ b\,y_1 & y_1 \le 0 \end{cases} \tag{3.11}\\
x_2 &= y_2 \tag{3.12}
\end{align}

Compare the model equations (3.1)-(3.4) with (3.7)-(3.10), taking (3.11) and (3.12) into account. This yields the following relations between the parameters $a, b, c, d, e, \alpha, \beta$ of the original model and the parameters $k_1, \eta_1, k_2, \eta_2, w, m$ of the mechanical equivalent:

\begin{equation}
ac = \frac{k_2}{m},\qquad eb = \frac{k_1}{m},\qquad d = \frac{k_1 w}{m},\qquad \alpha = -\frac{\eta_1}{m},\qquad \beta = \frac{\eta_1 + \eta_2}{m} \tag{3.13}
\end{equation}

One "unphysical" aspect is that the damping factor $\eta_1$ is negative for positive $\alpha$, as equation (3.13) shows. This means that the dashpot $\eta_1$ actually produces energy. This was one of the main motivations for moving to a more physically correct model.
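The relations (3.13) can be checked directly. Map a point $y$ through (3.11)-(3.12) and differentiate the result; this must reproduce the right-hand side of (3.1)-(3.4) at the mapped point. The Python sketch below performs this comparison. The function names and all numerical values are illustrative assumptions, and the scale factors $a$ and $b$ of (3.11) are treated as free inputs.

```python
# Sketch: check that (3.13) maps the mechanical system (3.7)-(3.10) onto the
# model (3.1)-(3.4). Function names and numbers are illustrative assumptions.


def mechanical_to_model(k1, eta1, k2, eta2, w, m, a, b):
    """Parameter relations (3.13), solved for c, d, e, alpha and beta."""
    return {
        "a": a,
        "b": b,
        "c": k2 / (m * a),
        "d": k1 * w / m,
        "e": k1 / (m * b),
        "alpha": -eta1 / m,
        "beta": (eta1 + eta2) / m,
    }


def model_rhs(x1, x2, p):
    """Right-hand side of (3.1)-(3.4)."""
    if x1 <= 0.0:
        return -p["b"] * x2, p["d"] + p["e"] * x1 + p["alpha"] * x2
    return -p["a"] * x2, p["c"] * x1 - p["beta"] * x2


def mechanical_rhs(y1, y2, k1, eta1, k2, eta2, w, m):
    """Right-hand side of (3.7)-(3.10)."""
    if y1 <= 0.0:
        return -y2, k1 / m * (y1 + w) - eta1 / m * y2
    return -y2, k2 / m * y1 - (eta1 + eta2) / m * y2


def check_equivalence(y1, y2, mech, a, b):
    """Absolute mismatch between d/dt of (3.11)-(3.12) and the model RHS."""
    p = mechanical_to_model(**mech, a=a, b=b)
    scale = a if y1 > 0.0 else b          # (3.11), with a, b > 0
    x1, x2 = scale * y1, y2               # (3.11), (3.12)
    dy1, dy2 = mechanical_rhs(y1, y2, **mech)
    dx1, dx2 = model_rhs(x1, x2, p)
    return abs(scale * dy1 - dx1) + abs(dy2 - dx2)
```

With a negative $\eta_1$, `mechanical_to_model` returns a positive `alpha`. This is the energy-producing dashpot discussed above.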
3.1.3 Proof of the existence of a limit cycle in a two-dimensional system switching between two linear systems

Intuitively, it is easy to understand why a limit cycle exists in the piecewise linear system (3.1)-(3.4) when $\alpha > 0$ and $\beta \gg \alpha$. A trajectory that starts close to the equilibrium point in the left half of the state space spirals away from it, because $\alpha > 0$. A trajectory that starts very far away from the equilibrium (outside the limit cycle) will at some point reach the right half of the state space. Because $\beta \gg \alpha$, the right half brings it closer to the origin faster than the left half can carry it away. This can be seen in figure 3.16.

Figure 3.16: Two state-space trajectories converging on the limit cycle. The blue trajectory begins close to the equilibrium point, while the green trajectory begins outside the limit cycle. (Plot: x2 vs. x1.)

To formally prove the existence of a limit cycle, a slightly different system with only the two essential parameters, $\alpha$ and $\beta$, is used. Despite the simplification, this system has a limit cycle that occurs in the same way as in (3.1)-(3.4). The reason for using this system is that the exact solutions of its two subsystems are much simpler:

\begin{align}
x_1 \le 0:\quad \dot{x}_1 &= \alpha\,(x_1 + 1) - x_2 \tag{3.14}\\
\dot{x}_2 &= 1 + x_1 + \alpha\,x_2 \tag{3.15}\\
x_1 > 0:\quad \dot{x}_1 &= -\beta\,x_1 - x_2 \tag{3.16}\\
\dot{x}_2 &= x_1 - \beta\,x_2 \tag{3.17}
\end{align}

The following hypothesis on $\alpha$ and $\beta$ is assumed:

\begin{equation}
\alpha\beta < 1 \tag{3.18}
\end{equation}

One particular solution of the left subsystem defined by equations (3.14) and (3.15) is

\begin{align}
x_1(t) &= e^{\alpha t}\sin t - 1 \tag{3.19}\\
x_2(t) &= -e^{\alpha t}\cos t \tag{3.20}
\end{align}

as can easily be verified by calculating the derivatives of (3.19) and (3.20):

\begin{align}
\dot{x}_1(t) &= \alpha\,\underbrace{e^{\alpha t}\sin t}_{x_1(t)+1} + \underbrace{e^{\alpha t}\cos t}_{-x_2(t)} \tag{3.21}\\
\dot{x}_2(t) &= \underbrace{-\alpha e^{\alpha t}\cos t}_{\alpha x_2(t)} + \underbrace{e^{\alpha t}\sin t}_{x_1(t)+1} \tag{3.22}
\end{align}

The subsystem is linear in the deviation from its equilibrium $(-1, 0)$. Multiplying this deviation by a constant $C_1$ therefore still yields a solution:

\begin{align}
x_1(t) &= C_1 e^{\alpha t}\sin t - 1 \tag{3.23}\\
x_2(t) &= -C_1 e^{\alpha t}\cos t \tag{3.24}
\end{align}

For the right subsystem, it can be shown in the same way that

\begin{align}
x_1(t) &= C_2 e^{-\beta t}\sin t \tag{3.25}\\
x_2(t) &= -C_2 e^{-\beta t}\cos t \tag{3.26}
\end{align}

is a solution of (3.16) and (3.17).

The idea behind this proof is to find a state-space region that satisfies two conditions. First, it must be trapping, i.e. no trajectory leaves it (although trajectories may enter it). Second, it must not contain an equilibrium point. According to the Bendixson-Poincaré theorem, such a region must contain a limit cycle. The boundary of such a region can be constructed from two parts of trajectories and two segments on the $x_2$ axis. This is illustrated in figure 3.17. There, the green and blue curves are parts of two different trajectories, and the red lines lie on the $x_2$ axis, on the intervals $[-s_2, -s_1]$ and $[s_3, s_0]$. As trajectories do not cross, no trajectory enters or leaves the region through the blue or green curves. So, to ensure that no trajectory leaves the region, it is sufficient to ensure that trajectories do not leave it through or along the red lines.

Figure 3.17: Quiver plot representation of the system defined in equations (3.14)-(3.17) with α = 0.1, β = 0.5. The state-space region inside the red, green and blue curves contains an attractor because trajectories may only enter but not leave this region. (Plot: x2 vs. x1; boundary points s0, s3, −s1, −s2 on the x2 axis.)

Let us therefore look at the sign of $\dot{x}_1$ just to the left and to the right of the $x_2$ axis. From (3.14) it follows that

$$\lim_{x_1\to 0^-}\dot{x}_1 > 0 \ \text{ for } x_2 < \alpha, \qquad \lim_{x_1\to 0^-}\dot{x}_1 < 0 \ \text{ for } x_2 > \alpha,$$

while (3.16) leads to

$$\lim_{x_1\to 0^+}\dot{x}_1 > 0 \ \text{ for } x_2 < 0, \qquad \lim_{x_1\to 0^+}\dot{x}_1 < 0 \ \text{ for } x_2 > 0.$$

The signs of $\dot{x}_1$ on the two sides of the $x_2$ axis agree except for $x_2 \in [0, \alpha]$. This means that there is a sliding region on the $x_2$ axis between 0 and $\alpha$. Trajectories may cross the $x_2$ axis in three ways:

- from left to right if $x_2 < 0$;
- from right to left if $x_2 > \alpha$;
- a trajectory that hits the axis between 0 and $\alpha$ slides up along it until $x_2 = \alpha$, then leaves into the left half of the state space.

Following Utkin's equivalent control method as described in [7], the motion on the sliding region is governed by

\begin{equation}
\dot{x}_2 = \frac{(\alpha + \beta)\,x_2^2 + (1 - \alpha\beta)\,x_2}{\alpha} \tag{3.27}
\end{equation}

So, under hypothesis (3.18), $\dot{x}_2 > 0$ for $x_2 > 0$. This means that a trajectory hitting the sliding region above the origin does not get stuck in an equilibrium.

It is now possible to formulate the requirements on $s_0$ to $s_3$ such that no trajectory escapes from the trapping region through the two intervals on the $x_2$ axis. For the interval $[-s_2, -s_1]$, it is necessary that

\begin{equation}
s_2 \ge s_1 > 0 \tag{3.28}
\end{equation}

while for the interval $[s_3, s_0]$ the constraints are

\begin{align}
s_0 &\ge s_3 > 0 \tag{3.29}\\
s_0 &> \alpha \tag{3.30}
\end{align}

where (3.30) is necessary in order to prevent trajectories from sliding out of the region along the $x_2$ axis.

Now the boundaries of the trapping region can be constructed. The green curve follows a trajectory in the left half of the state space from $(0, s_0)$ to $(0, -s_1)$. Inserting the coordinates of the starting point into the solutions (3.23) and (3.24) determines $t_0$ and $C_1$ such that the trajectory passes through $(0, s_0)$ at time $t_0$:

$$C_1 e^{\alpha t_0}\sin t_0 - 1 = 0,\qquad -C_1 e^{\alpha t_0}\cos t_0 = s_0 \quad\Rightarrow\quad \tan t_0 = -\frac{1}{s_0} \quad\Rightarrow\quad t_0 = \arctan\!\left(-\frac{1}{s_0}\right) + k\pi$$

where $k$ is an integer. With $C_1 > 0$, the first equation requires $\sin t_0 > 0$ and the second requires $\cos t_0 < 0$. In other words, $(0, s_0)$ lies in the first quadrant as seen from the equilibrium $(-1, 0)$, so $k$ must be odd. Choosing $k = 1$, $t_0$ becomes

\begin{equation}
t_0 = \arctan\!\left(-\frac{1}{s_0}\right) + \pi \tag{3.31}
\end{equation}

In the same way, $C_1$ can be determined:

\begin{equation}
C_1 = \frac{1}{e^{\alpha t_0}\sin t_0} = \frac{1}{e^{\alpha\left(\arctan(-1/s_0)+\pi\right)}\,\sin\!\left(\arctan(-1/s_0)+\pi\right)} = \frac{s_0\sqrt{1/s_0^2 + 1}}{e^{\alpha\left(\arctan(-1/s_0)+\pi\right)}} \tag{3.32}
\end{equation}

Unfortunately, $s_1$ cannot be determined analytically, as this involves solving an equation of the type $e^t \sin t = c$ for $t$. However, the distance of the trajectory from $(-1, 0)$ increases constantly. This can be shown by expressing (3.23) and (3.24) in polar coordinates around $(-1, 0)$:

\begin{align}
x_1 = -1 + r\sin\varphi,\quad x_2 = -r\cos\varphi \quad\Rightarrow\quad r &= C_1 e^{\alpha t} \tag{3.33}\\
\varphi &= t \tag{3.34}
\end{align}

Since $\alpha > 0$, the radius $r$ grows monotonically. The point $(0, s_0)$ lies at distance $\sqrt{1+s_0^2}$ from $(-1, 0)$, and the point $(0, -s_1)$ lies at distance $\sqrt{1+s_1^2}$. Hence $s_1$ is guaranteed to be larger than $s_0$, which fulfills the right part of (3.28).

To find an $s_2$ that fulfills the left part of that constraint, $s_2$ can be defined as an upper bound of $s_1$. This bound is obtained by taking the maximum of the absolute value of the $x_2$ coordinate of the trajectory. Setting $\dot{x}_2 = 0$ in (3.22) gives the time $t_{\max}$ at which $x_2$ has an extremum:

$$0 = C_1 e^{\alpha t_{\max}}\left(-\alpha\cos t_{\max} + \sin t_{\max}\right) \quad\Rightarrow\quad \tan t_{\max} = \alpha \quad\Rightarrow\quad t_{\max} = \arctan\alpha + k\pi$$

where $k$ is again an integer, which has to be chosen carefully in order to pick the right extremum. The correct choice turns out to be $k = 2$. The choice $k = 0$ would lie in the correct quadrant, but $t_{\max}$ would then be smaller than $t_0$. That would select an extremum occurring before the trajectory passes through $(0, s_0)$. Hence

\begin{equation}
t_{\max} = \arctan\alpha + 2\pi \tag{3.35}
\end{equation}

From (3.35), $s_2$ can be determined:

\begin{equation}
s_2 = -x_2(t_{\max}) = C_1 e^{\alpha t_{\max}}\cos t_{\max} = C_1 e^{\alpha(\arctan\alpha + 2\pi)}\cos(\arctan\alpha + 2\pi) = \frac{C_1 e^{\alpha(\arctan\alpha + 2\pi)}}{\sqrt{\alpha^2 + 1}} \tag{3.36}
\end{equation}

Given $s_2$, determining the trajectory in the right half of the state space is relatively easy. From (3.25) and (3.26), the trajectory starts at $t = 0$ with $C_2 = s_2$. It ends at $t = \pi$, which leads to

\begin{equation}
s_3 = s_2\, e^{-\beta\pi} \tag{3.37}
\end{equation}

Introducing (3.32) and (3.36) into (3.37) expresses $s_3$ as a function of $s_0$, which allows the left part of constraint (3.29) to be verified:

\begin{equation}
s_3 = \frac{C_1 e^{-\beta\pi}\, e^{\alpha(\arctan\alpha + 2\pi)}}{\sqrt{\alpha^2+1}} = \frac{s_0\sqrt{1/s_0^2+1}}{\sqrt{\alpha^2+1}}\; e^{\alpha\left(\arctan\alpha - \arctan(-1/s_0)\right) + (\alpha-\beta)\pi} \tag{3.38}
\end{equation}

So, to fulfill the constraint $s_0 \ge s_3$, $\beta$ must be chosen depending on $\alpha$ such that

\begin{equation}
e^{\beta\pi} \ge \frac{\sqrt{1/s_0^2+1}}{\sqrt{\alpha^2+1}}\; e^{\alpha\left(\arctan\alpha - \arctan(-1/s_0)\right) + \alpha\pi} \tag{3.39}
\end{equation}

which can also be expressed as

\begin{equation}
\beta \ge \frac{1}{\pi}\ln\frac{\sqrt{1/s_0^2+1}}{\sqrt{\alpha^2+1}} + \alpha\,\frac{\arctan\alpha - \arctan(-1/s_0)}{\pi} + \alpha \tag{3.40}
\end{equation}

and simplified further if $s_0$ tends to $\infty$:

\begin{equation}
\beta \ge \frac{1}{\pi}\ln\frac{1}{\sqrt{\alpha^2+1}} + \alpha\,\frac{\arctan\alpha}{\pi} + \alpha \tag{3.41}
\end{equation}

Two bounds give a simpler sufficient condition: the upper bound for the logarithm, $\ln x \le x - 1$, and $\arctan x < \pi$. Together they yield

\begin{equation}
\beta \ge \frac{1}{\pi}\left(\frac{1}{\sqrt{\alpha^2+1}} - 1\right) + 2\alpha \tag{3.42}
\end{equation}

Finally, because $1/\sqrt{\alpha^2+1} < 1$, this can be simplified to

\begin{equation}
\beta \ge 2\alpha \tag{3.43}
\end{equation}
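The chain (3.31)-(3.43) can be sanity-checked numerically. The Python sketch below evaluates $s_3$ step by step via (3.31), (3.32) and (3.35)-(3.37) and compares the result with the closed form (3.38). It also checks that the limiting bound (3.41) stays below $2\alpha$, and that (3.27) equals the Filippov convex combination of the two vector fields on the $x_2$ axis. The values $\alpha = 0.1$, $\beta = 0.5$ follow figure 3.17, $s_0 = 10$ is an arbitrary choice, and all helper names are illustrative.

```python
# Numerical sanity check of the outer trapping-region construction.
# alpha = 0.1, beta = 0.5 follow figure 3.17; s0 = 10 is an arbitrary choice.
import math


def s3_stepwise(s0, alpha, beta):
    """s3 via (3.31), (3.32), (3.35), (3.36) and (3.37)."""
    t0 = math.atan(-1.0 / s0) + math.pi                    # (3.31)
    c1 = 1.0 / (math.exp(alpha * t0) * math.sin(t0))       # (3.32), first form
    t_max = math.atan(alpha) + 2.0 * math.pi               # (3.35)
    s2 = c1 * math.exp(alpha * t_max) * math.cos(t_max)    # (3.36)
    return s2 * math.exp(-beta * math.pi)                  # (3.37)


def s3_closed_form(s0, alpha, beta):
    """Closed form (3.38)."""
    ratio = s0 * math.sqrt(1.0 / s0**2 + 1.0) / math.sqrt(alpha**2 + 1.0)
    exponent = (alpha * (math.atan(alpha) - math.atan(-1.0 / s0))
                + (alpha - beta) * math.pi)
    return ratio * math.exp(exponent)


def beta_bound_limit(alpha):
    """Right-hand side of (3.41), i.e. of (3.40) for s0 -> infinity."""
    return (math.log(1.0 / math.sqrt(alpha**2 + 1.0)) / math.pi
            + alpha * math.atan(alpha) / math.pi + alpha)


def sliding_rate_filippov(x2, alpha, beta):
    """x2-velocity on x1 = 0 from the convex combination that keeps x1 = 0."""
    left = (alpha - x2, 1.0 + alpha * x2)    # (3.14)-(3.15) at x1 = 0
    right = (-x2, -beta * x2)                # (3.16)-(3.17) at x1 = 0
    lam = right[0] / (right[0] - left[0])    # solves lam*l1 + (1-lam)*r1 = 0
    return lam * left[1] + (1.0 - lam) * right[1]


def sliding_rate_327(x2, alpha, beta):
    """Equation (3.27)."""
    return ((alpha + beta) * x2**2 + (1.0 - alpha * beta) * x2) / alpha
```

For $s_0 = 10$, both routes give $s_3 \approx 2.90$, well below $s_0$. This is consistent with $\beta = 0.5 \ge 2\alpha$.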
So it is proven that, for the system (3.14)-(3.17) with $\alpha > 0$, $\beta \ge 2\alpha$ and $\alpha\beta < 1$, it is possible to find a trapping region (by taking $s_0$ sufficiently large). The remaining condition for proving the existence of a limit cycle is that the trapping region must not contain an equilibrium. Two points currently violate this condition.

The first one, $(-1, 0)$, is rather obvious, as it follows directly from the equations governing the left half of the state space, (3.14) and (3.15). Its stability can easily be determined by calculating the eigenvalues of the left subsystem:

$$\det\begin{pmatrix}\alpha-\lambda & -1\\ 1 & \alpha-\lambda\end{pmatrix} = 0 \quad\Rightarrow\quad (\alpha-\lambda)^2 = -1 \quad\Rightarrow\quad \lambda = \alpha \pm i$$

As $\alpha > 0$ is assumed, this equilibrium point is always unstable.

The second point is less evident, as it lies on the sliding region. Equation (3.27) shows that, under hypothesis (3.18), the point $(0, 0)$ is an equilibrium. It also suggests that this equilibrium is unstable, as

$$\left.\frac{d\dot{x}_2}{dx_2}\right|_{x_2=0} = \frac{1-\alpha\beta}{\alpha} > 0.$$

However, there is a trajectory in the left half of the state space that reaches exactly this point $(0, 0)$ in finite time, so this point is degenerate.

This does not affect the proof of the existence of a limit cycle, because both equilibria (and even the whole sliding region) can be removed from the trapping region. The principle is very simple. Much as the outer boundary of the trapping region was constructed, one can construct an inner boundary that encloses the two equilibria and that trajectories may only cross from the inside to the outside (see figure 3.18). The region between the outer and the inner boundary is then still a trapping region.

As with the outer boundary, the inner boundary consists of two parts of trajectories, one for each half-plane, joined by two straight segments on the $x_2$ axis. The first trajectory goes from a point $(0, s_4)$ to the point $(0, -s_5)$, while the second goes from $(0, -s_6)$ to $(0, s_7)$. To avoid the sliding region and to ensure that trajectories may only cross from the inside to the outside, the following constraints must be met:

\begin{align}
s_4 &\ge \alpha \tag{3.44}\\
s_5 &> s_6 \tag{3.45}\\
s_7 &> s_4 \tag{3.46}
\end{align}

Here (3.44) avoids the sliding region, while (3.45) and (3.46) ensure that trajectories cross the straight segments in the right direction.

Figure 3.18: The state-space region outside the cyan, magenta and yellow curves is never left by a trajectory. α = 0.1, β = 0.5. (Plot: x2 vs. x1; boundary points s4, s7, −s5, −s6 on the x2 axis.)

To start with, one can define $s_4 = \alpha$, which meets constraint (3.44). The same calculations as those leading to (3.31) and (3.32) then show that

\begin{align}
x_1(t) &= C_1 e^{\alpha t}\sin t - 1, \qquad x_2(t) = -C_1 e^{\alpha t}\cos t \notag\\
\text{with}\quad t_0 &= \arctan\!\left(-\frac{1}{\alpha}\right) + \pi \tag{3.47}\\
C_1 &= \frac{1}{e^{\alpha t_0}\sin t_0} \tag{3.48}
\end{align}

is a trajectory that passes through $(0, \alpha)$ at time $t_0$.
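The ingredients of the inner boundary can be checked numerically in the same way. The Python sketch below verifies three facts: the trajectory (3.47)-(3.48) passes through $(0, \alpha)$ at $t_0$; at $t = 2\pi$ it crosses $x_1 = -1$ below the $x_2$ axis; and the left subsystem has the eigenvalues $\alpha \pm i$. The value $\alpha = 0.1$ follows figure 3.18, and the helper names are illustrative.

```python
# Check of the inner-boundary trajectory (3.47)-(3.48) and of lambda = alpha +- i.
# alpha = 0.1 follows figure 3.18; helper names are illustrative.
import cmath
import math


def left_solution(t, c1, alpha):
    """Scaled particular solution (3.23)-(3.24) of the left subsystem."""
    return (c1 * math.exp(alpha * t) * math.sin(t) - 1.0,
            -c1 * math.exp(alpha * t) * math.cos(t))


def inner_start(alpha):
    """t0 and C1 from (3.47)-(3.48) for the start point (0, s4), s4 = alpha."""
    t0 = math.atan(-1.0 / alpha) + math.pi
    c1 = 1.0 / (math.exp(alpha * t0) * math.sin(t0))
    return t0, c1


def left_eigenvalues(alpha):
    """Roots of (alpha - l)^2 + 1 = l^2 - 2*alpha*l + (alpha^2 + 1) = 0."""
    disc = cmath.sqrt((2.0 * alpha) ** 2 - 4.0 * (alpha**2 + 1.0))
    return (2.0 * alpha + disc) / 2.0, (2.0 * alpha - disc) / 2.0
```

Evaluating `left_solution` at `2 * math.pi` gives the point $(-1, -\hat{s})$ that the construction of $s_6$ relies on.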
Furthermore, at time $\hat{t} = 2\pi$, this trajectory passes through the point $(-1, -\hat{s})$, where

$$\hat{s} = C_1 e^{2\pi\alpha}\cos 2\pi = C_1 e^{2\pi\alpha}$$

So, without knowing the exact value of $s_5$, it is possible to find an $s_6$ that complies with constraint (3.45). The distance of the trajectory from the point $(-1, 0)$ increases constantly. Taking $s_6$ such that the distance between $(0, -s_6)$ and $(-1, 0)$ equals $\hat{s}$ therefore guarantees that $s_6 < s_5$, as shown in figure 3.19:

\begin{equation}
s_6 = \sqrt{\hat{s}^2 - 1} \tag{3.49}
\end{equation}

Figure 3.19: By taking s6 such that the distance between (−1, 0) and (0, −s6) is ŝ, one can guarantee that s6 < s5. (Plot: x2 vs. x1; marked: s4 = α, (−1, 0), ŝ, −s5, −s6 = −(ŝ² − 1)^{1/2}.)

The trajectory from $(0, -s_6)$ to $(0, s_7)$ takes exactly $\pi$ to complete, so the relation between $s_6$ and $s_7$ is simple: $s_7 = s_6 e^{-\beta\pi}$. Constraints (3.44) and (3.45) are fulfilled by construction. It remains to check whether the last condition, $s_7 > s_4$, holds. The complete expression for $s_7$ is rather complicated:

$$s_7 = e^{-\beta\pi}\sqrt{\frac{e^{4\alpha\pi}\,(1+\alpha^2)}{e^{2\alpha\left(\arctan(-1/\alpha) + \pi\right)}} - 1}$$

However, it is easy to verify numerically that $s_7 > s_4 = \alpha$. For this, actual values for $\alpha$ and $\beta$ are chosen in compliance with the previously determined constraints $\alpha > 0$, $\beta \ge 2\alpha$ and $\alpha\beta < 1$: $\alpha = 0.1$ and $\beta = 0.5$. These values give

$$s_7 \approx 0.2580 > s_4 = \alpha = 0.1$$

This means that there is a trapping region without equilibria or sliding regions. The Bendixson-Poincaré theorem may therefore be applied (see also [19]), so there must be a limit cycle in this region. □

So much for the theory. Figure 3.20 shows that the trapping region actually works in practice.

Figure 3.20: A trajectory (black) gets trapped in the trapping region and converges to the limit cycle. α = 0.1, β = 0.5. (Plot: x2 vs. x1.)
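The numerical claim $s_7 \approx 0.2580$ can be reproduced with a few lines of Python. The sketch below follows the construction step by step ($t_0$, $C_1$, $\hat{s}$, $s_6$, then $s_7$) and also evaluates the closed-form expression for $s_7$ given above, so the two can be compared. The function names are illustrative; $\alpha = 0.1$ and $\beta = 0.5$ are the values used in the text.

```python
# Evaluate the inner-boundary quantities s_hat, s6 and s7 numerically.
# alpha = 0.1 and beta = 0.5 are the values used in the text.
import math


def inner_boundary(alpha, beta):
    """Follow the construction step by step; returns (s_hat, s6, s7)."""
    t0 = math.atan(-1.0 / alpha) + math.pi            # start time, (3.47)
    c1 = 1.0 / (math.exp(alpha * t0) * math.sin(t0))  # scale factor, (3.48)
    s_hat = c1 * math.exp(2.0 * math.pi * alpha)      # x2 = -s_hat at t = 2*pi
    s6 = math.sqrt(s_hat**2 - 1.0)                    # (3.49)
    s7 = s6 * math.exp(-beta * math.pi)               # half turn on the right
    return s_hat, s6, s7


def s7_closed_form(alpha, beta):
    """The closed-form expression for s7 given in the text."""
    inner = (math.exp(4.0 * alpha * math.pi) * (1.0 + alpha**2)
             / math.exp(2.0 * alpha * (math.atan(-1.0 / alpha) + math.pi)))
    return math.exp(-beta * math.pi) * math.sqrt(inner - 1.0)


if __name__ == "__main__":
    alpha, beta = 0.1, 0.5
    _, _, s7 = inner_boundary(alpha, beta)
    print(f"s7 = {s7:.4f}, s4 = alpha = {alpha}")  # (3.46) requires s7 > s4
```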
3.2 A simple model for pitch breaks to double and triple periods

Subharmonic pitch breaks are interesting in this context for several reasons. On the one hand, depending on the speaker, they can occur relatively often in natural speech. On the other hand, period doubling is one of the most studied and best-known phenomena related to nonlinear systems. Examples of subharmonic pitch breaks can be found in sections 2.4.1 and 2.4.2.

The "classic" period-doubling scenario is observed in systems like the logistic map or the Rössler system. Changing a parameter of such a system affects the output signal as follows: at certain parameter values the period of the output signal doubles, so that the sequence of the periods is

$$\{T, 2T, 4T, 8T, \ldots, 2^i T, \ldots\}$$

where $T$ is the initial period of the output signal. This phenomenon has been studied extensively, leading to some famous results.

One of them concerns the ratio of successive parameter intervals between period doublings: it has a limit when the number of period doublings goes to infinity. This limit is called the Feigenbaum constant and is universal, i.e. it does not depend on the system in question [20]. Its value is about 4.669, which has also been confirmed with "real-life" systems such as nonlinear electronic circuits and even fluid convection. From this result follows the so-called "subharmonic route to chaos": as the period doublings become more and more frequent, an infinite period is reached at a finite parameter value. From this point onwards, chaos is present.

Within the parameter region where chaotic behaviour occurs, periodic "windows" are usually found, i.e. parameter intervals where the output signal is periodic. Another interesting result is that, for a certain class of systems², the periodic windows occur in a fixed sequence. If only periods up to 6 are considered, the sequence of periodic windows after the first period-doubling cascade is as follows [20, 370-372]:

$$6,\ 5,\ 3,\ 2\cdot 3,\ 5,\ 6,\ 4,\ 6,\ 5,\ 6$$

² Systems that may be related to a unimodal map of the form $x_{n+1} = r f(x_n)$, where the term unimodal means that $f$ must be a smooth and concave function with a single maximum [20, 370-372].

Unfortunately, the observations of vocal fold vibration and the described results from dynamic systems theory do not match very well. The observed Lx signals show double and triple periods relatively often, but no period-doubling cascade. The model presented here gives an explanation of how the observed behaviour may relate to nonlinear dynamics. It also shows that a triple period may be observed even though neither a period-doubling cascade nor aperiodic behaviour is observed. Furthermore, this model is (very loosely) based on the assumption that a constriction of the airflow is the cause of the pitch breaks.
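The classic scenario described above can be made concrete with the logistic map $x_{n+1} = r\,x_n(1 - x_n)$, the standard unimodal example. The Python sketch below is a generic textbook illustration and not part of the thesis model. It discards a transient, then returns the smallest period of the attractor. The chosen values of $r$ are assumptions: 2.8, 3.2 and 3.5 lie in the period-1, period-2 and period-4 regimes, and 3.832 lies inside the period-3 window. The tolerance and transient length are assumptions too.

```python
# Generic illustration (not the thesis model): periods of the logistic map
# x_{n+1} = r * x_n * (1 - x_n) at a few textbook parameter values.


def logistic_period(r, x0=0.2, transient=10000, max_period=64, tol=1e-9):
    """Return the smallest period p <= max_period of the attractor, or None."""
    x = x0
    for _ in range(transient):      # discard the transient
        x = r * x * (1.0 - x)
    ref = x
    for p in range(1, max_period + 1):
        x = r * x * (1.0 - x)
        if abs(x - ref) < tol:
            return p
    return None                     # chaotic, or period longer than max_period


if __name__ == "__main__":
    for r in (2.8, 3.2, 3.5, 3.832):
        print(r, logistic_period(r))
```

Sweeping $r$ finely and plotting the visited values of $x$ would give the familiar orbit diagram of the logistic map, the discrete counterpart of the orbit diagram in figure 3.21.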
3.2.1 Construction of the model
Just like the Rössler system, the proposed system is based on a harmonic oscillator. The equation system (3.50) is a linear oscillator whose stability is controlled by the parameter $\alpha$:

\begin{equation}
\begin{aligned}
\dot{x}_1 &= -x_2 + \alpha x_1\\
\dot{x}_2 &= x_1 + \alpha x_2
\end{aligned}
\tag{3.50}
\end{equation}

The parameter $\alpha$ is replaced by a third variable $x_3$ that depends on itself and on the other variables. The idea was to model the vibration of the vocal folds with the linear oscillator in (3.50). The variable $x_3$ models the transglottal air pressure difference that drives the vocal folds: if $x_3$ is greater than zero, the amplitude of the oscillator increases, and if $x_3$ is smaller than zero, the amplitude decreases. A positive value of $x_3$ would therefore correspond to a large pressure difference, and a negative value to a small (but still positive) pressure difference. Of course, this analogy is very crude and should not be taken seriously as a vocal fold model.

However, one of the striking aspects of dynamic systems theory is that very simple systems can give qualitative explanations for phenomena occurring in much more complicated systems. An example is the similarity between the logistic map and the Lorenz map of the Rössler system [20, 376-379]. The logistic map, a very simple first-order discrete-time system, can explain the period doubling of the Rössler system, a third-order continuous-time system.

A constriction of the airflow was supposed to produce a double period in vocal fold vibration as follows. If the pressure difference is high, the vocal folds oscillate at a large amplitude and let more air pass. This reduces the pressure difference, resulting in a smaller amplitude in the next cycle. Less air then passes, and the pressure difference can build up again.

This idea led to two "design decisions" for the expression that controls $x_3$. On the one hand, it should include a term that makes $x_3$ tend to a fixed, positive value, independently of $x_1$ and $x_2$. On the other hand, the term that depends on $x_1$ and $x_2$ should be emphasized during half of each cycle of the oscillation of $x_1$ and $x_2$. This reflects the fact that air passes through the vocal folds only when they are open, which is roughly half of the time and only once per oscillation period.

Several different systems were designed and their capability to produce double and triple periods was examined. The system that produced the most interesting results, and that will be discussed further here, is the following:
$$\begin{aligned}
\dot x_1 &= -x_2 + x_3 x_1\\
\dot x_2 &= x_1 + x_3 x_2\\
\dot x_3 &= b(a - x_3) + c\,\bigl(\arctan(x_1) - \pi\bigr)\bigl(x_1^2 - x_2^2\bigr)
\end{aligned}\tag{3.51}$$

The parameters a and b are fixed to 0.2, while the parameter c is used to control the system. The evolution of the variable x1 is also referred to as the "output signal" of the system. Because it is constructed from a harmonic oscillator, the system has a structure similar to that of the Rössler system (3.52), where a, b and c are the parameters of the Rössler system:

$$\begin{aligned}
\dot x_1 &= -x_2 - x_3\\
\dot x_2 &= x_1 + a x_2\\
\dot x_3 &= b + x_3 (x_1 - c)
\end{aligned}\tag{3.52}$$

3.2.2 Analysis

The first analysis carried out on this system was to draw an orbit diagram. This was done by simulating the system for different values of the parameter c and plotting the maxima of x1 as a function of c. The initial conditions were chosen at random in order to see the greatest variety of behaviours. The orbit diagram shows where a period doubling or a transition to chaotic behaviour happens: before the period doubling, all maxima have the same height and the orbit diagram shows just one line. After the period doubling, large and small maxima alternate, and therefore two lines appear in the orbit diagram. If the signal is chaotic, the maxima are irregularly distributed and the orbit diagram shows an area filled with dots.

Figure 3.21: Orbit diagram of the system (3.51) for the parameter range c ∈ [0.1, 1] (maxima of x1 against c). The "standard" period doubling cascade (from right to left) seems to be interrupted around c = 0.5.

The result (figure 3.21) strongly resembles a period doubling cascade in its overall shape. However, around c = 0.5 it is interrupted by something like a "chaotic window". Looking at the time signals produced for c = 0.5, one can see that the signal corresponding to the "interruption" of the period doubling cascade has a triple period with slightly irregular maxima. Besides, the double period behaviour can still be found for c = 0.5: depending on the initial conditions, the system shows one or the other behaviour (see figure 3.22). For c = 0.5, the system thus has two attractors, one for the double and one for the triple period. The double period is assumed to be a limit cycle, as its maxima lie at precisely two different values. On the other hand, the "triple period" is only approximately a triple period, because its maxima do not take just three different values but are scattered within certain limits. Therefore this attractor does not seem to be a limit cycle but a strange attractor. Observations have shown that the system has only two attractors: a limit cycle corresponding to a double period and a strange attractor corresponding to approximately a triple period.

Figure 3.22: Evolution of the state variable x1 for different initial conditions. For x1 = −0.1277, x2 = 0.1, x3 = 0 (top), the system follows the triple period attractor for a while and then moves to the double period attractor. If the x1 coordinate of the initial condition is changed to x1 = −0.128 (bottom), the system stays on the triple period attractor.

The dependence on the initial conditions can be shown by drawing the basins of attraction of the two attractors. In a three-dimensional system these are subsets of three-dimensional space, but it is possible to draw the intersection of the basins of attraction with a plane in the state space. Figure 3.23 shows this intersection for the plane x3 = 0.

Figure 3.23: Basins of attraction for the double and triple period attractors, restricted to the plane x3 = 0. Trajectories starting in blue points are attracted to the strange attractor, those starting in green points eventually follow the double period limit cycle, and the dark green shade indicates trajectories that follow an unstable single period orbit before being attracted to the double period limit cycle.

The picture in figure 3.23 shows more than just the basins of attraction. Because the colour of each point is determined from the spectrogram of the whole output signal (see appendix C), the speed at which the signal converges to the attractor plays a role. Keeping in mind that the system can stay on the strange attractor for a while before moving to the limit cycle, one would expect to see all shades between green (limit cycle) and blue (strange attractor) between the blue and green patches in figure 3.23. Interestingly, this is not the case, at least not in the areas where the attractors intersect the plot (see figures 3.25 and 3.26). The explanation is that only a minute change in the initial conditions is needed to change the trajectory from staying on the strange attractor to converging very early to the limit cycle. This difference in the initial condition is much smaller than the resolution of the plot: the distance between two pixels is 0.008, compared to a difference in the initial condition of 0.0003 in figure 3.22. At this resolution, the effect of trajectories seemingly following the strange attractor before converging to the limit cycle cannot be seen. If the resolution of the plot were increased, slightly blurred boundaries between the blue and green areas could be seen. This blur could be removed by letting the system evolve longer.

Figure 3.25: The limit cycle corresponding to the double period behaviour intersecting its basin of attraction (green) in the plane x3 = 0.

Figure 3.26: The strange attractor corresponding to the triple period behaviour and its basin of attraction (blue) in the plane x3 = 0.

The phenomenon that for higher resolutions the system must be allowed to evolve longer is also common with fractals such as the Mandelbrot set. For the plot in figure 3.23, the system was allowed to evolve for a time of 1000. The duration of one period is 2π, so roughly 160 periods were evaluated for each pixel. However, there is an effect due to initial transient behaviour: there are some slightly darker patches and dark lines in the green basin of attraction of the limit cycle (see figure 3.24). The lines are due to the trajectory closely following a single period closed orbit (see figure 3.27) before being attracted to the limit cycle. The patches are due to irregular behaviour at the beginning of the trajectory, which may be of triple or higher period.

Figure 3.24: Zooming in on the center of figure 3.23 shows a complex spiraling pattern which may be a fractal.

Figure 3.27: The single period orbit (red curve) is surrounded by the basin of attraction of the limit cycle.

The reason why this single period orbit exists, and why so many trajectories follow it closely, may have to do with the first period doubling (at c = 0.8 in figure 3.21). Before the period doubling there is a single period limit cycle; after the period doubling there is a double period limit cycle. What may have happened is that the single period limit cycle has become unstable and a double period limit cycle has appeared next to it. This is only an assumption, but it is supported by the relative position of the single period orbit and the limit cycle: a trajectory starting near the single period orbit and converging to the limit cycle lies on a Möbius band (see figure 3.29).

Figure 3.28: The limit cycle (green) and the strange attractor (blue) are intertwined (axes x1, x2, x3).

Figure 3.29: A trajectory (red) starting near the single period orbit and converging to the double period limit cycle (black) lies on a Möbius band (a "ribbon" with only one edge and only one side).

A way the period doubling could happen is best illustrated if the period doubling is considered "the wrong way round", from a double period to a single period: the width of the Möbius band between the unstable orbit and the double period limit cycle could simply become smaller and smaller as c increases, until only a single period limit cycle remains. This can be related to a supercritical pitchfork bifurcation of the second iterate of the Poincaré map of a cross-section across the Möbius band³.

³ Consider, as a thought experiment ("Gedankenexperiment"), a cross-section through this Möbius band and the Poincaré map that gives the point where a trajectory hits the cross-section as a function of where it hit it the previous time. If the trajectory is on the limit cycle, it alternates between one side of the cross-section and the other. If it is exactly on the periodic orbit, it stays on it and always hits the cross-section at the same point (the unstable equilibrium of the map). Any other trajectory converges to the outer edges of the section, flipping from one side of the unstable equilibrium to the other. Now consider the second iterate of this map (i.e. only every second "hit" of the cross-section). This map still has an unstable equilibrium where the periodic orbit passes, but the "flipping" edge is split into two separate stable equilibria. When the width of the Möbius band becomes smaller and smaller, the two stable equilibria at the edges of the cross-section join the unstable equilibrium and become a single stable equilibrium. This is a supercritical pitchfork bifurcation of the second iterate, i.e. a period doubling (flip) bifurcation of the map itself.

3.2.3 Controlling the model

As it is established that the proposed model is capable of producing single, double and triple periods, the question that remains is: how can the model be controlled in order to produce the desired behaviour? As can be seen in section 2.4 and in figure 3.32, Lx signals switch rapidly from one behaviour to another. Common transitions are from a single to a double period (and vice versa) as well as from a double to a triple period (and vice versa).

The model can produce the transition between single and double period behaviour in a very straightforward way, using the period doubling bifurcation around c = 0.8. If the parameter value is above the bifurcation point, the trajectory eventually converges to the single period cycle; on the other side of the bifurcation, it converges to the double period cycle. Therefore, by changing the bifurcation parameter, it is possible to force the system into one or the other behaviour. Reproducing the transition from a triple to a double period is also simple: when the system is on the strange attractor that produces the triple period output, one just needs to change the parameter of the system until the strange attractor ceases to exist. Another way would be to perturb the system so that it lands on the limit cycle. However, there is one case where it is not possible to force the system into the desired state: as there is no parameter value where only the strange attractor exists, it is not possible to force the system onto it. So, if brute force does not help, maybe it is still possible to persuade the system to do what is desired...

Experiments have shown that the system reacts strongly to short-time parameter variations. In figure 3.30, the evolution of x1 is shown along with the value of the parameter c. A short dip of c from 0.5 to 0.3, followed by a slow increase back to 0.5, can bring the system from the limit cycle to the strange attractor.
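The orbit diagram of figure 3.21 and the parameter experiments above all rest on numerically integrating (3.51). The following is a minimal sketch in plain Python, assuming a fixed-step classical Runge-Kutta (RK4) integrator with step dt = 0.01; the helper names and the step size are illustrative choices, not the code used to produce the figures. The parameter c is passed as a function of time, so both constant values and short dips of c can be simulated, and the local maxima of x1 are the points that an orbit diagram plots for one value of c.

```python
import math


def rhs(x, c, a=0.2, b=0.2):
    """Right-hand side of system (3.51); a = b = 0.2 as in the text."""
    x1, x2, x3 = x
    dx1 = -x2 + x3 * x1
    dx2 = x1 + x3 * x2
    dx3 = b * (a - x3) + c * (math.atan(x1) - math.pi) * (x1 * x1 - x2 * x2)
    return (dx1, dx2, dx3)


def rk4_step(x, t, dt, c_of_t):
    """Advance the state x from time t to t + dt with classical Runge-Kutta."""
    def f(state, time):
        return rhs(state, c_of_t(time))

    def shifted(state, slope, h):
        return tuple(s + h * k for s, k in zip(state, slope))

    k1 = f(x, t)
    k2 = f(shifted(x, k1, 0.5 * dt), t + 0.5 * dt)
    k3 = f(shifted(x, k2, 0.5 * dt), t + 0.5 * dt)
    k4 = f(shifted(x, k3, dt), t + dt)
    return tuple(s + dt / 6.0 * (d1 + 2.0 * d2 + 2.0 * d3 + d4)
                 for s, d1, d2, d3, d4 in zip(x, k1, k2, k3, k4))


def simulate(x0, c_of_t, t_end, dt=0.01):
    """Integrate (3.51) from t = 0 to t_end; return the time grid and x1(t)."""
    x = tuple(float(v) for v in x0)
    n_steps = int(round(t_end / dt))
    times, x1_values = [0.0], [x[0]]
    for i in range(n_steps):
        x = rk4_step(x, i * dt, dt, c_of_t)
        times.append((i + 1) * dt)
        x1_values.append(x[0])
    return times, x1_values


def local_maxima(values):
    """Heights of the interior local maxima of a sampled signal."""
    return [values[i] for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]


# One column of an orbit diagram: c held at 0.5, initial condition of figure 3.22.
times, x1 = simulate((-0.128, 0.1, 0.0), lambda t: 0.5, t_end=250.0)
column = local_maxima(x1[len(x1) // 2:])  # discard the first half as a transient
```

Repeating the last two lines over a grid of c values, with random initial conditions, gives the kind of data plotted in figure 3.21; replacing the constant `lambda t: 0.5` by a time-varying function reproduces the kind of parameter dip described above. A fixed step keeps the sketch short; an adaptive solver would be the more robust choice for long runs.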
Figure 3.30: A short dip of the parameter c from 0.5 to 0.3 can bring the system onto the strange attractor (bottom). The top and middle graphs show the same parameter curve, but shifted to the left by 4 and 2, respectively, which does not yield the desired result.

Figure 3.31: Zoom on the parameter change in figure 3.30. The parameter "dip" must be timed shortly before a maximum of the output signal.

It is interesting to note that the timing of the parameter change is important: if the "dip" is not timed shortly before a maximum of the output signal, the perturbation does not succeed in bringing the system onto the strange attractor. One of the main motivations for using nonlinear systems to model vocal fold behaviour is that simple causes can have complicated effects similar to the ones found in the real vocal folds. Figure 3.33 shows an example of how a simple change in the parameter c can produce double, triple and single period behaviour similar to that found in a real-life Lx signal (figure 3.32).

Figure 3.32: An Lx signal from the word "score" (time axis t [sec]). The signal shows a double period up to about t = 0.1 sec, followed by two cycles of triple period behaviour, and single period behaviour from t = 0.2 sec.
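The parameter curves used for these experiments are made of straight segments: a quick drop of c from 0.5 to 0.3 followed by a slower return, or the four straight lines of figure 3.33. A small helper that turns a list of (time, value) breakpoints into a function c(t) can express such schedules. This is a sketch; the name `piecewise_linear` and the breakpoint times in the example are hypothetical, not values read from the figures.

```python
import bisect


def piecewise_linear(knots):
    """Build c(t) that interpolates linearly between (time, value) breakpoints.

    Before the first and after the last breakpoint, the end value is held.
    """
    times = [float(t) for t, _ in knots]
    values = [float(v) for _, v in knots]
    if len(times) < 2 or any(t1 <= t0 for t0, t1 in zip(times, times[1:])):
        raise ValueError("need at least two breakpoints with increasing times")

    def c_of_t(t):
        if t <= times[0]:
            return values[0]
        if t >= times[-1]:
            return values[-1]
        i = bisect.bisect_right(times, t) - 1  # index of the segment containing t
        frac = (t - times[i]) / (times[i + 1] - times[i])
        return values[i] + frac * (values[i + 1] - values[i])

    return c_of_t


# Hypothetical dip: c drops from 0.5 to 0.3 within one time unit,
# then rises slowly back to 0.5 (breakpoint times are illustrative only).
dip = piecewise_linear([(120.0, 0.5), (121.0, 0.3), (140.0, 0.5)])
```

The returned function can be handed to any integrator of (3.51) that accepts a time-dependent c. Since the text stresses that the dip must start shortly before a maximum of x1, in practice the first breakpoint would be placed relative to a detected maximum rather than at a fixed time.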
Figure 3.33: A signal with the same sequence of double, triple and single periods as in figure 3.32 can be produced with a very simple parameter change, consisting of only four straight lines (curves: x1, x2 (scaled) and c against t).

3.2.4 Discussion of the nonlinearity

During the analysis of the model (3.51) it became apparent that the values of x1 usually lie between −1 and 1. In this range the function arctan(x1) in the expression for ẋ3 is relatively close to x1 (see figure 3.34). So it was only natural to explore how the system behaves when this nonlinearity is linearised. The partly linearised system is described by the following equations:

$$\begin{aligned}
\dot x_1 &= -x_2 + x_3 x_1\\
\dot x_2 &= x_1 + x_3 x_2\\
\dot x_3 &= b(a - x_3) + c\,(x_1 - \pi)\bigl(x_1^2 - x_2^2\bigr)
\end{aligned}\tag{3.53}$$

Figure 3.34: The term that emphasizes the negative values of x1 in the expression for ẋ3, arctan(x1) − π, and its linearisation x1 − π on the usual range of x1 values.

Simulations have shown that for many initial conditions the new system behaves very similarly to the original model. There is still a double period limit cycle, and trajectories often show an initial transient with a triple period (see figure 3.35).

Figure 3.35: For many initial conditions, the behaviour of the system remains similar when arctan(x1) is linearised (top: original system; bottom: arctan linearised).

However, it seems that the triple period cannot be sustained, i.e. that the strange attractor no longer exists in the new system (see figure 3.36). This is a purely empirical result, based on simulations of the new system with initial conditions taken from points known to belong to the basin of attraction of the strange attractor of the old system.

Figure 3.36: In one regard, the linearisation of arctan(x1) qualitatively changes the system: the strange attractor seems to have disappeared. Some trajectories seem to follow it for a while, but none has been observed to stay on it (top: original system; bottom: arctan linearised).

In any case, this does not impair the ability of the new system to serve as a model for transient behaviour of vocal fold vibration: the system is still able to produce triple periods for a short time, but not to sustain them for a long time. This does not contradict observations of vocal fold vibration, where triple periods have been observed only for short times. I made the attempt to record Lx signals with sustained double and triple periods using my own voice and succeeded only for the double period.

There is another very interesting point about this linearisation. When the system (3.53) is viewed from the standpoint that x3 controls the amplitude of the oscillator composed of x1 and x2 (i.e. that the nonlinearities in the expressions for ẋ1 and ẋ2 are only there to enable this control), the single nonlinear term in the last equation, x1² − x2², becomes the essential part of the system. What is fascinating about this is that this term essentially converts the oscillation of x1 and x2 into an oscillation at double the frequency (half the period).
This can be seen experimentally (see figure 3.37), but also analytically: suppose the oscillator (x1, x2) to be "decoupled" from x3 (i.e. suppose x3 = 0). A particular solution of the linear oscillator is then x1 = cos t, x2 = sin t. When x1² − x2² is evaluated for this solution, the result is an oscillation at double the frequency:

$$x_1^2 - x_2^2 = \cos^2 t - \sin^2 t = \cos^2 t - 1 + \cos^2 t = 2\cos^2 t - 1 = 2\cdot\tfrac{1}{2}\bigl(\cos(2t) + 1\bigr) - 1 = \cos(2t)$$

Figure 3.37: Evolution of the state variables x1, x2 and x3 in the time domain. x3 seems to oscillate twice as fast as x1 and x2.

Figure 3.38: Spectra of x1, x2 and x3 for the signals in figure 3.37. x3 has a very strong harmonic at F = 2, which supports the hypothesis that it doubles the frequency of the oscillation of x1 and x2.

Because of the interaction between the oscillator and x3, as well as the other factors and terms in the expression for ẋ3, the solution for x3 is not simply a sinusoid with twice the frequency of x1; but, as can be seen in figure 3.38, the double frequency component is very strong. What is fascinating about this is that a nonlinearity that doubles the frequency of a signal is actually responsible for pitch breaks to lower frequencies. This may even suggest that a single mechanism could be capable of producing pitch breaks to higher as well as to lower frequencies. Both kinds of pitch break do occur in the human voice (for example, pitch breaks to a higher frequency can occur when screaming).
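The amplitude control by x3 and the frequency doubling can also be seen without decoupling the oscillator, by rewriting (3.51) in polar coordinates x1 = r cos θ, x2 = r sin θ. This is a standard change of variables added here as a worked derivation, not part of the original analysis; it uses r ṙ = x1 ẋ1 + x2 ẋ2 and r² θ̇ = x1 ẋ2 − x2 ẋ1:

```latex
\[
\begin{aligned}
r\dot r &= x_1(-x_2 + x_3 x_1) + x_2(x_1 + x_3 x_2) = x_3 r^2
  &&\Rightarrow\quad \dot r = x_3\, r\\
r^2\dot\theta &= x_1(x_1 + x_3 x_2) - x_2(-x_2 + x_3 x_1) = r^2
  &&\Rightarrow\quad \dot\theta = 1\\
\dot x_3 &= b(a - x_3) + c\,\bigl(\arctan(r\cos\theta) - \pi\bigr)\, r^2\cos 2\theta
  &&\text{since } x_1^2 - x_2^2 = r^2\cos 2\theta
\end{aligned}
\]
```

The first line states exactly that a positive x3 makes the amplitude grow and a negative x3 makes it decay; the second shows that the phase advances at unit rate, so one turn of the oscillator always takes 2π; the third shows that x3 is driven through r² cos 2θ, i.e. at twice the oscillator frequency, for any amplitude r. For the linearised system (3.53), arctan(r cos θ) is simply replaced by r cos θ.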
3.3 Airflow-driven model

The first step towards a more physical model of the vocal folds was to implement a one-mass model driven by an airflow. Having the glottal flow as a state variable also makes it possible to derive the voice source by taking the first derivative of the glottal flow. One-mass models are the simplest physical models of the vocal folds [5]. They consist of a mass-spring-damper system which is meant to model one vocal fold. This system experiences a force due to an airflow, which provides energy to the system. Only one fold is modeled, as it is assumed that the vocal folds vibrate synchronously.

The model described here extends the model described in [5] by a second spring (k2) and a second damping element (η2), which are meant to model the "squeezing" of the vocal fold tissue that occurs when the vocal folds touch (see figure 3.39). The force due to the glottal flow is calculated using the Bernoulli effect (which relies on the hypothesis, strictly speaking incorrect, that air is incompressible). Basically, it states that the pressure in a fluid decreases when the speed of the flow increases. So, when the air flows fast between the vocal folds, it pulls them together. Therefore, in order for the glottal flow to provide energy to the vocal fold vibration, the glottal flow must be faster in the closing phase than in the opening phase. This is naturally the case, as the air has a certain inertia. In the model presented here, this inertia is simply modeled by a certain "lag" between the steady-state flow rate and the actual flow rate.

Figure 3.39: Physical model of the vocal folds. The mass-spring system m, k1, η1 represents the mass and tension of the vocal folds, while a massless plate connected to the mass by a spring k2 and a damper η2 models the fact that vocal fold tissue can be "squeezed" when the folds touch. The line between the spring k2 and the damper η2 represents a rope whose length corresponds to the equilibrium length of k2. It keeps the massless plate close enough to the mass that the spring k2 can only be compressed, not extended. The displacement of the mass from its equilibrium position is the state variable x1. (Labelled quantities: k1, k2, η1, η2, m, h, l, w, wi, p, p + ∆p, pf.)

The pressure pf between the vocal folds is (p being the external pressure and vf = ϕ/(l(w − x1)) the speed of the glottal flow ϕ):

$$\begin{aligned}
p_f &= p + \Delta p + \frac{\rho}{2}\left(\frac{l(w - x_1)}{l\,w_i} - 1\right) v_f^2\\
    &= p + \Delta p + \frac{\rho\varphi^2}{2}\left(\frac{1}{l^2 w_i (w - x_1)} - \frac{1}{l^2 (w - x_1)^2}\right)
\end{aligned}$$

This expression goes to infinity when the vocal folds close (i.e. when x1 = w) while the flow is non-zero. In order to avoid this problem (which causes infinite forces), a correction factor cϕ depending on x1 is introduced, which multiplies the flow term and tends to zero fast enough to avoid the infinite force (see below). During the open phase (i.e. when x1 < w), the system is governed by the following equations, which take into account the spring k1, the damping element η1 and the Bernoulli effect due to the glottal flow x3:

$$\begin{aligned}
\dot x_1 &= x_2\\
\dot x_2 &= \frac{-k_1 x_1 - \eta_1 x_2}{m} + \frac{l h c_\varphi \rho\, x_3^2}{2m}\left(\frac{1}{l^2 w_i (w - x_1)} - \frac{1}{l^2 (w - x_1)^2}\right)\\
\dot x_3 &= \begin{cases} r_1(\varphi_{\mathrm{steady}} - x_3), & \varphi_{\mathrm{steady}} > x_3\\ r_2(\varphi_{\mathrm{steady}} - x_3), & \text{otherwise}\end{cases}
\end{aligned}$$

where

$$c_\varphi = \left(\frac{(w - x_1)/w}{(w - x_1)/w + c/10}\right)^3 \qquad\text{and}\qquad \varphi_{\mathrm{steady}} = \frac{\Delta p\, l\,(w - x_1)^3}{12\mu h},$$

and where ρ is the density of air and µ the viscosity coefficient of air. The "lag" mentioned above enters through ẋ3: the flow x3 relaxes towards the steady-state value ϕsteady at rate r1 while it is below it and at rate r2 otherwise. During the closed phase, the spring k2 and the damping element η2 must be considered too, and the glottal flow is driven towards zero:

$$\begin{aligned}
\dot x_1 &= x_2\\
\dot x_2 &= \frac{-k_1 x_1 - k_2 (x_1 - w) - (\eta_1 + \eta_2)\, x_2}{m}\\
\dot x_3 &= -r_3 x_3
\end{aligned}$$

3.3.1 Results

Simulations of this model produce very much the expected behaviour: a stable oscillation of all state variables (figure 3.40). The most interesting state variable in this model is the glottal flow, because it is related to what the voice sounds like. It is assumed that the first derivative of the glottal flow corresponds to the pressure wave produced by it, i.e. the voice source. There is a widely accepted model of the glottal flow derivative, the LF model, named after the people who developed it, Liljencrants and Fant. It is a purely mathematical model which expresses each cycle of the glottal flow derivative E(t) as being composed of two parts: a sinusoid multiplied by an exponential for 0 < t ≤ te,

$$E(t) = E_0\, e^{\alpha t} \sin(\omega_g t),$$

and an exponential return phase for the rest of the cycle (te < t ≤ tc):

$$E(t) = -E_0\left(e^{-(t - t_e)/t_a} - e^{-(t_c - t_e)/t_a}\right).$$

Here, a slightly simplified version of the LF model is used, where for te < t ≤ tc the term that guarantees E(tc) = 0 is removed:

$$E(t) = -E_0\, e^{-(t - t_e)/t_a}.$$

This corresponds better to the model, where in the closed phase the glottal flow x3 is governed by the equation ẋ3 = −r3 x3, which simply produces a decreasing exponential. The glottal flow derivative is therefore an exponential as well and will never be zero.

Figure 3.40: State variables of the airflow-based model during stable oscillation (panels: x1 [m], x2 = dx1/dt [m/s], x3 = ϕ [m³/s] and dϕ/dt [m³/s²], against t [sec]).
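The two return-phase variants can be evaluated directly. The sketch below computes one cycle of E(t) following the formulas above; it assumes that the return phase starts at the value E(te) reached by the open phase (so the two phases join continuously), which is a choice of this sketch, since the formulas above write that amplitude as E0. The parameter values in the example are hypothetical, and the fitting procedure of appendix D is not reproduced.

```python
import math


def lf_derivative(t, E0, alpha, omega_g, te, ta, tc, simplified=True):
    """Glottal flow derivative E(t) of an LF-type model over one cycle 0 <= t <= tc."""
    if not 0.0 <= t <= tc:
        raise ValueError("t must lie within one cycle [0, tc]")

    def open_phase(s):
        # Exponentially growing sinusoid, used for 0 < s <= te.
        return E0 * math.exp(alpha * s) * math.sin(omega_g * s)

    if t <= te:
        return open_phase(t)
    # Assumption of this sketch: the return phase starts where the open phase ends.
    e_e = -open_phase(te)
    decay = math.exp(-(t - te) / ta)
    if simplified:
        return -e_e * decay                                # never reaches zero
    return -e_e * (decay - math.exp(-(tc - te) / ta))      # forces E(tc) = 0


# Hypothetical parameters (times in seconds), for illustration only.
params = dict(E0=1.0, alpha=300.0, omega_g=2.0 * math.pi / 0.008,
              te=0.006, ta=0.0003, tc=0.008)
cycle = [lf_derivative(k / 80 * params["tc"], **params) for k in range(81)]
```

With `simplified=False` the waveform returns exactly to zero at tc; with the default it only decays towards zero, which is the behaviour the text attributes to the closed-phase flow equation ẋ3 = −r3 x3.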
Figure 3.41: LF model fitted to the derivative of one cycle of simulated glottal flow (curves: simulated, L-F model; t [sec], ×10⁻³). The first rising edge is straight in the LF model but curved in the simulated waveform. The negative peak is sharp in the LF model but rounded in the simulation.

The LF model was also used to validate the simulated glottal flow, as shown in figure 3.41. The method used for the fitting is described in appendix D. Globally, the fit is reasonably good. At some points, however, there are discrepancies: the first rising edge in the simulated waveform is curved and not straight as in the LF model, and the negative peak of the simulated signal is not nearly as sharp as in the LF model. Both differences mean that the LF model waveform has more energy in the high frequencies than the simulated waveform. So, when used as a voice source for a vocal tract model, the simulated waveform will make the voice sound duller.
3.4 Airflow-driven model for Lx signals

The goal of creating this model was to combine the Lx signal and the glottal flow in one model. This approach is in many regards very similar to the one described in [6]: both models take into account the zipper-like opening and closing of the vocal folds, have a non-rectangular shape of the glottal area and a continuously varying vocal fold contact area (as opposed to models where the vocal fold is divided into parts that can either be in contact or not, leading to a piecewise constant contact area). There is one major difference, though: the model presented in [6] uses a two-mass model⁴ as a base, while the model presented in this report is a one-mass model with only three state variables.

The mechanical model (figure 3.42) assumes symmetric oscillation and treats a vocal fold as a rigid, massless bar of length l pivoting around a point P like a hinge (corresponding to the point where the folds are attached to the thyroid cartilage⁵). A mass m is attached to the bar at a distance lm from P. The pivoting of the bar is controlled by rotational spring and damper elements k1 and η1, respectively (not drawn in figure 3.42). The "squeezable" part of the vocal fold tissue (grey area in figure 3.42) is assumed to have a width wt. Depending on the angle α between the bar and the symmetry axis, the length over which the vocal folds are closed can be calculated:

$$l_c = \frac{w_t}{\sin\alpha}$$

Therefore the length over which the folds are open is lo = l − lc, and the glottal area is lo² sin α. The distinction between open and closed phase is made on lc: if lc < l, the glottal area is non-zero and the vocal folds are open. During the open phase, the contact area calculation is based on the assumption that the vertical extension of the contact area is constant and equal to h cvo, where h is the height (or thickness) of the vocal folds and cvo is a constant between 0 and 1 (see figure 3.43). The expression for the contact area is therefore

$$C = l_c\, h\, c_{vo}$$

⁴ Flanagan-Ishizaka
⁵ Also called "Adam's apple"

Figure 3.42: "Hinge" mechanism of the airflow-driven Lx model. The vocal fold is modeled by a rigid bar with a mass and "squeezable tissue" (drawn in grey) attached to it. The glottal area (blue) is triangular. (Labelled quantities: P, l, lm, lc, lo, α, m, w, wt.)

Figure 3.43: The vocal fold contact area (grey) can be calculated from lc and h cvo.

However, there is a different way of looking at the contact area that leads to the same mathematical expression: the tissue is probably more squeezed at the point P than at a distance lc from P, where the folds barely touch. Therefore it seems natural that the contact area is triangular. If the height of the triangle at P is 2h cvo, then the area remains the same (see figure 3.44). This also implies that the cross-section of the vocal folds' "squeezable" tissue is triangular rather than quadrilateral.

Figure 3.44: The assumption that the vocal fold contact area is triangular also leads to the same area as the assumption that it is rectangular (figure 3.43), if the height of the triangle is chosen as 2h cvo.

When the folds are closed, the change of contact area is due to the "squeezing" of the tissue, based on the assumption that beyond a distance h cvo from the upper side of the folds, they have a linear shape (as in figures 3.43 and 3.44) with a steepness s.

The system that implements this model has three state variables: x1 is the contact area, x2 = dα/dt the angular velocity and x3 the airflow. This means that there is no state variable that represents the position of the vocal fold; this information must be determined from the contact area. From x1 one can easily calculate lc = x1/(h cvo), which makes it possible to determine whether the vocal folds are open. If they are open (i.e. lc < l), sin α can be calculated as

$$\sin\alpha = \frac{h\, c_{vo}\, w_t}{x_1}\tag{3.54}$$

Replacing x1 by lc h cvo in (3.54) leads to

$$l_c \sin\alpha = w_t\tag{3.55}$$

Therefore, using at one point the assumption that α is small (cos α ≈ 1):

$$l_c = \frac{w_t}{\sin\alpha},\qquad \dot l_c = -\frac{w_t \cos\alpha}{\sin^2\alpha}\,\dot\alpha \approx -\frac{w_t}{\sin^2\alpha}\,x_2 = -\frac{x_1^2}{h^2 c_{vo}^2 w_t}\,x_2$$

Expressing ẋ1 as a function of the rate of change of lc gives the result

$$\dot x_1 = \dot l_c\, h\, c_{vo} = -\frac{x_1^2}{h\, c_{vo}\, w_t}\,x_2\tag{3.56}$$

This is one of the nonlinearities of the system. Another one is, of course, due to the airflow. In the case where the vocal folds are closed, the expression for ẋ1 is linear and much simpler:

$$\dot x_1 = -l^2 s\, x_2$$

The other parts of the equation system are very similar to the system described in section 3.3. The Matlab code for this system, which is central to this work, can be found in appendix F.

3.4.1 Normalisation

One of the problems encountered when integrating the system described above with Matlab was that the ODE solver would often stop with an error message about tolerances that could not be met. The problem was that the ranges of the state variables differed by several orders of magnitude, leading to numerical problems. The solution was to normalise the system, for which normalisation constants were needed. The approach taken here was to look for expressions that depend only on the parameters of the system and give a reasonable estimate of the expected value of each state variable, bringing its range as close to [0, 1] or [−1, 1] as possible. For the contact area this was very simple: one could simply take lh to normalise x1, resulting in a normalised state variable with a range of [0, 1], in theory at least. In reality, x1 can exceed lh when the vocal folds are squeezed very much, but a range of about [0, 3] is still acceptable. Estimating the angular velocity x2 was a bit more difficult, but a reasonable normalisation constant has been found with √(k1/m), which is the angular frequency of a mass-spring system with mass m and spring constant k1. For x3, the normalisation constant was chosen as a function of the steady-state flow for the case where the vocal folds are in their equilibrium position. The constant is
$$\left(\frac{w\, l}{l_m}\right)^3 \frac{\Delta p\, l}{12\mu h}$$
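The geometric relations of the hinge model can be collected in a few functions. This is a sketch under the assumptions stated above (open phase only, and small α for (3.56)); the function names and the numerical values in the example are hypothetical and are not taken from appendix F.

```python
import math


def open_phase_geometry(alpha, l, w_t, h, c_vo):
    """Closed length lc, open length lo, glottal area and contact area for angle alpha."""
    lc = w_t / math.sin(alpha)
    if lc >= l:
        raise ValueError("folds are closed along the whole length (closed phase)")
    lo = l - lc
    return {"lc": lc,
            "lo": lo,
            "glottal_area": lo ** 2 * math.sin(alpha),
            "contact_area": lc * h * c_vo}


def sin_alpha_from_contact_area(x1, w_t, h, c_vo):
    """Equation (3.54): recover sin(alpha) from the contact area x1 (open phase)."""
    return h * c_vo * w_t / x1


def contact_area_rate_open(x1, x2, w_t, h, c_vo):
    """Equation (3.56): dx1/dt in the open phase, using cos(alpha) ~ 1."""
    return -x1 ** 2 * x2 / (h * c_vo * w_t)


# Hypothetical values in SI units, for illustration only.
geo = open_phase_geometry(alpha=0.05, l=0.016, w_t=0.0004, h=0.003, c_vo=0.5)
```

Storing the contact area instead of the angle is what makes (3.54) necessary: every evaluation of the right-hand side first recovers lc and sin α from x1, and the approximation in (3.56) only drops the factor cos α, which is close to 1 for small opening angles.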
3.4.2 Results

All presented results are from the normalised system with changing cvo, as described in section 3.4.3. They mainly cover the waveforms that posed some problems with the simple 2D model (section 3.1).

Figure 3.45: x1 is similar to waveform 1b (see section 2.1.1). (Panels: x1, x2, x3 against time [sec].)

Figure 3.46: An extreme case with a very long closed phase. This is the closest the model got to Lx waveform 6.

It is interesting to notice that the new model automatically produces very sudden onset transients. Those in figures 3.45 and 3.46 come very close to the onset transient described in section 2.2.1, while the Lx signal in figure 3.47 is very similar to the onset transient in section 2.2.2.

Figure 3.47: A reasonable waveform similar to waveforms 1a and 5.

Figure 3.48: A bizarre waveform with a steep falling edge. This waveform does not seem to occur in real speech.

3.4.3 Improvements

Changing cvo

One of the improvements made to the model was to allow cvo to change during the open phase. This enables the production of waveforms where the open phase starts at a different amplitude than it ends. In waveform 1b this is very prominent (see figure 2.3).
Parabolic vocal fold cross-section
Another improvement made to the model described above was to change the shape of the vocal fold cross-section so that the part below hcvo is a parabola (see figure 3.49). As a result, the contact area changes more rapidly towards the beginning and the end of the closed phase. This can help the model fit the shape of the closed-phase peak in waveforms 1a, 2 and 5 (see figures 3.1, 3.3 and 3.7), where the 2D model has some problems.
Figure 3.49: Cross-sections of vocal folds (heights h and hcvo marked). Left: conventional shape. Right: new parabolic shape.
3.4.4
And what about pitch breaks?
The new vocal fold model produces reasonable results. A system loosely based on the concept of a constriction of the airflow above the vocal folds has already succeeded in producing double and triple period pitch breaks. The logical next step is therefore to add a simulated constriction of the airflow to the new vocal fold model, in the hope that the extended model can also produce pitch breaks. The physical model for Lx signals can easily be adjusted to simulate a constriction of the airflow somewhere in the vocal tract. A schema of how the model is extended above the vocal folds is shown in figure 3.50. In terms of the equation system, this means introducing an additional variable x4 that is related to the pressure pc. In fact,
$$x_4 = \frac{p_c - p_{ext}}{\Delta p} \qquad \text{and} \qquad (1 - x_4)\,\Delta p = p_{int} - p_c \tag{3.57}$$
Figure 3.50: Physical model of a constriction of the airflow (labels: pint below the vocal folds, pc between the vocal folds and the constriction, pext above the constriction, and the driving pressure difference (1 − x4)∆p). The airflow through the constriction depends on the pressure inside the volume between the vocal folds and the constriction. The pressure difference (1 − x4)∆p that drives the vocal fold vibration is between the subglottal pressure pint and pc and is therefore variable.
Since pint − pc, rather than ∆p, is now the pressure difference that drives the vocal folds, it follows from (3.57) that every occurrence of ∆p in the old model must be multiplied by the coefficient (1 − x4). The dynamics of x4 itself are governed by the following equation:
$$\dot{x}_4 = \frac{x_4 + p_{ext}/\Delta p}{V_c}\,\left(x_3 - \phi_{om}\, x_4\right)$$
The coefficient (x4 + pext/∆p)/Vc is derived directly from the ideal gas law pV = NkT. It depends inversely on the volume Vc between the vocal folds and the constriction, so a larger volume has a smaller effect on x4. The second factor (x3 − ϕom x4) is simply the balance of what comes into that volume and what goes out. The inflow is x3, the airflow through the vocal folds. The outflow is the constant ϕom multiplied by x4, which simply states that the outflow is proportional to pc − pext. Every time air flows through the vocal folds, it adds to the pressure pc in the volume between the folds and the constriction, reducing the force that drives the vocal fold oscillation.
The hope was that a period doubling could happen as follows. In one cycle a large oscillation happens and much air goes into the volume Vc, which reduces the pressure difference available to drive the next cycle. That cycle comes out smaller and lets less air into Vc, so the pressure difference can build up again, and the whole sequence repeats. In theory this would produce a double period limit cycle, but in practice it does not work. Several attempts were made to modify the model by introducing new nonlinearities without success. One example was making the inbound or outbound airflow depend on a power of the respective pressure difference.
The cycle-to-cycle variation of x4 (and therefore of the pressure in Vc) was studied in more detail using a Poincaré map. To build the map, x4 was evaluated each time x1 crossed a threshold set at cvo with a negative derivative, i.e. at the end of each closed phase, giving a sequence x4[n]. The sequence was then plotted as x4[n + 1] against x4[n] to reveal a function f such that x4[n + 1] = f(x4[n]), if it exists⁶. This function does exist, and it is very disappointing, as figure 3.51 shows. It is simply a straight line that crosses the line x4[n + 1] = x4[n] at about 0.15 with a slope between 0 and 1, so x4[n] inevitably tends to 0.15. A period doubling bifurcation would require a slope of −1 at the fixed point where x = f(x).
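The Poincaré-map analysis above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the code used in the thesis. The full third order vocal fold model that produces x1 and x3 is not reproduced here, so both are replaced by hypothetical periodic stand-ins: a cosine for x1 and half-wave rectified sine pulses for x3. The threshold 1.0 stands in for cvo, and the parameter values (pext/∆p = 100, ϕom = 1, Vc = 1, forward-Euler step of 10 µs) are illustrative only. Because x4 does not feed back into x1 and x3 here, the sketch only shows the mechanics of the analysis: integrating the x4 equation, sampling at negative-going threshold crossings and fitting x4[n+1] = a·x4[n] + b. It does not reproduce figure 3.51.

One way to read the coefficient of the x4 equation is to assume an isothermal ideal gas in a fixed volume. Then pc changes at a rate of (pc/Vc) times the net inflowing volume rate, and pc/∆p = x4 + pext/∆p.

```python
import numpy as np


def x4_rate(x4, x3, p_ext_rel, phi_om, Vc):
    """dx4/dt = (x4 + pext/dp) / Vc * (x3 - phi_om * x4); p_ext_rel stands for pext/dp."""
    return (x4 + p_ext_rel) / Vc * (x3 - phi_om * x4)


def downward_crossing_samples(x1, x4, threshold):
    """Sample x4 wherever x1 crosses `threshold` with a negative derivative.

    The crossing instant is located by linear interpolation between samples,
    and x4 is interpolated at that instant.
    """
    above = x1 >= threshold
    k = np.where(above[:-1] & ~above[1:])[0]
    w = (x1[k] - threshold) / (x1[k] - x1[k + 1])
    return x4[k] + w * (x4[k + 1] - x4[k])


def fit_return_map(seq):
    """Least-squares line x4[n+1] = a*x4[n] + b and its fixed point b / (1 - a)."""
    a, b = np.polyfit(seq[:-1], seq[1:], 1)
    return a, b, b / (1.0 - a)


# Hypothetical stand-ins for the full model (normalised units, no feedback from x4)
fs, f0 = 100_000.0, 100.0
t = np.arange(int(0.2 * fs)) / fs
x1 = 1.0 + np.cos(2 * np.pi * f0 * t)                    # contact-area proxy
x3 = 0.47 * np.maximum(0.0, np.sin(2 * np.pi * f0 * t))  # glottal-flow pulses

# Forward-Euler integration of the constriction equation, starting from x4 = 0
x4 = np.zeros_like(t)
for n in range(len(t) - 1):
    x4[n + 1] = x4[n] + x4_rate(x4[n], x3[n], p_ext_rel=100.0, phi_om=1.0, Vc=1.0) / fs

seq = downward_crossing_samples(x1, x4, threshold=1.0)  # one sample per cycle
a, b, x_fixed = fit_return_map(seq)
print(f"slope {a:.3f}, fixed point {x_fixed:.3f}")
```

With these stand-ins the fitted slope lies between 0 and 1, so x4[n] converges monotonically to b/(1 − a). A slope between −1 and 0 would give oscillating convergence, and a period doubling requires the slope at the fixed point to cross −1.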
Figure 3.51: Poincaré map for x4 evaluated at the end of the closed phase of each cycle (x4(n+1) plotted against x4(n), both axes from 0 to 0.6).
⁶ If this function exists, the Poincaré map is one-dimensional, which is not necessarily the case.
Chapter 4
Conclusion
This project deals with a wide range of topics. Stationary and transient vocal fold movement was analysed and modeled. This work raised new hypotheses on pitch breaks and led to a new physical model of the vocal folds that simulates the contact area and the glottal flow. A simple two-dimensional model simulated the Lx signal and was qualitatively correct in many ways. Furthermore, proving the existence of a limit cycle gave a better understanding of that system.
Finding a system that produces pitch breaks with a non-integer frequency ratio was very interesting, because this case is different from the classic period-doubling scenario. It turned out that these pitch breaks are not directly due to a bifurcation. Instead, the system is perturbed to go from one attractor to another, both of which coexist for a given parameter range. A system where different attractors coexist may offer a greater range of possible behaviours than one with only one attractor. If the attractors can bifurcate independently of each other, many different combinations are possible. Studying such systems is a domain where further work could take place.
The model also raised the question of whether a single mechanism may be responsible for pitch breaks to frequencies both higher and lower than the normal vibration frequency. This question came up because the system that produces the double and triple period pitch breaks contains a term whose signal has half the period of the single period. It would be interesting to explore this question in future work.
The physical models of vocal fold vibration were designed to keep the number of state variables as low as possible, which makes the system easier to analyse. The result is a third order system with the contact area and the glottal airflow as state variables. In terms of the number of state variables, this system is in the same class as a one-mass model driven by the glottal flow. However, it has more features than are usually found in a one-mass model: it also simulates the zipper-like opening and closing of the folds and takes into account a deformation of the vocal fold tissue.
The physical model producing Lx could be used to simulate the voice source from a recorded Lx signal. This would involve matching the model parameters to the Lx signal, running the model and deriving the voice source from the simulated glottal flow.
One application of bifurcating nonlinear models could be to drive real-time voice synthesis, which may contribute to a more natural sound. However, much of the naturalness of a sound has little to do with the vibration model itself and much to do with the way it is controlled. Adding some vibrato to a static waveform can already produce convincing results. Vibrato in the human voice can be voluntarily controlled, so it has more to do with how the vocal folds are handled than with the underlying mechanism of vibration.
4.1
Acknowledgements
I would like to thank the people involved in supervising this project: Jonas Buchli, for the good discussions and for encouraging me to develop the pitch break model; Prof. David Howard, for sharing some of his knowledge of the human voice with me; and Prof. Auke Ijspeert, for accepting me for this project.
Appendix A
Lx samples reference
sample        source  track  time (min:sec)
wave1         AS      12     0:02.48
wave1a        AS      12     0:02.55
wave1b        AS      12     0:02.78
wave1b2       AS      12     0:04.45
wave1b3       AS      14     0:32.25
wave2         AS      12     0:02.88
wave3         AS      12     0:02.68
wave4         AS      12     0:03.0
wave5         AS      12     0:03.76
wave6         AS      12     0:41.02
wave7         AS      8      1:44.11
rest-osc 1    AS      12     0:02.24
rest-osc 2    AS      12     0:03.42
rest-osc 3    AS      12     0:03.93
rest-osc 4    AS      12     0:06.40
rest-osc 5    AS      12     0:06.79
rest-osc 6    AS      12     0:21.75
osc-rest 1    AS      12     0:11.0
osc-rest 2    AS      12     0:36.8
osc-rest 3    AS      12     1:33.56
osc-rest 4    AS      10     0:17.8
doubleper     AS      12     0:15.7
tripleper     AS      12     0:28.8
tripleper2    AS      12     0:49.82
tripleper3    AS      10     0:39.33
period2 3 5   AS      12     0:45.2
sinusoidal    AS      12     0:14.85
littlesine    AS      12     0:59.45
nine          AS      12     0:49.8
Source abbreviations
AS  Anechoic Speech CD
Appendix B
Model parameter values
B.1
2D model
behaviour     waveform 1a  waveform 2   waveform 3   waveform 5   rest-osc 1   rest-osc 2/3 osc-rest 1
ODE function  pwlinvf5n    pwlinvf5n    pwlinvf5n    pwlinvf6n    pwlinvf5nb   pwlinvf5nb   pwlinvf5nb
a             100          100          100          100          100          100          100
b             3            10           6            30           6            6            6
c             2            5            50           8            50           50           50
d             10           10           10           -5           10           10           10
e             100          100          160          7            160          160          160
f             –            –            –            -2           –            –            –
α             5            5            5            2            –            –            –
α1            –            –            –            –            -1           -1           10
α2            –            –            –            –            10           40           -5
β             20           20           40           25           80           120          80
t1            –            –            –            –            0            0.5          0.5
t2            –            –            –            –            0.5          0.5          1
Appendix C
Colouring of the basins of attraction in section 3.2.2
The aim of this colouring method is to capture not only the attractor to which a trajectory converges, but also how fast it converges. The attractors to be distinguished produce output signals with a double or a triple period, i.e. with a fundamental frequency of either one half or one third of a given frequency f0. The frequency spectrum below f0 therefore has one peak at f0/2 for the double period and two peaks, at f0/3 and 2f0/3, for the triple period (see figure C.1). The colour for a given spectrum is chosen according to the amplitudes at the frequencies of these possible peaks. A double period trajectory, whose spectrum has only one peak below f0 (at f0/2), will be coloured green. A triple period trajectory, with peaks at f0/3 and 2f0/3, will get mainly red and blue, hence the violet colour. This scheme takes into account the whole signal and not just the state at the end of the evolution. For example, a trajectory that first follows a single period orbit and only slowly converges to the double period limit cycle will have a smaller peak at f0/2. It will therefore be coloured in a darker shade of green.
Figure C.1: Spectra of double and triple period output signals (amplitude against FFT bins 0 to 500). The green curve is from a double period and the violet curve from a triple period. The colours are chosen based on the values of the spectrum at the highlighted frequencies: FFT bin 54 for the red channel, FFT bin 81 for green and FFT bin 107 for blue.
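The colouring rule can be sketched in Python with NumPy. The bins default to 54, 81 and 107, as quoted for figure C.1. These bins only make sense for the FFT length and f0 used there, which are not stated, so the test signals below use an arbitrary N = 1024 and place energy exactly on those bins. The fixed reference amplitude `ref_amplitude` is a hypothetical parameter shared by all trajectories, so that weaker subharmonic peaks give darker colours, as described above. The thesis does not specify how the channels were actually scaled.

```python
import numpy as np


def basin_colour(signal, ref_amplitude, bins=(54, 81, 107)):
    """Return an (R, G, B) triple in [0, 1] from three FFT magnitudes.

    bins holds the FFT bins near f0/3 (red), f0/2 (green) and 2*f0/3 (blue).
    ref_amplitude is a fixed scale shared by all trajectories, so a smaller
    subharmonic peak yields a darker colour instead of being renormalised.
    """
    spectrum = np.abs(np.fft.rfft(np.asarray(signal, dtype=float)))
    rgb = spectrum[list(bins)] / ref_amplitude
    return np.clip(rgb, 0.0, 1.0)


N = 1024
n = np.arange(N)
double_period = np.cos(2 * np.pi * 81 * n / N)
triple_period = np.cos(2 * np.pi * 54 * n / N) + np.cos(2 * np.pi * 107 * n / N)
weak_double = 0.5 * double_period  # half the f0/2 amplitude, e.g. late convergence

for name, sig in [("double", double_period), ("triple", triple_period), ("weak double", weak_double)]:
    print(name, basin_colour(sig, ref_amplitude=N / 2))
```

The double period signal maps to pure green, the triple period signal to red plus blue (violet), and the weaker double period signal to a darker green.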
Appendix D
LF model fitting method
The LF model describes the glottal flow derivative E(t) as being composed of two parts. The first part is a sinusoid multiplied by an exponential,
$$E(t) = E_0\, e^{\alpha t} \sin(\omega_g t) \qquad \text{for } 0 < t \le t_e,$$
and the second is an exponential return phase for the rest of the cycle (te < t ≤ tc):
$$E(t) = -\frac{E_0}{\varepsilon t_a}\left(e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)}\right).$$
As mentioned in section 3.3.1, a simplified version of the LF model is used, in which the above equation is replaced by
$$E(t) = -\frac{E_0}{\varepsilon t_a}\, e^{-\varepsilon (t - t_e)}.$$
Figure D.1: LF model parameters (simplified), showing t0, ti, tp, te, tc, Ei and Ee. Adapted from [8].
The fitting algorithm produces all parameters from an input signal x(t) and tc, assuming t0 = 0. It starts by computing ti as the first maximum, tp as the first negative-going zero crossing and te as the absolute minimum between t0 and tc. From tp, the frequency of the sinusoid can be calculated as ωg = π/tp. The sinusoid sin(ωg t) is computed, and the input signal (up to te) is divided by it. The parameter α is computed by a first order polynomial fit of the logarithm of x(t)/sin(ωg t); the intercept of this fit gives ln E0. The parameter ε of the return phase is obtained in the same way, by polynomial fitting that approximates the logarithm of −x(t) (positive in the return phase) for t > te as −εt + c. Finally, ta is computed as
$$t_a = \frac{E_0}{\varepsilon}\, e^{\varepsilon t_e - c}.$$
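The fitting steps above can be sketched in Python with NumPy and checked on a synthetic cycle generated from the simplified model itself. This is an illustration under assumptions, not the thesis implementation:

- Extrema and zero crossings are located on the sampled signal, and tp is refined by linear interpolation.
- Samples where |sin(ωg t)| < 0.1 are excluded to avoid dividing by values near zero.
- The synthetic ta is chosen so that E(t) is continuous at te.

Real flow-derivative signals are noisy, so they would need more careful fitting than shown here.

```python
import numpy as np


def lf_simplified(t, E0, alpha, wg, te, eps, ta):
    """Simplified LF flow derivative: open phase up to te, single exponential after."""
    open_phase = E0 * np.exp(alpha * t) * np.sin(wg * t)
    return_phase = -(E0 / (eps * ta)) * np.exp(-eps * (t - te))
    return np.where(t <= te, open_phase, return_phase)


def fit_lf_simplified(x, fs):
    """Estimate simplified-LF parameters from one cycle x sampled at fs (t0 = 0, tc = len(x)/fs)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs

    # ti: first local maximum
    peaks = np.where((x[1:-1] >= x[:-2]) & (x[1:-1] > x[2:]))[0]
    ti = t[peaks[0] + 1]

    # tp: first positive-to-negative zero crossing, refined by linear interpolation
    k = np.where((x[:-1] > 0) & (x[1:] <= 0))[0][0]
    tp = t[k] + x[k] / (x[k] - x[k + 1]) / fs

    # te: absolute minimum of the cycle
    te = t[np.argmin(x)]

    # Open phase: ln(x / sin(wg t)) = ln(E0) + alpha t; skip samples where sin is near 0
    wg = np.pi / tp
    s = np.sin(wg * t)
    m = (t > 0) & (t <= te) & (np.abs(s) > 0.1)
    ratio = x[m] / s[m]
    ok = ratio > 0
    alpha, ln_E0 = np.polyfit(t[m][ok], np.log(ratio[ok]), 1)
    E0 = np.exp(ln_E0)

    # Return phase: ln(-x) = -eps t + c
    r = (t > te) & (x < 0)
    slope, c = np.polyfit(t[r], np.log(-x[r]), 1)
    eps = -slope

    ta = E0 / eps * np.exp(eps * te - c)
    return {"ti": ti, "tp": tp, "te": te, "wg": wg, "alpha": alpha, "E0": E0, "eps": eps, "ta": ta}


# Synthetic test cycle: 8 ms at 20 kHz, ta chosen so that E(t) is continuous at te
fs = 20_000.0
t = np.arange(160) / fs
E0, alpha, wg, te, eps = 1.0, 200.0, np.pi / 0.004, 0.005, 2000.0
ta = -1.0 / (eps * np.exp(alpha * te) * np.sin(wg * te))
params = fit_lf_simplified(lf_simplified(t, E0, alpha, wg, te, eps, ta), fs)
print(params)
```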
Appendix E
Tools
During this project, several tools for handling Lx signals and ODE systems were developed. As they may be of some use to other people, I describe them here briefly.
E.1
Lx denoising
Lx signals are often very noisy. Because the Lx signal itself is supposed to be relatively smooth, at least piecewise, it is possible to guess what the waveform should ideally look like without the noise. This is not usually the case with audio signals. Looking at an audio waveform rarely tells much about the sound, and noise can easily be confused with higher-order harmonics, for example.
There are algorithms based on anisotropic diffusion that work very well for denoising images. As the Lx signal makes "sense" visually, applying a similar algorithm was worth a try. Simply filtering an Lx signal with a lowpass filter has a drawback: the very sharp rising edge during the closing phase is "smeared out", which attenuates the higher-order harmonics of the signal. In one dimension, anisotropic diffusion can also be seen as filtering a signal with a Gaussian filter whose temporal extension (and inversely its bandwidth) changes with the gradient of the signal. Where the gradient is strong, the signal is filtered with a narrow Gaussian, which alters it only slightly. Where the signal is relatively flat, it is filtered with a wider filter, which smooths out small perturbations in the flat region. This is exactly what allows the shape of an Lx signal to be recovered. Flat parts become smoother, while the strong gradients during the opening and closing phases are preserved.
For this project a very simple implementation of this idea was realised. Instead of adapting the width of the filter, the filter is simply interpolated between a (relatively wide) Gaussian curve and an impulse. This can also be interpreted as taking the original signal and a lowpass-filtered version of it and cross-fading between the two, depending on the estimated gradient of the original signal.
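The cross-fading idea can be sketched in Python with NumPy. The Matlab listing LxNoisereduction is only partly reproduced in this appendix, so this function is not a translation of it. It is a minimal sketch of the approach described above, and only the parameter names lpglen, dgdist and dglen are borrowed from the Matlab function. Their meaning here is an assumption:

- lpglen is the number of Gaussian taps, with standard deviation lpglen/6.
- dgdist is the half-distance of a central difference.
- dglen is the length of a moving average applied to the absolute gradient.

Normalising the cross-fade weight by its maximum is also an assumption.

```python
import numpy as np


def gaussian_lowpass(x, length):
    """Convolve x with a unit-sum Gaussian of `length` taps (std = length/6), edges held constant."""
    k = np.arange(length) - (length - 1) / 2
    g = np.exp(-0.5 * (k / (length / 6)) ** 2)
    g /= g.sum()
    padded = np.pad(x, (length // 2, length - 1 - length // 2), mode="edge")
    return np.convolve(padded, g, mode="valid")


def lx_denoise(lx, lpglen=31, dgdist=3, dglen=9):
    """Cross-fade between lx and its lowpassed copy, weighted by the local gradient."""
    lx = np.asarray(lx, dtype=float)
    lowpass = gaussian_lowpass(lx, lpglen)
    # Central difference over +-dgdist samples, then a moving average of length dglen
    padded = np.pad(lx, dgdist, mode="edge")
    grad = np.abs(padded[2 * dgdist:] - padded[:-2 * dgdist]) / (2 * dgdist)
    grad = np.convolve(grad, np.ones(dglen) / dglen, mode="same")
    # Weight 1 (keep the original) at the strongest gradient, close to 0 on flat parts
    w = grad / grad.max() if grad.max() > 0 else np.zeros_like(grad)
    return w * lx + (1.0 - w) * lowpass


# Synthetic example: flat, steep rising edge, flat, plus white noise
rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(200), np.linspace(0.0, 1.0, 5), np.ones(200)])
noisy = clean + 0.02 * rng.standard_normal(clean.size)
denoised = lx_denoise(noisy)
```

On the flat parts the result follows the lowpass-filtered signal, while around the edge it stays close to the original. This is the behaviour the text describes for the Matlab implementation.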
Figure E.1: Example of the denoising technique compared to lowpass filtering (curves: original, lowpass filtered, denoised; time axis t [sec] × 10^−3). The denoised signal follows the original signal very well at the beginning of the closing phase, getting the steep rising edge right from the start. On the flat parts before and after the peak, however, it closely follows the lowpass-filtered signal, getting rid of the noise.
This is implemented in the following Matlab function, which takes a noisy signal and three parameters as input. Parameter lpglen is the length of the filter used to create the lowpass-filtered version, while dgdist and dglen control the estimation of the gradient. Decreasing them makes the estimation more localised, which is better for getting sharp edges, but also noisier, which makes the noise reduction less efficient.
E.1.1
LxNoisereduction
function Lxnr=LxNoisereduction(Lx,lpglen,dgdist,dglen) if nargin