1st International Conference on the Physiology and Acoustics of Singing

Quantifying developmental singing voice changes in children

David M Howard
Music Technology Research Group, Department of Electronics, University of York, UK
[email protected]

Abstract

Our quantitative knowledge of human voice production made rapid advances during the latter part of the 20th century, in respect of singing as well as speech production. These advances have accelerated in more recent times due to the widespread availability of office and home PCs, which provide the level of computational facilities that were once only found in specialist speech science laboratories. Highly complex real-time audio processing is now possible on a standard multimedia PC, and the use of analysis tools as well as real-time visual displays is becoming increasingly widespread. This paper presents an overview of the author’s personal contribution both to the underlying quantitative knowledge base of human singing and to the use of real-time visual feedback for the development of pitching and other singing skills, particularly in regard to the singing voices of children. The paper is therefore set in an historical context, and it provides a review of the research results that have supplied the knowledge that underpins the application of real-time visual feedback in singing training. The various analysis and display systems are discussed along with the results that have been obtained through their use. The initial design stages of a PC-based real-time feedback system that is under development are presented.

Introduction

The traditional teaching of singing has origins that can be traced to well before the availability of widespread scientific analysis techniques. It is essentially a qualitative process that makes use of a technique that has been handed down from one generation of practitioners to the next over many decades. The success of this process relies in the main on the ears and perceptual skills of the teacher, who must guide the student singer towards a posture and production technique that enables the student to produce an appropriate sound. This process is supported by the knowledge handed down as well as the personal experience of the teacher, which are manifested in lessons through the use of imagery (e.g. Moorcroft, 2002). This imagery may be in the form of ‘psychological hooks’ or concepts designed to promote the use of postural gestures that are deemed to be appropriate to the production of a sung output, such as: sing on the point of the yawn, or sing as if there is an orange stuck in the throat, or sing through an imaginary hole in the forehead. Whilst such psychological hooks clearly work for many students in terms of enabling them to produce an appropriate vocal output, most do not describe the physical reality of the voice production process itself.

The last few decades have seen a huge surge in technological developments in terms of the computers that are now in widespread use. It is now entirely possible to carry out, on a home or office multimedia PC, the kind of quantitative voice analyses that were once only possible with equipment available in specialist speech science laboratories, and to provide the results as a real-time visual display.
For such techniques to find their way into the majority of singing studios, it is essential that there are multi-disciplinary opportunities (of the kind provided by, for example, the British Voice Association, the Care of the Professional Voice, and the Pan European Voice Conference) for singing teachers to explore and understand the facilities that technology can offer in the company of engineers, surgeons, speech therapists, acousticians, directors, actors, as well as other singing teachers. The computer will, however, never replace the voice teacher. It is suggested that it should rather be viewed as a tool to enhance the process both during lessons and pupil practice. When used in this way, both practice and lesson time will be used more productively and efficiently, leaving additional opportunities for the teacher to explore the musical and performance aspects of singing, such as interpretation, working with different accompaniments, the effect of various acoustics, communication with the conductor and other singers, and stagecraft. The application of real-time visual displays of voice parameters based on those that are found through research to vary as a result of professional voice training could benefit not only those who use their voices professionally, but also those who experience vocal difficulties in everyday life such as teachers, lecturers, politicians, media presenters, journalists, market traders, stockbrokers, tour guides, town criers, carers, health workers, and parents.

A real-time visual display will only find useful application in singing lessons if the technology is simple to set up and controllable via a user-friendly environment. Further, the displays themselves must be meaningful to the user. The user must have clear and unambiguous knowledge of what to expect from the display, perhaps through the use of targets, and how to modify her/his vocal output to achieve progress in the task in hand. The display must operate in real-time by providing feedback of one or more parameters that reflect moment-by-moment activities of the vocal system with no noticeable delay (Welch et al., 1989). The parameters chosen for any such display must be selected from those known to vary in a manner that is commensurate with appropriate progress along a developmental singing continuum.

Research into the acoustics of speech production has been considerable, particularly following the development of the speech spectrograph during the 1940s (e.g. Potter et al., 1947), and considerable knowledge concerning the nature of acoustic features of speech has been established (e.g. Fry, 1979; Borden and Harris, 1980; Baken, 1987; Baken and Danilof, 1991; Kent and Read, 1992). During this period, the majority of the knowledge of speech acoustic cues was gained for adults, and in the main, adult males. Analysis of the acoustic cues of adult females and children is not as straightforward because the fundamental frequency (f0) is higher, and there are fewer harmonics available in a given frequency range during voiced speech and sung sounds to ‘illuminate’ the output spectrum.
Accurate location of spectral (formant) peaks therefore becomes difficult. It is only more recently that attention has turned towards the measurement of the acoustic output from children. The analysis of f0 is another area which has received attention during the latter three quarters or so of the 20th century, but still “no single device or method for f0 measurement exists that works reliably for any speaker in any acoustic environment” (Howard, 1998). Howard (1998) further notes that the choice of an f0 analysis technique should be made with direct reference to the particular demands of the intended application. Key issues with regard to making analysis techniques more widely available, particularly to those who are non-expert in the techniques themselves, are: data interpretation including use of statistics, appropriate data collection technique, and suitable choice and proper setting up of the analysis technique. Erroneous data leading to flawed conclusions can otherwise result (Howard, 1998, 2001).

[Figure 1 image: piano keyboard from C2 to C6, with the singing ranges of sopranos, altos, tenors and basses and the speech ranges of men, women and children marked alongside. The note names and f0 values shown on the keyboard are tabulated here.]

Note    f0 (Hz) in octave 2    octave 3    octave 4    octave 5
C       65.41                  130.8       261.6       523.3
C#      69.30                  138.6       277.2       554.4
D       73.42                  146.8       293.7       587.3
D#      77.78                  155.6       311.1       622.3
E       82.41                  164.8       329.6       659.3
F       87.31                  174.6       349.2       698.5
F#      92.50                  185.0       370.0       740.0
G       98.00                  196.0       392.0       784.0
G#      103.8                  207.7       415.3       830.6
A       110.0                  220.0       440.0       880.0
A#      116.5                  233.1       466.2       932.3
B       123.5                  246.9       493.9       987.8

C6: 1046.5 Hz

Figure 1: Fundamental frequency (f0) values for 4 octaves of a piano keyboard (middle C marked with a black spot). Approximate f0 ranges used in singing for sopranos, altos, tenors, and basses and the speech of men, women and children are marked, with mean speech f0 values indicated by black horizontal lines. (From Howard, 1998.)
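The equal tempered f0 values shown in figure 1 can be reproduced from the standard equal temperament relation, in which each semitone step multiplies frequency by the twelfth root of two. The sketch below is illustrative only: the function names `note_to_f0` and `cents_between` are hypothetical, and the reference of A4 = 440 Hz is taken from the value listed in figure 1.

```python
import math

# Semitone offset of each pitch class above C within one octave.
NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}


def note_to_f0(name, a4_hz=440.0):
    """Return the equal tempered f0 in Hz for a note name such as 'C#4'."""
    pitch_class, octave = name[:-1], int(name[-1])
    # A4 lies 9 semitones above C4, so measure the distance from A4.
    semitones_from_a4 = NOTE_OFFSETS[pitch_class] + 12 * (octave - 4) - 9
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


def cents_between(f_sung_hz, f_target_hz):
    """Signed distance in cents from the target to the sung frequency."""
    return 1200.0 * math.log2(f_sung_hz / f_target_hz)


if __name__ == "__main__":
    for note in ("C2", "C#2", "C4", "A4", "C6"):
        print(note, round(note_to_f0(note), 2))
```

Expressing pitching errors in cents, as `cents_between` does, makes an error of a given musical size read the same regardless of where it falls in the pitch range.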

Whilst f0 analysis provides essential data for the knowledge base of speech, it is even more vital when considering singing. Here, the pitch¹ is set by the musical score, and accuracy in producing that pitch is one of the basic requirements of any singer. A professional singer will have a pitch range that is far wider than that they use in speech. Figure 1 illustrates this against a standard piano keyboard, and equal tempered frequency values are given alongside each note². It is estimated that approximately 40% of 10 year-old children are unable to sing a piano note in-tune, and that this does not change into adulthood (Welch, 1979). Two important consequences arise from not being able to sing in-tune, the second of which is only too common with its potentially devastating effect on singing morale and self-esteem for those on the receiving end.
• The ability to sing in-tune is a hallmark through which potential singers are identified and encouraged.
• An inability to sing in-tune is a hallmark through which folk are told they cannot or even must not sing and perhaps must mouth the words in performance; the damaging consequences can be far-reaching into later life.

Real-time visual feedback has the potential to aid the development of skills that can be learned, provided the parameters chosen are known to vary when appropriate progress is being made. This paper describes research data that demonstrate the appropriateness of the chosen parameters, as well as their implementation in real-time display systems. The work stems initially from laboratory investigations into the design and implementation of an f0 analysis system for use in cochlear implants (Howard and Fourcin, 1983). Here, it was essential to have a reference f0 analysis for comparison purposes, and in this case, the electrolaryngograph (Fourcin and Abberton, 1972), which had been developed in the same laboratory, was employed.
Given the vital role played by the voice source in singing and the detailed information on vocal fold vibration that is provided by the electrolaryngograph in addition to f0, data relating to so called “new laryngograms of the singing voice” were investigated (Howard and Lindsey, 1987). This started a process of data collection that now includes adult male and female trained and untrained singers as well as almost 15% of all the girl and boy cathedral choristers in the UK. A description of the analysis procedures and a summary of parameter variation in these data with singing training and experience are presented along with an account of current and future application in real-time visual displays for voice training.

The SINGAD system

The SINGAD (SINGing Assessment and Development) system was developed to provide real-time visual feedback of voice fundamental frequency (f0) against time to enable voice pitching development as well as a quantitative assessment of voice pitching accuracy (Howard et al., 1987; Howard, 2000). It originated as a result of a meeting of minds (Howard and Welch) in the mid-1980s, when the former had just completed his PhD developing a real-time f0 analysis system for cochlear implantees (Howard, 1985), and the latter had established that some 40% of 10 year-olds are unable to sing a note in-tune against a piano reference (Welch, 1979) and was keen to trial a real-time visual display of f0 with those younger than 10. The outcome was a sketch for the first version of SINGAD, which was implemented on the Acorn BBC range of microcomputers that were in widespread use in UK schools at the time (Howard and Welch, 1989). The BBC microcomputer had a number of externally available ports to which peripheral equipment could be connected. It also incorporated the capability of synthesizing tones across a wide pitch range via its internal loudspeaker, enabling notes to be played for assessment purposes. The f0 of the singing voice was measured by means of the real-time peak-picker that was originally designed for use with a single electrode cochlear implant (Howard and Fourcin, 1983; Howard, 1989). The peak-picking device operates in the time domain, and it was considered to be particularly suitable for this application because it incorporates no output smoothing and will therefore give an f0 output for the creaky, breathy or harsh voice qualities often produced by young children who are shy or less confident, and who often demonstrate a degree of reluctance to perform alone vocally when requested to do so. One disadvantage arising from the lack of output smoothing is the inherent increased sensitivity to background acoustic sounds such as banging doors and the speech and shouting of others, but this can normally be overcome by keeping the microphone close to the subject’s lips.

The SINGAD system has two phases of operation: singing assessment and singing development. During the assessment phase, notes are played to the subject who then sings them back, and the f0 of the sung response is compared with that of the note(s) played. During the development phase a real-time trace of f0 against time is plotted, and pictures can be placed on the screen to provide targets for the user.

¹ Pitch is based on a subjective judgement made by a human listener on a scale from low to high, whereas f0 is an objective measurement. Whilst “pitch” and “f0” are often used interchangeably, it should be noted that changes in pitch, albeit small but sufficient to alter musical intonation for example, can be perceived when f0 is kept constant and the loudness or timbre is changed (e.g. Howard and Angus, 2000).

² The equal tempered tuning system is that in common use today by international convention. It splits the octave into 12 equal steps (semitones) and each semitone can be further split into 100 equal steps (cents). Howard and Angus (2000) provide the relevant mathematical formulae as well as a discussion of unequal tempered tuning systems.

Figure 2: Hardware block diagram of the Atari computer version of SINGAD.

The BBC version of SINGAD exhibited three major limitations which were considered to be impairing to some degree its overall operation, and opportunities were sought to port SINGAD to another machine:
1. stimulus spectral definition: stimuli could only be played via the small internal loudspeaker of the BBC computer, which sometimes led to a number of ambiguous pitched responses (most often octave errors) even from experienced singers.
2. musically useful stimuli: memory limitations (only 32 kBytes of RAM were available for program and data storage) enabled only randomly ordered single pitches to be used as assessment stimuli, which have little “musical” usefulness.
3. assessment storage limitation: memory limitations restricted the storage of sung f0 data to just the mean f0 of 255 sung cycles for the sung response to each stimulus; the raw f0 data could not be stored for subsequent analysis.

The Atari computer, to which SINGAD was ported in 1992, offered a number of significant advantages over the BBC microcomputer (Howard and Welch, 1993). With respect to the three specific limitations above, the Atari version provided the following enhancements:
1. stimulus spectral definition: any music synthesizer equipped with a musical instrument digital interface (MIDI) could be used to play the musical stimuli, since the Atari was fitted with MIDI ports as standard, giving the user complete control over the spectral nature of the stimuli.
2. musically useful stimuli: there were no memory limitations in practice, and musically meaningful melodic fragments were implemented consisting of randomly ordered one, three or five notes in a major, minor or pentatonic context.
3. assessment storage limitation: no memory limitation meant that all f0 data sung by a subject in response to the assessment stimuli could be stored on diskette for later extensive f0 analysis.
A block diagram of the Atari ST version of SINGAD is shown in figure 2. Assessment stimuli were played via MIDI using an external music synthesizer, amplifier and loudspeaker. The voice f0 for the assessment and development phases was captured by means of a then commercially available pitch-to-MIDI converter (Roland CP-40). Figure 3 presents an example SINGAD development display from the BBC microcomputer version where there are three picture targets and a sung f0 trace which has successfully hit all three. The time taken for the f0 trace to cross the screen from left to right is 4 seconds. This particular set of trials is one of a large number as detailed in Howard and Welch (1993). Figure 4 shows the Atari SINGAD assessment screen which is related to the stimuli set shown at the foot of the figure. The piano roll display at the top indicates the order (randomised) of the trials (in this case there were 15 in total). The sung f0 response is shown in the main part of the screen between each pair of the six vertical cursors. The horizontal lines indicate the frequency positions of the notes played as stimuli, and it can be seen that this subject is singing with a fair degree of pitch accuracy. Summary statistics are presented on the screen for the f0 values between each pair of cursors, which are set by hand.

Figure 3: Example BBC microcomputer SINGAD development screen with an f0 trace and three picture targets.

Figure 4: Example Atari SINGAD assessment screen for the 4th trial of the set illustrated at the foot of the figure.

The SINGAD system has been used in a number of UK primary schools. Its usefulness in improving pitching skills in the classroom was initially established in a Bristol primary school in a 6-week experiment with 30 children (Welch et al., 1988). All were assessed using SINGAD at the start and end, and during the experiment the pupils were split into three groups:
1. ten used the SINGAD development software regularly in pairs with a teacher
2. ten used the SINGAD development software in pairs without a teacher
3. ten formed the control group who did not use the SINGAD development software.

A statistically significant difference was observed in pitching abilities between the initial and final SINGAD assessments for groups 1 and 2 but not for group 3 (Welch et al., 1989). In terms of the practical usefulness of SINGAD within a school situation, the fact that group 2 exhibited such a change was particularly important, since it supports the possibility of such systems being used in general everyday teaching practice, a situation where extra staff would not be available. The observed development of singing pitching skills in primary school children from SINGAD assessment data lends support to the notion of there being a developmental continuum of singing ability, with extremes from sustained ‘singing-like’ sounds typical of seven-month-old infants to the musical performance of a culture’s traditional ‘high art’ vocal music (Welch, 1985, 1986; Welch et al., 1991).

In a later experiment designed to chart the pitching abilities of primary school children across the school years, 177 children from years 3-6 (aged 8-11 years) inclusive from a York primary school took part in a SINGAD assessment during November 1993 (Howard et al., 1994; Angus et al., 1996). The stimuli used for the assessment were the five three-note trials shown at the foot of figure 4. The choice of three-note trials in B flat major was based on consideration of the following:
• pitch range: Bb3 to F4 was chosen to be a comfortable range for all subjects to sing, and the stimuli are centred on the mean f0 for children (see figure 1)
• tonality: a major tonality was likely to be the most familiar to all subjects
• musically meaningful note patterns: patterns consisting of the tonic, mediant and dominant (notes 1, 3, and 5 of the scale) were used given their key salience in the tonic triad

In order to gain sufficient data to be representative for each subject, each trial was played 3 times and the 15 (3*5) trials were randomly ordered.
The notes themselves were played at a rate of 1 per second within each trial, with each note lasting for 800ms and an inter-note time of 200ms. Each subject therefore responded to a total of 45 notes. A total of 7,965 notes (45*177) was gathered for the experiment as a whole. The recorded SINGAD assessment results were analysed by hand by placing cursors at the start and end of each of the three notes of each stimulus (see the example assessment screen in figure 4) giving a total of 15,930 (177*45*2) cursor placements! Cursor placement is entirely a matter of experimenter judgement given effects such as vibrato, very short notes, and legato singing, and for this experiment, the same researcher set all cursors to ensure consistency of placement. Once cursors have been set, the following statistics are available for the f0 data between each pair of cursors: maximum f0, minimum f0, f0 mean, f0 standard deviation, the time between the cursors, and the average difference between the mean sung f0 values and the known (and tested: Howard and Welch, 1993) f0 values of the stimuli played.
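The per-note statistics listed above can be sketched as follows. This is an illustration rather than the SINGAD implementation: the function names are hypothetical, the f0 track is assumed to be a list of (time, f0) samples, the population standard deviation is an arbitrary choice, and reporting the difference from the stimulus in cents is an assumption made here to match the units used in the figures.

```python
import math
import statistics


def cents_error(f_sung_hz, f_target_hz):
    """Signed pitching error of the sung f0 relative to the target, in cents."""
    return 1200.0 * math.log2(f_sung_hz / f_target_hz)


def note_statistics(times_s, f0_hz, start_s, end_s, target_hz):
    """Summarise the f0 samples lying between two hand-placed cursors.

    times_s and f0_hz are parallel lists; start_s and end_s are the cursor
    positions in seconds; target_hz is the known f0 of the stimulus note.
    """
    segment = [f for t, f in zip(times_s, f0_hz) if start_s <= t <= end_s]
    mean_f0 = statistics.fmean(segment)
    return {
        "max_f0": max(segment),
        "min_f0": min(segment),
        "mean_f0": mean_f0,
        # Population standard deviation (an assumption; the source does not say).
        "sd_f0": statistics.pstdev(segment),
        "duration_s": end_s - start_s,
        "error_cents": cents_error(mean_f0, target_hz),
    }


if __name__ == "__main__":
    # Hypothetical f0 track: one steady note at 220 Hz between the cursors.
    times = [0.0, 0.1, 0.2, 0.3, 0.4]
    track = [200.0, 220.0, 220.0, 220.0, 400.0]
    print(note_statistics(times, track, 0.1, 0.3, 220.0))
```

Because the cursors decide which samples are included, the onset and offset glides that they exclude never reach the mean, which is why consistent cursor placement matters.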

Figure 5: 25% and 50% centile pitching errors by year and gender from the November 1993 assessment. (From Howard et al., 1994.)

The data from this experiment are summarised in figure 5 (detailed data are available in Howard et al., 1994; Angus et al., 1996) as 25% and 50% centile analyses plotted with respect to gender. A distinct gender difference is visible in the 50% centile plot, indicating that half of the girls make a marked improvement in pitching skills between years four and six; no such change is seen for half of the boys in those years. The 25% centile data indicate that there is pitching skill improvement for a quarter of both the boys and the girls, but it is clear here too that the improvement for the girls is to a greater degree. In terms of singing accuracy then, half the girls are pitch accurate to within one semitone by year six, whereas the same proportion of boys are well over two semitones adrift that year. Taken together, these two centile plots suggest that the girls are clearly developing pitching accuracy earlier than the boys within this age range in this particular school. By years 5 and 6, a quarter of the girls have achieved mean pitching accuracy of approximately 30 cents. In order to place this result in a proper context, the accuracy of professionally trained adult singers (5 female and 5 male) was assessed using SINGAD (Howard and Angus, 1998) to provide a measure of what accuracy one might expect from fully trained professional singers. The mean accuracy for this adult group was 20 cents. The range was between 14 and 29 cents, indicating that the 30 cents mean accuracy achieved by a quarter of the girls in years 5 and 6 is very close to that of professional adult singers.

CHILDREN SINGAD ASSESSMENT

Trial   Stimuli (scale note)   1st interval       (cents)    2nd interval       (cents)
1       I III V                major third up     400 up     minor third up     300 up
2       V III I                minor third down   300 down   major third down   400 down
3       I V I                  fifth up           700 up     fifth down         700 down
4       V I V                  fifth down         700 down   fifth up           700 up
5       III III III            unison             0          unison             0

Table 1: Stimuli as degrees of the scale and relative equal tempered intervals between them in cents in each trial used with the children.

In order to try to understand the nature of the pitching strategy being adopted by subjects, an analysis of the relative pitching accuracy of each of the musical intervals contained within the stimuli was carried out (Howard and Angus, 1998). Table 1 indicates the musical intervals that are contained within the five trials used in the SINGAD assessment, which can be seen at the foot of figure 4. The notes concerned are indicated by the degree of the scale where they lie (I, III or V), and the musical intervals by name, number of cents on the equal tempered scale, and direction. Figure 6 shows the relative pitching accuracy in cents by year and by sex for each of the musical intervals given in table 1. Each point plotted is the median (50% centile) value to enable direct comparison with the left plot in figure 5, where the difference between the girls and boys is most marked. Measuring relative pitching accuracy gives an indication of how well musical intervals were sung, whereas absolute pitching accuracy indicates how accurately each note was pitched with respect to the target. One reason why it is interesting to look at these two data sets separately is that if a subject sang all the musical intervals accurately but in the wrong key, then they would exhibit excellent relative pitching accuracy but poor absolute pitching accuracy. Overall in these data, relative interval pitching accuracy appears to be developed more rapidly by girls than by boys, following the trend exhibited in the absolute note pitching results (see figure 6). The unison is always pitched accurately for all subjects. As the interval becomes wider (unison, minor third, major third, fifth), it is generally pitched less accurately by both sexes. The thirds are pitched more accurately than the fifths, apart from a cross-over in year 5 for the boys.
The minor third is generally more accurately pitched than the major third, apart from year 5 of the boys and in some cases where the error is close to zero. If these intervals were ordered in terms of their musical consonance and dissonance (e.g. Howard and Angus, 2000), which could be an accuracy order one might hypothesise, then this would be: unison, perfect fifth, major third, and minor third. From these data, it is clear that a relative pitching accuracy based on consonance and dissonance is not appropriate here. It is also noteworthy across the board for these data that all ascending intervals tend to be pitched too flat, and all descending intervals too sharp. In addition, the average pitching accuracy for any of the three musical intervals considered separately is worse when descending than when ascending.
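The distinction between absolute and relative pitching accuracy can be made concrete with a short sketch. The function names are hypothetical, and the sung frequencies are invented purely for illustration: they represent trial 1 of table 1 (I III V in B flat major, i.e. Bb3, D4, F4) sung exactly one equal tempered semitone flat throughout.

```python
import math


def cents(f_a_hz, f_b_hz):
    """Signed interval from f_b to f_a in cents."""
    return 1200.0 * math.log2(f_a_hz / f_b_hz)


def absolute_errors(sung_hz, target_hz):
    """Error of each sung note against its own target note, in cents."""
    return [cents(s, t) for s, t in zip(sung_hz, target_hz)]


def relative_errors(sung_hz, target_hz):
    """Error of each sung interval against the target interval, in cents."""
    errors = []
    for i in range(1, len(sung_hz)):
        sung_interval = cents(sung_hz[i], sung_hz[i - 1])
        target_interval = cents(target_hz[i], target_hz[i - 1])
        errors.append(sung_interval - target_interval)
    return errors


if __name__ == "__main__":
    # Equal tempered Bb3, D4, F4 (values as in figure 1).
    target = [233.08, 293.66, 349.23]
    # Illustrative response: every note one semitone flat ("wrong key").
    sung = [f * 2.0 ** (-1.0 / 12.0) for f in target]
    print([round(e, 1) for e in absolute_errors(sung, target)])
    print([round(e, 1) for e in relative_errors(sung, target)])
```

In this invented case every note is 100 cents flat in absolute terms, while both interval errors are zero, which is the "accurate intervals in the wrong key" situation described above.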

Figure 6: Relative pitching errors (cents) for boys and girls by year.

The results of this experiment are summarised, along with those from a similar experiment for trained adult singers, by Howard and Angus (1998) as follows.

Children ...
• pitching accuracy improves with age, girls pitching accurately earlier than boys
• twice the number of children are accurate to within a semitone in year 6 than in year 3
• the first note of each trial is the least pitch accurate
• ‘perfect cadence’ trial more accurate than the ‘imperfect cadence’ trial
• ‘dominant’ start tends to be less accurate than ‘tonic’ start
• the wider the target musical interval, the less accurately it is pitched
• all ascending intervals are pitched too narrow
• all descending intervals are pitched too large
• intervals are generally pitched more accurately when ascending

Girls ...
• years 5 and 6 are much more accurate than years 3 and 4
• are improving on the harder trials (4 & 5) having mastered the others

Boys ...
• half of each year are inaccurate
• are improving on the easier trials (1 & 2) but not the others

Adults ...
• average absolute pitching accuracy is 20 cents
• interval accuracy better than pitch just noticeable difference except for the major third
• all intervals are approximately equally pitch accurate

Larynx Closed Quotient

The analysis of larynx closed quotient (CQ) is based on the use of the electrolaryngograph (Fourcin and Abberton, 1971; Abberton et al., 1989), which enables vocal fold contact area to be monitored non-invasively. Larynx CQ is defined as the percentage of each cycle for which the vocal folds are in contact. This provides a basis for the quantification of cycle-by-cycle changes in vocal fold vibration during voiced speech sounds and sung notes. Two electrodes are placed superficially on either side of the neck of the subject at the level of the larynx and held in place with an elastic neckband. A constant amplitude high frequency voltage is applied, enabling the electrolaryngograph to monitor the electrical impedance between the electrodes as the vocal folds vibrate by detecting the current flowing between them. The electrolaryngograph output waveform (Lx) therefore represents current flow between the electrodes, and this will be greater as the vocal fold contact area increases and less as the folds part (as illustrated in figure 7). This interpretation has been confirmed by synchronous observation of the Lx waveform alongside other techniques, for example: high speed larynx photography (Fourcin, 1974; Baer et al., 1983; Gilbert et al., 1984), an adapted high voltage Xflash imaging system (Noscoe et al., 1983; Fourcin, 1987), and computer simulated Lx waveshapes based on models of vocal fold vibration during phonation (Titze, 1984; Titze et al., 1984; Childers et al., 1986). It should be noted that the electroglottograph (EGG) output waveform, which is equivalent to Lx in terms of its experimental derivation, is usually plotted as the inverse. Comprehensive reviews of electrolaryngograph (electroglottograph) operation can be found in Baken (1987) and Childers and Krishnamurthy (1985).

[Figure 7 image: example Lx waveform plotted as vocal fold contact area against time, with the maximum and minimum amplitudes, the period Tx, and the closed phase (CP) and open phase (OP) for each of four measurement methods marked.]

Key to figure 7
+δ        maximum positive peak in differential of Lx
−δ        maximum negative peak in differential of Lx
Tx        fundamental period of Lx (f0 = 1/Tx)
CP1/OP1   Method (a) of Davies et al. (1986): CP end at −δ
CP2/OP2   Method (b) of Davies et al. (1986): CP end where negative going Lx crosses 3/7 of that cycle’s peak-to-peak amplitude
CP3/OP3   Method (c) of Davies et al. (1986): CP end where negative going Lx crosses same amplitude level used for start of CP
CP4/OP4   Method of Orlikoff (1991): CP end where negative going Lx crosses 1/4 of that cycle’s peak-to-peak amplitude

Figure 7: Example Lx waveform showing the basis for the measurement of Tx and four methods for the measurement of CP and OP from the electrolaryngograph output waveform. (From Howard, 1995.)

Method (a) of the three proposed by Davies et al. (1986) was found by Howard (1995) to provide the smoothest CP variation with time. In particular, he noted that whilst all three methods gave similar patterns of CP change with time, methods (a) and (b) also give similar CP values, whereas method (c) gives CP values which are considerably lower. CQ is given by: CQ = (CP/Tx) × 100%. The use of an amplitude ratio in method (a) has the additional advantage that it can be adjusted to optimise the CQ measurement against other techniques, which is how the 3:7 ratio was originally established. CQ values gained using the 3:7 method were shown by Howard et al. (1990) to correlate well with CQ data gained by hand annotation of the glottal flow waveform output from an automatic speech pressure waveform inverse filtering technique. In practice, if the only requirement is to look for trends in CQ, then the actual ratio used is not important provided that it is kept constant.

Howard and Lindsey (1987) established their “new laryngograms” in relation to CQ measurements, and found that for four male singers, there was a tendency for CQ to be higher with increased singing training/experience. Howard et al. (1990) found a highly statistically significant increase in the measured CQ with increasing numbers of years of singing training/experience for a group of eighteen male singers. These measures were based on a two octave ascending and descending G major scale. This increase in CQ is essentially constant with f0, and it was further noted that trained and experienced adult male singers exhibit CQ values when singing that are higher than their habitual speaking CQ, whereas untrained adult males display CQ values when singing that are lower than their customary speech values. This trend for adult male singers is summarised in the upper part of figure 8.
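The CQ measure defined above, CQ = (CP/Tx) × 100%, can be illustrated for a single Lx cycle using the 3:7 amplitude-ratio criterion listed in the key to figure 7. The sketch below is not the published analysis code, and it rests on several assumptions: the cycle is supplied as a list of samples spanning exactly one period Tx, larger values mean greater vocal fold contact, the closed phase is taken to start at the steepest rise (+δ), the 3/7 level is measured upwards from the cycle minimum, and crossings are located to the nearest sample without interpolation. The function name `closed_quotient` is hypothetical.

```python
def closed_quotient(lx_cycle, ratio=3.0 / 7.0):
    """Estimate CQ (%) for one Lx cycle given as a list of samples.

    ratio sets the level, as a fraction of the cycle's peak-to-peak
    amplitude above its minimum, at which the closed phase is taken to end.
    """
    n = len(lx_cycle)
    # Sample-to-sample differences approximate the differential of Lx.
    diffs = [lx_cycle[i + 1] - lx_cycle[i] for i in range(n - 1)]
    # Closed phase start: first sample after the steepest rise (+delta).
    start = max(range(n - 1), key=diffs.__getitem__) + 1
    peak = max(range(n), key=lx_cycle.__getitem__)
    lowest, highest = min(lx_cycle), max(lx_cycle)
    threshold = lowest + ratio * (highest - lowest)
    # Closed phase end: first sample at or after the peak that has
    # fallen to the threshold level or below.
    end = next((i for i in range(peak, n) if lx_cycle[i] <= threshold), n - 1)
    closed_phase_samples = end - start
    # Tx corresponds to n samples, so CQ = (CP / Tx) * 100.
    return 100.0 * closed_phase_samples / n


if __name__ == "__main__":
    # Synthetic cycle of 100 samples: abrupt closure, then a linear fall.
    cycle = [0.0] * 10 + [1.0 - k / 50 for k in range(50)] + [0.0] * 40
    print(closed_quotient(cycle))               # 3:7 criterion
    print(closed_quotient(cycle, ratio=0.25))   # 1/4 criterion
```

Changing `ratio` shifts every CQ value in the same direction, which is consistent with the remark above that the particular ratio matters little when only trends are of interest, provided it is held constant.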

Figure 8: Idealised variation in CQ with log(f0) with singing training/experience for adults and children. Each shape summarises the overall nature of an average scatter plot of CQ against log(f0) for a singer at that position along a singing training/experience continuum. The group numbers are described in the text. (Adult male data from Howard et al., 1990; adult female data from Howard, 1995; children’s data from Barlow, 1999.)

A later study (Howard, 1995) revealed the nature of CQ variation with f0 for adult female trained and untrained singers, and this is summarised illustratively in figure 8. Overall, the CQ:log(f0) gradients for the adult females increase with singing training/experience. All the gradients tend to be positive in the lower octave and those for untrained subjects tend to be negative in the higher octave, with the turning point occurring around G#4. Overall, with training, CQ tends to be reduced for pitches below D4 and increased for pitches higher than B4, and the CQ/f0 gradient within the pitch ranges G3 to G#4 and B4 to G5 tends to correlate positively with the number of years of singing training/experience.

More recent work has extended this study to include children (Howard et al., 2001a). The trained group consists of girl and boy choristers who sing the daily offices in English cathedrals. One particular public issue in this regard has been whether or not listeners can tell the difference between boys and girls singing in cathedral choral music. Since the study concerned quantifying the voice production of choristers, a perceptual study was also undertaken concerning listener perception of the differences between these voices. Howard and Szymanski (2000) found that listeners could tell the difference with statistical significance and a mean accuracy of 60%. However, this experiment made use of a variety of musical material, and some pieces were more accurately identified than others. There was no choice in this respect, since professional recordings were employed of Wells Cathedral choir singing in the same acoustic, with the same group of counter tenors, tenors, basses and Director of Music, which differed only in who was singing the top line (boys or girls), and no choir would be able to market the same repertoire twice. A further experiment, which used experimental recordings of the choir during Evensong with girls and with boys singing the top line of the same musical repertoire (the Responses, which are constant throughout each week), produced an average identification accuracy of 53.01% and no statistically significant difference from chance (Howard et al., 2002a, 2002b). Of particular interest in terms of the voice analysis work is the fact that boys and girls can produce a choral sung sound that is indistinguishable given appropriate circumstances, and yet their underlying vocal physiologies are different.

For the voice production experiment, a two octave ascending and descending G major scale was again used for the CQ measurements. The results are summarised in the lower part of figure 8. The children were divided into groups by age as follows:
1. 11-13 years: pubescent, most either in pre-puberty or in its early stages
2. 14-15 years: post-pubescent, females in mid stages of puberty, males singing as baritone/tenor
3. 16-17 years: maturational, main stages of puberty over but bodies/voices still developing
4. 18-21 years: young adult, voices maturing and settling to adult pitch range

There were 11, 14, 10, and 10 female subjects in groups 1 to 4 respectively and the results are summarised in the plot marked girls in figure 8. The majority in group 1 exhibited a downward change in CQ with f0, whilst most of the subjects in group 2 exhibited a flat plot, with some still having a downward plot and some having an upturned ‘V’ plot. The dominant shape for group 3 was the upturned ‘V’, but 2 more experienced singers exhibited an ‘up-down-up’ plot, and one very experienced singer (the right-hand plot) exhibited a rising plot. Subjects in group 4 mainly exhibited the ‘up-down-up’ plot, except for 2 non-singers who had downward plots.

For boys the available group sizes were more modest due to subject availability (9 in group 1, 5 in group 2 and 1 in group 4). All boys in group 1 were still singing as trebles and the dominant plot was flat. There were, however, 3 exceptions: 1 ‘up-down-up’ plot and 2 rising plots for the only 2 trained choristers in that group. In group 2, where all subjects were singing an octave down, 2 boys exhibited ‘up-down-up’ plots and 3 exhibited rising plots. The 1 subject in group 4 (a 19-year-old) exhibited a rising plot.
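The plot shapes described above differ largely in the sign of the gradient of CQ against log(f0) within each octave. One way in which such gradients might be estimated is an ordinary least-squares fit, sketched below in Python. The function name, the use of a base-2 logarithm (so that the gradient is expressed in CQ percentage points per octave) and the example values are assumptions made for illustration; they are not the analysis procedure or data of the studies cited.

```python
import math


def cq_log_f0_gradient(f0_hz, cq_percent):
    """Least-squares slope of CQ (%) against log2(f0), in % per octave."""
    if len(f0_hz) != len(cq_percent) or len(f0_hz) < 2:
        raise ValueError("need two or more matching f0 and CQ values")
    x = [math.log2(f0) for f0 in f0_hz]
    mean_x = sum(x) / len(x)
    mean_y = sum(cq_percent) / len(cq_percent)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq_percent))
    if sxx == 0:
        raise ValueError("f0 values must not all be identical")
    return sxy / sxx


# Hypothetical values, invented only to exercise the function:
# approximate f0 for G3, G4 and G5 with made-up CQ readings.
lower_octave_gradient = cq_log_f0_gradient([196.0, 392.0], [30.0, 40.0])
upper_octave_gradient = cq_log_f0_gradient([392.0, 784.0], [40.0, 35.0])
```

With these invented values the lower-octave gradient is positive and the upper-octave gradient is negative, which corresponds to a plot that rises and then falls. Changing the base of the logarithm rescales the gradient but does not alter its sign.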
Figure 9: Larynx closed quotient (CQ) plotted against 3 note interval groups from the ascending two octave G major scale: 1: G3-B3; 2: C4-E4; 3: F#4-A4; 4: B4-D5. (From Howard and Welch, 2002.)

Howard et al. (1990) suggested that an increase in CQ with singing training/experience could be interpreted in terms of a more physically efficient voice usage as follows: (a) the time for which an acoustic path to the lungs via an open glottis exists is reduced, resulting in a reduction in the total acoustic energy transmitted to the essentially anechoic environment of the lungs (sub-glottal damping) where it would be lost to the listener, (b) less stored lung air is vented in each cycle due to the decreased open phase, thereby improving the efficiency of power source usage and enabling notes to be held for a longer time, and (c) the perceived voice quality is less breathy.

In an attempt to understand the nature of adolescent voice changes in a professional choral context in more detail and to provide a basis for the collection of longitudinal data, visits have been made to Wells Cathedral in the UK annually since 1998 to record individual and group voices (Welch and Howard, 2002). Wells Cathedral introduced female choristers in 1994, and they generally sing separately from the boys to provide the top line for approximately half of the sung services. In the early years, choristers for the girls’ choir tended to be recruited from the last year of the Junior School (year 6, aged 11) or the first year of the Senior School (year 7, aged 12). Female choristers normally remain in the choir until they are fourteen (the same as the boys), but this has recently been changed to allow the possibility of remaining up to age sixteen. Three girl choristers have been recorded each year as they progressed from ages 12 to 14 (years 7 to 9) inclusive, thus making suitable longitudinal data available (Howard and Welch, 2002). Two octave major scales were recorded on each occasion, and the data analysis for each participant for every year focused on CQ/f0 scattergram plots. For the purpose of interpretation, these were summarised by grouping the notes sung into threes (G3-B3, C4-E4, F#4-A4, B4-D5) for the ascending scale to produce the plots shown in figure 9. Howard and Welch listed the following individual differences based on these plots.

• Chorister One - In the first year of data collection, the CQ values fall with ascending pitch. However, there is a clear register shift evident in Year 2 and by Year 3 the effects of education are evident in generally higher CQ values, although the descending pattern is still evident.
• Chorister Two - There is very little change in the descending CQ with rising pitch pattern across the three years, although there is some evidence of greater stability in the upper octave in Year 3.
• Chorister Three - This singer has a clear register shift in Year 1, but this is less marked and there is greater evenness of vocal fold closure across pitches by Year 3.
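The three-note grouping used to summarise the scattergrams for figure 9 can be sketched as follows in Python. The group boundaries are those given in the text and the intermediate notes follow from the G major scale; the function name and the CQ values are hypothetical and were invented purely to exercise the code, so they should not be read as measured data.

```python
# Three-note groups from the ascending two octave G major scale (figure 9).
NOTE_GROUPS = {
    "G3-B3": ("G3", "A3", "B3"),
    "C4-E4": ("C4", "D4", "E4"),
    "F#4-A4": ("F#4", "G4", "A4"),
    "B4-D5": ("B4", "C5", "D5"),
}


def mean_cq_by_group(cq_by_note):
    """Average CQ (%) over each three-note group, skipping empty groups."""
    summary = {}
    for label, notes in NOTE_GROUPS.items():
        values = [cq_by_note[note] for note in notes if note in cq_by_note]
        if values:
            summary[label] = sum(values) / len(values)
    return summary


# Hypothetical CQ readings (%) for one recording session, not measured data.
hypothetical_cq = {
    "G3": 42.0, "A3": 40.0, "B3": 38.0,
    "C4": 36.0, "D4": 35.0, "E4": 34.0,
    "F#4": 33.0, "G4": 32.0, "A4": 31.0,
    "B4": 30.0, "C5": 29.0, "D5": 28.0,
}
group_means = mean_cq_by_group(hypothetical_cq)
```

With these invented values the group means fall steadily with ascending pitch, which is the kind of descending pattern described for Chorister Two.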

           Acoustic amplitude (dB)          90% CQ ranges (%)
Subject    Rec. 1   Rec. 2   Difference     Rec. 1         Rec. 2
1          26       28       2              31.48-48.81    35.27-55.68
2          19       29       10             11.63-36.70    27.29-40.93
3          18       19       1              22.82-51.71    30.37-55.94

Table 2: Acoustic amplitude ranges and their differences and 90% larynx closed quotient ranges for the three girl choristers singing the carol at two recording sessions one year apart. (From Howard and Welch, 2002.)

The choristers each recorded a carol (“This is the truth sent from above”) for which CQ data were analysed (only two recordings were available since the carol was not recorded during the first visit). Table 2 shows the 90% ranges from the CQ histogram for each carol (this is the range where the histogram is truncated such that 90% of the data remains and the lowest 10% is discarded). For all three choristers, the upper boundary for CQ is higher in the later recording session. The dynamic range of the carol recordings was also measured by plotting a histogram of the amplitude (dB) values employed (see footnote 3). Table 2 shows the acoustic amplitude ranges for the two recordings of the carol as well as their differences. In all cases the range has increased, albeit by a small amount for two of the subjects, indicating that an increased overall dynamic range is being employed, particularly by Chorister Two.
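The figures in Table 2 can be checked with a few lines of Python, and the truncation used for the 90% ranges can be sketched alongside them. The table values below are transcribed from Table 2. The `truncated_range` function is only one possible reading of the description above (sort the values, discard the lowest 10%, and report the extremes of what remains); both it and the other names used here are assumptions made for illustration rather than the procedure of Howard and Welch (2002).

```python
# Values transcribed from Table 2: (Rec. 1, Rec. 2) for each chorister.
TABLE_2 = {
    1: {"amplitude_db": (26, 28), "cq_range": ((31.48, 48.81), (35.27, 55.68))},
    2: {"amplitude_db": (19, 29), "cq_range": ((11.63, 36.70), (27.29, 40.93))},
    3: {"amplitude_db": (18, 19), "cq_range": ((22.82, 51.71), (30.37, 55.94))},
}


def amplitude_differences(table):
    """Rec. 2 minus Rec. 1 acoustic amplitude range (dB) for each subject."""
    return {s: row["amplitude_db"][1] - row["amplitude_db"][0]
            for s, row in table.items()}


def upper_cq_bound_increased(table):
    """True for each subject whose upper 90% CQ bound is higher in Rec. 2."""
    return {s: row["cq_range"][1][1] > row["cq_range"][0][1]
            for s, row in table.items()}


def truncated_range(values, discard_fraction=0.10):
    """Range left after discarding the lowest fraction of values (assumed reading)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    kept = ordered[int(len(ordered) * discard_fraction):]
    return kept[0], kept[-1]


differences = amplitude_differences(TABLE_2)
increased = upper_cq_bound_increased(TABLE_2)
```

The computed differences reproduce the Difference column of Table 2, and the upper CQ bound is higher in the second recording for all three choristers, in agreement with the text.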

Conclusions and The Future

The SINGAD system has been described and it has been shown to be effective for the assessment and development of voice pitching skills in primary school children within a classroom situation. In particular, it has demonstrated that such development can take place without the intervention of a teacher. It has also enabled the relative interval pitching accuracy to be investigated, and a number of general conclusions have been drawn.

The “new laryngograms of the singing voice” described by Howard and Lindsey (1987) were the initial attempt to make sense of CQ as measured by the electrolaryngograph for trained and untrained singers. Since then, CQ changes have been studied for adults and children and in all cases, a patterned variation of CQ with f0 has emerged. These patterned changes are also being observed in longitudinal experiments, thereby suggesting that it is reasonable to expect them to be exhibited by an individual developing singer. Some similarities as well as some differences have been noted for girls when compared to boys, and yet it has also been demonstrated that under certain circumstances for choral singing, listeners cannot tell the difference between them.

Given the ever increasing processing capabilities of desktop home and office multi-media PC machines, it is extremely likely that in the future, computer-based voice analysis systems will be increasingly commonplace. This will not be because everyone suddenly aspires to be a professional singer or actor; it is much more likely to be because systems will be produced that focus on healthy voice production skills to support those who experience vocal difficulties in everyday life, such as teachers, lecturers, politicians, media presenters, journalists, market traders, stockbrokers, tour guides, town criers, carers, health workers, and parents.
It is for these reasons that the SINGAD system is now being ported to a Windows PC platform along with CQ and spectral analyses to provide visual feedback (e.g. Thorpe et al., 1999). The microphone-based f0 analysis is carried out by means of a software implementation (Howard, 1986) of the analogue peak-picker originally developed for cochlear implantees (Howard, 1989). However, when the electrolaryngograph is used, it will be able to provide the basis for f0 as well as CQ.

Footnote 3: Note that these were not normalised for absolute sound pressure level because it is the range that is of interest rather than the specific level employed.


Longitudinal experience of real-time displays has already been successfully gained using the ALBERT system (Rossiter and Howard, 1994) with adult singers (Rossiter et al., 1996; Rossiter and Howard, 1998; Howard and Rossiter, 1992) and actors (Rossiter et al., 1995). Such parameters have been shown to provide evidence for differences in: voice style (Howard, 1992); belt as compared to operatic voice qualities (Evans and Howard, 1993; Estill, 1988); the output from professional singers (Sundberg, 1987; Lindsey and Howard, 1989; Howard, 1999); and sub-registers within the male falsetto voice (Howard et al., 2000b). The analysis of f0 and CQ as well as an oscilloscope display of the Lx waveform is considered to be very useful in voice therapy, and a hand-held unit has been trialed in a clinical setting (Garner and Howard, 1999; Batty et al., 2000). Such a device, engineered as a small, convenient and fun-to-use unit, could provide the motivation for users to take responsibility for their own healthy voice development and protection.

Acknowledgements

The author would like to record his thanks to the pupils and staff of Kingsway Junior School in York for taking part in the SINGAD experiment, as well as to the choristers, Director of Music and staff at Wells Cathedral School for taking part in recordings. In addition, he particularly acknowledges the help and professional guidance he has received from his research and professional colleagues over the years.

References

Abberton, E.R.M., Howard, D.M., and Fourcin, A.J. (1989). Laryngographic assessment of normal voice: A tutorial, Clinical Linguistics and Phonetics, 3, 281-296.
Angus, J.A.S., Howard, D.M., and Welch, G.F. (1996). Singing pitching accuracy in children aged 7 to 11, Paper presented at the 100th Convention of the Audio Engineering Society, preprint-4152, 1-12.
Baer, T., Löfqvist, A., and McGarr, N.S. (1983). Laryngeal vibrations: A comparison between high-speed filming and glottographic techniques, Journal of the Acoustical Society of America, 73, 1304-1308.
Baken, R.J. (1987). Clinical measurement of speech and voice, London: Taylor and Francis.
Baken, R.J., and Daniloff, R.G. (1991). Readings in clinical spectrography of speech, San Diego: Singular Publishing Group Inc.
Barlow, C. (1999). The Development of the voice from Child to Adult, Unpublished MSc in Music Technology Thesis, University of York, UK.
Batty, S.V., Garner, P.E., Howard, D.M., Turner, P., and White, A.D. (2000). The development of a portable real-time display of voice source characteristics, Proceedings of the 26th Euromicro Conference, Maastricht, 2, 419-422.
Borden, G.J., and Harris, K.S. (1980). Speech science primer, Baltimore: Williams and Wilkins.
Childers, D.G., Hicks, D.M., Moore, G.P., and Alsaka, Y.A. (1986). A model for vocal fold vibratory motion, contact area, and the electrolaryngogram, Journal of the Acoustical Society of America, 80, 1309-1320.
Childers, D.G., and Krishnamurthy, A.K. (1985). A critical review of electroglottography, CRC Critical Review of Biomedical Engineering, 12, 131-161.
Davies, P., Lindsey, G.A., Fuller, H., and Fourcin, A.J. (1986). Variation in glottal open and closed phase for speakers of English, Proceedings of the Institute of Acoustics, 8, 539-546.
Estill, J. (1988). Belting and classic voice quality: some physiological differences, Medical Problems of Performing Artists, Philadelphia: Hanley and Belfus, 37-43.
Evans, M., and Howard, D.M. (1993). Larynx closed quotient in female belt and opera qualities: A case study, Voice, 2, (1), 7-14.
Fourcin, A.J. (1974). Laryngographic assessment of vocal fold vibration, In: Wyke, B. (Ed.), Ventilatory and phonatory control systems, Oxford: Oxford University Press.
Fourcin, A.J. (1987). Electrolaryngographic assessment of phonatory function, Journal of Phonetics, 14, 435-442.
Fourcin, A.J., and Abberton, E.R.M. (1971). First applications of a new laryngograph, Medical and Biological Illustration, 21, 172-182.
Fry, D.B. (1979). The Physics of Speech, Cambridge: Cambridge University Press.
Garner, P.E., and Howard, D.M. (1999). Real-time display of voice source characteristics, Logopedics Phoniatrics Vocology, 24, 19-25.
Gilbert, H.R., Potter, C.R., and Hoodin, R. (1984). The laryngograph as a measure of vocal fold contact area, Journal of Speech and Hearing Research, 27, 178-182.
Howard, D.M. (1985). Larynx frequency speech processing for cochlear stimulation, PhD Thesis, University of London.
Howard, D.M. (1986). Digital peak-picking fundamental frequency estimation, In: Howard, D.M., and Fourcin, A.J. (Eds.), Speech Hearing and Language - Work in Progress, London: UCL, 2, 151-164.
Howard, D.M. (1989). Peak-picking fundamental period estimation for hearing prostheses, Journal of the Acoustical Society of America, 86, (3), 902-910.
Howard, D.M. (1992). Quantifiable aspects of different singing styles - A case study, Voice, 1, (1), 47-62.
Howard, D.M. (1995). Variation of electrolaryngographically derived closed quotient for trained and untrained adult female singers, Journal of Voice, 9, (2), 163-172.
Howard, D.M. (1998). Practical voice measurement, In: Harris, T., Harris, S., Rubin, J.S., and Howard, D.M. (Eds.), The voice clinic handbook, London: Whurr Publishing Company, 323-380.
Howard, D.M. (1999). The human singing voice, Proceedings of the Royal Institution of Great Britain, 70, 113-134.
Howard, D.M. (2000). SINGAD: A visual feedback system for children’s voice pitch development, In: White, P. (Ed.), Child voice, Stockholm: KTH Voice Research Centre, 45-62.
Howard, D.M. (2001). The real and non-real in speech measurements, Proceedings of the 2nd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, Firenze, 13-15 September, CD-ROM proceedings.
Howard, D.M., and Fourcin, A.J. (1983). Instantaneous voice period measurement for cochlear stimulation, Electronics Letters, 19, (19), 776-779.
Howard, D.M., and Lindsey, G.A. (1987). New laryngograms of the singing voice, Proceedings of the 11th International Congress of Phonetic Sciences, USSR: Tallinn, 5, 166-169.
Howard, D.M., Welch, G.F., Gibbon, R.R., and Bootle, C.M. (1987). The assessment and development of singing ability - Initial results with a new system, Proceedings of the Institute of Acoustics, 9, (3), 159-166.
Howard, D.M., and Welch, G.F. (1989). Microcomputer-based singing ability assessment and development, Applied Acoustics, 27, (2), 89-102.
Howard, D.M., Lindsey, G.A., and Allen, B. (1990). Toward the quantification of vocal efficiency, Journal of Voice, 4, (3), 205-212. [See also Errata (1991). Journal of Voice, 5, 93-95.]
Howard, D.M., and Rossiter, D. (1992). Results from a pilot longitudinal study of electrolaryngographically derived closed quotient for adult male singers in training, Proceedings of the Institute of Acoustics, 14, 529-536.
Howard, D.M., and Welch, G.F. (1993). Visual displays for the assessment of vocal pitch matching development, Applied Acoustics, 39, (3), 235-252.
Howard, D.M., Angus, J.A.S., and Welch, G.F. (1994). Singing pitching accuracy from years 3 to 6 in a primary school, Proceedings of the Institute of Acoustics, 16, (5), 223-230.
Howard, D.M., and Angus, J.A.S. (1998). A comparison between singing pitching strategies of 8 to 11 year olds and trained adult singers, Logopedics Phoniatrics Vocology, 22, (4), 169-176.
Howard, D.M., and Angus, J.A.S. (2000). Acoustics and Psychoacoustics, 2nd Ed., Oxford: Focal Press.
Howard, D.M., and Szymanski, J.E. (2000). Listener perception of girls and boys in an English Cathedral choir, Proceedings of the 6th International Conference on Music Perception and Cognition, Session S-2, 1-6.
Howard, D.M., Barlow, C., and Welch, G.F. (2001a). Vocal production and listener perception of trained girls and boys in the English cathedral choir, Bulletin of the Council for Research in Music Education, 147, 81-86.
Howard, D.M., Welch, G.F., and Penrose, T. (2001b). Case study acoustic and voice source evidence for the existence of sub-registers in the countertenor voice, Proceedings of the 3rd Asia-Pacific Symposium on Music Education Research and International Symposium on Falsetto and Gender, Murao, T., Minami, Y., and Shinzanoh, M. (Eds.), APSMER3, Aichi University of Education, 127-131.
Howard, D.M., and Welch, G.F. (2002). Female chorister voice development: A longitudinal study at Wells UK, In: A world of music education research: Proceedings of the 19th ISME Research Seminar, Gothenburg, 3-9th August, 2002, Welch, G.F., and Folkestad, G. (Eds.), 123-131.
Howard, D.M., Welch, G.F., and Szymanski, J.E. (2002a). Can listeners tell the difference between boys and girls singing the top line in cathedral music?, In: Proceedings of the 7th International Conference on Music Perception and Cognition, Stevens, C., Burnham, D., McPherson, G., Schubert, E., and Renwick, J. (Eds.), Adelaide: Causal Productions, 403-406.
Howard, D.M., Welch, G.F., and Szymanski, J.E. (2002b). Listener perception of English cathedral girl and boy choristers, Music Perception, In Press.
Kent, R.D., and Read, C. (1992). The acoustics of speech, San Diego: Singular Publishing Group Inc.
Lindsey, G.A., and Howard, D.M. (1989). Spectral features of renowned tenors in CD recordings, Proceedings of Speech Research-89, Budapest, 17-20.
Moorcroft, L. (2002). Embracing alternative methodologies: Science and imagery in the teaching and performance of singing, In: Proceedings of the 7th International Conference on Music Perception and Cognition, Stevens, C., Burnham, D., McPherson, G., Schubert, E., and Renwick, J. (Eds.), Adelaide: Causal Productions, 561-654.
Noscoe, N.J., Fourcin, A.J., Brown, M.A., and Berry, R.J. (1983). Examination of vocal fold movement by ultra-short pulse X radiography, British Journal of Radiology, 56, 641-645.
Orlikoff, R.F. (1991). Assessment of the dynamics of vocal fold contact from the electroglottogram: Data from normal male subjects, Journal of Speech and Hearing Research, 34, 1066-1072.
Potter, R.K., Kopp, G.A., and Green, H. (1947). Visible speech, New York: Van Nostrand.
Rossiter, D., and Howard, D.M. (1994). ALBERT: A system for interactive analysis and display of voice source and acoustic parameters, Proceedings of the Institute of Acoustics, 16, (5), 301-308.
Rossiter, D., Howard, D.M., and Comins, R. (1995). Objective measurement of voice source and acoustic output change with a short period of vocal tuition, Voice, 4, (1), 16-31.
Rossiter, D.P., Howard, D.M., and De Costa, M. (1996). Voice development under training with and without the influence of real-time visually presented biofeedback, Journal of the Acoustical Society of America, 99, (5), 3253-3256.
Rossiter, D.P., and Howard, D.M. (1998). Observed change in mean speaking voice fundamental frequency of two subjects undergoing voice training, Logopedics Phoniatrics Vocology, 22, (4), 187-189.
Sundberg, J. (1987). The science of the singing voice, Dekalb, Illinois: Northern Illinois University Press.
Thorpe, C.W., Callaghan, J., and van Doorn, J. (1999). Visual feedback of acoustic voice features for the teaching of singing, Australian Voice, 5, 32-39.
Titze, I.R. (1984). Parameterization of the glottal area, glottal flow, and vocal fold contact area, Journal of the Acoustical Society of America, 75, 570-580.
Titze, I.R., Baer, T., Cooper, D., and Scherer, R. (1984). Automatic extraction of glottographic waveform parameters and regression to acoustic and physiologic variables, In: Bless, D.M., and Abbs, J.H. (Eds.), Vocal fold physiology: contemporary research clinical issues, San Diego: College Hall, 146-154.
Welch, G.F. (1979). Poor pitch singing: a review of the literature, Psychology of Music, 7, (1), 50-58.
Welch, G.F. (1985). A schema theory of how children learn to sing in-tune, Psychology of Music, 13, (1), 3-18.
Welch, G.F. (1986). A developmental view of children’s singing, British Journal of Music Education, 3, 295-303.
Welch, G.F., Rush, C., and Howard, D.M. (1988). The SINGAD (SINGing Assessment and Development) system: First applications in the classroom, Proceedings of the Institute of Acoustics, 10, (2), 179-185.
Welch, G.F., Howard, D.M., and Rush, C. (1989). Real-time visual feedback in the development of vocal pitch accuracy in singing, Psychology of Music, 17, 146-157.
Welch, G.F., Rush, C., and Howard, D.M. (1991). A developmental continuum of singing ability: Evidence from a study of five-year old developing singers, Early child development and care, 69, 107-119.
Welch, G.F., and Howard, D.M. (2002). Gendered voice in the Cathedral choir, Psychology of Music, 30, (1), 102-120.
