
Report
Scripting speech software for Setswana: An automated phonetic analysis of formants in Setswana vowels using the Praat software
Winston Noël Anderson
0899-910-4
University of South Africa
Honours BSc (Computer Science)
COS498-X and COS 499-Y
February 12, 2002

Contents
1 Introduction
2 Acoustic phonetics
2.1 Computational linguistics overview
2.2 Acoustic phonetics in computational linguistics
2.2.1 Acoustic phonetics
2.2.2 Vowel charts
2.2.3 Wave properties (Amplitude; Wavelength; Frequency and pitch)
2.2.4 Lag and auto-correlation
2.2.5 Periodic signal
2.2.6 Complex waves
2.2.7 Spectrographs
2.2.8 Harmonics
2.2.9 Fundamental frequency
2.2.10 Resonance
2.2.11 Standing waves
2.2.12 Phonemes
2.2.13 Spectrograms
2.2.14 Formants
2.2.15 Vowel formants
2.2.16 Praat formant objects
3 Praat and scripting
3.1 Praat: the software
3.2 Using the scripting in the Praat software
3.2.1 Scripting languages that can be compared to Praat
3.2.2 Syntax and semantics
3.2.3 Names
3.2.4 Variables and expressions
3.2.5 Typing
3.2.6 Data types (Numerics; Strings; Complex types; Other data types)
3.2.7 Scope
3.2.8 Referencing
3.2.9 Constants (String constants; Numeric constants; Special constant)
3.2.10 Control structures: compound, selective, iterative (Selective conditional; Unconditional iteration; Counter-controlled conditional iteration; Post-test conditional iteration; Pre-test conditional iteration)
3.2.11 Sub-programmes
3.2.12 Exception handling
3.2.13 Concurrency
3.2.14 Implementation
3.3 Unusual features
3.3.1 Support for International Phonetic Alphabet characters
3.3.2 Creating graphs
3.4 The algorithms
3.4.1 Background to the algorithms (Sampling and quantization; Fast Fourier transforms; Linear predictive coding; Pre-emphasis)
3.4.2 Detail on the algorithms (Estimating linear prediction co-efficients using the Burg method; Stabilising linear prediction co-efficients)
4 The Praat script solution
4.1 The process of writing the Praat script
4.1.1 Defining the problem (The problem statement; Limitation of field of study)
4.1.2 Writing the script (The sample recordings; Praat: the script)
4.1.3 Detailed process (Sample analysis; Initial feedback; Revised full script run; Feedback; Genericise and isolate problems; Graph results in script; The script solution; Version Control)
4.2 Basic script user guide
4.2.1 Before running the script (The sound file directory; Sound file names)
4.2.2 How to run the script
4.2.3 Choosing the various run options (Sound directory name; Analysis method; Interpretation of File Names; Level of result detail; Comma-delimited results; Graphical results; Graphical representation; Plot type)
4.2.4 Completing the script execution (Halting the script deliberately during execution)
4.2.5 Examining the output
A Resulting formant charts
A.1 F1 against F2
A.1.1 Summary
A.1.2 Unraised vowels
A.1.3 Raised vowels
A.1.4 a
A.1.5 e
A.1.6 Raised e
A.1.7 i
A.1.8 o
A.1.9 Raised o
A.1.10 u
B Resulting table of values
C Complete script
D Presentation given at Human Technology Workshop
E References
E.1 The Praat algorithm
E.2 The usual frequency domain methods of spectral analysis
E.3 Scripting in Praat
E.4 Technical report writing

List of Figures
2.1 The cardinal vowel chart
2.2 Setswana vowels in terms of the cardinal vowel chart in [Jones, 1928]
2.3 Setswana vowels in terms of the cardinal vowel chart in [Malao, 1987]
2.4 Spectrogram of the vowel a in Setswana
2.5 Zoomed spectrogram of the vowel a in Setswana
3.1 Three component sine waves based on values in Table 3.1
3.2 A complex periodic wave formed from three component sine waves in Figure 3.1
3.3 A power spectrum derived by Fourier analysis of the complex wave depicted in Figure 3.2
4.1 Version control audit trail history of the script
4.2 Praat script window: open script
4.3 Windows open dialogue box
4.4 Script options form
4.5 Work in progress dialogue box
4.6 Praat info dialogue box
4.7 Praat picture window
4.8 Comma-delimited output file
A.1 Bark scale for vowels in Setswana (phonetic representation)
A.2 Bark scale for vowels in Setswana
A.3 Bark scale for unraised vowels in Setswana (phonetic representation)
A.4 Bark scale for unraised vowels in Setswana
A.5 Bark scale for raised vowels in Setswana (phonetic representation)
A.6–A.7 Bark scale for a in Setswana
A.8–A.15 Bark scale for e in Setswana
A.16–A.17 Bark scale for i in Setswana
A.18–A.25 Bark scale for o in Setswana
A.26 Bark scale for u in Setswana
A.27 Bark scale for u in Setswana by speaker

List of Tables
3.1 Example of sine wave components of an [a] vowel
B.1 Formant values for Setswana vowels
B.2 Formant values differences for Setswana vowels
B.3 Formant values for Setswana vowels by speaker
B.4 Formant values differences for Setswana vowels by speaker
B.5 Formant values for Setswana vowels by consonant environment
B.6 Formant values differences for Setswana vowels by consonant environment

Chapter 1

Introduction
This report is about scripting software for phonetic analysis. To introduce the software, the scripts and the phonetic analysis, Chapter 2 begins with definitions that are needed to understand the phonetic concepts. Speech and language processing, which is concerned with computational techniques that process spoken and written language, is examined. Acoustic phonetics within computational linguistics is then developed in more detail, and important concepts such as vowel charts, complex waves, fundamental frequencies and vowel formants are defined.
Chapter 3 looks at the computer programme Praat. Praat is a research, publication and productivity tool for phoneticians. With it, one can analyse, synthesise and manipulate speech, and create high-quality pictures for articles and theses. The Praat scripting language is examined and compared to other common imperative and scripting languages.
Chapter 4 describes the solution provided to the Department of African Languages and examines the specific script in more detail.
The appendices document the results of the Praat script solution, and also contain a presentation of the results given at a Human Language Technologies workshop at the 2001 CHI-SA conference, held in co-operation with the African Languages Association of Southern Africa.¹

Audience
This report is written primarily for a computer science audience. Where the subject matter relates to phonetics or signal processing, it is described in detail, on the assumption that the reader may need background in these concepts to understand the problem and the solution. The reader is assumed to understand tertiary-level computer science and mathematical principles, which are needed, for example, to follow how the scripting language in Praat relates to other scripting and programming languages.
The only exception is Section 4.2, which is written for a linguist or phonetician who needs to understand how to use the script written for this project.

¹ Presented on Thursday 13 September 2001 at the workshop "Human Language Technologies in South Africa - Research, development and training" at [CHI-SA 2001, 2001], the 2nd South African Conference on Human-Computer Interaction (Special Track on Human Language Technologies in Africa), held in co-operation with ALASA-SIG (African Languages Association of Southern Africa Special Interest Group for Language and Speech Technology Development) and the African Speech Technology Project.

Trademarks
This document was formatted and typeset in LaTeX. Praat is licensed to UNISA for academic purposes only. RedHat is a registered trademark of RedHat Corporation. Microsoft, Visual Basic, Windows, Windows 2000, Excel and/or other Microsoft products referenced herein are either registered trademarks or trademarks of Microsoft Corporation. Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. TclPro is a registered trademark of Scriptics Corporation. Johannesburg International Airport is an airport of the Airports Company of South Africa. All other products or company names are used for identification purposes only and may be trademarks of their respective owners.

Notation
Throughout this report, software products are identified by being set in Small Caps type. Vowels are represented in one of three formats:
• cardinal vowels are enclosed in square brackets, e.g. [a]
• orthographic vowels are written in italics, as spelled
• phonetic vowels are written as their International Phonetic Alphabet symbols.

Acknowledgements
I wish to express my sincere thanks and appreciation to Ken Halland, whose encouragement and thinking shaped my interests and this practical project. I wish to thank Mia Le Roux for involving me in her research, and both Mia Le Roux and Albert Kotzé, who gave up valuable time to help correct problems in my understanding and who provided valuable information for the content of this report. Thanks to Paul Boersma, who patiently replied to various emails and helped in resolving practical problems. Thanks to Arina Britz for her availability and ideas. Thank you to my wife, Hyreath Anderson, for her professional insight, advice and all her personal support during the course of this project.

Chapter 2

Acoustic phonetics

2.1 Computational linguistics overview

Johannesburg International Airport: Do you know your flight number?
Caller: No.
Johannesburg International Airport: Please state the airport of your choice.
Caller: Johannesburg.
Johannesburg International Airport: Please choose the type of flight: international or domestic.
Caller: International.
Johannesburg International Airport: Would you like information on arrivals or departures?
Caller: Arrivals.
Johannesburg International Airport: When is your expected flight time?
Caller: Eight, oh five.

This report is about scripting software for phonetic analysis. To begin describing the software, scripts and the phonetic analysis, we need to start with definitions. In the example above, "Johannesburg International Airport" is an English voice synthesised by a computer system called Automated Flight Enquiries (Afe), available on +27 82 245 9191. The caller is a South African telephoning to find out information about a flight. Although this technology is still rudimentary, note that Afe speaks to the caller and seems to understand the caller's human voice.
What is required for this form of communication? Solving this problem is the concern of Speech and Language Processing, which is made up of Natural Language Processing, Computational Linguistics, Speech Recognition and Speech Synthesis. Speech and language processing is concerned with computational techniques that process spoken and written language as language. To determine what the caller is saying, Afe must be capable of analysing an incoming audio signal and recovering the exact sequence of words the caller used to produce that signal. Similarly, to respond, Afe must be able to take a sequence of words and generate an audio signal that the caller can recognise. These tasks require knowledge of phonetics and phonology, the model of how words are pronounced in language and specifically in colloquial speech. The tasks also require knowledge of signal processing: how a computer receives, analyses, stores and produces sound. This is a subset of the broader field of acoustics, the study of the nature of sound and of how computers, animals and humans process it. The area of acoustic phonetics within Speech and Language Processing is one of the conceptual foci of this report. Another focus is the nature of scripting languages (refer to Chapter 3 for further detail). Other areas of speech and language processing include:
• Morphology – producing and recognising variations of individual words in context.
• Syntax – the knowledge needed to order and group words.
• Lexical semantics – the knowledge of the meaning of component words.
• Compositional semantics – knowledge of how lexical semantic components are combined to form larger meanings.
• Pragmatics – the appropriate use of language.
• Discourse conventions – correctly structuring conversations.
• Ambiguity and disambiguation – handling multiple alternative linguistic structures that have different meanings.

2.2 Acoustic phonetics in computational linguistics

Many of the definitions and explanations in this section are taken from [Language Processing and Language Technology, 2000]. Where explanations are complemented from other sources, those sources are cited explicitly.

2.2.1 Acoustic phonetics

Speech sounds are produced by the organs of speech, all of which have a primary biological function and are used for articulation only in a secondary role. When articulation occurs, the air particles in the vicinity of the articulation are disturbed, and these encoded disturbances are transmitted to the area around the speaker in the form of sound waves. Through learning, humans become adept at interpreting the coded disturbances: the ears receive them and transform the code into a tiny electrical current, which the auditory nerves carry to the brain. Each sound must have its own unique wave pattern, or else we would not be able to distinguish one sound from the next. Acoustic phonetics is the study of the sound waves by means of which speech sounds are transmitted.



Figure 2.1: The cardinal vowel chart

2.2.2 Vowel charts

Articulation of vowels is described in terms of tongue and lip positions. Each language has its own specific vowels, and this project focuses on Setswana vowels only. Vowels are characterised by their position of articulation. When the tongue is near the roof of the mouth, with a narrow jaw opening, the vowels are said to be close or high vowels; [i] and [u] are such vowels. A vowel articulated with the top of the tongue relatively far from the roof of the mouth and with a wide jaw opening, such as [a], is termed an open or low vowel. The vowels [e] and [o] are articulated with less tongue height than [i] and [u], but with more tongue height and less jaw opening than [a]; they are termed mid vowels. Vowels articulated with the tongue retracted are termed back vowels, while those produced with the tongue extended toward the front are termed front vowels. The vowels [o] and [u] are back vowels, while [i] and [e] are front vowels. Front vowels are made with spread lips, while back vowels are made with rounded lips. Each of the vowels can be long or short.
The graphical representation of vowel positions is termed a vowel chart. In the past, vowel charts were based on the subjective hearing of the listener. These articulatory representations had implied axes: the horizontal axis represented frontness or backness, i.e. the distance from the front incisors, and the vertical axis represented tongue height, i.e. how close the tongue is to the roof of the mouth. Examples of such vowel charts are the cardinal vowel chart in Figure 2.1, an older representation of Setswana vowels (from [Jones, 1928]) in Figure 2.2, and a modern representation of Setswana vowels (from [Malao, 1987]) in Figure 2.3. Since these qualities affect the formant values, an acoustic representation based on measured formant values could represent them more accurately.
The Department of African Languages intended to make this representation more formal and objective for Setswana by basing it on the ratios of formant frequencies in acoustic phonetics. See Section 2.2.14 below for a definition of formants.



Figure 2.2: Setswana vowels in terms of the cardinal vowel chart in [Jones, 1928]

Figure 2.3: Setswana vowels in terms of the cardinal vowel chart in [Malao, 1987]


2.2.3 Wave properties

The sound waves produced by the speech organs are complex: each can be broken down into simpler component waves which together give rise to one complex waveform. A number of important properties can be measured for a sound wave:
• amplitude (the size of the pressure variations, corresponding with loudness)
• wavelength (the length of one cycle of the wave, measured in metres)
• frequency (corresponding with pitch and usually measured in hertz)

Amplitude
The magnitude of an acoustic event corresponds to the size of the air pressure variations, and hence to its effect on the ear or on the mechanism (e.g. a microphone) by which it is received. As a result there is a clear link between the amplitude of a wave and loudness. When represented in a graph, a wave with a small amplitude does not deviate far from the horizontal axis, which indicates normal air pressure. Conversely, a wave with a large amplitude deviates far from the horizontal axis.

Wavelength
Because sound waves travel at a fixed speed through a given medium, wavelength and frequency are inversely related: the shorter the wavelength, the higher the frequency, and the longer the wavelength, the lower the frequency. The shorter the wavelength, the more cycles are completed in a given period, and vice versa.

Frequency and pitch
Waves are propagated in cycles, i.e. adjacent areas of high and low air pressure. One cycle is composed of a high pressure area and its adjacent low pressure area. Such cycles move through the air at a speed of about 330 m/s. Depending on the length of a particular wave, more or fewer cycles are completed within a given period of time. Frequency is the number of cycles of a wave completed per second, measured in hertz (Hz). Whereas frequency is a physical measurement, humans perceive it as a quality of the sound.
This quality of a sound, governed by the rate of the vibrations producing it and described as its degree of highness or lowness, is called pitch [Pearsall, 1999].
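The inverse relation between wavelength and frequency can be written as λ = c / f, where c is the speed of sound. The following minimal Python sketch assumes the speed of about 330 m/s quoted above; the function names are illustrative and are not part of Praat or the project script.

```python
# Approximate speed of sound in air, as quoted in the text (assumption: 330 m/s).
SPEED_OF_SOUND = 330.0  # metres per second


def wavelength(frequency_hz, speed=SPEED_OF_SOUND):
    """Wavelength in metres of a wave with the given frequency (lambda = c / f)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed / frequency_hz


def frequency(wavelength_m, speed=SPEED_OF_SOUND):
    """Frequency in hertz of a wave with the given wavelength (f = c / lambda)."""
    if wavelength_m <= 0:
        raise ValueError("wavelength must be positive")
    return speed / wavelength_m


# Doubling the frequency halves the wavelength.
for f in (110.0, 220.0, 440.0):
    print(f"{f:.0f} Hz -> {wavelength(f):.2f} m")
```

With this assumed speed, a 110 Hz wave is 3 m long and a 440 Hz wave is 0.75 m long.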

2.2.4 Lag and auto-correlation

Since frequency is, by definition, a property of periodic waves, practical tools are needed to analyse these periodic functions.



Fast Fourier Transforms (FFT) and Linear Predictive Coding (LPC) are used in analysing periodic functions or waveforms, such as speech signals, and are an important basis of signal processing. A correlation function lies in the time domain, i.e. it is a function of time. Fourier transforms are used to transform functions from the time domain to the frequency domain and vice versa. If the transformed values are restricted to real (R) numbers they express amplitude only; complex (C) values express both amplitude and phase. The correlation of two functions is closely related to their Fourier transforms: by the correlation theorem, the Fourier transform of the correlation equals the transform of one function multiplied by the complex conjugate of the transform of the other. The correlation is a function of a shift in time, called the lag. The auto-correlation is the correlation of a sampled function with itself [Press et al., 1992].
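To make the auto-correlation concrete, the following Python sketch computes it directly from its time-domain definition, r[k] = Σ x[n]·x[n + k], for a sampled sine wave with a period of 8 samples. It is a hypothetical illustration rather than Praat's implementation; Press et al. describe computing the same quantity more efficiently with FFTs.

```python
import math


def autocorrelation(samples, max_lag):
    """Return r[k] = sum_n x[n] * x[n + k] for lags k = 0 .. max_lag.

    This is the direct, unnormalised time-domain definition; an FFT-based
    version gives the same values faster for long signals.
    """
    n = len(samples)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must lie between 0 and len(samples) - 1")
    return [
        sum(samples[i] * samples[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


# A sampled sine wave with a period of 8 samples.
signal = [math.sin(2 * math.pi * i / 8) for i in range(64)]
r = autocorrelation(signal, 16)
print([round(v, 2) for v in r])
```

The values are largest at lag 0, negative at half a period (lag 4), where the wave is out of phase with itself, and peak again at one full period (lag 8).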

2.2.5 Periodic signal

For a time signal, locating the maxima of the waveform makes it possible to determine a particular lag that recurs. This recurring lag is termed the period, and a signal that repeats itself after every period is called a periodic signal [Boersma, 1993].
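Boersma's approach locates the period at a maximum of the auto-correlation. The Python sketch below shows the idea in its simplest form under stated assumptions: the function name is hypothetical, the search range of 20 to 200 samples (40 to 400 Hz at an 8000 Hz sampling rate) is an illustrative choice, and Praat's own pitch analysis is considerably more refined (for example, it corrects for the analysis window and interpolates between samples).

```python
import math


def estimate_period(samples, sample_rate, min_lag, max_lag):
    """Return (period_seconds, lag_samples) for the lag in [min_lag, max_lag]
    at which the unnormalised auto-correlation is largest."""
    n = len(samples)
    if not 1 <= min_lag <= max_lag < n:
        raise ValueError("require 1 <= min_lag <= max_lag < len(samples)")

    def r(k):
        # Direct auto-correlation at lag k. Without normalisation, shorter
        # lags sum more products, which favours the first period over its
        # multiples.
        return sum(samples[i] * samples[i + k] for i in range(n - k))

    best_lag = max(range(min_lag, max_lag + 1), key=r)
    return best_lag / sample_rate, best_lag


# Hypothetical test signal: a 100 Hz sine wave sampled at 8000 Hz for 0.1 s.
rate = 8000
signal = [math.sin(2 * math.pi * 100 * i / rate) for i in range(800)]
period, lag = estimate_period(signal, rate, min_lag=20, max_lag=200)
print(f"period = {lag} samples = {period * 1000:.1f} ms; F0 = {1 / period:.1f} Hz")
```

The fundamental frequency follows as the reciprocal of the period, F0 = 1 / T.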

2.2.6 Complex waves

The waves of speech sounds are complex. Complex waves are formed when more than one wave travels through the same medium at the same time. Because sound waves are variations in air pressure, co-existing waves with different frequencies and amplitudes affect one another. Their mutual influence takes the following form:
• a high pressure from one wave cancels out (fully or partly) a low pressure from another wave
• two high pressures reinforce each other
• two low pressures reinforce each other.
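The reinforcement and cancellation described above follow from simply adding the pressure variations of the component waves at each instant. The Python sketch below sums sine components given as (frequency, amplitude, phase) triples; the function name and all component values are hypothetical and are not those of Table 3.1.

```python
import math


def complex_wave(components, sample_rate, n_samples):
    """Sum sine components given as (frequency_hz, amplitude, phase_rad).

    At each sample the pressures simply add: like pressures reinforce,
    and opposite pressures cancel fully or partly.
    """
    return [
        sum(a * math.sin(2 * math.pi * f * i / sample_rate + phase)
            for f, a, phase in components)
        for i in range(n_samples)
    ]


rate = 8000
# Two identical 100 Hz waves in phase: high meets high, low meets low.
reinforced = complex_wave([(100, 1.0, 0.0), (100, 1.0, 0.0)], rate, 80)
# The same waves half a cycle apart: high meets low everywhere.
cancelled = complex_wave([(100, 1.0, 0.0), (100, 1.0, math.pi)], rate, 80)
# Three components of different frequency and amplitude form a complex wave.
mixed = complex_wave([(100, 1.0, 0.0), (200, 0.5, 0.0), (300, 0.25, 0.0)], rate, 80)

print(max(reinforced), max(abs(v) for v in cancelled))
```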

2.2.7 Spectrographs

In acoustic analysis, the focus is not on the complex wave itself, but on the frequencies and amplitudes of the simple waves of which it is composed. Using special analysis tools based on Fast Fourier Transforms, a complex wave can be decomposed to reveal its component frequencies and amplitudes. When this is done, the acoustic "code" of the sound in question is revealed and differences between sound types become obvious. The spectrograph can be compared to a prism: it breaks a speech sound into its component frequencies in the same way that a prism breaks light into its component frequencies.
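This kind of decomposition can be sketched with a direct discrete Fourier transform. The Python example below is illustrative only: it uses the O(N²) definition rather than an FFT, its function name is hypothetical, and its test signal (components at 100, 200 and 300 Hz) is made up. Because the 80-sample frame spans exactly one 100 Hz period at 8000 Hz, each component falls on its own frequency bin and its amplitude is recovered.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, amplitude) pairs for DFT bins 0 .. N/2.

    Amplitudes are scaled so that a sine component of amplitude A lying
    exactly on a bin is reported as A.
    """
    n = len(samples)
    result = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        # The DC bin (and the Nyquist bin when n is even) has no mirrored
        # negative-frequency partner, so its magnitude is not doubled.
        unmirrored = k == 0 or (n % 2 == 0 and k == n // 2)
        scale = (1 if unmirrored else 2) / n
        result.append((k * sample_rate / n, abs(coeff) * scale))
    return result


rate = 8000
components = [(100, 1.0), (200, 0.5), (300, 0.25)]  # hypothetical values
wave = [sum(a * math.sin(2 * math.pi * f * i / rate) for f, a in components)
        for i in range(80)]
for freq, amp in magnitude_spectrum(wave, rate)[:5]:
    print(f"{freq:5.0f} Hz  amplitude {amp:.2f}")
```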

2.2.8 Harmonics

Naturally occurring waves involve various kinds of vibration simultaneously. The following modes of vibration are produced by a vibrating object:


• The frequency of the simple wave produced by the simplest back-and-forth motion is called the fundamental frequency.
• The frequency produced by the mode of vibration in which the object vibrates in halves is twice the fundamental frequency (an octave higher).
• The frequency produced by the third mode, in which the object vibrates in thirds, is three times the fundamental frequency.
• Further modes of vibration are excited over a range of frequencies upward from the fundamental, which is the lowest.
Each of the higher-frequency simple waves is called a harmonic. In naturally occurring vibrations there is a harmonic at each whole-number multiple of the fundamental frequency, in principle without upper limit, although the amplitude of the harmonics decreases as the frequency rises.
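The harmonic series can be listed directly from the fundamental frequency, counting the fundamental as the first harmonic. In the Python sketch below, the function name, the fundamental of 120 Hz and the 1/n² amplitude roll-off (about 12 dB per octave, a common textbook approximation for the glottal source) are assumptions chosen for illustration; the text above states only that amplitude decreases as frequency rises.

```python
def harmonic_series(fundamental_hz, count):
    """Return (frequency_hz, relative_amplitude) for harmonics 1 .. count.

    Harmonic n lies at n * F0. The 1 / n**2 amplitude (about -12 dB per
    octave) is an illustrative assumption, not a measured value.
    """
    if fundamental_hz <= 0 or count < 1:
        raise ValueError("need a positive fundamental and count >= 1")
    return [(n * fundamental_hz, 1 / n**2) for n in range(1, count + 1)]


for freq, amp in harmonic_series(120, 5):
    print(f"{freq:4d} Hz  relative amplitude {amp:.3f}")
```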

2.2.9 Fundamental frequency

A wave is produced by the vocal cords. Before it is modified by the vocal tract, it is known as a glottal wave. Its frequency, produced by the simplest back-and-forth motion of the vocal cords, is known as the fundamental frequency and is perceived as the pitch of a voiced sound. It is the lowest frequency produced by the oscillation of the whole object, as distinct from any harmonics (the higher frequencies that are also part of the complex wave). A periodic signal has a fundamental frequency determined by its period, i.e. its recurring lag [Boersma, 1993].

2.2.10 Resonance

Resonance is the tendency of an object to vibrate at a particular frequency, its natural frequency. If an attempt is made to vibrate an object at a frequency other than its natural frequency, the vibrations are damped and quickly die out. If, on the other hand, the object is made to vibrate at its natural frequency, the vibrations are reinforced and the object resonates. Resonance can be regarded as natural amplification, i.e. enhancement without the use of dedicated equipment: the amplitude of a wave with a certain frequency is increased when the wave exists in a particular environment, while waves at other frequencies are damped.
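Resonance can be illustrated with the steady-state response of a driven, damped oscillator, a standard physics model assumed here rather than taken from the sources above. In the Python sketch, the function name, the natural frequency of 500 Hz and the quality factor Q = 10 are hypothetical; the response is normalised so that a constant (0 Hz) force gives 1 and driving exactly at the natural frequency gives Q.

```python
import math


def relative_response(drive_hz, natural_hz, q):
    """Steady-state amplitude of a driven, damped oscillator, relative to
    its displacement under a constant (0 Hz) force."""
    f, f0 = drive_hz, natural_hz
    return f0**2 / math.sqrt((f0**2 - f**2) ** 2 + (f * f0 / q) ** 2)


# Driving near the natural frequency is reinforced; far from it, damped.
for drive in (100, 250, 450, 500, 550, 1000, 2000):
    print(f"{drive:5d} Hz -> {relative_response(drive, 500, 10):6.2f}")
```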

2.2.11 Standing waves

Wavelength is crucial in establishing resonance. Suppose an object vibrates in the middle of a room, with enough space for complete wavelengths of the sound between the object and the walls on either side. The wave reflected from each wall has an amplitude very similar to that of the original wave, and the same wavelength. The original and reflected waves interact: where they meet over the length of the room, the areas of simultaneous positive air pressure add together, as do the areas of simultaneous negative air pressure. The result is a wave with the same wavelength as the original wave but with an increased amplitude, known as a standing wave. Standing waves are instances of resonance, and hence they are critical in the articulation process. The sounds we hear most often in everyday life, e.g. music and speech, are usually complex tones, so there are usually many frequencies whose wavelengths might create standing wave patterns.
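A common textbook approximation links standing waves to articulation: the vocal tract is treated as a uniform tube closed at the glottis and open at the lips, so standing waves fit when the tube length is an odd number of quarter wavelengths, giving resonances at odd multiples of c / (4L). The Python sketch below assumes a tube length L of 0.175 m and a speed of sound of about 350 m/s, typical of the warm, moist air in the vocal tract and somewhat higher than the 330 m/s quoted earlier; the function name and both values are assumptions, and real vowels deviate from this neutral-tube model.

```python
def tube_resonances(length_m=0.175, speed=350.0, count=3):
    """Resonant frequencies (Hz) of a uniform tube closed at one end.

    Standing waves fit when the tube length is an odd number of quarter
    wavelengths, so F_n = (2n - 1) * c / (4 * L).
    """
    if length_m <= 0 or speed <= 0:
        raise ValueError("length and speed must be positive")
    return [(2 * n - 1) * speed / (4 * length_m) for n in range(1, count + 1)]


print([round(f) for f in tube_resonances()])
```

Under these assumptions the first three resonances fall at about 500, 1500 and 2500 Hz.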

2.2.12 Phonemes

Each distinct sound, called a phoneme, has a unique vocal tract configuration and therefore also a unique acoustic code. This code can be revealed with dedicated analysis tools such as the Praat computer programme.

2.2.13 Spectrograms

Acoustic code can be made visible in the form of a spectrogram, a graphical representation of the frequencies of a sound, phrase or sentence. An example of a spectrogram for the vowel a in Setswana is given in Figure 2.4. A spectrogram shows frequency on the vertical axis and time on the horizontal axis; the darkness at each point indicates the amplitude at that frequency and time. The dark areas (areas of high amplitude, i.e. frequencies of resonance) differ from sound to sound, because each sound requires its own vocal tract configuration.
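A spectrogram of this kind can be approximated by repeating a Fourier analysis over short, overlapping stretches of the signal. The Python sketch below is a simplified illustration, not Praat's implementation: the function name, the frame length of 64 samples, the hop of 32 samples, the Hann window and the test signal (500 Hz followed by 2000 Hz) are all assumptions. Each row corresponds to one moment on the horizontal time axis, and each value in a row to the darkness at one frequency.

```python
import cmath
import math


def spectrogram(samples, sample_rate, frame_length=64, hop=32):
    """Return (times_s, freqs_hz, rows) where rows[t][k] is the power in dB
    of frequency bin k in the frame centred at times_s[t]."""
    if len(samples) < frame_length:
        raise ValueError("signal is shorter than one frame")
    # A Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_length - 1))
              for i in range(frame_length)]
    freqs = [k * sample_rate / frame_length for k in range(frame_length // 2 + 1)]
    times, rows = [], []
    for start in range(0, len(samples) - frame_length + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_length)]
        row = []
        for k in range(len(freqs)):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / frame_length)
                        for i, x in enumerate(frame))
            # A small floor avoids log10(0) for silent bins.
            row.append(10 * math.log10(abs(coeff) ** 2 + 1e-12))
        times.append((start + frame_length / 2) / sample_rate)
        rows.append(row)
    return times, freqs, rows


rate = 8000
signal = ([math.sin(2 * math.pi * 500 * i / rate) for i in range(320)]
          + [math.sin(2 * math.pi * 2000 * i / rate) for i in range(320)])
times, freqs, rows = spectrogram(signal, rate)
for t, row in zip(times, rows):
    print(f"{t * 1000:5.1f} ms: strongest frequency {freqs[row.index(max(row))]:.0f} Hz")
```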

2.2.14

Formants

Frequency response curves are of great assistance in distinguishing vowels from one another, as they indicate the preferred resonating frequencies for each specific configuration of the vocal tract. Each of these preferred resonating frequencies is easily identified as a peak in the curve and is known as a formant. Formants are usually numbered and referred to as F1, F2, F3, F4, etc., and the first three formants are normally the most important in characterising a vowel. When the shape of the vocal tract is changed, especially by movements of the tongue body and lips, the positions of the formants also change. The first two formants depend mainly on the position of the tongue body:
• F1 is influenced by tongue body height.
• F2 is influenced by tongue body frontness/backness.
As a consequence, high vowels have a relatively low F1 and the low vowel [a] has the highest F1. F2 is highest in front vowels and lowest in back vowels.
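A common textbook approximation treats the vocal tract in a neutral position as a uniform tube, closed at the glottis and open at the lips. Such a tube resonates where its length holds an odd number of quarter-wavelengths, so F_n = (2n − 1)c/(4L). The sketch below assumes a tract length of 17.5 cm and a speed of sound of 350 m/s (35 000 cm/s), typical textbook values for an adult male. These are assumptions, not measurements from this study. Changing the shape of the tube, as the tongue body and lips do, moves the formants away from these values.

```python
def uniform_tube_formants(tract_length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end (glottis) and open
    at the other (lips): F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_of_sound_cm_s / (4 * tract_length_cm)
            for n in range(1, n_formants + 1)]

neutral = uniform_tube_formants(17.5)   # [500.0, 1500.0, 2500.0]
longer = uniform_tube_formants(20.0)    # a longer tract gives lower formants
```
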


Figure 2.4: Spectrogram of the vowel a in Setswana



2.2.15


Vowel formants

Speech sounds can be classified into two main classes, viz. vowels and consonants. During the production of consonants the air-stream is obstructed as it travels through the oral tract. Vowels are produced without any obstruction of the air-stream. Vowels are usually voiced and always act as the core of a syllable. In tone languages like Setswana, vowels are tone bearing as well. Because spectrograms provide a visible image of the frequency composition of a sound, as well as an indication of the amplitude at the frequencies displayed, a great deal of detail about the articulation process can also be obtained from them. A trained spectrogram reader can identify the sounds on a spectrogram without having been present when they were recorded. Each vowel has its own characteristic relationships between its formants, and these relationships are specific to the language.
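The F1 and F2 rules given for formants can be applied to a set of vowel measurements to order the vowels by tongue height and by frontness. In the minimal sketch below, the function name and the formant values are hypothetical placeholders for illustration only. They are not measurements of Setswana vowels.

```python
def order_vowels(formants):
    """formants maps a vowel label to a measured (F1, F2) pair in Hz.
    Returns (high_to_low, front_to_back): a lower F1 indicates a higher
    tongue body, and a higher F2 indicates a more front tongue body."""
    high_to_low = sorted(formants, key=lambda v: formants[v][0])
    front_to_back = sorted(formants, key=lambda v: formants[v][1], reverse=True)
    return high_to_low, front_to_back

# Placeholder values for illustration only, NOT measurements of Setswana vowels
example = {"i": (300.0, 2300.0), "a": (750.0, 1300.0), "u": (320.0, 800.0)}
heights, backness = order_vowels(example)
```

Only the relative order is used, so the sketch makes no assumption about language-specific formant targets.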

2.2.16

Praat formant objects

For spectrographic analysis, the whole sound input is first recorded and stored as a file on computer storage. Praat is a programme designed to detect how much sound intensity there is at particular frequencies at any time during the sound. In spectrograms the horizontal dimension represents time, the vertical dimension frequency, and the darkness amplitude. Darker areas on a spectrogram show those frequencies where the simple component waves have a high amplitude. A formant object in Praat represents the spectral structure of the sound as a function of time: a formant contour. Figure 2.5 illustrates this. It shows the same sound as Figure 2.4, zoomed in to a shorter time period around the centre of the recording. The dark bands represent the formant contours. Praat samples a formant contour into a number of frames centred around equally spaced times, and each frame contains frequency and bandwidth information about several formants.
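The frame structure described above can be sketched as a small data type. The class name FormantContour, its fields and the linear interpolation between frames are assumptions made for this illustration. They are not Praat's internal representation or its query algorithm, and the frame values are made up.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FormantContour:
    """Hypothetical model of a sampled formant contour (not Praat's own code).
    Frame i is centred at t1 + i * dt and holds one (frequency, bandwidth)
    pair in Hz per formant, F1 first."""
    t1: float                                # centre of the first frame (s)
    dt: float                                # spacing of frame centres (s)
    frames: List[List[Tuple[float, float]]]  # needs at least two frames

    def frame_time(self, i: int) -> float:
        return self.t1 + i * self.dt

    def frequency_at(self, formant: int, t: float) -> float:
        """Frequency of formant number `formant` (1 = F1) at time t, linearly
        interpolated between neighbouring frame centres (clamped at the ends)."""
        pos = (t - self.t1) / self.dt
        i = min(max(math.floor(pos), 0), len(self.frames) - 2)
        frac = min(max(pos - i, 0.0), 1.0)
        f_a = self.frames[i][formant - 1][0]
        f_b = self.frames[i + 1][formant - 1][0]
        return f_a + frac * (f_b - f_a)

# Three made-up frames, 10 ms apart, each with F1 and F2
contour = FormantContour(0.025, 0.01, [
    [(700.0, 80.0), (1200.0, 100.0)],
    [(720.0, 85.0), (1220.0, 110.0)],
    [(740.0, 90.0), (1240.0, 120.0)],
])
```
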


Figure 2.5: Zoomed spectrogram of the vowel a in Setswana


Chapter 3

Praat and scripting

3.1

Praat: the software

The computer programme Praat is a research, publication and productivity tool for phoneticians. With it, one can analyse, synthesize and manipulate speech, and create high-quality pictures for articles and theses. [Ousterhout, 1998] suggests the following questions for deciding whether to use a scripting language or a system programming language for a particular task:
1. Is the application's main task to connect pre-existing components?
2. Will the application manipulate a variety of different kinds of things?
3. Does the application include a graphical user interface?
4. Does the application do a lot of string manipulation?
5. Will the application's functions evolve rapidly over time?
6. Does the application need to be extensible?
If the majority of the answers are yes, scripting is more appropriate than a system programming language. Praat offers both options: scripting in its own script language, or programming in C. The decision of the Department of African Languages to take the scripting route was therefore the correct one.

3.2

Using the scripting in the Praat software

Praat has automation and customisation features available through scripting or programming. Since Praat is written in C, programming can be done by adding C modules that call Praat classes and modules. This is the high-end option. An option better suited to linguists, however, is scripting, which ranges from basic macro recording and playback routines to far more elaborate scripts. Scripting languages have existed since the early days of computer programming. Generally, they have taken a back seat to programming languages, their main purpose being to sequence programmes or to glue together programmes written in different languages. Earlier scripting languages were little more than quick ways of automating programmes written in a related system language (e.g. awk and C), whereas modern scripting languages have become syntactically and semantically independent. There has been a demand for languages that are easy to learn and use, not only by 'computer professionals' but also by 'casual general users'. All of the major scripting languages include ways to connect to databases, file systems, libraries, network protocols and other common technologies. System programming languages were designed for building data structures and algorithms from scratch, starting from the most primitive computer elements such as memory words. In contrast, scripting languages are designed for melding: they assume the existence of a set of powerful components and are intended primarily to connect these components together. This is in fact the exact design principle behind the Praat scripting language. System programming languages are strongly typed to help manage complexity, while scripting languages are generally typeless to simplify connections between components and allow rapid application development. All scripting languages must have certain features. First and foremost, scripts must be accessible: it must be easy to create, modify and execute a script. Scripts must also have the control structures of system programming languages that enable one to take actions depending on certain conditions. A script in Praat is text that consists of menu commands and action commands. When the script is run, the commands are executed as if one had selected them from the Praat GUI. In addition, there are standard language constructs such as conditionals, loops, file handling statements and object manipulation. A brief description of the scripting language features of the Praat software follows.
For more detailed explanations of this, and of the functionality of Praat itself, refer to [Boersma and Weenink, 2001].

3.2.1

Scripting languages that can be compared to Praat

Praat scripting contains some syntactic and semantic features of other scripting languages, as well as some features similar to those of imperative programming languages. It shares with Basic such properties as string variable names and form definition. Object-oriented properties similar to those of Python and C++, such as class property inheritance, are also present. Notable scripting languages that Praat could be compared to are Perl, Tcl and Python. Perl and Tcl were developed at about the same time. Larry Wall, of the Jet Propulsion Laboratory, created Perl, an acronym for Practical Extraction and Report Language, and released it in 1987. Malcolm Beattie and Gurusamy Sarathy were among the further developers who enhanced Perl. Perl is probably the most widely used scripting language today and dominates in World Wide Web CGI (Common Gateway Interface) scripting environments, where it is necessary to link web sites to back-end databases and other back-end software. Its key strengths lie in string and data processing. Some key features of Perl include:
• built-in regular expressions and pattern matching
• interoperability between number and string types
• built-in array operations and unrestricted arrays (unrestricted in size)
• native hash table support
• object orientation, including multiple inheritance
• built-in security, with encryption and decryption standard to the language.
Tcl (pronounced 'tickle') is an acronym for Tool Command Language. John Ousterhout, of the University of California at Berkeley, created the language in early 1988. In May 1994 Ousterhout joined Sun Microsystems, and in January 1998 he left Sun, with five other developers, to found Scriptics Corp. In September 1998 Scriptics shipped TclPro, the first commercial version of Tcl. Tcl has a simple syntax, is easily learned and is extensible. However, Tcl is clumsy for simple arithmetic and some other common operations, and standard Tcl does not support modularization well. Python, a language named after Monty Python's Flying Circus, was developed by Guido van Rossum. It is a sophisticated, object-oriented, portable and maintainable language. Some of the key features of Python include:
• it is statement based rather than expression based
• it uses indentation for grouping and scoping, which enforces a consistent programming style and aids readability
• although it does not enforce the use of object-oriented structures, it does support multiple inheritance, polymorphism, overloading and abstract classes
• it has the standard data types, plus some that are unique to it (e.g. tuples and dictionaries)
• it supports exception handling
• its scoping, variable typing and function handling are all quite different from those of other languages
• functions can be declared without names (lambda expressions)
• it has a versatile else clause that can be added to most control structures.

3.2.2

Syntax and semantics

In Praat scripts, comments are indicated by the '#' character. This is similar to Perl, where comments are also introduced by '#' and extend to the end of the line. Perl inherits most of its syntax from C, and anyone with knowledge of C will easily understand Perl. Similarly, most of the Praat scripting syntax is reminiscent of C. In Tcl the first word of a command is always a function, and all other words are its arguments. This is similar to the structure of Praat commands and to the way Praat instantiates its objects in the Praat list of objects: each function is called with a set of arguments. Most scripting languages have escape sequences to represent special characters, often, as in Tcl or C, in the form of a backslash escape sequence. Praat similarly uses backslash escape sequences, and since Praat is used by phoneticians, its developers have added special escape sequences to represent the phonetic characters of the International Phonetic Alphabet.
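The command-plus-arguments structure can be illustrated with a simplified line splitter. It assumes the convention visible in the Praat examples in this report, such as Get mean... 0.1 0.2 Hertz Parabolic, where a command that takes arguments ends in an ellipsis. The function split_command is hypothetical. It ignores assignments, quoted arguments and continuation lines.

```python
def split_command(line):
    """Hypothetical, simplified splitter for one script line.
    Returns (command, arguments), or None for blank and '#' comment lines.
    If the line contains '...', everything up to and including the first
    '...' is the command name; otherwise the first word is the command."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    if "..." in stripped:
        name, _, rest = stripped.partition("...")
        return name + "...", rest.split()
    first, *rest = stripped.split()
    return first, rest

mean_query = split_command("Get mean... 0.1 0.2 Hertz Parabolic")
selection = split_command("select Pitch hallo")
comment = split_command("# Take the normal action.")
```
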

3.2.3

Names

Names of numeric variables in Praat must start with a lower-case letter, optionally followed by a sequence of letters, digits and underscores. As in the programming language Basic, the names of string variables end in a dollar sign. Praat, like Tcl, will promote integers to reals when needed. Although Praat, like similar scripting languages, has no rigid typing rules, automatic coercion occurs where necessary, so that the values in an expression are converted to a common type.
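The naming rules above translate directly into regular expressions. This minimal sketch assumes that a string variable name follows the same rule as a numeric one, with the dollar sign appended. The helper variable_kind is hypothetical.

```python
import re

NUMERIC_NAME = re.compile(r"[a-z][A-Za-z0-9_]*")
STRING_NAME = re.compile(r"[a-z][A-Za-z0-9_]*\$")  # assumption: same rule plus '$'

def variable_kind(name):
    """Classify a Praat variable name as 'numeric', 'string' or None (invalid)."""
    if NUMERIC_NAME.fullmatch(name):
        return "numeric"
    if STRING_NAME.fullmatch(name):
        return "string"
    return None
```
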

3.2.4

Variables and expressions

Variables are created on first use and are not bound to specific types. This is common in languages like Basic and in most scripting languages. The statement

    variable = expression

evaluates a numeric expression and assigns the result to a variable. Existing variables are substituted when put between quotes:

    x = 99
    x2 = x * x
    echo The square of 'x' is 'x2'.

This will write the following text to the Info window:

    The square of 99 is 9801.

The operators are, in order of decreasing precedence:
• parentheses and brackets: ( ) (, ) [ ] [, ]
• negation and exponentiation, from right to left: - ^ (2^-6 = 0.015625; -2^6 = -64)
• multiplication and division, from left to right: * / div mod (div and mod yield integers)
• addition and subtraction, from left to right: + -
• comparison: = == != <> < > <= >= (= and == mean equal; != and <> mean unequal; these operators always yield 0 (false) or 1 (true))
• logical negation: not
• logical conjunction: and
• logical disjunction: or
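The following minimal Python sketch mimics the quote substitution in the echo example. It assumes that an unknown name is simply left unchanged. The function substitute is hypothetical and ignores Praat's own number formatting. The sketch also shows that Python's ** operator shares the precedence described above for exponentiation: it binds more tightly than a unary minus on its left and groups from right to left.

```python
import re

def substitute(text, variables):
    """Replace each 'name' in text by the value of that variable, as in
    Praat's quote substitution; names not in `variables` are left as-is."""
    def repl(match):
        name = match.group(1)
        if name not in variables:
            return match.group(0)
        value = variables[name]
        # write whole-number reals without a trailing '.0'
        if isinstance(value, float) and value.is_integer():
            value = int(value)
        return str(value)
    return re.sub(r"'([A-Za-z][A-Za-z0-9_]*\$?)'", repl, text)

variables = {"x": 99}
variables["x2"] = variables["x"] * variables["x"]
info_line = substitute("The square of 'x' is 'x2'.", variables)

# Same precedence as the table: exponentiation before negation
small = 2 ** -6      # 0.015625
negative = -2 ** 6   # -64, parsed as -(2 ** 6)
```
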


3.2.5


Typing

Praat has no explicit typing features, and there is no type checking. There is implicit coercion, for example when an integer is multiplied by a real number, which is similar to the way languages like Basic deal with variables of different types.

3.2.6

Data types

The predefined data types in Praat scripts are quite limited compared with those of languages such as Tcl, Perl or Python. However, the class inheritance structure, and the many data types embedded in classes as a result, give Praat a rich mix of data types that are important to its primary users, phoneticians.

Numerics

Numeric variables contain integer numbers between −1 000 000 000 000 000 and +1 000 000 000 000 000, or real numbers between −10^308 and +10^308. The smallest numbers lie near −10^−308 and +10^−308. Names of numeric variables must start with a lower-case letter, optionally followed by a sequence of letters, digits and underscores.

Strings

As in the programming language Basic, the names of string variables end in a dollar sign.

Complex types

Arrays

Quote substitution allows one to simulate arrays of variables:

    for i from 1 to 5
        square'i' = i * i
    endfor

After this, the variables square1, square2, square3, square4 and square5 contain the values 1, 4, 9, 16 and 25, respectively. One can use any number of variables in a script, but one can also use Matrix or Sound objects for arrays. One can substitute variables with the usual single quotes, as in 'square3'. If the index is also a variable, however, one may need a dummy variable:

    for i from 1 to 5
        hop = square'i'
        print The square of 'i' is 'hop'
        printline
    endfor

The reason for this is that the following line would not work, because of the required double substitution:

    print The square of 'i' is 'square'i''
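The point of the example above is that the loop does not fill an array: each pass constructs a new variable name. The minimal sketch below makes this visible by modelling the script's global variables as a Python dictionary. This is an assumption for illustration, and simulate_square_loop is a hypothetical name.

```python
def simulate_square_loop(n):
    """Mimic the Praat loop
         for i from 1 to n
             square'i' = i * i
         endfor
    Each pass builds a variable *name* by substituting i, so the result is a
    set of separate global variables (square1, square2, ...), not an array."""
    variables = {}
    for i in range(1, n + 1):
        variables["square" + str(i)] = i * i
    return variables

variables = simulate_square_loop(5)
lines = []
for i in range(1, 6):
    hop = variables["square" + str(i)]   # the dummy variable used in the text
    lines.append(f"The square of {i} is {hop}")
```
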



Abstract data types

Perl does not provide a formal syntactic interface to a class's methods; it relies on the programmer to read the documentation of each class. Similarly, advanced scripting in Praat requires the programmer to read the documentation of each class. Praat contains many predefined classes in its phonetics library. These are documented in the scripting tutorial and the help facility of Praat.

General-purpose classes:
• Matrix: a sampled real-valued function of two variables
• Polygon
• PointProcess: a point process (PointEditor)
• Sound: a sampled continuous process (SoundEditor, SoundRecorder, Sound files)
• LongSound: a file-based version of a sound (LongSoundEditor)
• Virtual classes: Function, Sampled, AnyTier, AnyPoint, RealTier, RealPoint, Proximity
• Strings
• Distributions, PairDistribution
• TableOfReal
• Sequence
• ParamCurve

Periodicity analysis:
• Measuring jitter, noise, and shimmer
• Pitch: articulatory fundamental frequency, acoustic periodicity, or perceptual pitch (PitchEditor)
• Harmonicity: degree of periodicity
• Intensity, IntensityTier: intensity contour

Spectral analysis:
• Spectrum: complex-valued equally spaced frequency spectrum (SpectrumEditor)
• Ltas: long-term average spectrum
• Spectrogram
• Formant: acoustic formant contours
• LPC: coefficients of Linear Predictive Coding, as a function of time
• Wavelet: wavelet transform
• Cepstrum, MFCC (Mel filter cepstral coefficients)
• Excitation: excitation pattern of basilar membrane
• Excitations: an ensemble of Excitation objects
• Cochleagram: excitation pattern as a function of time

Labelling and segmentation:
• TextGrid (TextGridEditor)
• IntervalTier
• TextTier (TextPoint)

Manipulation of sound:
• Filtering
• Source-filter synthesis
• PitchTier (PitchTierEditor)
• Manipulation (ManipulationEditor): PSOLA
• DurationTier
• FormantTier

Articulatory synthesis:
• Speaker: speaker characteristics of a woman, a man, or a child
• Articulation: snapshot of articulatory specifications (muscle activities)
• Artword: articulatory target specifications as functions of time (VocalTract: area function)

Neural net package:
• FFNet: feed-forward neural net
• Pattern
• Categories: for classification (CategoriesEditor)

Numerical and statistical analysis:
• Eigen: eigenvectors and eigenvalues
• Polynomial, Roots, ChebyshevSeries, LegendreSeries, ISpline, MSpline
• Covariance: covariance matrix
• Confusion: confusion matrix
• Discriminant analysis: Discriminant
• PCA: Principal component analysis
• Correlation, ClassificationTable, SSCP
• DTW: dynamic time warping

Multidimensional scaling:
• Configuration (Salience)
• Kruskal analysis: Dissimilarity (Weight), Similarity
• INDSCAL analysis: Distance, ScalarProduct
• Correspondence analysis: ContingencyTable

Optimality-theoretic learning:
• OTGrammar (OTGrammarEditor)
• OTAnyGrammar (OTAnyGrammarEditor): OTGrammer tongueRoot

Bureaucracy:
• WordList
• SpellingChecker

Other data types

There are no user-defined ordinal, record, union, set or pointer types in Praat scripts.

3.2.7

Scope

Variables in Praat scripts all have global scope. Variables declared in procedures also have global scope. Pre-defined classes that are called behave as C classes in terms of scope.

3.2.8

Referencing

Referencing is based on the List of Objects in the Praat Object window and on the global scoping of the script. As an object is created or accessed, it appears in the Object window. Any method applied to an object can create a new object. The list is effectively a stack, with the bottom of the stack at the top of the left-hand side of the Object window. All the objects in the stack are available for referencing in the script, as are any variables explicitly defined in the script.
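The referencing model can be sketched as a list in which new objects are appended and any earlier object can be selected again by type and name, as in select Pitch hallo. The class ObjectList is hypothetical. Two behaviours are design choices of this sketch and are not taken from the Praat documentation: a new object becomes the current selection, and select picks the most recent matching object.

```python
class ObjectList:
    """Hypothetical model of the Praat List of Objects: objects are appended
    in creation order (oldest first) and stay available for referencing."""

    def __init__(self):
        self.objects = []       # (type, name, payload) tuples, oldest first
        self.selected = None    # index of the currently selected object

    def add(self, obj_type, name, payload=None):
        self.objects.append((obj_type, name, payload))
        self.selected = len(self.objects) - 1   # assumption: new object is selected
        return self.selected

    def select(self, obj_type, name):
        """Like 'select Pitch hallo': select the most recent matching object."""
        for index in range(len(self.objects) - 1, -1, -1):
            if self.objects[index][:2] == (obj_type, name):
                self.selected = index
                return index
        raise KeyError(f"{obj_type} {name} is not in the list")

objects = ObjectList()
objects.add("Sound", "hallo")
objects.add("Pitch", "hallo")   # e.g. the result of a pitch analysis
```
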

3.2.9

Constants

String constants

The predefined string variables are newline$ and tab$.



Numeric constants
• pi: 3.14159265358979323846
• e: 2.7182818284590452354

Special constant

undefined is a special value that query commands sometimes write into the Info window, where it appears as --undefined--.

Usage

One will often use the undefined value in a Praat script to test whether a query command returned a valid number:

    select Pitch hallo
    meanPitch = Get mean... 0.1 0.2 Hertz Parabolic
    if meanPitch = undefined
        # Take some exceptional action.
    else
        # Take the normal action.
    endif

Behaviour

In text files, this value is written as --undefined--. In binary files, it is written as a big-endian IEEE positive infinity. In memory, it is the ANSI C constant HUGE_VAL, which equals infinity on IEEE machines.
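The description above maps onto IEEE arithmetic, which the following minimal sketch uses to mimic the undefined test in the Praat example. It assumes that undefined can be modelled by positive infinity and that the binary-file value is an 8-byte double. The function mean_or_undefined is a hypothetical stand-in for a Praat query command.

```python
import math
import struct

UNDEFINED = math.inf   # assumption: models HUGE_VAL on an IEEE machine

def mean_or_undefined(values):
    """Return the mean, or UNDEFINED when there is nothing to average,
    instead of raising an error."""
    return sum(values) / len(values) if values else UNDEFINED

mean_pitch = mean_or_undefined([])
if mean_pitch == UNDEFINED:
    action = "exceptional"   # take some exceptional action
else:
    action = "normal"        # take the normal action

# Big-endian IEEE 754 double encoding of positive infinity
encoded = struct.pack(">d", UNDEFINED)
```
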

3.2.10

Control structures: compound, selective, iterative

There is no break statement in Praat that can be used to break out of a control structure, as can be done, for example, in Python.

Selective conditional

The selective conditional is similar to that of most languages, including Perl, Tcl and Python. It is written in the following forms:

    if expression1 then expression2 else expression3 fi
    if expression1 then expression2 else expression3 endif

The conditional can be used within expressions themselves or within the structure of a script. Within an expression, if expression1 evaluates to zero, the result is expression3; otherwise the result is expression2. For example, a peak clipper:

    if abs(self) > 100 then if self > 0 then 100 else -100 fi else self fi

In a script, the structure becomes more complex:

    if expression
    elsif expression
    else
    endif

If the expression evaluates to zero (false), execution of the script jumps to the next elsif, or to just after the next else or endif at the same depth. The following script computes the preferred length of a bed for a person 'age' years of age: if age
