A Digital Sound Editor

COMPUTERS AND BIOMEDICAL RESEARCH 18, 480-487 (1985)

DONALD G. JAMIESON* AND DAVID NAUGLER

Department of Psychology, University of Calgary, 2500 University Drive Northwest, Calgary, AB, T2N 1N4, Canada

Received November 12, 1984

A series of digital computer programs that facilitate the production and control of acoustic stimuli for hearing assessment and research is described. The package, which is available for PDP-11 computers under RT-11, allows sounds to be digitized, adjusted for amplitude and/or dc offset, edited while in digital form, and output to file or tape. The waveform editor includes facilities to edit sounds in time, with sections removed or added with a temporal precision of 0.1 msec or better. Two or more sounds may also be combined for stereo or monaural (sound-on-sound) output, or two may be concatenated. Together, the programs permit a wide range of manipulations useful in preparing sound stimuli for hearing experiments or for clinical audiometry. © 1985 Academic Press, Inc.

Investigations of speech-related hearing disorders require precise stimulus definition and control. Such control can be obtained by altering aspects of prerecorded natural speech in a measurable and reproducible manner. Studies using edited natural speech have shown considerable promise, both for understanding basic processes of speech perception (1, 2) and for demonstrating the factors that limit the intelligibility of speech for hearing-impaired listeners (3). As one example, recent work (4) has demonstrated that adjusting the duration of the preceding vowel in a vowel-consonant speech sound can substantially improve intelligibility, even for listeners who have profound hearing impairments. While such work holds great promise for understanding hearing and for rehabilitating hearing loss, this research is presently restricted by the limited availability of facilities for accurate sound editing and control.

This paper describes an integrated speech waveform editor for PDP-11 series computers, manufactured by the Digital Equipment Corporation. The waveform alteration package (THWAP) is a series of programs designed to facilitate the alteration, measurement, and organization of digitized sound waveforms (see Schafer and Rabiner (5) for a discussion of digitized speech). The package provides a variety of editing functions, including programs to cut, mark, concatenate, and merge sound files, which can be used to implement a large number of experimental situations for the study of normal or disordered hearing function. Manipulations include control over (a) temporal cues involved in the identification of speech sounds, such as the duration of certain voiced or silent intervals (e.g., (2-4)); (b) the relative amplitudes of particular speech cues, such as the release burst of a syllable-initial stop consonant (6); (c) the timing and amplitude of events presented to the two ears, e.g., for studies of hemispheric lateralization (7, 8) or of central auditory function using binaural fusion (9, 10); and (d) measurement of temporal, spectral, and amplitude parameters of speech utterances (11, 12).

The package contains three types of programs (see Table I). First are input/output programs that record (digitize) sounds and regenerate them by digital-to-analog conversion. Second are file manipulation routines that display and edit the stored sounds. Third are file reorganization programs that combine the contents of two files to create binaural files, paste (concatenate) files, or sum (superimpose) files.

* To whom requests for reprints should be addressed.

0010-4809/85 $3.00. Copyright © 1985 by Academic Press, Inc. All rights of reproduction in any form reserved.

FIG. 1. Diagram of the hardware configuration assumed for the present version of the PDP-11-based waveform editing package, running under RT-11.

Language and Hardware Requirements

The package was developed for a PDP-11/34 computer with a minimum of peripheral equipment (cf. Fig. 1). It requires a hard disk (RK05), a video monitor or hard-copy terminal, a programmable real-time clock (KW11-K), an analog-to-digital converter (AD11-K), a digital-to-analog converter (AA11-K), an anti-alias filter, and an X-Y oscilloscope (e.g., Tektronix 608). The programs are written in RT-11 Fortran and Macro. A subsequent package was developed for DEC's VAX computers; this paper concentrates on the PDP-11 package, on which the very similar VAX package was based. The programs are designed for users with minimal computing experience. Brief instructions are presented on the video terminal to guide the user, and a user manual provides detailed documentation of each routine for the relatively uninitiated user, supplementing the on-line instructions. Wherever possible, optional default file names are provided to eliminate repetitive typing.

Input-Output Programs

The input program, WEDA2D, converts an analog signal (e.g., speech) into digital values at a rate specified by the user. The sampling rate is flexible: it may be as high as 16 kHz for speech recording, or much lower if electrophysiological or other low-frequency signals are to be digitized. The digitized values are stored on disk. The duration of material that can be stored depends on the sampling rate and is limited only by the available disk storage capacity. The program offers suggestions that help the user optimize the digital representation of the analog wave, avoiding both peak clipping and "underrecording," either of which loses information. The output program converts the digitized values into an (auditory) analog signal at a user-determined rate. Either one- or two-channel output can be specified, with a maximum output rate of 16 kHz for single-channel output and 12 kHz per channel for two-channel output on the computer system described above. The Macro subroutines that output speech can also easily be incorporated into other programs to control experimental stimulus output.

File Manipulation

The editing program, WED, displays 1000 samples of the waveform file on an oscilloscope. A visible cursor indicates the point in time at which the program pointer is positioned, and the amplitude of the signal at the cursor and the cursor's position in time (which depends on the sampling rate) are displayed on the terminal. The cursor is scrolled forward and backward through the waveform file by striking the numeric keys on the terminal. Using this procedure, the beginning and end of a waveform segment can be defined, and the segment can then be either output to the DAC or stored in a new file. The time resolution of the editor is one sampling period (the inverse of the sampling rate), and is thus as fine as 62.5 µsec at the 16-kHz sampling rate. Each function is executed by a single mnemonic keystroke at the terminal (B: define the beginning of a section; E: erase a marker; F: define the finish; H: help; M: mark this position; N: fetch a new file; R: change the rate; S: speak the defined section; W: write the section to a new file). A hard-copy plot of the waveform may be obtained on a digital X-Y plotter. Samples of oscilloscope displays are given in Figs. 2 and 3.

FIG. 2. Sample waveforms as displayed by the waveform editor during editing. (A) Oscillogram of the initial 1000 samples (100 msec) of the syllable /ba/, showing the consonant release burst at left, followed by the onset of voicing periodicity; (B) oscillogram of the initial 1000 samples (100 msec) of the syllable from (A), after reducing the amplitude of the burst; (C) oscillogram of the initial 1000 samples (100 msec) of the syllable from (A), after removing 50 samples (5 msec) from the burst.

FIG. 3. Sample waveforms as displayed during editing. (A) Oscillogram of the initial 1000 samples (100 msec) of the syllable /sa/, showing frication at the left of the figure; (B) oscillogram of the initial 1000 samples (100 msec) of the syllable /sa/, after adding a 100-sample (10 msec) silent interval after frication. This adjusted syllable is heard as /spa/.

When a new file is written to disk, the relative or absolute intensity of the signal (or of some of its components) may be modified, the dc offset may be zeroed, and the final 300 samples of the file may be used to decrease the intensity of the sound gradually to zero. Two scales are available for amplitude variation: (1) average amplitude relative to the zero level; and (2) a decibel scale with a 66-dB maximum. These options allow a smooth transition when joining two sound files that have different offsets or recording levels, and they prevent the clicks or "glitches" that may otherwise occur when a headphone or amplifier receives a sharp cutoff or onset. In addition, silence may be appended to the end of a new file. This may be necessary when joining several files to create "concatenated" speech, or when a subsequent program requires an integer number of blocks (for example, when an output routine must, under program control, read and then output a specific section of a file). Such programming is simplified if stimuli are stored in a single large file, with each stimulus beginning at a block boundary.
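The editing and write-time options above, including the cut and insert operations shown in Figs. 2 and 3, reduce to simple array arithmetic. The Python sketch below is illustrative only and is not the THWAP Fortran/Macro code. It assumes that each file is a list of integer samples, that a disk block holds 256 16-bit samples (512-byte RT-11 blocks), and that the 300-sample fade is a linear ramp; all function names are hypothetical.

```python
# Illustrative sketch only; not the original THWAP (RT-11 Fortran/Macro) code.
SAMPLES_PER_BLOCK = 256  # assumption: 512-byte RT-11 disk blocks of 16-bit samples


def samples_to_ms(n, rate_hz):
    """Duration of n samples in msec (one sampling period = 1/rate_hz sec)."""
    return 1000.0 * n / rate_hz


def cut(samples, start, stop):
    """Remove samples[start:stop], as when shortening a burst (Fig. 2C)."""
    return samples[:start] + samples[stop:]


def insert_silence(samples, at, n):
    """Insert n zero-valued samples before index `at` (as in Fig. 3B)."""
    return samples[:at] + [0] * n + samples[at:]


def zero_dc(samples):
    """Subtract the mean so the waveform is centred on zero (dc offset removed)."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def apply_gain_db(samples, db):
    """Scale amplitude by `db` decibels, i.e. by a factor of 10 ** (db / 20)."""
    factor = 10 ** (db / 20)
    return [s * factor for s in samples]


def fade_out(samples, n=300):
    """Ramp the final n samples linearly to zero (the ramp shape is an assumption)."""
    n = min(n, len(samples))
    out = list(samples)
    start = len(out) - n
    for i in range(n):
        out[start + i] *= 1 - (i + 1) / n  # last sample is multiplied by exactly 0
    return out


def pad_to_block(samples, block=SAMPLES_PER_BLOCK):
    """Append silence so the file fills a whole number of disk blocks."""
    shortfall = (-len(samples)) % block
    return list(samples) + [0] * shortfall
```

At the 10-kHz rate implied by Figs. 2 and 3 (1000 samples spanning 100 msec), `samples_to_ms(50, 10_000)` returns 5.0, the 5 msec removed from the burst in Fig. 2C. `pad_to_block` extends a 300-sample file to 512 samples, i.e., two whole blocks. Each function returns a new list rather than modifying its input, which loosely mirrors WED's practice of writing an edited section to a new file.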


File Reorganization

Most file reorganization routines are single-command operations. Included are routines (a) to concatenate up to 15 files into a single file, (b) to add or subtract the contents of two files (for example, to "superimpose" one waveform upon another), and (c) to merge the contents of two files to create a dichotic file for two-channel (stereo) output. One of the routines (WEDLSN) allows the user to randomize the presentation order of up to 100 sound files and to vary the time interval between files, in order to create audio tapes with any desired sequence of sounds.
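The reorganization routines are likewise sample-level operations. The following Python sketch illustrates them under stated assumptions and is not the WEDMX, WEDMXS, WEDMG, or WEDLSN code. It assumes that mono files are lists of integer samples, that sums are clipped to a signed 12-bit converter range (-2048 to 2047), that a two-channel file stores left and right samples interleaved, and that a randomized playlist is a list of (file name, following interval in seconds) pairs. All names are hypothetical.

```python
import random

MAX_CONCAT = 15       # from the text: up to 15 files may be concatenated
MAX_PLAYLIST = 100    # from the text: WEDLSN handles up to 100 files
LO, HI = -2048, 2047  # assumption: signed 12-bit converter range


def _pad(x, n):
    """Zero-pad a sample list to length n."""
    return list(x) + [0] * (n - len(x))


def concatenate(files):
    """Join up to 15 sample lists end to end."""
    if len(files) > MAX_CONCAT:
        raise ValueError("at most 15 files may be concatenated")
    out = []
    for f in files:
        out.extend(f)
    return out


def mix(a, b, subtract=False):
    """Sample-by-sample sum (superimpose) or difference, clipped to the converter range."""
    n = max(len(a), len(b))
    sign = -1 if subtract else 1
    return [max(LO, min(HI, x + sign * y)) for x, y in zip(_pad(a, n), _pad(b, n))]


def merge_stereo(left, right):
    """Interleave two mono files as L, R, L, R, ... frames for dichotic output."""
    n = max(len(left), len(right))
    frames = []
    for l_s, r_s in zip(_pad(left, n), _pad(right, n)):
        frames.extend((l_s, r_s))
    return frames


def random_schedule(names, gaps_s, seed=None):
    """Shuffle up to 100 file names, pairing each with the silent interval after it.

    gaps_s is either one interval (sec) used throughout or one interval per position.
    """
    if len(names) > MAX_PLAYLIST:
        raise ValueError("at most 100 files per playlist")
    if isinstance(gaps_s, (int, float)):
        gaps_s = [gaps_s] * len(names)
    if len(gaps_s) != len(names):
        raise ValueError("need one interval per file")
    order = list(names)
    random.Random(seed).shuffle(order)
    return list(zip(order, gaps_s))
```

Clipping is only one way to handle sums that exceed the converter range. Attenuating the inputs first (for instance with a decibel gain) avoids clipping distortion altogether. Passing a fixed `seed` makes a randomized tape order reproducible, which helps when the same sequence must be regenerated later.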

Additional Features

The file manipulation, reorganization, and output routines are compatible with the synthetic speech files created by a digital cascade-parallel speech synthesizer (13, 14), as well as with other waveform manipulation programs. For instance, filtering, linear predictive coding, and fast Fourier transforms may be applied to the stored digital waveform. This flexibility suggests many applications beyond the study of speech-related hearing loss.

Applications

Within our laboratory, this editor has been applied in the following ways: (1) to develop a short version of the nonsense-syllable test (15) for clinical use, by editing natural speech to control the amplitude and duration of test items and the signal-to-noise ratio, and to randomize the presentation order of the syllables; (2) to alter temporal cues to speech sounds, in order to study speech perception by normal and hearing-impaired listeners; (3) to edit natural speech tokens (digits and consonant-vowel syllables such as /ba/, /da/, /pa/, /ta/) to obtain uniform peak-to-peak amplitudes and to minimize variation in duration for experiments on auditory memory; (4) to edit natural nonspeech sounds (babies' cries, animal sounds, traffic noise, and so forth) to control the amplitude and duration of specific sound segments, as well as the temporal onset asynchrony of specified binaural pairs of sounds, for studies of hemispheric specialization; (5) to edit digitized electrophysiological waveforms, including the marking of wave peaks and troughs (16, 17); and (6) to measure voice-onset time values for natural speech.
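Measuring voice-onset time, as in application (6), amounts to placing two marks with the editor's cursor and converting their separation into time. The minimal Python sketch below assumes that marks are stored as sample indices; the function name is hypothetical.

```python
def voice_onset_time_ms(burst_index, voicing_index, rate_hz):
    """Voice-onset time in msec between two cursor marks.

    Positive when voicing follows the release burst; negative for prevoicing.
    The result is quantized to one sampling period (0.1 msec at 10 kHz).
    """
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 * (voicing_index - burst_index) / rate_hz
```

For example, if a burst is marked at sample 100 and voicing onset at sample 700 in a file digitized at 10 kHz, `voice_onset_time_ms(100, 700, 10_000)` gives 60.0 msec.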

Availability

Listings or machine-readable code for all programs (cf. Table I), preferably on floppy disk (RX02), along with the full THWAP user manual, are available from the first author for the costs of reproduction and handling.

TABLE I
SUMMARY OF PROGRAMS COMPRISING THE WAVEFORM ALTERATION PACKAGE

Program   Application
WEDA2D    Digitizing an auditory waveform
WEDLST    Outputting sounds (one channel)
WEDLSN    Playing up to 100 files, in specified order and at specified intervals
WED       Editing waveforms, writing selected waveforms to disk, altering amplitudes, correcting dc offset, etc.
WEDMX     Superimposing two sounds (sound on sound)
WEDMXS    Subtracting one sound from another
WEDMG     Merging two channels
WEDLS2    Outputting sounds (two channels)
WED2      Editing waveforms, writing selected waveforms to disk, altering amplitudes, correcting dc offset, etc., for two-channel (stereo) sound

Summary

The programs described are designed to facilitate the editing of natural speech sounds. Included are programs (a) to digitize and store analog speech signals; (b) to adjust the amplitude and duration of a signal; (c) to combine two or more files (sound on sound); (d) to concatenate two or more files; (e) to remove or add samples to adjust the duration of a sound; and (f) to present the adjusted files for review or recording. These facilities allow the preparation of natural speech stimuli for use in many kinds of real-speech audiometry and speech perception studies.

ACKNOWLEDGMENTS

Thanks are due to D. Chandler, who programmed the initial version of the waveform editor (cf. Jamieson and Chandler (16)); to S. Kramer, who rewrote portions of the package; to A. Mahoney, M. Proctor, and M. Corristine, whose experience and comments have improved the waveform editor; and especially to M. Cheesman, for her assistance throughout this project. The project was supported by grants from the Alberta Heritage Foundation for Medical Research and the Natural Sciences and Engineering Research Council of Canada.

REFERENCES

1. COLE, R., AND SCOTT, B. Toward a theory of speech perception. Psychol. Rev. 81, 348 (1974).
2. LIBERMAN, A. M. On finding that speech is special. Amer. Psychol. 37, 148 (1982).
3. REVOILE, S., PICKETT, J., HOLDEN, L., AND TALKIN, D. Acoustic cues to final stop voicing for impaired- and normal-hearing listeners. J. Acoust. Soc. Amer. 72, 1145 (1982).
4. REVOILE, S., PICKETT, J., HOLDEN, L., AND TALKIN, D. Acoustic cues to final stop voicing for impaired- and normal-hearing listeners. J. Acoust. Soc. Amer. 72, 1145 (1982).
5. SCHAFER, R. W., AND RABINER, L. R. Digital representations of speech signals. Proc. IEEE 63(4) (1975).
6. PICKETT, J., REVOILE, S., AND DANAHER, E. Speech-cue measures of impaired hearing. In "Hearing Research and Theory" (J. Tobias and E. Schubert, Eds.), Vol. 2. Academic Press, New York, 1983.


7. MILNER, B., TAYLOR, L., AND SPERRY, R. Lateralized suppression of dichotically presented digits after commissural section in man. Science (Washington, D.C.) 161, 184 (1968).
8. SHANKS, J., AND RYAN, W. A comparison of aphasic and non-brain-injured adults on a dichotic CV-syllable listening task. Cortex 12, 100 (1976).
9. KEITH, R. "Central Auditory Dysfunction." Grune & Stratton, New York, 1977.
10. LUBERT, N. Auditory perceptual impairments in children with specific language disorders: A review of the literature. J. Speech Hear. Disorders 46, 3 (1981).
11. BLUMSTEIN, S. E., COOPER, W. E., ZURIF, E., AND CARAMAZZA, A. The perception and production of voice-onset time in aphasia. Neuropsychologia 15, 371 (1977).
12. FLEGE, J. E., AND BROWN, W. S. The voicing contrast between English /p/ and /b/ as a function of stress and position in utterance. J. Phonetics 10, 335 (1982).
13. KEWLEY-PORT, D. KLTEXC: Executive program to implement the Klatt software speech synthesizer. (Progress Report 4, Research on Speech Perception.) Indiana Univ. Press, Bloomington, 1978.
14. KLATT, D. H. Software for a cascade-parallel formant synthesizer. J. Acoust. Soc. Amer. 67, 971 (1980).

15. DUBNO, J. R., DIRKS, D. D., AND LANGHOFER, L. R. Evaluation of hearing-impaired listeners using a nonsense-syllable test. J. Speech Hear. Res. 25, 141 (1982).
16. JAMIESON, D. G., AND CHANDLER, D. Computerizing methods of pattern recognition of biorhythms. Paper presented at the annual meeting of the Canadian Psychological Association, Quebec City, June 1979.
17. SINCLAIR, B. R., SETO, M. G., AND BLAND, B. H. θ-cells in the CA1 and dentate layers of the hippocampal formation: Relations to slow wave activity and motor behavior in the freely moving rabbit. J. Neurophysiol. 48, 1214 (1982).