Perception of Spaciousness using Wave Field Synthesis, Headphones and Loudspeakers

Academic thesis submitted in fulfilment of the requirements for the degree of Master of Arts at the Universität Hamburg

by Claudia Stirnat

from Heilbronn

Hamburg, 2016

Contents

1 Introduction ................................................. 1

2 Closely Related Research ..................................... 4
  2.1 Binaural Sky ............................................. 4
  2.2 Perceived Spaciousness in Different Music Genres ......... 6
  2.3 Audio-Visual Interactions in Loudness Evaluation ......... 7

3 Basic Music Perception ....................................... 10
  3.1 Cognitive Approach ....................................... 12
  3.2 Ecological Approach ...................................... 18
  3.3 Interaction between Audition and Vision .................. 19

4 Spaciousness within the Auditory Room ........................ 22
  4.1 The Acoustical Room ...................................... 22
  4.2 The Psychoacoustical Room ................................ 27
  4.3 The Tonal Room ........................................... 30
  4.4 The Semantic Room ........................................ 31

5 Technical and Physical Basics ................................ 32
  5.1 Transducer Principles .................................... 32
    5.1.1 Electrodynamic Transducer ............................ 32
    5.1.2 Electrostatic Transducer ............................. 34
    5.1.3 Electromagnetic Planar Transducer .................... 35
    5.1.4 Piezoelectric Transducer ............................. 36
  5.2 Loudspeakers ............................................. 37
    5.2.1 (Electro-)Dynamic Loudspeaker ........................ 38
    5.2.2 Electrostatic Loudspeaker ............................ 40
    5.2.3 Piezoelectric Loudspeaker ............................ 41
    5.2.4 Characteristics and Features ......................... 41
  5.3 Headphones ............................................... 46
    5.3.1 Construction Forms ................................... 46
    5.3.2 Designs by Driver Principle .......................... 47
    5.3.3 Characteristics in a Critical Point of View .......... 52
  5.4 Wave Field Synthesis ..................................... 55

6 Aim, Research Questions and Hypotheses ....................... 59

7 Listening Test 1 ............................................. 60
  7.1 Methods .................................................. 60
  7.2 Participants ............................................. 62
  7.3 Stimuli .................................................. 62
  7.4 Setup .................................................... 63
    7.4.1 Wave Field Synthesis System .......................... 64
    7.4.2 Tracking System ...................................... 66
    7.4.3 Laboratory Setup ..................................... 67
  7.5 Procedure ................................................ 68
  7.6 Data Analysis ............................................ 69
  7.7 Results .................................................. 70
  7.8 Discussion ............................................... 74

8 Listening Test 2 ............................................. 75
  8.1 Methods .................................................. 75
  8.2 Participants ............................................. 75
  8.3 Stimuli .................................................. 76
  8.4 Setup .................................................... 76
  8.5 Procedure ................................................ 76
  8.6 Data Analysis ............................................ 76
  8.7 Results .................................................. 77
  8.8 Discussion ............................................... 79

9 Results for Artificial Head Measurements ..................... 81
  9.1 Measurement of Accuracy .................................. 81
  9.2 Measurement of Listening Test ............................ 82

10 Discussion .................................................. 85

11 Conclusions ................................................. 87

Acknowledgement ................................................ 89

Appendix ....................................................... 96

A Experiment 1 ................................................. 97

B Experiment 2 ................................................. 109

C Measurements ................................................. 120

Chapter 1

Introduction

"The whole is greater than the sum of its parts." [1]

Imagine that a room is to be prepared for some kind of sound reproduction. Usually, questions about the choice of sound or music, the reproduction technique and the room design need to be answered. Depending on the event, people prefer a certain genre or instrumentation of the music. Massive loudspeakers are often required at big concerts, whereas small loudspeakers are sufficient in living rooms. If people listen to music in a group, they prefer loudspeakers. But if a person wants to enjoy music alone, or is considerate of others, he or she rather uses headphones. Another reason for choosing a reproduction technique is, of course, sound quality. Different purposes require a corresponding room design so that the room acoustics matches the needs: speech intelligibility demands a shorter reverberation time than a rock concert or a choir concert. It also has to be considered how an empty room sounds and how the acoustics changes when people fill the room. Similar questions need to be considered when research studies are carried out. Many studies are conducted with headphones, e.g. in psychoacoustics [2][3][4][5], and participants often prefer headphones to loudspeakers [6].

1 Aristotle. http://www.goodreads.com/quotes/20103-the-whole-is-greater-than-the-sum-of-its-parts [08.11.16].
2 Fastl, H. "Basics and applications of psychoacoustics". In: Proceedings of Meetings on Acoustics at ICA 2013, Montréal. 2013, pp. 1-23.
3 Griesinger, D. "The Psychoacoustics of Apparent Source Width, Spaciousness and Envelopment in Performance Spaces". In: ACUSTICA - acta acustica 83 (1997), pp. 721-731.
4 Fastl, H. "Audio-visual interactions in loudness evaluation". In: Proceedings of the 18th International Congress on Acoustics (ICA). Kyoto, Japan, 2004, pp. 1161-1166.
5 Blauert, J. and Lindemann, W. "Auditory spaciousness: Some further psychoacoustic analyses". In: The Journal of the Acoustical Society of America 80.2 (1986), pp. 533-542.
6 Kallinen, K. and Ravaja, N. "Comparing speakers versus headphones in listening to news from a computer - individual differences and psychophysiological responses". In: Computers in Human Behavior 23.1 (2007), pp. 303-317.


"Naturally, the first choice for playback are headphones, but particularly in the context of virtual or augmented reality a reproduction free of headphones is often requested." [7] When I was searching for a topic for my Master's thesis, I was asked why I would not use headphones instead of loudspeakers for the listening test I had in mind at that time. Thinking about this suggestion, I realized that I wanted to examine the use of headphones and loudspeakers critically, in order to better answer the question of why one reproduction technique should be used for a listening test rather than another. As I developed my research topic further, I was inspired to also investigate wave field synthesis as a further alternative to headphones. In this way, I arrived at the idea of investigating spaciousness with different reproduction techniques, as an extension of my Bachelor's thesis, which followed a semantic approach. My Bachelor's thesis dealt with the spatial perception of the genres Electro, Ethno, Classic, Jazz and Rock and is described in section 2.2 [8].

"Spaciousness means that auditory events, in a characteristic way, are themselves perceived as being spread out in an extended region of space." [9] The first study on perceived spaciousness was conducted by Reichardt in 1966. Reichardt aimed to find a measuring unit for it and, as a result of his research, established a scale for psychoacoustical experiments with 14 levels of spaciousness [10]. There has been a great demand for more reliable data on the physical and perceptual aspects of auditory spaciousness in architectural acoustics [11]. According to Griesinger, "In the English language a hall can be spacious, the reverberation of an oboe can be spacious, but the sonic image of an oboe cannot be spacious." [12] "However the release properties of orchestral bass instruments are often quite different from those of the treble instruments, and this fact will influence our perception of spaciousness with music." [13]

7 Laumann, K., Theile, G., and Fastl, H. "Binaural Sky - Examination of Different Array Topologies". In: Proceedings of the NAG/DAGA, Rotterdam. 2009, pp. 1090-1092, p. 1090.
8 Stirnat, C. "Percepted Spaciousness of different Musical Genres". Bachelor's Thesis. Hamburg, 2012.
9 Blauert, J. Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge, London: MIT Press, 1997, p. 348.
10 Reichardt, W. and Schmidt, W. "Die hörbaren Stufen des Raumeindruckes bei Musik". In: ACUSTICA 17 (1966), pp. 175-179.
11 cf. Blauert and Lindemann, see n. 5, p. 533.
12 Griesinger, see n. 3, p. 721.
13 Ibid., p. 728.


Thesis Structure

The thesis is divided into two parts. The first part gives an overview of the theoretical background for the second part, which reports the empirical work. The theoretical background summarizes three studies that are closely related to the study presented in this thesis (chapter 2). It provides an insight into how we perceive music and why not only our ears are important for perception (chapter 3). The next chapter (chapter 4) provides information on various aspects of the auditory room, of which spaciousness is a part: the acoustical room, the psychoacoustical room, the tonal room and the semantic room. Finally, the theoretical background explains the basics of how headphones, loudspeakers and wave field synthesis are set up and work (chapter 5). Fastl recalls the long-standing engineering principle that only audible improvements to audio communication systems are worth the effort [14]. Meaningful cost/benefit analyses should therefore rely on psychoacoustics. In order to reach the aims (chapter 6), I conducted two listening tests and artificial dummy head measurements. Listening test 1 tested a new method of evaluating spaciousness using pictures of spatial representations (chapter 7). Listening test 2 is similar to listening test 1 but uses a different method, asking directly how "spacious" the stimuli sound (chapter 8). A useful tool for objective measurements in psychoacoustics and other fields is the artificial dummy head: it records sounds binaurally, as a listener would hear them, and provides objective data, e.g. in addition to listening tests. The results of two artificial dummy head measurements are presented in chapter 9. The first measurement tested the accuracy of the wave field synthesis system combined with the tracking system. The second measurement consists of recordings of the listening tests at the participants' position with shortened stimuli from listening test 2. The results and the advantages and disadvantages of this virtual headphone approach are discussed in chapter 10. Summarising the most important aspects, the final chapter 11 concludes that the virtual headphone approach using wave field synthesis is an alternative to headphones, with some limitations that call for further research.

14 Fastl, "Basics and applications of psychoacoustics", see n. 2, p. 20.


Chapter 2

Closely Related Research

The following studies are closely related to the listening tests in the empirical part of the thesis.

2.1 Binaural Sky

Laumann, Theile and Fastl developed a virtual headphone using wave field synthesis. "Loudspeaker systems designed to equal transmission characteristics of headphones are generally termed virtual headphones." [1] Laumann, Theile and Fastl tested a newly designed circular loudspeaker array, among other array setups mounted on a ceiling: the Binaural Sky (figure 2.1) [2][3]. Motivated by the fact that with common loudspeaker arrays the head-related transfer function (HRTF) changes with the head movements of a listener, they worked on a solution that avoids this problem. "A head-related transfer function (HRTF) is a transfer function that, for a certain angle of incidence, describes the sound transmission from a free field (plane wave) to a point in the ear canal of a human subject." [4] In other words, it "characterizes how a particular ear (left or right) receives a sound from a point in space" [5]. The authors explain a major practical limitation:

1 Laumann, Theile, and Fastl, "Binaural Sky - Examination of Different Array Topologies", see n. 7, p. 1090.
2 Laumann, K., Theile, G., and Fastl, H. "A virtual headphones based on wave field synthesis". In: Proceedings of Acoustics 08 in Paris, France. 2008, pp. 3593-3597.
3 Laumann, Theile, and Fastl, "Binaural Sky - Examination of Different Array Topologies", see n. 7.
4 Møller, H., et al. "Head-Related Transfer Functions of Human Subjects". In: J. Audio Eng. Soc 43.5 (1995), pp. 300-321. http://www.aes.org/e-lib/browse.cfm?elib=7949; Laumann, Theile, and Fastl, "A virtual headphones based on wave field synthesis", see n. 2, p. 3594.
5 Potisk, T. Head-Related Transfer Function. 2015. http://mafija.fmf.uni-lj.si/seminar/files/2014_2015/Seminar_Ia_Head-Related__Transfer_Function_Tilen_Potisk.pdf [24.09.16], p. 0.


If the listener moves or merely turns the head, the transfer function changes.
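To make the concept concrete, here is a minimal, hedged sketch of binaural rendering with HRTFs. It is my illustration, not code from the cited work, and the impulse responses are hypothetical placeholders; real head-related impulse responses (HRIRs) come from measurements such as those by Møller et al.

```python
# Hedged sketch: binaural rendering by convolving a mono signal with a pair
# of head-related impulse responses (HRIRs). The HRIRs here are hypothetical
# placeholders, not measured data.
import numpy as np
from scipy.signal import fftconvolve

sr = 44100
mono = np.random.randn(sr)              # any mono source signal, 1 s long

hrir_left = np.zeros(256)
hrir_left[0] = 1.0                      # placeholder left-ear impulse response
hrir_right = np.zeros(256)
hrir_right[30] = 0.8                    # placeholder right ear: delayed and damped

left = fftconvolve(mono, hrir_left)     # ear signals for one fixed head pose
right = fftconvolve(mono, hrir_right)
binaural = np.stack([left, right], axis=1)
```

The sketch also shows the limitation just discussed: the HRIR pair is only valid for one head orientation, so a head turn requires a new pair, which is exactly what tracking-based approaches like the Binaural Sky address.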

Figure 2.1: Left: A dummy head measurement of the Binaural Sky. Right: The loudspeaker array in the listening setup for a typical listening test at IRT (source: Menzel et al., 2005, p. 3).

Their idea was to create a circular array of loudspeakers above the listener, reproducing focused sound sources based on the principles of wave field synthesis. Because the array was mounted above the listener's head, the listener was not visually distracted by any speaker. Unlike with common loudspeaker arrays, these focused sound sources change their positions simultaneously with the listener. They chose a circular setup to allow a sturdy compensation of head rotations for one listener by means of symmetry. The circular setup consisted of 22 speakers arranged on a circle of 1.0 m diameter, plus a single driver for low frequencies located in the centre. The distance between the listener and the array was about 40 cm. During experiments, a motion tracking system (Polhemus FASTRAK) captured the listener's head position and orientation. The array setup "leads to a constant aliasing frequency and greatly reduces audible sound colourations during head rotations" [6]. According to Menzel, Wittek, Theile and Fastl, a solid reproduction can be achieved by synthesizing focused sound sources near the listener's head and using these as binaural loudspeakers [7]. In addition, the use of a head tracking system avoids in-head localisation, i.e. sound being perceived inside the head [8]. Dummy head measurements showed that head displacements of about 8 to 10 cm were tolerable.

6 Menzel, D., et al. "The Binaural Sky: A Virtual Headphone for Binaural Room Synthesis". In: Proceedings of the Tonmeistersymposium Nov. 2005, Hohenkammer, Germany. 2005, pp. 1-6. https://www.irt.de/fileadmin/media/downloads/Produktion/A_Virtual_Headphone_for_Binaural_Room.pdf [26.09.16], p. 2.
7 Ibid., p. 2f.
8 cf. Dickreiter, M., et al. Handbuch der Tonstudiotechnik. (Eds.) medienakademie, A. 7th ed. Vol. 1. München: K.G. Saur Verlag, 2008, p. 177.
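As a rough geometric illustration of the setup just described (my own sketch, not the authors' code), the loudspeaker coordinates of such a circular array could be laid out as follows; even angular spacing and the exact mounting height above the head are assumptions:

```python
# Sketch of a Binaural-Sky-like geometry: 22 loudspeakers on a circle of
# 1.0 m diameter plus a central low-frequency driver, roughly 40 cm above
# the listener's head. Even angular spacing is assumed.
import numpy as np

N_SPEAKERS = 22      # number of drivers on the circle (from the text)
DIAMETER = 1.0       # array diameter in metres (from the text)
HEIGHT = 0.40        # assumed mounting distance above the head in metres

angles = 2 * np.pi * np.arange(N_SPEAKERS) / N_SPEAKERS
radius = DIAMETER / 2
speakers = np.column_stack([radius * np.cos(angles),        # x
                            radius * np.sin(angles),        # y
                            np.full(N_SPEAKERS, HEIGHT)])   # z, relative to head
woofer = np.array([0.0, 0.0, HEIGHT])    # single low-frequency driver in the centre

print(speakers.shape)                    # (22, 3) loudspeaker coordinates
```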


Ear signals from dummy head recordings are an objective measure for checking the quality of virtual headphones. From such measurements, Laumann, Theile and Fastl drew two conclusions. They measured three loudspeaker configurations with a dummy head in an anechoic chamber. Firstly, the distance to the loudspeaker array and its configuration have no impact on the quality of virtual headphones based on wave field synthesis: changes to the positions of the focused sources to compensate head movements led to no improvement in the measurement results. Thus, secondly, the head-related transfer functions are less stable with respect to head rotations than they expected.

2.2 Perceived Spaciousness in Different Music Genres

In my Bachelor's thesis, I investigated the perception of spaciousness in the genres Electro, Ethno, Classic, Jazz and Rock using a semantic approach [9]. In a listening test at the Institute of Systematic Musicology, University of Hamburg, 13 participants evaluated music excerpts according to twelve adjectives representing spatial characteristics. The adjectives were big, low, open, infinite, soft, intimate, hollow, wide, rough, artificial, close and narrow. The participants were asked to rate each music excerpt on a scale from 1 (slightly appropriate) to 10 (very appropriate). Additionally, they were advised to base their judgement on their first impression or feeling. The music stimuli comprised 30 music excerpts for each genre, each with a length of 1 min (15 s music - 15 s silence - 15 s music - 15 s silence). The results of the study made it possible to associate spatial features with each genre. Classic sounded rather big, wide, open, low and infinite. Electro was judged rather artificial, big and wide. Jazz was perceived as rather open, big and close. Ethno sounded rather big and open. And Rock was perceived as rather big, wide, open and low. All genres were rated as big and most of them as wide and/or open, which indicates that the music stimuli were generally perceived as rather big and partly wide and/or open. I compared these results with three measures calculated by Bader.

9 Stirnat, "Percepted Spaciousness of different Musical Genres", see n. 8.


Firstly, the interaural cross correlation (IACC), the correlation between the input signals at the two ears (see section 4.1), was used. Secondly, the echo density is a measure of the number of echoes per second, representing the spatial complexity. And thirdly, the fractal correlation dimension indicates the number of tones played simultaneously. The calculations revealed the following mean values over all 30 music excerpts:

Genre     IACC    Echo density    Fractal correlation dimension
Classic   0.40    239.94          3.92
Jazz      0.55    244.05          3.93
Electro   0.60    408.27          4.25
Ethno     0.76    286.27          3.72
Rock      0.85    268.30          3.84

Table 2.1: Mean values over all 30 music excerpts for IACC, echo density and fractal correlation dimension.

The IACC mean values show that all genres sound more or less "mono", because the mean values only range from 0.40 (Classic) to 0.85 (Rock). The echo density reveals Electro as the genre with the highest mean value (408.27) and Classic with the lowest (239.94). The fractal correlation dimension shows that Electro contains the most tones played at the same time (4.25) and Ethno the fewest (3.72). The adjectives soft (r = -0.21), hollow (r = 0.23) and rough (r = 0.32) correlated significantly (p < 0.01) with the IACC values to a minor degree. Artificial (r = 0.21) correlated significantly (p < 0.01) with echo density to a minor degree; thus, a high echo density is perceived as artificial. No significant (and linear) correlation exists between big and IACC. The higher the IACC and the more the music sounds "mono", the less it sounds soft and the more it is perceived as hollow and rough [10].
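The correlations reported above are standard Pearson correlations between adjective ratings and the computed measures. Purely as an illustration of that computation (the arrays below are hypothetical placeholders, not data from the study):

```python
# Illustration only: Pearson's r between adjective ratings and IACC values.
# The numbers are hypothetical placeholders, not the study's data.
import numpy as np
from scipy.stats import pearsonr

iacc = np.array([0.40, 0.55, 0.60, 0.76, 0.85, 0.50, 0.70])   # hypothetical
soft = np.array([6.1, 5.4, 5.0, 4.2, 3.9, 5.6, 4.5])          # hypothetical ratings

r, p = pearsonr(iacc, soft)
print(f"r = {r:.2f}, p = {p:.3f}")   # a negative r would mirror soft (r = -0.21)
```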

2.3 Audio-Visual Interactions in Loudness Evaluation

Fastl investigated the evaluation of perceived loudness of traffic noise when visual stimuli were presented to participants. "Acoustical stimuli were recorded by a half inch condenser microphone or by a dummy head and stored on DAT tape." [11] The edited stimuli were presented via headphones, with care taken to play all sounds at their original levels.

10 Stirnat, C. "Räumliche Wahrnehmung der Musikstile Elektro, Ethno, Jazz, Klassik und Rock". In: Poster presentation at the Jahrestagung der Deutschen Gesellschaft für Musikpsychologie (DGM). 2015.
11 Fastl, "Audio-visual interactions in loudness evaluation", see n. 4, p. 1.


Still pictures were taken with a digital camera and moving pictures with a video camera. These visual stimuli were shown either on a head-mounted display or in a car simulator using a beamer. Several experiments were conducted with this setup. In the first experiment, Fastl showed that pictures of a train in different colours (figure 2.2) influence the auditory perception of a recorded train passing by. He presented pictures of a train in different colours (white, red, blue and green) while participants heard a recording of a passing train. The sound accompanying the picture of the red train was perceived as 15% louder than the sound accompanying the picture of the green train.

Figure 2.2: Pictures of a train in white, red, blue and green that were used in Fastl's study (source: Fastl, 2004, p. 2).

Two groups participated in the next set of experiments, which included visual stimuli from still pictures and from moving pictures. The still-picture group was further divided into subgroups that were shown either still pictures not related to the sound or a picture matching the sound. For example, the sound of a train was combined with pictures of a tree and later with pictures of a street (not related), or with pictures of a train (related). The unrelated pictures comprised a tree in summer, a tree in winter, and a street with and without electronically added green leaves. Unrelated pictures revealed a small loudness reduction of 2.5%, whereas the pictures with leaves indicated a slightly higher reduction, which may again point to an influence of the colour green. This experiment showed a tendency, but the proceedings give no information about whether the results are significant. The comparison of unrelated and related pictures revealed only a very small loudness decrease. In another experiment, the combination of still pictures with related sound and of moving pictures with related sound showed a bigger influence on perceived loudness: the same pass-by audio signals were presented with a video displaying a moving train as

well as with a still picture of a train. The moving pictures with the sound reduced the perceived loudness by 5%. In comparison, the still picture had an average effect of only 1%. According to this result, the more realistic situation has the stronger impact, because the visual input produces larger audiovisual effects. Sound recording engineers use these effects in movie soundtracks: without the picture, the audio signals sound louder than in reality, but as soon as the pictures are added, the perceived loudness turns out to be lower, so the loudness of the sounds in the movie is appropriate. In summary, the results show that the added visual input makes participants perceive a reduced loudness. The colour green evidently makes the perception softer, especially compared to the same picture in red. "By and large it can be stated that the more realistic the situation, the larger the possible loudness reduction induced by visual images for the same acoustic stimuli." [12]

12 Fastl, "Audio-visual interactions in loudness evaluation", see n. 4, p. 6.


Chapter 3

Basic Music Perception

The general perception process can be explained with a simplified exposition consisting of seven steps, illustrated with an example from vision, in order to better understand the perception of music later on. Note that these steps do not necessarily occur in this exact order; some can take place simultaneously or in another order. Firstly, the whole perception process starts with a stimulus from the environment, e.g. a tree. The tree reflects incoming light that reaches an observer's eye. Secondly, the eye receives the incoming information, which is projected onto the eye's retina to form a representation of the tree; the result is a picture of the tree on the retina. Thirdly, the energy of the incoming light is transduced into electrical energy. Transduction means the transformation of one form of energy into another and takes place in sensory receptors, i.e. sensory cells that respond to energy exposure. Analogously in audition, the ear transduces sound energy (pressure fluctuations in the air) into electrical signals, which are then transferred via the auditory nerve and further stations in the brain. Fourthly, a complex network of nerve cells, called neurons, transfers the electrical signals from the receptors to the brain and then further within the brain. The signals undergo changes during this neuronal processing but still represent the tree. These changes during transmission and processing are important for the next part of the perception process, the behavioural reaction. According to Goldstein, this involves the most amazing transformation in the perception process, because the electrical signals are transformed into conscious experiences. Then, fifthly, the observer perceives the tree and, sixthly, recognizes it. The last step is the action, which includes motoric activities: the observer can decide whether to walk towards the tree or even climb it. According to Goldstein, several

researchers see the action as an important result of the perception process because of its significance for survival [1]. The perception process includes another factor: knowledge. Knowledge consists of all the information the observer brings into a situation. It is a central factor because it can influence several steps within the perception process. An example of using long-term knowledge is the ability to categorise objects: an observer can label the tree as "tree" or other objects as "bird" or "leaf". Regarding the way information influences perception, a distinction is drawn between bottom-up and top-down processing. Bottom-up processing (or stimulus-driven processing) is based on the stimuli arriving at the receptors. Applied to the previous example, the observer sees the tree because the picture of the tree on the retina activates the perception process; the picture on the retina corresponds to the incoming stimulus on which bottom-up processing is based. Top-down processing is also called knowledge-based processing, as it refers to processing based on knowledge. If the observer labels the object as "tree", and maybe even as "real tree", he or she draws on all the knowledge about trees that he or she has. This kind of knowledge is not always involved in perception, but it often is, even when we are not aware of it. Goldstein describes an example of an interaction between bottom-up and top-down processing: a pharmacist tries to read a doctor's prescription but has difficulties deciphering the written words. As soon as bottom-up processing, evoked by the written words on the retina, has activated the perception process, top-down processing can begin. The pharmacist can then use knowledge of drug names, and possibly previous experience of the doctor's handwriting, to decipher the prescription [2].

There are three approaches to the perception process, each concentrating on a different perspective. Firstly, the cognitive approach addresses the sensory perception of a stimulation that forms an internal representation. Secondly, the embodied cognition approach covers the connection between perception and the resulting action. Thirdly, the ecological approach deals with the information of a stimulation that is actively discovered in the external world and processed. This chapter provides a summary of aspects of the cognitive approach and, more broadly, of the ecological approach.

1 Cf. Goldstein, E. B. Wahrnehmungspsychologie. Der Grundkurs. (Eds.) Gegenfurtner, K. R. 9th ed. Berlin, Heidelberg: Springer, 2015, pp. 1-13, p. 3ff.
2 Cf. ibid., p. 7ff.


3.1 Cognitive Approach

We are surrounded by physical events, acoustical or of any other kind. A human observer registers those events and forms a representation of them in the head. Perception comprises a long series of processes between the physical events and the registration of those events. In these processes an external object or event creates energy that is transmitted through space from the event to the observer. The observer's sensory receptors receive and process the energy and send the resulting signals to the brain, where more processing is done. The interpretation in the brain is how we experience all events in the external world. The eyes and the ears provide the information about what happens in the external world from which the brain is able to form the representation. But they are only the source from which the brain receives the information, because their output bears relatively little resemblance to the final representation perceived by the observer after further processing within the brain. This chapter presents some general principles of cognition and perception that are similar for audition and vision and that are important for the comprehension of music and music perception [3].

Unconscious Inference

Formulated by Hermann von Helmholtz, the principle of unconscious inference means that perceptual cues from two-dimensional patterns are interpreted unconsciously, so that three-dimensional objects are perceived. For the visual domain this is illustrated in figure 3.1 (left), which shows two monsters appearing to be of different size although they are the same size. The drawing of the tunnel causes this illusion: it is a 2D picture giving a 3D impression.

In the auditory domain, we are only slightly directly aware of the reflected sound arriving at us in a surrounding whose surfaces cause echoes and reverberation. But we draw unconscious inferences about the sound sources within our surroundings using the information from the reflected sound. The reflections provide us with a sense of the space we are

3 Cf. Shepard, R. "Cognitive Psychology and Music". In: Music, Cognition, and Computerized Sound. An Introduction to Psychoacoustics. (Eds.) Cook, P. C. Cambridge, MA, USA: MIT Press, 2001, pp. 21-35, p. 21.


in, together with its size and shape. The principle of unconscious inference uses, among other cues, the intensity ratio of direct to reflected sound and the time delay between the direct and reflected sound to infer the intensity and distance of a source [4].

Size and Loudness Constancy

Objects in the external world remain the same size, but as an object moves farther away or closer, its image changes. Size constancy refers to the ability to perceive objects as they are, independent of their distance from us, and is shown in figure 3.1 (middle). Loudness constancy constitutes the same principle for the auditory domain. Imagine an instrument that produces sound of constant output: if it moves farther away from a listener, the intensity decreases at the listener's position [5].

Spatial and Temporal Inversion

A human observer interprets the information of a pattern best if he or she encounters it in its usual form. If a well-known pattern is transformed in some way, the observer will interpret it differently even though all other information remains the same. Figure 3.1 (right) shows an example of a simple rotation transformation in the visual domain. A number of identical faces are rotated by different angles, so that they seem like different faces at first, and we need some effort to realise that they are upside down. As we are used to seeing a face with the eyes at the top and the mouth at the bottom, we interpret faces according to this standard example.

Temporal reversal in the auditory domain is the analogue of spatial inversion. When we listen to a sound in a room, we hear the direct sound first and the reverberant sound later. We get a sense of the space from the reflections arriving after the direct sound, which depend on the room size. But if a sound is reversed, we cannot recognize it as easily as the usual version: in a reversed sound, the reverberation precedes the sound while the sound itself still continues. A good example of a piano and a reversed piano, provided by Smyth, can be listened to online [6].

4 Cf. Shepard, see n. 3, p. 23-26.
5 Cf. ibid., p. 25.
6 Smyth, T. Temporal Reversal. In: Music 175: Cognitive Psychology and Music. Department of Music, University of California, San Diego (UCSD). 2012. http://musicweb.ucsd.edu/~trsmyth/cogpsy175/Temporal_Reversal.html [11.10.16].


Figure 3.1: Left: Unconscious Inference. The perspective of the tunnel makes the picture appear in 3D and the monster in the background look bigger than the one in the foreground, even though they are the same size. Middle: Size Constancy. The first and the last head from the perceiver's point of view are the same size, but the last head seems too big. Right: Spatial Inversion. If the page is turned upside down, it will become obvious that the faces in the last row are the same as the faces in the top row (source: Shepard, 2001, p. 23-27).

Perceptual Completion

In situations in which incomplete information reaches our sensory system, we make use of the principle of perceptual completion. To conclude what happens around us, additional top-down processing is necessary on top of the normal bottom-up processing. We have to complete the information and decide on the most probable explanation of the scene in the external world that makes sense of the information provided to our senses. This principle is called perceptual completion and is illustrated in figure 3.2. The left figure shows a circular object and one (or two) rectangular objects, with the circular object covering the rectangular one. Two explanations are possible: the rectangular object could be a single object that continues under the circular object, or it could consist of two separate, shorter bars of the same appearance that together give the impression of one bar. The most probable explanation seems to be the first one, a single object; the second explanation is illustrated in the middle figure. The right figure shows symmetrical patterns that create the illusion of a white triangle in the centre of the figure, although it does not exist. It is difficult not to see the triangle.

In the auditory domain, the principle is found e.g. in periodic sounds interrupted by a short loud burst of broad-band noise. If a periodic sound is broken at several points so that silent gaps occur, a listener hears only a series of short pieces of sound and does not combine them into a single sound. However, if the broad-band noise masks

In the auditory domain, the principle is found e.g. in periodical sounds interrupted by a short loud burst of broad-band noise. If a periodical sound is broken at several parts so that silent gaps occur, a listener only hears a bunch of short pieces of sound but does not combine them to a single sound. However, if the broad-band noise masks Music, University of California, San Diego (UCSD). 2012. http://musicweb.ucsd.edu/~trsmyth/ cogpsy175/Temporal_Reversal.html[11.10.16].

14

3. MUSIC PERCEPTION the silent gaps to cover them, the ears hear the falling and rising pieces of sound as continuous sound passing through the disturbing noise. Therefore, the noise needs to be loud enough to mask the silent gaps.7 The example is illustrated in figure 3.3.

Figure 3.2: Three examples of the perceptual continuation principle. Left: a circled objects covers a rectangle object and gives the impression of a continuous bar. Middle: A second possible explanation of the impression from the left figure, the rectangle consists of two seperate bars. Right: symmetrical patterns creating the illusion of a white triangle. (source: Shepard, 2001, p. 30f.).

Figure 3.3: Principle of auditory completition. Left: The silent gap is replaced by a broad-band noise B. Right: Interuption of triangle oscillation by bursts of broad-band noise (source: Smyth, 2012, online).

The Gestalt Grouping Principles The Gestalt grouping principles were developed by a group of German psychologists in order to explain why certain elements in vision are connected closely with each other. Five principles give information of how the brain creates mental pattern by forming connections betweens elements of the sensory input.8 In their opinion, these principles are innate because the perceptual organization could be revealed in very young animals and the organizational processes could be tricked by screening so that objects could be perceived as parts of the surroundings.9

7

Cf. Bregman, A. S. Auditory Scene Analysis: The perceptual organization of sound. Cambridge, MA, USA: MIT Press, 1990, p. 28. 8 cf. ibid., p. 19. 9 cf. ibid., p. 39.

15

3. MUSIC PERCEPTION 1. The principle by proximity means that objects being located close together are perceived as parts of the same object. The principle is shown in figure 3.4a, the rectangles being located in a small distance are grouped together pairwise.10 2. The principle by similarity states that objects looking similar or showing a congruent form are grouped together if they have the same distance to each other. In figure 3.4b the grouping occurs pairwise, too. For instance, sounds with a similar timbre are grouped together so that an oboe and a harp are not perceived in the same group although they play the same register.11 3. The principle by symmetry means that objects with the same symmetry are perceived together. It would be improbable for unrelated objects to display a symmetrical relationsship because random, because unrelated objects in the external world are not expected to be symmetrical. Figure 3.4c shows an example of the symmetry principle and also the similarity principle.12 4. The principle by good continuation refers to the regularity of objects. If objects appear together as completing each other or continuing the other object, they tend to be grouped together.13 The principle is illustrated in figure 3.4d. A musical example is the continuation of a melody by another voice. If a flute starts with a melody and a second flute plays the second part of the same melody, both melody parts are perceived as a whole. 5. The principle by common fate labels objects that move together and thus are most likely connected with each other. Because it is highly unprobable that two objects act coherently in a perfect way without being related at all.14 Figure 3.4e displays the principle. In music, the principle by common fate is often found, if several instruments play the have the same voice. If the first violins and the first flute play the same score and the second violins the play the same voice as the second flute in an orchestra or ensemble, the same voices are grouped together instead of the same instruments because they move differently. 10

Cf. Cf. 12 Cf. 13 Cf. 14 Cf. 11

Bregman, see n. 7, p. 20. ibid., p. 19. Shepard, see n. 3, p. 32. ibid., p. 32. ibid., p. 33.


The principles of proximity, similarity, symmetry and good continuation are mainly used when information is incomplete or noisy. The principle of common fate in the auditory domain usually involves amplitude or frequency modulation. For instance, it covers the grouping of partials and harmonics of a sound source, which is why we are able to filter the voice of one instrument out of a complex auditory field [15]. If the comparison between vision and audition is valid, the spatial dimension of distance in vision has two analogues in audition: separation in time and separation in frequency. Both analogues then act as distances and should apply as well [16].

Figure 3.4: Gestalt grouping principles (source: Shepard, 2001, p. 32).

From Bregman's perspective, "the Gestalt principles are seen to be principles of scene analysis that will generally contribute to a correct decomposition of the mixture of effects that reaches our senses" [17]. The perceptual grouping of parts of the neural spectrogram gives rise to auditory streams. Bregman defines an auditory stream as a perceived unit that represents a single event. A stream's purpose is to cluster related features and thus to serve as the centre of our description of an acoustical event [18]. An example is the segregation of a sequence of high and low tones into two streams if the tempo is high enough. In this case the segregation into two streams follows the grouping principle of proximity: the high tones are perceived as one stream and the low tones as a second stream, because the high frequencies are close to one another, as are the low frequencies [19]. Snyder and Alain found that the ability of auditory scene analysis is preserved in older participants [20]. Thus, older participants are just as qualified for listening tests as young participants, provided they

15 Cf. Shepard, see n. 3, p. 32f.
16 Cf. Bregman, see n. 7, p. 19.
17 Ibid., p. 24.
18 Cf. ibid., p. 9f.
19 Cf. ibid., p. 17f.
20 Snyder, J. S. and Alain, C. "Sequential auditory scene analysis is preserved in normal aging adults". In: Cerebral Cortex 17.3 (2007), pp. 501-512.


have not suffered from any hearing loss.
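The streaming example above can be made concrete with a small stimulus sketch (my own illustration; the frequencies and tempi are arbitrary choices, not taken from the cited work):

```python
# Sketch of a streaming stimulus: an alternating high/low tone sequence.
# Played slowly it tends to be heard as one stream; played fast it splits
# into a high stream and a low stream. All parameter values are illustrative.
import numpy as np

SR = 44100  # sample rate in Hz

def tone(freq, dur):
    """A Hann-windowed sine tone of the given frequency and duration."""
    t = np.arange(int(SR * dur)) / SR
    return np.sin(2 * np.pi * freq * t) * np.hanning(len(t))

def alternating_sequence(f_high=1000.0, f_low=400.0, tone_dur=0.1, n_pairs=10):
    # Shorter tone_dur means a faster tempo and stronger segregation.
    pair = np.concatenate([tone(f_high, tone_dur), tone(f_low, tone_dur)])
    return np.tile(pair, n_pairs)

slow = alternating_sequence(tone_dur=0.25)  # tends to fuse into one stream
fast = alternating_sequence(tone_dur=0.05)  # tends to split into two streams
```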

3.2 Ecological Approach

The ecological approach departs from traditional accounts of perception and considers perception to be based on complex events and entities in the external world, unmediated by memory or inference. Basic perceptual stimuli are characterized by complex constellations of simple features rather than corresponding to simple physical dimensions such as frequency, amplitude, phase and duration. Complex perceptions thus focus on complex stimuli instead of on the integration of sensations. "Thus, according to the ecological account, the study of perception should be aimed at uncovering ecologically relevant dimensions of perception and the invariant perceptual information for them." [21] What we hear and how we hear are Gaver's central questions. Gaver (1993) describes an approaching car as a good example for understanding what we hear and how the ecological approach to audition is meant. Imagine you are standing on a road and hear an approaching car. You are likely to register the kind of engine (e.g. large and powerful) producing the sound rather than the sound itself. You pay more attention to the car coming closer quickly, and maybe to the environment, say a narrow echoic alley. Thus, the sound-producing event and its environment, rather than the attributes of the sound itself, are the perceptual dimensions of interest. The sound gives information about an interaction of materials located in an environment. For example, it is not size, shape or density that we hear separately, but more likely a new dimension integrating them. In a study, Gaver found that participants were able to hear sounds according to their category. He distinguishes between vibrating objects, aerodynamic sounds and liquid sounds. No participant confused the sounds produced by vibrating objects with liquid sounds, for example, although they sometimes interpreted the sound sources wrongly [22].

21 Gaver, W. W. "What in the World Do We Hear?: An Ecological Approach to Auditory Event Perception". In: Ecological Psychology 5.1 (1993), pp. 1-29. http://www.cog.brown.edu/courses/cg195/pdf_files/fall07/Gaver-Whatdowehear.pdf [10.10.16], p. 5.
22 Cf. ibid., p. 2-6, p. 18-22.


3.3 Interaction between Audition and Vision

The brain processes auditory and visual signals at different speeds. Auditory signals can be perceived as two different signals if the time difference is at least two to five milliseconds, whereas visual signals need 20-30 ms to be differentiated as two stimuli. The response time also differs: it takes approximately 150 ms to react to visual stimuli and approximately 110-120 ms to respond to auditory stimuli [23]. On the one hand, auditory dimensions have an impact on each other: evidence suggests that changes in a particular dimension affect the overall perception of a stimulus and also the perceived changes in the other dimensions [24]. "Some of the most commonly examined perceptual dimensions include pitch, loudness, perceived spatial location, timing, and timbre." [25] On the other hand, not only do dimensions within audition influence each other, but so do different modalities. The interaction between audition and vision belongs to multimodal perception, in which separate pieces of information form a unified whole in the brain, enabling focused control of actions [26]. McGurk and MacDonald's study is an example of the interaction between audition and vision. 103 participants (3-40 years) watched a video of a woman speaking directly into the camera, repeatedly saying the syllables "ga-ga" and "ba-ba". The videos were then manipulated by exchanging the sound tracks, so that the video track of the "ga-ga" lip movements was combined with the sound track of "ba-ba" and vice versa, and the participants watched these versions. Surprisingly, most of the participants perceived "da-da", a mix of both syllables; the children, however, had a lower error rate than the adults. When the participants only heard the audio track, they could accurately perceive the original syllable. The illusion found in this study is called the "McGurk effect" [27]. Fastl's research, summarized in section 2.3, is another example of the interaction. It showed that even the colour in the

23 Cf. Schlemmer, M. "Audiovisuelle Wahrnehmung: Die Konkurrenz- und Ergänzungssituation von Auge und Ohr bei zeitlicher und räumlicher Wahrnehmung". In: Musikpsychologie (Handbuch der Systematischen Musikwissenschaft). (Eds.) Motte-Haber, H. de la. 3rd ed. Laaber: Laaber, 2005, pp. 173-184, p. 173f.
24 Cf. Neuhoff, J. G. Ecological psychoacoustics. Amsterdam, Netherlands: Elsevier Academic Press, 2004, p. 255.
25 Ibid., p. 249.
26 cf. Ernst, M. O. and Rohde, M. "Multimodale Objektwahrnehmung". In: Kognitive Neurowissenschaften. (Eds.) Karnath, H.-O. and Thier, P. 3rd updated and expanded edition. Berlin: Springer, 2012, pp. 139-147, p. 140.
27 Cf. McGurk, H. and MacDonald, J. "Hearing lips and seeing voices". In: Nature 264.30 (1976), pp. 746-748, p. 746ff.


visual input can have an impact on the perception of loudness. Saldaña and Rosenblum confirmed the McGurk effect in a similar experiment. They also found that non-speech perception is significantly influenced by visual input: for 13 participants, the sound of a bowed cello was perceived as a plucked cello when a video of a person plucking a cello was shown, and vice versa. Interestingly, this effect did not occur when 13 participants saw a video with the words "bowed" and "plucked" in addition to the audio. So merely showing participants the words did not influence them [28].

Multimodal perception can interact in several ways. Ambiguous information from several senses can be clarified to strengthen perception. For example, an observer sitting in a train is waiting for it to start moving. The observer is watching another train and wonders which train is moving. Only through a second piece of information, the sight of the stationary track or the missing feeling that one's own train is starting to move, can this ambiguous situation be clarified so that the observer knows which train is moving. Different pieces of information about the same object can also be combined to enable a complex impression of the object: if an observer not only sees an object but also feels it, he or she gets a better and more complex impression of it than by only watching or feeling it. Different impressions can also cooperate with each other to make a more detailed perception possible. For instance, objects are perceived from a perspective and are later recognized best if they are perceived from the same perspective. Or impressions can cooperate by completing the perception with complementary information. Integration of multimodal information means the combination of redundant information about an object into a single percept [29]. Furthermore, vision often dominates audition. Alais and Burr's study is an example of a near-optimal integration of visual and auditory information, and of vision often dominating audition. In this study, participants were asked to localize visual blobs and auditory clicks that were presented either simultaneously or one by one with a short break. As a result, the error rate decreased when both kinds of sensory information were integrated: participants made fewer mistakes in the localization task when both senses were involved. Also, visual perception dominated as soon as the visual localization was

28 Cf. Saldaña, H. M. and Rosenblum, L. D. "Visual influences on auditory pluck and bow judgments". In: Perception & Psychophysics 54.3 (1993), pp. 406-416.
29 Cf. Ernst and Rohde, see n. 26, p. 140ff.


good [30]. Similarly, actors on the screen of a movie theater move their lips while the vocal sound is reproduced from speakers distributed throughout the room [31]. "The voices sound as though they come from the actor's mouths, and the unified spatial representation is preserved." [32] Neuhoff reports work in which blindfolded listeners could localize sounds better than blind listeners. "This finding suggests that not only do auditory and visual localization skills interact on a sensory level but also a mental (or neural) representation of external space that is based on both auditory and visual experience [...]." [33] Maps of auditory space are based on neurons that are tuned to certain auditory localization cues. Not only are auditory experiences important for the development of auditory spatial maps; visual experiences are crucial as well [34].

30 Cf. Alais, D. and Burr, D. "The ventriloquist effect results from near-optimal bimodal integration". In: Current Biology 14.3 (2004), pp. 257-262.
31 cf. Neuhoff, see n. 24, p. 257.
32 Ibid., p. 257.
33 Ibid., p. 258.
34 Cf. ibid., p. 262.


Chapter 4

Spaciousness within the Auditory Room

The auditory room comprises four types of rooms: the acoustical room, the psychoacoustical room, the tonal room and the semantic room. This chapter gives an overview of those aspects of the four rooms that are important for the empirical part of the thesis.

4.1 The Acoustical Room

The acoustical room consists of room acoustical quantities, binaural quantities, signal processing quantities and the physical, architectonic room. The binaural (i.e. both-ear) quantities interaural cross correlation (IACC), spaciousness, apparent source width (ASW) and envelopment are explained in the following. Spaciousness is a term for the sensation that the sound is coming from many different directions [1]. It "means that auditory events, in a characteristic way, are themselves perceived as being spread out in an extended region of space" [2]. The opposite, a monophonic impression, characterizes the feeling of sound arriving at the listener through a narrow gap [3]. Sound fields in concert halls are preferred when spaciousness is perceived [4]. Spaciousness consists of two aspects that are meaningful when listening to music:

1 cf. Blauert and Lindemann, see n. 5, p. 533.
2 Blauert, see n. 9, p. 348.
3 cf. Gade, A. C. "Measure of Spaciousness". In: Springer Handbook of Acoustics. (Eds.) Rossing, T. D. 2nd ed. Berlin, Heidelberg: Springer-Verlag, 2014. Chap. Acoustics in Halls for Speech and Music, pp. 325f., p. 325.
4 cf. Blauert and Lindemann, see n. 5, p. 533.


• Apparent source width (ASW) refers to the impression that the sound field is larger than the physical and visual extension of the source.

• Listener envelopment (LEV) describes the feeling of being within the reverberant sound field of the room.

ASW and LEV both depend on the directions from which the reflections of the impulse response arrive. ASW increases when a higher ratio of the early reflection energy arrives from the side [5]. According to de Vries, Hulsebos and Baan, traditional ASW measures revealed large fluctuations over small spatial distances that did not correspond to perceived changes of the ASW [6]. Early reflections comprise all reflections within the first 80 ms after the direct sound reaches the listener. Strong LEV arises from a high degree of late, lateral reflections [7].

Figure 4.1: A signal's time-dependent progress in a room. The direct sound is shown in red, early reflections in green and reverberation in blue (source: Kyriakos, 2010).

Blauert and Lindemann confirmed that spaciousness is mainly caused by early reflections, including all their spectral components. In their experiment, ten to fourteen participants evaluated binaural test signals pairwise with respect to auditory spaciousness and preference. Blauert and Lindemann recorded the test signals with a dummy head in their anechoic chamber to prepare the sound fields to be judged. An arrangement of three loudspeakers served for playback, with the sound material adjusted in reverberation and spectral content for the recordings. The participants listened through Sennheiser HD 424 headphones at a level of 70 to 75 dB SPL. They also found that spectral components of late reflections below approximately 3 kHz created an image extension in the front-back direction. Thus, these spectral components contribute

5 Cf. Gade, see n. 3, p. 325.
6 cf. De Vries, D. "Spatial fluctuations in measures for spaciousness". In: J. Acoust. Soc. Am. 110.2 (2001), pp. 947-953, p. 953.
7 cf. Gade, see n. 3, p. 325.


to the listener's sensation of spaciousness. In the context of a model of binaural signal processing, a quasi-objective measure could be developed [8]. Barron, Marshall and Winckel confirmed the importance of a high ratio of low frequencies in the sound for a spacious sensation [9]. Blauert, Barron and Marshall point out that a dependence exists between the sound level and the perceived spaciousness. According to Blauert, this dependence has not been "investigated quantitatively enough to be included in the index of spaciousness" [10]. However, Barron and Marshall introduced a "very tentative relationship" between the degree of spatial impression and sound level:

    S.I. = 14.5\,(L_f - 0.05) + \frac{L - L_0}{4.5},    (4.1)

where L is the sound level in dB and L_0 the threshold level for the spatial impression, presumably about 80-90 dB. L_f is the lateral energy fraction, a linear measure of spatial impression, and can be calculated by the formula:

    L_f = \frac{\sum_{t=5\,\mathrm{ms}}^{80\,\mathrm{ms}} r \cos\Phi}{\sum_{t=0\,\mathrm{ms}}^{80\,\mathrm{ms}} r},    (4.2)

where r is the reflection energy, including the direct sound energy, and Φ is the angle between the axis through the listener's ears and the reflection path. The direct sound energy in the numerator is eliminated because its Φ = 90°. If many reflections arrive uniformly from all directions, L_f takes the value 0.5 [11]. If the sound level is not available in dB, it can be calculated by the formula:

    L_p = 20 \log_{10}\!\left(\frac{p}{p_0}\right),    (4.3)

where p is the measured pressure and

    p_0 = 2 \cdot 10^{-5}\,\mathrm{Pa} \quad [\mathrm{Pa} = \mathrm{N/m^2}]    (4.4)

is the reference pressure [12].
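As a quick worked example (mine, not from the thesis): a measured pressure of p = 0.2 Pa yields L_p = 20 \log_{10}(0.2 / (2 \cdot 10^{-5})) = 20 \log_{10}(10^4) = 80 dB.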

8 Cf. Blauert and Lindemann, see n. 5, pp. 533-541.
9 Barron, M. and Marshall, A. "Spatial Impression due to Early Lateral Reflections in Concert Halls: The Derivation of a Physical Measure". In: Journal of Sound and Vibration 77.2 (1981), pp. 211-232; Winckel, F. "Akustischer und visueller Raum. Mitgestalter in der experimentellen Musik". In: Experimentelle Musik. Raum Musik. Visuelle Musik. Medien Musik. Wort Musik. Elektronik Musik. Computer Musik. (Eds.) Winckel, F. Berlin: Gebr. Mann Verlag, 1970, pp. 1-18.
10 Blauert, see n. 9, p. 355.
11 Cf. Barron and Marshall, see n. 9, p. 230f.


Barron and Marshall conducted a series of experiments to investigate the spatial impression in concert halls further. As a method they used pairs of sound fields (a fixed sound field and a variable sound field) that participants had to compare. Participants were asked to adjust the parameter of interest of the variable sound field until they perceived the same spatial impression for both sound fields. The parameters of interest were reflection delay, direction, level and spectrum. Unfortunately, Barron and Marshall recruited only four to ten participants for their experiments, so the results only indicate possible relationships between the parameters of interest and spatial impression, to be followed up in further investigations. Nevertheless, their results indicate that the degree of spatial impression is quite constant for reflection delays between 8 and 90 ms; it thus seems to be independent of the reflection delay within this range. The spatial impression seems to be greatest when the reflections arrive laterally, along the axis through the listener's ears, i.e. at 90° to the viewing direction. They found a relationship between the degree of spatial impression and the lateral reflection level relative to the direct sound; the lateral energy fraction L_f can be calculated with formula 4.2. Their last result confirmed the importance of low-frequency lateral energy [13].
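A minimal sketch of how formulas 4.1 and 4.2 could be evaluated for a set of reflections (my reading of the formulas, not code from the thesis; the reflection data and the levels are hypothetical):

```python
# Sketch of formulas 4.1 and 4.2: lateral energy fraction Lf from a list of
# reflections, then the tentative spatial impression S.I. All data are
# hypothetical.
import numpy as np

# (arrival time in ms, energy r, angle Phi to the ear axis in degrees);
# the direct sound at t = 0 ms has Phi = 90, so cos(Phi) = 0 in the numerator.
reflections = [(0.0, 1.00, 90.0),    # direct sound
               (12.0, 0.30, 20.0),   # early lateral reflection
               (35.0, 0.20, 45.0),
               (60.0, 0.10, 70.0)]

num = sum(r * np.cos(np.radians(phi)) for t, r, phi in reflections if 5.0 <= t <= 80.0)
den = sum(r for t, r, phi in reflections if 0.0 <= t <= 80.0)
Lf = num / den

L, L0 = 90.0, 80.0    # sound level and assumed threshold level in dB
SI = 14.5 * (Lf - 0.05) + (L - L0) / 4.5
print(f"Lf = {Lf:.3f}, S.I. = {SI:.2f}")   # here: Lf ~ 0.29, S.I. ~ 5.6
```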

As lateral reflections cause the instantaneous pressures at the listener's two ears to differ, the interaural cross-correlation coefficient (IACC) measures the dissimilarity of the signals arriving at the ears:

\mathrm{IACC}_{t_1,t_2} = \max_{\tau} \left| \frac{\int_{t_1}^{t_2} h_L(t)\, h_R(t+\tau)\, dt}{\sqrt{\int_{t_1}^{t_2} h_L^2(t)\, dt \cdot \int_{t_1}^{t_2} h_R^2(t)\, dt}} \right|.    (4.5)

t_1 and t_2 determine the time interval of the impulse response in which the correlation is calculated. h_L and h_R are the impulse responses measured at the entrance of each ear. τ defines the interval within which the maximum of the correlation is searched and ranges from −1 ms to 1 ms; this interval roughly covers the time the sound needs to travel from one ear to the other. The IACC takes values between 0 and 1 because the correlation is normalized. Results are usually presented as 1 − IACC, so that the value increases with increasing dissimilarity. Analogously, it results in an increasing impression of spaciousness.14

12 Cf. Hall, D. E. Musikalische Akustik. Mainz: Schott Music GmbH & Co. KG, 2008, p. 94.
13 Cf. Barron and Marshall, see n. 9, pp. 211–232.


A simple example is illustrated in figure 4.2. The left figure shows two identical sinusoidal signals with IACC = 1, which are the least dissimilar. The right figure displays two very dissimilar signals with an IACC of nearly 0.

Figure 4.2: Left: two identical signals of the function f(x) = sin(x). Right: two very dissimilar signals that consist of a signal f(x) = sin(x) and noise.

The IACC measures the ASW if the time interval (t_1, t_2) is (0 ms, 100 ms) and the LEV if the time interval (t_1, t_2) equals (100 ms, 1000 ms). In general, measurements in halls use IACC data from the shorter interval.15 Beranek, Hidaka, and Okano (1995) divided the IACC into three octave bands (500 Hz, 1000 Hz and 2000 Hz) and integrated the measured impulse response over the first 80 ms after the arrival of the direct sound. They referred to it as IACC_E3 and used this quantity as a measure of spatial impression in concert halls, covering the components ASW and LEV.16 As Griesinger (1997) explains, common calculations of the IACC from an impulse response have different features from the IACC measured with music. The measurement with music depends on cues such as the average length of the music excerpts and the presence of vibrato. Usually, IACC measurements with music reveal lower values than correlations computed from the impulse response. In concert hall measurements, the use of music captures the way a listener hears the IACC during a concert and thus constitutes a basis in perception.17
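A discrete-time reading of formula 4.5 makes the quantities concrete. The sketch below is a minimal, unoptimized version with assumed function names; the default interval follows the ASW case (0–100 ms) described above.

```python
import numpy as np

def iacc(h_left, h_right, fs, t1=0.0, t2=0.100, max_lag_s=0.001):
    """Discrete approximation of the IACC (formula 4.5).

    h_left, h_right: impulse responses at the entrances of the ears.
    (t1, t2) = (0, 0.1) s targets the ASW, (0.1, 1.0) s the LEV.
    The lag tau runs from -1 ms to +1 ms, roughly the interaural
    travel time.
    """
    i1, i2 = int(t1 * fs), int(t2 * fs)
    hl, hr = h_left[i1:i2], h_right[i1:i2]
    norm = np.sqrt(np.sum(hl ** 2) * np.sum(hr ** 2))
    max_lag = int(max_lag_s * fs)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):  # tau in samples
        if lag >= 0:
            num = np.sum(hl[:len(hl) - lag] * hr[lag:])
        else:
            num = np.sum(hl[-lag:] * hr[:len(hr) + lag])
        best = max(best, abs(num) / norm)
    return best  # report 1 - iacc(...) as a dissimilarity measure
```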

An echo is defined as a clearly audible repetition of the direct sound.18 “The distinctive feature of an echo is that it arrives by an indirect path, either by a

14 Cf. Gade, see n. 3, p. 326.
15 Cf. ibid., p. 326.
16 Cf. Hidaka, T., Beranek, L. L., and Okano, T. "Interaural cross-correlation, lateral fraction, and low- and high-frequency sound levels as measures of acoustical quality in concert halls". In: J. Acoust. Soc. Am. 98.2 (1995), pp. 988–1007, p. 988.
17 Cf. Griesinger, see n. 3, p. 722.
18 Cf. Schmidt, W. and Reichardt, W. Echo. (Eds.) Fasold, W., Kraak, W., and Schirmer, W. 2. Berlin: VEB Verlag Technik, 1984, p. 1201.


reflection process or through being returned by a receiver.“19 According to this definition, echo density is a quantity which represents spatial complexity20 and was used in Stirnat (2012). It represents “the number of echoes per second at the output of the reverberator for a single pulse at the input [the ear].”21 This means that partly reflected signals result in a low echo density and fully reflected signals in a high echo density.22 “The rate of echo density increase has to do with factors such as the size and shape of the space and whether it is empty or cluttered with reflecting objects.”23
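Abel and Patty's measure can be sketched roughly as follows. This is a simplified reading of their idea: within a sliding window, compare the fraction of impulse-response samples exceeding the window's standard deviation with the Gaussian expectation. Their paper additionally applies a weighting window, which is omitted here, and the function name is assumed.

```python
import numpy as np
from scipy.special import erfc

def normalized_echo_density(h, fs, window_s=0.020):
    """Simplified sketch of the Abel & Patty (2006) echo density profile.
    For Gaussian noise (a fully dense reverberant tail), the fraction of
    samples above one standard deviation is erfc(1/sqrt(2)) ~ 0.317, so
    the normalized profile approaches 1 as the echoes become dense."""
    win = int(window_s * fs)
    expected = erfc(1.0 / np.sqrt(2.0))
    profile = np.zeros(max(len(h) - win, 0))
    for n in range(len(profile)):
        seg = h[n:n + win]
        profile[n] = np.mean(np.abs(seg) > np.std(seg)) / expected
    return profile
```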

4.2 The Psychoacoustical Room

Psychoacoustics aims to find relationships between physical properties such as level, frequency spectrum and envelope and the related hearing sensations by means of subjective judgements.24 The psychoacoustical room includes, among other things, the binaural brain processing that deals with the process in the chain of perception when listening with both ears. Blauert explains aspects of auditory information processing in spatial hearing with a schematic model. When sound reaches both ears, the time difference between the arrival at each ear is important in spatial hearing. Why it is important will become clear by understanding the three steps of processing that Blauert differentiates:

1. The physical aspects cover the propagation of sounds in the sound field until they enter the auditory system through the external ears.

2. The psychophysical aspects focus on the binaural signal processing in the subcortical auditory system.

3. The psychological aspects include the involvement of the cortex and judgement (cognition).

In the first stage, signals from the sound field arrive at the two ears. The external ears superpose linear distortions on the incoming sound signals. The signals' distortions

19 Morfey, C. L. "Echo". In: Dictionary of Acoustics. (Ed.) Morfey, C. L. London, San Diego: Academic Press, 2001, p. 128.
20 Bader, R. Music and Space. http://systmuwi.de/Pdf/Papers/Bader%20papers/RoomAcoustics/HamburgConcertSpaces/HamburgConcertSpaces.ppt [20.10.16], p. 4.
21 Schroeder, M. R. "Natural Sound Artificial Reverberation". In: J. Audio Eng. Soc. 10.2 (1962), pp. 219–223, p. 219.
22 Cf. Stirnat, "Percepted Spaciousness of different Musical Genres", see n. 8, p. 7.
23 Abel, J. S. and Patty, H. "A Simple, Robust Measure of Reverberation Echo Density". In: AES 121st Convention, San Francisco. 2006, pp. 1–10, p. 1.
24 Cf. Fastl, "Basics and applications of psychoacoustics", see n. 2, p. 2.


depend on the source distance and on the direction of incidence of the sound wave. In this way, spatial information is encoded into the signals which the eardrums receive. The auditory system is assumed to be capable of decoding this information and of applying it while auditory events are formed in the listener's perceptual space. This knowledge, combined with quantitative data for modeling the external ears, can be used for many applications such as creating auditory events in a listener's perceptual space. For example, binaural room simulations are able to create auditory perceptions that critical listeners report as being authentic. As soon as the signals reach the eardrums, these start to vibrate and deliver the vibration via the ossicles of the middle ear to the inner ear, particularly the cochlea. In the second stage, the received signals in the inner ear are decomposed into spectral components and then transformed into nervous signals in each of the spectral bands. Next, the nervous signals are processed physiologically, where monaural as well as binaural channels are assumed. In the binaural channel, the signals of the two ears are linked together in a complex way. The result of monaural and binaural processing is what Blauert calls a "binaural activity pattern", an internal representation of the signals of both ears. The auditory information within the binaural activity pattern refers to the bottom-up processing up to this level.25 The Jeffress model and Lindemann's extension represent the linking process. Estimates of cross-correlation functions are used to analyse binaural patterns in terms of the interaural difference in arrival time. The model considers a line from each ear to the representation of the auditory event's lateral position. Each cross-correlation line takes into account a signal's delay and produces an output at the corresponding position. The output starts a mechanism that inhibits outputs at other positions. Hereby, monaural outputs can be inhibited by binaural ones but not the other way around. If the interaural cross-correlation equals zero, there are no interaural cues and all the output is limited to monaural pathways. Lindemann's extension applies a contralateral inhibition mechanism to the model. It consists of a delay line for each ear that starts at one ear and ends right before the other ear; thus, both delay lines run parallel to each other. The activity caused by a signal at one ear moves along the delay line until it reaches the corresponding position and inhibits the activity at the connected position on the other ear's delay line. For example, a single sound impulse

25 Cf. Blauert, see n. 9, pp. 369–372.


arrives at both ears with different arrival times. The delay lines receive this information and start neural activities that move along both delay lines in opposite directions. At some point, the activities from both sides meet somewhere on the line and stimulate the respective detector that is connected to both delay lines at a certain position.26 In the last stage, the psychological aspects refer to top-down processing as a hypothesis-driven mode. For example, pattern recognition is a hypothesis-driven process. Higher stages of the central nervous system establish a hypothesis for a situation as to what an appropriate sensation could be and test the hypothesis for acceptance or rejection. If the hypothesis is accepted, the corresponding sensation will be evoked. In case of rejection, a new hypothesis will probably be set up. We know from the basics of music perception that top-down processing is based on knowledge. Blauert also takes mental effects such as attention and expectation into account.27 Binaural hearing has several advantages over monaural (one-ear) hearing, such as an enhanced creation of an auditory space in which the positions and spatial extents of auditory events are represented in more detail and the localization blur is reduced. Sound signals from various sound sources can be distinguished better. This way, the listener's ability to focus on one source and ignore the other sources (cocktail-party effect) is improved. Perceptual effects of reflected sounds decrease, and the powerful role of the first wavefront is supported.28
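The delay-line linking described above can be caricatured in a few lines of Python: the interaural delay at which the two ear signals correlate best marks the position where the activities from both sides meet. This is only a toy sketch with assumed names; the actual Jeffress/Lindemann model operates on band-filtered signals and adds the inhibition mechanism, both of which are omitted here.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd_s=0.001):
    """Return the interaural time difference (in seconds) at which the
    cross-correlation of the two ear signals peaks, i.e. the 'position'
    on the delay line where activity from both sides coincides."""
    max_lag = int(max_itd_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    corr = []
    for lag in lags:  # correlate left[t] with right[t + lag]
        a = left[max(0, -lag):len(left) - max(0, lag)]
        b = right[max(0, lag):len(right) - max(0, -lag)]
        corr.append(np.dot(a, b))
    return lags[int(np.argmax(corr))] / fs
```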

The precedence effect means that a sound source is localized by the sound information which arrives at the listener first. Usually, the first wavefront is the direct sound, which determines the direction from which we localize a sound.29 But precedence does not automatically mean that all spatial information in later-arriving sound is generally suppressed in favour of the first wavefront. Blauert reports experimental results in which the precedence effect breaks down quickly if a sound pattern suddenly changes.30

26 Cf. Blauert, see n. 9, pp. 393–396.
27 Cf. ibid., p. 372.
28 Cf. ibid., p. 393.
29 Cf. Hall, see n. 12, p. 338.
30 Cf. Blauert, see n. 9, p. 412, p. 419.



4.3 The Tonal Room

Music can transfer room experiences, which Wellek classifies as a purely mental, phenomenal room. The tonal room is divided into three dimensions. The first dimension refers to pitch as a vertical dimension, meaning "high" and "low" tones. The words "high" and "low" have been used as terms for sounds of different frequencies in most human languages. The distance of tones creates a space. The dynamic up and down (higher and lower) also creates a vertical space, which affects the sensation more than the width of intervals and chords. The second dimension is the progress in time as a horizontal dimension. It can be found in the notation, which is read from left to right. A sequence of higher and lower tones seems to be farther apart in space. But the distances do not only comprise upward and downward movements; they also appear stretched horizontally. A melody has a great extent towards high and low which is spanned in a horizontal progress. It makes time come alive as a room dimension. According to Stumpf, time is a "pseudo-room" because it is irreversible.31 William Stern defines reversibility of directions as the criterion of a psychological dimension; in music, however, a change of direction in time is impossible. The third dimension refers to depth as the front and back. Wellek considers timbre the most likely candidate for the third dimension. Timbres can be categorized into warmer and colder ones. Warmer timbres are the closer ones and colder timbres the more distant ones. Strings are closer than woodwinds, bassoon closer than strings, and pipes are even more distant. The warm tenor is closer than the colder soprano, the warm cello is closer than the colder violin, and the trumpet closer than the horn. Thus, Wellek considers polyphonic movement in a homogeneous timbre planar and two-dimensional, as in the notation, whereas movement in heterogeneous timbre appears spacious and three-dimensional.32 "[...] musical timbres are distinguished in this way mainly by their spectrum envelope."33 The fractal correlation dimension belongs to the tonal room and represents the spatial complexity of music. It counts the number of tones occurring at the same time and was used in Stirnat (2012).34

31 Wellek, A. Musikpsychologie und Musikästhetik. Grundriss der Systematischen Musikwissenschaft. 3rd ed. Bonn: Bouvier, 1982, p. 310.
32 Cf. ibid., pp. 295–317, p. 334.
33 Schneider, A. "On concepts of 'tonal space' and the dimensions of sound". In: (Eds.) Spintge, R. and Droh, R. St. Louis, MO, USA: MMB Music Inc., 1992, pp. 102–130, p. 110.
34 Cf. Stirnat, "Percepted Spaciousness of different Musical Genres", see n. 8, p. 10f.


In comparison to the tonal room, the music room ("Musikraum") is a pure room of sensations, first recognized by Ernst Kurth. In the music room, music itself induces sensations, and all kinds of musical attributes contribute to space to form a holistic effect.35

4.4 The Semantic Room

In our daily life, we encounter many signs that have a meaning for us. Semantics is defined as the meaning of signs. Signs can be either inconsistent, e.g. noises and gestures, or consistent, e.g. signboards and notation. Consistent signs refer to types (brands), in contrast to specific objects that are not identified as signs, such as a stone. The use of signs emerges from the desire for communication, and the satisfaction of this desire depends on the agreement on the meaning of signs. The meaning of signs is sometimes also called "image", among other terms. A listener gives music a meaning; the music does not have a meaning in itself. It can be the role of music, the content of music or its significance to the listener. Semantics is part of semiotics, the science of signs and sign systems. Signs play an important role in people's lives. According to Schneider, they are "carriers" of meaning and express the meaning. A relation between the sign and an object that the sign labels is communicated through the meaning. A relevant principle in semiotics is the description of signs: if signs are investigated, they have to be analysed.36 Music genres can be regarded as a semantic room that is characterized by musical regularities, norms, attributes and values. Regularities provide rules about how to complete a phrase. The setup and duration of a song and a 4/4 metre in popular music are examples of norms. Loud, piano and also adjectives representing spaciousness belong to the musical attributes, and values describe the note values. Stirnat showed that it is possible to label music genres with adjectives describing spaciousness. The participants gave the music excerpts of different genres a meaning by evaluating appropriate adjectives. If two genres are specified by the same adjective, they show at least one similarity and cross the border so that they are not in opposition.37

35 Cf. Wellek, see n. 31, p. 318, p. 334.
36 Cf. Schneider, R. Semiotik der Musik. Darstellung und Kritik. München: Wilhelm Fink Verlag, 1980, pp. 11–32.
37 Cf. Stirnat, "Percepted Spaciousness of different Musical Genres", see n. 8, p. 11.


Chapter 5

Technical and Physical Basics

5.1 Transducer Principles

Loudspeakers, headphones and microphones are sound transducers. There are different possibilities to transduce mechanical energy from the sound field into electric energy and vice versa.1 The principles important for loudspeakers and headphones are introduced here.

5.1.1 Electrodynamic Transducer

The electrodynamic transducer works with a permanent magnet. The voice coil is not wound around the magnet but stays free and moveable. An electric current generates a force in the voice coil, which is located in a magnetic field produced by the magnet. The interaction between the dynamic magnetic field of the voice coil and the permanent magnetic field causes a force which makes the mechanical and acoustical system move. According to Friesecke, a strong magnet system is the most important factor for a large force.2 The voice coil is connected to the diaphragm as a part of the moving system. Both are mounted in a stiff basket that also holds the magnet and is commonly made of sheet metal. The diaphragm is connected to the basket with a suspension. The diaphragm and the magnet assembly are crucial for the sound quality of a loudspeaker because both parts have the greatest influence on the sound

1 Dickreiter et al., see n. 8.
2 Friesecke, A. Die Audio-Enzyklopädie: Ein Nachschlagewerk für Tontechniker. München: K.G. Saur Verlag, 2007, p. 437.


quality. This kind of transducer is reversible, which means that the energy can be converted in both directions: from mechanical into electric energy and from electric into mechanical energy. As an advantage, it does not produce a rotating magnetic field, which would lead to an inhomogeneous field and distortions. If the voice coil stays in the homogeneous magnetic field even when big excursions occur, electrodynamic transducers can be produced as sturdy transducers with low distortion.3,4 It is shown in figure 5.1.

Figure 5.1: Electrodynamic transducer (translated and edited; original source: Goertz, 2008, p. 424).
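The drive force in the voice coil follows the standard Lorentz force law F = B·l·i, with B the flux density in the gap, l the wire length inside the field and i the current. This relation is general electrodynamics rather than a formula quoted from the sources above, and the example values in the sketch are made up.

```python
def voice_coil_force(b_tesla, wire_length_m, current_a):
    """Lorentz force F = B * l * i on the voice coil in the air gap.
    A stronger magnet system (larger B) yields a larger drive force,
    in line with Friesecke's remark above."""
    return b_tesla * wire_length_m * current_a

# Illustrative values: 1.2 T gap field, 8 m of wound wire, 1 A drive
print(voice_coil_force(1.2, 8.0, 1.0))  # -> 9.6 N
```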

“By controlling the diaphragm modes and their damping, one can achieve a fairly uniform frequency response and directivity, particularly if one is only concerned with the response in the reverberant sound fields of a room”.5 Nonlinear friction occurs if the voice coil gets too close to the gap walls, because the gap walls are often roughly manufactured. This is audible as a buzz rather than as harmonic distortion. Depending on the desired frequency range, two types of magnet systems for electrodynamic loudspeakers are used: either (a) a ferrite magnet, usually applied in low-frequency transducers, or (b) a metal magnet, usually used for high-frequency transducers (figure 5.2).

The electrodynamic design can be robust, simple, safe and inexpensive, which gives it advantageous properties over e.g. piezoelectric and electrostatic designs.

3 Cf. Dickreiter et al., see n. 8, p. 113f.
4 Cf. Kleiner, M. Acoustics and Audio Technology. 3rd ed. USA: J. Ross Publishing, 2012, p. 310ff.
5 Ibid., p. 312.



Figure 5.2: Two different types of magnet systems for electrodynamic loudspeakers: (a) a ferrite magnet regularly used for low-frequency transducers and (b) a metal magnet, e.g. neodymium, regularly used for high-frequency transducers (source: Kleiner, 2012, p. 316).

Those properties, combined with sufficient acoustic power output, good frequency response and good directivity, have made the electrodynamic transducer dominate the market.

5.1.2 Electrostatic Transducer

This transducer uses the condenser's principle with a fixed plate and a moveable diaphragm as a counter electrode.6 Electrostatic transducers are built as surface radiators. They are used as tweeters in small versions or as full-range loudspeakers in very large designs with 1–2 m² of diaphragm area. In difficult room-acoustic environments, one can take advantage of the big radiating area and strong directivity. The electrostatic transducer's principle is shown in figure 5.3. Its setup consists of two external, fixed electrodes that are strongly perforated in order to ensure sufficient sound permeability. Between both fixed electrodes, the diaphragm is located in the middle as a thin, stretched foil. Relative to the external electrodes, the diaphragm is biased with a high polarisation direct current voltage (DC voltage). The electrodes are connected to only a relatively small audio signal voltage (max. 10% of the polarisation), which is superimposed on the DC voltage. As long as the audio signal voltage equals zero, the diaphragm rests in the middle between the two fixed electrodes. A high-ohmic resistor is connected in series with the polarisation voltage so that the charge Q remains nearly constant: Q = C · U = constant. The voltage U consists of the constant polarisation voltage and the superimposed audio signal. If the charge is constant, a change of the voltage makes the capacity C change as well. As a result, the diaphragm between the electrodes moves in one or the other direction (figure 5.3b).7
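The relation Q = C·U = const. can be made tangible with the idealized parallel-plate capacitance C = ε0·A/d: at constant charge, the gap width follows the voltage linearly, so a superimposed audio voltage displaces the diaphragm proportionally. The sketch below is a single-gap simplification of the push-pull arrangement in figure 5.3, with assumed values.

```python
EPS0 = 8.854e-12  # permittivity of free space in F/m

def gap_width(voltage, charge, plate_area):
    """With Q = C*U constant and C = EPS0*A/d, the plate spacing is
    d = EPS0 * A * U / Q, i.e. proportional to the applied voltage.
    Idealized single gap, not the full push-pull geometry."""
    return EPS0 * plate_area * voltage / charge

# A small audio voltage riding on the polarisation voltage shifts the gap:
q = 1e-6                     # assumed constant charge in C
for u in (2000.0, 2100.0):   # polarisation +/- audio swing in V
    print(gap_width(u, q, 0.1))  # gap in m for a 0.1 m^2 diaphragm
```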

6 Cf. Dickreiter et al., see n. 8, p. 114.



Figure 5.3: Left: general setup of an electrostatic loudspeaker's transducer. Middle: the superimposed signal voltage makes the capacity C change while the charge Q remains constant; the electrode in the middle then moves to the right. Right: setup of an electrostatic loudspeaker (translated figure; original sources: Goertz, 2008, p. 421; Eargle, 2003, p. 136f.).

5.1.3 Electromagnetic Planar Transducer

Fine conductor paths through which the current flows are mounted on the very thin diaphragm foil. In the external magnetic field, they exert a driving force on the diaphragm. Figure 5.4 shows the basic setup with a one-sided magnet arrangement located on the diaphragm's backside. Modern loudspeakers with neodymium magnets can be realised with double arrangements in front of and behind the diaphragm. An electromagnetic planar transducer works with the same drive principle as a conventional loudspeaker, in which the voice coil moves in an air gap and transfers its driving force to the diaphragm. Here, the unwound voice coil lies on the diaphragm's surface, so that the driving force acts evenly on the diaphragm's surface. This way, partial vibrations are reduced or shifted out of the audible frequency range. Compared to other common tweeters with voice coil and coil former, this kind of loudspeaker transducer has an advantage: it shows excellent impulse behaviour because of the very low mass of the thin foil. The diaphragm vibrates as a long, narrow strip. Thus, it radiates a cylindrical wave as high as the diaphragm's length. In the horizontal plane, a concentration of the waves

7 Cf. Goertz, A. "Lautsprecher". In: Handbuch der Audiotechnik. (Ed.) Weinzierl, S. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 421–490, p. 428f.


only occurs for higher frequencies because of the small width.

Figure 5.4: General setup of an electromagnetic planar loudspeaker. The conductor path through which the current flows is located directly on the diaphragm (translated figure, original source: Goertz, 2008, p. 425).

The Air-Motion-Transformer is another kind of electromagnetic transducer and is also called "Accelerated Ribbon Technology". The drive principle remains the same; figure 5.5 shows its setup with the diaphragm folded into lamellae. Depending on the signal current, the lamellae move towards or away from each other and press air out of the gaps or suck it in. Because the lamellae are smaller than the aperture slots on the front and back sides, a velocity transformation of the moved air occurs.8

Figure 5.5: Left: partial view of an Air-Motion-Transformer, in which the folded diaphragm's lamellae pull together or move apart depending on the signal current. Right: mechanical setup of an Air-Motion-Transformer. The neodymium magnets are located as supports in the first and third mountings; the second mounting carries the diaphragm (translated; original source: Goertz, 2008, p. 427).

5.1.4 Piezoelectric Transducer

The piezoelectric transducer takes advantage of the reversibility of the piezoelectric effect. Applying a potential difference across the surfaces of the material causes a deformation of the crystal: the material either expands or contracts on the order of nanometres, which results in many uses in

8 Cf. Goertz, see n. 7, pp. 425–427.


acoustics, e.g. the production and detection of sound. The piezoelectric effect occurs as soon as the crystal's unit cell lacks a centre of symmetry. In this effect, the positive and negative charges in the piezoelectric crystal are separated by a distance, which causes the formation of electric dipoles; the crystal as a whole remains electrically neutral. "If the dipole can be reversed by an applied external electric field, the material is additionally known as a ferroelectric (in analogy to ferromagnetism)".9 This way, the crystal drives a coupled diaphragm, or sound can be radiated directly from the crystal's surface. However, radiation from the surface requires the crystal's surface to be big enough. Common constructions clamp one edge of the crystal assembly. The other part of the assembly is connected to the diaphragm, if necessary via a lever mechanism. The lever mechanism works as a transmission by converting the small but strong movements of the crystal into the necessary larger amplitude of the diaphragm. Still, it is difficult to produce the high diaphragm amplitudes that are necessary for low frequencies. Thus, especially in loudspeakers, this principle is mostly used for reproducing high frequencies with quite good efficiency.10,11 Different transducer principles share some common regularities: transducers working with a magnetic field respond to the velocity of the moving diaphragm, whereas transducers working with an electric field respond to its displacement.12

5.2 Loudspeakers

A loudspeaker is "an electroacoustic device for converting an electric signal into an acoustic pressure signal retaining the waveform. The loudspeaker's diaphragm moves the surrounding air driven by a force from an electromechanical transducer."13 In general, a loudspeaker contains a loudspeaker driver (or motor) and a loudspeaker box.

9 Breazeale, M. A. and McPherson, M. "Piezoelectricity and Transducer". In: Handbook of Acoustics. (Ed.) Rossing, T. D. 2nd ed. Berlin, Heidelberg: Springer-Verlag, 2014. Chap. Physical Acoustics. Apparatus, pp. 234–236, p. 235.
10 Cf. Webers, J. Tonstudiotechnik: Analoges und digitales Audio Recording bei Fernsehen, Film und Rundfunk. 8th ed. Poing: Franzis' Verlag GmbH, 2003, p. 278.
11 Cf. Breazeale and McPherson, see n. 9, p. 234ff.
12 Cf. Dickreiter et al., see n. 8, p. 115.
13 Kleiner, see n. 4, p. 303.


Often, the radiation from the rear side is prevented by a box, baffle or other construction to avoid an aerodynamic short circuit (section 5.2.4) generated by the diaphragm. The shape and acoustic design of the loudspeaker box contribute to how the loudspeaker driver radiates sound. In addition, there is usually radiation of sound by the vibrations of the box walls. These can be excited both by the loudspeaker driver and by the sound field inside the box.14 Since the loudspeaker converts electric energy into mechanical energy (sound in the air), it needs to produce a high sound pressure to reproduce an orchestra's full sound. In order to ensure the highest reproduction quality (high fidelity), the following requirements have to be fulfilled:

- a wide frequency range and low linear distortion,
- short attack and release times, high impulse fidelity,
- good radiation characteristics,
- low non-linear distortion and
- a high level of efficiency as well as high operational safety.15

"Loudspeaker" can refer either to a single loudspeaker system or to a combination of several loudspeaker systems in one box. Different driver principles can be used for loudspeakers: electrodynamic or dynamic, electrostatic, piezoelectric and magnetic loudspeakers are the different types, the most important being the (electro-)dynamic loudspeaker. Another way to distinguish loudspeaker systems is by transmission range: wideband loudspeakers, low-frequency loudspeakers, high-frequency loudspeakers and midrange loudspeakers. The intended use determines the radiation characteristic, the loudspeaker's power as well as the electroacoustic quality.16

5.2.1 (Electro-)Dynamic Loudspeaker

Dynamic loudspeakers have been the most widespread transducers for the reproduction of music and speech for decades. High sound levels over a broad band can be achieved

14 Kleiner, see n. 4, p. 303.
15 Cf. Webers, see n. 10, p. 268.
16 Cf. Dickreiter et al., see n. 8, p. 170.


quite easily and economically without a high amount of distortion. The dynamic loudspeaker is equipped with an electrodynamic transducer. This kind of loudspeaker can be subdivided into cone loudspeakers, dome loudspeakers and horn loudspeakers.17 Besides the forced vibrations transmitted by the voice coil, the diaphragm performs eigenvibrations. Eigenvibrations not only enlarge nonlinear distortions but also change the frequency response of the loudspeaker by causing narrow dips or resonance-like rises in the course of the curve. This issue is prevented by damping the diaphragm strongly. Diaphragms need a certain size because the acoustic power depends on it. Normal loudspeakers of 4 W to 10 W require a diaphragm diameter of approximately 15 to 30 cm, for example. A wide frequency band from 15 Hz to 15000 Hz, needed for a reproduction that satisfies listeners, is hardly realisable with only one transducer. This is why several systems are used for high-quality reproduction. A low-frequency loudspeaker has a bigger diaphragm with a lower eigenresonance, whereas the tweeter's diaphragm is smaller.18

Cone Loudspeaker

The cone loudspeaker is named for the conical form of its diaphragm, which provides the required stiffness. It can be used as a low-range and midrange speaker. The radiated sound's frequency range is limited towards low frequencies by the resonance frequency.19 Figure 5.1 shows an electrodynamic loudspeaker with a cone diaphragm.

Dome Loudspeaker

Basically, a dome loudspeaker is a cone loudspeaker without a basket and a cone diaphragm. The sound is radiated only by the dome (figure 5.6). The dome with its fixed suspension is made from sufficiently stiff material with high damping. Its diameter is mostly smaller than the wavelength being transmitted. The radiation is even and in phase over a wide angle spectrum. This is why dome loudspeakers are preferably used as tweeters with a dome diameter of approximately 20 to 25 mm. But it also suits

17 Cf. Dickreiter et al., see n. 8, p. 170ff.
18 Cf. Webers, see n. 10, p. 270ff.
19 Cf. Dickreiter et al., see n. 8, p. 170f.


as a midrange speaker down to 400 Hz with a diameter of roughly 60 mm.20 According to Friesecke, dome loudspeakers have a very high stability.21

Figure 5.6: An electrodynamic dome loudspeaker (translated; original source: Dickreiter et al., 2008, p. 172).

Horn Loudspeaker

A horn loudspeaker is a dome loudspeaker with an attached horn whose diameter increases towards the opening (figure 5.7). This kind of sound radiation is used e.g. by brass instruments and megaphones. Low frequencies require expanded horns, which cannot be realised in studios and living rooms; thus, only horn tweeters can be used in these rooms. Setting up a horn loudspeaker in a corner of the room improves the reproduction of low frequencies considerably. One version is the pressure-chamber horn speaker, which has a pressure chamber between the big dome loudspeaker diaphragm and the horn. The pressure chamber increases the level of efficiency, so this loudspeaker version works well for alarms, commands and announcements. Horn loudspeakers are essential in the professional sound reinforcement of stadiums and halls with very big areas. Broadband horns are also called "music horns".

5.2.2 Electrostatic Loudspeaker

A dynamic electric field acting on the electric charge generates a force that puts the diaphragm into motion. Using an electrostatic transducer (figure 5.3), the loudspeaker functions as a dipole that radiates sound forwards and backwards simultaneously. Because of the very

20 Cf. Dickreiter et al., see n. 8, p. 172.
21 Cf. Friesecke, see n. 2, p. 439.



Figure 5.7: An electrodynamic horn loudspeaker (translated; original source: Dickreiter et al., 2008, p. 178).

light diaphragm, the loudspeaker shows an excellent impulse response. It can also reproduce very high frequencies up to 100 kHz. Most of the time it is combined with a low-frequency cone loudspeaker.22,23 "An interesting psychoacoustic effect in using full frequency range electrostatic vertical strip loudspeakers is that the sound in listening appears to come from the part of the loudspeaker at the height of one's head."24

5.2.3 Piezoelectric Loudspeaker

Piezoelectric loudspeakers can reproduce only a quite small frequency range linearly, and their sound is often described as moderate. They work with a very high level of efficiency in the eigenresonance range, so they are well suited to reproducing individual frequencies. Piezo transducers are usually built into loudspeakers as tweeters in combination with low-frequency transducers. They are also used in many alarm generators and signal beepers.25

5.2.4 Characteristics and Features

The most important acoustic measurement values are the frequency response, the spatial radiation pattern and the distortion values. All measurements should cover the whole frequency range of the specific loudspeaker.

The electrical impedance describes the electrical behaviour of a loudspeaker from the amplifier's point of view. In connection with the source's internal resistance, a frequency-dependent impedance results in fluctuations of the frequency response. Thus, the loudspeaker's impedance should not demand too much power from the amplifier.

22 Cf. Dickreiter et al., see n. 8, p. 174.
23 Cf. Kleiner, see n. 4, p. 332f.
24 Cf. ibid., p. 335.
25 Cf. Friesecke, see n. 2, p. 441.


But how much the impedance affects the frequency response depends only on the internal resistance; a low internal resistance has hardly any impact. The (amplitude) frequency response shows the sound pressure measured at 1 m distance using a voltage that produces an output power of 1 W at the impedance of the box. Alternatively, a different distance can be used and converted to the ratio 1 W/1 m. The curve is taken on the 0°-axis of the loudspeaker and is the most frequently shown curve of a loudspeaker.26

The phase response belongs to the complex frequency response. It describes how much phase delay or shift occurs when a signal is transduced. As it is impossible to convert it infinitely fast, some time passes until the signal is transferred from one state to the other. If all frequencies have the same constant delay, the phase response is linear and the delay is unnoticeable. The sound deteriorates only if frequencies have different delays, because then the signal's transient response changes. Thus, a linear phase response should be aimed for. Ordinary dips in the (amplitude) frequency response often come with ringing or delayed build-up of mechanical or acoustic resonances, whose impact on the sound can be bigger than one would assume from the frequency response. One should be especially cautious when the frequency response is to be corrected by electrical filters and the sound level is even to be raised at resonance locations.27

Directivity

Directivity describes the spatial radiation characteristics of a loudspeaker. Two different radiation angles can be distinguished: the horizontal and the vertical angle can turn out very differently. The directivity is often shown in a polar diagram that maps the changes of the sound pressure of certain frequency ranges (thirds or octaves) over all 360°. In addition, information about frequency dependence and the formation of interferences is meaningful for users. Whereas the directivity of loudspeaker systems primarily represents the covered spatial angle range, the directivity of studio monitors with a clearly defined listening position indicates how much the surrounding space and the acoustic environment of the loudspeaker are included. This way, it is desirable

26 Cf. Goertz, see n. 7, pp. 469–473.
27 Cf. Friesecke, see n. 2, p. 389f.


to suppress unwanted reflections from the surface of the mixing desk through a narrow vertical radiation pattern.28

Frequency Response

A loudspeaker's frequency response is quite irregular compared to the frequency response of single electrical transmission elements. This is due to the complex interaction between the electrically, mechanically and acoustically working parts of a loudspeaker system, including the baffle. Studio loudspeakers have transmission-factor tolerances of about ±1.5 to 2 dB and home loudspeakers of ±4 dB between 100 and 4000 Hz. Higher deviations are accepted beyond this frequency range.29 The frequency response is the plot of the transmission factor of sinusoidal input voltages as a function of their frequency (example in figure 5.8). The transmission factor T of a transmission element is defined as

T = 20 \lg\frac{U_2}{U_1},    (5.1)

where T is the transmission factor in dB, U_1 is the input voltage in V and U_2 is the output voltage in V. In most cases the transmission factor is referred to 1000 Hz: the plot then shows the difference between the transmission factor at the respective measured frequency and the transmission factor at 1000 Hz. This way, the frequency response referred to 1000 Hz is easily readable.30
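A small sketch of formula 5.1 and the referencing to 1000 Hz described above (function names assumed):

```python
import numpy as np

def transmission_factor_db(u_out, u_in):
    """Transmission factor T = 20*lg(U2/U1) in dB (formula 5.1)."""
    return 20 * np.log10(np.asarray(u_out) / np.asarray(u_in))

def response_re_1khz(freqs, u_out, u_in):
    """Frequency response referred to 1000 Hz: the transmission factor
    at each measured frequency minus the transmission factor at 1 kHz."""
    t = transmission_factor_db(u_out, u_in)
    t_1k = t[np.argmin(np.abs(np.asarray(freqs) - 1000.0))]
    return t - t_1k
```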

A flat amplitude frequency response, i.e. the same amplitude level for all frequencies, is desirable for a neutral transduction. But producers often accept ±3 dB or even ±10 dB deviations, although a neutral transducer requires ±1 dB or less.31 However, the frequency response alone does not provide the whole information about the radiated sound, because the room acoustics has an even wider influence on the sound field in a room.32

28 Cf. Friesecke, see n. 2, p. 391f.
29 Cf. Dickreiter, M. Handbuch der Tonstudiotechnik. (Ed.) Schule für Rundfunktechnik. 2nd ed. München: Verlag Dokumentation Saur KG, 1978, p. 130.
30 Cf. Dickreiter, M. Handbuch der Tonstudiotechnik. (Ed.) Schule für Rundfunktechnik. 5th ed. Vol. 1. München: Verlag Dokumentation Saur KG, 1987, p. 352.
31 Cf. Friesecke, see n. 2, p. 389.
32 Cf. Dickreiter, Handbuch der Tonstudiotechnik, see n. 30, p. 352.



Figure 5.8: Frequency response of an electrostatic loudspeaker strip. The measurement was made in anechoic conditions at 1.5 m distance from the loudspeaker (source: Kleiner, 2012, p. 334).

Distortions (e.g. non-linear distortion)

Non-linear distortions appear if frequency components are generated in the reproduced sound that did not exist in the electrical signal sent to the loudspeaker driver. Non-linear distortions can have different sources. Nonlinearities in the magnetic field and in the diaphragm's mechanical system can cause tonal distortion components. "Distortion, such as turbulence noise, is created when air moves at high speed around the voice coil and in acoustic resonator components such as the port opening in bass-reflex loudspeaker boxes."33 Nonlinear distortions can also occur when the loudspeaker diaphragm is exposed to high force, which results in buckling. But there are some ways to reduce non-linear distortions, e.g. balanced designs of the magnetic field and the diaphragm suspension.34

Acoustic Short Circuit

If the loudspeaker diaphragm moves in the direction of the arrow shown in figure 5.9, an overpressure zone arises in front of the loudspeaker and a low-pressure zone behind it. In case the diaphragm diameter is small compared to the wavelength of the sound to be radiated, the zones of overpressure and low pressure balance out around the edge of the loudspeaker. Then no sound emerges anymore; the result is an acoustic short circuit, and low frequencies are not radiated. In order to avoid an acoustic short circuit, a baffle is necessary. The minimum size of a baffle can be calculated with

33 Kleiner, see n. 4, p. 330.
34 Cf. ibid., p. 330.


the following formula:

d = \frac{c}{4 f_0},    (5.2)

with f0 as the cut-off frequency, c as the speed of sound and d as the shortest distance between loudspeaker and baffle edge.35
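As a worked example of formula 5.2 (function name assumed):

```python
def min_baffle_distance(f0_hz, c=343.0):
    """Shortest distance d = c / (4*f0) between driver and baffle edge
    needed to prevent an acoustic short circuit down to f0 (formula 5.2)."""
    return c / (4.0 * f0_hz)

# Radiating down to 50 Hz requires roughly 343 / (4*50) ~ 1.7 m
print(min_baffle_distance(50.0))
```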

Figure 5.9: (a) Acoustical short circuit and (b) its prevention using a baffle (translated; original source: Dickreiter et al., 2008, p. 175).

Level of Efficiency

A loudspeaker's level of efficiency describes the ratio between electrical input power and acoustical output power:36

\eta = \frac{W_{acoustic}}{W_{electric}},    (5.3)

where W_{acoustic} is the acoustic output power and W_{electric} is the electric input power.37 The power loss can be calculated from the level of efficiency and can determine the loudspeaker's thermal maximum load.38
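A minimal sketch of formula 5.3 and the resulting power loss (function names assumed):

```python
def efficiency(w_acoustic, w_electric):
    """Level of efficiency eta = W_acoustic / W_electric (formula 5.3)."""
    return w_acoustic / w_electric

def power_loss(w_electric, eta):
    """Electrical power not converted to sound ends up as heat, which
    bounds the loudspeaker's thermal maximum load."""
    return w_electric * (1.0 - eta)

# e.g. 100 W input at 1 % efficiency leaves 99 W of heat
print(power_loss(100.0, efficiency(1.0, 100.0)))
```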

Others

Usually, the frequency range to be transmitted is divided among two or three loudspeakers in studio loudspeakers and other high-quality loudspeakers. The division has several benefits. Loudspeakers optimized in frequency response, level of efficiency and sound focusing for a certain frequency range can be used. Roughness and intermodulation distortion can be reduced. These transmission errors occur in a wide-range loudspeaker because high and low frequencies are radiated from the same loudspeaker system at the same time.39

35 Cf. Dickreiter et al., see n. 8, p. 175.
36 Cf. Friesecke, see n. 2, p. 453.
37 Cf. Kleiner, see n. 4, p. 309.
38 Cf. Friesecke, see n. 2, p. 453.



5.3 Headphones

"Headphones are small personal loudspeakers that radiate sound close to the ear canal opening. The distance between the ear canal opening and the headphone driver is about 5 cm or less. Earphones are smaller devices designed to be inserted into the ear canal entrance, typically forming a sealed cavity."40

On the one hand, different construction forms of headphones can be distinguished; on the other hand, headphones differ in their driver principles just as loudspeakers do (the driver principle is often found in producers' technical data).

5.3.1 Construction Forms

A headphone consists of two electroacoustic transducers connected by an adjustable headband to fit the headphone to the individual head shape.41

Circumaural Headphones

The transducer and the entire ear of circumaural headphones are closed off from the environment. Almost no sound can get outside, and hardly any disturbing sound can reach the inside of the headphone and the ear. The ear sits inside the headphone without any pressure on it.42 High frequencies can be damped by up to 40 dB and very low frequencies by approximately 10 dB. Closed headphones are preferred for loud environments, e.g. discos, or for situations in which no sound may reach the outside world, e.g. recording situations. Circumaural designs prevent an acoustic short circuit.43

39 Cf. Dickreiter et al., see n. 8, p. 176.
40 Kleiner, see n. 4, p. 339.
41 Cf. Dickreiter et al., see n. 8, p. 178.
42 Cf. Friesecke, see n. 2, p. 444.
43 Cf. Kleiner, see n. 4, p. 444.


Supra-Aural Headphones

The transducer of supra-aural headphones is open towards the ear and the environment. This way, the listener remains in contact with the environment and feels included in it. However, the signal can be heard in the surroundings, which can be disturbing. Thus, open headphones are not appropriate for studio recordings because the playback reproduced through the headphones would also be recorded by the microphone.44 According to Dickreiter, environmental sound above 5 kHz is damped by approximately 10 dB.45 Different from circumaural headphones, supra-aural designs need a more elaborate construction in order to avoid an acoustic short circuit.46

Semi-Supra-Aural Headphones

This kind of headphone is a mix of closed and open headphones. The sound can partly pass through the headphones, which is a good compromise between being isolated and being in the room. Semi-open headphones are inappropriate for recordings, too.47

In-Ear Phones

In-ear phones radiate sound directly into the ear canal. The transducer is embedded in silicone or foam material and is put into the ear canal.48 They are often used in live and stage applications when invisible sound reproduction of a monitor signal for performers is desired. An issue with in-ear phones arises if they are used wrongly: if the sound pressure is too high, damage to the ears can be the consequence.49

5.3.2 Designs by Driver Principle

Headphones are designed with different driver principles, as loudspeakers are. Products with electromagnetic and piezoelectric drivers are utilised in telephones and hearing aids, and electrodynamically as well as electrostatically driven headphones are employed for high-quality audio reproduction.50

44 Cf. Kleiner, see n. 4, p. 444.
45 Dickreiter et al., see n. 8, p. 178.
46 Cf. Kleiner, see n. 4, p. 444.
47 Cf. ibid., p. 444.
48 Dickreiter et al., see n. 8, p. 178.
49 Cf. Friesecke, see n. 2, p. 445.
50 Cf. Kleiner, see n. 4, p. 342.


Electromagnetic Headphones

This kind of headphone is characterised by high sensitivity and a simple and robust construction (figure 5.10). The construction comprises a permanent magnet, a small air gap, a magnetically conductive diaphragm and a drive coil that makes a modulation of the magnetic field possible. The magnetic field strength and the attraction between the magnetic parts fluctuate as soon as the audio voltage is applied to the coil. The diaphragm shifts and changes the air gap width because of its mechanical compliance.

"If the headphone has a tight seal against the ear, the acoustic impedance will be primarily capacitive at frequencies up to approximately 2 kHz. This means that the [diaphragm] stiffness and the stiffness of the trapped air will act as two series-coupled springs. This requires the diaphragm stiffness to be low, to generate sufficient sound levels, which, in turn, leads to a requirement for low diaphragm mass to have a reasonably high first diaphragm resonance frequency."51

Figure 5.10: Setup of a simple electromagnetic headphone (translated; source: Kleiner, 2012, p. 344).

The leakage between the seal of the headphone and the ear affects the frequency response at low frequencies. A loose seal results in a poor low-frequency response because the seal's impedance is higher than the impedance of the trapped air at low frequencies. If the frequency is above the first diaphragm resonance frequency, the result is a drop-off in the headphone's frequency response (figure 5.11). Higher frequencies lead to resonances in the trapped air and several acoustic circuits which influence the frequency response. Basically, electromagnetic headphones are nonlinear. A closure

51 Kleiner, see n. 4, p. 344.


is favoured in order to gain a good low-frequency response and to avoid surrounding noise, for the best signal-to-noise ratio.52

Figure 5.11: Electromagnetic insert earphone's frequency response curves (source: Kleiner, 2012, p. 345).

Electrodynamic Headphones

The construction of electrodynamic headphones is almost as robust as that of electromagnetic designs. This kind of headphone is found in high-fidelity audio reproduction because of its quite symmetrical design, which results in low nonlinear distortion. Its setup is basically that of a "miniature electrodynamic loudspeaker".53 There are two typical kinds: (a) the first kind has an open back and acts as a dipole and (b) the second kind is built with a closed back and acts as a monopole. Figure 5.12 shows a design of a simple closed headphone. The trapped air inside the headphone box and the enclosed air between the diaphragm and the ear act in series. An almost even frequency response can be achieved with a suitably chosen resonance frequency. Additionally, acoustic compensation circuits can be included to obtain an even better frequency response, which extends the frequency range by one or two octaves. According to Dickreiter, high sound levels with low distortion can already be achieved with very small electrical power.54 Open-back headphones suffer losses in low-frequency response and sensitivity caused by aerodynamic short circuits. This poor low-frequency response and sensitivity can be limited if the headphone is worn tightly at the ear. Additionally, the right choice of diaphragm

52 Cf. Kleiner, see n. 4, p. 343ff.
53 Ibid., p. 346.
54 Cf. Dickreiter, Handbuch der Tonstudiotechnik, see n. 29, p. 134.


resonance frequency and the use of suitable ear cushions can further improve the frequency response.55

Figure 5.12: A basic electrodynamic headphone with a closed back (source: Kleiner, 2012, p. 346).

Piezoelectric Headphones

Like piezoelectric loudspeakers, piezoelectric headphones use the piezoelectric effect to put the diaphragm into motion. Piezoelectric ceramics and plastics are usually used instead of piezoelectric crystals. The design can be made small, rugged and sensitive and is applied in hearing aids as well as low-cost portable devices. Its main disadvantage is the high electric impedance. Similar to electrodynamic headphones, piezoelectric ones need a tight ear cushion between the headphone and the ear. "If the fundamental resonant frequency of the headphone diaphragm is high enough, the effective frequency response will be determined by the resonances of the ear canal."56 A piezoelectric headphone setup is shown in figure 5.13. It consists of a conical diaphragm driven by a piezoelectric ceramic bender. Some modern piezoelectric headphones work with piezoelectric films, which makes them somewhat similar to electrostatic headphones. However, regarding their mode of operation they are more similar to headphones with bender designs.57

55 Cf. Kleiner, see n. 4, p. 348f.
56 Ibid., p. 348.
57 Cf. ibid., p. 348f.



Figure 5.13: A piezoelectric headphone driven by a ceramic bender (source: Kleiner, 2012, p. 348).

Electrostatic Headphones

Electrostatic headphones are basically built as an electrostatic loudspeaker with a relatively large diaphragm and operate on the same principle. They achieve the highest possible sound reproduction quality. Figure 5.14 shows an open-back design, which is used for most applications.

Figure 5.14: A circumaural headphone with an open-back design and a push-pull electrostatic transducer (source: Kleiner, 2012, p. 349).

The issue of obtaining sufficient loudness that occurs with electrostatic loudspeakers is not a serious problem for electrostatic headphones because they are located near the ear. The plastic film between the two electrodes has low mass and is tuned for low frequencies. It is the only part of the mechanical system and is suspended between both electrodes for push-pull operation. Open-back designs produce mass-controlled motion of the film. Since the film has low impedance, the resonances of the pinna and the ear canal remain consistent when the headphone is worn. Kleiner recommends listening to music in quiet environments when using open-back headphones,


as the sound is mostly not insulated.58

5.3.3 Characteristics from a Critical Point of View

Frequency responses vary a lot with the way headphones are worn. The wavelengths of the audio frequency range are mostly much larger than a headphone's size. The operating distance between head and headphone is crucial for the headphone design and its listener. Figure 5.15 shows why the distance is an important factor: the space leads to a loss in the resulting frequency response of a headphone. Especially low frequencies reveal a large leakage of about 10 dB with a greater space between the coupler and the phone. The contact pressure also influences the resulting frequency response (figure 5.16). With higher pressure, low frequencies show a level increase of more than 10 dB, and frequencies up to 2 kHz of 10 dB.59 This is the same result Dickreiter described as early as 1978: he explained that low frequencies are reproduced more intensely in supra-aural headphones when the contact pressure is higher.60

Figure 5.15: Frequency response of a dynamic semi-open headphone with two different distances between coupler and headphone. The distance between the headphone and the ear has a large influence on the frequency response. The solid line displays the frequency response of a semi-open headphone with little loss and the dashed line displays the frequency response with big loss (Kleiner, 2012, p. 343).

An issue when listening through headphones is in-head localisation. This means that sound sources placed between left and right appear inside the head between the left and the right ear. A reasonable judgement about the stereo basis and spaciousness is

58 Cf. Kleiner, see n. 4, p. 349f.
59 Cf. ibid., p. 342f.
60 Dickreiter, Handbuch der Tonstudiotechnik, see n. 29, p. 135.



Figure 5.16: The frequency response of a supra-aural headphone showing the impact of different contact pressures on the frequency response (Kleiner, 2012, p. 343).

almost impossible. Another issue is the missing structure-borne sound. Frequencies below 150 Hz above a certain sound level are felt by the whole body. This effect is absent when somebody listens through headphones, but it is important psychoacoustically in order to stimulate the vegetative nervous system and to induce particular feelings while listening at higher sound levels. If this feeling is missing, the sound reproduction seems "boring", "lax" and "pressureless" according to Friesecke.61

Headphones vs. Loudspeakers

Headphones and loudspeakers have different audio requirements. The response characteristics of a room are removed for headphones and earphones because the listener's room is not a part of the audio chain. The auditory impression arising when listening through loudspeakers in rooms depends on the direct sound, the reflected and the reverberant sound.

"The sound pressures at the listener's ears will depend on such things as the loudspeakers, their direct sound frequency response, the response in other directions, the reflection characteristics of the room surfaces, and scattered sound from objects. But the sound pressure will also depend on the reflection of sound by the listener's body and head."62

61 Cf. Friesecke, see n. 2, p. 444.
62 Kleiner, see n. 4, p. 339f.


Kleiner points out that even high-quality headphones show a non-flat frequency response (see figure 5.17). The semi-open headphone and the insert earphone have quite resonance-free response curves. The electrostatic circumaural headphone reveals the impact of the modes inside the cavity formed by the headphone.63 But "our hearing is used to these types of resonances".64

Figure 5.17: High-quality head- and earphones' frequency response (source: Kleiner, 2012, p. 341).

Even though recordings for headphones have to be produced differently than recordings for loudspeakers, "headphones are generally used to simulate the experience obtained when listening to loudspeakers".65 So "it is important to equalize the frequency response of the headphone so that it is similar to that of loudspeakers, at the ear canal entrance".66 The perception of sounds also differs between headphones and loudspeakers. Headphones evoke in-head localisation, which means that sound is perceived inside the head, whereas loudspeakers radiate sound that is localised between both loudspeakers in front of the head.67 When listeners hear the same signal over both headphone channels, it corresponds to a mono signal or a stereo mid-signal.68

63 Cf. Kleiner, see n. 4, p. 339f.
64 Ibid., p. 340.
65 Ibid., p. 342.
66 Ibid., p. 342.
67 Cf. Dickreiter et al., see n. 8, p. 177.
68 Cf. Dickreiter, Handbuch der Tonstudiotechnik, see n. 29, p. 135.



5.4 Wave Field Synthesis

Wave Field Synthesis (WFS) is a principle for creating a sound field that is psychoacoustically and physically equivalent to a natural sound field.69 70 71 It is capable of generating sound sources located in a great number of places outside and, with some limitations, inside the listening room. “This is accomplished by driving a loudspeaker array with sound signals whose superposition creates the desired sound field.”72 The principle of superposition means that every tiny volume of a medium is able to transmit many discrete disturbances in various directions. For example, two waves can strengthen or cancel each other depending on their phase.73 WFS has the advantage that there is no sweet spot, as other techniques such as stereo reproduction have; there is no single point where the reproduction sounds noticeably better. Instead, there is a sweet area, the listening area, which is quite large.74 According to Ziemer, Fohl and Dickreiter, wave field synthesis is based on Huygens' principle (figure 5.18). It states that “each point of the wave front can be considered the origin of an elementary wave. The superposition of all these elementary waves creates the original wave of the source”.75 “This principle can be translated to mathematical formulae using theories of Kirchhoff and Rayleigh and can then be applied for use with a linear array of loudspeakers”.76 These mathematical theories and physical considerations are well explained by Baalman and Ziemer. In order to apply this principle in a practical system, some modifications have to be made. An array of discrete loudspeakers realises the elementary waves as secondary sources, which may result in spatial aliasing effects; audible artefacts then occur in the sound (figures 5.19 and 5.20). The reconstruction is limited to two dimensions in that the loudspeakers are set up in a plane along the walls of the listening room.

69 Cf. Ziemer, T. “Implementation of the Radiation Characteristics of Musical Instruments in Wave Field Synthesis Applications”. PhD thesis. Institute of Systematic Musicology, University of Hamburg, 2014. http://ediss.sub.uni-hamburg.de/volltexte/2016/7939/pdf/Dissertation.pdf, p. 53.
70 Cf. Fohl, W. “The wave field synthesis lab at the HAW Hamburg”. In: Sound, Perception, Performance. (Eds.) Bader, R. Switzerland: Springer International Publishing, 2013, pp. 243-255, p. 243.
71 Cf. Dickreiter et al., see n. 8, p. 304.
72 Fohl, see n. 70, p. 243.
73 Cf. Everest, F. A. and Pohlmann, K. C. “Superposition of Sound”. In: Master Handbook of Acoustics. 5th ed. New York, USA: McGraw-Hill, 2009. Chap. Comb-Filter Effects, p. 135f.
74 Cf. Baalman, M. A. “On Wave Field Synthesis and electro-acoustic music, with a particular focus on the reproduction of arbitrarily shaped sound sources”. PhD thesis. TU Berlin, 2008, p. 11.
75 Fohl, see n. 70, p. 244.
76 Baalman, see n. 74, p. 12.



Figure 5.18: Left: Illustration of (a) Huygens' principle and its transfer to (b) wave field synthesis. Right: A 3-D illustration of Huygens' principle (sources: Baalman, 2008, p. 12; Ziemer, 2014, p. 54).

Usually, virtual point sources are located behind the loudspeaker array. But simulations of sources within the listening room, called focused sources, are often desired as an extension of Huygens' principle. Focused sources can be created by producing a concave (inward-curved) sound field, in which all wave fronts converge on a focal point in front of the loudspeaker array in the listening room. A listener will perceive the sound source as being located in the focal point if the listener's position is in front of this focal point. If the listener walks between the focal point and the loudspeakers, the person will perceive the sound as being radiated from the loudspeakers. Figure 5.21 shows a two-dimensional wave field of a focused source located in front of the loudspeaker array. Listeners standing below the focal point will have the impression of a source placed within the room.77 Truncation errors can also occur: the theory assumes an infinite number of loudspeakers, which cannot be realised in a real room. Thus, pre- and post-echoes can occur for focused sound sources, causing changes in the timbre (figure 5.19).78
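To make the delay-and-amplitude idea behind these driving signals concrete, the following Python fragment sketches how per-speaker delays and gains for a virtual point source could be computed. It is a deliberately simplified 2-D illustration assuming a 1/sqrt(r) amplitude decay; it is not the driving function of any particular WFS implementation, and all names in it are chosen for illustration only.

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at room temperature

def point_source_driving(source_xy, speaker_xy, fs=48000):
    """Per-speaker delay (in samples) and gain for a virtual point source.

    Each secondary source (loudspeaker) is delayed by its distance to the
    virtual source and attenuated with 1/sqrt(r), so that the superposed
    elementary waves approximate the source's wave front (Huygens' principle).
    """
    speakers = np.asarray(speaker_xy, dtype=float)
    r = np.linalg.norm(speakers - np.asarray(source_xy, dtype=float), axis=1)
    delays = np.round(r / SPEED_OF_SOUND * fs).astype(int)
    gains = 1.0 / np.sqrt(np.maximum(r, 0.1))  # guard against r -> 0
    return delays, gains

# Example: a linear array of 26 speakers spaced 10 cm apart and a virtual
# source 1 m behind the array.
array = [(0.1 * i, 0.0) for i in range(26)]
delays, gains = point_source_driving(source_xy=(1.25, -1.0), speaker_xy=array)

For a focused source in front of the array, the same geometry is used with the delays reversed relative to the focal point, so that the wave fronts converge on it; this is why listeners between the array and the focal point hear the loudspeakers rather than the virtual source.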

Plane waves are necessary to represent sources at infinite distance. In practical applications they are created to simulate room reflections or very distant sources. In order to avoid unwanted interference from the walls, floor and ceiling of the listening room when the loudspeakers reproduce sound, the room has to be furnished with features improving the room acoustics, e.g. curtains and carpets.

77 Cf. Fohl, see n. 70, p. 244ff.
78 Cf. Dickreiter et al., see n. 8, p. 308.



Figure 5.19: A focused sound source without spatial aliasing but affected by a truncation error (source: Ziemer, 2017, p. 28).

Figure 5.20: A focused sound source affected by spatial aliasing and truncation errors (source: Ziemer, 2017, p. 27).

Figure 5.21: A focused source in a two-dimensional wave field positioned in front of the loudspeaker array (source: Bleda et al., 2003, p. 4).


The loudspeakers should also have a wide radiation angle in the horizontal plane but a narrow angle in the vertical plane.79 According to Baalman, a source signal should be as dry as possible, so that recordings contain only the direct sound. It can then be reproduced as a point source with WFS. The acoustics of a space can be recorded with a microphone array and used for virtual sources by adding virtual room reflections. When focused sources are used in real time, a pre-delay is required that needs to be configured beforehand. According to Baalman, many WFS implementations have used only point sources and plane waves as source types.80 Wave Field Synthesis offers new possibilities for electro-acoustic music, in that it allows more accurate control over movement of sound sources, enables other types of movement than those often seen with circular octaphonic setups (such as movements which use depth and movements through the listening space), and provides an interesting possibility for contrast between the effect of point sources and of plane waves: a source with a fixed position versus a source which moves along when the listener walks through the sound field.81

79 Cf. Fohl, see n. 70, p. 245f.
80 Cf. Baalman, see n. 74, p. 73-75.
81 Ibid., p. 71.


Chapter 6

Aim, Research Questions and Hypotheses

The aims of this research are, firstly, to identify characteristics of the perceived spaciousness in music using wave field synthesis, headphones and loudspeakers; secondly, to reveal differences in perception when participants listen via wave field synthesis, headphones and loudspeakers; and thirdly, to investigate the possibility of replacing headphones with wave field synthesis.

Therefore, the following research questions will be answered in this work:

1) How do we perceive spaciousness in music with different technical devices?
2) What are the differences in perceived spaciousness when participants listen with headphones, loudspeakers and wave field synthesis?
3) Is it possible to replace headphones with wave field synthesis?

One hypothesis states that music-specific characteristics will occur in the perceived spaciousness. Another hypothesis is that headphones, loudspeakers and wave field synthesis will be perceived differently and will reveal characteristics specific to the technical devices used.


Chapter 7

Listening Test 1

Stirnat (2012) used adjectives for the evaluation of spaciousness; participants needed to imagine an appropriate room for each adjective. In the preparation of listening test 1, the idea arose to create a method that simplified ratings of spaciousness by using pictures instead of adjectives. The next section explains this new method.

7.1 Methods

I conducted a listening test at the wave field synthesis laboratory of the Hamburg University of Applied Sciences (HAW) in August and September 2014. The listening test comprised the evaluation of twelve pictures that represented different spatial characteristics. Participants rated the pictures on a 7-point Likert scale by estimating how well each picture suited a music excerpt. I used the adjectives from Stirnat (2012) as an inspiration for finding appropriate images of spatial impressions, but I did not intend to search for exact representations of the adjectives. Table 7.1 shows the twelve pictures together with a description of each. The big concert hall, the wide free field, the living room and the room with moderate light are typical acoustical environments. The open window matches Barron's statement that “the sensation of spatial impression corresponds to the difference between feeling inside the music and looking at it, as through a window”1 quite well. The other pictures were chosen for explorative purposes.

1 Barron and Marshall, see n. 9, p. 214.


Picture label | Description
”Big” | A concert hall with a symmetrical picture setup. The hall has dark seats, filled with people in the foreground; the edges consist of ranks, chandeliers hang from the ceiling at the top of the picture, and the stage is located in the background.
”Low” | A living room, taken from a bird's eye view. Couches are placed around a table; a window array at the top edge of the picture lets light shine through.
”Open” | An open window with a view onto a field and the sun. Red curtains hang at both sides of the window.
”Infinite” | A shooting star flying through the cosmos, located in the left side of the picture.
”Soft” | A centered white feather, reflected on the ground, against a completely black background.
”Intimacy” | A room with moderate, warm light. A bar is located on the right side, where two persons sit. The focus of the picture is in the background.
”Hollow” | A holed tree bole facing the observer, who can look through it. A hand touches the internal part of the wood, giving the picture a haptic quality. The background is completely black.
”Wide” | A free field landscape with a wide horizon and a moderately cloudy sky.
”Rough” | A rock with a rough surface structure.
”Artificial” | A symmetrical floor with colourful tiles, taken from a sloped perspective. The ceiling and the ground are grey.
”Close” | A white, completely empty room. Big notes are located in the foreground on the left side; the picture setup is symmetrical.
”Narrow” | A narrow corridor with white walls. The picture setup is symmetrical.

Table 7.1: These pictures were used for the first listening test and show different acoustical environments and features.

7.2 Participants

25 participants took part in this experiment; 23 of them (12 male, 13 female) completed the task correctly, aged between 19 and 60 years (mean age = 31.61 years, standard deviation (SD) = 11.40). 14 of them had experience with listening tests. 19 claimed to be normal listeners without any diagnosed hearing impairment. 14 had completed musical training at least at amateur level. I recruited participants through various calls for participants at the Institute of Systematic and Historical Musicology, at the institute's library, at the department of informatics at the HAW and in the wave field synthesis laboratory (see appendix A). I sent calls for participants to virtual platforms for musicology students, to students of a student residence and to the HAW's audio-and-brain working community via mailing list. Furthermore, I asked people personally. My experience was that most participants decided to do the listening test when I asked them personally.

7.3 Stimuli

The listening test contained 30 music excerpts of various anechoically recorded instruments (see Table 7.2).

Each music excerpt was approximately 15 seconds long and was played twice per stimulus, with 15 seconds of silence in between (15 s music - 15 s silence - 15 s music - 15 s silence). Thus, participants had one minute to rate the pictures, which coincides with the stimulus length of Stirnat (2012). Two stimuli served as pretest excerpts with which the participants practised the procedure. The stimuli were either self-recorded in the anechoic chamber of the Institute of Systematic Musicology or recorded elsewhere in an anechoic environment.2 3 I asked fellow students and members of the university symphonic orchestra to record music that they were actually playing or practising at the moment, so that they would feel comfortable with the choice of music. I also played a couple of songs myself. The recording setup consisted of an AKG C4000B cardioid microphone connected to a Focusrite Scarlett 2i2 sound card. I used the available software Cubase 5.01 to record and save the music without any editing. Appendix A includes a picture of a recording session. Most of the musicians faced the wall while playing. Afterwards, I selected approximately 15 s of a recording, choosing a whole phrase whenever possible or a sequence with a smooth end. When necessary, I faded out the final second using Audacity. I added 15 s of silence and duplicated the excerpts. The music excerpts include many kinds of instruments: guitar, banjo, flute, oboe, e-piano, violin, cello, accordion, vocals, trumpet and brass. They were performed either solo, in a duet or in an ensemble.
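The editing steps just described were carried out in Audacity. Purely as an illustrative sketch of the same processing chain, and assuming the hypothetical file names and the soundfile library used below, the steps could be scripted like this:

import numpy as np
import soundfile as sf

def build_stimulus(infile, outfile, excerpt_s=15.0, fade_s=1.0, silence_s=15.0):
    """Cut a ~15 s excerpt, fade out its last second, append silence, duplicate."""
    audio, fs = sf.read(infile)
    excerpt = audio[: int(excerpt_s * fs)]
    fade = np.linspace(1.0, 0.0, int(fade_s * fs))
    if excerpt.ndim > 1:             # stereo file: apply the fade per channel
        fade = fade[:, None]
    excerpt[-len(fade):] = excerpt[-len(fade):] * fade
    silence = np.zeros_like(excerpt[: int(silence_s * fs)])
    block = np.concatenate([excerpt, silence])
    sf.write(outfile, np.concatenate([block, block]), fs)  # 15 s - 15 s - 15 s - 15 s

build_stimulus("recording.wav", "stimulus.wav")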

7.4 Setup

The technical devices used were Beyerdynamic DT 100 (2x400 Ω) headphones, two Adam A8X professional loudspeakers and the Fouraudio Wave Field Synthesis Model 28-243 with 26 loudspeaker modules containing 26 loudspeakers each. The headphone was a circumaural, electrodynamic model, and the loudspeakers were electrodynamic as well. The loudspeaker has a nearly linear frequency response.4 Measurements of the WFS system show a less linear frequency response (see appendix A.2). This section describes the WFS system at the HAW only as far as it concerns the setup of the listening tests. For more detailed information see Fohl (2013), Nogalski (2012) and Nogalski (2015).

2 Bernschütz, B., et al. Anechoic Recordings. 2012. http://www.audiogroup.web.fh-koeln.de.
3 Bang & Olufsen. Music from Archimedes. 1992.

4 Unfortunately, Beyerdynamic could not provide a measurement of the headphone's frequency response on request, and an independent measurement was not available on the internet.



Name | Instruments | Instrument group | dB level of .wav file (silence = -84.2 dB)
Henning Albrecht - Blues-Improvisation 3 | E-Piano | Pretest excerpt | -
Fred Cockerham - Pretty Little Miss | Banjo | Pretest excerpt | -
Rachmaninoff - Vocalise | Voice | Windinstrument-One Voice (WOV) | -19.5
Siyahamba (African song) | Voice | WOV | -22.7
Unknown | Clarinet | WOV | -31.1
Robert Schumann - Romanzen, first movement | Oboe | WOV | -26.2
Robert Schumann - Romanzen, third movement | Oboe | WOV | -32.6
F. Mendelssohn-Bartholdy, arr. R.W. - Violin Concerto (arrangement for flute) | Flute | WOV | -23.3
G. Verdi - La Traviata | Flute | WOV | -20.9
Purcell - Trumpet Voluntary | Trumpet | WOV | -28.0
Stanley Brothers - All the good times are passed and gone | Banjo | Stringed instrument-One Voice (SIOV) | -28.0
Violin etude | Violin | SIOV | -30.1
Tango El Choclo | Violin | SIOV | -29.2
Eduard Lalo - Cello concerto, second movement | Cello | SIOV | -24.3
Weber - Theme | Cello | SIOV | -27.3
Martini - Old Gavotte (part 1) | Cello | SIOV | -26.9
Fred Cockerham - June Apple | Banjo | SIOV | -21.8
Hava Nagila - Traditional Israeli (part 1) | Brass ensemble | Mixed instruments-Several Voices/Polyphonic (MISV/P) | -27.1
Hava Nagila - Traditional Israeli (part 2) | Brass ensemble | MISV/P | -31.1
Counting Crows - Mr. Jones | Voice & E-Piano | MISV/P | -18.3
Donovan - Universal Soldier | Voice & E-Piano | MISV/P | -28.4
Flamenco1 U89 | Guitar | MISV/P | -22.5
Bang & Olufsen - Guitar, Capriccio Arabe (F. Tárrega) | Guitar | MISV/P | -27.5
Martini - Old Gavotte (part 2) | Cello duo | MISV/P | -22.4
Johann Halvorsen - Passacaglia (after Händel's organ suite) | Violin & Cello | MISV/P | -26.7
Kommando Kant - Mohn & Sonne (part 1) | Accordion | Others/Non-grouped | -25.7
Kommando Kant - Mohn & Sonne (part 2) | Accordion | Others/Non-grouped | -27.0
Henning Albrecht - Blues-Improvisation | E-Piano | Others/Non-grouped | -32.1
Henning Albrecht - Blues-Improvisation 2 | E-Piano | Others/Non-grouped | -30.2
Per Nørgård - Waves (congas) | Percussion | Others/Non-grouped | -17.6

Table 7.2: The list of music excerpts used as stimuli in the listening test. The stimuli are grouped by similarity and according to the results of factor analysis 1; similar instruments with one or several voices are grouped together. The dB levels of the .wav files were checked in Audacity with the contrast analysis according to WCAG 2, which calculates the average sound level of a selected window; here, the complete music excerpt without silence was selected.
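The contrast analysis mentioned in the caption reports an average (RMS) level over a selected window. For orientation, a minimal computation of such an average level in dBFS might look as follows; this is an approximation of the idea, not a reimplementation of Audacity's exact WCAG 2 algorithm, and the file name is hypothetical.

import numpy as np
import soundfile as sf

def average_level_db(path):
    """Average RMS level of a file in dBFS (0 dB = digital full scale)."""
    audio, _ = sf.read(path)                # float samples in [-1, 1]
    rms = np.sqrt(np.mean(np.square(audio)))
    return 20.0 * np.log10(max(rms, 1e-9))  # guard against digital silence

print(average_level_db("excerpt.wav"))

The value of -84.2 dB given for silence in the table then presumably corresponds to the residual noise floor of the recordings rather than to mathematical silence.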

7.4.1 Wave Field Synthesis System

The WFS system at the HAW consists of the WFS server, the WFS Mac as a control computer, two WFS nodes and the audio modules. Each node serves 13 audio modules with eight channels per module; in total, the WFS system has 208 channels.

The laboratory works with the open-source software WONDER, which consists of several components distributed over the WFS system and communicating over LAN. The control computer, an Apple MacPro with OS X operating system, serves as the user interface with which the user starts and stops the WFS system. Ardour and other programs generating or playing back sounds run on the control computer, as does xWONDER, a component of WONDER. xWONDER provides a graphical user interface (GUI) (see Figure 7.1) to manage the virtual sound sources and to load and save projects. The WFS server includes a module that handles the central communication. The WFS nodes contain the rendering unit and a JACK client that provides an input port for every virtual sound source and an output port for every audio channel.5 “It uses the position data of virtual sound sources supplied by [the central communication module] and fixed position and orientation date for each audio module, to calculate the delay and amplitude for each channel.”6 In total, 128 separate WFS sources can be handled.7

Figure 7.1: xWONDER GUI for controlling the virtual sound sources. The turquoise rectangle represents the listening area within the WFS loudspeaker modules. The red and blue balls show the virtual sound sources located at a listener who wears a tracker (yellow ball). The menu on the left displays the settings for each virtual sound source (and the virtual listener).

The xWONDER GUI shows the listening area as a turquoise rectangle and each virtual sound source as a coloured ball whose position can be changed via drag and drop. The menu on the left displays the settings for each virtual sound source and the virtual listener.

5 Cf. Nogalski, M. “Acoustic Redirected Walking with Auditory Cues by Means of Wave Field Synthesis”. MA thesis. Hamburg University of Applied Sciences, 2015, p. 25ff.
6 Ibid., p. 27.
7 Cf. Fohl, see n. 70, p. 249.


Each virtual sound source has an ID number and can be labelled as desired. Some effects and the type of sound source can be set. All configurations made in xWONDER are transferred to the central communication module, and all changes made to the system by other programs are transmitted from the central communication module to xWONDER.

7.4.2 Tracking System

In addition to the WFS system, the laboratory is equipped with an ART tracking system. “Tracking systems capture the position and possibly the orientation of objects within a defined physical space”.8 It uses passive markers consisting of small balls of different sizes, plates or rings, covered with a strongly reflecting surface (see Figure 7.2). The area to be tracked is determined by the placement of several infrared cameras directed towards the inside of the area. These cameras emit infrared light and register the directions from which strong reflections arrive. Reflections registered by several cameras are compared and interpreted as markers, whose positions can then be calculated. The ART tracking system uses the software DTrack, with which the system is calibrated, started and stopped. The ART tracker runs at a frame rate of up to 60 Hz. The WFS system and the ART tracking system are connected.9

Figure 7.2: An example of the grey marker balls used for the tracking system.

7.4.3 Laboratory Setup

The listening area of the WFS system within the loudspeaker arrays is roughly 5 x 6 meters. The lower edges of the speakers are just over two meters above the floor.

8 Nogalski, “Acoustic Redirected Walking with Auditory Cues by Means of Wave Field Synthesis”, see n. 5, p. 30.
9 Cf. Nogalski, M. Gestengesteuerte Positionierung von Klangquellen einer Wellenfeldsynthese-Anlage mit Hilfe eines kamerabasierten 3D-Tracking-Systems. Hamburg, 2012. https://users.informatik.haw-hamburg.de/~ubicomp/arbeiten/bachelor/nogalski.pdf, p. 15f.


The 26 audio modules are connected to 208 channels driving 676 single speakers inside the modules. The modules are tilted slightly downwards; Figure 7.3 shows a model of a loudspeaker module.10 Two channels are 10 cm apart from each other, and each channel can emit sound at a sound pressure level (SPL) of up to 105 dB.11
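From the 10 cm channel spacing, a rough upper limit for alias-free reproduction can be estimated with the common approximation for the spatial aliasing frequency of a linear array (this figure is my own estimate, not a value stated in the cited sources):

\[
f_{\mathrm{alias}} \approx \frac{c}{2\,\Delta x} = \frac{343\ \mathrm{m/s}}{2 \cdot 0.1\ \mathrm{m}} \approx 1.7\ \mathrm{kHz}
\]

Above this frequency, spatial aliasing artefacts of the kind illustrated in figure 5.20 can in principle occur.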

Figure 7.3: A model of a loudspeaker module at the HAW. Two woofers at the top and 24 tweeters at the bottom; three vertically adjacent tweeters belong to one channel, and four channels are connected to one woofer (source: Goertz 2007, p. 681).

I used the wave field synthesis system together with the tracking system for the listening test. In order to achieve a headphone-like reproduction, I used the tracking markers shown in figure 7.2; the participants wore them as displayed in the figure. Two virtual sound sources met the requirements well. I placed them close to the virtual listener, as shown in figure 7.1: one virtual sound source near the left ear and the second near the right ear. This way, the participants heard the music excerpts close to their head. The loudspeakers and the headphone were connected to the Mac control computer via a Liquid Saffire 56 interface. The loudspeakers were positioned in front of the participants in a stereo setup (see figure 7.4). They formed a triangle with the participant's position, each loudspeaker having the same distance to the participant. Usually, the speakers would be required to be at the listener's height; since vision has a strong influence on auditory perception, however, I intended to minimise the visual impact by setting the speakers' height closer to that of the loudspeaker modules of the wave field synthesis system. The listener's position was the same for all three conditions.

10 Cf. Nogalski, “Acoustic Redirected Walking with Auditory Cues by Means of Wave Field Synthesis”, see n. 5, p. 32.
11 Cf. Fohl, see n. 70, p. 248.



Figure 7.4: The listening test environment. The WFS loudspeaker modules in the top row surround the listener. Two loudspeakers stand in front of the listener (the monitors between the two loudspeakers are the laboratory's powerwall), and the listener's place, including the headphone, is situated in the front.

7.5 Procedure

At the beginning, I asked each participant to answer some background questions about age, sex, musical background, experience with listening tests and hearing impairment (see appendix A). The participants read a short introduction to the experiment, including the definition of spaciousness, on the evaluation questionnaire (see appendix A). Then I introduced them to the experimental setup and procedure, and I encouraged them to ask questions in case something was unclear. The task was to rate twelve pictures for each stimulus according to how well each picture suited the music excerpt (from 1 = “little appropriate” to 7 = “very appropriate”). The experiment was divided into three parts, one per listening condition: in the first part the participants listened to the music excerpts with headphones, in the second part the music was presented through the wave field synthesis system using the tracking system, and in the third part the participants listened with the stereo loudspeakers.

I asked the participants to rate 30 music excerpts. The first two excerpts were pretest excerpts, giving the participants the possibility to get used to the task and to the listening condition. The order of the conditions was randomized so that equal numbers of participants started with each of the three technical devices: one third started with headphones, one third began with loudspeakers and one third listened to wave field synthesis first. I set the sound level to an equal level for all technical devices so that they sounded equally loud; I listened to the wave field synthesis, headphone and loudspeakers myself to check and compare their sound levels, because there was no absolute decibel display.

Before the headphone condition started, I asked the participants whether the sound level was fine, because I wanted them to feel comfortable. Since the duration of one condition was 30 minutes, I gave them the opportunity to adjust the sound level, but only a few participants needed an adjustment. They heard the music excerpts in randomized order as well: I used Matlab to randomize the order and changed it only if two stimuli of the same instrument were listed one after the other. The order was different for each condition but remained the same within one condition for every participant. The whole experiment took between 1.5 and 2 hours. As the experiment was quite long, the participants had the opportunity to take a short break between parts, and they received a small gift as a reward for their participation.
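The randomization constraint described above (re-shuffling whenever two excerpts of the same instrument were adjacent) was implemented in Matlab; as an illustrative sketch only, with hypothetical names, the same rejection-sampling idea looks like this in Python:

import random

def shuffle_no_adjacent(stimuli, instrument_of, max_tries=10000):
    """Shuffle stimuli, rejecting orders in which two excerpts of the
    same instrument end up next to each other."""
    order = list(stimuli)
    for _ in range(max_tries):
        random.shuffle(order)
        if all(instrument_of[a] != instrument_of[b]
               for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found")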

7.6 Data Analysis

In order to decrease the number of variables and the probability of an α-error, the data had to be reduced by summarizing similar stimuli and pictures. I categorized the stimuli by their similarity and sound production and asked two musicologists about their impression. As a result, I divided the stimuli by the number and kind of musical instruments: wind instruments with one voice, stringed instruments with one voice and mixed instruments with several voices. Table 7.2 lists the stimuli within each instrument group. Five stimuli did not match any useful instrument group and had to be excluded. The categories coincide with Donnadieu's results: in her study, participants grouped musical stimuli intuitively according to similar “physical functioning” in a free classification task.12 Data reduction using factor analysis in SPSS revealed three factor levels for each instrument group of excerpts: (a) wind instruments: RoomStructure, SoundProperty and RoomAmbience; (b) stringed instruments: RoomAmbience, SoundProperty and RoughSurface; (c) mixed instruments: SoundProperty, RoomAmbience and RoughSurface. I chose the pictures for one factor according to their loadings in the (rotated) matrix.

12 Donnadieu, S. “Mental representation of the timbre of complex sounds”. In: Analysis, synthesis, and perception of musical sounds. The sound of music. (Eds.) Beauchamp, J. W. New York: Springer, 2007, pp. 272-319.


The criteria for assigning pictures to a factor were that the same pictures should be present in all three conditions as the lowest common denominator, that the minimum loading was .600 or more (with a few exceptions just below .600), and that at least two pictures appeared in one factor. The pictures for each factor level within an instrument group are shown in figure 7.5a-c. I chose only the same pictures for all three conditions in order to be able to compare the technical devices; therefore, I had to separate pictures from the same factor level in the factor matrix in two cases. Two pictures did not match any factor level and had to be excluded. I labelled the factor levels, for analysis purposes only, according to their commonalities. Afterwards, I summarized the variables according to the factors by calculating their mean values. For example, I computed the variable WindinstrumentsOneVoice Headphones Roomstructure from the ratings of the pictures ”big”, ”rough” and ”narrow” for the wind-instruments-one-voice stimuli in the headphone condition. I carried out 3x3 repeated-measures ANOVAs in order to analyse the data with respect to the hypotheses about the differences between the technical devices and the perceived spatial characteristics. I used two factors with three levels each, TechnicalDevices and PictureLabels, and Bonferroni adjustment for multiple comparisons.
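The analysis itself was run in SPSS. As a hedged illustration of the same design (two within-subject factors with three levels each), an equivalent model could be fitted in Python with statsmodels; the data frame layout and column names below are hypothetical, and note that AnovaRM, unlike SPSS, does not apply a Greenhouse-Geisser correction.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long format: one row per participant x device x picture-label cell mean,
# with hypothetical columns: subject, device, label, rating.
df = pd.read_csv("ratings_long.csv")

result = AnovaRM(df, depvar="rating", subject="subject",
                 within=["device", "label"]).fit()
print(result)  # F and p values for device, label and their interaction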

7.7 Results

Wind instruments - One Voice: The diagram in figure 7.5a displays that RoomAmbience received the highest ratings (x̄ = 3.71 to x̄ = 3.55), followed by SoundProperty (x̄ = 3.12 to x̄ = 3.03); RoomStructure has the lowest ratings (x̄ = 2.73 to x̄ = 2.54). As the Mauchly test is not significant for TechnicalDevices and PictureLabels, sphericity is assumed. The interaction PictureLabels*TechnicalDevices is significant in the Mauchly test with p < .05, meaning sphericity is not assumed; the Greenhouse-Geisser value is used instead because ε < .750. The repeated-measures ANOVA revealed a significant within-subject effect for the factor PictureLabels (F = 8.806, p < .01) but no significant within-subject effect for TechnicalDevices (p > .05). PictureLabels*TechnicalDevices showed no significant within-subject effect (p > .05) either. Estimated marginal means confirm this result and also reveal the factor levels of PictureLabels showing the effect (figure 7.6): RoomStructure and RoomAmbience were rated significantly differently by the participants (p < .01), with a mean difference of d = 0.97, as can be seen in figure 7.5a.


Figure 7.5: The factor levels' pictures for each instrument group and their mean graphs: (a) wind instruments - one voice, (b) stringed instruments - one voice and (c) mixed instruments - several voices. Each factor level is represented by a colour in the mean graphs: RoomStructure (green), SoundProperty (beige), RoomAmbience (red) and RoughSurface (violet).



Figure 7.6: Estimated marginal means for PictureLabels levels RoomStructure, SoundProperty and RoomAmbience and TechnicalDevices levels Headphones, WFS and Loudspeakers.

Stringed instruments - One Voice: The diagram in figure 7.5b shows RoomAmbience with the highest ratings (x̄ = 3.57 to x̄ = 3.40), followed by SoundProperty (x̄ = 3.34 to x̄ = 3.16) and RoughSurface (x̄ = 2.49 to x̄ = 2.22). Here, TechnicalDevices and PictureLabels are not significant (p > .05) in the Mauchly test, so sphericity is assumed. The interaction PictureLabels*TechnicalDevices is significant (p < .05), so I used the Greenhouse-Geisser value (ε < .75). The repeated-measures ANOVA produced a significant within-subject effect for PictureLabels (F = 14.636, p < .01) but, as for the wind instruments, no significant within-subject effect for TechnicalDevices (p > .05) occurred. PictureLabels*TechnicalDevices again shows no significant within-subject effect (p > .05). Estimated marginal means support these findings (figure 7.7). Additionally, they reveal the factor levels of PictureLabels that were rated significantly differently: RoomAmbience and RoughSurface (p < .01, mean difference d = 1.12) as well as SoundProperty and RoughSurface (p = .001, mean difference d = 0.88) suit the music significantly differently (see figure 7.5b).

Mixed instruments - Several Voices: RoomAmbience received the highest ratings (x̄ = 3.86 to x̄ = 3.71) according to the diagram in figure 7.5c. SoundProperty obtained lower ratings (x̄ = 2.97 to x̄ = 2.91) and RoughSurface has the lowest ratings (x̄ = 2.51 to x̄ = 2.26).



Figure 7.7: Estimated marginal means for PictureLabels levels RoomAmbience, SoundProperty and RoughSurface and TechnicalDevices levels Headphones, WFS and Loudspeakers.

The Mauchly test indicates that sphericity can be assumed for all three variables because p > .05. I found a significant within-subject effect for PictureLabels (p < .01) in the repeated-measures ANOVA, whereas no significant within-subject effect occurred for TechnicalDevices or for the interaction TechnicalDevices*PictureLabels (both p > .05). Estimated marginal means support this result and show the significantly different factor levels of PictureLabels (figure 7.8): SoundProperty and RoomAmbience (p < .01, mean difference d = 0.86) as well as RoomAmbience and RoughSurface (p < .01, mean difference d = 1.40) were rated significantly differently (see figure 7.5c).

Figure 7.8: Estimated marginal means for PictureLabels levels SoundProperty, RoomAmbience and RoughSurface and TechnicalDevices levels Headphones, WFS and Loudspeakers.

Participants' Feedback on the Pictures

I asked some participants about their associations and interpretations of the pictures at the end of the listening test. They reported that they associated the picture of the cosmos (”infinite”) with ”something extraterrestrial”, ”it absorbs reverberation immediately” or ”you do not hear anything”.

The picture of the holed tree bole (”hollow”) ”absorbs reverberation” or suggested the ”sound characteristic of wood” or ”something tactile”. The picture of the feather (”soft”) was seen as an ”infinite black room” or evoked the association ”soft”. One participant reported that the picture of the living room (”low”) ”also absorbs reverberation because of the furniture”, and that the white room (”close”) was an ”empty room”. Another participant had difficulties working with the picture of the rock (”rough”). A third participant said that ”some pictures were difficult to connect with music and thus to evaluate”. I observed that many participants had trouble evaluating the pictures with respect to the music. The most difficult pictures were the rock (”rough”) and the holed tree bole (”hollow”); the cosmos (”infinite”), the coloured room (”artificial”) and the white room (”close”) seemed difficult, too.

7.8 Discussion

Different pictures suit the three instrument groups. RoomAmbience was rated the highest among all factor levels and conditions. All in all, the pictures do not suit the music well. This may be because I used recordings from an anechoic environment, which made the instruments sound less familiar. The listening condition does not have an impact on the evaluation of the pictures. The results support previous research, because perception depended more on the pictures than on the listening condition; consequently, the visual input has an impact here again. This kind of method might not suit the purpose of investigating different listening conditions, as attention was focused on the pictures. More research is necessary to better understand the relationship between music and the evaluation of pictures. I presented the pictures to the participants without any description so that they would do the listening test unbiased. The participants could interpret the pictures in their own way, unless they asked what was shown in a specific picture. The participants' feedback indicates that they interpreted the pictures in different ways and had difficulties evaluating the pictures with respect to the music. Thus, participants possibly need a description of the pictures in order to know what the pictures mean.


Chapter 8

Listening Test 2

Listening test 2 is based on listening test 1 and includes some changes. Listening test 1 yielded results regarding the appropriateness of pictures for different kinds of music; the listening conditions, however, were rated similarly, with no significant difference. Unfortunately, the results left open the question of whether the similar ratings were due to the influence of the pictures or due to a generally similar perception. Thus, I conducted a second listening test without pictures.1

8.1 Methods

Listening test 2 is similar to the previous listening test (see chapter 7). Instead of evaluating pictures, participants rated spaciousness on a 7-point Likert scale by answering how spacious they perceived each music excerpt to be. The listening test took place in July 2015.

8.2 Participants

I recruited 28 participants for this study; the data of 27 participants (17 male, 10 female) could be used. One participant had to be excluded because a technical problem interfered with one session. The participants' age ranged from 19 to 62 years (mean age = 30.59 years, SD = 12.87). 18 participants had experience with listening tests.

1 Stirnat, C. “How Important is the Reproduction Technique for the Perception of Spaciousness in Music?” In: The 9th International Conference of Students of Systematic Musicology (SysMus16), Jyväskylän yliopisto, June 8-10 2016. (Eds.) Burger, B., Bamford, J., and Carlson, E. 2016. http://urn.fi/URN:ISBN:978-951-39-6708-6 [04.11.16].


26 participants stated that they had normal listening abilities without any diagnosed hearing impairment. 21 participants had completed musical training at least at amateur level (8 participants at semi-professional and 3 participants at professional level). I recruited participants as in experiment 1 and modified the calls for participants (see appendix B).

8.3 Stimuli

Experiment 2 comprised the same stimuli as experiment 1, except for one small difference (see section 7.3). Participants needed less time because they only had to rate one adjective instead of twelve pictures. Thus, I shortened the stimuli and presented each music excerpt once, with 5 seconds of silence (15 s music - 5 s silence).

8.4 Setup

I used the same setup as in the previous experiment; see section 7.4 for details.

8.5 Procedure

The procedure remained the same, too. The participants read the introduction on the questionnaire in addition to my personal explanation, and I encouraged them to ask questions if something was unclear. As before, equal numbers of participants started with each technical device. This time, I left the headphone sound level unchanged after setting the sound level equally for all three conditions. The experiment took only between 30 and 45 minutes. The participants again received a small gift as a reward for their participation.

8.6 Data Analysis

Further data analysis required the stimuli to be reduced in order to decrease the probability of an α-error, as in experiment 1. I summarized the stimuli into the same three groups: wind instruments with one voice, stringed instruments with one voice and mixed instruments with several voices. I carried out 3x3 repeated-measures ANOVAs and chose


two factors with three levels each: TechnicalDevices and InstrumentGroups. The analysis included Bonferroni adjustment for multiple comparisons. Additionally, I used the software Audacity to analyse the stimuli's frequency spectra and sound levels (see appendix B).

8.7 Results

Figure 8.1 shows the overall means and medians for all participants and all stimuli. The participants rated the headphone condition the least spacious (mean value x̄ = 4.08 and median Md = 4.08), followed by the loudspeaker condition (x̄ = 4.19 and Md = 4.43). They perceived the wave field synthesis condition as the most spacious (x̄ = 4.49 and Md = 4.54). I checked this tendency for significance and carried out 3x3 repeated-measures ANOVAs.

Figure 8.1: Left: The overall means show that the wave field synthesis was rated the highest, followed by loudspeakers and headphones. Right: The overall medians reveal the same tendency with similar values; the loudspeakers' value is slightly smaller than the wave field synthesis' value.

Table 8.1 provides an overview of the mean values and standard deviations of each condition for each instrument group. Boxplots of each listening condition show that the data contain a few statistical outliers (appendix B). Sphericity is assumed because the Mauchly test is not significant for TechnicalDevices and InstrumentGroups. Sphericity is not assumed for the interaction TechnicalDevices*InstrumentGroups, because the interaction is significant with p < .05; thus, the Greenhouse-Geisser value (ε < .75) is considered. The test of within-subject effects reveals significant within-subject effects for TechnicalDevices (F = 4.541, p < .05), InstrumentGroups (F = 71.281, p < .01) and also for the interaction TechnicalDevices*InstrumentGroups (F = 7.700, p < .01).


Variable | Mean value | Standard deviation | N
Wind instruments - one voice:
Headphone spacious | 2.9636 | 0.91669 | 27
WFS spacious | 3.7500 | 1.07473 | 27
Loudspeakers spacious | 3.4213 | 1.03145 | 27
Stringed instruments - one voice:
Headphone spacious | 4.1173 | 1.04984 | 27
WFS spacious | 4.5679 | 1.01196 | 27
Loudspeakers spacious | 4.0593 | 0.92516 | 27
Mixed instruments - several voices:
Headphone spacious | 4.6337 | 0.94019 | 27
WFS spacious | 4.8189 | 0.83669 | 27
Loudspeakers spacious | 4.7490 | 0.79444 | 27

Table 8.1: Mean values and standard deviations of each condition over each instrument group.

Estimated marginal means support this result. InstrumentGroups reveals a significant effect for all three levels: the mean difference (d = 0.87) between wind instruments-one voice and stringed instruments-one voice is significant (p < .01); the mean difference (d = 1.36) between wind instruments-one voice and mixed instruments-several voices is significant (p < .01), too; and stringed instruments-one voice and mixed instruments-several voices reveal a significant (p < .01) difference of d = 0.49. The listening conditions headphone and wave field synthesis show a significant (p < .05) mean difference of d = 0.47. Figure 8.2 illustrates the interaction between the variables by displaying the estimated marginal means. The variation of the distances between the InstrumentGroups variables across the three conditions indicates an interaction effect for all instrument groups and listening conditions.2 The interaction effect means that the different instrument groups have an impact on the perception of the listening conditions and vice versa.

Figure 8.2: Estimated marginal means for InstrumentGroups in all three listening conditions for ”spacious”.

2 Cf. Janssen, J. and Laatz, W. Statistische Datenanalyse mit SPSS: eine anwendungsorientierte Einführung in das Basissystem und das Modul Exakte Tests. 3rd ed. Berlin, Heidelberg: Springer-Verlag, 2013, p. 360.


Frequency Spectrum and Sound Level

I checked the stimuli's frequency spectra and averaged sound levels in Audacity (see appendix B). Only three of the eight wind instrument stimuli showed a somewhat high amount of low frequencies; the stringed instrument and mixed instrument stimuli contained a higher amount of low frequencies. The groupwise analysis of the stimuli's sound levels does not explain the low ratings of the wind instruments: the sound levels differ between all stimuli and do not reveal a link to the frequency spectra or ratings. A more detailed analysis, however, showed a tendency relating stimuli and sound level. I compared music excerpts of the same instrument pairwise: the music excerpts with a higher sound level have higher mean values if the sound level difference is at least 6 dB. For example, the averaged sound level of ”Counting Crows” is 10 dB higher than that of ”Donovan”, and ”Counting Crows” received higher mean values in all three listening conditions. Both excerpts were played with piano and voice.
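With the averaged levels from Table 7.2, the quoted example works out as follows:

\[
\Delta L = L_{\text{Counting Crows}} - L_{\text{Donovan}} = -18.3\ \mathrm{dB} - (-28.4\ \mathrm{dB}) = 10.1\ \mathrm{dB}
\]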

8.8 Discussion

Listening test 2 shows that the technical devices were perceived differently. Unfortunately, statistical outliers occurred for a few stimuli in each condition; thus, the results have to be interpreted carefully. The overall means and the means of all three instrument groups show a higher rating for the wave field synthesis condition, which is confirmed by the ANOVA: the wave field synthesis condition was perceived as the most spacious. The analysis revealed a significant main effect for the instrument groups. On the one hand, the kind of instrument and the number of instruments are a reasonable explanation; on the other hand, the frequency spectrum is consistent with Winckel's (1970) and Barron and Marshall's (1981) findings: the mixed instruments with several voices were rated the most spacious because of a higher ratio of low frequencies. Interestingly, I found an interaction effect between the technical devices and the instrument groups. Thus, the technical devices have an impact on the perception of the instrument groups and vice versa, which leads to a different perception of spaciousness than either factor produces separately. Furthermore, the averaged sound level shows a tendency to influence the perceived spaciousness: music excerpts were rated more spacious if their sound level

was at least 6 dB higher than that of their paired music excerpt. More research on the link between sound level and spaciousness is necessary, though.


Chapter 9

Results for Artificial Head Measurements

9.1 Measurement of Accuracy

The purpose of the first artificial head measurement was to test the tracking system of the wave field synthesis system. I used the same setup as for the listening test and a HEAD acoustics HSU III.2 dummy head with two condenser microphones MK 250 (No. 8156). The dummy head was placed at the listener's position in nine different configurations: facing right, front and left at three different heights (see figure 9.1). The wave field synthesis system played back a sine tone in each of eight octave bands from 62.5 Hz over 125 Hz up to 8000 Hz, with the same sound level for each tone. I produced the sine tones in Audacity and played them back as a .wav file. The sine tone for each band was five seconds long, and two tones were separated by one second of silence to ensure a clear cut between the bands in the analysis (5 s sine tone - 1 s silence - ... - 5 s sine tone - 1 s silence). I analysed the dummy head recordings with the HEAD acoustics analyzer software ArtemiS using the Mark Analyser 2: I chose the SQuadriga recordings as a source and level vs. time as the analysis; ”Analyse → calculate” in the menu started it. The blue diagrams in appendix C show the results. I compared the maximum sound level difference for each microphone over all positions. The maximum difference is 3.93 dB[V] for the left microphone and 3.99 dB[V] for the right microphone. A remeasurement in dB[SPL] revealed a maximum difference of 0.54 dB[SPL] for the left microphone

and of 0.56 dB[SPL] for the right microphone. The results mean that the sound level is not the same everywhere within the listening area, but the difference remains relatively small.
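The test signal was produced in Audacity. As an illustrative sketch only (the sample rate is an assumption; the thesis does not state it), the same 5 s tone / 1 s silence sequence over the eight octave bands could be generated as follows:

import numpy as np
import soundfile as sf

FS = 44100  # assumed sample rate

def octave_band_test_signal(freqs=(62.5, 125, 250, 500, 1000, 2000, 4000, 8000),
                            tone_s=5.0, gap_s=1.0, amp=0.5):
    """Concatenate 5 s sine tones separated by 1 s of silence."""
    gap = np.zeros(int(gap_s * FS))
    parts = []
    for f in freqs:
        t = np.arange(int(tone_s * FS)) / FS
        parts += [amp * np.sin(2 * np.pi * f * t), gap]
    return np.concatenate(parts)

sf.write("octave_tones.wav", octave_band_test_signal(), FS)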

Figure 9.1: A sketch of the measurement setup in the laboratory. The small, dark grey boxes represent the loudspeaker modules of the wave field synthesis system; the light grey area is the listening area. The red ball represents the dummy head's position, facing in three different directions (arrows) and measured at three heights (0.875 m, 0.945 m and 1.010 m). The triangle shows the distance of about 3.26 m between the dummy head and each of the stereo loudspeakers.

9.2 Measurement of Listening Test

The second measurement consisted of a reproduction of the listening test in order to compare the results of the listening test with objective data. I used the same setup as for the listening test (see figure 9.1), with the dummy head facing front at the listener's position.


Figure 9.2: Pictures of the artificial head measurement. Left: the dummy head wearing the markers of the tracking system. Middle: the head facing right. Right: the head facing left.

I proceeded as in the listening test and presented the listening conditions in the order wave field synthesis, loudspeakers, headphone. The SQuadriga recorder was located close to the dummy head and connected to a laptop and to the dummy head. SQuadriga was set to the highest sensitivity (-36 dB(V)) because the lower sensitivity settings recorded the music excerpts (the short versions, as in listening test 2) too quietly. I therefore had to ensure that no external noises, e.g. footsteps or typing on the keyboard, occurred that would have been recorded as well. So I remained the only person in the laboratory and started the recordings near the dummy head. Then I walked to the computer controlling the wave field synthesis settings and started the music excerpts for the wave field synthesis in Ardour, staying there quietly the whole time to avoid footsteps or any other noises. I did the same for the loudspeaker and headphone conditions. In order to analyse the recordings in Matlab, they had to be converted to separate .wav files. I calculated the IACC for all music excerpts, computing IACCE3 for three octave bands: 500 Hz, 1000 Hz and 2000 Hz. First, I filtered each channel of each recording into the three octave bands using a highpass and a lowpass filter. During the filtering, some recordings were affected by clipping, which means that parts of the original signal were cut off. I excluded these recordings, and the corresponding recordings of the other listening conditions, from the analysis to avoid the results being distorted by clipping; recordings of 22 music excerpts remained. Usually, IACCE3 is calculated from the first 80 ms of an impulse response. I adapted the calculation to the filtered recordings and divided each recording into 80 ms time windows.

I used formula 4.5 of chapter 4 to compute IACCE3. After calculating mean values and weighted standard deviations for each recording, I summarized these values for an overall comparison of the listening conditions in SPSS. Table 9.1 shows the overall means and weighted standard deviations of all recordings for each listening condition. The two channels of the headphone recordings are nearly identical, as the IACC mean value is nearly 1 (IACCheadphone = 0.9842). This result makes sense because both headphone channels played back the same signal very close to the ears' microphones. The channels of the wave field synthesis and loudspeaker recordings are quite similar but by no means identical (IACCwfs = 0.7374 and IACCloudspeakers = 0.6622). One difference between the headphone recordings and the wave field synthesis and loudspeaker recordings is the impact of the room acoustics on the latter two.

 | Headphone | Wave Field Synthesis | Loudspeakers
Lower border of standard deviation | 0.958 | 0.619 | 0.5434
Mean value | 0.9842 | 0.7374 | 0.6622
Higher border of standard deviation | 0.9941 | 0.823 | 0.7549

Table 9.1: Overall mean values and weighted standard deviations of the 22 recorded stimuli.
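Formula 4.5 itself is given in chapter 4, outside this excerpt. As a hedged sketch of the computation described above, the normalized interaural cross-correlation over consecutive 80 ms windows, with the conventional ±1 ms lag range, could be implemented as follows; the octave-band filtering is assumed to have been applied to the two channels beforehand.

import numpy as np

def mean_iacc(left, right, fs, win_s=0.080, max_lag_s=0.001):
    """Mean IACC over consecutive 80 ms windows of a band-filtered recording.

    The IACC of one window is the maximum, over lags |tau| <= 1 ms, of the
    absolute normalized cross-correlation between the two channels.
    """
    win, max_lag = int(win_s * fs), int(max_lag_s * fs)
    values = []
    for start in range(0, len(left) - win, win):
        l = left[start:start + win]
        r = right[start:start + win]
        denom = np.sqrt(np.sum(l ** 2) * np.sum(r ** 2))
        if denom == 0:
            continue  # skip silent windows
        cc = [np.sum(l[max(0, -k):win - max(0, k)] *
                     r[max(0, k):win - max(0, -k)]) / denom
              for k in range(-max_lag, max_lag + 1)]
        values.append(float(np.max(np.abs(cc))))
    return float(np.mean(values))

Weighting the per-window values, as done for the weighted standard deviations in Table 9.1, would additionally require the window energies; the sketch above returns the unweighted mean.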


Chapter 10

Discussion

Listening test 1 yielded results with respect to the research questions and hypotheses. The first hypothesis stated that music-specific characteristics occur in the perceived spaciousness. Different pictures suit the three instrument groups; the pictures represent spatial characteristics, and an appropriate picture can represent the impression of a particular music excerpt. Although the ratings are low, the pictures were rated significantly differently. The music excerpts of wind instruments, stringed instruments and mixed instruments are slightly characterised by the pictures. Thus, the results confirm the first hypothesis. The second hypothesis suggested that headphones, loudspeakers and wave field synthesis are perceived differently and reveal particular characteristics of the technical devices used. The data analysis showed no significant within-subject effect for the technical devices; consequently, the second hypothesis had to be rejected at first. According to the results, perception depends more on the pictures than on the listening condition. Listening test 2 provides results with respect to the research questions and hypotheses, too. The instrument groups were perceived significantly differently; the mixed instruments with several voices were rated the most spacious, most plausibly because of their higher ratio of low frequencies compared to the other instrument groups. Thus, the results confirm the first hypothesis about music-specific characteristics. Listening test 2 also revealed a significant difference between the listening conditions: the wave field synthesis condition was perceived as the most spacious. Consequently, the results confirm the second hypothesis that the technical devices are perceived differently. Additionally, an interaction effect between the instrument groups and the technical devices shows that not only does each factor influence the perception of spaciousness separately, but both factors together also lead to a different perception.

The sound level indicates an influence on the perception of spaciousness that needs to be investigated further in future research. The tracking system of the wave field synthesis system works quite accurately. The dummy head measurement detected a small sound level difference between positions that should theoretically be inaudible; participants, however, reported that the sound level was not the same on both sides. The second dummy head measurement revealed that both channels of the headphone played back nearly identical signals; the wave field synthesis and the loudspeakers also reproduced the music excerpts similarly on both channels. These results coincide with the results of listening test 2, because a lower IACC (i.e. a higher ”1-IACC”) means a higher dissimilarity and an increased impression of spaciousness. So the headphone condition was perceived as the least spacious, which is confirmed by the measurement.


Chapter 11

Conclusions

Both listening tests revealed significant differences in perceived spaciousness between solo wind instruments, solo stringed instruments and mixed instruments. On the one hand, the pictures were not rated differently across the listening conditions in listening test 1; on the other hand, a significant difference occurred between wave field synthesis and headphones in listening test 2. Interestingly, the results of listening test 2 revealed an interaction effect between the instrument groups and the technical devices. Pictures suited the music excerpts of the instrument groups differently because the instrument groups evoke different impressions in participants. The overall low ratings of the pictures result from difficulties that the participants experienced in the listening test; thus, the pictures did not simplify the evaluation as intended. The similar ratings across the listening conditions are the consequence of attention being focused on the pictures and show an impact of visual input on auditory perception. The listening tests support previous research on the impact and dominance of vision over audition. Saldaña and Rosenblum's findings support the results of both listening tests, as they discovered no influence of words on auditory perception, in contrast to videos of a cellist playing.1 The contribution of a high ratio of low frequencies to a greater perception of spaciousness coincides with Winckel's and Barron and Marshall's findings.2 3 The sound level also showed a tendency to have an impact on the perception of spaciousness. Thus, it supports Blauert's and Barron and Marshall's results, but more research on the link

1 Saldaña and Rosenblum, see n. 28.
2 Winckel, see n. 9.
3 Barron and Marshall, see n. 9.


between sound level and spaciousness is necessary.4 Additionally, wave field synthesis is perceived as more spacious than headphones because the room acoustics of the laboratory are included in the overall perception; as the participants wore circumaural headphones, the room acoustics were excluded in the headphone condition. The IACC values support this difference. Perception means the processing of all available information; this processing forms a representation of auditory and visual input. The interaction effect between the instrument groups and the technical devices demonstrates that perception includes the combination of complex information. In conclusion, the study confirmed both hypotheses by conducting two listening tests and artificial head measurements. The wave field synthesis system, including the tracking system, proved to be a good alternative to headphones and can replace them. It is another option alongside the binaural sky approach.

4 Blauert, see n. 9.


Acknowledgement

I would like to thank my advisors Prof. Rolf Bader and Prof. Clemens Wöllner. Thank you to Prof. Wolfgang Fohl, who gave his permission to use the wave field synthesis laboratory. Thank you to Malte Nogalski for introducing me to the wave field synthesis system and for his support during the conduction of the listening tests. I would also like to thank Tim Ziemer, Jesper Hohagen, Michael Blaß, Henning Albrecht and Orie Takada for their valuable support in various ways. Thank you to Simon Linke and Dennis Walkusch for writing the Matlab code with me. Thank you to Marc Thompson, an advisor during my Erasmus stay in Jyväskylä, Finland, who gave me feedback on my first ideas. I would also like to thank all musicians and participants for taking the time to record music or to participate in the listening tests. Finally, thank you to my family and friends for supporting me mentally and being there for me.


References

Abel, J. S. and Patty, H. "A Simple, Robust Measure of Reverberation Echo Density". In: AES 121st Convention, San Francisco. 2006, pp. 1–10.

Alais, D. and Burr, D. "The ventriloquist effect results from near-optimal bimodal integration". In: Current Biology 14.3 (2004), pp. 257–262.

Aristoteles. http://www.goodreads.com/quotes/20103-the-whole-is-greater-than-the-sum-of-its-parts [08.11.16].

"Artificial". http://www.forschungsexpedition.de/generator/wj2009/de/Bilder/Ausstellungszug/Innendesign_20des_20Wagens__nat_C3_BCrlich_20k_C3_BCnstlich,property%3DBigImage,slc%3Dwj2009_2Fde,cc%3D1984.jpg [24.10.16].

Audio Test. Adam Audio A8X. "Ehrlicher Analytiker". http://www.adam-audio.com/de/pro-audio/news/review-a8x-unheard-clarity [08.11.16].

Baalman, M. A. "On Wave Field Synthesis and electro-acoustic music, with a particular focus on the reproduction of arbitrarily shaped sound sources". PhD thesis. TU Berlin, 2008.

Bader, R. Music and Space. http://systmuwi.de/Pdf/Papers/Bader%20papers/RoomAcoustics/HamburgConcertSpaces/HamburgConcertSpaces.ppt [20.10.16].

Barron, M. and Marshall, A. "Spatial Impression due to Early Lateral Reflections in Concert Halls: The Derivation of a Physical Measure". In: Journal of Sound and Vibration 77.2 (1981), pp. 211–232.

Bernschütz, B., Woirgard, M., Stade, P., and Amankwor, J. Anechoic Recordings. 2012. http://www.audiogroup.web.fh-koeln.de.

"Big". http://1.bp.blogspot.com/-x_BtfjlOfEA/T-BzfBi4b0I/AAAAAAAADEo/dhdJOMmAVYg/s640/boston-symphony-hall.jpg [24.10.16].

Blauert, J. Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge, London: MIT Press, 1997.

Blauert, J. and Lindemann, W. "Auditory spaciousness: Some further psychoacoustic analyses". In: The Journal of the Acoustical Society of America 80.2 (1986), pp. 533–542.

Bleda, S., López, J. J., and Pueo, B. "Software for the Simulation, Performance Analysis and Real-Time Implementation of Wave Field Synthesis Systems for 3D-Audio". In: Proc. of the 6th Int. Conference on Digital Audio Effects (DAFX-03), September 8-11. London, UK, 2003, pp. 1–6.

Breazeale, M. A. and McPherson, M. "Piezoelectricity and Transducer". In: Handbook of Acoustics. (Eds.) T. D. Rossing. 2nd ed. Berlin, Heidelberg: Springer-Verlag, 2014. Chap. Physical Acoustics. Apparatus, pp. 234–236.

Bregman, A. S. Auditory Scene Analysis: The perceptual organization of sound. Cambridge, MA, USA: MIT Press, 1990.

"Close". http://www.photoshop-cafe.de/bildupload/pics/sonst/1279803005_links.jpg [24.10.16].

De Vries, D. "Spatial fluctuations in measures for spaciousness". In: J. Acoust. Soc. Am. 110.2 (2001), pp. 947–953.

Dickreiter, M. Handbuch der Tonstudiotechnik. (Eds.) S. für Rundfunktechnik. 2nd ed. München: Verlag Dokumentation Saur KG., 1978.

Dickreiter, M. Handbuch der Tonstudiotechnik. (Eds.) S. für Rundfunktechnik. 5th ed. Vol. 1. München: Verlag Dokumentation Saur KG., 1987.

Dickreiter, M., Dittel, V., Hoeg, W., and Wöhr, M. Handbuch der Tonstudiotechnik. (Eds.) A. medienakademie. 7th ed. Vol. 1. München: K.G. Saur Verlag, 2008.

Donnadieu, S. "Mental representation of the timbre of complex sounds". In: Analysis, synthesis, and perception of musical sounds. The sound of music. (Eds.) J. W. Beauchamp. New York: Springer, 2007, pp. 272–319.

Ernst, M. O. and Rohde, M. "Multimodale Objektwahrnehmung". In: Kognitive Neurowissenschaften. (Eds.) H.-O. Karnath and P. Thier. 3rd updated and expanded edition. Berlin: Springer, 2012, pp. 139–147.

Everest, F. A. and Pohlmann, K. C. "Superposition of Sound". In: Master Handbook of Acoustics. 5th ed. New York, USA: McGraw-Hill, 2009. Chap. Comb-Filter Effects, 135f.

Fastl, H. "Audio-visual interactions in loudness evaluation". In: Proceedings of the 18th International Congress on Acoustics (ICA). Citeseer. Kyoto, Japan, 2004, pp. 1161–1166.

Fastl, H. "Basics and applications of psychoacoustics". In: Proceedings of Meetings on Acoustics at ICA 2013, Montréal. 2013, pp. 1–23.

Fohl, W. "The wave field synthesis lab at the HAW Hamburg". In: Sound, Perception, Performance. (Eds.) R. Bader. Switzerland: Springer International Publishing, 2013, pp. 243–255.

Four Audio GmbH. WFS HAW - Messungen 17.09.2013. Tech. rep. 2013.

Friesecke, A. Die Audio-Enzyklopädie: Ein Nachschlagewerk für Tontechniker. München: K.G. Saur Verlag, 2007.

Gade, A. C. "Measure of Spaciousness". In: Springer Handbook of Acoustics. (Eds.) T. D. Rossing. 2nd ed. Berlin, Heidelberg: Springer-Verlag, 2014. Chap. Acoustics in Halls for Speech and Music, 325f.

Gaver, W. W. "What in the World Do We Hear?: An Ecological Approach to Auditory Event Perception". In: Ecological Psychology 5.1 (1993), pp. 1–29. http://www.cog.brown.edu/courses/cg195/pdf_files/fall07/Gaver-Whatdowehear.pdf [10.10.16].

Goertz, A. "Lautsprecher". In: Handbuch der Audiotechnik. (Eds.) S. Weinzierl. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 421–490.

Goldstein, E. B. Wahrnehmungspsychologie. Der Grundkurs. (Eds.) K. R. Gegenfurtner. 9th ed. Berlin, Heidelberg: Springer, 2015, pp. 1–13.

Griesinger, D. "The Psychoacoustics of Apparent Source Width, Spaciousness and Envelopment in Performance Spaces". In: ACUSTICA - acta acustica 83 (1997), pp. 721–731.

Hall, D. E. Musikalische Akustik. Mainz: Schott Music GmbH & Co. KG, 2008.

Hidaka, T., Beranek, L. L., and Okano, T. "Interaural cross-correlation, lateral fraction, and low- and high-frequency sound levels as measures of acoustical quality in concert halls". In: J. Acoust. Soc. Am. 98.2 (1995), pp. 988–1007.

"Hollow". http://papageien-blog.de/wp-content/uploads/2013/06/Stamm5.jpg [24.10.16].

"Infinite". http://bilder.t-online.de/b/66/69/80/22/id_66698022/610/tid_da/komet-ison-auf-seinem-weg-durchs-weltall.jpg [24.10.16].

"Intimacy". http://styledd.de/wp-content/gallery/test/2009-09-im-wohnzimmer-008.jpg [24.10.16].

Janssen, J. and Laatz, W. Statistische Datenanalyse mit SPSS: eine anwendungsorientierte Einführung in das Basissystem und das Modul Exakte Tests. 3rd ed. Berlin, Heidelberg: Springer-Verlag, 2013.

Kallinen, K. and Ravaja, N. "Comparing speakers versus headphones in listening to news from a computer - individual differences and psychophysiological responses". In: Computers in Human Behavior 23.1 (2007), pp. 303–317.

Kleiner, M. Acoustics and Audio Technology. 3rd ed. USA: J. Ross Publishing, 2012.

Kyriakos. Room Acoustics & Reverberation Time RT(60) Back. 2010. http://www.aca.gr/index/forums/fen/hiend2?row=528 [19.10.16].

Laumann, K., Theile, G., and Fastl, H. "A virtual headphone based on wave field synthesis". In: Proceedings of Acoustics 08 in Paris, France. 2008, pp. 3593–3597.

Laumann, K., Theile, G., and Fastl, H. "Binaural Sky - Examination of Different Array Topologies". In: Proceeding of the NAG/DAGA - Rotterdam. 2009, pp. 1090–1092.

"Low". http://bilder.t-online.de/b/66/69/80/22/id_66698022/610/tid_da/komet-ison-auf-seinem-weg-durchs-weltall.jpg [24.10.16].

McGurk, H. and MacDonald, J. "Hearing lips and seeing voices". In: Nature 264.30 (1976), pp. 746–748.

Menzel, D., Wittek, H., Theile, G., and Fastl, H. "The Binaural Sky: A Virtual Headphone for Binaural Room Synthesis". In: Proceeding to the Tonmeistersymposium Nov. 2005, Hohenkammer, Germany. 2005, pp. 1–6. https://www.irt.de/fileadmin/media/downloads/Produktion/A_Virtual_Headphone_for_Binaural_Room.pdf [26.09.16].

Møller, H., Sørensen, M. F., Hammershøi, D., and Jensen, C. B. "Head-Related Transfer Functions of Human Subjects". In: J. Audio Eng. Soc 43.5 (1995), pp. 300–321. http://www.aes.org/e-lib/browse.cfm?elib=7949.

Morfey, C. L. "Echo". In: Dictionary of Acoustics. (Eds.) C. L. Morfey. London, San Diego: Academic Press, 2001, p. 128.

"Narrow". https://www.holidaycheck.de/m/flur/77ca6953-b6cc-3e2d-8a67-604b54e2563e [24.10.16].

Neuhoff, J. G. Ecological Psychoacoustics. Amsterdam, Netherlands: Elsevier Academic Press, 2004.

Nogalski, M. "Acoustic Redirected Walking with Auditory Cues by Means of Wave Field Synthesis". MA thesis. Hamburg University of Applied Sciences, 2015.

Nogalski, M. Gestengesteuerte Positionierung von Klangquellen einer Wellenfeldsynthese-Anlage mit Hilfe eines kamerabasierten 3D-Tracking-Systems. Hamburg, 2012. https://users.informatik.haw-hamburg.de/~ubicomp/arbeiten/bachelor/nogalski.pdf.

Olufsen, B. Music from Archimedes. 1992.

"Open". http://storageblog.yellostrom.de/media/2013/01/Fenster-offen.jpg [24.10.16].

Potisk, T. Head-Related Transfer Function. 2015. http://mafija.fmf.uni-lj.si/seminar/files/2014_2015/Seminar_Ia_Head-Related__Transfer_Function_Tilen_Potisk.pdf [24.09.16].

Reichardt, W. and Schmidt, W. "Die hörbaren Stufen des Raumeindruckes bei Musik". In: ACUSTICA 17 (1966), pp. 175–179.

"Rough". https://www.berleburger.com/images-de08/decoelast/Oberflaechen/Finish4.jpg [04.08.16].

Saldaña, H. M. and Rosenblum, L. D. "Visual influences on auditory pluck and bow judgments". In: Perception & Psychophysics 54.3 (1993), pp. 406–416.

Schlemmer, M. "Audiovisuelle Wahrnehmung: Die Konkurrenz- und Ergänzungssituation von Auge und Ohr bei zeitlicher und räumlicher Wahrnehmung". In: Musikpsychologie (Handbuch der Systematischen Musikwissenschaft). (Eds.) H. de la Motte-Haber. 3rd ed. Laaber: Laaber, 2005, pp. 173–184.

Schmidt, W. and Reichardt, W. Echo. (Eds.) W. Fasold, W. Kraak, and W. Schirmer. 2. Berlin: VEB Verlag Technik, 1984.

Schneider, A. "On concepts of 'tonal space' and the dimensions of sound". In: (Eds.) R. Spintge and R. Droh. St. Louis, MO, USA: MMB Music Inc., 1992, pp. 102–130.

Schneider, R. Semiotik der Musik. Darstellung und Kritik. München: Wilhelm Fink Verlag, 1980.

Schroeder, M. R. "Natural Sounding Artificial Reverberation". In: J. Audio Eng. Soc 10.2 (1962), pp. 219–223.

Shepard, R. "Cognitive Psychology and Music". In: Music, Cognition, and Computerized Sound. An Introduction to Psychoacoustics. (Eds.) P. C. Cook. Cambridge, MA, USA: MIT Press, 2001, pp. 21–35.

Smyth, T. Auditory Perception Completion I. In: Music 175: Cognitive Psychology and Music. Department of Music, University of California, San Diego (UCSD). 2012. http://musicweb.ucsd.edu/~trsmyth/cogpsy175/Auditory_Perception_Complet.html [19.10.16].

Smyth, T. Auditory Perception Completion II. In: Music 175: Cognitive Psychology and Music. Department of Music, University of California, San Diego (UCSD). 2012. http://musicweb.ucsd.edu/~trsmyth/cogpsy175/Auditory_Completion_II.html [19.10.16].

Smyth, T. Temporal Reversal. In: Music 175: Cognitive Psychology and Music. Department of Music, University of California, San Diego (UCSD). 2012. http://musicweb.ucsd.edu/~trsmyth/cogpsy175/Temporal_Reversal.html [11.10.16].

Snyder, J. S. and Alain, C. "Sequential auditory scene analysis is preserved in normal aging adults". In: Cerebral Cortex 17.3 (2007), pp. 501–512.

"Soft". http://images.fotocommunity.de/bilder/aufnahmetechniken/naturtabletop/weich-39ce2387-2524-4c04-8d37-36226a34aaa0.jpg [24.10.16].

Stirnat, C. "How Important is the Reproduction Technique for the Perception of Spaciousness in Music?" In: The 9th International Conference of Students of Systematic Musicology (SysMus16), Jyväskylän yliopisto, June 8-10 2016. (Eds.) B. Burger, J. Bamford, and E. Carlson. 2016. http://urn.fi/URN:ISBN:978-951-39-6708-6 [04.11.16].

Stirnat, C. "Percepted Spaciousness of different Musical Genres". Bachelor's Thesis. Hamburg, 2012.

Stirnat, C. "Räumliche Wahrnehmung der Musikstile Elektro, Ethno, Jazz, Klassik und Rock". Poster presentation at the Jahrestagung der Deutschen Gesellschaft für Musikpsychologie (DGM). 2015.

Webers, J. Tonstudiotechnik: Analoges und digitales Audio Recording bei Fernsehen, Film und Rundfunk. 8th ed. Poing: Franzis' Verlag GmbH, 2003.

Wellek, A. Musikpsychologie und Musikästhetik. Grundriss der Systematischen Musikwissenschaft. 3rd ed. Bonn: Bouvier, 1982.

"Wide". http://corion.net/talks/Advanced-perl-techniques/images/CIMG0961-weites-feld.JPG [24.10.16].

Winckel, F. "Akustischer und visueller Raum. Mitgestalter in der experimentellen Musik". In: Experimentelle Musik. Raum Musik. Visuelle Musik. Medien Musik. Wort Musik. Elektronik Musik. Computer Musik. (Eds.) F. Winckel. Berlin: Gebr. Mann Verlag, 1970, pp. 1–18.

Ziemer, T. "Implementation of the Radiation Characteristics of Musical Instruments in Wave Field Synthesis Applications". PhD thesis. Institute of Systematic Musicology, University of Hamburg, 2014. http://ediss.sub.uni-hamburg.de/volltexte/2016/7939/pdf/Dissertation.pdf.

Ziemer, T. "Wave Field Synthesis". In: Springer Handbook of Systematic Musicology. (Eds.) R. Bader. Berlin, Heidelberg: Springer, 2017, pp. 1–43.

Appendix A Experiment 1


Participants wanted!

For a study within the scope of my Master's thesis I am looking for volunteers who would like to take part. The study deals with spaciousness in music, for which I will conduct a listening test. The task is to rate the impression of the music I play to the participants on a questionnaire. No prior knowledge is needed to take part. The duration is less than two hours. The listening test takes place at the HAW at Berliner Tor, which is only a few minutes away from the Musikwissenschaftliches Institut and directly at the U1 stop Lohmühlenstraße or at the stop Berliner Tor. The address is: Berliner Tor 7 (Haus B), 20099 Hamburg. I am happy to pick participants up at the entrance of the building. If you are interested, please sign up in the Doodle list: http://doodle.com/d8ww6rahytfkzs2u#table or contact: [email protected]. The listening test runs longer than currently entered in the Doodle list; I will add the next dates. Feel free to contact me beforehand. All participants receive a small thank-you for taking part. I look forward to everyone who joins!

Claudia Stirnat
Musikwissenschaftliches Institut
Neue Rabenstraße 13
20354 Hamburg


Participants wanted!

For a study within the scope of my Master's thesis I am looking for volunteers who would like to take part. The study deals with spaciousness in music, for which I will conduct a listening test. The task is to rate the impression of the music I play to the participants on a questionnaire. No prior knowledge is needed to take part. The duration is less than two hours. The listening test takes place at the HAW at Berliner Tor, directly at the U1 stop Lohmühlenstraße or at the stop Berliner Tor. The address is: Berliner Tor 7 (Haus B), 20099 Hamburg. I am happy to pick participants up at the entrance of the building. If you are interested, please sign up in the Doodle list: http://doodle.com/d8ww6rahytfkzs2u#table or contact: [email protected]. The listening test runs longer than currently entered in the Doodle list; I will add the next dates. Feel free to contact me beforehand. All participants receive a small thank-you for taking part. I look forward to everyone who joins!

Claudia Stirnat
Musikwissenschaftliches Institut
Neue Rabenstraße 13
20354 Hamburg


Participant _______________    Date and time _____________________

Background Questionnaire

Your data remain anonymous and will be treated confidentially. If anything is unclear, please ask.

1. Gender: M / F

2. Age: _________

3. Have you had musical training? Yes / No

4. If yes, indicate your level of musical training: Amateur / Semi-professional / Professional

5. How many listening tests have you taken part in so far? 0 / 1-10 / over 11

6. Have you ever been diagnosed with hearing damage? Yes / No

I am grateful for any feedback! Feedback:

Thank you very much for your participation!

Figure A.1: Recording session in the anechoic chamber of the Institute of Systematic Musicology.


Listening Test

In the following you will hear music excerpts of 15 seconds each, each played twice. Between the music excerpts there are 15-second pauses, so you have one minute to rate each excerpt. Using the pictures, rate your spatial impression of the music excerpts. Give a number from 1 to 7; 1 means "applies little" and 7 means "applies very much". P1 and P2 are practice trials and will not be included in the analysis. If you know a music excerpt or have heard it before, mark the corresponding number.

[Rating sheet: in each of the three parts (Teil 1-3), every picture (Bild 1-12) was rated for every music excerpt (P1, P2, Lied 1-28) on the 7-point scale (1 = applies little, 7 = applies very much).]


Figure A.2: The frequency response of the A8X loudspeakers is nearly linear (source: Audio Test, 2011, p. 61).

Figure A.3: The frequency response of the WFS system at the HAW is not linear. Thanks to Four Audio for the permission to use the figure (source: Four Audio, 2013, p. 2).



Figure A.4: Screenshot of the stimulus setup of the first listening test in Ardour. The first channel contains the stimulus order of the wave field synthesis condition, the second channel that of the loudspeaker condition, and the third channel that of the headphone condition.


Appendix B Experiment 2


Participants wanted!

For a study within the scope of my final thesis I am looking for volunteers who would like to take part. The study is about spaciousness in music, for which I will conduct a listening test. No prior knowledge is needed to take part. The duration is about 45 minutes. The listening test takes place at the HAW at Berliner Tor, which is only a few minutes away from the Musikwissenschaftliches Institut and directly at the U1 stop Lohmühlenstraße. The address is: Berliner Tor 7 (Haus B), 20099 Hamburg.

I am happy to pick participants up at the entrance of the building. If you are interested, please sign up in the Doodle list: http://doodle.com/zy6f7zehahie4buc or contact: [email protected]. All participants receive a small thank-you for taking part.

Claudia Stirnat
Musikwissenschaftliches Institut
Neue Rabenstraße 13
20354 Hamburg



Participants wanted!

For a study within the scope of my final thesis I am looking for volunteers who would like to take part. The study is about spaciousness in music, for which I will conduct a listening test. The task is to rate the impression of the music I play to the participants on a questionnaire. No prior knowledge is needed to take part. The duration is about 45 minutes. The listening test takes place at the HAW at Berliner Tor, directly at the U1 stop Lohmühlenstraße. The address is: Berliner Tor 7 (Haus B), 20099 Hamburg. I am happy to pick participants up at the entrance of the building. If you are interested, please sign up in the Doodle list: http://doodle.com/zy6f7zehahie4buc or contact: [email protected]. All participants receive a small thank-you for taking part.

Claudia Stirnat
Musikwissenschaftliches Institut
Neue Rabenstraße 13
20354 Hamburg


Listening Test

The listening test consists of three parts with 30 music excerpts in each part. You will hear music excerpts of 15 seconds each, with a 5-second pause between excerpts. The first two excerpts serve as practice, so that you can familiarize yourself with the task. Rate your spatial impression of the music excerpts on the scale below from 1 to 7; 1 means "not very spacious" ("wenig räumlich") and 7 means "very spacious" ("sehr räumlich"). Go by your first impression. If you know a music excerpt or have heard it before, please mark the corresponding song number. "Spaciousness" ("Räumlichkeit") denotes an auditory effect in which a characteristic spreading of the sound fills a larger room and is perceived accordingly, i.e. a larger room is perceived.

[Rating sheet: in each of the three parts (Teil 1-3), every music excerpt (Ü1, Ü2, Lied 1-28) was rated on the 7-point scale from 1 (not very spacious) to 7 (very spacious).]

Wind Instruments - One Voice

[Figures B.1-B.8: the wind-instrument (one voice) stimuli - Siyahamba (African song; Voice); Rachmaninoff, Vokalise (Voice); Purcell, Trumpet Voluntary (Trumpet); Robert Schumann, Romanzen I (Oboe); Unknown (Clarinet); Robert Schumann, Romanzen III (Oboe); F. Mendelssohn-Bartholdy, arr. R.W., Violin Concerto (Flute); G. Verdi, La Traviata (Flute).]

Stringed Instruments - One Voice

[Figures B.9-B.14: the stringed-instrument (one voice) stimuli - Stanley Brothers, All the good times are passed and gone (Banjo); Eduard Lalo, Cello Concert II (Cello); Etude (Violin); Tango El Choclo (Violin); Weber, Theme (Cello); Martini, Old Gavotte, part 1 (Cello).]

Mixed Instruments - Several Voices

[Figures B.15-B.23: the mixed-instrument (several voices) stimuli - Counting Crows, Mr. Jones (Voice & Piano); Donovan, Universal Soldier (Voice & Piano); Fred Cockerham, June Apple (Banjo); Bang & Olufsen, Capriccio Arabe (F. Tárrega; Guitar); Flamenco1 U89 (Guitar); Hava Nagila, Traditional Israeli, part 1 (Brass Ensemble); Hava Nagila, Traditional Israeli, part 2 (Brass Ensemble); Martini, Old Gavotte, part 2 (Cello Duo); Johann Halvorsen, Passacaglia (Violin & Cello).]

Figure B.24: Boxplot - Headphone condition.

Figure B.25: Boxplot - Wave field synthesis condition.

Figure B.26: Boxplot - Loudspeakers condition.

Appendix C Artificial Head Measurements

Figure C.1: Position: Low-Center
Figure C.2: Position: Low-Left
Figure C.3: Position: Low-Right
Figure C.4: Position: Middle-Center
Figure C.5: Position: Middle-Left
Figure C.6: Position: Middle-Right
Figure C.7: Position: High-Center
Figure C.8: Position: High-Left
Figure C.9: Position: High-Right

Statutory Declaration
pursuant to § 14(8) of the examination regulations of the Faculty of Humanities and Cultural Sciences for degree programmes concluding with the Master of Arts (M.A.) of 5 July 2006

I affirm in lieu of an oath, by my own handwritten signature, that I have produced the attached thesis independently and without outside help, and that I have marked as such all passages taken verbatim or almost verbatim from publications. Moreover, I have used no literature other than that indicated, and in particular no internet sources not named in the list of references. This declaration also applies to drawings, sketches, pictorial representations etc. belonging to the thesis. Furthermore, the submitted written version of the thesis corresponds to the version on the submitted electronic storage medium.

I agree / do not agree to my thesis being made available for inspection at a later date.

Date                    Signature
