19th INTERNATIONAL CONGRESS ON ACOUSTICS, MADRID, 2-7 SEPTEMBER 2007

COGNITIVE EVALUATION OF SOUND QUALITY: BRIDGING THE GAP BETWEEN ACOUSTIC MEASUREMENTS AND MEANING

PACS: 43.66.Ba, 43.50.Rq

Dubois, Daniele1; Guastavino, Catherine2

1 Centre National de la Recherche Scientifique, LCPE/LAM, 11 rue de Lourmel, 75015 Paris, France; [email protected]
2 McGill University, School of Information Studies and CIRMMT (Centre for Interdisciplinary Research on Music Media & Technology), 3459 McTavish, H3A 1Y1 Montreal, QC, Canada; [email protected]

ABSTRACT

This paper presents a new theoretical grounding for soundscape research based on the following observations: first, the limits of classical physical measurements and the diversity of disciplinary fields newly involved; second, the diversity of research paradigms used in recent research; and finally, the need to articulate different types of expertise. Specifically, we address the coupling between acoustic measurements and other descriptions (derived from the social sciences) to account for the concept of soundscape. Our contribution is to give explicit descriptions of the structural properties of soundscapes as cognitive categories and to explicitly relate formal properties to physical descriptions. A special focus is given to the representation of sound sources and their descriptions in both psychological and physical domains. Finally, we suggest developing cognitive models using formal representations to account for the complexity of cognitive representations, inasmuch as physical descriptions of acoustic signals cannot be considered an exhaustive description of “intrinsic” properties of entities of the world. Rather, acoustic descriptions should be conceptualized as semantic cues pointing to cognitive representations that integrate other types of properties differently structured.

INTRODUCTION

Soundscape research and, more generally, sound quality evaluations have traditionally developed along the psychophysical paradigm. In this view, the physical description of the acoustic signal is the primary description. Physical parameters are evaluated in the “subjective” (i.e. psychological) domain, to identify indicators (as physical measurements) that adequately account for annoyance (as people's responses to noise exposure). However, indicators currently in use remain mainly based on A-weighted sound pressure levels. They reflect the nuisance perceived for particular sounds only in terms of acoustic or low-level psychoacoustic features (e.g. Lden, the day-evening-night level from the European directive on the assessment and management of environmental noise, June 2002). In this view, psychological investigations use simple artificial sounds (rarely encountered in natural listening environments) to identify and validate acoustic properties affecting perception and annoyance. Furthermore, cognitive or situational factors are conceived as extraneous, so-called “contextual” variables that need to be controlled for, rather than investigated.

However, there is converging evidence that in natural or everyday life situations, sounds are processed as meaningful events, in which case cognitive factors exert their influence over perceptual ones, and often overpower them [1, 2]. Neuhoff [3] further reviewed evidence that results obtained with artificial sounds do not always generalize to ecologically valid sounds, and specifically emphasized differences between static and dynamic perception of sounds.

In contrast with the well-established fields of speech or music perception and cognition, where “subjective” perspectives have long been investigated by psychologists and musicians, soundscape research remains dominated by the physical sciences and the engineering communities. For example, Gaver [4] introduced the distinction between musical listening and everyday listening only recently in cognitive psychology, when this distinction had already been largely developed by Schaeffer's investigation of the perception of musique concrète, in terms of “écoute réduite” as opposed to “écoute banale” [5], or by Castellengo, who contrasts identification and qualification [6]. While musical listening focuses on perceptual attributes of the sound perceived for itself (e.g. pitch, loudness), everyday listening focuses on sounds as acoustic events that allow listeners to gather relevant information about their environment (e.g. a car approaching), that is, not about the sound itself but rather about the sound as resulting from sources. Sources are therefore identified as meaningful events, along with the agents and actions producing the sound. Linguistic studies on sound naming [7, 1] revealed that everyday sounds are commonly named using the name of the sound source, inasmuch as it is the source rather than the sound itself that is relevant for adaptation to everyday life. Studies on human-made domestic sounds developed in ecological psychology validated this view [8, 9, 10], demonstrating the primacy of source identification processes in qualitative judgments. Most relevant to this paper is previous research on environmental sounds indicating that people organize familiar sounds on the basis of their appraisal of the objects producing the sound and of the context in which these sounds are usually encountered in everyday life [11, 12, 13].

Current soundscape research (see the special issue of Acta Acustica united with Acustica, 92(6), 2006 for a review) has accumulated evidence of the limitations of a purely “objective” description of the sound per se through physical measurement for assessing noise annoyance. Furthermore, there is converging evidence that “subjective” judgments rely on relationships between sounds and people.
These findings suggest that the elaboration of auditory categories is mediated by the subject's interactions with his or her environment through socialized activities [1]. Therefore, the way people react to and evaluate sounds has to be taken into consideration in the conceptualization and description of the signal in the first place. Only then can it be related to physical measurements, for a better understanding of causal relationships and to improve sound quality control in urban environments and city planning. Since meaning is involved, the notion of public space, and therefore of public noise acceptance, has to be considered. Fiebig and Schulte-Fortkamp [14] showed that annoyance judgments depend primarily on social representations and acceptance of sounds [35] rather than on their physical properties. Furthermore, a free sorting task using recordings of urban soundscapes revealed the existence of script categories referring to the situation in which the sounds are heard, including notions of time and location (e.g. “sounds heard in a sidewalk café district on a sunny afternoon”) [13]. Examinations of personal music collections [15] also reveal organization principles based on intended use: people organize music on the basis of the situation in which they intend to listen to a particular set of music (e.g. “work music”, “driving music”).

On the linguistic side, most natural languages reflect subjective, personal conceptualizations of environmental sounds. Whereas visual objects are often described by simple lexical devices, there are few single words on which people agree as spontaneous descriptions of everyday sounds [1]. The lack of basic lexicalized terms or a priori established categories calls into question the relationship between words and knowledge representations, and suggests broadening the linguistic material to the complete statements that speakers produce in discourse.
Moreover, hedonic judgments happen to be relevant and even discriminant in free sorting tasks of recorded sounds, which appear to be processed as individual experiences (as script categories) rather than shared knowledge (as taxonomic categories). Together, these results suggest an integrative approach that relates the exploration of soundscapes to recent trends in ecological validity and “natural” categorization in the field of cognitive psychology [1, 16, 17, 18].

FORMAL REPRESENTATIONS

A diversity of descriptions

Two main issues arise from such an integrative approach. The first issue is to access precise descriptions of soundscapes as psychological representations. Given that these representations cannot be observed directly, we study them empirically through behaviour and language, by analyzing auditory judgments and verbal descriptions of sounds. Behavioural and linguistic measures, derived from categorization research, include free sorting tasks, supervised classification tasks, typicality ratings, spontaneous verbal descriptions, and naming and definition elicitations.

The second issue is to identify the relations between structural features or properties of psychological representations and physical properties of the acoustic signal. This is necessary to identify which parameter(s) can be interpreted as a cue to a target category that is meaningful for the listener, pointing to the representation of a sound source or event. Several authors have pointed to the limits of acoustic parameters, such as A-weighted levels, for complex sound environments with numerous mixed sources. Indeed, differences in noise spectra recorded from different sources and large variations in noise levels over time suggest introducing new parameters, or combining existing ones derived from different measurement procedures [19, 20]. Loudness levels are not sufficient to predict the level of annoyance, since different sources are perceived quite differently. For example, human sounds (e.g. voices, footsteps) give rise to positive judgments, whereas mechanical, inanimate sounds (e.g. traffic) tend to be perceived negatively [12, 13]. Furthermore, it has also been shown that public transportation noise is evaluated as less annoying than traffic noise resulting from individual vehicles at similar loudness levels [12]. Such qualitative differences are related to the meaning attributed to the sound source in a socially defined environment. Hence, further research is required to model environmental sound source recognition from recorded signals. To do so, there is a need to analyze different sound sources (e.g. vehicles) and their contribution to the acoustic signature of the soundscape using cognitive evaluations. Finally, source recognition processes should be modelled in order to give meaning to acoustic parameters and so predict auditory judgments.
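To make concrete what the level-based indicators mentioned in this paper compute, the following minimal sketch implements the standard analytic A-weighting curve and the day-evening-night level Lden as defined in the 2002 European directive (12 h day, 4 h evening with a 5 dB penalty, 8 h night with a 10 dB penalty). The function names are ours and purely illustrative; this is not a measurement procedure used by the authors.

```python
import math


def a_weighting_db(freq_hz: float) -> float:
    """A-weighting gain in dB at freq_hz (standard analytic curve, 0 dB at 1 kHz)."""
    f2 = freq_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00


def lden(l_day: float, l_evening: float, l_night: float) -> float:
    """Day-evening-night level in dB from the three A-weighted period levels."""
    energy = (
        12 * 10 ** (l_day / 10)               # 12 h day, no penalty
        + 4 * 10 ** ((l_evening + 5) / 10)    # 4 h evening, +5 dB penalty
        + 8 * 10 ** ((l_night + 10) / 10)     # 8 h night, +10 dB penalty
    ) / 24
    return 10 * math.log10(energy)
```

Neither function takes the sound source as an argument: a bus line and a stream of private cars with the same period levels receive the same Lden, which is precisely the limitation discussed above.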
To arrive at a more formal description and model of such categorical representations, further knowledge of the structural properties of cognitive categories has to be acquired. Research on categories and concepts [16] traditionally describes their formal properties either as extensional or as intensional structures. The extension of a concept is the set of all exemplars that are considered to belong to the concept; its intension is given by the set of its defining attributes or features. Viewed as extensional structures, cognitive categories give rise to holistic judgments distributed along similarity. Similarity can be represented as distance to prototypical exemplars. Cognitive categories can also be viewed as intensional structures, derived from analytic judgments on their properties (essential as well as accidental), in which case similarity can be represented as family resemblance of distributed properties (characteristics, dimensions). Extensional structures are best represented using additive tree representations to account for typicality, with variable distances to prototypical exemplars [21]. The goodness of fit can be evaluated using both a metric measure (Kruskal stress) and topological measures (e.g. the percentage of adequately represented quadruplets) [22]. Intensional structures can be related to fuzzy logic (among other formalisms), in that it is concerned with the internal structure of family resemblance and with probability in membership decisions [23].

A diversity of subjects

Inasmuch as the meaning attributed to sounds is a relation between a stimulation and a perceiver, attention should be paid to the identity and characteristics of those who are exposed to such sounds and who perceive them. Specifically, listeners' expertise should be accounted for. Bensa et al. [24] conducted free sorting tasks on synthetic piano sounds varying along two dimensions, namely inharmonicity and the presence of “phantom partials” due to non-linear coupling between transverse and longitudinal modes. Two groups of listeners were compared: pianists (experts) and non-pianists (novices). Experts created categories on the basis of the first dimension (inharmonicity) at a first level, and finer-grained sub-categories along the second dimension (non-linear coupling), suggesting an intensional structure. Novices, however, created categories related to combinations of partial ranges of the two dimensions, integrated in extensional categorical structures. Experts' responses may thus validate representations that fit the physical descriptions of sounds “per se”, whereas novices' responses differ markedly and reveal a categorical organisation along typicality and family resemblance.
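As an illustration of how free sorting data of the kind discussed above can be turned into an extensional description, the sketch below derives a pairwise dissimilarity from listeners' partitions (the proportion of listeners who placed two sounds in different groups) and picks, within a category, the exemplar closest to all others as a candidate prototype. The co-occurrence measure, the medoid-style prototype, the function names and the example data are all assumptions made for illustration; they are not the additive tree fitting of [21], for which such a dissimilarity matrix would only be the input.

```python
from itertools import combinations


def cooccurrence_dissimilarity(partitions, items):
    """d(a, b) = proportion of listeners who put a and b in different groups."""
    n = len(partitions)
    dist = {pair: 0.0 for pair in combinations(items, 2)}
    for partition in partitions:
        # Map every item to the index of the group this listener put it in.
        group_of = {item: g for g, group in enumerate(partition) for item in group}
        for a, b in dist:
            if group_of[a] != group_of[b]:
                dist[(a, b)] += 1.0 / n
    return dist


def prototype(category, dist):
    """Exemplar with the smallest total dissimilarity to the rest of the category."""
    def d(a, b):
        return dist.get((a, b), dist.get((b, a), 0.0))
    return min(category, key=lambda a: sum(d(a, b) for b in category if b != a))


# Hypothetical sorts by three listeners (invented data, for illustration only).
items = ["voices", "footsteps", "car", "bus"]
partitions = [
    [{"voices", "footsteps"}, {"car", "bus"}],
    [{"voices", "footsteps"}, {"car"}, {"bus"}],
    [{"voices"}, {"footsteps", "car", "bus"}],
]
dissimilarity = cooccurrence_dissimilarity(partitions, items)
```

Comparing the matrices obtained separately for experts and novices is one simple way to expose the differences in categorical structure described above.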


MODELLING

From psychology to acoustic signal processing

Given the limits of dimensional measurements and the primacy of source identification processes in appraisal judgments, models of sound recognition processes are crucially needed. Recent research efforts have been directed towards systems capable of detecting and classifying environmental sounds [24, 25, 26]. The wide range of environmental sounds includes sounds generated in domestic, business, and outdoor environments, but most investigations focus on a restricted domain for specific applications. For example, the recognition of indoor environmental sounds has been investigated in the context of surveillance and security applications [26].

In standard sound recognition approaches (e.g. [24, 27, 28]), sound classification is performed in two stages. First, a pre-processor uses signal processing techniques to generate a set of features characterizing the signal to be classified. These features form a so-called feature vector. Using the features as input to classifiers plays two roles: dimension reduction and representation. Indeed, the signal itself could in principle be used as input to classifiers, but its dimension (number of samples) is usually too high with respect to the size of the training set, resulting in overfitting. An example of a set of audio features is the MPEG-7 audio descriptors. Such features carry statistical information from the temporal domain (e.g. zero-crossing rate), from the spectral domain (e.g. spectral centroid), or more perceptual information (e.g. sharpness, relative loudness). Second, in the space of feature vectors, a decision rule (i.e. a classifier) is implemented to assign a pattern to a class. During the training phase, the classifier learns how to discriminate between classes by optimizing some performance criterion with respect to training data. Training data are composed of sounds together with class labels indicating which class each sound belongs to.
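The two-stage scheme described above can be sketched as follows. This is a minimal illustration under our own assumptions: the feature vector holds only the zero-crossing rate and a normalized spectral centroid (computed with a naive DFT to keep the example free of dependencies), and the classifier is a nearest-centroid rule chosen for brevity. None of the cited systems is claimed to work this way; the synthetic tones and the labels "low" and "high" are invented stand-ins for recorded sounds and sound classes.

```python
import cmath
import math


def zero_crossing_rate(signal):
    """Temporal feature: fraction of consecutive sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(signal) - 1)


def spectral_centroid(signal, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency in Hz (naive DFT)."""
    n = len(signal)
    mags = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def feature_vector(signal, sample_rate):
    """Stage 1: reduce the signal to two features, both scaled to roughly 0..1."""
    nyquist = sample_rate / 2
    return (zero_crossing_rate(signal), spectral_centroid(signal, sample_rate) / nyquist)


def train_nearest_centroid(examples):
    """Stage 2, training: average the feature vectors of each labelled class."""
    sums, counts = {}, {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}


def classify(centroids, vec):
    """Stage 2, decision rule: assign the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def tone(freq_hz, sample_rate=8000, n=256):
    """Synthetic test signal (a pure tone), standing in for a recorded sound."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate + 0.3) for t in range(n)]


training = [
    (feature_vector(tone(f), 8000), label)
    for f, label in [(200, "low"), (300, "low"), (2000, "high"), (2400, "high")]
]
model = train_nearest_centroid(training)
predicted = classify(model, feature_vector(tone(250), 8000))
```

After training, `model` holds one centroid per class and is no longer modified; `classify` can then be applied to the feature vector of any new signal.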
Once the system has been sufficiently trained for a particular pattern recognition application, its parameters are “frozen” and the classifier is ready to use. A determining factor for successful classification is the quality of the audio features used for data representation. The quality of a feature has to be analyzed in context, for a given combination of input data, application domain and sound classes. Similarly, classifiers cannot be evaluated alone; instead, they have to be evaluated jointly with the features and dataset they operate on. Hence, a sound classification system should be designed within a given context by jointly selecting the set of features, the classifier and the dataset.

Automatic environmental sound recognition has primarily focused on single sound recognition. Few studies have investigated the broader task of automatic sound recognition for many sound classes. Early efforts proposed systems devoted to specific tasks such as sound alerting aids [29], considering certain classes of stationary signals (bells, phones, door rings, etc.) and using standard statistical classification procedures (based on spectral and temporal features). Later, new classifiers were explored using Neural Networks [30] and Hidden Markov Models [31]. Most relevant to soundscape research is the work of Defreville and co-authors [32, 33, 34], who analyzed urban recordings with the aim of detecting 6 categories of typical sound sources (namely cars, mopeds, buses, motorcycles, bikes, voices and birds). This work raised two critical issues: first, the difficulty of designing descriptors able to identify every possible sound source in urban environments, and second, the inconsistency of certain sound sources in terms of acoustic features. The first issue could be resolved using conditional evaluations of families of acoustic descriptors, rather than using a single general-purpose extractor.
Indeed, a first hierarchical distinction between vehicles (moped, bus, motorcycle and car) and non-vehicles (bird and voice) significantly raised the accuracy of identification of buses. The relevance of this distinction has been demonstrated in cognitive evaluations of urban soundscapes, which revealed a strong contrast between judgments of human sounds and of traffic sounds [12, 13]. Hence, cognitive evaluations can be used to design more efficient sound recognition models.

CONCLUSIONS

Recent research on sound quality and auditory scene analysis suggests using cognitive evaluations in the first place, as a “pre-filter” to reduce complexity by identifying relevant categories of sounds. Formal representations can then be derived, incorporating features relevant to the structure of categories as psychological representations (e.g. by selecting extensional or intensional descriptions as deemed appropriate to account for the cognitive organization). On that basis, sound recognition processes can be modelled within each category using the relevant audio features. In this view, acoustic descriptions are conceptualized as semantic cues pointing to cognitive representations that integrate other types of properties differently structured. Further research on this topic could lead to a cognitive basis for auditory stream segregation.

ACKNOWLEDGMENTS

The writing of this paper was supported by CFI and FQRNT grants to C. Guastavino. The authors would like to thank Boris Defreville for fruitful discussions and new insights on sound recognition.

References:
[1] Dubois, D. Categories as acts of meaning. Cognitive Science Quarterly, 1 (2000) 35-68.
[2] Guastavino, C. Categorization of environmental sounds. Canadian Journal of Experimental Psychology, 61(1) (2007) 54-63.
[3] Neuhoff, J. G. (Ed.) Ecological Psychoacoustics. (2004) New York: Academic Press.
[4] Gaver, W. What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology, 5 (1993) 1-29.
[5] Schaeffer, P. Traité des objets musicaux. (1966) Paris: Seuil.
[6] Castellengo, M. & Dubois, D. Sémantique des objets sonores : timbres et musiques. Intellectica (submitted).
[7] David, S. Représentation d'objets sensoriels et marques de la personne. In D. Dubois (Ed.), Catégorisation et cognition : de la perception au discours, (1997) pp. 211-242. Paris: Kimé.
[8] Ballas, J. Common factors in the identification of an assortment of brief everyday sounds. Journal of Experimental Psychology: Human Perception and Performance, 19 (1993) 250-267.
[9] Guyot, F., Castellengo, M. & Fabre, B. Etude de la catégorisation d'un corpus de bruits domestiques. In D. Dubois (Ed.), Catégorisation et cognition : de la perception au discours, (1997) pp. 41-58. Paris: Kimé.
[10] Vanderveer, N. J. Ecological Acoustics: Human Perception of Environmental Sounds. Dissertation Abstracts International, 40 (1979) 4543 B (University Microfilms number 8004001).
[11] Dubois, D., Guastavino, C. & Raimbault, M. A cognitive approach to soundscape: Using verbal data to access everyday life auditory categories. Acta Acustica united with Acustica, 92(6) (2006) 865-874.
[12] Guastavino, C. The ideal urban soundscape: Investigating the sound quality of French cities. Acta Acustica united with Acustica, 92(6) (2006) 945-951.
[13] Guastavino, C. Categorization of environmental sounds. Canadian Journal of Experimental Psychology, 61(1) (2007) 54-63.
[14] Fiebig, A. & Schulte-Fortkamp, B. Soundscape and their influence on inhabitants: New findings with the help of a grounded theory approach. Journal of the Acoustical Society of America, 115(5), pt 2 (2004) 2496.
[15] Cunningham, S. J., Jones, M. & Jones, S. R. Organizing digital music for use: an examination of personal music collections. In Buyoli, C. L. and Loureiro, R. (Eds), Proc. Fifth International Conference on Music Information Retrieval (ISMIR2004), Barcelona, Spain, (2004) 447-454.
[16] Neisser, U. Concepts and Conceptual Development: Ecological and Intellectual Factors in Categorization. (1987) Cambridge: Cambridge University Press.
[17] Barsalou, L. W. Perceptual symbol systems. Behavioral and Brain Sciences, 22 (1999) 577-660.
[18] Markman, A. B. & Ross, B. H. Category use and category learning. Psychological Bulletin, 129 (2003) 592-613.
[19] Zwicker, E. & Fastl, H. Psychoacoustics, Facts and Models. (1990) Springer Verlag.
[20] Raimbault, M., Lavandier, C. & Berengier, M. Ambient sound assessment of urban environment: field studies in two French cities. Applied Acoustics, 64 (2003) 1241-1256.
[21] Sattath, S. & Tversky, A. Additive similarity trees. Psychometrika, 42 (1977) 319-345.
[22] Guénoche, A. & Garreta, H. (2002) Can we have confidence in a tree representation? Proceedings of JOBIM’2001, Lecture Notes in Computer Science, 131-138.
[23] Botteldooren, D. & De Coensel, B. Quality assessment of quiet areas: a multi-criteria approach. Proceedings of Euronoise 2006, (2006) Tampere, Finland.
[24] Bensa, J., Dubois, D., Kronland-Martinet, R. & Ystad, S. Perception and cognitive evaluation of a piano synthesis model. Lecture Notes in Computer Science, Springer Verlag, 3310 (2005) 232-245.
[24] Couvreur, C. Environmental Sound Recognition: A Statistical Approach. Ph.D. dissertation, Faculté Polytechnique de Mons, Belgium (1997).
[25] Peltonen, V. Computational auditory scene recognition. Ph.D. dissertation, Tampere University of Technology, Finland (2001).
[26] Istrate, D. Détection et reconnaissance des sons pour la surveillance médicale. Ph.D. dissertation, INPG, France (2003).
[27] Martin, K. D. Sound-source recognition: A theory and computational model. Ph.D. dissertation, MIT Press (1999).
[28] Brown, G. J. & Cooke, M. Computational Auditory Scene Analysis. Computer Speech and Language, 8 (1994) 297-336.
[29] Uvacek, B., Ye, H. & Moschytz, G. A new strategy for tactile hearing aids: tactile identification of preclassified signals (tips). In International Conference on Acoustic, Speech and Signal Processing (ICASSP), New-York, USA, May 1988.
[30] Osuna, J. A. & Moschytz, G. S. Recognition of acoustical alarm signals with cellular networks. In European Conference on Circuit Theory and Design, Istanbul, Turkey, 1995.
[31] Oberle, A. K. S. Recognition of acoustical alarm signals for the profoundly deaf using hidden Markov models. In International Symposium on Circuits and Systems, Seattle, USA, 1 (1995) 2285-2288.
[32] Lavandier, C. & Defreville, B. The Contribution of Sound Source Characteristics in the Assessment of Urban Soundscapes. Acta Acustica united with Acustica, 92(6) (2006) 912-921.
[33] Defreville, B., Roy, P., Rosin, C. & Pachet, F. Automatic Recognition of Urban Sound Sources. Proceedings of the 120th AES Conference (2006).
[34] Aucouturier, J.-J. & Defreville, B. Sounds like a park: a computational technique to recognize soundscape holistically, without source identification. (2007) Proceedings of the 19th Congress of Acoustics (ICA), September 2007, Madrid, Spain.
[35] Maris, E., Stallen, P. J., Vermunt, R. & Steensma, H. Noise within the social context: Annoyance reduction through fair procedures. Journal of the Acoustical Society of America (in press).
