Multimodal Temporal Processing Between Separate and Combined Modalities
Adam D. Danz ([email protected])
Central and East European Center for Cognitive Science, New Bulgarian University
21 Montevideo St., 1618 Sofia, Bulgaria
Abstract
Previous research has shown that the auditory modality dominates in detecting temporal frequency changes when there is a discrepancy between the auditory and visual modalities. Little to no research has investigated how the visual and auditory modalities cooperate when temporal frequencies are perceived in parallel across the two sensory modalities. In experiment I, detection of temporal frequency changes (an increase or decrease of 5% from a base frequency of 2Hz) was examined in separate modalities. In experiment II, the frequencies were presented in parallel in both modalities. Comparison of the results supports multimodal sensory integration rather than auditory dominance of temporal perception.
Keywords: multimodal; temporal discrepancy; perception; RT; time; auditory; visual; sensory integration; cognition.
Introduction
Many experimental designs have been devised to study temporal processing using auditory stimuli (Schubotz, Friederici, & Cramon, 2000), visual stimuli (Moutoussis, 1997), and even tactile stimuli (Macar et al., 2002). Researchers have studied how one percept can influence another in multimodal temporal processing (Welch & DuttonHurt, 1986; Gebhard & Mowbray, 1959) and how various aspects within a single percept are perceived over different time courses (Moutoussis, 1997). The bulk of the literature on multimodal temporal processing focuses on a paradigm of competing modalities in order to determine which sensory modality is dominant in temporal perception. The present study, however, examines how multimodal temporal stimuli in vision and audition are perceived when presented congruently in parallel (experiment II) compared with when each modality is processed in isolation (experiment I). In this approach, the sensory modalities do not compete over differing temporal representations; instead, they may either work together or allow one modality to dominate interpretation of the temporal stimuli.
Sensory influence on temporal processing
As Newton proclaimed in Principia Mathematica (1698), “absolute, true, and mathematical time, of itself and from its own nature, flows equably without regard to anything external”. Yet even without sensory organs dedicated to detecting this incorporeal flow, humans are able to synchronize with and recreate temporal patterns, and even to predict some durations of time accurately (Rao et al., 2001). Even more intriguing, temporal perception is highly congruent across observers. Because human sensory organs are subject to limitations, the processing of external temporal events is affected and, as with every sensory perception, is not necessarily representative of the actual environment.
When two or more events occur without discernible succession, they are said to have occurred simultaneously, or ‘at the same time’. Perceived simultaneity does not conform to physical simultaneity. In attempts to determine the limits of perceived simultaneity, many paradigms have been used, resulting in many differing conclusions. Hirsh & Sherrick (1961) examined the succession/simultaneity threshold of various sensory modalities in isolation, including vision, audition, and tactile perception. The results showed that successive stimuli separated by intervals under approximately 20ms were perceived as simultaneous across all modalities, whereas intervals greater than 20ms were perceived as successive. Instead of single-modality succession of stimuli, Hirsh & Fraisse (1964) tested succession across modalities using an acoustic click and a brief flash of light. When the sound preceded the light, the threshold was measured at about 60ms, while light preceding the sound resulted in thresholds of 90 to 120ms (Hirsh & Fraisse, 1964). The contrast between these results demonstrates the sensitivity of temporal perception to sensory modality. However, such inconsistencies have also been shown within a single modality by merely changing the complexity of the stimuli (see Moutoussis, 1997). For example, when six letters are presented in random succession, they are perceived as simultaneous as long as the total duration does not exceed 90ms (Hylan, 1903).
When four light emitting diodes arranged in the shape of a diamond are lit in ordered succession, the succession is perceived as simultaneous as long as the duration between the first and last flash of light is under 125ms (Lichtenstein, 1961).
With such variability in simultaneity-threshold measurement, it is not surprising that investigations of duration perception yield varying results. Experiments involving duration require participants either to estimate a duration after it has ended or to interact with a duration as it is being perceived. For stimuli used in duration discrimination tasks, participants process not only the temporal span of the stimuli, as in simultaneity research, but also their semantic aspects. The ‘paradox of subjective duration’ (Pöppel, 1997) demonstrates that in retrospective evaluation of duration, a high memory load, consistent with a difficult task, complex stimuli, or both, will result in less attention towards the actual duration and therefore an overestimation of that duration. On the other hand, during the experience of time passing, if the task and stimuli are complex, time seems to pass more quickly than during states of boredom (Pöppel, 1997; Fraisse, 1979).
The sensory modalities chosen for temporal perception research, the types of stimuli and their features, the complexity of the stimuli, and the response task all contribute to measures of temporal processing. Since these sensory-based confounds are unavoidable, researchers should be conscious of them when designing their experiments and interpreting their results.
Integration of Multimodal Sensory Data
The ‘modality appropriateness hypothesis’ states that in multimodal perception, the contributions of the various sensory modalities depend on the stimuli being perceived. Welch & DuttonHurt (1986) demonstrated this effect in temporal perception using bimodal temporal stimuli, finding auditory dominance over vision in the discrimination of temporal frequencies. In their experiment, designed discrepancies between auditory and visual frequencies revealed an auditory bias, demonstrating that when visual and auditory temporal frequencies are in conflict, the auditory information dominates the percept. Whereas vision specializes in spatial perception, audition seems to dominate temporal perception, as a large number of studies confirm. The threshold of vision in discriminating between flashes of light is much lower than the threshold of audition in discriminating between successive bursts of sound, showing that temporal acuity is much higher in audition than in vision. This gives rise to the phenomenon referred to as ‘auditory driving’: when an auditory frequency is gradually increased or decreased while being compared to a steady visual frequency, the visual oscillations seem to increase or decrease along with the auditory stimuli even though they remain constant (Gebhard & Mowbray, 1959). The visual system, however, does not seem to produce the same effect on auditory perception. Sensory transduction itself operates on different time courses as sensory data are transformed into electrochemical impulses.
Specifically, auditory information is encoded faster than visual information, and if a bimodal temporal stimulus is to be perceived as simultaneous, the brain must either lean towards the timing of one modality or integrate the incoming sensory data (evidence of integration sites for auditory and visual stimuli can be found in Bushara, Grafman, & Hallett, 2001 and Calvert et al., 2001).
The dynamic integration of temporal perception extends beyond sensory constraints. Using direct galvanic stimulation of the vestibular system, Trainor et al. (2009) were able to bias participants towards specific interpretations of otherwise ambiguous rhythms. The galvanic stimulation replicated the common experience of nodding the head to music, but without bodily movement. The vestibular system sends much of its information to the cerebellum, which has been shown to play a role in interval timing (Ivry, 1996).
Tempo Perception in Music
Though frequency perception and tempo in music research seem plausibly related, there is a communicative gap between music theorists and cognitive scientists, who currently use different lexicons to measure the same entity. Where frequency is measured in hertz (Hz) and durations in milliseconds (ms), tempo is measured in beats per minute (bpm), whereby a ‘beat’ is Euclidean in that it represents a point or marker without duration in and of itself. While psychophysical measurements concern thresholds of simultaneity and succession, music theorists broaden their scope to measure the range of tempi with which musicians may interact. Tempi above ~300bpm (5Hz, 200ms intervals) are perceptually difficult to discern in the context of music (Van Noorden & Moelants, 1999), while tempi under ~40bpm (0.67Hz, 1500ms intervals) result in perceptual isolation of each beat, because holding the two consecutive beats required to create a tempo is beyond the capacity of working memory (Van Noorden & Moelants, 1999). Moelants (2002) implied that there must be a zero point between these ranges where tempo perception is optimal. Researchers have settled on an optimal tempo centered around 120 bpm (2Hz, 500ms intervals), a finding that has also been replicated within the visual modality (Luck & Sloboda, 2007).
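The correspondence between the two lexicons is simple arithmetic, and a minimal sketch may make it concrete. The function names below are illustrative and do not come from the study; the three tempi are the ones quoted in the text.

```python
def bpm_to_hz(bpm: float) -> float:
    """Convert beats per minute to beats per second (Hz)."""
    return bpm / 60.0


def bpm_to_interval_ms(bpm: float) -> float:
    """Onset-to-onset interval in milliseconds for a given tempo."""
    return 60_000.0 / bpm


# Tempi quoted in the text: lower limit, preferred tempo, upper limit.
for bpm in (40, 120, 300):
    hz = bpm_to_hz(bpm)
    interval = bpm_to_interval_ms(bpm)
    print(f"{bpm} bpm = {hz:.2f} Hz = {interval:.0f} ms between beats")
```

Because a beat is treated as a point without duration, the interval here is measured from one onset to the next.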
Methods
Experiment I: Separate Modalities
Participants
Twenty participants (7 male) took part in the study, with a mean age of 26.4 years (range 19-35; SD 4.4). Participants with imperfect vision wore corrective lenses or glasses, and no participants were hearing impaired. All participants had little to no musical experience, which excluded experts in tempo discernment.
Stimuli and Design
The auditory pulse was a 440 Hz square wave tone lasting 125ms, presented binaurally through full-coverage headphones, producing the sound of one auditory ‘beep’. Each 125ms pulse followed by one empty interval constituted one cycle. Cycles repeated between 15 and 22 times, defining one stimulus. There was a 1500ms inter-trial interval, which also consisted of silence. The baseline tempo consisted of the 125ms tone followed by a 375ms period of silence before the following tone. Combined, the 500ms baseline cycle represented the preferred tempo of 120bpm (2Hz). During the auditory portion of the experiment, the computer monitor was gray as in the visual portion of the experiment; however, participants were not instructed to watch the monitor.
The visual stimulus consisted of a black circle with a diameter of 1.25cm centered on the screen against a light gray background, which reduced contrast and thereby afterimage effects. Similar to the auditory stimulus, the ‘dot’ appeared for 125ms followed by 375ms of light gray background. This constituted the 500ms (120bpm, 2Hz) baseline frequency for the visual pulse.
Two experimentally manipulated conditions, two masking conditions, and one set of catch trials, or control trials, were designed and replicated across both modalities. Unchanged frequencies, or ‘catch trials’, were included so that participants would not signal a detection on every trial, thereby discouraging habitual guessing. The experimentally manipulated conditions all began at a baseline frequency of 500ms intervals (120 bpm, 2Hz) and, after at least seven cycles of 500ms (but no more than 14 cycles), featured a sudden 5% increase or decrease in frequency. Masking trials consisted of either a 20% decrease or a 30% increase in frequency. Catch trials continued at the base frequency without change. Thus, the experiment had a within-subject 2 (Modality: visual vs. auditory) x 2 (Directionality: decrease vs. increase) design. Eight different change points were used and repeated three times per item, making a total of 24 trials per condition per modality. Catch trials totaled 24 and masking trials totaled 48 (24 increases and 24 decreases) per modality.
Changes in frequency were created by increasing or decreasing the silent, or ‘empty’, intervals between pulses, whereby a decrease in interval created an increase in frequency and vice versa. Five percent increases contained an empty interval of 351 ms and 5% decreases contained 410 ms of silence between pulses. Changes occurred only once during the stimulus. Cycles continued for exactly eight pulses after the frequency change, ending with a silent interval. For this reason, the window of time for detection of the change differed between conditions, ranging from three to five seconds. While each modality was tested separately, all trials, including the masking and catch trials, were pseudo-randomized.
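The relation between a percentage change in frequency, the empty interval, and the detection window can be sketched as follows. This is an illustration only: it assumes that each percentage is applied to the 2Hz baseline frequency and that the 125ms pulse is held constant, and the function names are hypothetical. Under that reading the 5% increase gives an empty interval of about 351 ms, as reported, whereas the 5% decrease gives about 401 ms rather than the reported 410 ms, so the exact rule used to build the stimuli may have differed.

```python
PULSE_MS = 125.0          # duration of each tone or dot (from the text)
BASE_HZ = 2.0             # baseline frequency, i.e. 120 bpm (from the text)
CYCLES_AFTER_CHANGE = 8   # cycles presented after the change (from the text)


def empty_interval_ms(percent_change: float) -> float:
    """Empty interval after scaling the baseline frequency by percent_change.

    Assumption: the percentage applies to frequency, not to cycle duration.
    """
    new_hz = BASE_HZ * (1.0 + percent_change / 100.0)
    cycle_ms = 1000.0 / new_hz
    return cycle_ms - PULSE_MS


def detection_window_s(percent_change: float) -> float:
    """Time from the change to the end of the stimulus, in seconds."""
    cycle_ms = empty_interval_ms(percent_change) + PULSE_MS
    return CYCLES_AFTER_CHANGE * cycle_ms / 1000.0


# Experimental (+5, -5) and masking (+30, -20) changes described in the text.
for change in (5, -5, 30, -20):
    gap = empty_interval_ms(change)
    window = detection_window_s(change)
    print(f"{change:+d}%: empty interval {gap:.1f} ms, window {window:.2f} s")
```

With eight cycles after the change, the windows computed this way run from roughly three seconds (30% increase) to five seconds (20% decrease), in line with the range given above.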
Procedure
Participants were tested individually using E-Prime 2.0 software (Schneider, Eschman, & Zuccolotto, 2002), wore full-coverage headphones with external volume control during the auditory portion of the experiment, and were left uninterrupted in a quiet room for the full duration of the experiment. Participants were instructed to adjust the volume to their preference at the beginning of the practice session. Half of the participants began with the auditory portion of the experiment while the other half began with the visual portion. Participants were given written instructions to press the mouse button as soon as they detected a change in the frequency of the stimuli. They were informed that some trials would be more obvious than others while some trials would not change at all. Instructions were followed by a practice session featuring one to two trials in each condition, including masking and catch trials. Each portion of the experiment required about thirty minutes, and participants were able to take a break between the auditory and visual portions of the experiment as well as after every 40 trials within each modality. After completing 120 trials, participants were tested in the other modality. E-Prime experiment generator software controlled stimulus presentation and timing and recorded responses and reaction times (Schneider, Eschman, & Zuccolotto, 2002).
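The figure of 120 trials per modality follows from the design described under Stimuli and Design. A short tally, assuming 24 masking increases and 24 masking decreases as listed there (variable names are illustrative), is:

```python
# Per-modality trial counts, taken from the Stimuli and Design description.
CHANGE_POINTS = 8      # the change occurred after cycle 7, 8, ..., or 14
REPETITIONS = 3        # each change point was repeated three times
DIRECTIONS = ("increase", "decrease")

trials_per_condition = CHANGE_POINTS * REPETITIONS             # 24
experimental_trials = trials_per_condition * len(DIRECTIONS)   # 48
masking_trials = 24 + 24   # assumed: 24 masking increases + 24 masking decreases
catch_trials = 24

trials_per_modality = experimental_trials + masking_trials + catch_trials
trials_per_participant = trials_per_modality * 2   # auditory + visual portions

print(trials_per_modality, trials_per_participant)
```

The tally gives 120 trials per modality and 240 per participant, which agrees with the break schedule (every 40 trials) dividing each portion into three equal blocks.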
Experiment II: Combined Modalities
Participants
Twenty-one participants (3 male) took part in the study, with a mean age of 25.3 years (range 20-36; SD 4.7). Participants with imperfect vision wore corrective lenses or glasses. One participant had visibly noticeable amblyopia but performed better than average in the experiment. No other visual deficiencies were present, and all participants had unimpaired hearing. All participants had little to no musical experience, which excluded experts in tempo discernment. Three participants had taken part in experiment I at least three weeks before participating in this experiment.
Stimuli and Design
The only difference between the stimuli of this experiment and those of experiment I is that the visual and auditory pulses were combined into one audio-visual pulse consisting of the 1.25cm black circle and the 440 Hz square wave tone in controlled synchrony for 125ms. The same two conditions (decrease vs. increase) were used as in experiment I, as there were no other changes to the stimuli between experiments. The use of E-Prime experiment generator software ensured the physical simultaneity and congruency of the auditory and visual stimuli, each beginning and ending in parallel.
Procedure
The procedure matched that of experiment I but required half the time, about 30 minutes, because the modalities were combined. Participants were instructed simply to press the mouse button if and when they detected a change in the frequency of the stimuli, and to ignore unchanged catch trials. Participants were not directed to attend to either the visual or the auditory modality; how they detected the changes in frequency was left to their discretion. This is important to note, as it allows a natural approach to detection and does not force unnatural attention to any particular modality. Together with the non-competing temporal frequencies, this provides a natural perception of frequencies as experienced outside of the laboratory.
Results
Experiment I: Separate Modalities
The data of four participants were withdrawn from analysis. Accuracy calculations showed that two had performed below chance level, with d-prime (d′) scores of less than 1.1 in all conditions within both modalities. The other two participants were excluded because they did not detect any changes in one of the experimental conditions. The following results are from the remaining 16 participants. Dependent variables included reaction time (RT), number of pulses to detect temporal change (NoP), and accuracy of detection (measured in percent hits and d′).
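The text does not state how d′ was computed. A minimal sketch of the standard signal detection formula, d′ = z(hit rate) − z(false-alarm rate), is given below for orientation. The log-linear correction (adding 0.5 to each cell) is an assumption introduced here so that rates of exactly 0 or 1 remain finite, as would otherwise occur for a participant who detected no changes in a condition; the example counts are hypothetical and are not data from the study.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (+0.5 per cell) keeps both rates
    strictly between 0 and 1 so the inverse normal CDF is defined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf   # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant in one condition:
# 24 change trials (20 hits, 4 misses) and 24 catch trials (2 false alarms).
print(round(d_prime(hits=20, misses=4, false_alarms=2, correct_rejections=22), 2))

# Equal hit and false-alarm rates indicate no sensitivity (d' = 0).
print(d_prime(hits=12, misses=12, false_alarms=12, correct_rejections=12))
```

Here false alarms are taken to be detections signaled on catch trials; other definitions are possible and would change the resulting values.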
RT analysis
Reaction time analysis was based on correct responses only. For a response to be correct, detection was required to occur after the onset of the frequency change. This eliminated trials that did not feature a frequency change as well as trials where participants signaled detection before the programmed frequency change. RT was measured from the point of frequency change in each stimulus to the point of detection by the participant. For increases in frequency, the first pulse after the increase is presented sooner than in baseline cycles, allowing immediate discrimination from the baseline frequency. For decreases in frequency, however, the first pulse following the decrease is presented later than expected. For this reason, all RT measurements in the decrease condition were adjusted by subtracting the baseline empty interval (375ms) from the final RT, since it is impossible to detect the change in frequency during this initial span of time. Only after this initial 375ms does the frequency actually change in the decrease conditions, as the empty interval becomes longer and timing mechanisms may begin to detect the change. Table 1 shows mean RT and SD per condition in both sensory modalities.
A 2 (Modality: visual vs. auditory) x 2 (Directionality: decreases vs. increases) repeated measures analysis of variance (ANOVA) on item and subject RT means agreed with preferred tempo research (Moelants, 2002) in showing no main effect of increases or decreases from the 120 bpm (2Hz, 500ms interval) base frequency (Fi (1, 7)=0.84; p>0.7; Fs (1, 15)=0.23; p>0.8). Furthermore, overall auditory detection of the frequency changes did not differ from visual performance (Fi (1, 7)=2.98; p>0.1; Fs (1, 15)=4.26; p>0.056). An interaction was found between modality (auditory vs. visual) and directionality (increase vs. decrease) (Fi (1, 7)=14.26; p0.1; Fs (1, 15)=3.11; p>0.09).
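The RT scoring rule described in the RT analysis can be sketched as below. The function and argument names are hypothetical, and timestamps are assumed to be in milliseconds from stimulus onset. How responses falling inside the first 375ms of a decrease trial were treated is not stated in the text, so the sketch simply returns the adjusted value, which would be negative in that case.

```python
from typing import Optional

BASELINE_EMPTY_MS = 375.0   # baseline empty interval, from the text


def adjusted_rt(response_ms: Optional[float],
                change_onset_ms: float,
                direction: str) -> Optional[float]:
    """RT relative to the frequency change, or None if the trial is not scored.

    A trial is not scored when no response was given or when the response
    came before the programmed change (a premature detection).
    """
    if response_ms is None or response_ms < change_onset_ms:
        return None
    rt = response_ms - change_onset_ms
    if direction == "decrease":
        # The lengthened interval cannot be noticed until the baseline
        # empty interval has elapsed, so that span is subtracted.
        rt -= BASELINE_EMPTY_MS
    return rt


# Hypothetical trials: the change begins 4000 ms into the stimulus.
print(adjusted_rt(5000.0, 4000.0, "increase"))   # scored without adjustment
print(adjusted_rt(5000.0, 4000.0, "decrease"))   # 375 ms subtracted
print(adjusted_rt(3000.0, 4000.0, "increase"))   # premature, not scored
```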
In further agreement with RT measurements, the interaction between modality (auditory vs. visual) and directionality (increase vs. decrease) (Fi (1, 7)=14.95; p0.9)
Table 1: Means and standard deviations, RT (ms) (SD), Experiment I
Percent correct and d-prime (d′) scores were both examined in the analysis of accuracy and were analyzed statistically as separate measures. Here, ‘percent correct’ is defined as the percentage of experimental trials in which participants signaled a detection after the change of frequency. In signal detection theory, these are known as ‘percent hits’. For a response to be correct, detection had to be signaled after the experimentally manipulated onset of the frequency change and before the end of the stimulus, exactly eight cycles after the frequency change. Because d′ scores matched the accuracy measurements, they are not reported herein. Due to the design of both experiments, percent correct measurements are more appropriate than d′ scores.
Sixteen participants completing 240 trials each yielded 3840 trials. Control conditions, or catch trials, accounted for 768 of these trials, while masking conditions accounted for 1536 trials. The remaining 1536 trials were the experimentally manipulated trials. Of the control trials, which featured no change of frequency, 78% (598 trials) were correctly rejected. Of the frequency-change trials, 43% (660 trials) were detected, 48% (731 trials) were missed, and 9% (145 trials) drew a detection response prior to the experimentally manipulated change of frequency. Though these figures seem to hover at chance level, it should be noted that in similar designs, 5% changes from similar baseline frequencies have been detected (Jongsma, 2007). A five percent change of frequency is not easily detected even by professional musicians (Danz & Janyan, 2009). Strategies of pure guessing would also have resulted in a greater number of incorrect detections prior to the experimentally manipulated change. For comparison with the much easier masking trials, 30% increases were detected with 84% accuracy and 20% decreases with 85% accuracy.
A 2 (Modality: visual vs. auditory) x 2 (Directionality: decreases vs. increases) repeated measures analysis of variance (ANOVA) on item and subject accuracy means obtained a main effect of directionality (increase vs. decrease), with much greater accuracy in the frequency-increase conditions (Fi (1, 7)=427.25; p<0.001; Fs (1, 15)=50.83; p<0.001). A main effect of modality (auditory vs. visual) was also found, with auditory frequencies detected more accurately (Fi (1, 7)=14.96; p