
Developing a 3D Sound Environment for a Driving Simulator

Ronald R. Mourant and David Refsland
Virtual Environments Laboratory, Northeastern University
334 Snell Engineering Center, Boston, MA 02115-5000 USA
[email protected]

Abstract. This paper describes the modeling and rendering of sounds in a 3D virtual-environment driving simulator. The sounds modeled include the engine noise and tire squealing of a human-controlled vehicle and the engine noise of autonomous vehicles. Both the engine and tire-squealing sounds are driven by a sophisticated vehicle dynamics model. We also modeled the siren of a police vehicle using the Doppler effect. Other sounds, such as vehicle wind noise, the beeping of the vehicle's horn, the clicking of activated turn signals, and the noise of vehicle collisions, were played as audio clips. The sound effects encapsulated in the virtual environment represent many sounds that can make a profound impact on a user's overall experience. It is the collection of all these sounds that helps produce a believable virtual driving environment. Further, the more interactivity the virtual reality system offers, the more immersive the user's experience becomes.

1. Introduction

Sound plays an important role in the realm of driving. Wind and engine noise contribute to fatigue in drivers who have logged many hours. Sirens and horns grab our attention away from the task at hand. Traffic noise can also affect a driver's state of being and decision-making. Squealing tires indicate that the car is being pushed toward its handling limits. When developing a realistic driving simulator, a three-dimensional sound environment also needs to be modeled. The sound software described in this paper models the major sound sources in the driving domain. A variable engine noise is created that is based on engine speed and throttle input. Tires squeal when the lateral loads become too great. Wind noise increases with the car's speed. Autonomous vehicles navigate the roads, themselves sources of noise. A collision with an autonomous vehicle results in a loud crash sound. A horn and turn signals are available for the operator to use. In all, the sound software adds another level of detail and fidelity to the simulated driving environment. To create a more complete auditory driving environment, other miscellaneous sounds must be added to give the user a more immersive experience [1]. These

sounds, however, are not static; they are dynamically activated either by user input or by the state of the vehicle. The sounds we have included in the environment are a user-activated horn, a user-activated turn-signal noise, wind noise, and a warning noise indicating that the user's vehicle has wandered out of its suggested lane.

2. Hardware Environment

A dual 800 MHz Pentium III Gateway E-5400 was used for all of the software development. The computer was equipped with 256 MB of physical memory, a Creative SoundBlaster Live! sound card, and an NVidia GeForce3 graphics card. Sound was output on a pair of Sony MDR-V150 headphones.

3. The Java 3D API

The overall goal in developing a sound API is to create a more immersive experience for simulator users. This means creating a soundscape that encapsulates a majority of the noises experienced in the real world. We used the Java 3D API [2] for sound modeling and rendering. The class Sound is abstract and has subclasses BackgroundSound and PointSound. The physical sound data is contained in the class MediaContainer. The sound data file may be located on the local computer or on the Web, or may be a continuous input stream. Class Sound provides the usual methods associated with playing sounds, i.e., setting the initial gain, looping, pausing, etc. A BackgroundSound node defines an unattenuated, non-spatialized sound source that has no position or direction. It has the same attributes as a Sound node. This type of sound is simply added to the sound mix without modification and is useful for playing a mono or stereo music track or an ambient sound effect. More than one BackgroundSound node can be simultaneously enabled and active. We used a BackgroundSound node when generating tire squealing. A PointSound node defines a spatially located sound source whose waves radiate uniformly in all directions from a given location in space.
It has the same attributes as a Sound object, with the addition of a location and a specification of distance-based gain attenuation for listener positions between an array of distances. A sound's amplitude is attenuated based on the distance between the listener and the sound source position. A piecewise-linear curve, defined in terms of pairs of distance and gain scale factor, specifies the gain scale factor's slope. We used PointSound nodes when generating engine noise for autonomous vehicles and when simulating the siren of a moving police vehicle.

4. Engine Sound Noise and Wind Noise

The sound level for a particular engine speed is determined through linear interpolation. A set of key engine speeds and their corresponding sound intensity levels is created. The intensity level at a particular engine speed is calculated by linearly interpolating between the sound levels of the key engine speeds it falls between. Using this method, the intensity curve can be as detailed or as simple as one desires; the larger the set of engine-speed keys, the greater the resolution of the sound intensity values. The dynamic engine note is a multi-threaded system that calculates on the fly both engine speed and selected gear based on vehicle speed and operator inputs. Figure 1 shows a top-level abstraction of the engine-note design. There are two components of the engine note at this high level: the Engine Speed Calculation Unit and the Engine Sound Node. The Engine Speed Calculation Unit encapsulates all of the physical parameters of the automobile and contains threads that continuously calculate the current engine speed based on the vehicle's dynamics model [3]. The Engine Sound Node reads the engine speed information, along with the operator's throttle input, and sets the engine note's intensity appropriately.

Engine Speed Calculation Unit --(Engine Speed, Throttle Position)--> Engine Sound Node

Figure 1. High-level engine sound design.
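The interpolated intensity lookup described at the start of this section can be sketched as follows. The table values here are illustrative stand-ins, not the simulator's actual calibration:

```java
// Sketch of the engine-note intensity lookup: a table of key engine
// speeds (rpm) and gain values, linearly interpolated between keys.
// The RPM/GAIN values are illustrative, not taken from the paper.
public class EngineNoteGain {
    static final double[] RPM  = { 800, 2000, 4000, 6000 };
    static final double[] GAIN = { 0.2, 0.4,  0.7,  1.0 };

    /** Linearly interpolate the gain for an arbitrary engine speed. */
    public static double gainFor(double rpm) {
        if (rpm <= RPM[0]) return GAIN[0];                       // clamp below idle
        if (rpm >= RPM[RPM.length - 1]) return GAIN[GAIN.length - 1]; // clamp at top key
        int i = 1;
        while (RPM[i] < rpm) i++;                                // find bracketing keys
        double t = (rpm - RPM[i - 1]) / (RPM[i] - RPM[i - 1]);
        return GAIN[i - 1] + t * (GAIN[i] - GAIN[i - 1]);
    }
}
```

Adding more (rpm, gain) keys refines the intensity curve without changing the lookup logic, which matches the resolution argument above.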

Essentially, the Engine Speed Calculation Unit takes in the vehicle's current dynamic state, the user's inputs, and the vehicle's mechanical properties, and passes the engine speed and throttle position to the Engine Sound Node. The Engine Sound Node then adjusts the sound output accordingly. To create a realistic engine sound, a multiple-speed transmission coupled with a shifting algorithm is required, so that the dynamic engine note holds true to a real-world model. Since our driving simulator models an automatic transmission, logic must be developed to keep the transmission in the proper gear, as occurs in real automobiles. The transmission logic looks at two states in the dynamic model: current engine speed and throttle input. Let us first ignore driver inputs. The function of a transmission is to keep the vehicle's engine within its power and torque peaks. The transmission will try to keep the engine speed between two values. If the engine speed increases beyond this range, the transmission shifts up; if it decreases below this range, it downshifts. Driver inputs are also incorporated, however. If the driver is on the throttle, the transmission will delay the shift until a higher engine speed to promote quicker, shiftless acceleration. It will also downshift when the user applies the throttle at an engine speed below the shifting threshold.

Wind noise depends on the vehicle's velocity. The vehicle's current velocity was used to determine the gain of the wind noise, with the actual wind noise being played as an audio clip.

5. Tire Squeal Sounds

During driving, a vehicle experiences many different loads that affect its speed and direction of travel. These loads can be large, as automobiles weigh anywhere from one to three tons. However, it is through the tires and their contact patches that these loads influence the vehicle's dynamics. Forces acting on the tires' contact patches are what cause the car to accelerate, decelerate, and turn. The tire simply uses the frictional force between itself and the road surface to influence the vehicle's motion. However, the driver can ask for more than the tires' contact patches can deliver. If this occurs, the tires lose traction, causing the familiar tire squeal. If the user applies too much throttle, the moment applied to the wheel by the engine will overcome the moment that the tire contact patch can exert, resulting in the tire breaking traction and squealing. If the driver brakes hard enough, the moment applied to the wheel by the brakes can overcome the maximum frictional force of the tire. This results in the wheels locking up, losing traction, and making a squealing noise. Finally, committing a vehicle to a hard turn can also cause tire squeal. A vehicle traveling in a circle is in effect accelerating perpendicularly to its direction of motion, toward the center of the turn. If the vehicle is to maintain this path, the tires must supply that acceleration to the vehicle's mass. As this acceleration increases, so too does the lateral load acting on the tires. Eventually, with a high enough load, the tires will begin to make noise.
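Returning briefly to the gear-selection rule of Section 4 (upshift above a band, downshift below it, throttle delaying the upshift and forcing a kickdown): a minimal sketch follows. All threshold values are illustrative; the paper does not give the simulator's actual settings:

```java
// Sketch of the automatic-transmission shift decision described in
// Section 4. Thresholds are illustrative, not the simulator's values.
public class ShiftLogic {
    static final double UPSHIFT_RPM          = 4500;  // shift up above this speed
    static final double DOWNSHIFT_RPM        = 1500;  // shift down below this speed
    static final double THROTTLE_UPSHIFT_RPM = 5500;  // delayed upshift when on throttle

    /** Returns +1 to shift up, -1 to shift down, 0 to hold the current gear. */
    public static int decide(double rpm, double throttle) {
        // On-throttle driving delays the upshift to a higher engine speed.
        double up = throttle > 0.5 ? THROTTLE_UPSHIFT_RPM : UPSHIFT_RPM;
        if (rpm > up) return 1;
        // Downshift below the band, or on heavy throttle well below the
        // shifting threshold (a "kickdown").
        if (rpm < DOWNSHIFT_RPM || (throttle > 0.9 && rpm < UPSHIFT_RPM * 0.6)) return -1;
        return 0;
    }
}
```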
The first mode of tire slip modeled in this software is lateral loading. A vehicle, like any object in motion, experiences acceleration normal to its direction of travel when in a turn. As with all changes in the vehicle's dynamic state, it is the tires and their contact patches with the ground that must effect this turn and therefore this centripetal acceleration. The path of the turn dictates the amount of centripetal acceleration needed for the vehicle to hold the specified turn: a smaller-radius turn requires a greater centripetal acceleration. As mentioned above, the tires have to bring this acceleration about through their frictional forces at the point of contact with the ground. There are limits to the forces the tires can exert on the vehicle. These limits are a function of the tire's rubber compound, the road surface, the temperature, the vehicle's suspension geometry, and the vehicle's weight and weight distribution, to name a few factors. When a turn's required acceleration is more than the tires' frictional forces can deliver, the tires begin to slide on the road surface and tire squeal develops. Assuming that all these factors are fairly constant, we can simplify things by stating that above a certain centripetal acceleration, tire squeal

will occur. The current lateral acceleration is calculated, and based on this acceleration the tire squeal is either activated or deactivated. Tire squeal can also take place when a car accelerates from a stop. As with tire squeal caused by centripetal acceleration, a car's tires will squeal during acceleration when the engine's torque overwhelms the friction between the tires and the road. There is a limit to the frictional force that inhibits wheel spin; if the torque of the engine overcomes that threshold force's moment about the rear axle, the tires will spin until equilibrium is reached. Further, a car in motion can develop tire squeal if its wheels lock up during braking. This occurs when the braking system develops more torque on the car's axle than the frictional force of the tire/road combination. In both cases, forces are acting upon the vehicle's axles that the frictional force of the tires' contact patches cannot completely counteract. The result is a situation in which the engine or brakes lock the wheel up and the tires slide on the road rather than roll with it. This situation causes the tires to make quite a bit of noise. Tire squeal can thus be dynamically activated based on the dynamics of the vehicle. Parameters such as the threshold lateral acceleration can easily be adjusted to account for different vehicle properties such as tire material, vehicle weight, and suspension changes. The software allows great flexibility in tuning the sound, and its requirements for activation, for a particular application.

6. Sounds Associated With Autonomous Vehicles

The autonomous vehicles in our virtual driving environment are programmed to follow a predefined course, dictated by the software, at a predetermined speed. As one notices in the real world, a sound's intensity decays as it propagates outward. The further the listener is from the sound's source, the weaker the sound appears to be. This phenomenon is known as sound attenuation.
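Before moving on, note that the lateral-squeal activation rule of Section 5 reduces to a threshold test on centripetal acceleration, v^2/r. A minimal sketch, with an illustrative threshold (the paper does not give the simulator's actual value):

```java
// Sketch of the lateral tire-squeal activation test from Section 5.
// LATERAL_LIMIT is an illustrative, tunable threshold (m/s^2).
public class TireSquealCheck {
    static final double LATERAL_LIMIT = 7.0;

    /** Centripetal acceleration a = v^2 / r, for speed v (m/s) and turn radius r (m). */
    public static double lateralAccel(double speed, double radius) {
        return speed * speed / radius;
    }

    /** The squeal sound is enabled when the demanded acceleration exceeds the limit. */
    public static boolean squeal(double speed, double radius) {
        return lateralAccel(speed, radius) > LATERAL_LIMIT;
    }
}
```

Raising or lowering LATERAL_LIMIT is exactly the tuning knob the text describes for different tire compounds, vehicle weights, and suspension setups.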
Attenuation has a number of sources; first and foremost is the spreading of the acoustic energy. In three-dimensional space, sound from a point source emanates in all directions in spherical wave fronts. As the wave fronts move farther from the source, their surface area increases, as defined by the equation SA = 4πr². With the surface area increasing and the acoustic energy constant, the sound wave's energy is increasingly diluted over this front. The intensity is therefore defined by

Intensity ∝ P / (4πr²),

where P is the initial power of the source.
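The spherical-spreading relation above implies a drop of about 6 dB per doubling of distance; a small numeric sketch:

```java
// Spherical spreading of a point source: power p spread over a sphere
// of surface area 4*pi*r^2, as in the intensity relation above.
public class SphericalSpreading {
    /** Intensity (power per unit area) of a source of power p at distance r. */
    public static double intensity(double p, double r) {
        return p / (4.0 * Math.PI * r * r);
    }

    /** Level difference in decibels between two listening distances r1 and r2. */
    public static double levelDifferenceDb(double r1, double r2) {
        return 10.0 * Math.log10(intensity(1.0, r1) / intensity(1.0, r2));
    }
}
```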

While sound waves travel through the air, the properties of the medium in which the waves travel are as important as the waves themselves. These properties determine the sound wave's speed, which depends on the medium's stiffness and density; sound travels faster in water or steel than in air, for example. In addition, the medium has two effects on the wave that contribute to its attenuation: the absorption and the scattering of acoustic energy.

Sounds added to a Java 3D virtual environment have, by default, no distance attenuation associated with them. However, the Java 3D sound API incorporates distance attenuation into its sound model by allowing the programmer to define the distance attenuation curve: the relationship between the sound's acoustic gain and its distance from the listening position. Accuracy requires that the laws mentioned above be followed, but it is up to the user to define the resolution of the attenuation curve. The PointSound class contains a member method used to define this curve:

PointSound.setDistanceGain( float[] distance, float[] gain )

This method matches distances with intensity-gain values contained in the two arrays. The obvious first pairing would be a distance of 0 and a gain of 1, since the attenuation is null at a distance of zero. The two arrays can be of any length, so long as they are of equal length. Linear interpolation is used to determine points between the programmer-specified points; therefore, more points lead to a greater resolution in defining the attenuation curve. It is up to the user to specify these points, but to accurately model real-world intensity decay, the points should follow the relationships mentioned above. Figure 2 is a graphical representation of this methodology.

Figure 2. A PointSound attenuation curve.

The corresponding arrays for the PointSound constructor would be:

distance = [ 10, 12, 16, 17, 20, 24, 28, 30 ]
gain = [ 1.0, 0.9, 0.5, 0.3, 0.16, 0.12, 0.05, 0.0 ]

It is therefore up to the programmer to use distance-attenuation pairs that accurately define a realistic attenuation curve. The Java 3D API allows the programmer the flexibility to modify these to suit the particular application.
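Java 3D performs this piecewise-linear interpolation internally once the arrays are passed to setDistanceGain. A standalone sketch of the same lookup, using the arrays above (this mimics the curve; it is not the Java 3D implementation):

```java
// Piecewise-linear gain lookup over the distance/gain pairs of Figure 2,
// mirroring what PointSound.setDistanceGain asks Java 3D to do internally.
public class DistanceGainCurve {
    static final float[] DISTANCE = { 10, 12, 16, 17, 20, 24, 28, 30 };
    static final float[] GAIN     = { 1.0f, 0.9f, 0.5f, 0.3f, 0.16f, 0.12f, 0.05f, 0.0f };

    /** Gain at distance d, clamped at the ends of the curve. */
    public static float gainAt(float d) {
        int n = DISTANCE.length;
        if (d <= DISTANCE[0]) return GAIN[0];        // full gain inside the first distance
        if (d >= DISTANCE[n - 1]) return GAIN[n - 1]; // silent beyond the last distance
        int i = 1;
        while (DISTANCE[i] < d) i++;                  // find the bracketing pair
        float t = (d - DISTANCE[i - 1]) / (DISTANCE[i] - DISTANCE[i - 1]);
        return GAIN[i - 1] + t * (GAIN[i] - GAIN[i - 1]);
    }
}
```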

7. Sound Spatialization

A human's brain is able to determine the source of a sound it perceives by analyzing the input it receives from each ear. To create an immersive experience, sounds in the virtual environment must replicate this physical phenomenon; that is, they must be given a sense of locality. To do this, the Java 3D API must manipulate the speaker outputs to trick the brain into perceiving that the sound is coming from a specific location in space. It is therefore important to first understand how the brain determines the location of sound sources.

Humans determine a sound's source in the horizontal plane by comparing the signals the two ears send to the brain. There are two fundamental characteristics of the signal that the brain analyzes to determine location: the time difference between the two signals, or inter-aural time delay, and the intensity difference between the two signals, or inter-aural intensity difference. To explain why the brain uses each of these characteristics, let us look at a few simple examples.

Assume that a sudden sound comes at you from your left side. Since sound travels at a constant, finite speed, it will hit the left ear first and the right ear second. The distance between the two ears determines the time difference between the signals of the two ears; this is the inter-aural time delay. From this delay, the brain can calculate approximately where the sound is coming from in the horizontal plane. A sound straight ahead will have little to no inter-aural delay; a sound directly to the left or right of the listener will have the largest delay.

This case is complicated a bit when a constant sound is considered. Since the sound is constant, the brain cannot know the initial arrival times of the sound at each ear. The inter-aural time delay must therefore be determined some other way. All sounds are pressure waves that travel through the medium of air.
In simplistic terms, these sounds can be thought of as sine waves of varying frequencies. Like any sine wave, they have peaks and troughs. Since the signal at each ear is identical in shape, the brain determines the inter-aural delay by analyzing the phase of each signal: it compares the times at which the peaks are observed by each ear and measures the difference. In the case of a sound coming from the left side, a peak in the waveform will hit the left ear first and the right ear second. This works well for lower-frequency sound, since the sound's wavelength is longer than the distance between the two ears. At high frequencies, however, more than one peak can exist within the distance between the ears. At a frequency of 20,000 Hz, approximately the upper limit of human hearing, one cycle covers only 1.7 cm, a distance that fits between the two ears many times over. The relationship between sound location and phase shift is then no longer a simple one, as the time difference between peaks no longer determines location.
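The numbers behind this argument are easy to check: the worst-case inter-aural delay is roughly the head width divided by the speed of sound, and phase comparison becomes ambiguous once the wavelength shrinks below the head width. A sketch, with a typical (assumed) head width of 17 cm:

```java
// Numeric check of the inter-aural-cue argument above.
// HEAD_WIDTH is a typical inter-ear distance, assumed for illustration.
public class InterauralCues {
    static final double SPEED_OF_SOUND = 343.0; // m/s in air at about 20 C
    static final double HEAD_WIDTH     = 0.17;  // m (assumed)

    /** Worst-case inter-aural time delay: sound from directly left or
     *  right travels roughly one extra head-width to the far ear. */
    public static double maxItdSeconds() {
        return HEAD_WIDTH / SPEED_OF_SOUND;
    }

    /** Wavelength of a tone of the given frequency (Hz). */
    public static double wavelength(double frequencyHz) {
        return SPEED_OF_SOUND / frequencyHz;
    }

    /** Phase comparison is ambiguous once a full cycle fits within the
     *  inter-ear distance, which happens above roughly 2 kHz. */
    public static boolean phaseAmbiguous(double frequencyHz) {
        return wavelength(frequencyHz) < HEAD_WIDTH;
    }
}
```

At 20,000 Hz the computed wavelength is about 1.7 cm, matching the figure in the text; the ambiguity threshold falls near 2 kHz, consistent with the duplex-theory crossover discussed next.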

Sound waves at frequencies higher than 2000 Hz must therefore rely on another characteristic of the sound to determine location. This is where the inter-aural intensity difference comes into play. A person's head naturally blocks some of the sound to one of the ears when the sound's source is to the left or right. A sound coming from the left side will produce a much larger signal in the left ear than in the right ear, since the listener's head blocks much of the signal to the right. In summary, sounds with frequencies between 20 and 2000 Hz are localized using the inter-aural time delay; sounds with frequencies between 2000 and 20,000 Hz are localized using the inter-aural intensity difference. Together, these two processes constitute the duplex theory of sound localization.

8. Doppler Shift

The final physical phenomenon that must be addressed in developing a realistic soundscape is the Doppler shift that occurs when a sound's source moves relative to the listening position. A sound source moving toward a listener is perceived as shifted upward in pitch; one moving away, as shifted downward. Before discussing the Java 3D API's handling of the Doppler shift, it is worth examining the phenomenon in detail.

Figure 3. Doppler Shift diagram.

Imagine that a sound source is moving to the left, as represented by the arrow in the diagram above. As this sound is a waveform, imagine that each of the circles represents a peak in the waveform. These peaks move at a constant speed uniformly away from the points at which they were emitted: Peak 1 was released when the sound source was at location P1, Peak 2 at location P2, and so on. Pitch corresponds to frequency, the rate at which the wave oscillates; the higher the pitch, the more frequently these waveform peaks are incident upon the listener. As can be seen from the diagram, if the sound source is traveling toward the listener, i.e., the listening point is on the left, the peaks arrive more frequently and the pitch appears higher than if the sound source were stationary.
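This wavefront-spacing argument, for both the approaching and the receding case, is captured by the standard textbook formula for a moving source, f' = f * c / (c - v). This is the underlying physics, independent of the scaled model Java 3D applies internally:

```java
// Classical Doppler shift for a source moving along the line to a
// stationary listener; positive sourceSpeed means approaching.
public class DopplerShift {
    static final double SPEED_OF_SOUND = 343.0; // m/s in air

    /** Perceived frequency: f' = f * c / (c - v). */
    public static double perceivedFrequency(double f, double sourceSpeed) {
        return f * SPEED_OF_SOUND / (SPEED_OF_SOUND - sourceSpeed);
    }
}
```

For a 440 Hz siren approaching at 34.3 m/s (about 123 km/h) the perceived pitch rises toward 489 Hz; receding at the same speed, it falls toward 400 Hz.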

Conversely, if the sound source is traveling away from the listening location, i.e., the listening point is on the right side of the diagram, the peaks arrive less frequently and the pitch appears lower than if the sound source were stationary. This effect of a sound source's velocity on its perceived pitch is known as the Doppler effect. Java 3D handles the relationship between velocity and pitch in the following manner. The equation below determines the perceived pitch of the sound:

S(f)' = S(f) - [ Ds * ( Dv / W(f, Dh) ) ]

where:
Dh = distance from the sound source to the center ear.
Ds = Doppler scale factor, from the AuralAttributes class.
Dv = Doppler velocity (between listener and sound source).
f = frequency.
S = sound source frequency.
W = wavelength of the sound source, based on frequency and distance.

The Doppler effect has also been used by Heidet et al. [4] in the sound rendering of an automobile simulator.

9. Summary and Conclusions

The rationale for developing this three-dimensional sound software package is to approximate the real world to the best of the hardware's and software's ability. The rapid development of computer hardware has brought about an opportunity to work in a new realm altogether, that of audio. Creating a soundscape to match the richness of the visuals of the driving environment leads to a more holistic model of a driving environment. All simulators to date have a sense of artificiality about them; that is to say, a user rarely, if ever, mistakes a conventional simulator for reality. Repeated textures, a lack of shadows and realistic lighting, and the limits of our video displays immediately signal to the user's brain the environment's artificiality. What developers and designers of future simulators must strive for is removing some of these cues that indicate a computer-generated world. More complex and computationally expensive lighting models, higher polygon counts on physical models, and, in the case of our driving environment, a more realistic dynamics model all work toward minimizing the effect of these artificiality cues on user perception. Clearly, the lack of sound would also be categorized as an artificiality cue. Nearly all the real-world settings that virtual environments try to replicate have a large audio content. Creating a rich sound environment will help to minimize the artificial nature of many virtual reality tools. The sound package described in this document attempts to model the major sound sources in the driving domain. A variable engine noise was created that is based on engine speed and throttle input. In order to complete this work, the vehicle's drive train itself had to be modeled. Tires squeal when the lateral loads become too great.

Wind noise increases as the car's speed increases. A horn and turn signals are available for the operator to use. Autonomous vehicles navigate the roads, themselves sources of noise. Collisions with this autonomous traffic result in loud crashes. In all, this software adds another level of detail and fidelity to the driving environment. And with a more faithful simulator come more faithful behavioral test results, which are, after all, among the main motivations for developing a simulator in the first place.

References

[1] F. Brooks, What's Real About Virtual Reality?, IEEE Computer Graphics and Applications, 19(6):16-27, 1999.
[2] Henry Sowizral, Kevin Rushforth, Michael Deering, The Java 3D API Specification, 2nd ed., Addison-Wesley, 2000.
[3] Thomas D. Gillespie, Fundamentals of Vehicle Dynamics, 1st ed., Society of Automotive Engineers, 1992.
[4] A. Heidet, O. Warusfel, G. Vandernoor, B. Saint-Loubry, A. Kemeny, A cost-effective architecture for realistic sound rendering in the SCANeR II driving simulator, Proceedings of the 1st Human-Centered Transportation Simulation Conference, Iowa City, 2001.