Procedural Methods for Audio Generation in Interactive Games

Stefan Rutherford
Leeds Metropolitan University
60 Beechwood View, Leeds
[email protected]
+44 (0)7533951352

1 Introduction

A number of fields within the game industry use procedural methods; audio, however, is yet to take advantage of many of the benefits procedural methodologies can provide. This report aims to establish an understanding of the different procedural methods used for audio content generation, allowing those who wish to utilise procedural methods for audio to gain a holistic understanding of the current methodologies available to them. This includes the potential strengths and weaknesses of each process, with reference to current examples of each.

The report aims to advocate wider use of procedural methods for audio in interactive games so that the benefits of these methods can be realised, with the grander aim of improving interactive audio and interactive games in general.

2.1 Data Driven Audio

Before addressing the concepts behind procedural audio, the current techniques used for interactive audio in games must be understood. The implications of the current methods will be explored, allowing questions to arise that procedural methods may contain the answers for. Currently, sound designers or sound recordists will record sounds as audio data. The sound designer will then process and manipulate these sounds using digital audio workstations, creating the desired sound effect for a given scenario. After this, systems of playback with varying degrees of complexity are created according to the requirements of particular circumstances. Through these playback systems and the use of digital signal processing (DSP), audio data can be perceived as reactive to a given in-game action. For example, as a player walks forward in a game, footsteps can be heard. These footsteps are likely to be a set of audio samples that are randomly selected and varied, using pitch and volume automation to reduce repetition*. [5]
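As an illustration, a minimal sketch of such a playback layer (Python with numpy; the sample list, the plus/minus 3 dB gain range and the plus/minus 1 semitone pitch range are hypothetical choices, not values taken from any particular engine):

    import random
    import numpy as np

    def play_footstep(samples, rng=random.Random()):
        """Pick one pre-recorded footstep and vary it to mask repetition.

        samples: list of 1-D numpy arrays of audio already loaded into RAM.
        Returns the processed clip, ready for the audio device.
        """
        clip = rng.choice(samples)

        # Random gain of +/-3 dB so identical recordings differ in level.
        clip = clip * 10.0 ** (rng.uniform(-3.0, 3.0) / 20.0)

        # Crude +/-1 semitone pitch variation by resampling: reading the
        # clip faster raises the pitch, reading it slower lowers it.
        ratio = 2.0 ** (rng.uniform(-1.0, 1.0) / 12.0)
        positions = np.arange(0.0, len(clip) - 1, ratio)
        return np.interp(positions, np.arange(len(clip)), clip)

Even this toy version shows the defining property of the data driven approach: every audible variation is a transformation of audio data that must already be in memory.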

Andy Farnell provides a good description of these methods with the following diagram:

Fig. 1 Audio workflow in games [7]

This method requires audio data to be stored on disk, and any audio that must play back instantaneously (i.e. quicker than it can be streamed from disk), such as footsteps or gunshots, must be stored in RAM. The Game Audio Tutorial makes the following point: "Some of your sounds will play off disk, where they still have to compete for space with all the graphics assets, but many will need to be loaded up into RAM to be able to play instantaneously when needed. Ram costs money. Ram is limited." [5]

*There are many other established methods for making the most of audio data using playback systems and DSP, although these will not be tackled in any further depth in order to remain focused upon the topic at hand.

If the audio were produced procedurally, there would be little or no requirement for audio data to be stored in RAM. Andy Farnell presents four critical reasons why audio data cannot account for the different scenarios and challenges presented by interactive games.

Growth: "Sound is an interactive phenomenon, so there is not a fixed number of sounds belonging to one object." [6] Using audio data can only account for a discrete number of events for any action or object within a game.

Relationships: "The number of realistic couplings that lead to distinguishable excitation methods and consequent sound patterns is still large." [6] The excitation of an object may happen in a number of ways. For example, it may be struck directly, hit with a glancing blow, scraped, or forced into excitation in many other ways. The excitation may occur in a different medium such as water, or even when the propagating medium is moving, causing the Doppler effect. Each of these things affects how a sound is produced and how it actually sounds. Representing this with audio data requires vast amounts of audio data and DSP to account for changes in the environment.

Combinations: A simple hypothetical implementation of footsteps in a first person shooter may need to take a number of factors into account. Firstly, the different textures the player may walk upon should be considered, including wood, grass, metal, concrete, etc. Secondly, the different characters in the environment require different footstep sounds in order to distinguish between characters and to provide authenticity. Finally, sounds for walking, running and other various pacing need to be addressed. Farnell writes, "taken as discrete states, this matrix of possible sounds quickly grows to the point where it is difficult to satisfy using sample data." [6] In essence, to account for all of these features, the number of audio samples and the amount of data required can soon become very large, as the illustrative figures below show.
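With purely hypothetical counts, the size of that matrix is easy to see:

    surfaces = 6      # wood, grass, metal, concrete, gravel, mud
    characters = 8    # distinguishable footstep "owners"
    gaits = 3         # walk, run, sneak
    variations = 5    # takes per combination, to mask repetition

    print(surfaces * characters * gaits * variations)  # 720 separate recordings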

Access: "Heavenly Sword, a recent PS3 title boasts over 10GB of audio data. 10GB of data requires careful memory management, placing data at reasonable load points and forcing the flow of play to be over-structured. This moves us towards a linear narrative more like a film than a truly random-access 'anything can happen' virtual world." [6]

Each of these problems equates to large amounts of audio data that a sound designer must produce in order to accurately represent a virtual world through sound. This audio data then has to be stored in the game files and loaded into RAM at the appropriate times, causing linearity due to the limitations of hardware capabilities. As we will see in the following sections, procedural methods for audio attempt to alleviate these issues.

Procedural Content Generation

Now that an understanding of the problems that data driven approaches to interactive audio present has been established, models which rely less heavily, or not at all, on data will be discussed. In order to do this, sound must in some way be treated as a process of creation rather than the playback and manipulation of existing data. When explaining the concept behind procedural methods, Fournel states, "Procedural refers to the process that computes a particular function, procedural content generation [is the process of] generating content by computing functions." [3] Procedural audio can therefore be seen as

the process of producing audio through some form of synthesis of sound. Utilising synthesis as a tool to produce the audio in a game can address some of the issues the data driven approaches suffer from. Nicolas Fournel summarises the times when procedural content generation is useful. He states: "Procedural content generation is used due to memory constraints or other technological limitations." Fournel continues and claims that it can also be "used when there is too much content to create, when we need variations of the same asset and when the asset changes depending on the game context." [3] Referring back to the section on data driven audio, these benefits address the disadvantages of the data driven approach.

Procedural Methods for Audio

The following sections will discuss the procedural methods for audio in interactive games whilst also outlining the strengths and weaknesses of each.

Data Driven Approaches to Procedural Audio

Whilst some procedural methods do not require audio data, there are methods which combine audio data with procedural techniques. Data driven procedural audio methods allow sound designers to produce sounds in the same way that they usually would, using current approaches. Referring back to figure 1, a data driven procedural approach replaces the parameter mapping stage of development with a procedural method. The following sections shall investigate these.

Top Down Approach (analysis and re-synthesis)

The 'top down' approach involves analysis of existing audio data with the intention of using suitable methods of synthesis to either partially, or fully, reconstruct that sound. Nicolas Fournel describes this method: "Top down, you analyse [the] example of the sound you want to create and you find the adequate synthesis system to emulate them." [3]

Partial Re-synthesis

Partially re-synthesising a sound involves some form of deconstruction and analysis of audio data. The elements suitable for the intended synthesis model are found, synthesised and subtracted from the original audio. The parts of the data not suitable for the intended synthesis model are still stored as audio data. The synthesised and residual parts of the sound are then re-combined and played back together at the appropriate time in a game. For example, the game Crackdown 2 uses modal synthesis to synthesise the modal components of a sound effect: "We identify peaks in frequency corresponding to resonant modes. The parameters for the modes can be stored quite compactly. We synthesise the sinusoids associated with each mode and sum them up to obtain the modal component of the sound. Subtracting this from the original clip produces a residual that consists mostly of noise. The residual is often quite short and can be clipped resulting in significant compression." [8]

Fig. 2 Spectrograms of an original source sound, the modal part and the residual part. [9]

Randomisation of the synthesised modes allows variation of a sound. This can be achieved using only a single piece of audio data where, with a data-only approach, many variations of the same asset would be needed. This saves the sound designer time that could be better spent elsewhere; it also reduces the requirement for audio data to be stored in RAM, which addresses many of the problems that data driven audio has.
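A minimal sketch of this randomised modal playback follows. The three mode triples are invented for illustration; in practice they would come from the analysis stage described in [8], and the stored residual clip would be mixed back on top.

    import numpy as np

    SR = 44100  # sample rate in Hz

    # (frequency Hz, amplitude, decay rate 1/s): illustrative values only;
    # a real set would be measured from a recording's spectral peaks.
    MODES = [(523.0, 1.00, 8.0), (1278.0, 0.45, 12.0), (2210.0, 0.20, 20.0)]

    def modal_hit(modes=MODES, seconds=1.0, jitter=0.02,
                  rng=np.random.default_rng()):
        """Sum exponentially decaying sinusoids, one per resonant mode.

        jitter randomises each mode's frequency and amplitude slightly so
        that successive hits do not sound identical.
        """
        t = np.arange(int(SR * seconds)) / SR
        out = np.zeros_like(t)
        for freq, amp, decay in modes:
            freq *= 1.0 + rng.uniform(-jitter, jitter)
            amp *= 1.0 + rng.uniform(-jitter, jitter)
            out += amp * np.exp(-decay * t) * np.sin(2 * np.pi * freq * t)
        return out / np.max(np.abs(out))  # normalise; residual mixed on top

Because only a handful of (frequency, amplitude, decay) triples are stored per sound, the memory cost is negligible next to an equivalent set of sampled variations.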

Full Re-synthesis

In a similar method to that of partial re-synthesis, fully re-synthesising a sound requires an analysis stage. However, one difference is that the synthesis model produced does not rely on any residual audio data to reconstruct the sound. The only data required is the data used to control the synthesis model (for example the frequencies and amplitudes of modes). Figure 3 shows how audio data can be analysed using various methods, the results of which are then fitted to suitable synthesis models. Using the appropriate synthesis models (a single model or several, depending on the complexity of the audio intended for re-synthesis), the sound can then be reconstructed. The difference between full and partial re-synthesis is that partial re-synthesis will select a single synthesis technique, such as modal synthesis, and rely on residual audio data to reconstruct the non-synthesised part of the sound, whereas full re-synthesis will use combinations of synthesis models to re-synthesise a sound fully without relying on residual audio data. Full re-synthesis allows a further reduction in the data required. Once a model is created from the audio data, it synthesises the sound completely, which allows the model to be highly parametric. These parameters can then be mapped to game parameters, allowing the synthesis to react intimately with the parametric variables provided by a game engine (whether relating to physics modelling or otherwise). [10] A disadvantage of this system is its increased computational cost compared to the data driven approach and the partial re-synthesis approach.
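A sketch of the analysis half under simple assumptions: take one spectrum of the recording and keep its strongest peaks as candidate modes. Real analysis, as in figure 3, is considerably richer, for instance estimating each mode's decay by tracking peaks across short-time frames; that is omitted here.

    import numpy as np

    def estimate_modes(clip, sample_rate, max_modes=10):
        """Return (frequency Hz, relative amplitude) pairs for the strongest
        spectral peaks of a recorded sound, as crude mode candidates."""
        windowed = clip * np.hanning(len(clip))
        spectrum = np.abs(np.fft.rfft(windowed))
        freqs = np.fft.rfftfreq(len(clip), d=1.0 / sample_rate)

        # A bin is a local peak if it exceeds both of its neighbours.
        peaks = [i for i in range(1, len(spectrum) - 1)
                 if spectrum[i] > spectrum[i - 1]
                 and spectrum[i] > spectrum[i + 1]]
        peaks.sort(key=lambda i: spectrum[i], reverse=True)

        top = peaks[:max_modes]
        loudest = spectrum[top[0]] if top else 1.0
        return [(float(freqs[i]), float(spectrum[i] / loudest)) for i in top]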

Figure 3. Audio analysis and modeling. [10]

Granular Synthesis

In some cases granular synthesis can be used in interactive audio to reduce the required data size and increase the amount of variation that can be achieved from a set of audio data (addressing two key issues of data-only approaches). This technique allows "sampled sound to be segmented into small elements known as grains and played back in virtually limitless combinations". [11] The use of granular methods varies depending on the intended audio data. For example, car engines in racing games often use audio data of a car accelerating and decelerating. Grains are then selected from the section of the recording whose engine speed matches the in-game RPM parameter. The common data driven approach for car engines would be to use several loops, each with a steady RPM. These loops would then be pitch shifted and cross-faded between as the car in a game accelerates. Not only does this method require a larger amount of audio data (to account for all of the loops at different RPM), but it is a less accurate representation of a car's acceleration sound when compared with the granular method.
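A sketch of that grain-selection logic under simplifying assumptions: given one recording of an engine sweeping from idle to redline, pick a grain from the position whose RPM matches the game's current RPM, with a little jitter so a steady throttle does not sound static. The idle/redline figures and the linear mapping are hypothetical.

    import numpy as np

    SR = 44100
    GRAIN = 2048  # grain length in samples (~46 ms)

    def engine_grain(sweep, rpm, idle_rpm=800.0, max_rpm=7000.0,
                     rng=np.random.default_rng()):
        """Pull one windowed grain from `sweep`, a recording of a steady
        acceleration from idle_rpm to max_rpm, at the position matching
        the current rpm."""
        # Map rpm linearly onto the recording; a measured rpm curve for
        # the recording would be more accurate than this assumption.
        frac = np.clip((rpm - idle_rpm) / (max_rpm - idle_rpm), 0.0, 1.0)
        centre = int(frac * (len(sweep) - GRAIN))

        # Jitter the read position so repeated grains at a steady rpm differ.
        start = int(np.clip(centre + rng.integers(-GRAIN, GRAIN),
                            0, len(sweep) - GRAIN))
        return sweep[start:start + GRAIN] * np.hanning(GRAIN)

A production granulator would keep many overlapping grains sounding at once, which is the source of the high voice count discussed in the next paragraph.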

Granular methods can also be used for crowd sounds: "Crowd backgrounds and chants can be made much more dynamic by variation of the voices using granulation." [11] The disadvantages of a granular approach are the large voice count required and the difficulty in defining control data for the granulation [11]. It is also only suitable for a select few applications (such as those discussed here).

Bottom Up

The 'bottom up' approach uses physical modelling to create appropriate sound synthesis models to construct a sound. "Sound is, in the end a physical phenomena. So you could go and simulate everything on a machine. This gives you lots of automation." [4] Essentially the sound is created according to the physical model in the game; this allows sound to react to the exact parametric conditions in a game, and allows the game engine to generate these sounds automatically from its physical model. "Sound effects are produced automatically, from 3D models using dynamic simulation and user interaction." [12] Andy Farnell states that "One of the great advantages is that it gives 90% of your assets for free. You just put your objects in the world and you get default sounds". [2] This eliminates the need to create a large amount of the audio assets; as Farnell states, the "problem is how to provide the colossal amounts of content required to populate virtual worlds for modern video games." [6] Because this model requires no audio data at all, sound designers are instead able to tweak the synthesis models to create the desired result, and are then liberated to focus on the most important aspects and/or sounds of a project. Of all the procedural methods for audio discussed in this report, the bottom up approach provides the best solution to the data size/amount problem discussed earlier. Sound designers do not need to create thousands of assets from scratch, and the game does not have to store any assets in RAM in order to play them (as they are synthesised).

This approach does have some disadvantages compared to other procedural methods. In order to successfully create adequate physical models, the skills required are wide reaching. Nicolas Fournel states that in order to create such systems an individual or team would need an intimate knowledge of "audio synthesis, mechanics, animal anatomy, physics etc..." [10] Finding individuals, or creating teams of individuals, with all of the required knowledge to successfully implement bottom up approaches risks being very costly. Nikunj Raghuvanshi states one disadvantage of this method: "It takes a lot of computing, you could not [currently] do this at run-time." He also states another: "There is no space for any artistic control over sounds that are produced or propagate." [4] However, Andy Farnell suggests that this might not be the case: "Every Procedural Audio team would need a good sound designer. I wouldn't leave it to the programmers, I want somebody who has a great set of ears and I would actually put them in a higher position and get them to direct the programmers and say, 'No its more like this, listen to these examples. I want to get this emotion across', and they can direct it aesthetically." [2] Another reason why this method may be hard to adopt is the requirement of a highly detailed physics engine to feed the sound systems all the required information. The physics produced by a typical game engine may not simulate all of the physics required to adequately inform the sound systems: "The foremost requirement is the presence of an efficient dynamics engine which informs the sound system of object collisions and the forces involved" [13]. This would require more detailed physical simulation than many game engines currently provide.
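A heavily simplified sketch of the idea: the dynamics engine reports a collision (impact speed, mass, material), and a sound is synthesised from those physical parameters alone, with no recorded asset. The material table and scaling choices below are invented for illustration; real systems such as [12] and [13] use far more detailed contact models.

    import numpy as np

    SR = 44100

    # Hypothetical material presets: base mode frequency (Hz), damping (1/s).
    MATERIALS = {"wood": (420.0, 30.0), "metal": (960.0, 6.0),
                 "glass": (1800.0, 14.0)}

    def impact_sound(material, mass_kg, speed_ms, seconds=0.5):
        """Synthesise an impact from collision parameters reported by the
        physics engine."""
        base_freq, damping = MATERIALS[material]
        t = np.arange(int(SR * seconds)) / SR

        # Heavier objects ring lower; harder impacts are louder.
        freq = base_freq / (1.0 + mass_kg) ** 0.5
        amp = min(1.0, 0.1 * mass_kg * speed_ms)  # crude stand-in for impulse

        out = np.zeros_like(t)
        for k, partial_gain in enumerate([1.0, 0.5, 0.25], start=1):
            out += (partial_gain * np.exp(-damping * k * t)
                    * np.sin(2 * np.pi * freq * k * t))
        return amp * out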

Combined Approach

The combined approach uses a combination of the top down and bottom up approaches. Nikunj Raghuvanshi explained in a recent discussion how an implementation of this might look:

"One first takes a bottom-up approach, does the physics offline, and generates data. This replaces the standard asset-acquisition stage with computation. This is followed by a top-down pass that analyzes the generated data and fits it to parametric, perceptual models. The game designer can then tweak the default parameter values, but left alone, they still correspond to physically-consistent values. The run-time is then procedural and real-time." [14]

This method produces a middle ground, allowing some of the advantages of each approach to be received. Because the expensive physics simulation runs offline, solely for data acquisition, and suitable synthesis models are then fitted in a top down pass, the computational cost at run time is only that of running those fitted synthesis models, which is comparatively inexpensive; the run-time benefits are the same as those of the top down approach. A disadvantage of such an approach is the infrastructure required to implement it: to use this methodology effectively, all of the infrastructure for both the bottom up and top down approaches would have to be invested in. Of all the procedural methods for audio discussed in this report, the combined approach requires the most infrastructure to successfully implement. This methodology also suffers from the other disadvantages of the bottom up approach discussed earlier (with, as previously stated, the exception of run-time computational cost).
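A sketch of how this split might look, with both stages reduced to stubs; all names and numbers here are hypothetical, and the quote above describes the architecture, not this code.

    import json
    import numpy as np

    def offline_pass(simulate_impact, path="impact_model.json"):
        """Offline: run the expensive physics once, fit a tiny parametric
        model to its output, and store only the parameters on disk.

        simulate_impact: callable standing in for an offline physics
        solver; it returns one simulated impact waveform at 44.1 kHz.
        """
        clip = simulate_impact()
        spectrum = np.abs(np.fft.rfft(clip))
        peak_bin = int(np.argmax(spectrum))
        params = {"freq": peak_bin * 44100.0 / len(clip),  # dominant mode, Hz
                  "decay": 10.0}                           # decay fit elided
        with open(path, "w") as f:
            json.dump(params, f)

    def runtime_synth(params, seconds=0.5, sr=44100):
        """In game: only this cheap, tweakable synthesis runs at run-time."""
        t = np.arange(int(sr * seconds)) / sr
        return (np.exp(-params["decay"] * t)
                * np.sin(2 * np.pi * params["freq"] * t))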

"One first takes a bottom-up approach, does the physics offline, and generates data. This replaces the standard asset-acquisition stage with computation. This is followed by a topdown pass that analyzes the generated data and fits it to parametric, perceptual models. The game designer can then tweak the default parameter values, but left alone, they still correspond to physically-consistent values. The run-time is then procedural and real-time." [14] This method produces a middle ground between the two methods allowing some of the advantages of each method to be received. Due to the fact that the top down stage is offline and used for the sole purpose of data acquisition, and suitable synthesis models are created using the top down approach, the computational cost at run time is only the cost of running synthesis models created using a top down approach (which is computationally less expensive). The benefits at run time are exactly the same as the top down approach. A disadvantage of such an approach is the infrastructure required to implement it. In order to effectively use this methodology all of the infrastructure for both bottom up and top down approaches would have to be invested in. Of all the procedural methods for audio discussed in this report the combined approach requires the most infrastructure to successfully implement. Another disadvantage is this methodology suffers from all of the disadvantages of a bottom up approach discussed earlier (with, as previously stated, the exception of computational cost).

The report has also discovered that there remains some disparity between practitioners as to which methods should be adopted. The report provides readers with knowledge of the procedural methods for audio in interactive games, giving practitioners in the field the ability to better understand each method and decide the next steps in interactive audio. Future work lies in exploring this disparity in an attempt to better understand each point of view. Such work would aim to provide an understanding of the points of view across the field, resolving into a consistent direction as to which procedural methods for audio should be adopted for a given set of circumstances in the future of interactive games.

Bibliography

[1] Andy Farnell. (2007). Synthetic game audio with Puredata.

[2] Andy Farnell. (2012). Procedural Audio: Interview with Andy Farnell. Available: http://designingsound.org/2012/01/procedural-audio-interview-with-andy-farnell/. Last accessed 21/1/2012.

[3] Nicolas Fournel. (2010). What is Procedural Audio?. Available: http://www.gdcvault.com/play/1012704/Procedural-Audio-for-Video-Games. Last accessed 20/1/2012.

[4] Nikunj Raghuvanshi. (2011). Sound Synthesis in CRACKDOWN 2 and Wave Acoustics for Games. Available: http://www.gdcvault.com/play/1014416/Sound-Synthesis-in-CRACKDOWN-2. Last accessed 19/1/2012.

[5] R. Stevens, D. Raybould. (2011). The Game Audio Tutorial. UK: Focal Press. p33, pxvii.

[6] Andy Farnell. (2007). Synthetic game audio with Puredata. Available: http://obiwannabe.co.uk/html/papers/audiomostly/AudioMostly2007-FARNELL.pdf

[7] Andy Farnell. (2007). An introduction to procedural audio and its application in computer games. Available: http://obiwannabe.co.uk/html/papers/procaudio/proc-audio.html. Last accessed 02/05/2012.

[8] Brandon Lloyd, Nikunj Raghuvanshi, Naga K. Govindaraju. (2011). Sound Synthesis for Impact Sounds in Video Games. Proceedings of the Symposium on Interactive 3D Graphics and Games. p1-7.

[9] Brandon Lloyd, Nikunj Raghuvanshi, Naga K. Govindaraju. (2011). Sound Synthesis for Impact Sounds in Video Games. Available: http://www.youtube.com/watch?feature=player_embedded&v=8ozyAKYo118. Last accessed 3/5/2012.

[10] Nicolas Fournel. (2011). Procedural Audio: Challenges and Opportunities. Available: http://www.proceduralaudio.com/papers/GDC%202011%20%20Audio%20Boot%20Camp.pdf. Last accessed 20/4/2012.

[11] Leonard J. Paul. (2011). Granulation of Sound in Video Games. AES 41st International Conference. p1-5.

[12] Kees van den Doel, Paul G. Kry, and Dinesh K. Pai. (2001). FoleyAutomatic: Physically-based Sound Effects for Interactive Simulation and Animation. SIGGRAPH conference proceedings.

[13] Nikunj Raghuvanshi, Ming C. Lin. (2006). Interactive Sound Synthesis for Large Scale Environments. University of North Carolina at Chapel Hill. p1-8.

[14] Nikunj Raghuvanshi. (2012). Procedural Audio Discussion. Available: http://stefanrutherford.tumblr.com/post/16404174383/tops-down-bottoms-up-what-can-we-do-for-procedural#disqus_thread. Last accessed 10/05/2012.
