A Graph Editor for Remote Composition of Game Music for Small-Scale Game Development

Sebastian Heise, Michael Hlatky, Kristian Gohlke, David Black
Hochschule Bremen (University of Applied Sciences)
Fakultät für Elektrotechnik und Informatik
Bremen, Germany
[email protected], [email protected], [email protected], [email protected]

Jörn Loviscach
Fachhochschule Bielefeld (University of Applied Sciences)
Fachbereich Ingenieurwissenschaften und Mathematik
Bielefeld, Germany
[email protected]

Abstract. Similar to a movie’s soundtrack, the music score of a computer game must provide an immersive and captivating experience to the user. Hence, the score needs to adapt and react to the gameplay in real time. Many software tools have been developed to support music composition for interactive computer applications such as games. The most advanced of these tools rely on graph-based representations of a composition using MIDI scores. The graph is formed from nodes, which represent MIDI files, and transitions between them, which are governed by conditions; this describes the music to be synthesized depending on the state of the game and the history of states. Some of these tools even come with an integrated API (application programming interface) and a ready-to-use game sound engine to facilitate game soundtrack production. However, none of these tools focuses on the “long tail” market of low-end games for mobile devices or browser-based games, especially those written in Adobe Flash. Such a technically limited environment requires more specific tools, such as the one we demonstrate here.
1. Introduction

Catalyzed by increasingly higher bandwidths in recent years, browser-based games programmed in Java or in ActionScript, the programming language of Adobe Flash, have become hard to distinguish from their counterparts that run natively on desktop PCs. In particular, reimplementations of old arcade titles have enjoyed renewed popularity. These games are often played for short-lived pleasure or simply to pass time, as they do not plunge the player into an hour-long plot, a deep storyline, or complicated rules. Despite their ostensibly simple design, the popularity of the most successful of these games seems to depend highly on the game designer’s careful attention to detail in graphics, music, sound design, and gameplay.

Nowadays, many of these games are also widely distributed on mobile devices. They can be downloaded onto top models such as the rather powerful iPhone, but they are often also included with lower-priced devices. In part, this widespread availability can be attributed to lower screen-resolution requirements. Developers do not employ specialized tools for the creation of interactive music scores for these games, as the audio playback capability of the game’s sound engine is often limited to MP3 files and does not provide access to more advanced features of the audio hardware. Whereas the tools commonly used for game audio production include highly sophisticated and powerful sequencers, samplers, and virtual instruments, the very specific task of nonlinear or adaptive sequence arrangement has been somewhat neglected by current composition tools. Furthermore, the task of sound design for these games is frequently undertaken by game development specialists who are often not trained in music and composition. This results in a prevalence of simple loop structures and singular sound effects that are directly connected to actions in the game; thus, the audio score is only minimally adaptive and interactive. In addition, the game designer is often limited to simple graphical effects due to the limited computational resources provided by the mobile device or inside the Flash Player. Nevertheless, it seems that the development of these small-scale games focuses largely on enhancing the game graphics, whereas game sound is often seen merely as a dispensable extra component that has hitherto been subjected to relatively little research [1].

The immersive qualities of a properly crafted, interactive soundtrack are hardly ever leveraged to their full potential, even though a majority of these games feature built-in stereo soundtracks running in the background at bit rates sufficient for high-fidelity quality. We attempt to provide a solution for game development situations in which the high-level sound engines for interactive game audio that are commonly used in large-scale projects [2][3] are not feasible. For small-scale projects in particular, the broad functionality offered by these engines is rarely needed, and limited budgets or small team sizes might also prohibit their use. Furthermore, large sound engines commonly rely on low-level access to the audio hardware, for instance to digital signal processors. The platforms targeted by smaller games, such as mobile phones, virtual machines, or the Flash Player, often do not provide this low-level access and might additionally be subject to further restrictions in terms of processing power or capability. The development environments for such platforms usually include basic asset management tools. However, these tools mainly help to replace game assets rather than to arrange or compose game audio. With such tools, sound designers attempting to create interactive audio scores must define the underlying logic directly in the source code, as the asset management tools lack proper interfaces for remote score composition and debugging. The approach presented here enables sound designers to compose the score while playing the game, bridging the gap between asset management and the production of interactive audio scores.
2. Related Work

Human-computer interaction pioneer Bill Buxton describes the use of computers as tools in the compositional process [4]. He distinguishes between dedicated composing programs and tools for computer-aided composition: the former require limited interaction with the sound designer (or composer), the latter a much greater level of interaction. A sound engine for games has to combine these properties. While the sound designer creates the game sound, the engine must provide the means for creating a unique and distinguishable soundscape or piece of music. The composition process itself should be musical in nature, in the sense that the sound designer desires a high degree of control and hence limits the amount of decision making left to the machine. After composition is complete, however, the composition has to be dynamically reinterpreted using the current game state as an additional input. The decision of what should be played when is then left to the algorithm; in other words, the computer recomposes the soundscape.

Sanger [5] mentions the “one-to-forty” problem: Hollywood and the gaming industry are closely related, and the same company might work concurrently on a movie and on the game that is the movie’s interactive counterpart. A game composer is compensated for one hour of music at the same rate a film score composer receives, but producers often forget that music in a game is played much longer than film music. Sanger argues that a top-notch game is played for 40 hours on average, and any kind of music played for 40 hours is likely to become boring. Even if the exact numbers change, the ratio between budget and playtime usually remains the same.

Freeman [6][7] attempted to make concert performances more interactive by allowing visitors to create a customized violin score: small musical fragments of the original score could be recombined using a graph-based web interface. Before the actual concert, the system generated a linear version of the user-generated score as a simple PDF document, which was then played by the orchestra.

Current state-of-the-art game engines such as Crytek’s CryENGINE 3 [2] include a range of built-in tools for designing in-game soundscapes, offering a great amount of flexibility to the sound designer. One of these tools is a graph editor that allows the creation of an immersive, dynamic soundscape that reacts to the player’s position and actions within the game. The industry-standard game sound engine FMOD provides an array of similar graphical interfaces for sound designers [3]. Microsoft’s XACT Audio Creation Tool [8] also offers an interface that enables sound designers to manage assets and create interactive game audio, though with less functionality than FMOD. The availability of such tools to developers and sound designers is, however, often restricted to very large projects. Other software tools that allow for the composition of interactive music, such as Nodal [9] or IMTool [10], are primarily based on editing MIDI data; their integrated application programming interfaces (APIs), however, do not allow direct communication with a browser-based game, which is essential for interactive sound design. MIDI scores to be synthesized on the player’s computer are also problematic, as both Flash games and Java-based games run in a virtual machine. The Flash Player does not yet support multithreaded applications and accordingly leaves no room for additional computationally heavy audio rendering tasks: most CPU time is devoted to rendering the graphical display of the game, as the graphics subsystem of the host hardware offers little support. The Java virtual machines found on most mobile phones are hampered by similar limitations.

3. Prototype

3.1. Composition

To overcome the difficulties found in many of the existing approaches to creating game music, we have developed a simple node-based graphical composition engine (see Figure 1), which allows playing a game and simultaneously composing its score. Nodes in the software’s interface define sounds to be played back depending on the current game state. Each node in the graph represents a sound file in one of many common audio file formats and of arbitrary duration. Transitions and related conditions can be defined between all nodes (see Figure 2). A condition consists of a comparison operator such as “greater” or “less or equal,” a parameter name, and a static value.

Figure 1: Screenshot of the graph-based composition tool.

Many game development environments and other software frameworks, such as Microsoft XNA and Adobe Flash, provide simple web clients that can be used to send parameters to our software through an HTTP request, requiring no more than a single line of code. During the design stage of the game, the developer can define triggers which, when they occur during gameplay, send a parameter value to the graph editor by means of an HTTP request. During music composition, the graph editor executes the sound state machine and replays the sounds, possibly even on a remote machine. Each time a node’s audio file reaches its end during playback, the conditions are checked against the most recently received parameters, which then determine the next node for playback.
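Our tool receives game-state parameters as plain HTTP requests. As an illustration, a minimal game-side sender might look as follows; note that this is a sketch in Python rather than the game's actual implementation language, and that the endpoint path, port, and parameter names are our assumptions, not the tool's documented interface:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def parameter_url(name, value, host="localhost", port=8080):
    """Build the request URL that reports a game-state change to the
    graph editor (the endpoint path '/set' is hypothetical)."""
    return f"http://{host}:{port}/set?" + urlencode({name: value})

def send_parameter(name, value, **kw):
    # The single line a game trigger needs: fire one HTTP GET.
    urlopen(parameter_url(name, value, **kw))

# Example trigger: report that the ball's speed reached 7.
# send_parameter("speed", 7)
```

In a distribution build these calls would be stripped out and replaced by direct calls into the game's built-in music state machine, as described in Section 3.1.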
Figure 2: The node and transition editor.
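The node-and-transition model behind the editor of Figure 2 can be sketched as follows. This is our own minimal Python rendering rather than the tool's actual implementation; the set of operator names and the fallback of looping the current node are assumptions:

```python
import operator

# Comparison operators offered in the transition editor; the exact set of
# names is our assumption based on the examples given in the text.
OPS = {
    "greater": operator.gt,
    "greater or equal": operator.ge,
    "less": operator.lt,
    "less or equal": operator.le,
    "equal": operator.eq,
}

class Node:
    """A node of the sound graph: one audio file plus outgoing transitions."""
    def __init__(self, audio_file):
        self.audio_file = audio_file
        self.transitions = []  # list of (parameter, op, value, target)

    def add_transition(self, target, parameter, op_name, value):
        self.transitions.append((parameter, OPS[op_name], value, target))

def next_node(current, params):
    """Called whenever the current node's audio file reaches its end:
    check the conditions against the most recently received parameters
    and pick the first matching target; otherwise keep looping the
    current node (this fallback behavior is our assumption)."""
    for parameter, op, value, target in current.transitions:
        if op(params.get(parameter, 0), value):  # missing params default to 0
            return target
    return current

# Usage: switch from an idling mood to a high-energy rush
# when the game reports a speed greater than 5.
idle, rush = Node("idle.mp3"), Node("rush.mp3")
idle.add_transition(rush, "speed", "greater", 5)
print(next_node(idle, {"speed": 7}).audio_file)  # rush.mp3
print(next_node(idle, {"speed": 2}).audio_file)  # idle.mp3
```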
During music composition, the sound designer can play the game and simultaneously follow the game states and the sounds chosen for playback according to the graph. The sound designer can modify both the audio files and the sound graph in the editor without ever modifying the game code. Different volume levels can be set, different sound files can be chosen and tested, and transitions from one file to the next can be changed seamlessly. The entire graph layout may also be changed without having to wait for the game developer to recompile the game or build new resource files. This allows the soundtrack to be adapted to the game in real time throughout the entire development process. When the interactive music composition process is finished, the composed sound graph can easily be exported in an XML-based file format, ready to be parsed into the game’s state-machine decision tree. In a distribution version of the game, all HTTP calls to the composition engine would be removed at compile time of the release candidate and replaced by corresponding function calls to the built-in music state machine, which selects the appropriate audio files to be played during the game.

3.2. Music Content

In contrast to linear scores such as those found in films and many computer games, our method both requires and allows a somewhat different approach to game music composition: the composer must combine traditional, linear techniques with techniques that allow easy, on-the-fly transitions between musical sections of different moods. To demonstrate the software, four pieces of music were composed by a trained musician specifically to test the tool. As the presented composition tool aids in arrangement and not in note-level composition, the pieces were composed using traditional sequencing techniques. Each piece corresponds to a unique mood: idling, up-tempo, mellow, and high-energy rush. By creating these four pieces, we were able to test the feasibility of the graph-based composition tool in a realistic scenario.

The game state design requires that a piece of music continue to play until a command is received to switch to another node; then, the music must transition to the new piece seamlessly and quickly. Therefore, each mood must be a loop of music within which different exit points can be utilized. As the transition time is required to be short, each mood contains smaller subsections of music that can be played in any order. Consequently, each subsection is a variation on a theme, creating structural differences in the overall music when played in different orders. As each subsection is merely a variation, each can precede the transition music, thus allowing for seamless changes of mood without fading or abruptly starting and stopping the music. The transitions between the moods were specifically composed; we opted for this method because it allows for a more seamless integration of all moods when transitions are needed.

The benefit of such a method of composition is that it allows for quick and audibly satisfying transitions between moods, which can change rapidly during the course of the gameplay. In addition, a mood can continue indefinitely, benefitting from the varying loops included in it. The drawback of this method is that the loops must be limited to approximately two measures, because quick changes in the game’s mood require that the music also change quickly: a mood change has to wait until the current loop has been completed, so short loops keep this latency low.

3.3. Game Design

In order to preliminarily test and assess the capabilities of our system, we reimplemented a version of the well-known game Pong [11][12]. Pong seemed suitable to demonstrate a sound engine for several reasons. First, the game employs very simple and unobtrusive graphics (see Figure 3) and thus exhibits only a minimum amount of graphical distraction; the player can better focus on the soundscape and on the overall gameplay experience. Second, the game also plays very well with almost no sound at all. The only sound that needs to be present is that of the ball hitting a paddle; if the frequency of this diegetic sound event increases, the player’s tension might rise accordingly. However, the game has no narrative elements, nor does it rely on sounds that might otherwise influence the players emotionally. Therefore, the emotions evoked by the game largely originate from the gameplay action itself. We exploit the auditory lacuna present in this game to test our sound engine. Third, Pong’s underlying game concept can be instantly understood, even by people who have never played any computer games before. This allows the sound engine to be tested with players who might have no training with computers, thus permitting a comparative evaluation of different sound engines with users from many different backgrounds.

Following this path of simplicity, the instant playability of our prototype is further enhanced by two custom physical interfaces used to control the on-screen paddles. These devices resemble the rotary knobs found on the classic arcade version of Pong [11][12]; however, our controllers allow for endless rotation. They are crafted from a solid block of wood topped with an ice hockey puck that rotates freely on a ball bearing (see Figure 4). The puck is tracked using the optical sensor of a standard optical mouse.

Figure 3: Screenshot of our re-implementation of the classic arcade game Pong.

The combination of very unobtrusive in-game graphics and a sturdy, easy-to-use haptic controller was chosen to make the game as accessible and attractive as possible, so that the perceived quality of the sound engine can be evaluated over extended gaming sessions.
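The exact schema of the XML export described in Section 3.1 is not specified in the paper. As a hedged sketch, with all element and attribute names being our invention, a graph with two moods might be serialized along these lines:

```xml
<!-- Hypothetical schema: the paper states only that the export is
     "XML-based"; element and attribute names are our invention. -->
<soundgraph start="idle">
  <node id="idle" file="idle.mp3"/>
  <node id="rush" file="rush.mp3"/>
  <transition from="idle" to="rush" parameter="speed" op="greater" value="5"/>
  <transition from="rush" to="idle" parameter="speed" op="less or equal" value="5"/>
</soundgraph>
```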
Figure 4: The puck controller. An ice hockey puck serves as a heavy, endless rotary knob. The massive wooden body is fitted with the internals of a standard computer mouse, which track the axial movement of the puck.

Forthcoming user evaluations will use our Pong prototype and hardware controller to determine the plausibility and effectiveness of the game audio in such a playing situation. We will determine whether the musical transitions provided by our system adequately relate to ever-changing game situations and emotions, and whether they provide an enhanced playing experience over small-scale games with preset, static audio tracks. Additionally, tests will determine whether the workflow of the sound designer is improved by using our system.

4. Conclusion

We have introduced a practical, online approach to remotely composing interactive music and audio scores for games via HTTP. To demonstrate the capabilities of our system, we have implemented an arcade game that, in cooperation with our software, allows quick and easy composition of dynamic game music from pre-produced audio files. Similar approaches can already be found in large, specialized game engines, but these are not readily available for small-scale game development. The tools and frameworks commonly used in such small-scale projects were often not designed for developing games, let alone adaptive audio scores, and thus lack many useful features. We present a solution that augments these frameworks with the missing functionality, such as dynamic transitions and a node-based paradigm of sound clips, and that separates the technical development of the game from the workflow of creating highly dynamic game audio. Furthermore, our solution provides a large amount of artistic freedom and flexibility to the sound designer. Our software removes common bottlenecks that are symptomatic of such small-scale projects, such as working with sequentially changing and looping audio sequences. Until now, sound designers involved in the game development process had to rely on tools that offered only limited means for collaborative work. The solution presented here allows sound designers and software developers alike to focus on their respective tasks independently of each other. This is particularly important in settings in which games are played on mobile devices and must be seen in the context and environment in which they are used: our solution allows the sound designer to change the game sound instantly while out in the field, without having to access the whole production toolchain.

Further user tests will be conducted to evaluate the perceived quality of the sound engine and to improve the workflow for the sound designer. In previous work, members of our team presented a prototype of an audio-only game [13] that uses a complex soundscape. The system presented here could have greatly simplified the production process of this and similar soundscapes, while at the same time making the production process more flexible and transparent. In future versions, our sound engine might also include the ability to map sounds to a greater number of output channels, allowing for the creation of interactive spatial audio fields.

References

[1] Ekman, I. Meaningful Noise: Understanding Sound Effects in Computer Games. In: Proceedings of the 2005 DAC Conference (Copenhagen, Denmark, December 1-3, 2005). IT University of Copenhagen, Denmark, 2005.
[2] Crytek GmbH. CryENGINE 3 - Specifications. http://www.crytek.com/technology/cryengine3/specifications/ Also see: http://www.crytek.com/fileadmin/user_upload/cryengine3/CryENGINE_3_Info_booklet.pdf (Last accessed: May 19, 2009)
[3] Firelight Technologies Pty Ltd. FMOD Designer. http://www.fmod.org/index.php/products/designer (Last accessed: May 19, 2009)
[4] Buxton, W. A. S. A Composer's Introduction to Computer Music. Interface, Journal of New Music Research, Vol. 6, No. 2 (1977), pp. 57-72. Routledge, London, UK, 1977.
[5] Sanger, G. A. The Fat Man on Game Audio: Tasty Morsels of Sonic Goodness. New Riders, Boston, MA, USA, 2004.
[6] Freeman, J. Graph Theory: Interfacing Audiences into the Compositional Process. In: Proceedings of the 2007 Conference on New Interfaces for Musical Expression (NIME07), New York, NY, USA, pp. 260-263, 2007.
[7] Freeman, J. Graph Theory: Linking Online Musical Creativity to Concert Hall Performance. In: Proceedings of the 6th ACM SIGCHI Conference on Creativity & Cognition (C&C '07) (Washington, DC, USA, June 13-15, 2007), pp. 251-252. ACM, New York, NY, USA, 2007.
[8] Microsoft Corp. Microsoft Cross-Platform Audio Creation Tool (XACT). http://msdn.microsoft.com/en-us/library/cc308030(VS.85).aspx (Last accessed: May 20, 2009)
[9] McCormack, J., McIlwain, P., Lane, A., Dorin, A. Generative Composition with Nodal. In: E. R. Miranda (ed.), Workshop on Music and Artificial Life (part of ECAL 2007), Lisbon, Portugal, 2007.
[10] Chiricota, Y. and Gilbert, J. IMTool: An Open Framework for Interactive Music Composition. In: Proceedings of the 2007 Conference on Future Play (Future Play '07) (Toronto, Canada, November 14-17, 2007), pp. 181-188. ACM, New York, NY, 2007.
[11] Kent, S. L. The Ultimate History of Video Games: From Pong to Pokémon and Beyond. Three Rivers Press, New York, NY, USA, 2001, p. 37.
[12] Cohen, S. Zap: The Rise and Fall of Atari. McGraw-Hill, New York, NY, USA, 1987.
[13] Black, D., Gohlke, K., Loviscach, J. EarMarkIt: An Audio-Only Game for Mobile Platforms. In: Proceedings of the Audio Mostly Conference 2008 (Piteå, Sweden, October 22-23, 2008), pp. 135-137. Interactive Institute, Piteå, Sweden, 2008.