Dynamic Load Management for MMOGs in Distributed Environments

Master thesis in computer science

by

Herbert Jordan submitted to the Faculty of Mathematics, Computer Science and Physics of the University of Innsbruck in partial fulfillment of the requirements for the degree of Master of Science

supervisor: Dr. Radu Prodan, Institute of Computer Science

Innsbruck, 9 December 2009

Certificate of authorship/originality

I certify that the work in this thesis has not previously been submitted for a degree nor has it been submitted as part of requirements for a degree except as fully acknowledged within the text. I also certify that the thesis has been written by me. Any help that I have received in my research work and the preparation of the thesis itself has been acknowledged. In addition, I certify that all information sources and literature used are indicated in the thesis.

Herbert Jordan, Innsbruck, 9 December 2009


Abstract

Throughout the last decade, massively multiplayer online games (MMOGs) have become an important segment of the video game industry, attracting millions of paying subscribers. The load generated by providing the corresponding interactive virtual worlds has to be distributed among hundreds or even thousands of server instances to provide the necessary quality of service. Typically, providers therefore employ over-provisioned infrastructures based on static load distribution schemas. However, due to the high variability of user demand in MMOGs, static approaches lead to inefficient resource utilization. Within this thesis, a dynamic load management solution for MMOGs is presented. The solution realizes a flexible, decentralized, hierarchical approach capable of dynamically adjusting the resource requirements of game sessions to fit the present user demand while preserving the quality of service. Combined with emerging cloud computing technologies, the presented approach offers the potential to significantly reduce the costs of providing MMOGs. Within this thesis, the potential savings are evaluated using simulation-based experiments. Furthermore, the solution's ability to cope with rapid load changes within games is demonstrated.

Zusammenfassung (German abstract, translated)

Over the last ten years, massively multiplayer online games have developed into an important segment of the video game industry, serving millions of paying customers. The effort required to provide the corresponding interactive virtual worlds has to be distributed among several hundred servers to enable adequate game quality. In practice, providers use static load distribution concepts; however, due to strongly fluctuating participant numbers, these lead to an inefficient use of the employed resources. This master thesis describes an alternative, dynamic concept for load management. The decentralized, hierarchical approach is able to dynamically adapt the resources required for providing games to the number of participants without negatively affecting the quality of the game. In combination with cloud computing services, this concept enables a considerable cost reduction. The achievable savings as well as the system's ability to react to rapid load changes are demonstrated by various simulation-based experiments.


Contents

1 Introduction

2 The Game-Balancing Problem
  2.1 The Problem
    2.1.1 Massive Multiplayer Online Games
    2.1.2 The Game Loop
    2.1.3 The Zoning Concept
    2.1.4 The Replication Concept
    2.1.5 The Instancing Concept
    2.1.6 The edutain@grid Project
    2.1.7 Problem Summary
  2.2 Problem Analysis
    2.2.1 System State Representation
    2.2.2 System Requirements

3 The System Architecture
  3.1 Conceptual Overview
  3.2 The Hoster Interface
  3.3 The Load Balancing Controller
  3.4 The Algorithm Environment
  3.5 The Algorithm
  3.6 The Communication Network
  3.7 The Load Balancing Server
  3.8 The Administration Client

4 The Load Management
  4.1 The Load Management Problem
  4.2 The Bin Packing Problem
    4.2.1 The Basic Bin Packing Problem
    4.2.2 Extending the Bin-Packing Problem
    4.2.3 A Generic Heuristic
  4.3 Overview on the Balancing Concept
    4.3.1 The Load-Balancing Hierarchy
    4.3.2 Internal Organization of the Load Management Module
  4.4 The Local Balancing Component
    4.4.1 Overview
    4.4.2 Retrieving the Current State
    4.4.3 Analysing the System State
    4.4.4 The Resource Allocation Step
    4.4.5 Load Pattern Reshaping
    4.4.6 Applying Balancing Operations
  4.5 The Global Balancing Component
    4.5.1 Overview
    4.5.2 The Global Load Metric
    4.5.3 On Demand Duty Movements
    4.5.4 Periodic State Evaluations
  4.6 The Session Starter
  4.7 Configuration Options

5 The Simulation Environment
  5.1 The Load Models
    5.1.1 The Memory Load Model
    5.1.2 The Network Load Model
    5.1.3 The CPU Load Model
    5.1.4 The Host Load Model
  5.2 The Simulated Infrastructure
    5.2.1 Internal Organization
    5.2.2 Game Session Profiles
    5.2.3 Simulating Balancing Operations
  5.3 The Simulation Engine
    5.3.1 Abstracting Time
    5.3.2 An Extended Discrete Event Simulator

6 Experiments
  6.1 The Weighting Mechanism
    6.1.1 The Experiment Setup
    6.1.2 Experiment Results
    6.1.3 Conclusion
  6.2 Rapid Load Changes
    6.2.1 The Experiment Setup
    6.2.2 Experiment Results
    6.2.3 Conclusion
  6.3 A Real World Scenario
    6.3.1 The Experiment Setup
    6.3.2 Experiment Results
    6.3.3 Conclusion

7 Related Work

8 Conclusion and Future Work

List of Figures

List of Tables

Bibliography

Chapter 1

Introduction

Since the dawn of computer technologies, games have always been a popular type of application. Especially since the 1980s, video games have driven technological advances due to their wide popularity and high resource demands. During the second half of the 1990s, the growing availability of internet connections allowed game designers to support multiplayer game sessions over the network, and a new type of online game quickly emerged. Today's massively multiplayer online games (MMOGs) allow thousands or even millions of concurrent users to interact with each other inside consistent, virtual worlds. The popularity of MMOGs has grown rapidly throughout the last decade. Figure 1.1 illustrates the total number of subscribers registered to some of the most popular games (from [1]). In 2008, more than 16 million paying users throughout the world were playing MMOGs. The market is clearly dominated by World of Warcraft [2], which had more than 10 million subscribers in January 2008 and whose popularity is still growing. Nevertheless, beside the market leader, several other titles have been able to attract more than one million paying subscribers, and dozens of smaller MMOGs have achieved more than 100,000 subscriptions.

[Figure: growth of total current MMOG subscriptions from 0 to roughly 16 million, broken down into World of Warcraft, RuneScape, Lineage, Lineage II, and 41 other titles]

Figure 1.1: Total Active MMOG Subscriptions from [1]


To provide a consistent, virtual environment for millions of concurrent players, large dedicated multi-server infrastructures consisting of hundreds or even thousands of computers are required [3, 4, 5, 6]. The game load caused by managing participating clients needs to be distributed among the available resources. However, the number of gamers contributing to the same game session is highly dynamic, both in the short and the long term. For instance, the number of active players depends strongly on the time of day [3], and the popularity of MMOG titles varies over the long term [1]. Therefore, providers have to over-provision their infrastructure to cope with this dynamic behaviour. Unfortunately, this simple approach leads to an inefficient utilization of resources, which increases the costs of maintaining game sessions [3]. The high costs of the infrastructure needed to provide MMOGs make it difficult for new companies to enter the market. This problem has been identified in [3], and an alternative concept based on data centers offering resources on demand has been proposed. Within such a system, the allocated amount of resources can be adapted to the actually present load. Hence, resources only need to be paid for when they are actually required, and short- as well as long-term changes in user demand can be compensated. In addition, large fixed costs are transformed into smaller variable costs, which significantly reduces the economic risk of providing MMOGs. Further, this approach follows a common trend toward providing services using cloud computing [7].

Within this thesis, a dynamic approach for managing the workload of game sessions is presented. The provided solution is capable of dynamically adjusting the number of server instances required for maintaining a game session based on the current user demand. Further, the system maps the derived instances onto the available infrastructure such that certain objectives are fulfilled. Most commonly, those objectives aim at keeping the costs of maintaining the overall game session as low as possible. The presented solution is based on the services developed by the edutain@grid project [8]. Among other features, this project provides a framework for game developers which realizes the distribution of game server workload among multiple machines. Additional services developed by this project support administrative tasks, including abstract means for manipulating the load distribution of games built upon the offered framework. Those controls are utilized by the system presented within this thesis to realize its tasks. Hence, the demonstrated solutions may be applied to any game title based on the edutain@grid software components.


The work presented within this thesis is divided into a scientific and an engineering part. The latter includes details on the software solution developed for providing and testing the resulting session management service using the Java programming language. A rough overview of the overall architecture as well as a few details on some of the involved components will be provided. The scientific part, on the other hand, focuses on evaluating the devised load management algorithms built on top of the developed architecture. For this purpose, a simulator capable of emulating the behaviour of MMOG sessions has been implemented and applied for conducting various experiments focusing on different properties of the algorithm.

Within the following chapter, a basic introduction to the internal aspects of contemporary multiplayer games and their architectures is provided. Further, techniques offering the possibility of distributing game load among multiple server instances are presented, with a special focus on those concepts supported by the edutain@grid project. Based on the covered details, the problem to be solved by this thesis is concretised. In addition, the entities to be considered by the session management algorithm are identified and definitions for various terms used throughout this thesis are provided. Finally, a list of requirements on the resulting solution is derived.

Chapter 3 covers the software architecture of the presented solution. The various components of the devised distributed solution as well as their dependencies will be described. Besides laying the foundation for the resulting system, the architecture also implicitly satisfies some of the requirements derived within the previous chapter.

Chapter 4 describes the algorithms devised and evaluated for this thesis. It provides an abstract overview of the basic concept, followed by a discussion of the combinatorial, NP-hard bin packing problem. An extended variant of this problem forms the foundation of the devised algorithm. The chapter concludes by describing some internal aspects of the algorithms' implementations, including the set of supported configuration parameters.

Chapter 5 covers details on the simulation environment developed for evaluating the algorithm's abilities within the experiment chapter. The resulting simulator is capable of emulating the behaviour of game sessions. For this purpose, load models describing the resource usage of game server instances have been defined and are described within this chapter. Further, the modifications applied to the load management system to perform experiments using the concept of discrete event simulation are summarized.

The resulting simulator has been used to conduct several experiments, which are covered within chapter 6. There, different aspects of the devised algorithm


are investigated. For instance, one of the experiments aims at evaluating the achievable cost reductions when using the proposed dynamic load management mechanism in real-world situations. For this purpose, the algorithm is confronted with user demands derived from the popular MMOG RuneScape [9]. Further, the infrastructure provided for maintaining the simulated game session models Amazon's Elastic Compute Cloud [7]. The achieved cost reductions as well as some quality of service parameters are presented. Finally, after a short chapter describing some related work, this thesis concludes by summarizing its achievements and contributions. Further, some areas for potential future research based on the results of this thesis will be enumerated.


Chapter 2

The Game-Balancing Problem

The most essential foundation of every load balancing solution is the possibility of influencing the load distribution within the managed system. Therefore, a profound understanding of the manipulated environment and its influencing factors is required. Massively multiplayer online games exhibit some very specific characteristics due to their common architecture, which lead to some basic concepts for distributing load. Both the basic architectural structures of MMOGs and the most essential load distribution schemas applied within this area will be covered within the following sections.

This chapter is divided into two sections. The first part discusses the problem of distributing game sessions among multiple nodes. After providing insights into the middleware to be extended by the load management and balancing functionality, the section concludes by summarizing the major problems to be solved by this thesis. The second part of this chapter analyses the given problem and defines some terms to be used throughout the rest of this thesis.

2.1 The Problem

2.1.1 Massive Multiplayer Online Games

MMOGs allow hundreds or thousands of players to interact simultaneously within a large-scale persistent virtual environment. The representation of participants inside the simulated world depends on the game genre. While in the most popular game types like MMO role-playing games (MMORPGs) and MMO first-person shooters (MMOFPS) each player is personified by a single avatar, there are genres like MMO real-time strategy (MMORTS) where each participant may control hundreds of units. However, common to all MMOGs is the encouragement of players to interact with each other to influence the game progress. Beside human-controlled entities, additional independently acting characters managed by the game engine may populate the game world. Human players can interact with these non-player characters (NPCs), or bots, to achieve objectives within the game. Beside interactive entities, additional passive elements like collectable items and static game map objects are involved in the game world simulation.

A multiplayer game engine's task is to enforce the complete set of game rules within the virtual environment and to provide a consistent view of the simulated world to all participants as it progresses in time. One of the first design decisions when realizing such an engine is where to process game world adjustments and how to communicate those to all players.

[Figure: side-by-side sketches of a peer-to-peer and a client/server topology]

Figure 2.1: Example Multiplayer Online Game Topologies

Figure 2.1 illustrates the two elementary topologies available for distributed multiplayer game sessions. Due to their inherently distributed nature, one option is to compute simulation state updates in a peer-to-peer fashion. Peers might compute changes to their local part of the game world caused by their own avatars and forward those to the other participants. Within an alternative approach for small game sessions, each peer maintains the entire virtual environment state and distributes only collected user inputs to all other nodes. The received user commands are then used by the peers to keep the local game state copy up to date. The decision between those two concepts is a trade-off between network and CPU load [10]. However, due to scalability issues, only the first proposal can be applied to massively multiplayer online games, since a single node is no longer capable of maintaining the entire game state. On the other hand, this approach is particularly vulnerable to cheating attempts. Since each peer would be responsible for the state of the local player's avatar and the nearby environment, illegal modifications resulting in an advantage for some participants are difficult to prevent [4].

In contrast to the P2P approach, within the client-server topology there is only one process maintaining and updating game state information. This central authority collects user inputs from the clients and provides state update information on a periodic basis. Clients act as end user terminals by presenting


the current game state and forwarding user reactions. For small game sessions, one of the client processes can carry out the server responsibility. However, this double burden limits the resources available to the server. Therefore, some games provide the possibility of running dedicated server processes (e.g. Unreal Tournament). A hybrid approach is given by the multi-server topology as sketched within figure 2.2. Within this architecture, the single server is replaced by a network of strongly connected nodes, each of which manages only a subset of the participating clients. If properly designed, the game load can be distributed among the involved servers. As a result, the total number of clients supported within a single game session may be increased significantly. A detailed investigation of all three concepts is given in [10].

Figure 2.2: Example Multi-Server Topology

Although there have been commercially successful games realizing the P2P approach [11, 6], today's predominant paradigm is the client-server concept, involving multiple server instances if necessary. Especially within massively multiplayer online games, the multi-server-based solution is omnipresent. This is justified by both technological and business considerations. The complex problem of synchronizing P2P solutions and the associated scalability issues make server-based approaches preferable. In addition, from a business perspective, the server-based architecture offers several advantages by separating the server and client implementation. End users receive the client application while the server code can still be protected. Using their part of the game, customers may then participate in large game sessions hosted by the MMOG providers. By authenticating players, access to the game can be restricted to registered users, so pirate copies of the client application remain useless. In addition, vendors


may collect a subscription fee for allowing players to participate. Finally, game providers have the possibility of collecting game play characteristics to improve the game where needed, increasing customer satisfaction and retention.

To be commercially successful, MMOGs need to be capable of handling several thousand players within one game session. Since the resources of a single server are no longer sufficient for such situations, techniques are required for distributing load among multiple nodes. The following sections will present approaches accomplishing this goal.

2.1.2 The Game Loop

Although real-time games appear as continuous applications, engines simulate the game progress using discrete time steps. Within each of those steps, the properties of all involved dynamic elements are updated based on the previous state, the issued user commands and the game rules, sometimes referred to as the game logic. For instance, the position of a player's avatar in a succeeding step depends on its previous position, its speed and its heading. User inputs obviously have an influence on the latter two variables. Further, game rules, e.g. preventing players from walking through walls, have an additional effect. Hence, the entire game simulation can be modelled as a sequence of state transitions. The associated transition function takes the current state as input, together with the set of user commands issued since the last state update. Its result is the new state, which needs to be forwarded to all players such that they can determine their future actions.

To provide the illusion of continuous game interaction, games evaluate state transitions at a high frequency. The associated configuration property is commonly referred to as the tick rate of a game. Some engine implementations allow defining this rate during the creation of a new game session, while a few are even capable of automatically adjusting it during game play. Typical values for first-person shooters range between 10 and 60 Hz. For lower-paced role-playing games, a frequency of 0.5 to 1 Hz can be sufficient [10].

[Figure: loop of three steps: receive and process user commands → update virtual game world state → distribute updated state information]

Figure 2.3: Basic Game Loop Schema

Based on this concept, the core of a game engine consists of a single, infinite real-time loop computing the previously described state transition sequence.


Figure 2.3 illustrates the simple schema of such a game loop as it is used even within sophisticated computer games like Quake, Half-Life or Unreal Tournament [10]. Each iteration starts by collecting user inputs from the participating clients; after validating those commands, the affected entity properties are updated. Within the second step, the actions of non-player characters are determined and applied. Additionally, periodic events triggered by game rules are performed; for instance, items which have been collected previously might be made available again. Within the final step, the game state modifications are forwarded to the clients. After all steps have been completed, the game loop is stalled until it is time to evaluate the next tick.

A special focus should be put on the last phase of a game loop cycle. Since the number of state modifications grows approximately linearly with the number of players, the amount of data to be transferred to each of the clients during the last step might become extensive. For this reason, many games introduce filters reducing the volume of the state update packages sent to clients to an approximately constant size. The reduction is based on the fact that within all real-time games, players have a limited area of influence as well as confined sensing capabilities. Therefore, only updates regarding entities within the area of interest of a player are forwarded to the corresponding client. Such information filtering methods, exploiting the temporal and spatial locality of avatars, are known as interest management techniques [12, 13]. Beside reducing the amount of transferred update information, filtering also aims at preventing cheating attempts. If the server provided state information for the entire game world, a corrupted client application would be able to give the player more information than actually allowed, which could lead to an inadmissible advantage.
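Interest management of this kind amounts to a per-player filter over the entity set. The following sketch assumes a simple circular area of interest; the `Entity` record and the radius value are illustrative inventions, not taken from any particular engine:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of server-side interest management: only entities inside a
// player's area of interest are included in its state update package.
public final class InterestFilter {

    record Entity(String id, double x, double y) {}   // assumed minimal entity model

    static final double INTEREST_RADIUS = 50.0;       // illustrative, game-specific value

    // Select the entities visible to an avatar located at (px, py).
    static List<Entity> visibleTo(double px, double py, List<Entity> world) {
        return world.stream()
                .filter(e -> Math.hypot(e.x() - px, e.y() - py) <= INTEREST_RADIUS)
                .collect(Collectors.toList());
    }
}
```

Since the filter runs on the server, a client never receives state for entities outside its avatar's area of interest, regardless of how the client is manipulated.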
However, when the information is filtered on the server side, even manipulated game clients cannot obtain additional information.

Quality of Service and Resource Requirements

For supporting fluent game play, sufficient resources need to be provided by the server infrastructure. For one, the host running the server application must provide sufficient computational power to allow the game loop to finish its cycles within the interval determined by the tick rate. Naturally, the time required for computing one state transition depends on the amount of work caused by the various involved steps as well as the speed of the executing processor. When analyzing the steps processed during each loop iteration, it can be observed that most of them depend on the number of involved clients. Unsurprisingly, it can be concluded that the amount of CPU time consumed per iteration increases with the number of managed players. By taking a closer look at the involved


tasks, it can be found that several of them bear at least the theoretical potential of being executed in parallel. Although traditional game engine implementations focus on maximizing the performance within a single thread, recent approaches aim at exploiting this parallelism by distributing tasks among the multiple cores of contemporary architectures [14]. Independently of the actual realization, a critical path can be identified which determines the time required for computing a game tick. The ratio between this duration and the bound stated by the tick rate is known as the game loop saturation. For instance, figure 2.4 illustrates the processing of a 60% saturated game loop.

[Figure: timeline of game loop cycles n to n+4, each consisting of a working phase followed by an idle phase, with markers at the start/end of each cycle; x-axis: tick count / time]

Figure 2.4: Work Rhythm of a 60% Saturated Game Loop

Formally, the saturation can be defined by

    sat = \frac{t_{work}}{t_{cycle}} = \frac{t_{work}}{1\,\mathrm{sec}} \cdot tickrate    (2.1)

where t_work represents the time required for processing the critical path of a game loop and t_cycle = 1 sec / tickrate quantifies the time available for each cycle. For providing a fluent game experience, this saturation must not exceed 100%. However, as long as the saturation remains below this level, participants will not notice any differences. Hence, at least in theory, games will not appear more responsive at a saturation of 10% than at one of 90%.

Beside computational resources, game server instances also require sufficient network bandwidth. Incoming and outgoing messages need to be processed quickly enough to avoid congestion, which would introduce additional delay or cause messages to be lost. Again, the amount of network resources required by game server instances depends on the number of players. Due to interest management features, the load increases approximately linearly with the number of clients. However, incoming and outgoing network load is not balanced equally: user command packages typically contain very little information, while game world updates tend to be significantly larger.
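As a worked example of equation (2.1): at a tick rate of 20 Hz each cycle offers t_cycle = 50 ms, so a critical path of t_work = 30 ms yields a saturation of 60%, matching figure 2.4. A minimal sketch (the class and method names are illustrative):

```java
// Worked example for equation (2.1):
//   sat = t_work / t_cycle = t_work * tickrate   (since t_cycle = 1 / tickrate)
public final class Saturation {

    // workSeconds: critical-path time per tick; tickRate in Hz.
    static double of(double workSeconds, double tickRate) {
        return workSeconds * tickRate;
    }
}
```

For instance, `Saturation.of(0.030, 20.0)` evaluates to 0.6, i.e. the 60% saturated loop of figure 2.4; values above 1.0 indicate missed tick deadlines.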


A final resource requirement to be covered within this section is the amount of memory needed to maintain the game state. While game engine variables, game world information and code fragments occupy an approximately constant amount of memory, the storage needed for handling player information naturally grows with the number of players. To avoid the negative effects of thrashing, the memory requirements of game server instances should not exceed the amount of physical memory installed on the hosts they are executed on.

Based on this knowledge, the intuitively expected relation between server load and client counts can be justified: the more players a server is handling, the higher is its load. Therefore, distributing work between multiple server instances corresponds to disseminating clients. However, due to the strict quality of service requirements introduced by the tick rate, some constraints have to be considered. For instance, players currently involved in a mutual real-time interaction, e.g. seeing or fighting each other, should be handled by the same server instance. This way, no additional delay is introduced by otherwise necessary inter-process synchronization and communication mechanisms. Since, in general, real-time interactions only occur between game entities located close to each other, scalable load decomposition schemas have to consider the proximity of players' avatars within the game world. Some well-established concepts solving this problem are covered within the following subsections.

2.1.3 The Zoning Concept

One of the most successful and straightforward load decomposition schemas for massively multiplayer game servers is the zoning concept [4]. It follows the divide-and-conquer paradigm. The main idea is to split up the entire game world, which is too large to be maintained by a single server, into smaller partitions, as illustrated within figure 2.5. Each of the resulting zones is small enough to be handled by one game server instance. Therefore, the game world simulation is distributed among multiple nodes by starting up several server instances, each responsible for maintaining one of the derived zones. Clients are assigned to servers according to the location of the avatars they are controlling. By reducing the responsibility of a server from maintaining the entire world to a smaller fraction, the workload can be effectively reduced; all three types of resources discussed within the previous subsection are affected. Therefore, this simple schema provides a powerful concept for realizing a scalable game server infrastructure. However, one of the major drawbacks of this partitioning schema is the need for a client to migrate between servers whenever its avatar moves to another zone. During this process, the client has to establish a new


Figure 2.5: The Zoning Concept

connection to the target server and player state information has to be transferred between the old and the new game server instance. The delay caused by this procedure is usually covered within the game by showing a loading screen or an animated sequence. Another challenge is the realization of real-time interactions between entities located within different zones, hence across zone borders. Since such interactions require the state maintained by at least two servers, additional communication overhead is introduced. One simple approach to circumvent this problem is to use a game map partitioning within which such situations cannot occur. For instance, if the virtual world consists of a set of small islands, each of those can be considered as a small mini-world. Assuming there is no real-time interaction between avatars located on different islands, e.g. entities on different islands are out of sight of each other, inter-zone actions are effectively prevented. Further, moving from one zone to another can be modelled within the game by using a boat, an airplane or some other kind of portal. However, in this case the combined zones do not really form a continuous game world. The environment can be considered as a set of separated rooms participants may enter.
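The basic client-to-server assignment of the zoning concept — mapping an avatar's position to the zone, and thus to the server responsible for it — can be sketched as follows. The regular square grid and all names here are illustrative assumptions, not the partitioning of any particular game.

```python
def zone_of(position, world_size, grid):
    """Map a world position to a zone id in a regular
    `grid` x `grid` partitioning of a square world."""
    x, y = position
    col = min(int(x / world_size * grid), grid - 1)
    row = min(int(y / world_size * grid), grid - 1)
    return row * grid + col

def assign_clients(avatars, world_size, grid):
    """Assign each client to the zone (and hence the server
    instance) its avatar is currently located in."""
    return {cid: zone_of(pos, world_size, grid)
            for cid, pos in avatars.items()}
```

A migration is then simply the event of `zone_of` returning a new value for a moving avatar between two ticks.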


Supporting Seamless Virtual Worlds

To realize continuous game maps, the distributed simulation engine has to provide support for inter-zone interactions to make the virtual world seamless. To do so, servers managing adjacent zones may share the responsibility of maintaining the state of entities located close to the common borderline. If this


jointly managed border area is wider than the maximum range of interest of the involved avatars, all interactions across zone borders can be supported. Figure 2.6 illustrates this concept of overlapping zones. Within more sophisticated implementations, entering the border area will trigger the first steps of the migration process. Therefore, if an avatar actually crosses the borderline, the migration can be executed with minimal overhead. Ideally, this will eliminate the need for loading screens. Hence, server changes are transparent to the end users [15].
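The overlapping-zone logic boils down to classifying an avatar's position relative to its zone: fully inside, within the jointly managed border strip (where migration is pre-started), or already outside (migration must complete). A sketch under the assumption of rectangular zones, with all names hypothetical:

```python
def border_state(pos, zone_bounds, area_of_interest):
    """Classify an avatar's position relative to its zone:
    'inside', 'border' (within the jointly managed strip whose
    width equals the area of interest), or 'outside'."""
    x0, y0, x1, y1 = zone_bounds
    x, y = pos
    if not (x0 <= x <= x1 and y0 <= y <= y1):
        return 'outside'          # crossed the line: finalize migration
    margin = min(x - x0, x1 - x, y - y0, y1 - y)
    if margin < area_of_interest:
        return 'border'           # pre-start migration here
    return 'inside'
```

A server would run this check each tick and hand the avatar's state to the neighbouring server as soon as the result switches to `'border'`.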



Figure 2.6: The Principle of Overlapping Zones

Most implementations require a static game world partitioning. Theoretically, the game world can be divided arbitrarily. However, especially the size of the resulting zones has to be determined carefully. Large zones bear the risk of overloading the servers responsible for maintaining them. On the other hand, too small zones will lead to frequent server changes when moving through the virtual environment. Within systems not supporting overlapping zones, this will lead to numerous loading screens, disturbing the game play. If overlapping is supported, small zones will lead to a bad ratio between overlapped and non-overlapped areas. If the size of a zone drops below the area of interest of an avatar, the entire zone will be redundantly maintained by the servers managing adjacent zones. Further, the server managing the small zone has to maintain information about the surrounding area, which in total would be several times larger than the actual zone itself. As a result, forming small zones is of little use when overlapping is enabled, since the increased synchronization overhead and the high degree of redundant computation would render the concept inefficient [5]. A final drawback of the zoning concept to be discussed within this section is the lack of load balancing support. Since zones are statically defined and managed by a single node, there is no way within the original concept to move load from one node to another. For instance, if 10 out of 12 zones are almost empty and a large number of clients are gathering around a few hot spots within


the final two zones, the game load is highly unbalanced. Nevertheless, based on the client dissemination schema covered within this subsection, there is no possibility to resolve this situation. Considering the pros and cons presented so far, the zoning concept especially fits the requirements of MMORPGs. These games have very large maps supporting the creation of a high number of big zones, while avatars usually have a moderate area of interest. Hence, overlapping areas, if needed, would be small. Further, players are distributed throughout the entire game map completing quests, and only a few hot spots are present. Finally, since the tick rate required for providing fluent game play is low compared to FPS and RTS games, a high number of clients can be maintained within a single zone before the processing power becomes a limiting factor. Therefore, the missing load-balancing concept can be compensated by over-provisioning resources. On the contrary, FPS and RTS games are usually based on much smaller game maps. Additionally, the area of interest might be huge and the game world needs to be seamless. For instance, snipers within FPS games depend on this property. Further, the game concept encourages players to converge at certain locations to fight epic battles. As a result, a high number of entities will concentrate on a relatively small area. Finally, since these games are additionally very fast-paced simulations, a high tick rate is required to provide the illusion of a quickly responding environment. Therefore, processing power easily becomes the limiting factor. All these considerations make the zoning concept only a suboptimal solution for these kinds of games [5]. An alternative approach trying to improve upon these shortcomings is presented within the following subsection.

2.1.4 The Replication Concept

To solve the emerging problems when applying the zoning concept to FPS and RTS games, an alternative proposal has been made within [5]. Instead of dividing the already small game world of this type of application into even smaller fractions, it simply suggests distributing the responsibility of managing the units within the game among multiple nodes. Therefore, the set of all game entities is partitioned into smaller subsets. Each involved server instance gets one of these sets assigned. The contained elements are considered active entities on the corresponding node. Hence, the node is responsible for updating the dynamic properties of those elements. This also includes maintaining client connections and processing issued commands for avatars. For all non-active entities, a shadow copy of the remotely managed unit is maintained. An example involving two server instances is illustrated within figure 2.7.




Figure 2.7: The Replication Concept involving two Server Instances

During each tick, every server is updating the state of all of its active units. In an additional step, the new state information is then forwarded to all other servers participating in the game so that they can update the state of their shadow copies. The approach scales since it has been shown that updating shadow units based on messages received from the responsible server can be performed much faster than computing the actual state changes [5]. Unlike the zoning concept, the replication-based approach does not affect all resource requirements equally. It is mainly focusing on the CPU load, which can be effectively reduced since the number of clients and actively maintained units is kept low. However, it has no effect on the memory consumption of the server processes since every instance has to maintain the entire game state. The effect on the network load caused by a server instance depends on additional influence factors. For one, the amount of data transferred between a server and its clients is significantly reduced since fewer clients are managed. However, additional data streams are added between the server instances. It depends on the implementation which effect is dominating. The fact that the requirements of some resources are not reduced by this concept is limiting its scalability. For instance, huge massive multiplayer game sessions including several thousand game entities can no longer be managed due to their extensive memory consumption. In addition, the size of the game maps is limited due to the high amount of memory required for their representation.
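The asymmetry between active and shadow entities described above can be sketched as a single replication tick: active entities are fully simulated locally and their new state is broadcast, while shadow entities are merely patched with states received from their owners. The data layout and names here are illustrative assumptions, not the RTF's actual interface.

```python
def tick(server, inbound_updates):
    """One simulation tick on a replication-based server (sketch).
    Active entities are fully simulated locally; shadow entities
    are only overwritten with the state received from their owners."""
    outbound = {}
    for eid, entity in server['active'].items():
        entity['x'] += entity['vx']      # full (expensive) simulation step
        outbound[eid] = dict(entity)     # state to broadcast to peer servers
    for eid, state in inbound_updates.items():
        server['shadow'][eid] = state    # cheap shadow patch, no simulation
    return outbound
```

The scalability argument rests on the shadow branch being much cheaper than the active one, so adding servers shrinks each node's expensive workload.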


Based on those observations, this approach is most suitable for small to moderately sized FPS and RTS games. For these games, the available computational resources are the most limiting factor due to their high tick rates. Fortunately, CPU load can be effectively handled by the replication concept. One essential benefit of this approach when compared to the zoning concept is its ability to support load-balancing operations. While in the previous concept the assignment of clients to servers is predetermined by the location of their avatars within the game world, the replication concept allows choosing among multiple servers. Therefore, the CPU load can be effectively balanced among the participating nodes. Even further, the number of involved servers can be modified based on the current load. If the processing power of the momentarily engaged nodes is insufficient, new nodes can be added dynamically. On the other hand, if the load levels fall, machines can be removed to increase the resource utilization of the remaining instances. Hence, this concept allows saving money by adapting the required resources to the actually present user demand. Still, even game worlds managed based on this concept may become overloaded. For instance, if the time required for processing shadow copy updates exceeds the tick duration, adding new replicas will not solve the problem. A solution for this situation is presented within the following subsection.
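The elastic add/remove behaviour described above can be sketched as a simple threshold rule on the replicas' CPU saturation. The thresholds and names here are illustrative assumptions, not values taken from the thesis.

```python
def plan_replica_count(cpu_loads, low=0.3, high=0.8):
    """Decide how many replicas the next period should use, based
    on the average CPU saturation of the current ones (thresholds
    are purely illustrative)."""
    n = len(cpu_loads)
    avg = sum(cpu_loads) / n
    if avg > high:
        return n + 1      # overloaded: add a replica
    if avg < low and n > 1:
        return n - 1      # underutilized: release a node
    return n
```

Note that this only scales CPU capacity; as argued above, it cannot help once shadow-update processing alone exceeds the tick duration.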

2.1.5 The Instancing Concept

The final load distribution schema for multiplayer games to be discussed within this chapter is the instancing concept. This method can be orthogonally applied to both previously discussed concepts. It simply states that in case a zone within the zoning concept or the entire map within the replication-based approach is overloaded, a new zone or game map instance is created as a last resort. The principle is best described by an example. Assume a system is using the zoning concept. If one of the zones tends to be overloaded, an additional server responsible for the same zone can be started. However, unlike within the replication concept, those two zone instances are completely separated from each other. There is no interaction between the two versions. Hence, avatars managed by different instances of the same zone are not able to see each other even if they are close. The observable behaviour is equal to the case in which the players would participate in different game sessions running the same game map. However, if separated players move back to an area that is not affected by the instancing approach, they will see each other again [15]. This drastic but efficient measure might be applied to certain hot spots, which are usually not used for interactions between users. For instance, market places where items can be bought from non-player characters might be frequently visited


Figure 2.8: The Instancing Concept involving two Server Instances

places. However, real-time user interactions might not be essential for those areas. Nevertheless, due to the disturbing side effects observable for end users, this approach should only be applied as a last resort.
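The instancing rule can be sketched as a simple admission decision: place an entering avatar into an existing instance with spare capacity, and only open a new, fully separated instance when every copy is full. All names and the capacity model are illustrative assumptions.

```python
def pick_zone_instance(instances, capacity):
    """Return an instance id with spare capacity, or spawn a new,
    fully separated copy of the zone as a last resort (sketch;
    `instances` maps instance id -> current player count)."""
    for iid, players in instances.items():
        if players < capacity:
            return iid
    new_id = max(instances, default=0) + 1
    instances[new_id] = 0     # last resort: open a parallel copy
    return new_id
```

Since players in different instances cannot see each other, a real system would additionally try to keep party members in the same copy.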

2.1.6 The edutain@grid Project

The edutain@grid project is a research project supported by the European Union [8]. Its main objective is to bring the benefits offered by high performance grid computing technologies to applications outside the academic world and large industry. A special focus is put on the requirements posed by real-time online interactive applications, hence simulated virtual environments as they occur within multiplayer games. Therefore, the project is developing a service-oriented middleware capable of efficiently distributing the responsibilities of maintaining virtual environments using grid resources [8].

The Real-Time Framework



The foundation of the project is formed by the real-time framework (RTF) [15]. This programming library aims at providing sophisticated solutions to game developers, allowing them to distribute MMOG sessions across multiple nodes. It thereby integrates all three load distribution schemas discussed within the previous subsections. Game worlds are primarily distributed according to the zoning concept. Overlapping regions can be defined to create seamless virtual environments. Moreover, each zone might be managed by more than one server instance. The responsibilities between multiple servers maintaining


the same zone are shared based on the replication concept. Finally, subareas following the instancing proposal can be defined. Because every zone of the game map is managed by RTF according to the replication principle discussed earlier, new load balancing options are introduced. Beside the means discussed within the section on the replication concept, server instances managing zones may now be moved to other nodes during runtime. This can be achieved by starting up a server on the target side, transferring all clients to the new instance and shutting down the original replica. This additional dynamic, which is not supported within the basic zoning concept, allows adjusting the amount of resources allocated for maintaining game sessions based on the current user demand. An additional feature offered by the RTF is the possibility of extracting monitoring information from inside the game loop. Hence, performance values like the game loop saturation or the number of active entities can be retrieved in a generic way, independently of the actually implemented game.

The Management Server

Beside this essential framework for distributing game load, additional resource management services are required for maintaining applications within a grid-based infrastructure. Within the business model of the edutain@grid project, the available grid resources are provided by companies or institutions running data centres, clusters or clouds. Each participating resource provider has its own administrative domain that needs to be maintained. Therefore, a set of management tools has been developed and integrated into a management server application. This service is in control of the available resources. It is maintaining deployment information describing which game is executable on which node. Further, it is capable of starting and stopping game server instances on demand.
Special resource allocation and capacity planning services might be employed to enforce service level agreements. Provider-specific preferences on the resource consumption pattern can be incorporated as well. In addition to the basic game server controlling features, the management server also provides monitoring capabilities. It extends the rudimentary support of the RTF framework by providing additional meta-information on the game server level. For instance, the amount of memory or network bandwidth currently consumed by a server process can be monitored. Further, the total amount of resources available on the various nodes can be retrieved. A special feature offered by the management server is the capability of providing predictions on future resource requirements. Hence, beside the currently


observed load state, this service provides predicted values for the near future. The temporal offset between the current time and the given prediction is thereby fixed. Internally, this feature is realized using neural networks [16].
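A consumer of this prediction service could combine the observed and the fixed-offset predicted load into a proactive scaling decision, so that capacity is added before demand actually arrives. This is a sketch of such a consumer only; the function name, threshold and interface are our own assumptions, and the neural-network internals of the predictor are not reproduced here.

```python
def needs_proactive_scaling(current_load, predicted_load, threshold=0.8):
    """Scale up as soon as either the observed load or the
    fixed-offset prediction crosses the threshold, so resources
    are ready before the demand materializes."""
    return current_load > threshold or predicted_load > threshold
```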

2.1.7 Problem Summary

The aim of this thesis is to develop a software solution capable of maintaining multiplayer online game sessions on a global level based on the infrastructure offered by the edutain@grid project. Beside the management capabilities required for starting and stopping game sessions or modifying the list of involved grid sites, the core objective lies in performing load management and balancing for active game sessions. Hence, the resulting component has to decide where and when to run game server instances managing which zone for which game session, as well as how to distribute workload among replicas of the same region. The resulting session management system has to cope with several issues introduced by the characteristics of the underlying grid infrastructure. For one, the inherently heterogeneous environment has to be considered. Hence, as an example, the amount of resources contributed by the various nodes within sites is likely to vary. Further, it cannot be assumed that all game maps are deployed on all available nodes. Since dynamic changes of the configuration are another characteristic of grids, the available capacities and game map deployments may even change over time. Hence, nodes might be added, upgraded or removed and game maps may be installed or deleted. All this dynamic behaviour has to be considered by the load-management solution. Fortunately, all the necessary information is maintained and provided by the services of the edutain@grid management layer. Another issue is given by the multiple administrative domains a grid network consists of. Different providers may have different preferences on the way their contributed resources should be used. For instance, sites representing a cluster may prefer having the assigned load distributed evenly across all involved nodes. On the contrary, cloud-based infrastructures would favour load distributions within which a minimum number of nodes is involved, to save money.
Therefore, the resulting system has to be flexible enough to deal with such site-specific preferences. Beside the functional requirements discussed so far, additional architectural obligations are introduced by the grid. The resulting solution has to be scalable to handle the numerous resources present within large grid environments. Further, it has to be failure tolerant. Since the solution has to manage game sessions throughout their potentially long life cycles, errors may occur and components will fail. Hence, the system has to be capable of dealing with such situations. Finally, the load-management algorithm realized on top of the developed infrastructure should take advantage of the load value predictions offered by the corresponding service of the management layer. Based on those prognoses, it should be possible to prevent situations in which insufficient resources lead to quality of service degradations.

2.2 Problem Analysis

Based on the problem description provided so far, this section is going to analyse the core objectives of this thesis from a system architectural perspective. The result of this analysis will be a data model capable of representing the state of the managed system. Additionally, a list of features to be supported by the resulting application is derived.

2.2.1 System State Representation

To be capable of load balancing the game server environment, an internal representation of its state is required. The corresponding data model needs to include all the necessary information to perform the actual session management tasks. Figure 2.9 illustrates the nine essential data entities to be considered as well as their relations.

Figure 2.9: Data Entities Involved in the State Representation

Within the following paragraphs, definitions and further descriptions of the various entities will be provided. In addition, the software components responsible for managing the corresponding items will be named. Thereby, several of


those entity types are maintained by the management services provided by the involved grid sites (see section 2.1.6). Subsequently, this service layer will be referred to as the management layer, as it is named within the edutain@grid project.

ROIA

The edutain@grid project provides support for a class of applications known as real-time online interactive applications (ROIAs). Applications of this type typically involve the simulation of a virtual environment within which a high number of users can interact concurrently. The most prominent subcategory of this class is given by massive multiplayer online games, which have been considered so far. However, for instance, simulations for educational purposes follow the same principles. Within the data model, ROIA entities represent applications of the corresponding class present within the covered domain. Instances are identified by their name and version. Generally, ROIAs having the same name but different versions are considered incompatible. Hence, processes running different versions of the same game will never contribute to the same game session. The information on the available games is provided by the services of the management layer. Hence, the administrators of the various sites are responsible for maintaining them.

Game Map

Most ROIA implementations separate the actual game rules from the virtual world to be simulated. This way, for instance, multiple different environments reflecting arbitrary scenarios can be simulated by the same game. A game map is the blueprint of a virtual environment, which can be instantiated by running a game session based on it. It contains detailed information on the game world, including static elements like trees, houses or walls as well as dynamic elements like collectable items and NPCs. Game maps are identified by the ROIA they are based on and some arbitrary name. Further, maps represent the basic unit of deployment.
Hence, it is not the set of applications installed on a host that defines its capabilities, but the set of supported game maps. Just as with the ROIAs, the information on the available game maps is provided by the management layer. The session management system developed for this thesis has no influence on this part of the state representation and is simply using it for realizing its tasks.


Zone

For realizing the zoning concept, which is one of the basic concepts supported by the RTF, each game map has to be partitioned into a set of zones. Thereby, zone entities are used to address the defined sub-regions. They are identified by the game map they are part of and some arbitrary integer ID making them unique. However, no additional information on zones will be available. For instance, no information on their spatial proximity is included within their representation. This limitation is inherited from the edutain@grid project, which does not provide the necessary information. Just as the previous two entities, this information is offered by the management layer and cannot be influenced by the load management system.

Hoster

While the entities discussed so far focus on the available applications and their properties, the following three items will cover information on the infrastructure. Again, the granularity is increasing with every step. As a result, this part of the state representation is defining hierarchies, just as it is done by the previous three entities. The root element for each of those hierarchies is given by a hoster. Within the business model of the edutain@grid project, a hoster is an organisation operating the infrastructure required for running game sessions. From a grid-based perspective, a hoster is offering access to a grid site. To be accessible by the load management and balancing system, each site must provide access to its implementation of the services specified for the management layer of the edutain@grid project. Hosters are identified within the state representation by a simple name. Unlike the previous entities, hosters are managed by the session management system on a global level. Hence, the administrators of this level maintain the list of involved hosters. Within this thesis, the term site might as well be used as a synonym for a hoster.

Host

Each hoster provides access to a set of hosts.
These nodes provide all the resources required for running game server instances. Therefore, hosts represent the executing units within the balancing environment. Within the state representation, each host is represented by the hoster it belongs to and some arbitrary name making it unique within its domain. The list of available hosts is again


maintained by the services of the management layer and cannot be influenced by the load management and balancing system. Within this thesis, the terms host and node are used interchangeably. Hence, both refer to the same entities.

Deployment

Not all games and game maps are supported by all nodes throughout the system. The link between hosts and the installed game maps is realized by instances of the deployment entity. The presence of one of these entities within some state representation therefore certifies the capability of a host to run server instances contributing to sessions running the given game map. Maintaining the set of available deployments is again one of the core responsibilities of the services on the management layer.

Session

The last three item types contributing to the system state description are dealing with the representation of managed game sessions and their state. Just as the other two columns of entity types shown within figure 2.9, instances of these classes are forming hierarchies. However, unlike most of the entities covered so far, the balancing system has significant influence on these elements. Each hierarchy described by these final three elements is representing the state of one game session. A game session is an instance of a game world hosted by various game servers to which players can contribute. Within the state representation, a session is identified by a unique ID and the game map it is based on. Instances of this type are forming the root nodes of the resulting hierarchies. Managing game sessions on a global level is one of the essential responsibilities of the resulting software solution. Figure 2.10 illustrates the various states a session is passing through during its life cycle. While the properties of a new session are configured by an administrator, the session is considered to be within a preparation phase. Beside the selection of a game map and additional options, an initial hoster can be chosen.
The corresponding site will be the one on which the new game session will be created. As soon as the system is instructed to start up a configured game session, the state is updated accordingly. The main objective during the start-up phase is to provide one game server instance for each zone being part of the game map to be simulated. However, to decide where to place those server processes only a reduced set of load information is available due to the lack of data describing


Figure 2.10: Game Session Life Cycle

the not yet running session. Therefore, a separate part of the applied load management algorithm is controlling the game session at this stage. After all the needed server processes have been created, the session reaches the running stage. During this long-lasting phase, the regular load-management algorithm is responsible for maintaining the involved processes and resources. Finally, if the administrator decides to stop a session, the shutdown phase is initiated. Again, the session control is handed over to a separate part of the management algorithm. However, during the shutdown no special strategy has to be considered. All that has to be done is to remove the involved processes. If all the steps are processed correctly, the session reaches the closed state. However, if during any of the previous phases an error occurs, the session is marked as failed and is automatically discarded. One of the most critical steps in this context is the start-up phase. This phase might fail if insufficient resources are available for supporting the new game session. For all other phases, the option of moving to the failed state is of a more theoretical nature.
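The life cycle of figure 2.10 (preparing, starting, running, stopping, closed, with failed reachable from the intermediate phases) is a small state machine, which could be guarded as follows. The representation is our own sketch, not code from the system itself.

```python
# Legal session life-cycle transitions (derived from figure 2.10).
VALID = {
    'preparing': {'starting'},
    'starting':  {'running', 'failed'},
    'running':   {'stopping', 'failed'},
    'stopping':  {'closed', 'failed'},
}

def advance(state, target):
    """Move a session along its life cycle; illegal transitions
    raise instead of silently corrupting the session state."""
    if target not in VALID.get(state, set()):
        raise ValueError(f'illegal transition {state} -> {target}')
    return target
```

Guarding transitions this way ensures, for example, that a closed or failed session can never be reactivated by a buggy management component.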

Zone Duty

Zone duties are entities describing the responsibility of a hoster to maintain a specific zone being part of a given session. Multiple hosters may be accountable for maintaining the same zone within a session. However, since there is an increased data exchange between game server processes managing replicas of the same zone, managing a zone within a single hoster eliminates overhead by exploiting network locality. As a result, the responsibility of managing a zone


should be assigned to a single hoster. This can be realized by considering zone duties as the basic unit of load transferred between hosters. Zone duties are identified by a combination of the hoster they are assigned to, the session they are part of and the zone they are referencing. Unlike other entities, instances of this type are not managed by any particular component. They are derived from the current distribution state of a session.

Zone Replica

The last entity type included within the state representation is the zone replica. Instances of this class represent game server processes running on hosts. All instances managing the same zone are treated equally. Therefore, all of them are referred to as zone replicas, even if there is only one process managing the corresponding zone. Zone replicas are identified by the deployment they are based on, the zone duty they are part of and some process ID making them unique within one node. The last key value is needed to make different processes managing the same zone for the same session on the same node distinguishable. This distinction becomes important if a host offers multiple cores but the game server implementation is optimized for a single thread. In this case, only multiple server instances running on the same node can consume all the available resources. In those cases it may even happen that those processes are responsible for the same zone of the same session. Only the additional ID makes them distinguishable and hence addressable for load balancing purposes. The zone replicas are managed by the session management system. Essentially, controlling these entities is the main responsibility of the resulting solution. Within this thesis, unless otherwise noted, the terms (zone) replica, replication, server instance and server process are used interchangeably. Hence, all of those refer to the same concept.
System State Data Model

The system state at any moment in time can be modelled as a directed acyclic graph. The vertices are given by instances of the nine data types illustrated within figure 2.9, while the edges are formed by their relations. Since these relations are already fixed by the items themselves, only the set of involved nodes needs to be stored. Therefore, merging the current states received from multiple hosters boils down to simple set operations. Further, each node within the DAG might be annotated by load information. However, to keep the data entities involved in the representation of the distribution



Figure 2.11: The Structure of the complete System State Representation

state reusable and the set of assigned load values flexible, the corresponding data is maintained within a separate table. Every entry within this table is a triple containing the type of load metric represented, the subject annotated by the represented entry and the actual value. By adding a proper index structure to this table, load information can be accessed efficiently. Figure 2.11 illustrates the complete data structure used to represent the system state within the load management solution.
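The separation between the distribution DAG and the annotating load table can be sketched as follows: the table stores (metric, subject, value) triples and indexes them by (metric, subject) for efficient lookup. The class and method names are illustrative assumptions, not the system's actual API.

```python
class LoadTable:
    """Load annotations kept separate from the distribution DAG:
    each entry is a (metric, subject, value) triple, indexed by
    (metric, subject) for efficient access (sketch)."""

    def __init__(self):
        self._index = {}

    def put(self, metric, subject, value):
        # Overwrites an earlier annotation for the same pair.
        self._index[(metric, subject)] = value

    def get(self, metric, subject, default=None):
        return self._index.get((metric, subject), default)
```

Keeping the values outside the DAG means the same distribution state can carry different, freely extensible metric sets without changing the graph entities.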

2.2.2 System Requirements

To conclude this chapter, a short overview of the functional requirements on the resulting system is provided.

Load Management and Balancing Algorithm

As has been pointed out within section 2.1.7, the core of the resulting system is a component performing continuous load balancing for the managed sessions. However, since there obviously is no single perfect algorithm for this task, this part should be kept as flexible as possible to allow applying different implementations. Hence, the balancing algorithm needs to be properly separated from the rest of the system. A positive side effect of this separation is that algorithm implementations are relieved of any responsibilities not directly related to the actual task of load balancing game sessions. Therefore, additional algorithms can be implemented more easily by focusing on the essential parts. In addition, most of the realized algorithms will have configurable properties. For instance, threshold values used to determine whether a node is overloaded may be exposed as modifiable options. Since several of these configuration values need to be tuned based on the managed applications, the system should support adjusting them during runtime. In some cases it might even be necessary to exchange the actual algorithm implementation used for balancing the


system to achieve the intended modifications in the system behaviour. Both configuration updates and algorithm exchanges should therefore be supported.

Monitoring

Beside the core responsibility of maintaining game sessions, monitoring operations have to be supported. The corresponding features allow system administrators to obtain an overview of the current state of the system. By recording historical states, further analysis can be performed. The extracted information can, for instance, be used to decide whether new game sessions can be added to the system without exceeding the available resources. In addition, planning the upgrading of available capacities can be supported by this information. Finally, it allows controlling the behaviour of the balancing system. If irregularities are discovered, the algorithm configuration can be modified to correct them.

Configuration and Administration

The resulting software solution must offer an interface for administrators to manipulate the active configuration of a running system instance. The options range from manipulating the set of hosters to be considered by the system, over the creation and destruction of game sessions, to algorithm-specific properties like threshold values. Thereby, the entire set of management operations has to be offered to the administrators through a secure channel.

Distributed System

Due to the various obligations introduced by the underlying grid infrastructure, the resulting system should be realized as a distributed solution. This more scalable approach allows handling the high amount of resources available within grids. If properly realized, it also allows compensating for failing components since single points of failure can be avoided. Therefore, both the algorithm and the underlying application environment have to be built based on this paradigm.


Chapter 3

The System Architecture

As has been described in the previous chapter, many obligations have to be satisfied by the resulting system. Several of them impose constraints on the architecture of the solution, which will be described within this chapter. It starts by providing a basic overview of the overall system and continues with more detailed discussions of the various involved components.

3.1 Conceptual Overview

The developed session management package consists of several distributed components. Figure 3.1 provides a general overview of the involved elements as well as their relations.

[Figure 3.1: The Overall Architecture of the Session Management Solution — an administration client, a management server with database, and multiple controllers, each running an algorithm on top of an algorithm environment and connected to management services, linked through a P2P network]


The foundation of the system is provided by the services offered by the various involved hosters. On the next higher level, multiple controller instances are responsible for managing the resources of the available sites. In general, each controller might be responsible for an arbitrary number of hosters, and the assignment linking hosters and controllers may be altered dynamically. However, in practice there will most likely be one controller instance for each hoster. Nevertheless, the possibility of managing multiple sites provides means to implement backup mechanisms. If one controller fails for any reason, another controller can take over until the original instance becomes available again or a new one is added. The concept of having multiple controllers allows distributing the computational load introduced by performing load-balancing operations among multiple nodes. In addition, communication delays are mitigated by placing controller instances close to the underlying management services. Therefore, this approach allows the system to scale with the number of sites to be managed. The controller instances are connected to each other through a peer-to-peer network. Therefore, each controller is capable of accessing the services offered by every other peer if necessary. Further, the network allows sharing information. For instance, the most important configuration options, including the maintained game sessions or the involved hosters and their connection details, are shared using the network. Thereby, the network allows resolving conflicts caused by different versions of the configuration; in any case, the shared version is considered to be the valid one. Hence, due to the supported features, the network component is a combination of a registry for remote services and a distributed hash table. On top of the controller components, a special layer realizing an environment for the actual algorithm has been placed.
This environment provides a façade for the algorithm to interact with the rest of the system. For instance, it includes a failure-tolerant and flexible way of communicating with the involved hosters. The fact that those may be reassigned to other controllers during the communication is handled transparently; the location of hosters is thereby resolved using a simple message routing concept. Finally, on top of the algorithm environment the actual algorithm is operating. The system only specifies a simple interface for the actual algorithm implementation. This ensures the flexibility stated as one of the basic requirements. Multiple instances of the algorithm may interact with each other to fulfil their duties. The algorithm environment is therefore providing means for retrieving connections to the various algorithm instances throughout the P2P network. The components described so far are all involved in the actual execution of the session maintaining tasks. As long as there are no changes in the configuration, this subsection of the system can operate autonomously. Nevertheless, an additional component is required for dealing with configuration changes. The load-balancing server carries out this task. It is important to see that this component is not essential for the actual load management and balancing process; therefore, it does not introduce a single point of failure. Its main responsibility is to provide an access point for administrators to manage the system. Further, it realizes the monitoring responsibilities by periodically capturing the system state from the involved controllers. Monitoring data is then stored within a database to support future analysis. The same persistent storage is also used to back up hoster connection details and other important information in case the entire system is restarted. In such a situation the server component would be responsible for reinitializing the configuration within the network. Finally, on the other end of the system, the administration client can be found. This client connects to the management server through a secure RMI connection and allows administrators to examine the current system state and its configuration. Further, tools for modifying setup options are offered by this client, as well as rudimentary monitoring and analysis utilities. The given architecture lays the foundation of a system satisfying all the requirements derived within the previous chapter. However, to provide a truly scalable and flexible solution, all of the involved components and communication protocols must achieve this goal. Within the following sections, a few more details on these elements, their responsibilities and implementations will be covered.

3.2 The Hoster Interface

The entire solution is based on the services offered by hosters. To support different types of implementations on this level, a common interface focusing on the essential aspects has been designed. Besides keeping the interface compact, a major consideration was to keep it stateless. Therefore, neither of the two communication partners using the interface needs to maintain any context information. This property allows reassigning hosters to other controllers more easily since no extra information has to be transferred. It also makes it simpler for other controllers to replace failed instances. In general, using stateless protocols reduces the vulnerability to failing components.


Supported Functionality

The management services provided by hosters need to support two fundamental functions. They must provide information on the current state of the managed resources as well as means to manipulate the load assignments throughout the hoster's domain. To realize support for the first of those, the interface includes a single method requesting a snapshot of the current system state. The data structure used to return the result has already been described within the previous chapter (2.2.1). The corresponding method allows specifying a set of required load metrics to be included within the resulting state information. Beside basic metrics like the memory consumption of every game server within a hoster's domain, special metrics requesting predicted values may also be included. This way, the support for predicted load values offered by the services on the management layer can be utilized efficiently. The means to manipulate the load distribution are divided into three separate methods. One of those instructs the hoster site to start a new game process managing a replica of a certain zone within a given session, while another orders the elimination of an existing server instance. The third method directs the management layer to redistribute load between multiple zone replicas managing the same zone. A parameter allows specifying the redistribution ratios for the involved processes. This could be used to shift CPU load from one overloaded replica to another, less loaded process. Therefore, this operation is affecting the game loop saturation of games. As has been pointed out within section 2.1.2, moving load between processes corresponds to reassigning clients. However, since the relation between the number of clients managed by a game server and its game loop saturation is heavily influenced by application details, the actual changes in the client assignment are left unspecified.
Basic implementations may simply distribute the clients within the zone among the involved replicas according to the given ratio. However, more advanced solutions may apply their own analytical models for predicting the CPU load based on the client assignments before redistributing players according to the given load ratio.

Handling Asynchronous Operations

Since all of the operations potentially require an extensive amount of time to be executed, load state manipulating tasks are performed asynchronously. Whenever one of the given operations is invoked, preconditions are checked and, in case all of them are satisfied, an ID representing the issued operation is returned. An additional method allows checking for the execution state of the issued command as well as its return value. Figure 3.2 shows the phases an issued command is


passing through. Before an operation identified by its ID has been issued, its state remains unknown. As soon as it is initiated, the state changes to the running phase. The operation may then terminate either successfully or due to a failure. In the first case, the task may have produced some result value. The termination state and the result value can be retrieved based on the operation ID. However, since hoster services cannot maintain operation state information indefinitely, this data may get purged or simply lost after some time. In this case the state associated with the operation ID will be unknown again.

[Figure 3.2: Phases of issued Balancing Operations — unknown, running, successful, failed]

Relaxed Operation Semantics

All of these load manipulation operations have to respect the administrative authority of the underlying management layer. Therefore, all the instructions sent to the hosters are considered as simple suggestions; they may or may not be carried out. In addition, some of the parameters may allow specifying wildcard values, which can be freely substituted by the management services. For instance, when starting up a new server, the target node might remain unspecified. In this case, the management services can determine where to place the new instance. Further, interpreting instructions as simple suggestions also solves synchronization issues. No matter how fast the balancing algorithm derives its list of load manipulations, the decisions will be based on outdated data. Hence, instructions might be included which are no longer executable. Considering that any instruction may be ignored anyway, this does not impose a new problem.
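Under the description above, the hoster interface could look roughly like the following sketch. All names and signatures are hypothetical; the point is the stateless, suggestion-based style: every manipulating call immediately returns an operation ID that is polled later, and unspecified parameters (such as a null target node) act as wildcards:

```java
// Hypothetical sketch of the stateless hoster interface. Manipulating calls
// are asynchronous suggestions: they return an operation ID for later polling
// and may be ignored by the hoster's management layer.
import java.util.Set;

interface HosterService {
    /** Snapshot of the current state, annotated with the requested metrics. */
    Object getStateSnapshot(Set<String> requiredMetrics);

    /** Suggest starting a new replica; a null target node acts as a wildcard. */
    String startReplica(String session, String zone, String targetNode);

    /** Suggest removing an existing server instance. */
    String stopReplica(String replicaId);

    /** Suggest redistributing load between replicas of the same zone. */
    String redistribute(String session, String zone, double[] ratios);

    /** Poll the state of a previously issued operation (figure 3.2). */
    OperationState getOperationState(String operationId);

    enum OperationState { UNKNOWN, RUNNING, SUCCESSFUL, FAILED }
}
```

Because no call depends on earlier context, any controller can invoke the interface at any time, which is what makes hoster reassignment and controller replacement cheap.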

3.3 The Load Balancing Controller

The load-balancing controller is the central component within the system. It realizes the bridge between the hoster services, the algorithm, the management server and other controller instances. In addition, it provides the basic platform for distributing the session management activities throughout the grid. Due to this central role, plenty of requirements have to be satisfied by this component. For instance, it needs to deal with the peer network, handle the set of assigned hoster sites, enforce and relay configuration options and support the management server in realizing its monitoring functions. Since this list might even grow when future concepts get integrated, the controller's architecture follows a plug-in based design. Figure 3.3 illustrates the basic structure of the controller, including some of the involved plug-ins.

[Figure 3.3: Internal Controller Architecture and Related Components — the controller core with plug-ins such as the management interface, hoster manager, timer service, worker pool and algorithm environment, connected to the management server, other controllers, the management services and the peer network]

The controller core consists of the minimum set of functionality required such that instances of it can be used within the peer network. Basically, controller peers are addressed by a unique ID within the network. The core maintains the local ID as well as an active network connection, and it offers means to establish links to other peers. Beside the basic network management obligations, the core provides the foundation for the plug-in system by maintaining the set of active extensions. Each of the plug-ins contributes some type of service that might be used by other plug-ins or even by remote peers. For instance, one plug-in is responsible for managing the set of assigned hosters and associated information. This service is only offered to other plug-ins within the local controller instance. However, another plug-in realizes the interface offered to the management server to support configuration changes. Thereby, the latter module depends on the service of the hoster manager plug-in for realizing its tasks. Therefore, dependencies between the various plug-ins exist, which need to be respected during the start-up and shutdown sequences. To do so, each plug-in is allowed to specify a static set of plug-in types it depends on.
The controller core resolves those dependencies before creating and initializing a new plug-in instance. Therefore, unless there are cyclic dependencies, plug-ins can assume throughout their lifetime that all modules they depend on are available. Cyclic dependencies are detected by the controller core during the instantiation phase, and the involved plug-ins will not be loaded; it is the programmer's job to avoid them. Another design goal that was accomplished by introducing the controller concept is the possibility of keeping the load balancing process close to the managed resources. In addition, the controller component has been designed to have no external dependencies like, for instance, a database connection. All the necessary information required for running an instance can be retrieved from the network. This way, the ideal case of running the controller within the same JVM as the management services offered by a hoster is encouraged. Thereby, the overhead of using RMI or another technology for transferring data and instructions through the network would be significantly reduced. Of course, this requires that the management services are also realized using Java. To allow the system to create controller instances in topological proximity to the involved sites, an additional, yet optional, method has been added to the hoster interface. This method requests the remote resource management services to start up a new controller instance, which will automatically join the peer network to take over responsibilities.
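The dependency handling described above amounts to a topological ordering with cycle detection. A minimal sketch (not the actual controller code; names are illustrative) could look like this:

```java
// Sketch of plug-in start-up ordering: plug-ins declare static dependencies,
// the core resolves them via depth-first traversal and refuses to load a
// configuration containing a cycle.
import java.util.*;

public class PluginResolver {
    /** Returns a valid start-up order, or null if a cycle is detected. */
    public static List<String> resolve(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>(), visiting = new HashSet<>();
        for (String plugin : deps.keySet()) {
            if (!visit(plugin, deps, done, visiting, order)) return null;
        }
        return order;
    }

    private static boolean visit(String p, Map<String, List<String>> deps,
                                 Set<String> done, Set<String> visiting,
                                 List<String> order) {
        if (done.contains(p)) return true;
        if (!visiting.add(p)) return false; // already on the path: cycle
        for (String d : deps.getOrDefault(p, List.of())) {
            if (!visit(d, deps, done, visiting, order)) return false;
        }
        visiting.remove(p);
        done.add(p);
        order.add(p); // dependencies are listed before their dependents
        return true;
    }
}
```

Shutting down can simply traverse the same order in reverse, so every plug-in outlives the modules that depend on it.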

3.4 The Algorithm Environment

The algorithm environment is responsible for handling and supporting the actual algorithm instances. It therefore provides a balancing environment façade realizing an abstract view on the entire system. This view is intended to be used by algorithm implementations to accomplish their tasks. The most important feature offered by the algorithm environment is the possibility of accessing the services offered by any hoster throughout the system. This access is not limited to the hosters assigned to the local controller instance. If the algorithm needs to interact with a remotely managed site, the request is relayed via the responsible controller instance. To identify this controller, a small routing table mapping all available hosters to their associated controllers is maintained as shared information within the network. In case the responsibility to manage a certain site is moved from one controller to another, perhaps due to a failed controller instance, the routing table is updated accordingly. Therefore, messages can be transferred from any algorithm instance to any hoster site at any time.


[Figure 3.4: Message Routing Concept within the Algorithm Environment]

Transparent Message Routing

Consider the following example, illustrated within figure 3.4. The algorithm has determined that it is necessary to start a new game server instance on some node. It therefore forwards a corresponding command to its local algorithm environment (step 1). The environment tests whether the request is directed to one of the locally maintained hosters. If so, the message is directly delivered. Otherwise, it looks up the responsible controller within the routing table and forwards the command to the algorithm environment maintained at the resulting site (2), which passes on the message to its locally maintained hoster interface implementation (3). The receiver then returns the ID of the issued start-up request (4), which is transferred back to the initial caller (5+6). After some time the algorithm may want to know whether the operation has been completed. It therefore issues a request on the current operation state based on the received ID. The same routing procedure is used to retrieve the requested information. If in the meantime the hoster has been reassigned to another controller, the message will still be delivered correctly. This intermediate routing concept makes the distributed nature of the underlying system transparent for the algorithm. There is no difference in accessing a local or remote site, except for the potential delay introduced by the remote communication. Further, the algorithm does not need to know anything about the controller that is responsible for managing the hoster it is interacting with, nor about any ongoing reassignments moving the responsibility of maintaining sites between peers. This transparency would in particular support the implementation of a centralized load-balancing algorithm, which manages all involved resources based on complete knowledge of the system state.
However, such an approach would not scale very well. An alternative is to perform the load-balancing activities in a distributed way, where each participating instance is managing the load issues of a subset of the involved hosters. Naturally, to reduce the network communication overhead, the algorithm instance running on one controller should focus on the locally maintained hosters. To support these types of algorithms, the environment façade also provides a set of tokens identifying locally maintained hosters.
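The routing decision itself is a simple lookup: deliver locally if possible, otherwise consult the shared routing table for the responsible controller. A hypothetical sketch, with illustrative names:

```java
// Sketch of the routing step: local delivery if the hoster is maintained by
// this controller, otherwise the message is relayed to the controller found
// in the shared routing table (hoster -> controller).
import java.util.Map;
import java.util.Set;

public class MessageRouter {
    private final String localControllerId;
    private final Set<String> localHosters;          // locally maintained sites
    private final Map<String, String> routingTable;  // shared within the network

    public MessageRouter(String localControllerId, Set<String> localHosters,
                         Map<String, String> routingTable) {
        this.localControllerId = localControllerId;
        this.localHosters = localHosters;
        this.routingTable = routingTable;
    }

    /** Returns the controller that should receive a message for the hoster. */
    public String route(String hoster) {
        if (localHosters.contains(hoster)) return localControllerId;
        String controller = routingTable.get(hoster);
        if (controller == null)
            throw new IllegalStateException("unknown hoster: " + hoster);
        return controller; // the request is relayed to this peer
    }
}
```

Because the routing table is shared state, a reassignment of a hoster only requires updating one entry, and subsequent messages are transparently delivered to the new controller.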

Additional Features

Besides the routing layer, the balancing environment offers additional features making the implementation of balancing algorithms easier. One of those allows attaching algorithm-specific information to a hoster. The given information is maintained by the algorithm environment running on the controller responsible for the given site. Therefore, this information can be used to communicate state information between multiple algorithm instances. For more advanced communication requirements, the algorithm environment supports the establishment of direct communication links between algorithm instances. If an algorithm wants to communicate with another instance responsible for a given hoster, the environment establishes the required connection. A final feature offered by the algorithm environment is the support of workflow-based balancing operations. Several changes triggered by load balancing algorithms consist of more than one primitive operation supported by the hoster interface. For instance, moving a game server instance from an overloaded node to another consists of three steps. First, a new server has to be started on the target node. In a second step, all the clients need to be moved from the source process to the newly created server instance, and finally the original process can be removed. All of these steps require some unspecified amount of time and need to be executed strictly in order. Further, if for instance the first activity fails, the rest of the operations should not be executed; especially, the original server should not be removed as long as there is no proper substitute. By supporting balancing operations organized as workflows, the algorithm implementations are shielded from the details of coordinating primitive operations. The description of balancing operation workflows is based on the composite pattern.
Primitive operations offered by the hoster interface can be combined to form more powerful constructs using parallel and sequential connectives. Additional operations may be integrated by implementing a generic activity interface. Figure 3.5 illustrates an example workflow used to move a zone duty from one site to another.


[Figure 3.5: Example of a composed Balancing Operation based on a Workflow — prepare target site; create new game server instances on target site; reassign zone duty; stop game server processes on source site; inform both sites about completion]
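The composite pattern mentioned above can be sketched as follows. The names are illustrative, and the parallel connective is executed sequentially for brevity; the key property is that a sequence aborts on the first failing step, so a source server is never removed before its substitute exists:

```java
// Simplified sketch of composite workflow activities: primitives are
// combined with sequential and parallel connectives.
import java.util.Arrays;
import java.util.List;

interface Activity {
    boolean execute(); // true on success
}

class Sequence implements Activity {
    private final List<Activity> steps;
    Sequence(Activity... steps) { this.steps = Arrays.asList(steps); }

    public boolean execute() {
        for (Activity step : steps) {
            if (!step.execute()) return false; // abort on first failure
        }
        return true;
    }
}

class Parallel implements Activity {
    private final List<Activity> branches;
    Parallel(Activity... branches) { this.branches = Arrays.asList(branches); }

    public boolean execute() {
        // executed sequentially here for brevity; succeeds iff all branches do
        boolean ok = true;
        for (Activity branch : branches) ok &= branch.execute();
        return ok;
    }
}
```

The move operation from figure 3.5 would then be expressed as a Sequence of the primitive prepare, start, reassign and stop activities.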

3.5 The Algorithm

All the services of the algorithm environment are provided to serve the algorithm placed on top of it. Due to the requirements specified within the previous chapter, almost total freedom is granted to the actual implementation. Nevertheless, a small interface has to be implemented by any algorithm to be manageable by the system. The algorithm interface only includes two major blocks of features. The first is the possibility to start and stop the algorithm. Therefore, when starting up the system or just a new controller, a new algorithm instance will be created and started. If the controller is shutting down or, more importantly, if an administrator wants to exchange the algorithm implementation, the running instances need to be stopped. Both operations must be supported by any algorithm implementation. The second type of feature required is the support for some kind of configuration data bean. This configuration should contain all the variable parameters the implementation exposes. To handle those, the algorithm interface includes methods for retrieving and updating the currently active configuration. Further details on the specific algorithm realized for this thesis will be discussed within the following chapter.
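The two feature blocks of the algorithm interface — life-cycle control and a configuration bean — can be sketched like this (hypothetical names; the configuration is reduced to a single threshold value for brevity):

```java
// Illustrative sketch of the narrow algorithm interface: life-cycle control
// plus a configuration bean that can be read and replaced at runtime.
interface BalancingAlgorithm<C> {
    void start();                    // invoked when the controller comes up
    void stop();                     // invoked on shutdown or algorithm exchange
    C getConfiguration();            // current parameter bean (e.g. thresholds)
    void setConfiguration(C config); // apply updated parameters at runtime
}

// Minimal example implementation whose "bean" is a single overload threshold.
class ThresholdAlgorithm implements BalancingAlgorithm<Double> {
    private Double threshold = 0.8;
    private boolean running;

    public void start() { running = true; }
    public void stop() { running = false; }
    public Double getConfiguration() { return threshold; }
    public void setConfiguration(Double config) { threshold = config; }
    boolean isRunning() { return running; }
}
```

Keeping the configuration as a bean lets the administration client display and edit algorithm parameters generically, without knowing the concrete implementation.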

3.6 The Communication Network

The most characteristic component of the overall architecture is the peer-to-peer network connecting all the contributing controller instances. The main responsibility of this component is to provide a distributed registry service enabling nodes to establish connections to all the participating peers. Further, it should provide an efficient way of sharing information. Several challenges have to be faced when realizing a P2P network solution. As a result, many


different approaches solving those issues have been proposed. Since the actual implementation of the peer network has a special impact on the usability and scalability of the overall solution, this component has been strictly isolated from the rest of the system to make it exchangeable. Therefore, the network implementation can easily be customized to fit the requirements of end users. To support the modular structure, a compact interface summarizing all the operations to be supported by a peer network implementation has been defined. It includes two basic blocks of operations. The first supports the registration of services, hence controller instances, within the network using some kind of ID. Besides the binding, lookup and maintenance operations are also included. In general, this part of the interface is quite similar to the concept of Java's RMI registry specification. The second part consists of a set of operations to share data. Like within a map, each data element is thereby addressed by a key. New information can be put into the network, present information can be read and old information can be removed. This part therefore corresponds to a distributed hash table. However, one additional feature requested from any network implementation is the support for atomic updates to be applied on data elements within the network. Those updates are submitted using the command pattern and have to be performed atomically by the network implementation; hence, an update is either applied completely or not at all. Further, all observers have to be exposed to the same sequence of changes. Updates thus represent a kind of mini-transaction on the data shared within the network. By leaving details of the actual realization unspecified, synchronization issues can be solved by the implementing components using native means. Two additional methods within the interface are dealing with management issues.
The first of those should offer a factory implementation which, as it can be passed on to another node, is capable of establishing new network connections for joining peers. It is used to connect controller instances started by remote hoster services to the common network. The second method requests existing network connections to be separated. It is invoked during the shutdown of a controller to leave the network.

A Simple Communication Network Implementation

For the purpose of keeping the configuration overhead and the number of external dependencies of the resulting solution low, basic implementations of the network interfaces have been included within the package. One simple implementation maintains a central repository containing all registered peers and shared information. Although simple to implement, it does not satisfy the requirement of a scalable and failure-tolerant solution since it constitutes a single point of failure. Therefore, another implementation has been realized representing a truly distributed solution. In this second approach, all participating nodes hold a full copy of the entire network state. This state includes all the registered peers as well as all the shared information. Each copy of the state has an assigned version number, and one of the nodes is elected to maintain the master copy. Updates on the shared state are realized using a command pattern and are synchronized based on the version numbers.

Figure 3.6 illustrates the protocol used to perform an update within this peer network. Whenever a node decides to change a property within the shared state, a corresponding update command is sent to the node maintaining the master copy (step 1). If this node could successfully update its local version as well as the state of some backup node (2+3), the previous and the new version number of the master copy are returned to the initiating peer (4). Based on the retrieved information, the original peer creates a new update including the version number of the state it should be applied on, which is then broadcast to all other peers within the network (5). Every peer checks whether its local copy version matches the target version number of the update. If this is the case, the update is applied. Otherwise, there has been a synchronization error; in this case, if the local copy is too old, it is replaced by a copy of the state currently maintained by the master node (6). This update protocol ensures a consistently updated state throughout the network.

[Figure 3.6: Distributed Registry Update Procedure]

The given solution realizes lookup operations very efficiently since the full state is present at each peer. Therefore, no network communication is involved when reading information. However, this comes at the cost of more expensive update operations. Nevertheless, within the session management solution, configuration changes triggering shared state updates occur rather seldom compared to the much more frequent lookup operations. However, due to the lack of efficient RMI-broadcast support for the last step of the update protocol, there is an upper limit on the number of supported peers. In case the given implementation becomes a bottleneck, it can be exchanged for an implementation interfacing with more scalable P2P network solutions like OpenChord [17, 18] or JXTA [19]. In those cases, frequently used values may be cached locally to support efficient lookup operations. The peer network interface has been designed in a way such that adapters to the given projects should be simple to realize.
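The version-based synchronization of the distributed registry can be sketched as follows. This is a simplified, illustrative model of the protocol, not the actual implementation: each broadcast update names the version it must be applied on, and an out-of-sync peer recovers by copying the master's state (step 6):

```java
// Sketch of version-checked update application on a peer's local state copy.
import java.util.HashMap;
import java.util.Map;

public class ReplicatedState {
    private Map<String, Object> state = new HashMap<>();
    private long version = 0;

    /** Apply a broadcast update; returns false on version mismatch. */
    public boolean apply(long targetVersion, String key, Object value) {
        if (version != targetVersion) return false; // out of sync
        state.put(key, value);
        version = targetVersion + 1;
        return true;
    }

    /** Recovery step (6): replace an outdated copy with the master's state. */
    public void resyncFrom(ReplicatedState master) {
        this.state = new HashMap<>(master.state);
        this.version = master.version;
    }

    public long getVersion() { return version; }
    public Object get(String key) { return state.get(key); }
}
```

A peer that missed a broadcast simply fails the version check on the next update and resynchronizes from the master, which restores a consistent view without any per-update acknowledgement traffic.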

Mngt. Server

3.7 The Load Balancing Server


As has been described within the overview section, this component represents the bridge between administrators and the rest of the system. Therefore, its main responsibilities are maintaining the system configuration, enforcing setup changes and realizing the monitoring features. It does not participate in the actual load balancing process.


Figure 3.7: Server Architecture and Connected Components

Figure 3.7 provides an overview of the internal structure of the server implementation. To realize its tasks, it interfaces with three other components. For one, it offers services to the administration client. The exposed operations allow manipulating the internally managed configuration. The responsibility of maintaining the involved data entities is distributed among multiple managers, each responsible for a different type of data, as shown in the figure above. On the other side, the server is connected to the peer network. This connection is used to maintain configuration data within the network, realized as shared information, and to retrieve monitoring data from the various controllers. Each controller has a specific plug-in supporting the interaction with management


server instances. Finally, the server maintains a connection to a database for recording persistent information. All the available configuration values are stored there. In addition, snapshots of the entire environment state are collected periodically from the active controllers. The results are merged and archived within the database. The interaction with the underlying storage system is realized using an information service package provided by the edutain@grid project [20]. Data entities handled by the system can be stored within a MySQL database using the given framework. Further, complex queries can be formulated using pure Java constructs, which are then mapped to SQL queries such that the actual query evaluation is performed using the optimized support of the database system. Therefore, this package can also be used to perform some data analysis directly within the database.

3.8 The Administration Client

The final component contributing to the overall session management solution is the administration client. It provides an interface for administrators to interact with the system, and it is the only component not intended to run permanently. For supporting its features, the client establishes a secured, SSL-based connection to a server instance. The end user needs to authenticate using a username and a password. Based on the privileges assigned to the verified user, only a subset of the available operations may be permitted. For instance, some users might not have the authorization to start or stop game sessions, although they are allowed to monitor the system state and to alter the balancing algorithm configuration.

Like other components, the actual implementation of the client is exchangeable. Any application accessing the server interface can be considered a client implementation. The one included within this project is a Swing-based Java application supporting the most essential administrative requirements and monitoring capabilities.


Chapter 4

The Load Management

As has been described within the architecture chapter, the central element for managing game sessions within the provided software solution is the load management and balancing algorithm. Due to the requirements stated within chapter 2, the actual implementation of this component remains exchangeable. Therefore, different approaches reflecting the preferences of the end users can be realized based on the foundations offered by the algorithm environment. Nevertheless, besides the development and implementation of the overall architecture, the design of an algorithm capable of efficiently managing game sessions and resources has been an essential part of this thesis. Within this chapter, the realized concept will be described in detail. However, unlike the name of this part of the overall architecture suggests, the presented approach consists of a combination of various algorithms and cooperating components aiming at realizing the load management.

The description starts by summarizing the major obligations to be met by a load management and balancing algorithm. In addition, the main objectives the presented approach focuses on are covered within the first section of this chapter. It is followed by a discussion of the bin-packing problem and an extended variant formalized for this thesis. Heuristic algorithms derived for the latter problem provide an essential foundation for the realized solution. After a short overview of the general organization of the balancing algorithm within section 4.3, detailed descriptions of the involved algorithms and protocols are provided within sections 4.4 to 4.6. Finally, the chapter concludes with an overview of the set of configuration options offered by the algorithm's implementation.

4.1 The Load Management Problem

Within this section, the problem to be solved by the devised algorithm will be covered in detail. Figure 4.1 summarizes the basic obligations of the load management and balancing algorithms within the developed system.


Figure 4.1: The Basic Balancing Problem

On the top level, administrators determine the game sessions to be maintained by the system. Based on the corresponding game maps, the sets of zones associated with the various active sessions are derived (see section 2.2.1). Each of those zones has to be managed by at least one replica. Therefore, at least one server instance responsible for maintaining the corresponding region of the virtual world has to be assigned to one of the available hosts offered by the infrastructure.

Since the number of entities present within the various zones is constantly changing due to clients wandering around in the simulated virtual world, the resource requirements of the involved server instances vary over time. It is the job of the algorithm to continuously observe the state of the system and to ensure that every game server has sufficient resources at its disposal. Thereby, as shown within figure 4.1, this obligation is separated into two parts: the replication management and the resource management.

1. The Replication Management — The obligation of the replication management is to decide how many replicas per zone are required to handle the current load caused by maintaining the corresponding game map region. Further, it is responsible for balancing the load within replicas of the same zone such that the game loop saturation is evenly balanced (see section 2.1.2). Since replicas may be maintained by game server instances executed on nodes equipped with processors of different speeds, the load may need to be distributed unevenly to achieve approximately the same saturation within all involved processes. The realization of the component implementing this responsibility within the devised algorithm is covered within section 4.4.3.

2.
The Resource Management — The job of the resource management is to allocate resources for all replicas, i.e. the server instances determined by the replication management. Hence, it computes an assignment of replicas to hosts, thereby satisfying the resource requirements of every involved process. This task has to be performed online. Hence, the implementing


components have to constantly evaluate and manipulate the current load assignment. Thereby, the number of modifications should be kept as small as possible to avoid the overhead caused by migrating server instances between hosts. Still, the number of involved nodes might have to be continuously minimized to reduce the costs of maintaining game sessions. Details on the algorithm applied to satisfy this obligation are covered within section 4.4.

Operations Supported by Hosters

To manipulate the load distribution of the managed game sessions, the services provided by the involved hosters offer three primitive balancing operations (see section 3.2). Two of those provide all the means required for manipulating the distribution of replicas among the available hosts: new replicas for certain zones can be added (addReplica) and existing server instances can be eliminated (removeReplica). Furthermore, the third operation (redistributeLoad) allows moving load between replicas of the same zone by defining the ratio at which the workload should be distributed among those. For instance, let r1, r2 and r3 be replicas of the same zone. Performing the load-redistributing operation using the mapping {(r1 ↦ 5), (r2 ↦ 3), (r3 ↦ 4)} as a parameter will cause the total load assigned to those three replicas to be distributed according to the ratio 5 : 3 : 4. However, it is not specified whether this redistribution is realized by disseminating clients, entities or any other load-causing element. Hence, to be used to control the game-loop saturation of server instances, a flexible mechanism tolerating different interpretations has to be applied. The concept derived to accomplish this task within the approach described by this chapter is covered within section 4.4.3. Those three high-level constructs are the only means provided to manipulate the game load distribution. Therefore, in particular, it is not possible to reassign individual clients.
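As a hedged illustration, the ratio semantics of the redistributeLoad operation can be expressed as a simple normalization. The function name and the use of plain numeric load units are assumptions made for this sketch, not part of the hoster interface.

```python
# Hypothetical interpretation of redistributeLoad: normalize the given
# ratio mapping into absolute load shares per replica.

def redistribute_load(ratios, total_load):
    """Split total_load among replicas according to the given ratios."""
    weight_sum = sum(ratios.values())
    return {replica: total_load * w / weight_sum
            for replica, w in ratios.items()}

# Ratio 5 : 3 : 4 over a total of 120 load units.
shares = redistribute_load({"r1": 5, "r2": 3, "r3": 4}, total_load=120)
assert shares == {"r1": 50.0, "r2": 30.0, "r3": 40.0}
```

How the computed shares are enforced (by moving clients, entities or other load-causing elements) remains up to the hoster, as noted above.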
Objectives

The load management concept covered within this chapter aims at dynamically managing game load while causing as few situations as possible in which insufficient resources may lead to a degradation of the game play experience. Hence, the quality of the service offered to the end user should not be negatively affected by using the devised dynamic approach compared to a static concept. Further, the covered proposal aims at providing a scalable solution capable of handling large environments. As will be described within section 4.3, this goal is accomplished by applying a hierarchical schema distributing the responsibilities


of managing and balancing the workload of games among multiple components using varying levels of granularity. Further, the covered concept has been developed to be flexible enough to support the different preferences on the load distribution which might be stated by the involved hosters.

One particularly important objective to be supported by dynamic load management concepts is the continuous adjustment of the number of hosts involved in maintaining active game sessions to fit the actual workload. Ideally, at every moment of the game session, only as many resources are allocated for server instances as are required by the current workload. By achieving this goal, the resource utilization would be maximized and the costs for maintaining game sessions would be reduced to a minimum without reducing the quality of service. However, to accomplish this objective, both the replication and the resource management have to make their contributions. While the replication management has to determine the minimum number of replicas required for maintaining the various zones, the resource management has to derive and maintain mappings between server instances and hosts using as few nodes as possible. The concept presented within this chapter is capable of supporting this goal, as will be demonstrated within the experiment chapter.

The Hierarchical Extension


To provide a scalable solution, the devised algorithm uses a hierarchical load management schema. Thereby, an intermediate layer between the zones and replicas is introduced, as illustrated within figure 4.2.

Resources

sessions

zones Game Map

duties

Zone to Hoster Mapping

Global Level

replicas

Replication Management

hosts

Resource Management

Local Level

Figure 4.2: The Extended Hierarchical Balancing Problem

The additional abstract concept of zone duties (see section 2.2.1) allows dividing the overall load management task into two layers. The upper, global layer assigns the responsibility of maintaining zones to the available hosters, thereby defining zone duties. The lower, local layer, on the other hand, realizes the duties assigned to a specific hoster by mapping the necessary number of replicas to the available hosts. Hence, the local balancer includes both the replication and the resource management. A detailed description of the algorithms applied on the local level is given within section 4.4. Further, the operations performed on the global level are described within section 4.5.

The Problem of Mapping Load Items to Resources - Bin Packing

By investigating the responsibilities assigned to the global and local layers, a common pattern can be observed: both have to map a set of elements to available resources. On the global level, zones have to be mapped to hosters; on the local level, replicas to hosts. The resulting problem of packing items of a certain size into containers exposing limited capacities is related to the well-known NP-hard bin-packing problem [21]. By identifying zone duties as items and hosters as bins, the zone-to-hoster mapping could be derived using heuristics developed for the bin-packing problem. The same techniques can be applied to assign replicas to hosts. However, in both cases, additional constraints not present within the basic bin-packing problem have to be considered. For instance, not every host or hoster offers the same amount of resources, hence the same capacity, as presumed by the basic bin-packing problem definition. Further, in general, not every replica can be assigned to every host, since the required deployments might not be present on all nodes. Nevertheless, as will be shown within the following section, the basic bin-packing problem can be extended by those additional constraints. Therefore, heuristics providing solutions for the resulting extended variant can be applied at various occasions within the load management and balancing concept described within this chapter to derive load item assignments.

4.2 The Bin Packing Problem

As pointed out within the previous section, several issues faced by the load management concept boil down to the abstract problem of packing items into bins. For instance, the responsibilities for managing zones have to be assigned to hosters, and replicas have to be distributed among hosts. Both situations can be described using a common abstract representation based on load items and bins. Hence, by providing an efficient means for determining item-to-bin assignments based on an abstract problem description, the resulting utility can be used to derive solutions for the concrete scenarios.

Within this section, such a generic algorithm capable of computing the required assignments will be described. Therefore, in a first step, the actual problem will be formalized. A well-known, closely related mathematical problem is the NP-hard bin-packing problem [21]. It lays the foundation for the model


used to describe the essential packing problem faced by the load management solution. Therefore, it will be described in detail within the following subsection, together with some proposed heuristic algorithms providing solutions for this type of problem. Within the subsequent subsection, the standard problem description will be extended by additional constraints to fit the resource allocation problems emerging when performing game load management. Finally, this section concludes with a generic heuristic capable of computing solutions for problem instances of the extended variant. This generic problem solver forms the foundation for handling resource allocation issues throughout the load management concept developed for this thesis.

4.2.1 The Basic Bin Packing Problem

The bin-packing problem describes the task of packing a list of items into a set of bins such that the number of used bins is minimized. Each item has an associated size and the available bins have a limited capacity. Further, while the volume of the items may vary, the size of the bins is fixed to a constant value. The aggregated volume occupied by the items assigned to a single bin must not exceed its capacity.

Formally, the problem can be described as follows. Let I be the set of items and B be the set of bins. Further, let V be the set of values used to specify volumes and capacities. Let c ∈ V be the constant capacity of the given bins. Finally, let the function vol : I → V define the volumes of all the items within I. An item assignment can be modelled as a function f : I → B mapping each item i ∈ I to the bin b ∈ B it is assigned to. Let f′ : B → 2^I be the total function having i ∈ f′(b) if and only if f(i) = b. Hence, f′(b) ⊆ I corresponds to the subset of items assigned to bin b ∈ B according to the mapping f. 2^I thereby denotes the power set of I, hence the set of all subsets of I. To solve the bin-packing problem, a valid mapping f using a minimum number of bins has to be found. Hence, |f(I)| has to be minimal among all possible assignment functions satisfying the following constraint:

∀b ∈ B : Σ_{i ∈ f′(b)} vol(i) ≤ c    (4.1)

This simple constraint ensures that the capacity of none of the used bins is exceeded. The problem of finding such a minimal mapping is NP-hard; a proof can be found within [21]. In addition, for some cases such an assignment might not exist at all. Clearly, if the total sum of the item volumes exceeds the total capacity of the available bins, no valid solution can be found. The


same difficulties arise whenever the volume of a single item exceeds the bin capacity.

Heuristics

Since finding an optimal solution is not feasible for real-life problem sizes, several heuristic approaches have been developed. The most prominent of those are defined by the way a bin is selected for the item to be packed. Well-known examples are the next-fit, first-fit and best-fit heuristics. The next-fit heuristic adds a new item to the same bin the previous one has been packed into, if possible; otherwise, a new bin is used. First-fit, to the contrary, picks the first bin within the list of available bins offering enough space for the new item. Finally, the best-fit heuristic searches for the bin having the least amount of free capacity that is still large enough to accept the new item.

In most cases, the quality of the results derived by the heuristics can be improved by sorting the list of items according to their volume in decreasing order before inserting them. One of the most prominent and efficient heuristics is the first-fit decreasing (FFD) strategy. It applies the first-fit approach on a list of items sorted according to their volume in decreasing order. Let OPT(P) and FFD(P) be the number of bins required for the problem instance P within an optimal solution and within an assignment derived using the FFD heuristic, respectively. Within [22] it has been proven that 11/9 · OPT(P) + 6/9 is a tight upper bound for FFD(P). Therefore, assignments obtained using the FFD heuristic never require more than approximately 22% additional bins compared to an optimal solution.

The Load Metric

So far, the properties of the elements of V used to describe volumes and capacities have not been specified. Within the literature, this set is usually substituted by the positive integers or real numbers. The common limitation to positive numbers corresponds to the fact that within most real-world applications negative values do not appear.
However, the limitation to those domains is too restrictive and can be relaxed. Nevertheless, this relaxation step does not affect the fundamental complexity of the bin-packing problem. From the constraint given within equation 4.1, several requirements on the set V can be derived. For one, summing up elements has to be supported. This also includes the necessity of a zero element representing the result of adding up the elements of an empty set. The equation also demands a total order on the elements, which allows checking whether the aggregated volume of assigned items is still smaller than the available capacity. The total order on


the elements of V is also required for sorting items, as needed by some of the heuristics. Since within the balancing algorithm the elements of V are used to represent the amount of available resources offered by entities, an additional scaling operation has to be supported. This allows computing the usable capacity if, for instance, only 90% of the total resources may be allocated. In addition, an inverse scaling operator is needed, determining to which degree a load value is consuming a given capacity.

A particular type of mathematical structure supporting most of the required operations are vector spaces. By extending those with a total order on their elements, vector spaces can be used within the bin-packing problem and its heuristics to model available and required resources. This observation allows the algorithm within the balancing system to use multidimensional resource consumption metrics to represent load values.
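As a hedged illustration of these requirements, the following sketch models a two-dimensional load value (CPU and memory) with addition, a zero element, scaling, an inverse scaling operator and a total order. The class name and the max-component ordering are choices made for this example, not the metric used by the thesis implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Load:
    cpu: float
    mem: float

    def __add__(self, other):
        return Load(self.cpu + other.cpu, self.mem + other.mem)

    def scale(self, factor):
        """E.g. the usable capacity when only 90% may be allocated."""
        return Load(self.cpu * factor, self.mem * factor)

    def usage_of(self, capacity):
        """Inverse scaling: to which degree this load consumes capacity."""
        return max(self.cpu / capacity.cpu, self.mem / capacity.mem)

    def __le__(self, other):
        # One possible total-order choice: compare maximum components.
        return max(self.cpu, self.mem) <= max(other.cpu, other.mem)

ZERO = Load(0.0, 0.0)   # zero element: sum over an empty item set

capacity = Load(cpu=100.0, mem=100.0).scale(0.9)   # 90% usable
used = Load(30.0, 50.0) + Load(20.0, 10.0)
assert used == Load(50.0, 60.0)
assert used.usage_of(capacity) == 60.0 / 90.0
assert ZERO <= used
```

With such a type in place, the bin-packing constraint 4.1 can be evaluated on multidimensional load values just as on plain numbers.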

4.2.2 Extending the Bin-Packing Problem

The bin-packing problem represents a close approximation of the resource allocation problem to be handled by the balancing algorithm. However, two modifications extending the original problem had to be applied. For one, within the resource allocation, the premise of having only same-sized bins does not hold. Since there is no way of reformulating the problem to compensate for this lack of flexibility, the constant bin capacity c needs to be replaced by a function cap : B → V determining the capacity of the available bins. As a result, the constraint formalized within equation 4.1 has to be updated to the following form:

∀b ∈ B : Σ_{i ∈ f′(b)} vol(i) ≤ cap(b)    (4.2)

The second modification puts constraints on the possible item assignments. While within the original bin-packing problem any item might be assigned to any bin, load items within the balancing system might not be supported by all the nodes. For instance, game servers can only be started on nodes having a proper deployment. To support this additional constraint, a further function sup : B → 2^I defines the subsets of items supported by the various bins. Therefore, sup(b) ⊆ I is the set of all items which can be assigned to the bin b ∈ B. Further, valid assignments need to satisfy the additional constraint given by equation 4.3.

∀i ∈ I : f(i) = b ⇒ i ∈ sup(b)    (4.3)

The resulting problem definition matches the resource allocation tasks to be accomplished within the load management and balancing algorithms. The modified bin-packing problem is still NP-hard. This can easily be shown by observing that the original definition represents a special case of the extended version. Therefore, an algorithm solving the extended bin-packing problem is also capable of solving all classic bin-packing problems; hence, the extended problem must be at least as hard to solve. Fortunately, the heuristics provided for the original bin-packing problem can easily be adapted to fit the extended model. However, proofs showing bounds for the quality of a heuristic are no longer valid due to the changed problem specification. Further, the heuristics can be enhanced by sorting the list of bins: since the capacity of the available bins is no longer constant, the order in which bins are considered by the selection strategy might have an impact on the quality of the result.
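The two constraints of the extended problem can be illustrated by a small validity check over an assignment. All names (is_valid, the example hosts and replicas) are hypothetical and serve only to make the constraints concrete.

```python
# Validity check for the extended bin-packing problem: constraint 4.2
# (per-bin capacities via cap) and constraint 4.3 (bin-specific
# supported item sets via sup).

def is_valid(assignment, vol, cap, sup):
    """assignment maps item -> bin; vol gives item sizes, cap bin
    capacities, and sup the set of items each bin supports."""
    loads = {}
    for item, bin_ in assignment.items():
        if item not in sup[bin_]:                 # constraint 4.3
            return False
        loads[bin_] = loads.get(bin_, 0) + vol[item]
    return all(loads[b] <= cap[b] for b in loads)  # constraint 4.2

vol = {"r1": 3, "r2": 5}
cap = {"h1": 4, "h2": 7}                 # bins of different capacity
sup = {"h1": {"r1"}, "h2": {"r1", "r2"}}  # r2 lacks a deployment on h1

assert is_valid({"r1": "h1", "r2": "h2"}, vol, cap, sup)
assert not is_valid({"r1": "h2", "r2": "h1"}, vol, cap, sup)  # 4.3 fails
assert not is_valid({"r1": "h2", "r2": "h2"}, vol, cap, sup)  # 4.2 fails
```

A heuristic for the extended problem only ever considers bins for which both checks would succeed, as the generic algorithm of the next subsection does.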

4.2.3 A Generic Heuristic

To provide means for dealing with the extended bin-packing problem within the algorithm, a generic heuristic has been developed. The heuristic can be customized by selecting the way items are added to bins as well as the order in which items and bins are considered. Algorithm 1 shows the basic framework of the generic heuristic.

The algorithm takes a problem instance as well as some parameters determining its behaviour as inputs. Thereby, the bin selection strategy defines how items are assigned to bins. Besides first-fit and best-fit, which have been described within section 4.2.1, random and worst-fit are supported. When using the random strategy, bins are selected randomly, while worst-fit chooses the bin offering the largest amount of free resources. All of them have been extended to respect the additional constraints of the extended bin-packing problem. However, the latter two are not aimed at producing an assignment using a minimum number of bins. Worst-fit, for instance, can be used to distribute the load evenly among the available bins. Hence, the load distribution pattern can be effectively influenced by adjusting the input parameters.

Besides the bin selection strategy, the item and bin order parameters have an influence on the resulting item mapping. Items can be ordered according to their volume in decreasing or increasing order, or not at all. For bins, several sorting orders are supported. They might be arranged according to their capacity, assigned load or free resources in increasing or decreasing order. Since the latter two properties are modified during the assignment process, the list of bins might be resorted after every step.


Algorithm 1 Generic Bin Packing Algorithm

Input:
• (I, B, vol, cap, sup): the problem description
• selectionStrategy: the bin selection strategy
• itemOrder: the item order to be used
• binOrder: the bin order to be used

Output:
• An assignment res : I → B satisfying constraints 4.2 and 4.3, or Overloaded if no such assignment could be derived.

1:  res := ∅
2:  I′ := itemOrder.sort(I)
3:  B′ := binOrder.sort(B, res)
4:  for all i ∈ I′ do
5:      b := selectionStrategy.selectBin(B′, i, res, vol, cap, sup)
6:      if b ∉ B then
7:          return Overloaded
8:      end if
9:      res := res ⊎ {(i ↦ b)}
10:     B′ := binOrder.resort(B′, res)
11: end for
12: return res
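A runnable sketch of Algorithm 1 might look as follows. Strategies and orders are reduced to plain functions here, a simplification of the configurable strategy objects described in the text; only first-fit and worst-fit selection are shown, and all function names are illustrative.

```python
def free(bin_, res, vol, cap):
    """Remaining capacity of bin_ under the partial assignment res."""
    return cap[bin_] - sum(vol[i] for i, b in res.items() if b == bin_)

def fits(bin_, item, res, vol, cap, sup):
    # Both extended constraints: the bin must support the item (4.3)
    # and offer enough free capacity (4.2).
    return item in sup[bin_] and vol[item] <= free(bin_, res, vol, cap)

def first_fit(bins, item, res, vol, cap, sup):
    return next((b for b in bins if fits(b, item, res, vol, cap, sup)), None)

def worst_fit(bins, item, res, vol, cap, sup):
    candidates = [b for b in bins if fits(b, item, res, vol, cap, sup)]
    return max(candidates, key=lambda b: free(b, res, vol, cap), default=None)

def pack(items, bins, vol, cap, sup, select=first_fit,
         item_key=None, bin_key=None):
    res = {}                                                    # line 1
    items = sorted(items, key=item_key) if item_key else list(items)
    for i in items:                                             # lines 4-11
        # Resort the bins each step, since the order may depend on res.
        bins_ = (sorted(bins, key=lambda b: bin_key(b, res))
                 if bin_key else list(bins))
        b = select(bins_, i, res, vol, cap, sup)
        if b is None:
            return "Overloaded"                                 # line 7
        res[i] = b                                              # line 9
    return res                                                  # line 12

vol = {"a": 4, "b": 3, "c": 2, "d": 1}
cap = {"h1": 5, "h2": 5}
sup = {"h1": set(vol), "h2": set(vol)}

# First-fit decreasing: sort items by volume, largest first.
res = pack(list(vol), list(cap), vol, cap, sup,
           item_key=lambda i: -vol[i])
assert res == {"a": "h1", "b": "h2", "c": "h2", "d": "h1"}
```

Passing item_key=lambda i: -vol[i] yields the first-fit decreasing behaviour discussed in section 4.2.1; swapping select for worst_fit spreads the load instead of minimizing the number of bins.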

The result produced by Algorithm 1 describes how the set of load items I should be mapped to the available bins B according to the chosen heuristic, such that no resource limitations are exceeded. The algorithm starts by initializing the resulting mapping in line 1. In lines 2 and 3, the item and bin sets are sorted according to the specified orders and converted into the lists I′ and B′. Within lines 4 to 11, all items are added one by one to the available bins according to their order within I′. For each item i, a bin b is chosen by the selection strategy (line 5), thereby considering the current assignment as well as the parameters describing the handled problem instance. In line 6 it is tested whether a valid bin has been selected. In case no bin capable of accepting the new item i could be determined, the algorithm returns, stating that no valid assignment could be derived (line 7). Otherwise, the mapping i ↦ b is added to the resulting assignment res (line 9). The operator ⊎ thereby denotes the union of disjoint sets, hence sets without common elements. Finally, since the bin order might depend on the current assignment, the list of bins is resorted (line 10). In case this dependency



does not exist, the resorting step is skipped. After all items within the list I′ have been successfully processed, the resulting mapping is returned.

The implementation of this generic heuristic, used as a utility function by the algorithms defined within this chapter, supports an additional parameter describing an initial load assignment. This assignment cannot be altered by the heuristic; however, it is considered when packing new items into bins.

Load Distribution Patterns

Not all combinations of the supported parameters create distinct behaviour. For instance, when using the best-fit heuristic, the value of the bin order option has no influence on the result. Further, best-fit and worst-fit can be emulated using first-fit and a matching bin order. However, the goal of the algorithm is to provide a complete, orthogonal set of options to support experimenting with the resulting heuristics. Some of the patterns produced by combining the available options are shown within figure 4.3. Within all the examples, a set of 20 items has been assigned to the available bins; their volumes are evenly distributed within the interval [1..4].

Figure 4.3: Example Patterns produced by various Heuristics


4.3 Overview on the Balancing Concept

Before covering the details of the realized load-balancing concept, this section provides a general overview of the basic concept. The description includes the hierarchical decomposition of the load-balancing obligation as well as the internal organization of the implementation. The latter focuses on the involved components, their responsibilities and interactions. Details on their internal operation will be covered within the following sections.

4.3.1 The Load-Balancing Hierarchy

The realized resource management concept distributes the responsibility of handling game sessions among multiple peers throughout the grid infrastructure. The decomposition of the balancing obligations is thereby based on the natural hierarchy found within any grid infrastructure. Figure 4.4 illustrates the levels considered within the load-balancing schema.


Figure 4.4: The Load-Balancing Hierarchy

The hierarchy consists of three levels, where each is defined through its domain. The lower the level, the closer the resource allocation is to the actual game server instances and the hardware. On the node level, the most detailed information is available, and decisions affecting the load distribution are made most frequently. In the higher levels, the load and resource representations become more and more abstract, and balancing operations need to be carried out less frequently. The general idea is that the load on resources is balanced as efficiently as possible on the lower levels. However, in case the assigned total load exceeds the available capacities on some level, it is the obligation of the next higher level to reduce the stress on the corresponding subdomain by performing balancing operations involving all the resources within its extended scope. A similar approach for general grid applications has been proposed within [23]. Additional responsibilities may be present on some of the layers to provide full support for the game session management. Further, every layer may perform additional state evaluations and modifications to enforce preferred load distribution patterns on the various levels. Hence, the influence of higher-level services on the system state is not restricted to situations in which lower layers become overloaded.

The Node Level

The smallest domain, defining the lowest level, is given by a single node. On this layer, the local operating system is responsible for assigning the available resources to the maintained processes. When running game server instances, the resource allocation algorithms applied by the OS have to be capable of supporting the real-time requirements of the involved applications. While those requirements do not introduce big challenges for the memory management, the CPU scheduling in particular has to support the timing constraints introduced by the game loop. Due to their internal real-time simulation loop concept, game server implementations require access to the processor on a periodic basis for small amounts of time. To provide best results, the delay between successive iterations needs to be as constant as possible. Further, the execution of a simulation step should not be interrupted, to avoid slowing down depending processes. In addition, if multiple game server instances are maintained on a single host, the local scheduler must be capable of handling several interleaved applications exposing the same characteristic execution profile. The presence of multiple cores in contemporary systems as well as multi-threaded game server engines introduces additional issues.
Finally, the operating system also needs to manage the network data flow for the present game servers to satisfy the time constraints introduced by the game loop. This might also involve the management of multiple network interfaces. For the highest efficiency, the CPU and network load needs to be balanced among the available resources. Therefore, the domain of a single node is considered the lowest level in the hierarchy. The algorithms applied on this level have a high impact on the game play experience. However, since this functionality is realized within the operating systems managing the nodes, the solution developed for this thesis has no influence on those algorithms. Nevertheless, this level is included within the load-balancing hierarchy to point out its important role for the overall concept.
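The periodic execution profile described above can be illustrated by a minimal fixed-timestep server loop (a generic sketch, not the thesis implementation; the tick rate and the `simulate_step` callback are assumptions made for illustration):

```python
import time

def run_game_loop(simulate_step, tick_rate_hz=20, max_ticks=None):
    """Run `simulate_step` at a (nearly) constant period.

    Each iteration performs a short burst of CPU work, then sleeps until
    the next tick boundary, so successive iterations start at
    approximately constant intervals -- the execution profile the local
    scheduler has to support.
    """
    period = 1.0 / tick_rate_hz
    next_tick = time.monotonic()
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        simulate_step()                   # one uninterrupted simulation step
        ticks += 1
        next_tick += period
        delay = next_tick - time.monotonic()
        if delay > 0:
            time.sleep(delay)             # idle until the next tick boundary
        else:
            next_tick = time.monotonic()  # overloaded: skip the lost time
    return ticks
```

For example, `run_game_loop(lambda: None, tick_rate_hz=100, max_ticks=5)` executes five evenly spaced ticks and returns 5.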


The Site Level

The next higher level in the hierarchy is formed by a complete grid site. According to the hierarchy concept, the major obligation of the services on this layer is to distribute the present load among the managed nodes in order to avoid exceeding resource limits. Thereby, the basic items transferred between hosts are game server instances. Hence, moving load mainly corresponds to the migration of server processes. In addition to the responsibility of maintaining the process-to-host assignment, the services on this level are also responsible for managing the replication of zones. Therefore, game loop saturations are constantly monitored and balanced to provide a homogeneous game play experience. In case the replicas managing a common zone are about to become overloaded, a new replica is created and assigned to some node. On the other hand, if the saturation of the involved processes drops below a certain threshold, one of the contributing server instances is eliminated. As has been described within section 4.1, those two obligations are referred to as replication and resource management. Within the algorithm developed for this thesis, the corresponding responsibilities are fulfilled by the local balancer component. Each instance of this component is managing the resources offered by an associated hoster. Therefore, the implementation of this layer is based on a centralized approach. However, due to the distributed nature of the highest level, the overall solution remains scalable. More details on the algorithms applied on this level are provided within section 4.4.

The Grid Level

The highest level in the balancing hierarchy covers the entire grid. On this layer, load is dedicated to the available hosters by assigning zone duties (see section 2.2.1). Therefore, entire sets of game servers managing the same zone within a session are moved between hosters if necessary.
A positive side effect of picking this granularity is the fact that replicas of the same zone are almost always maintained within a single hoster. This property is only violated during the short periods required for duty migrations. Therefore, the network overhead introduced by managing shadow entities remains low. In addition, the local balancer instances have a full overview of the state of all servers contributing to the management of a single zone. Hence, it becomes easier to decide whether the number of involved replicas has to be adjusted. Besides the load-balancing activities, this layer also contributes to the session management. Its main obligation thereby is to ensure that all the zones being part of active sessions are assigned to some hosters. Hence, every active


zone must be managed by at least one game server process at any time. This type of management operation can only be performed on the global level, since all the necessary information is available only at this layer. Within the algorithm implemented for this thesis, the global balancer component fulfills the responsibilities of managing game sessions on the grid level. Every site contributing to the grid infrastructure is represented by an instance of this component. All of them are connected through the underlying P2P network. Their main task is to negotiate the migration of load between grid sites. Because the protocols realizing this obligation do not depend on a single, central authority, the overall algorithm represents a distributed load-balancing scheme. Details on the involved protocols are covered within section 4.5.

Resulting Characteristics

Although the domains of the various levels within the hierarchy are of varying size, their obligations correspond to a common pattern. On every level, a set of coarse-grained load items needs to be managed. Therefore, they are refined to a smaller granularity and distributed among the available resources. While on the global layer the set of managed sessions is refined to a set of zone duties, which are distributed among the participating hosters, the same pattern can be found on the site level. However, on this layer, zone duties are decomposed into zone replicas and distributed among the available nodes. Nevertheless, the central problem remains the same. As has been stated previously, an abstract mathematical model for this situation is provided by the extended version of the bin-packing problem described within section 4.2.2. Therefore, the generic heuristic of section 4.2.3 is used to handle the emerging load assignment problems within the algorithms. The hierarchical decomposition of the load-balancing task allows providing a scalable solution capable of managing the numerous resources of a grid. Thereby, it is important to see that most of the time the involved balancing components perform their operations independently from each other within their own domain. Load distribution modifications on higher layers are primarily intended to compensate for resource deficits on the lower levels. In addition, the hierarchical approach allows applying load-balancing operations at varying granularity and frequency throughout the different layers. On the node level, balancing decisions have to be made very frequently using a fine load granularity. At the other end of the hierarchy, by contrast, operations affect much larger quantities of load. However, they are triggered far less frequently.
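Since each level ultimately solves an instance of the same assignment problem, a standard bin-packing heuristic illustrates the core step. The sketch below uses plain first-fit decreasing on scalar load values; the thesis's generic heuristic of section 4.2.3 operates on vector-valued loads with additional constraints, so this is only an illustrative stand-in:

```python
def first_fit_decreasing(items, capacity):
    """Assign scalar load items to bins of a common capacity.

    Returns a list of bins, each a list of item loads, such that no bin
    exceeds `capacity`. Items larger than the capacity raise an error.
    """
    bins = []
    for load in sorted(items, reverse=True):     # largest items first
        if load > capacity:
            raise ValueError("item exceeds bin capacity")
        for b in bins:                           # first bin with enough room
            if sum(b) + load <= capacity:
                b.append(load)
                break
        else:
            bins.append([load])                  # open a new bin
    return bins
```

For instance, `first_fit_decreasing([5, 7, 3, 2, 4], capacity=10)` packs the items into three bins: `[[7, 3], [5, 4], [2]]`.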


A final property derived from the hierarchical concept and the accompanying partitioning of the domains is the fact that the services on the various layers do not need to be realized by the same implementation throughout the system. A direct example is given on the node level: clearly, there is no need for using the same operating system throughout the environment to realize the balancing operations. The same holds for the site level. The corresponding services may be customized to treat resources according to hoster-specific requirements. Still, the overall load-balancing concept works out as long as the requirements of the various layers are met.

4.3.2 Internal Organization of the Load Management Module

Figure 4.5 illustrates the internal structure of the implemented load management and balancing module. The two major components have already been mentioned within the previous subsection. The local balancer is responsible for distributing the load within the managed site, while the global balancer component negotiates inter-hoster load transfers. Two additional components are responsible for starting up and shutting down sessions. Due to the different challenges emerging during the corresponding session phases, the creation and destruction of sessions cannot simply be handled by the balancer components. Especially when creating a new session, several issues have to be considered which do not occur during ordinary session maintenance. The most severe problem thereby is the lack of load information. Since there is not yet any game server instance managing the new session, information about the required resources has to be retrieved through alternative channels.

Figure 4.5: Internal Algorithm Organization (global balancer, local balancer, session starter, session closer, and management services within one algorithm instance, connected to the balancers of other instances through the P2P network via on-demand connections)


The Local Balancer

The local balancer component is responsible for maintaining the resources and zone replicas of a single site. Hence, it realizes the services specified for the site level of the balancing hierarchy. To do so, the component regularly evaluates the load distribution within its domain. In case ways to improve the system state are identified, the corresponding operations are applied. For instance, in case the number of replicas of a zone can be reduced by one, the local balancer eliminates one of the corresponding server instances to reduce resource requirements. Details on the applied algorithms are presented within section 4.4. Every cycle of the balancing loop realizing the periodic evaluations is isolated from the others. Hence, no essential information is locally forwarded between successive iterations. Therefore, every execution represents a discrete event without the need for any local context information. As a result, it is not important which algorithm instance is actually performing the state evaluation. Hence, due to this stateless principle, other controller instances can simply continue the balancing process in case the responsibility for a hoster has been reassigned for any reason.
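The stateless principle can be sketched as follows (a schematic illustration; the `capture_state` and `derive_operations` callables are placeholders, not names from the thesis):

```python
def balancing_cycle(capture_state, derive_operations, apply_operations):
    """One self-contained balancing cycle.

    All information is taken from the freshly captured state; nothing is
    carried over from earlier cycles, so any controller instance may run
    the next cycle for the same hoster.
    """
    state = capture_state()                  # everything the cycle needs
    operations = derive_operations(state)    # purely a function of `state`
    apply_operations(operations)
    return operations                        # no context is retained here

# Two different "controller instances" reach identical decisions because
# the cycle depends only on the observed state:
observe = lambda: {"load": 9}
decide = lambda s: ["migrate"] if s["load"] > 8 else []
ops_a = balancing_cycle(observe, decide, lambda ops: None)
ops_b = balancing_cycle(observe, decide, lambda ops: None)
```

Because each cycle is a pure function of the captured state, handing the hoster over to another controller between two cycles is harmless.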

The Global Balancer

As its name suggests, the global balancer component fulfills the responsibilities assigned to the grid level of the balancing hierarchy. Its instances therefore provide two functionalities. The first is offered to the local balancer component present within the same algorithm instance. It allows identifying sites to which zone duties can be offloaded in case the locally maintained domain becomes overloaded. The second feature, on the contrary, aims at reducing the number of sites involved in managing concrete game sessions. Therefore, instances of the global balancer component are periodically analysing the session distribution state. In case the entire load assigned to one of the involved sites can be handed over to the remaining hosters, the corresponding load movement operations will be arranged. Both operations are realized by communicating with other global balancer instances representing alternative sites. The corresponding protocols are described within section 4.5. During the necessary negotiations with other peers, a link to the local balancer is required to obtain information on the current system state. Also, since the local balancer has full authority over managing the resources within the represented site, details like the nodes on which incoming server instances should be placed are retrieved using this connection.


The Reservation Manager

The interaction of the various components within an instance of the algorithm is coordinated using a shared reservation manager. If, for instance, the global balancer has agreed to take over a zone duty from a remote site, reservations for the arriving server instances are made on the target nodes. Those reservations prevent other components from using resources that are already allocated but not yet consumed.
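A minimal reservation manager along these lines might look as follows (a sketch under the assumption of scalar per-node capacities; the class and method names are invented for illustration):

```python
class ReservationManager:
    """Track capacity reservations so that concurrent balancing
    components do not hand out the same free capacity twice."""

    def __init__(self, node_capacity):
        self.capacity = dict(node_capacity)   # node -> total free capacity
        self.reserved = {n: 0 for n in node_capacity}

    def available(self, node):
        return self.capacity[node] - self.reserved[node]

    def reserve(self, node, amount):
        """Reserve capacity for an arriving server instance.
        Returns True on success, False if not enough is left."""
        if self.available(node) < amount:
            return False
        self.reserved[node] += amount
        return True

    def consume(self, node, amount):
        """The reserved resources are now actually in use."""
        self.reserved[node] -= amount
        self.capacity[node] -= amount
```

A second component querying `available()` after a reservation sees the reduced capacity, even though the server instance has not arrived yet.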

4.4 The Local Balancing Component

As has been described within the previous section, the responsibility of the local balancer is to maintain the resources within a grid site and to control the number of replicas needed to support the assigned zone duties. Additionally, it has been mentioned that the algorithm periodically analyses the current system state to derive hints for improving the situation. Within this section, the steps involved within such a balancing cycle will be covered in detail.

4.4.1 Overview

Figure 4.6 illustrates the various steps accomplished during each balancing cycle. It starts by capturing the current state of the managed system.

Figure 4.6: The Balancing Cycle Sequence (Determine Current State → Analyse State → Allocate Resources → Load Pattern Reshaping → Apply Modifications)

Within the following step, the retrieved data is analysed and a list of necessary modifications is created. The list may for instance contain requests for moving server instances to another, not yet specified node, or instructions to eliminate one replica of a certain zone. Within the allocation step, the abstract list of modifications is transformed into concrete balancing operations. In particular, target locations for new or moved server instances have to be determined. In case the local site does not offer sufficient capacities, the services of the global balancer might be used for offloading duties. After the necessary operations have been derived, an optional reshaping step analyses the resulting load distribution. In case additional modifications can improve the load distribution pattern, the corresponding operations are added to the resulting list of modifications. Within the final step, the entire set of derived load-balancing operations is forwarded to the


hoster to be executed. To avoid interfering with previously issued commands, this phase blocks until all the operations have been finished. Finally, the loop is stalled until it is time to execute the succeeding balancing cycle.

Replication and Resource Management

As determined by the balancing hierarchy, the local balancing component is responsible for managing the replication of zones as well as for assigning replicas to hosts. The replication management is thereby incorporated into the analysing step of the balancing cycle. The resource management, on the other hand, is realized by the allocation step as well as by the optional reshaping phase.
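The phases of the cycle can be summarized as a simple pipeline (a schematic sketch with toy stand-ins for each phase; none of the function names or rules below are taken from the thesis):

```python
def determine_current_state(env):
    """Stand-in for building the bin model from the environment."""
    return {"nodes": env["nodes"], "servers": env["servers"]}

def analyse_state(state):
    """Toy analysis: request a migration for every server above load 8."""
    return [("move", s) for s in state["servers"] if s["load"] > 8]

def allocate_resources(state, modifications):
    """Toy allocation: fill in the first node as target for each request."""
    return [(kind, s["name"], state["nodes"][0]) for kind, s in modifications]

def reshape_load_pattern(state, operations):
    return []                       # optional phase: nothing extra here

def apply_modifications(env, operations):
    env["log"] = operations         # stand-in for forwarding to the hoster

def run_balancing_cycle(env):
    """One pass through the five phases of the local balancing cycle."""
    state = determine_current_state(env)
    mods = analyse_state(state)            # abstract, partially specified
    ops = allocate_resources(state, mods)  # concrete operations
    ops += reshape_load_pattern(state, ops)
    apply_modifications(env, ops)          # blocks until done in the thesis
    return ops

env = {"nodes": ["n2"],
       "servers": [{"name": "s1", "load": 9}, {"name": "s2", "load": 3}]}
ops = run_balancing_cycle(env)
```

The essential point is the hand-over of data between phases: the analysis emits abstract requests, and only the allocation phase binds them to target nodes.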

4.4.2 Retrieving the Current State

Within the first step of the cycle, the current state of the managed hoster is obtained from the algorithm environment. Based on the retrieved information, a bin model is created. This internal representation of the managed domain models the available nodes as bins and the assigned game servers as items. Further, it is the common representation used within the local balancer to describe the state of a site. In a final step, the derived model is annotated with the resource reservations maintained by the reservation manager (see 4.3.2).

The Local Load Metric

The component realizing this step determines the load metric to be used within the resulting bin model, hence the metric used throughout the local balancing process. Within the formal description of the bin-packing problem, this metric corresponds to the set V. Thereby, a very pragmatic approach has been chosen. To represent the load caused by a running game server, a vector of resource requirements is used. Hence, to describe the volume of a server within the bin model, the vector

l_server = ( (CPU_cur, CPU_pred), (Mem_cur, Mem_pred), (NetIn_cur, NetIn_pred), (NetOut_cur, NetOut_pred) ) ∈ ((ℕ₀)²)⁴ = V_local    (4.4)

is used, covering the current and predicted resource consumption of the corresponding process. Thereby, the amount of memory in use is defined in bytes. Further, in- and outgoing network load is measured in bits/sec. Finally, the CPU requirements are determined based on the number of instructions executed for the represented server process per second. Although determining the last


value might be especially difficult for an actual application, it has been selected due to the lack of a better alternative for quantifying CPU workload. In a practical environment, approximations based on the consumed CPU time are sufficient. However, ideally some artificial CPU load unit respecting the characteristics of game servers should be defined by hosters using service-level agreements. This unit should allow measuring the CPU load caused by a game server instance within a heterogeneous environment in a comparable way. Such an abstract measure would satisfy the requirements of a load metric. Hence, it could be used as a substitute for the less accurate number of instructions. Within the load metric, predicted resource requirements are treated as if they described the requirements on an additional resource type. Therefore, when assigning a server instance to a host, all its current and predicted resource requirements have to be satisfied. As has been described within the problem statement chapter, the temporal offset of the predicted values is determined by the services of the underlying grid site (see section 2.1.6). In case no predictions are available, the current values might be used as substitutes. Alternatively, another load vector may be specified, since the type of load metric to be used by the algorithm is exchangeable. Hence, even the support for additional resources can easily be incorporated into the present algorithm by exchanging the implementation of the applied load metric.
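A possible representation of this load metric, treating predicted values like additional resource dimensions, could be sketched as follows (an illustrative data structure, not the thesis implementation; the class and method names are assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoadVector:
    """Current and predicted requirements for the four resource types of
    equation (4.4): CPU (instructions/sec), memory (bytes), incoming and
    outgoing network load (bits/sec). Each field is (current, predicted)."""
    cpu: tuple
    mem: tuple
    net_in: tuple
    net_out: tuple

    def components(self):
        return self.cpu + self.mem + self.net_in + self.net_out

    def __add__(self, other):
        a, b = self.components(), other.components()
        s = [x + y for x, y in zip(a, b)]
        return LoadVector((s[0], s[1]), (s[2], s[3]), (s[4], s[5]), (s[6], s[7]))

    def fits_into(self, capacity):
        """True if every current *and* predicted component fits, i.e. the
        prediction is treated like just another resource dimension."""
        return all(x <= c for x, c in zip(self.components(), capacity.components()))
```

An assignment is only feasible if both halves of every pair fit, so a server whose predicted CPU demand exceeds a node's capacity is rejected even when its current demand would fit.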

Dealing with Background Load

Another responsibility handled by this step of the balancing cycle is to filter background load caused by additional processes consuming resources on the managed nodes. Since those tasks cannot be influenced by the balancing algorithm, they are not included within the derived bin model. However, their effect on the available resources must still be considered. The first step in handling background load on some host is to quantify it. Therefore, the sum of the resources consumed by the assigned server instances is subtracted from the total load present. The resulting value corresponds to the system load caused by non-server applications. To incorporate the obtained value into the bin model, the capacity of the bin representing the evaluated host is set to the total available resources reduced by the quantity of the background load. Hence, the capacity of bins within the derived model describes the amount of resources on the associated node available exclusively for game server instances. An illustration of this filtering mechanism based on a one-dimensional load metric is given within figure 4.7.
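For a one-dimensional load metric, the filtering described above amounts to two subtractions (a minimal sketch; the variable names are illustrative):

```python
def usable_bin_capacity(total_capacity, total_load, server_loads):
    """Capacity of the bin representing a host after filtering
    background load.

    total_capacity -- total resources of the host
    total_load     -- currently observed overall load on the host
    server_loads   -- loads of the game server processes on the host
    """
    background = total_load - sum(server_loads)   # load of non-server tasks
    return total_capacity - background            # left for game servers only

# A host with capacity 100 runs servers consuming 30 + 20 while the
# observed total load is 60, i.e. 10 units of background load:
cap = usable_bin_capacity(100, 60, [30, 20])
```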


Figure 4.7: Filtering Background Load (the real situation on a host, running server processes plus background load against the total capacity of the host, compared with the bin-model representation, in which the usable capacity within the bin equals the total capacity minus the background load)

4.4.3 Analysing the System State

Within the second step, various analyses are applied in parallel on the retrieved state information. Each of those produces a list of modifications to be applied on the managed system. Those lists are merged and forwarded to the following phase. Thereby, the set of applied analyses may be extended to support additional features. The two main evaluation methods implemented for this thesis are responsible for identifying overloaded nodes and for realizing the replication management.

Overloaded Nodes Module

The first of those is simply iterating through the bin model looking for overloaded nodes. A host is considered to be overloaded if its assigned load exceeds a certain threshold value. This resource threshold is one of the configurable parameters of the algorithm. In case an overloaded node is identified, a randomly chosen assigned server instance is selected to be migrated to another host. In case the workload caused by the remaining server instances still exceeds the resource threshold, additional instances are selected until the remaining load is small enough. Finally, requests for moving the selected game server processes to other nodes are added to the resulting list of modifications forming the input for the following resource allocation phase. The target nodes for the corresponding migration operations are thereby not specified.
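The victim selection of this module can be sketched as follows (an illustrative reconstruction from the description above, using a seeded random generator for reproducibility; the function name is an assumption):

```python
import random

def select_migration_victims(server_loads, threshold, rng=None):
    """Pick random server instances to migrate away from an overloaded
    host until the load of the remaining instances drops to or below
    `threshold`.

    server_loads -- mapping of server name -> load on this host
    Returns the list of selected victim names, targets left unspecified.
    """
    rng = rng or random.Random()
    remaining = dict(server_loads)
    victims = []
    while sum(remaining.values()) > threshold and remaining:
        victim = rng.choice(sorted(remaining))   # random assigned instance
        victims.append(victim)
        del remaining[victim]
    return victims

# Total load 15 on a host with threshold 8: whichever two servers the
# random choice picks, the remaining one fits below the threshold.
rng = random.Random(42)
victims = select_migration_victims({"s1": 4, "s2": 5, "s3": 6}, threshold=8, rng=rng)
```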


The Replication Management Module

The second important analysing module fulfills the replication management responsibilities. Therefore, it decides whether the number of replicas involved in maintaining a zone should be altered. Further, it is responsible for balancing the game loop saturation within processes managing the same zone. Since load can only be distributed between server instances managing the same zone, all the decisions of this module are based on comparing the state of the replicas contributing to the same zone duty. Therefore, in a first step, all server instances running within the managed domain are grouped together according to their managed zone. Each of the resulting groups is then evaluated separately. Based on the composition of the game loop saturations within the derived groups, a load re-balancing instruction is derived (see section 3.2). The corresponding implementation is based on a simple weighting mechanism. The basic idea is to annotate the involved replicas with an artificial weight. The set of all weights describes the proportional distribution of workload among replicas of the same zone. For instance, let R be the set of all replicas and r1, r2, r3 ∈ R be three replicas managing the same zone z. Further, let

{. . . , (r1 ↦ 100), (r2 ↦ 95), (r3 ↦ 108), . . .}    (4.5)

be the currently valid weight assignment. In this case, the overall workload of zone z will be distributed among its three replicas based on the ratio 100 : 95 : 108. Hence, r2 gets less load assigned than r1, which itself handles less workload than r3. Those weights are constantly adjusted to maintain approximately equal game loop saturations throughout all game server instances managing the same zone.

Figure 4.8: Example Loop Saturations and Resulting Weight Modifications (the game loop saturation in % of four replicas A-D of a common zone compared with the average saturation; replicas within the tolerance band around the average are classified as normal, those below as light, and those above as heavy)
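The proportional interpretation of such a weight assignment can be illustrated as follows (a sketch; in the thesis, the actual load distribution is delegated to the underlying management services, see section 3.2):

```python
def load_shares(weights, total_load):
    """Split a zone's workload among its replicas in proportion to
    the assigned weights, e.g. 100 : 95 : 108."""
    total_weight = sum(weights.values())
    return {r: total_load * w / total_weight for r, w in weights.items()}

shares = load_shares({"r1": 100, "r2": 95, "r3": 108}, total_load=303.0)
```

With a total load of 303, the shares come out as 100, 95, and 108 load units, matching the ratio of equation (4.5).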


Figure 4.8 illustrates the weight adjustment principle. The game loop saturation of a concrete server instance is compared with the average saturation among the replicas contributing to the same zone duty. If the difference is larger than a certain tolerance value, the weight will be adjusted. If the saturation is too low, the value will be increased by one unit. If during the next balancing cycle the value is still too low, the weight will be increased by two units, and so forth. This procedure continues until the game loop saturation of the corresponding replica is within the predefined tolerance range. As soon as the game loop saturation is no longer too small, the value by which the weight is modified will be reset to zero. The same procedure is applied whenever the saturation is too high, except that in this case the weighting is reduced in every step. Algorithm 2 describes this load adjustment mechanism for the replicas of a single zone. The algorithm takes a set of replicas R managing a common zone as well as a mapping sat : R → ℝ⁺₀ defining their current game loop saturations as input. Further, the weights derived during the previous iteration of the balancing loop as well as the last modifications applied to those weights are required. The result of the algorithm represents the new weight assignment weight′, which defines the new load distribution ratio and provides the input for the next iteration. The algorithm starts by initializing the resulting weight assignment (line 1). Within line 2, the average game loop saturation of the considered replicas is computed. It is followed by individually updating the weights assigned to the considered replicas (lines 3-22). Each update thereby starts by retrieving the currently assigned weight w as well as the last applied modification ∆w. In case the considered replica r has already been weighted previously, the required information is taken from the handed-in weight assignment (line 6).
Otherwise, a new, initial value is derived for replica r (line 8). The initial weight is determined based on the weights assigned to the remaining replicas of the same zone. Further, the speeds of the executing hosts are considered to provide a more accurate result. After the current weighting of r has been obtained, ∆w is updated based on the replica’s current game-loop saturation as described above (lines 11-18). After modifying the weight w (line 20), the new weighting is added to the result (line 21). Finally, after the weights of all considered replicas have been updated, the new assignment is returned (line 23). While developing the weight adjustment mechanism, alternative weight modification concepts have been tested. Beside the quadratic concept described above, in which weight modifications are altered by one unit within every step a replica is not classified as normal, an exponential approach has been a promising alternative. Thereby the amount by which weights are altered is doubled when-


Algorithm 2 Weight Adjustment Algorithm

Input:
• R — the set of replicas managing the considered zone
• sat : R → ℝ⁺₀ — a mapping defining the saturation of the replicas
• weight : R → (ℕ₀ × ℤ) — a partial mapping assigning the current weights and the last modifications to the various replicas
• tol ∈ ℝ⁺₀ — the tolerance as illustrated within figure 4.8

Output:
• weight′ : R → (ℕ₀ × ℤ) — a mapping describing the new weights and applied modifications

1:  weight′ := ∅
2:  sat_avg := avg(sat(R))
3:  for all r ∈ R do
4:      (w, ∆w) := (0, 0)
5:      if r ∈ dom(weight) then
6:          (w, ∆w) := weight(r)
7:      else
8:          (w, ∆w) := (initialWeight(r, R, weight, sat), 0)
9:      end if
10:
11:     cur := sat(r)
12:     if cur < sat_avg − tol/2 then
13:         ∆w := max(1, ∆w + 1)
14:     else if cur > sat_avg + tol/2 then
15:         ∆w := min(−1, ∆w − 1)
16:     else
17:         ∆w := 0
18:     end if
19:
20:     w := max(w + ∆w, 0)
21:     weight′ := weight′ ⊎ {(r ↦ (w, ∆w))}
22: end for
23: return weight′

ever the corresponding replica remains light or heavy, respectively. However, this concept turned out to be too aggressive in some cases, such that for instance phases of increasing weights are followed by almost equally long phases in which the values are reduced again. Further, since exponential modifications are as minor as quadratic changes during the first two steps, modifications in each direction


start slowly. As a result, the exponential variant has not been able to reach the equilibrium faster than the chosen quadratic approach. Furthermore, while converging toward the balanced state, its more aggressive load changes increase the risk of accidentally overloading game server instances. Finally, in case the average weight of replicas managing a common zone drops below a lower boundary of 70 or exceeds an upper limit of 130, all weights are normalized such that the average weight becomes 100. This way, weights do not degrade and the impact of modifications remains approximately constant over time. After the weights of the replicas within a group have been adjusted, a re-balancing operation based on the new weights is added to the list of requested modifications generated by this module. Further, all the necessary information required for realizing this mechanism is stored as a hoster attachment (see section 3.4). This simple weighting mechanism enables the balancing solution to deal with the varying processor speeds of the available nodes. Further, its abstract concept allows handling the various ways the underlying management services might distribute load between server instances (see section 3.2). Finally, the developed mechanism is capable of adapting to the characteristics of the underlying system by learning from the responses to previous modifications.
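The update rule of algorithm 2 together with the normalization step might be rendered in Python roughly as follows (a sketch: the initial weight is simplified to the average of the existing weights, whereas the thesis's initialWeight also considers host speeds):

```python
def adjust_weights(saturation, weights, tol):
    """One update of the (weight, delta) pairs for the replicas of a zone.

    saturation -- replica -> current game loop saturation
    weights    -- replica -> (weight, last modification); may be partial
    tol        -- width of the tolerance band around the average
    """
    sat_avg = sum(saturation.values()) / len(saturation)
    known = [w for w, _ in weights.values()]
    initial = round(sum(known) / len(known)) if known else 100
    result = {}
    for r, cur in saturation.items():
        w, dw = weights.get(r, (initial, 0))
        if cur < sat_avg - tol / 2:
            dw = max(1, dw + 1)        # light: take over more load
        elif cur > sat_avg + tol / 2:
            dw = min(-1, dw - 1)       # heavy: shed load
        else:
            dw = 0                     # within tolerance: stop adjusting
        result[r] = (max(w + dw, 0), dw)
    return normalize(result)

def normalize(weights, lower=70, upper=130, target=100):
    """Rescale all weights to an average of `target` if the average
    leaves the [lower, upper] band, so weights do not degrade."""
    avg = sum(w for w, _ in weights.values()) / len(weights)
    if lower <= avg <= upper:
        return weights
    return {r: (round(w * target / avg), dw) for r, (w, dw) in weights.items()}
```

Because the deltas keep growing while a replica stays outside the tolerance band, repeated cycles apply increasingly large corrections, reproducing the quadratic convergence behaviour described above.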

Adjusting the Number of Replicas

The second responsibility of the replication management module is to determine whether new replicas should be added or present server instances can be eliminated. This decision is made based on the average loop saturation of the involved replicas. If on average a certain threshold is exceeded, a request for a new replica will be added to the resulting list of modifications. On the other side, if the saturation falls below a certain limit, one of the involved replicas may be removed. However, unlike the upper limit, the lower limit is not a constant value, since for instance the impact on the remaining replicas is much bigger when removing one server out of two than out of four. The lower saturation boundary therefore depends on the current number of replicas. The number of replicas might be reduced whenever the average saturation of the remaining server instances does not exceed a given threshold thl after the reduction. A simple model allowing to derive the saturation level s(n) below which one replica out of n might be removed is provided by

(n − 1) · thl = n · s(n)    (4.6)


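Solving this relation for s(n) yields the removal criterion; as a decision procedure, it might be sketched as follows (illustrative code; the default threshold values mirror the example of figure 4.9, and refusing to drop the last replica reflects the session management requirement stated earlier):

```python
def lower_saturation_limit(n, th_lower):
    """Average saturation below which one of n replicas may be removed
    without pushing the remaining ones above `th_lower` (equation 4.6,
    assuming saturation scales linearly and hosts are equally fast)."""
    return (n - 1) / n * th_lower

def adjust_replica_count(n, avg_saturation, th_upper=0.90, th_lower=0.80):
    """Return +1 (add a replica), -1 (remove one) or 0 (keep n)."""
    if avg_saturation > th_upper:
        return +1
    if n > 1 and avg_saturation < lower_saturation_limit(n, th_lower):
        return -1                      # never drop the last replica
    return 0
```

With 4 replicas, for instance, the zone is kept at its current size as long as the average saturation stays between 60% and 90%.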
It is based on the assumption that the game loop saturation scales linearly with the number of active entities and that the involved nodes are equally fast. Hence, the model is not very accurate, but sufficient for dealing with this aspect of the algorithm. Based on this simple model, the average saturation level s(n) below which replicas can be removed without exceeding the given threshold thl can be determined using

s(n) = ((n − 1) / n) · thl    (4.7)

Figure 4.9 illustrates the resulting boundaries of the game loop saturation depending on the number of involved replicas when using an upper saturation threshold of 90% and a lower one of 80%. Within the balancing system, both values are configurable parameters. However, to avoid oscillations, the lower threshold value should be clearly smaller than the upper one.

Figure 4.9: Example Game Loop Saturation Boundaries (average loop saturation in % over the number of replicas, 1 to 6; the upper saturation limits are given by the upper threshold, while the lower saturation limits approach the lower threshold as the number of replicas grows)

The figure shows the saturation ranges at which the current number of replicas will be retained. For instance, the algorithm will maintain 4 replicas for a zone as long as their average saturation is between 60% and 90%. In case the upper limit is exceeded, an additional fifth replica will be added. On the other hand, if the saturation drops below the lower limit, a replica will be removed. By doing so, the saturation of the remaining three replicas will not exceed the lower threshold of 80%, according to the simple model defined by equation 4.6.

Other Modules

Beside the two essential modules described so far, additional components contribute to the analysing step. For instance, one of those is searching for missing


or dispensable game server instances. This module thereby realizes a cleanup operation in case some error has corrupted the system state. Finally, one last module to be mentioned is responsible for initiating zone duty movements after they have been negotiated by the global balancer.

4.4.4 The Resource Allocation Step

Based on the list of modifications derived within the previous step, the allocation step generates a list of concrete balancing operations to be forwarded to the managed site. The given list of requests includes various different types of modifications. Some of those are rather easy to realize, since all the necessary information is already available. For instance, requests for re-balancing the load between replicas can be directly converted into corresponding balancing operations. However, other requests are only partially specified, and it is the responsibility of the component handling this step to fill in the missing parameters. For instance, requests demanding a new replica for a certain zone do not specify the target node. Hence, the resource allocation module can choose proper values for the missing parameters to achieve the desired load distribution properties. Essentially, the implementation of this component extracts a list of replicas to be added to the system from the list of modifications. Those new replicas might be needed to reduce the load of other replicas, or they represent target instances for server migration operations. The set of new replicas is then assigned to the available nodes using the generic heuristic presented within the previous section. Finally, the derived mapping is used to fill in open parameters when assembling the resulting list of load-balancing operations. Due to its central role in assigning zone replicas to nodes, the component realizing this step has a significant impact on the resulting load distribution pattern. Especially the parameters used for the generic heuristic computing the replica assignment influence the resource usage. Therefore, these parameters are included within the algorithm's configuration.

Bin-Packing-based Resource Allocation

Algorithm 3 represents a slightly simplified variant of the implemented version.
Those simplifications have been made to reduce the complexity of the presented algorithm. For instance, within algorithm 3 only a single replica of a zone might be assigned to one host. However, the actual implementation supports multiple replicas of the same zone being mapped to the same node. Further, zone duty movements, multiple sessions running the same game map and the possibility of insufficient local resources are ignored for simplicity. The implemented version, however, considers all those issues as well.


Algorithm 3 Resource Allocation Algorithm (simplified)

Definitions:
• Let Z be the set of managed zones
• Let H be the set of available hosts, ⊥ ∉ H an unspecified host
• Let V be the vector space used as a load metric (see 4.4.2)

Input:
• R ⊆ Z × H the current set of present replicas
• toBeMoved ⊆ R the set of replicas to be moved to other hosts
• toBeRemoved ⊆ R the set of concrete replicas to be removed
• addReplica ⊆ Z the set of zones to be extended by one replica
• remReplica ⊆ Z the set of zones to be reduced by one replica
• vol : (Z × (H ⊎ {⊥})) → V partial mapping defining the volume of replicas
• cap : H → V a mapping defining the capacity of hosts
• sup : H → 2^(Z×(H⊎{⊥})) defines the sets of replicas supported by hosts

Output:
• move : (Z × H) → H replicas to be moved mapped to their target
• add ⊆ Z × H replicas to be added
• rem ⊆ Z × H replicas to be removed

 1: remReplica := remReplica \ {z ∈ Z | ∃h ∈ H : (z, h) ∈ toBeRemoved}
 2: rem := toBeRemoved ⊎ findVictims(remReplica, R, sup, cap, vol)
 3:
 4: toBeMoved := toBeMoved \ toBeRemoved
 5: newReplicas := toBeMoved ⊎ (addReplica × {⊥})
 6: vol′ := vol
 7: for all (z, h) ∈ newReplicas do
 8:   if (z, h) ∉ dom(vol′) then
 9:     vol′ := vol′ ⊎ {((z, h) ↦ estimateLoad((z, h), vol))}
10:   end if
11: end for
12: ass := genHeuristic(newReplicas, H, vol′, cap, sup, R)
13:
14: add := {(z, h) ∈ Z × H | ((z, ⊥) ↦ h) ∈ ass}
15: move := {((z, h) ↦ h′) ∈ ass | (z, h) ∈ toBeMoved}
16: return (move, add, rem)

The algorithm accepts sets of modification requests created by the previously performed analysis as arguments. Replicas are thereby represented by pairs consisting of the managed zone and the host the corresponding server instance


is assigned to. The list of modifications includes a set of replicas to be moved to a different host (toBeMoved) as well as a set of concrete replicas to be removed (toBeRemoved). Further, sets of zones to be extended (addReplica) or reduced (remReplica) by one replica are accepted as input parameters. It is thereby assumed that addReplica ∩ remReplica = ∅. In addition, it has to be mentioned that those parameters specify neither to which host the new replica should be assigned nor which existing replica should be eliminated. Those decisions are made by the resource allocation algorithm. Finally, several parameters defining bin-packing related properties are required (vol, cap, sup; see algorithm 1). The algorithm produces a list of concrete balancing operations. This includes movement operations specifying which replica has to be migrated to which host. Further, sets of concrete replicas to be added and removed are generated.

The algorithm starts by computing the set of replicas to be removed. Within line 1, the set of zones for which an arbitrary replica should be removed is reduced by the set of zones for which concrete replicas need to be eliminated anyway. Within line 2, the resulting list of remove operations is derived by merging the set of concrete removable replicas handed in as a parameter with a list of victims chosen for the zones within remReplica based on the current assignment. Thereby, replicas of the corresponding zones located on heavily loaded nodes are preferred.

The second part of the algorithm searches for new locations for additional replicas. After removing dispensable information from the toBeMoved set (line 4), the set of new replicas to be added to some nodes is computed within line 5. It is derived by merging the replicas to be moved with the set of yet unassigned new replicas required for extending the number of servers managing zones.
Replicas of the latter type are thereby marked using the unspecified host ⊥. Within lines 6-11, the resource requirements of the new replicas are obtained. For each new replica, the corresponding load metric is estimated based on the load of the replicas already managing the same zone (line 9). Within line 12, the generic heuristic described within algorithm 1 is used to assign the new replicas to the available nodes. Thereby, the current assignment of the present replicas R is considered. The generic parameters determining the heuristic's behaviour (e.g. selection strategy, item order) are retrieved from the algorithm's configuration. Finally, the resulting assignment is subdivided into the set of replicas to be added (line 14) and the mapping describing to which nodes the replicas of the toBeMoved set should be migrated (line 15). The three resulting data structures are returned within line 16.
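To make the control flow concrete, the following Python sketch mirrors the structure of algorithm 3 under simplifying assumptions not made by the actual implementation: load values are scalars instead of vectors, genHeuristic is a plain first-fit-decreasing heuristic, the sup mapping is ignored, and all function and variable names are illustrative.

```python
# Illustrative sketch of algorithm 3; replicas are (zone, host) pairs and
# host None stands for the unspecified host ⊥. The load of replicas to be
# moved is conservatively still counted on their source host.

def gen_heuristic(items, hosts, vol, cap, used):
    """First-fit-decreasing assignment of new replicas to hosts; returns
    {(zone, host) -> target_host} or None if some item does not fit."""
    free = {h: cap[h] - sum(vol[r] for r in used if r[1] == h) for h in hosts}
    ass = {}
    for item in sorted(items, key=lambda r: vol[r], reverse=True):
        for h in hosts:
            if free[h] >= vol[item]:
                ass[item] = h
                free[h] -= vol[item]
                break
        else:
            return None  # insufficient local resources
    return ass

def allocate(replicas, to_be_moved, to_be_removed, add_replica, rem_replica,
             vol, cap, hosts):
    # Line 1: drop zones whose concrete replicas are removed anyway.
    rem_replica = {z for z in rem_replica
                   if not any((z, h) in to_be_removed for h in hosts)}
    # Line 2: choose victims for the remaining zones (most loaded first).
    victims = {max((r for r in replicas if r[0] == z), key=lambda r: vol[r])
               for z in rem_replica}
    rem = set(to_be_removed) | victims
    # Lines 4-5: new replicas = moved ones plus yet unplaced additions.
    to_be_moved = set(to_be_moved) - set(to_be_removed)
    new_replicas = to_be_moved | {(z, None) for z in add_replica}
    # Lines 6-11: estimate the load of unplaced replicas from their siblings.
    vol = dict(vol)
    for z, h in new_replicas:
        if (z, h) not in vol:
            siblings = [vol[r] for r in replicas if r[0] == z]
            vol[(z, h)] = sum(siblings) / len(siblings) if siblings else 1.0
    # Line 12: assign the new replicas to the available hosts.
    ass = gen_heuristic(new_replicas, hosts, vol, cap, replicas)
    if ass is None:
        raise RuntimeError("insufficient local resources")
    # Lines 14-16: split the assignment into add and move operations.
    add = {(z, ass[(z, None)]) for z in add_replica}
    move = {(z, h): ass[(z, h)] for (z, h) in to_be_moved}
    return move, add, rem
```

For instance, requesting an additional replica of a zone already hosted on one node yields an add operation for whichever host still offers sufficient capacity.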


Reacting to Insufficient Local Resources

In some cases, the locally available resources are not sufficient to support all the requested modifications. As a result, the algorithm identifies a subset of zone duties to be moved to other sites. To resolve the destinations, one of the services offered by the global balancer is used. If targets could be obtained successfully, a zone-duty-movement operation as illustrated within figure 3.5 is added to the resulting set of balancing instructions.

4.4.5 Load Pattern Reshaping

Within the allocation step, only requests included within the given list of modifications are considered. Therefore, its influence on the load distribution pattern is limited. This restriction is dropped within the subsequent, optional reshaping step. Within this step, the load distribution pattern obtained after applying the operations derived so far is predicted. Based on the inferred information, additional load moving operations might be added to improve the result. The opportunity offered by this step is used within the implementation of the component covered by this section to minimize the number of nodes participating in maintaining game sessions.

The corresponding generic load compression heuristic is described by algorithm 4. As usual, the input is provided in form of a bin model. Thereby, the involved hosts are considered to form the set of bins B, the present replicas define the set of load items I and the current load assignment is given by the mapping ass : I → B. The result of the algorithm is a new assignment ass′ : I → B using equally many or fewer bins. Thereby, the number of item movements should be kept as small as possible.

The algorithm starts by initializing the resulting assignment ass′ with the given assignment ass (line 1). Within line 2, the set of used bins is obtained. The loop realized by lines 3-12 tries to eliminate individual hosts, hence bins, by offloading their assigned load items to the remaining, already used bins. Bins are thereby considered in increasing order of their assigned load. Within every iteration, the load items I′ assigned to the currently evaluated bin b are obtained (line 4). Further, the remaining set of used bins B′ not including b is computed within line 5. Based on those values, the generic heuristic of section 4.2.3 is used to test whether all the items within I′ can be moved to bins within B′.
If the corresponding operation is successful, the resulting assignment ass′ is updated accordingly by replacing all mappings for elements in I′ by the newly derived variants (line 9). Further, the currently evaluated bin b is removed from the set of used bins (line 10).


Algorithm 4 Generic Load Compression Heuristic

Input:
• I the set of present load items
• B the set of used bins
• ass : I → B the current item to bin assignment
• vol, cap, sup as within algorithm 1

Output:
• ass′ : I → B assignment using equally many or fewer bins

 1: ass′ := ass
 2: Bused := ass(I)
 3: for all b ∈ sortByIncreasingLoad(B, ass) do
 4:   I′ := {i ∈ I | ass′(i) = b}
 5:   B′ := Bused \ {b}
 6:   dist := genHeuristic(I′, B′, vol, cap, sup, ass′)
 7:
 8:   if dist ≠ Overload then
 9:     ass′ := {(i ↦ b) ∈ ass′ | i ∉ I′} ⊎ dist
10:     Bused := Bused \ {b}
11:   end if
12: end for
13: return ass′

Therefore, if this optional reshaping step is enabled, the local balancer will constantly try to reduce the number of involved nodes. However, if no reduction is possible, this step triggers no extra load balancing operations.
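The compression heuristic of algorithm 4 can be sketched in Python as follows, again under the simplifying assumptions of scalar load values and a first-fit genHeuristic; all names are illustrative.

```python
# Illustrative sketch of algorithm 4 (load compression): try to empty
# lightly loaded bins by offloading their items onto the remaining bins.

def first_fit(items, bins, vol, cap, ass):
    """Place all items into the given bins on top of the assignment ass;
    returns {item -> bin} or None (corresponding to the Overload result)."""
    free = {b: cap[b] - sum(vol[i] for i, t in ass.items() if t == b)
            for b in bins}
    dist = {}
    for i in sorted(items, key=lambda i: vol[i], reverse=True):
        for b in bins:
            if free[b] >= vol[i]:
                dist[i] = b
                free[b] -= vol[i]
                break
        else:
            return None
    return dist

def compress(items, bins, ass, vol, cap):
    ass = dict(ass)
    used = set(ass.values())
    def load(b):
        return sum(vol[i] for i, t in ass.items() if t == b)
    # Consider bins in increasing order of their assigned load (lines 3-12).
    for b in sorted(used, key=load):
        moved_items = [i for i in items if ass[i] == b]
        rest = used - {b}
        dist = first_fit(moved_items, sorted(rest), vol, cap,
                         {i: t for i, t in ass.items() if i not in moved_items})
        if dist is not None:
            ass.update(dist)   # re-map the offloaded items (line 9)
            used.discard(b)    # bin b is now empty (line 10)
    return ass
```

Starting from three items spread over three amply sized bins, the sketch gradually empties the lighter bins until all items share a single bin.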

4.4.6 Applying Balancing Operations

Finally, after the current state has been analysed and the list of state modifications has been derived, those operations are executed using the workflow concept supported by the algorithm environment (see 3.4). Thereby, all the derived balancing operations are executed in parallel. Further, the algorithm waits until all the operations have finished to avoid interfering with other balancing cycles. This way, the algorithm does not need to consider ongoing operations during the analysis phase. Nevertheless, if the execution of some balancing operation requires an exceptionally long period, a timeout will be triggered, terminating this final step of the cycle. After this, the balancing loop is ready to perform its next iteration.
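The fork-and-join pattern with a timeout could be sketched as follows using Python's standard concurrent.futures module; the function name and the timeout handling are illustrative, as the thesis implementation relies on the workflow concept of the algorithm environment instead.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical sketch: run all derived balancing operations in parallel and
# wait for their completion, giving up after a timeout so that the next
# balancing cycle is never blocked indefinitely.
def apply_operations(operations, timeout_seconds=60.0):
    pool = ThreadPoolExecutor(max_workers=max(1, len(operations)))
    futures = [pool.submit(op) for op in operations]
    done, not_done = wait(futures, timeout=timeout_seconds)
    for f in not_done:
        f.cancel()  # best effort: operations already running will finish
    pool.shutdown(wait=False)
    return len(done), len(not_done)
```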


4.5 The Global Balancing Component

The global load-balancing algorithm works on the highest layer of the balancing hierarchy. Its main responsibilities are negotiating the migration of load between hosters and maintaining the zone duty assignments.

4.5.1 Overview

On the global layer, resources are managed on a per-session basis in a distributed way. For every session, the list of participating sites is maintained as shared information within the network. Further, for each session, one of the contributing hosters is elected to be the session manager. Hence, the global balancer that is part of the algorithm instance managing the selected site becomes a central authority for load balancing decisions involving the associated session. For other active sessions within the system, different session managers are selected. Hence, the responsibility of managing sessions is distributed among all load-balancing peers. In case a node is no longer capable of fulfilling this central role, the responsibility can be handed off to another site contributing to the same session.

The global balancer provides two services. The first supports the movement of load on demand. Whenever a local balancer instance detects that its available resources are insufficient to manage its assigned load, this service can be used to offload a subset of the assigned zone duties to other sites. The second service is based on a periodic evaluation of the session distribution. Thereby, the algorithm checks whether the number of involved hosters can be reduced by concentrating the game session load on a subset of the participating sites. Both services are coordinated by the global balancer instance owning the authority on the corresponding session.

4.5.2 The Global Load Metric

Any peer contributing to the network of global balancers provides information on the current load state of the managed site as well as its free capacities. The load state is represented by the set of assigned zone duties, annotated by the amount of load caused by each of those. The used load metric is based on the metric utilized for local balancing operations (see section 4.4.2). However, instead of using a single vector containing all the consumed resources, an entire set of those vectors is used to model the load caused by a zone duty. For instance, let R be the replicas of a zone duty d. Further, let l : R → V_local define the local load


values for all replicas r ∈ R. The global load value l_duty used to describe the resource requirements of d is then provided by

    l_duty = l(R) = {l(r) ∈ V_local | r ∈ R} ∈ 2^V_local = V_global    (4.8)

Hence, each element within the set represents the load induced by one replica maintained for the corresponding zone duty. The support for the required metric operations, like summing up values, is extended appropriately.

The free capacity of a hoster is represented using the same load metric. However, in this case, each element within the set represents the free resources available on some host capable of supporting a game server for the given session. In general, the capacities offered by peers do not need to correspond to the actually available resources, since a certain percentage might be reserved for internal management operations. The way this reduced offer is derived may be hoster specific. Within the implementation used for this thesis, hosters offer a fixed percentage of the available resources. Like other parameters, the concrete value is part of the algorithm's configuration.

The fine granularity of the selected load metric allows determining more precisely whether a certain site is capable of maintaining a zone duty. An alternative would be to use aggregated load metrics. For instance, the resources consumed by the various replicas involved in managing a zone might simply be summed up. However, in this case, details would be hidden and the probability of deriving zone duty movements that cannot be realized would be increased. Finally, just like the local load metric, the global metric remains exchangeable.
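The following Python fragment sketches the idea behind the set-based metric, assuming resource vectors represented as tuples. A zone duty fits a hoster if every per-replica vector can be matched to a distinct free-capacity vector that dominates it component-wise; the greedy largest-first matching used here is an illustrative simplification and may reject feasible placements in corner cases.

```python
# Illustrative sketch: the load of a zone duty is the list of per-replica
# load vectors, and a hoster's free capacity is the list of per-host
# free-resource vectors (duplicates are meaningful, hence lists, not sets).

def fits(duty, capacity):
    """Greedy check whether every replica vector of the duty can be placed
    on a distinct host vector of the capacity offer."""
    free = sorted(capacity, key=sum, reverse=True)
    for vec in sorted(duty, key=sum, reverse=True):
        for idx, host in enumerate(free):
            if all(f >= v for f, v in zip(host, vec)):
                # place the replica and reduce the host's free resources
                free[idx] = tuple(f - v for f, v in zip(host, vec))
                break
        else:
            return False
    return True
```

An aggregated metric would accept the second example below, although no single host can absorb the replica, which is exactly the imprecision the set-based metric avoids.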

4.5.3 On Demand Duty Movements

Figure 4.10 illustrates the procedure used for moving a zone duty to other sites on demand. It is initiated by an overloaded site, which sends a message to the central coordinator of the corresponding game session (step 1). In the following step, all the sites already participating in the managed game session are asked for their free capacities (2). If one of those is capable of managing the overloaded zone duty, a message including the identified target site is sent to the original hoster (3). The zone duty will then be migrated between the corresponding sites (4). However, if none of the involved participants offers sufficient capacities, the game session needs to be extended to a new site. Therefore, random hosters within the grid are contacted and asked for their free capacities. The results are collected and, as soon as a configurable number of non-empty responses has been received, the site offering the largest capacity is chosen. Due to the basic characteristic of this procedure, it is only used to compensate for growing game sessions. Therefore, a second operation is needed to react to shrinking sessions.

Figure 4.10: Offloading Zone Duties on Demand
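The target-selection logic of this procedure can be sketched as follows, with scalar capacities and illustrative names; the actual messaging between balancer instances is omitted.

```python
import random

# Hypothetical sketch: prefer a participating site with enough free
# capacity; otherwise probe random hosters and pick the one advertising the
# largest capacity once enough non-empty responses have arrived.

def find_target(duty_load, participants, all_hosters, capacity_of,
                probes_needed=3):
    # Steps 2-3: ask sites already participating in the session.
    for site in participants:
        if capacity_of(site) >= duty_load:
            return site
    # Extension: probe random hosters until enough non-empty answers.
    responses = []
    candidates = [h for h in all_hosters if h not in participants]
    random.shuffle(candidates)
    for site in candidates:
        cap = capacity_of(site)
        if cap > 0:
            responses.append((cap, site))
        if len(responses) >= probes_needed:
            break
    if not responses:
        return None
    return max(responses)[1]  # site offering the largest capacity
```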

4.5.4 Periodic State Evaluations

One of the objectives of the global balancing algorithm is to keep the number of hosters involved in maintaining a game session as low as possible. Therefore, the elected session manager periodically tests whether some of the involved sites can be eliminated. Figure 4.11 illustrates the corresponding operations.

Figure 4.11: Reducing the Number of Participants

The periodically triggered procedure starts by collecting load-state information from all sites involved in maintaining the game session (steps 1 and 2). The obtained data is then transformed into a bin model in which the bins represent sites and the items zone duties. On this model, the same compression algorithm as described within section 4.4.5 is applied (3). If a possibility of reducing the number of involved hosters is found, instructions for the necessary modifications are forwarded to the corresponding balancer instances (4). Finally, zone duties will be moved accordingly (5).

4.6 The Session Starter

As described within the overview section of this chapter, a separate part of the algorithm deals with sessions during their start-up phase. The special treatment is required since information used by the load balancing cycle is not available during the creation of new game sessions. As an immediate consequence, an alternative load metric has to be chosen. Since no clients are present within this phase of a game session's life cycle, the CPU and network load will be low right after the completion of the start-up procedure. Therefore, only the memory consumption represents a potential bottleneck. Hence, this type of resource is the only one considered within the used load metric. As usual, the information needed for quantifying the load caused by a new server instance has to be provided by the management services offered by the hosters.

When starting a session, the set of required processes is mapped to the available resources using the generic heuristic of section 4.2.3. The bin model is thereby based on the customized load metric. As usual, the parameters influencing the load distribution pattern are part of the algorithm's configuration. Additional options allow specifying the maximum rate to which nodes are filled as well as a scaling factor for the resources reserved per server instance. The latter can be used to acquire extra memory per process to compensate for the expected growth.

Unlike the start-up phase, shutting down a game session requires no specific considerations. The only thing to do is to remove all the associated game server instances. During the next iteration of the local balancer, the reshaping component will reorganize the load distribution such that the desired pattern is re-established.
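A minimal sketch of this start-up placement, assuming a scalar memory metric; the parameter names follow the configuration options of section 4.7, everything else is illustrative.

```python
# Hypothetical sketch of the session starter's placement logic: only memory
# is considered, predicted requirements are inflated by a scaling factor,
# and nodes are only filled up to a maximum filling rate.

def place_servers(mem_required, node_capacity, max_filling_rate=0.9,
                  load_item_scaling=1.1):
    free = {n: c * max_filling_rate for n, c in node_capacity.items()}
    placement = {}
    # first-fit decreasing over nodes ordered by decreasing capacity
    nodes = sorted(node_capacity, key=node_capacity.get, reverse=True)
    for server, mem in sorted(mem_required.items(), key=lambda kv: -kv[1]):
        demand = mem * load_item_scaling  # reserve room for expected growth
        for n in nodes:
            if free[n] >= demand:
                placement[server] = n
                free[n] -= demand
                break
        else:
            raise RuntimeError(f"no node can host {server}")
    return placement
```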

4.7 Configuration Options

Within the description of the algorithm's implementation provided by this chapter, several situations have been mentioned in which the behaviour of the algorithm


depends on the value of some configuration parameters. Within this section, the set of offered options will be briefly summarized.

Timing Parameters

Parameter                Value Type  Default Value
globalBalancingInterval  Time Span   60 seconds
localBalancingInterval   Time Span   10 seconds

Table 4.1: List of Timing Parameters

Table 4.1 enumerates the set of parameters controlling the timing of the algorithm. The first of those, the global balancing interval, determines the period between two successive state evaluation cycles of the global balancer as described within section 4.5. The corresponding operations aim at minimizing the number of involved grid sites. Hence, keeping the interval between successive checks low reduces the time in which more hosters are involved in maintaining a game session than would be necessary. However, smaller periods also produce higher overhead caused by balancing activities, since data needs to be exchanged more frequently and the corresponding computations have to be performed more often.

The second parameter controls the delay between successive iterations of the local balancing loop as described within section 4.4. Small intervals allow the system to react more quickly to improvable situations, while larger periods cause less overhead. Unlike for the global balancing interval, choosing a too large value for this parameter might lead to more than just bad resource utilization: since no events indicating overloaded resources are triggered by the underlying balancing layer, situations in which the assigned load exceeds the available capacities can only be detected within those cycles. Further, to determine a value for this parameter, the prediction time offset offered by the underlying management services and the time required for performing balancing operations need to be considered. In addition, the time required by the monitoring services of the management layer to derive stable data on the resource consumption of processes has to be taken into account.

Global Balancing Parameters

Besides the timing, the implementation of the global balancing algorithm offers two additional options.
The first determines the number of random hosters probed for available capacities before extending a game session to a new site. The latter specifies the fraction of the locally available resources offered for sessions maintained by other global balancer instances within the peer network. Therefore, this factor allows reducing the advertised capacities, thereby limiting the amount of new load being assigned to grid sites due to global zone duty movements. This way, resources maintained by hosters can be reserved to compensate for situations in which the load of locally assigned zone duties is growing.

Parameter                    Value Type  Default Value
numHostersProbedOnExtension  Integer     3
globalCapacityScaling        Percentage  90%

Table 4.2: List of Global Balancer Parameters

Local Resource Management Parameters

Parameter               Value Type  Default Value
selectionStrategy       Enum        FirstFit
itemOrder               Enum        Decreasing
binOrder                Enum        DecreasingLoad
resourceThreshold       Percentage  90%
maxFillingRate          Percentage  80%
loadCompressionEnabled  Boolean     true

Table 4.3: List of Local Resource Management Parameters

Table 4.3 enumerates the various options influencing the resource management within the local balancing component. The first three parameters serve as input for the generic heuristic presented within section 4.2.3, which is applied whenever a load assignment activity has to be carried out during local balancing tasks. Hence, those parameters affect the load distribution pattern enforced by the system. Further, the resource threshold parameter specifies an upper limit for the load assigned to a single host. In case this limit is exceeded, some of the assigned server instances will be offloaded until the load falls below this limit. The maximum filling rate, on the other hand, limits the capacity of a node to be considered when assigning new server instances. Hence, whenever a new server instance is assigned to some host, the resulting load will not exceed the specified limit. Clearly, to avoid oscillating behaviour, the corresponding configuration value has to be smaller than the resource threshold. Finally, the load compression flag determines whether the optional reshaping step described within section 4.4.5 is applied during the local load


balancing cycle. In case this option is enabled, the resource threshold and the maximum filling rate represent upper and lower limits for the aspired resource utilization. The current version of the algorithm's implementation supports neither site-specific configurations nor alternative patterns for the reshaping step. However, both extensions could easily be integrated if needed.

Zone Replica Management Parameters

Parameter                 Value Type  Default Value
upperSaturationThreshold  Percentage  90%
lowerSaturationThreshold  Percentage  70%
saturationTolerance       Percentage  1%
concessionThreshold       Percentage  95%
concessionLimit           Percentage  50%

Table 4.4: List of Replica Management Parameters

Beside the resource management, the replication management also offers a set of options. The first two define the boundaries for the tolerated average game-loop saturations, as described within section 4.4.3. Further, the saturation tolerance parameter specifies the margin within which replicas are considered to be equally loaded (see figure 4.8).

The final two parameters are considered when searching for a host to manage a new replica of a zone. During the corresponding procedure, the load caused by the new server instance is estimated based on the current amount of resources consumed by the existing replicas. For most resource types, the average requirements of the existing replicas provide a good approximation for this purpose. For instance, the memory required by a new server instance is likely to be similar to the memory required by any other server managing the corresponding zone, since all of them have to maintain the same state information. Any host capable of running the new server instance therefore needs to offer the derived amount of memory. The same holds for the network bandwidth consumption of server instances.

However, this rule of thumb does not hold for CPU requirements. The following example illustrates the particular dilemma resulting from this observation. Consider a fast node running a server instance managing the only replica of some zone. If the server's game loop becomes overloaded, a new replica has to be added, taking over some of the assigned clients. When selecting a target host for the new server instance, the current load of the overloaded server is used. In case all other nodes within the corresponding site are equipped with slower processors, none of them can be selected, since none of them offers sufficient computational resources. However, even a server instance running on a slower CPU would help to reduce the load of the overloaded process. Hence, unlike memory and network bandwidth consumption, the CPU load restriction does not represent a strict yes/no decision. Therefore, some fuzziness had to be added.

This dilemma is resolved by reducing the CPU requirements for new server instances in case the average saturation of the corresponding replicas is getting too high. The two concession parameters determine this reduction. The first of them, the concession threshold, determines the saturation limit above which this reduction becomes effective. The CPU requirements are reduced linearly between the given threshold and a saturation of 100%. The concession limit parameter thereby specifies the maximum reduction applied to the CPU requirements in case the current replicas are fully saturated or even overloaded. While the concession threshold should be slightly above the game-loop saturation threshold, the concession limit needs to be adjusted to match the maximum performance difference of CPUs within the managed domain.

Session Starter Parameters

Parameter          Value Type  Default Value
selectionStrategy  Enum        FirstFit
itemOrder          Enum        Decreasing
binOrder           Enum        DecreasingCapacity
maxFillingRate     Percentage  90%
loadItemScaling    Percentage  110%

Table 4.5: List of Session Starter Parameters

The final set of parameters influences the session starter component. Just as the resource management configuration of the local balancer, the starter setup includes parameters for the generic bin-packing heuristic used to assign server instances to hosts. Those parameters therefore determine the initial load distribution pattern when creating a new game session. The two remaining parameters put additional constraints on the resulting server assignment. As within the resource management, the maximum filling rate limits the total amount of load to be assigned to a single host. In addition, the load item scaling factor supports an artificial increase of the predicted resource requirements retrieved for the new server instances from the underlying management layer services. Both options are intended to provide means to compensate for the expected increase of load as soon as the first clients connect to the newly created game session.


Although some of the parameters for the session starter exhibit the same names as their counterparts considered by the local resource management, all of them can be specified independently from each other.
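The linear reduction controlled by the two concession parameters can be expressed as a small function; this is an illustrative sketch, as the real implementation operates on the vector-valued load metric rather than on a single scalar.

```python
# Hypothetical sketch of the concession rule: the estimated CPU requirement
# of a new replica is reduced linearly once the average saturation of the
# existing replicas exceeds the concession threshold, down to at most
# concession_limit of the original estimate at 100% saturation.

def concede_cpu(cpu_estimate, avg_saturation,
                concession_threshold=0.95, concession_limit=0.5):
    if avg_saturation <= concession_threshold:
        return cpu_estimate
    span = 1.0 - concession_threshold
    progress = min(avg_saturation - concession_threshold, span) / span
    reduction = progress * concession_limit
    return cpu_estimate * (1.0 - reduction)
```

With the default values, a fully saturated zone thus halves the CPU requirement demanded from candidate hosts, allowing slower nodes to take over part of the load.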


Chapter 5

The Simulation Environment

An integral part of the software development process is testing the implemented solution. Therefore, within this project, the functional requirements on the various components have been continuously tested using JUnit [24]. However, although this type of automated testing helps ensure that all the involved components meet their specifications, additional means are needed to evaluate the capability of the solution to balance games. To validate the overall operability of the session management system, and especially to investigate the behaviour of the implemented algorithm when confronted with various scenarios, a testing environment based on a simulated infrastructure has been developed. It emulates the services offered by the underlying management layer, including the characteristics of active game sessions. Therefore, the effects of all the supported balancing operations had to be specified. In addition, load models describing the resource consumption of games have been devised. Details on both, operations and models, will be covered within this chapter.

The first section of this chapter describes the load models applied for simulating the resource consumption of game server instances. Models for CPU, memory and network load values are covered. An overview of the internal organization of the environment simulator is provided within the following section. Further, the effects of the various balancing operations on the simulated entities will be specified. Finally, the last section of this chapter describes how the real-time dependent concepts of the balancing system have been abstracted to be able to simulate scenarios lasting weeks within minutes.

5.1 The Load Models

Within this section, load models for the resource consumption of game server instances will be described. All of them are based on [3, 5]. However, some modifications and extensions have been integrated to satisfy additional requirements.


All of the following models aim at describing the amount of resources consumed by a single game server instance. Thereby, it is assumed that those servers realize the distribution of game load based on the real-time framework mentioned within section 2.1.6. Hence, the load of an entire game session is distributed among the involved servers using a combination of the zoning and replication concepts. Further, each server process is only responsible for managing a single replica of a zone. As a result, all the load models within this section focus on the quantification of the resources required for managing a single zone replica.

An additional assumption has been made for developing the following load models. For simplicity, it is generally assumed that every client is represented by a single avatar within the game world. This premise holds for first-person shooter and role-playing games. However, real-time strategy games do not satisfy this restriction. Nevertheless, the given models might easily be extended to consider multiple avatars per client by introducing an additional parameter describing the average number of entities controlled by each player.

Model Parameters

All of the models expose numerous parameters, which can be separated into two groups. The first set of model parameters describes constants determined by the actual implementation of the game. Values for those parameters might be extracted from real game implementations to tune the load models to fit specific applications. However, those are never affected by any balancing operation. The second set of parameters, on the other hand, describes the state of the zone managed by the server whose load should be approximated by the models. Those parameters include the quantity of clients connected to the corresponding zone or the total number of replicas maintaining it. Obviously, those values are affected by the applied balancing operations. In addition, some of those are affected by the total number of participating players. Therefore, values for those session-dependent parameters need to be provided by the simulation environment itself in order to be capable of using the described models to estimate resource requirements.

For the following models, let C be the number of clients maintained by the modelled server. Further, let N be the total number of players connected to the managed zone. Hence, N corresponds to the total sum of clients assigned to servers managing replicas of the same zone. In addition, let B_E represent the total number of NPCs and items within the maintained fraction of the game world and let A_E be the size of the subset of those managed as active entities

92

by the modelled server instance. Finally, let R represent the total number of replicas managing the maintained zone.

5.1.1 The Memory Load Model

The first model covered within this section describes the memory consumption of server instances. The memory requirements of four different portions of a server process have been considered. First, a constant amount of memory is allocated for static structures required to run the actual game engine. This includes the memory needed to store the program code, session-independent data structures of the engine and the execution stacks of the various involved threads. Within the model, all those zone- and session-independent requirements are represented by the constant parameter m_engine. To maintain an actual zone, memory for handling region-specific information like the position of trees, walls and other static elements is needed. Therefore, a function m_zone(z) is introduced, describing the memory requirements of zone z. Further, the state of dynamic entities within the zone has to be maintained. Thereby, the information associated with all entities within the managed region has to be stored, since the data of both active and shadow entities is required within every cycle. Within the model, the memory required per client is determined by a constant parameter m_c. Further, the average storage used per NPC or item is given by m_b. The resulting load model describing the memory requirements of a server maintaining zone z is therefore given by equation 5.1.

M_total(z) = N · m_c + BE · m_b + m_zone(z) + m_engine    (5.1)

To simplify the task of defining model parameters, it can be assumed that m_zone(z) is approximately constant for the various zones z of a given game map. Therefore, m_zone(z) may be replaced by the constant m_zone. Further, the constant parameters m_engine and m_zone can be combined into a single, game-map-specific parameter m_core. This way, no zone-specific model parameters are required for describing the memory consumption of game servers. The following equation summarizes the resulting simplified model used within the simulator.

M_total = N · m_c + BE · m_b + m_core    (5.2)
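As an illustration, the simplified memory model of equation 5.2 can be sketched in Java as follows. The class and parameter names are illustrative choices, not taken from the thesis implementation.

```java
// Sketch of the simplified memory load model (equation 5.2).
// Class and parameter names are illustrative, not the thesis implementation.
public class MemoryLoadModel {
    private final long mCore;   // m_core: combined engine and map memory (bytes)
    private final long mClient; // m_c: memory per player connected to the zone (bytes)
    private final long mEntity; // m_b: memory per NPC or item (bytes)

    public MemoryLoadModel(long mCore, long mClient, long mEntity) {
        this.mCore = mCore;
        this.mClient = mClient;
        this.mEntity = mEntity;
    }

    /** M_total = N * m_c + BE * m_b + m_core */
    public long totalMemory(int n, int be) {
        return (long) n * mClient + (long) be * mEntity + mCore;
    }
}
```

Note that the model depends on N, the total number of players in the zone, rather than on the locally managed clients C, since the state of all entities has to be stored regardless of which replica manages them actively.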

5.1.2 The Network Load Model

To model the network requirements of a game server, ingoing and outgoing traffic have to be distinguished due to their different characteristics. Hence, different models need to be considered for both directions. The models covered within this section are similar to those described in [5].

Ingoing Network Traffic

The amount of data retrieved within each cycle of the game loop corresponds to the sum of the information collected from the managed clients and the total volume of shadow copy updates received from other nodes. The client data can be modelled by

C · d_cin    (5.3)

where d_cin represents the average number of bytes received from each player per cycle. Updates for shadow entities, on the other hand, can be computed using

((N − C) + (BE − AE)) · d_eu    (5.4)

where ((N − C) + (BE − AE)) computes the number of locally maintained shadow entities and d_eu represents the amount of data exchanged for updating one of those. Hence, the ingoing data volume retrieved per game loop cycle can be derived by the following equation.

D_in = C · d_cin + ((N − C) + (BE − AE)) · d_eu    (5.5)

To obtain the bandwidth consumption in bit/s, this volume must be multiplied by the number of cycles per second (f) and the number of bits per byte. The resulting model for the ingoing bandwidth is represented by the following equation.

B_in = (C · d_cin + ((N − C) + (BE − AE)) · d_eu) · f · 8    (5.6)

Outgoing Network Traffic

The outgoing data stream is composed of the data sent to the clients and the update information distributed for active entities to keep remote shadow copies up to date. The amount of data sent to clients can be modelled using the term

C · d_cout    (5.7)

where d_cout describes the average update package size sent to the involved clients. Further, the amount of data transferred to other servers managing the same zone to keep shadow entities up to date can be modelled using

(R − 1) · (C + AE) · d_eu    (5.8)

where (R − 1) computes the number of additional replicas managing the same zone and (C + AE) computes the number of locally maintained active entities. As within the model for the ingoing network load, d_eu describes the average package length of a shadow entity update message. Hence, the model does not assume support for multicasting protocols. The amount of data sent within every cycle of the game loop can therefore be modelled by the following equation.

D_out = C · d_cout + (R − 1) · (C + AE) · d_eu    (5.9)

Again, this data volume has to be multiplied by the game loop frequency f and the number of bits per byte to retrieve the actual amount of consumed outgoing bandwidth.

B_out = (C · d_cout + (R − 1) · (C + AE) · d_eu) · f · 8    (5.10)
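Both directions of the network model can be sketched together; again, the class shape and parameter names are illustrative rather than prescribed by the thesis.

```java
// Sketch of the network load models (equations 5.6 and 5.10).
// Parameter names are illustrative.
public class NetworkLoadModel {
    private final double dCin;  // d_cin: bytes received per client per cycle
    private final double dCout; // d_cout: bytes sent per client per cycle
    private final double dEu;   // d_eu: bytes per entity update message
    private final double f;     // game loop frequency (cycles per second)

    public NetworkLoadModel(double dCin, double dCout, double dEu, double f) {
        this.dCin = dCin; this.dCout = dCout; this.dEu = dEu; this.f = f;
    }

    /** B_in = (C * d_cin + ((N - C) + (BE - AE)) * d_eu) * f * 8, in bit/s */
    public double ingoingBandwidth(int c, int n, int be, int ae) {
        return (c * dCin + ((n - c) + (be - ae)) * dEu) * f * 8;
    }

    /** B_out = (C * d_cout + (R - 1) * (C + AE) * d_eu) * f * 8, in bit/s */
    public double outgoingBandwidth(int c, int ae, int r) {
        return (c * dCout + (r - 1) * (c + ae) * dEu) * f * 8;
    }
}
```

Two limit cases serve as sanity checks: with a single replica (R = 1) the outgoing traffic reduces to pure client traffic, and when all entities are managed locally (C = N, AE = BE) no shadow updates are received.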

5.1.3 The CPU Load Model

The most complex model describes the CPU usage of game server instances. The model estimates the number of instructions required for computing a single cycle of the game loop. This value is then multiplied by the game loop frequency to retrieve the number of instructions issued by the server process per second. Finally, based on the number of instructions, the specification of the executing processor is considered to derive the CPU time consumption as well as the game loop saturation of the modelled game server.

Estimating Instructions

To estimate the number of instructions per cycle, the various tasks to be performed within every cycle of the game loop are considered. Within the first step, client commands are collected and applied to the corresponding avatar. This can be modelled using

C · i_cmd    (5.11)


where i_cmd corresponds to the number of instructions required to retrieve, verify and apply player commands. Further, the instructions required for sending game state updates back to the clients can be estimated using

C · i_cu    (5.12)

where i_cu corresponds to the number of instructions needed for assembling the update information and forwarding it to the network. Beside the avatars controlled by the connected clients, the assigned NPCs and items need to be updated as well. The corresponding number of instructions can be approximated by

AE · i_bi    (5.13)

where i_bi represents the average number of instructions required to determine and execute activities of NPCs as well as the operations needed for enforcing game rules concerning the remaining items. An additional task to be performed within every cycle of the game loop is to synchronize shadow entities. Therefore, update information needs to be collected and sent to other nodes. The resulting workload can be estimated by

(C + AE) · i_us    (5.14)

where i_us describes the number of instructions needed to extract the update information and to forward it to the network. Furthermore,

((N − C) + (BE − AE)) · i_ur    (5.15)

approximates the number of operations required for receiving and processing shadow copy updates. Thereby, the first term derives the number of maintained shadow entities and i_ur represents the quantity of instructions executed to perform the corresponding operations for one of those. Finally, complex interactions need to be processed within every game loop cycle. The workload caused by a single supported interaction type is modelled based on two steps. The first determines whether a certain interaction is actually happening. For instance, this part tests whether two entities are colliding. Another example would be determining whether an entity is currently attacked by another entity. The number of instructions for this testing step can be estimated using

T(C + AE, N + BE) · i_check    (5.16)

where i_check represents the number of instructions required for performing a single test and T(n, m) specifies the complexity of the testing algorithm based on the number of locally maintained entities n and their total number m. Thereby, T(n, m) describes the necessary number of testing steps to be executed to support the corresponding interaction. For instance, for testing whether entities are close to a certain point in the map like a flag or a healing point, the complexity would be T(n, m) := n, since only active entities need to be considered. On the other hand, determining whether some entity is attacking a locally maintained avatar might have a complexity of T(n, m) := n · m. Even more complex interaction patterns might result in T(n, m) := n · m². Besides detecting the presence of interactions, they also need to be carried out on the active game entities. Therefore, an additional number of instructions estimated by

(C + AE) · i_int · p_int    (5.17)

needs to be executed per cycle. Thereby, i_int corresponds to the number of instructions required for realizing the interaction and p_int ∈ [0, 1] determines the probability of the presence of the described interaction. Multiple interaction types might be evaluated within every cycle of the game loop. However, only the one with the most dominating complexity is considered by this model. Based on the given estimations, the total number of instructions issued by a server per game loop cycle can be approximated by

I_cycle = C · i_cmd + C · i_cu + AE · i_bi + (C + AE) · i_us + ((N − C) + (BE − AE)) · i_ur + T(C + AE, N + BE) · i_check + (C + AE) · i_int · p_int

To reduce the complexity of the model, two simplifications have been applied. First, the parameters i_cmd and i_cu specifying the number of instructions required for handling server-client interactions are combined into the parameter i_cio := i_cmd + i_cu. Further, since the tasks of updating a shadow copy and assembling update information for active entities access and process the same data in a similar way, the parameters i_us and i_ur are unified into a single parameter i_u, which causes the terms (C + AE) · i_us and ((N − C) + (BE − AE)) · i_ur to collapse into (N + BE) · i_u. The resulting model for the number of instructions per cycle is given by the following equation.

I_cycle = C · i_cio + AE · i_bi + (N + BE) · i_u + T(C + AE, N + BE) · i_check + (C + AE) · i_int · p_int


Based on this value, the total number of instructions issued per second by a server process can be estimated by

I_total = I_cycle · f    (5.18)

where f is the tick rate or frequency of the game loop.

Derived Load Metrics

Based on the number of instructions per second, additional metrics are derived. First, the CPU time consumption per second can be computed by

t_total = I_total / S    (5.19)

where S is the single-core speed of the CPU measured in instructions per second. It is thereby assumed that all available cores operate at the same speed. Further, the critical path time can be computed using Amdahl's law [25].

t_critical = t_total · (1 − p) + (t_total · p) / n    (5.20)

Thereby, p is the fraction of the game loop which can be processed in parallel and n is the number of cores available on the executing host. Finally, the game loop saturation of a server process is derived by the following equation.

sat = t_critical / 1 sec    (5.21)

Hence, the CPU load model presented within this section allows describing situations in which multithreaded game server implementations exploit contemporary architectures providing more than a single core.

Tuning Parameters

Various parameters of the CPU load model are hard to derive by analysing the game server code. However, by performing experiments with a concrete implementation, empirical data can be collected. The resulting server characteristic allows adjusting the model parameters to fit the observed behaviour. The model described within this section should thereby offer sufficient flexibility to match any derived characteristics. Further, due to the clear separation of the roles of the various parameters, the task of adjusting them to fit any observed behaviour should not be too difficult.
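The chain from instruction counts to game loop saturation can be sketched as follows. The names, the functional-interface choice for T(n, m) and the quadratic example complexity are illustrative assumptions; only the formulas follow equations 5.18 through 5.21.

```java
// Sketch of the CPU load model: simplified I_cycle, CPU time (eq. 5.19),
// critical path via Amdahl's law (eq. 5.20) and game loop saturation (eq. 5.21).
// Names and the pluggable T(n, m) are illustrative.
import java.util.function.LongBinaryOperator;

public class CpuLoadModel {
    // model constants (instructions per operation)
    private final long iCio, iBi, iU, iCheck, iInt;
    private final double pInt;                       // interaction probability in [0, 1]
    private final LongBinaryOperator testComplexity; // T(n, m)

    public CpuLoadModel(long iCio, long iBi, long iU, long iCheck,
                        long iInt, double pInt, LongBinaryOperator t) {
        this.iCio = iCio; this.iBi = iBi; this.iU = iU;
        this.iCheck = iCheck; this.iInt = iInt; this.pInt = pInt;
        this.testComplexity = t;
    }

    /** Simplified I_cycle for given session-dependent parameters. */
    public double instructionsPerCycle(int c, int n, int ae, int be) {
        return c * iCio + ae * iBi + (double) (n + be) * iU
             + testComplexity.applyAsLong(c + ae, n + be) * (double) iCheck
             + (c + ae) * iInt * pInt;
    }

    /** Game loop saturation t_critical / 1s for cores of speed s instructions/s. */
    public double saturation(double iCycle, double f, double s,
                             double parallelFraction, int cores) {
        double tTotal = iCycle * f / s;              // equations 5.18 and 5.19
        return tTotal * (1 - parallelFraction)       // equations 5.20 and 5.21
             + tTotal * parallelFraction / cores;
    }
}
```

A saturation value above 1 would indicate that the server can no longer complete its game loop within the available time.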


5.1.4 The Host Load Model

In some cases, it is necessary to provide load information for hosts instead of single servers. For instance, when dealing with background load, the overall load of nodes is required. To provide those values within the simulator, the load of the assigned server instances is simply aggregated. Therefore, the total memory consumed on a node equals the sum of the memory requirements of the assigned server processes. Since background load is filtered by the balancing algorithm anyway (see 4.4.2), the presence of other applications consuming resources on simulated nodes has been ignored to reduce the complexity of configuring the simulation environment. Nevertheless, for testing alternative algorithms which treat background load differently, the load model for hosts would have to be adapted. Although this model describing the resource usage on a host is quite intuitive for memory and network loads, it also assumes that the task scheduler of the operating system is capable of efficiently satisfying the computational needs of the managed server processes. Hence, whenever the total CPU load does not exceed the available resources, the scheduler is capable of assigning the managed processes to the maintained cores such that all their timing constraints are satisfied. This might be achieved by using a fine enough scheduling granularity. In case the processor scheduler is not capable of satisfying this premise, the announced computation capacity of the corresponding node may be reduced to limit the assigned load.
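The aggregation itself is straightforward; a minimal sketch (class and method names are hypothetical) could look like this:

```java
// Minimal sketch of the host load model: a host's memory consumption is the
// sum over its assigned server processes. Names are illustrative.
import java.util.List;

public class SimulatedHostSketch {
    /** Aggregates the memory requirements (bytes) of the assigned processes. */
    public static long totalMemory(List<Long> processMemory) {
        return processMemory.stream().mapToLong(Long::longValue).sum();
    }
}
```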

5.2 The Simulated Infrastructure

The general idea of the simulation environment is to provide an implementation of the hoster service interface (see 3.2), which can be used by the balancing system to interact with a simulated infrastructure. The balancing system itself is thereby not aware of the fact that it is not dealing with real game server instances. Due to the clear separation of the management services from the rest of the system, the components of the balancing solution can be tested within the simulator using the same configuration as within a real-world grid environment. A corresponding simulation setup including four sites is illustrated in figure 5.1.

Figure 5.1: The Simulation Environment within the Overall Architecture

Although the architecture of the balancing system has been designed to support the distribution of the various components among multiple nodes, the simulation environment only offers its service to entities within the same JVM. This restriction simplified its implementation. Nevertheless, since all other components of the system can be instantiated multiple times within the same process, arbitrarily complex scenarios can be assembled and tested. Due to the chosen concept, real-world system configurations can be tested within the simulator. However, when performing experiments focusing on the properties of the balancing algorithm, some of the isolated components of the system can be exchanged to improve the speed of execution. For instance, since the entire set of controllers is maintained within a single process, there is no need for RMI calls to realize communications between them. Therefore, for performing experiments, the RMI-based P2P implementation is replaced by an equivalent solution supporting only process-internal communication using standard method invocations. Another modification of the system configuration eliminates the database dependency: the component responsible for mapping data beans and queries to the underlying DB is replaced by an equivalent in-memory solution. The seamless exchange of the utilized storage system is one of the features offered by the information service implementation applied to maintain persistent information. As a result, the entire system including the simulation environment can be packed into a single executable Java archive without any external dependencies.


Further, all the information required for performing experiments is determined by the content of a single XML file. This file includes the load model parameters as well as the system configuration to be used for the simulation. Therefore, experiments can be conducted in batch mode on any Java-enabled host without the need for database or network access.

5.2.1 Internal Organization

Within this subsection, an overview of the internal organization of the simulated infrastructure is provided. It describes how the load models covered within the previous section are integrated to emulate an environment running virtual game sessions. The major task thereby is to provide the necessary session-dependent model parameters, including the number of assigned clients and bots. After providing an overview of the involved entities, the corresponding mechanisms managing those values will be covered.

Component Overview

Figure 5.2: Internal Organization of the Simulation Environment

Figure 5.2 illustrates the essential entities used to realize the virtual environment and some of their relations. A central role is taken by the SimulationEnvironment, which represents a facade for handling the remaining components. Only one instance of this type is created when performing simulations. It is responsible for managing the entire virtual scenery. It also provides the implementation of the hoster service interface offered to the balancing components. Further, it periodically triggers operations refreshing the load values maintained for the various simulated resources. The new values are thereby retrieved using the models covered within the previous section. During the creation of the simulation environment, the information provided by the corresponding configuration is used to instantiate the simulated infrastructure.


Just as within a real grid, the virtual resources are organized within hierarchies. SimulatedHoster instances form the root elements of the corresponding trees. Their main task is to maintain a list of simulated hosts and to provide access to them. SimulatedHost instances themselves maintain a set of simulated processes. Further, information on the deployed games is associated with those objects. Finally, SimulatedProcess instances represent game servers. For each of those instances, the managed zone replica is stored. Further, every simulated process is annotated with a set of values describing the current number of assigned clients and bots. Beside those, information covering a predicted future entity assignment is maintained. While the former values are used to determine the current resource consumption of the emulated process, the latter set is utilized to derive load value predictions. Unlike the entities modelling the infrastructure, SimulatedSession instances represent active game sessions. Objects of this type maintain references to all processes managing zones for the represented session. Based on an associated SessionProfile providing information on how many clients are connected to the various zones of an ongoing game, this entity type is responsible for adjusting the number of clients assigned to the involved processes.

Simulating Client Assignments

Essentially, SimulatedSession instances periodically compare the current number of players assigned to the associated processes with the quantity of clients specified by the dedicated game session profile on a per-zone basis. In case those values do not match, the client assignment is altered to compensate. For instance, if, according to the session profile, more clients than currently managed should be present within some zone, the number of missing clients is determined. The resulting amount of new players is then disseminated equally among the processes managing replicas of the corresponding zone. A similar procedure is applied when the number of clients needs to be reduced. Beside the current number of assigned clients, a value representing the future quantity of managed players is required to support load value predictions within simulations. Therefore, the simulated sessions look up the future number of clients within their assigned profiles. The predicted values are then derived by disseminating the difference between present and future client counts among the involved processes. Thereby, the previously updated current client count values form the foundation for the derived predictions. Finally, simulated session entities also distribute the responsibility of maintaining bots and items equally among the involved processes. The total number of those entities is derived from the simulation configuration on a per-zone basis and remains constant over time. Therefore, the entity distribution within a zone only needs to be updated whenever the number of servers managing the corresponding fraction of the game world changes.

Simulating Resource Consumption

Based on the client and entity assignment, the simulated processes are capable of determining the amount of resources required by the emulated server instances. Thereby, the load models of the previous section are applied. Further, future resource requirements are derived by feeding the predicted client counts into the same models. Finally, simulated hosts can use the updated load values of their assigned processes to refresh their own resource requirements. Hence, to emulate the resource usage within the simulator, the following procedure is periodically triggered. It starts by instructing the simulated sessions to update current and predicted client counts within their associated processes. After the player assignments have been refreshed, the remaining load values are re-evaluated. Therefore, the trees formed by hoster, host and process instances are traversed in post-order, and all the assigned load values are updated. This procedure is based on the assumption that dependencies between load values within those trees only exist in the top-down direction. Fortunately, this property is satisfied by all the simulated resources.
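The equal dissemination of newly arriving clients among the replicas of a zone can be sketched as follows. The helper is hypothetical; the thesis does not specify the exact tie-breaking rule, so this sketch simply assigns each new client to the currently least-loaded replica.

```java
// Hypothetical sketch of disseminating missing clients equally among the
// replicas of a zone (tie-breaking rule is an assumption).
import java.util.Arrays;

public class ClientDissemination {
    /** Distributes `missing` new clients as evenly as possible over the replicas. */
    public static int[] disseminate(int[] currentClients, int missing) {
        int[] result = Arrays.copyOf(currentClients, currentClients.length);
        for (int i = 0; i < missing; i++) {
            // assign the next client to the replica with the fewest clients
            int min = 0;
            for (int j = 1; j < result.length; j++) {
                if (result[j] < result[min]) min = j;
            }
            result[min]++;
        }
        return result;
    }
}
```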

5.2.2 Game Session Profiles

As has been pointed out within the previous subsection, game session profiles are used to specify the number of players participating in simulated game sessions. Each of those profiles provides the number of participating players for every moment of the specified session on a per-zone basis. The source of this information is exchangeable. However, the two essential ways to provide the necessary information within experiments are to use files containing the corresponding data or lazily evaluated functions modelling the number of clients within zones. The latter possibility is used to test specific properties of the algorithm, while the former is applied to simulate real-world game sessions based on collected data. Beside the data sources, the simulator also allows specifying additional on-the-fly profile transformations. This feature allows performing a series of experiments based on slightly varying profiles without the need of generating the necessary trace files. For instance, the number of clients provided by a trace file can be scaled by some constant factor to model sessions of different popularity. Further, profiles can be shifted along their time axis. This could be used to simulate sessions offered within different areas of the world, hence within different time zones. Finally, arbitrary reshaping operations can be applied on the raw data. Figure 5.3 illustrates a simple example of this type of transformation. Thereby, a raw session profile retrieved from some source is transformed by multiplying it with an arbitrarily shaped transformation. The resulting profile allows, for instance, modelling a growing popularity in the first few days after creating a new game session based on profiles missing the corresponding data.

Figure 5.3: Example Game Session Reshaping Transformation (panels: Raw Session Profile, Transformation, Transformed Profile; y-axes: Clients in Zone, Scaling Factor)

Those transformations are applied on the fly when using the profile within the simulator and are configurable together with other session properties within the simulation configuration file. Additional transformations might be added by implementing the corresponding interface.
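The scaling and shifting transformations described above can be sketched as decorators around a profile interface. The interface shape and all names are assumptions for illustration; the thesis only states that transformations are added by implementing a corresponding interface.

```java
// Sketch of on-the-fly profile transformations: a session profile maps
// (zone, time) to a client count, and transformations wrap such profiles.
// Interface and method names are illustrative assumptions.
interface SessionProfile {
    int clientsInZone(int zone, long timeMillis);
}

class ScaledProfile implements SessionProfile {
    private final SessionProfile base;
    private final double factor;

    ScaledProfile(SessionProfile base, double factor) {
        this.base = base;
        this.factor = factor;
    }

    @Override
    public int clientsInZone(int zone, long timeMillis) {
        // scale the raw client count to model a different popularity
        return (int) Math.round(base.clientsInZone(zone, timeMillis) * factor);
    }
}

class ShiftedProfile implements SessionProfile {
    private final SessionProfile base;
    private final long shiftMillis;

    ShiftedProfile(SessionProfile base, long shiftMillis) {
        this.base = base;
        this.shiftMillis = shiftMillis;
    }

    @Override
    public int clientsInZone(int zone, long timeMillis) {
        // shift along the time axis, e.g. to model another time zone
        return base.clientsInZone(zone, timeMillis - shiftMillis);
    }
}
```

Because each transformation again implements the profile interface, arbitrary reshaping operations can be stacked in the same way.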


5.2.3 Simulating Balancing Operations

Finally, the effects of the supported balancing operations on the virtual infrastructure have to be specified. The execution of those operations is managed within the simulation environment facade. All of them are performed asynchronously within a separate thread since, beside their effects, the time required to perform the necessary actions is simulated as well. Hence, for instance, the task of adding a new game server blocks for a certain amount of time to emulate the delay introduced by starting up a real process. The corresponding time span can be specified within the simulation configuration file. Within the remaining parts of this section, the effects of the four main operations demanded by the hoster service interface will be covered.

Adding Replicas

This operation starts by checking whether all its preconditions are satisfied. If one of them is violated, for instance if the game map to be maintained by the new server instance is not deployed on the targeted host, the operation is immediately aborted. Otherwise, a new simulated process is initialized and assigned to the corresponding virtual node. Simultaneously, the new process is registered with the session it is part of. Finally, the operation suspends its execution for a given amount of time to simulate the delay introduced by starting up a new server instance. As soon as this sleep operation is finished, the process of adding a new replica is completed. The moment a new process is announced to the corresponding simulated session, the bot and item responsibilities are reassigned. Hence, the new server will immediately be responsible for some entities. However, none of the clients are reassigned. An additional re-balancing operation has to be submitted to accomplish this task. This allows the balancing system to decide from where the load should be acquired. Nevertheless, the new process participates in managing the zone. Therefore, whenever new clients arrive, the new process gets a subset of those assigned.

Removing Replicas

This balancing operation starts by locating the simulated process managing the replica to be removed. If found, the sleep operation emulating the time required for shutting down the server is executed before the process is actually removed from the environment. Hence, processes consume resources until they are discarded. Further, the associated simulated session is informed about the elimination of one of its processes. The session compensates for the removed server instance by distributing its assigned clients, bots and items equally among the remaining replicas of the same zone.

Redistributing Load

The load redistribution operation is realized by reassigning clients according to the ratio specified through one of the provided parameters. Therefore, all the involved simulated processes are located and their current client count values are updated to match the given load distribution. Like the procedure realizing the elimination of a replica, this operation emulates the time required for executing the necessary modifications before they finally become effective.

Retrieving the Current State

This operation allows retrieving a snapshot of the current state of the system. Within the simulated environment, it simply collects the required data from the maintained entities. Therefore, the tree representing the targeted hoster is traversed. Further, a small delay is emulated by suspending the executing thread for a predetermined period before finally returning the collected information. This delay simulates the latency introduced by transferring data between the management services and the balancing system. As usual, its duration is determined by the simulation configuration.
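The asynchronous execution with a simulated delay can be sketched as follows. All names and the ordering (register first, then sleep for the configured start-up delay) mirror the description of the add-replica operation above; the helper itself is hypothetical.

```java
// Hypothetical sketch of an asynchronously executed balancing operation whose
// duration is simulated: the replica is registered immediately (it consumes
// resources right away), then the operation blocks for the start-up delay.
import java.util.concurrent.CompletableFuture;

public class BalancingOperationSketch {
    /** Emulates adding a replica; completes after `startupMillis`. */
    public static CompletableFuture<Void> addReplica(long startupMillis, Runnable register) {
        return CompletableFuture.runAsync(() -> {
            register.run(); // assign the new process to host and session
            try {
                Thread.sleep(startupMillis); // simulated server start-up delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
```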

5.3 The Simulation Engine

The provided simulation environment allows testing the algorithm's behaviour within various situations. However, one final problem needed to be solved to perform experiments efficiently. The entire balancing solution is designed to provide a continuous service capable of managing game sessions. To do so, all essential operations are executed periodically. For instance, every cycle of the local balancing loop is triggered by a timer (see section 4.4.1). The same is valid for some operations performed by the global balancer component. Further, as has been described within the previous sections, several mechanisms within the simulator build upon time-aware concepts. It follows that the system, as it has been defined so far, processes simulated scenarios in real time. This limitation would render the simulation environment unsuitable for performing experiments. However, since this problem has been anticipated from the beginning of the project, the entire solution uses a small set of abstract utilities to perform time-dependent operations. Hence, a layer of abstraction is placed between the system and the underlying notion of time. The basic implementation of the so-called timing environment is based on standard Java components using the notion of real time. It is intended to be used when running the system within a productive environment. However, to reduce the execution time of simulations, an alternative implementation using a virtual notion of time has been developed. This variant is based on the concept of discrete event simulations.

5.3.1 Abstracting Time To isolating applications from real-time, time depending operations needed to be identified. Within the balancing system, three types of such operations have been found. A very frequently use time-related operation is simply about requesting the current system time. This operation is for instance used throughout the system to produce timestamps for logging or monitoring data. Another kind of time-depending operation is used to realize delayed or periodic executions of tasks. The provided features are for instance used to trigger periodic balancing cycles within the algorithm or regular state updates on the simulated environment. Finally, the last type of time-related operation is used to suspend threads for a predetermined period. Within the system, such sleepcommands are only used for simulating the latency of balancing operations. Nevertheless, beside the remaining identified operations, those as well have to be supported by the interface defining the abstraction between the system and the underlying notion of time. Besides defining required functionality, the timing environment interface is also specifying some basic properties of time, which must not be violated by implementations. For instance, one of those is determining that time must never appear to go backward. Another, more subtle requirement is that in any case time must eventually proceed. Applications may build on this property. For instance, a thread performing a sleep operation is essentially deadlocked until the progress of time is releasing it. Real-time is naturally satisfying this property. However, implementations based on some virtual notion have to ensure the progress of time, even if the entire managed system is blocked.

5.3.2 An Extended Discrete Event Simulator
To reduce the time required for performing experiments using the simulator described within this section, a simulation-time based implementation of the abstract utility set specified by the timing environment interface has been developed. It is based on the concept of discrete event simulation, illustrated within figure 5.4.


Figure 5.4: Basic Concept of Discrete Event Simulations

To simulate the tasks executed when running a program, the entire control flow is interpreted as a chronological sequence of events. Each event is annotated with a timestamp determining its execution time and the actual task it is representing. To run simulations, the simulation engine maintains a priority queue including all events still to be executed and a virtual clock representing the current simulation time. The events within the queue are ordered according to their execution time.
The simulation is performed within a loop. During every cycle, the event with the lowest execution time is retrieved from the queue. After the simulation time is updated to match the timestamp of the obtained event, the corresponding task is executed. This may cause new events to be added to the event queue. As soon as all the necessary operations have been completed, the simulation continues with the next event.
Within the scenarios simulated by the engine described in this chapter, the majority of events are generated by periodically executed tasks. Hence, performing a balancing cycle or updating the simulation state corresponds to one of those events. However, additional actions are scheduled based on the scenario described within the experiment configuration. For instance, events creating or closing sessions are integrated this way. Finally, jobs realizing a wakeup call for sleeping threads can also be found within the event queue. However, additional modifications had to be applied to this basic event simulation concept to support sleep operations within events.
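The loop described above can be sketched in a few lines; the names and the periodic task are illustrative only:

```java
import java.util.PriorityQueue;

// Minimal sketch of the described discrete event loop: events carry a
// timestamp, a priority queue orders them by execution time, and a virtual
// clock jumps to each event's timestamp before its task runs.
public class EventLoopDemo {
    static class Event implements Comparable<Event> {
        final long time; final Runnable task;
        Event(long time, Runnable task) { this.time = time; this.task = task; }
        public int compareTo(Event o) { return Long.compare(time, o.time); }
    }

    static final PriorityQueue<Event> queue = new PriorityQueue<>();
    static long clock = 0; // virtual simulation time

    public static void main(String[] args) {
        // a periodic task re-scheduling itself, as balancing cycles do
        queue.add(new Event(0, EventLoopDemo::cycle));
        while (!queue.isEmpty()) {
            Event e = queue.poll();   // event with the lowest execution time
            clock = e.time;           // advance the virtual clock to the event
            e.task.run();             // may enqueue further events
        }
        System.out.println("final clock: " + clock);
    }

    static void cycle() {
        if (clock < 30)               // stop re-scheduling after three cycles
            queue.add(new Event(clock + 10, EventLoopDemo::cycle));
    }
}
```

Because the clock jumps directly between event timestamps, idle periods cost no wall-clock time, which is the source of the speed-up over the real-time variant.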
Supporting Sleep Operations
The basic idea behind the realization of sleep operations, hence the temporary suspension of the execution of a thread, is to block it until it is released by a wakeup call. This call is triggered by an event scheduled within the event queue to be executed at the time the thread is intended to continue its work. However, combining this approach with the simple event loop concept would lead to a deadlock as soon as the first sleep operation is executed within an


event. The problem is caused by the fact that the event loop thread itself would be blocked until a wakeup signal is received. However, the only source for such a signal is the affected thread itself.
To solve this problem, the execution of the actual event is performed by a separate thread. The event loop thread monitors the execution and, in case a sleep operation is triggered, only the event-executing thread is suspended. In addition, the event loop thread is informed about the paused event execution. It therefore continues processing events within the event queue, eventually reaching the event representing the wakeup call for the external thread. As soon as this wakeup event is executed, the corresponding suspended thread will continue its work. Figure 5.5 illustrates this concept.


Figure 5.5: Extended Event Simulation Concept

The threads required for realizing this solution are maintained within a pool. Assuming this pool can grow, an arbitrary number of suspended event-executing threads can be handled at the same time. However, due to the principle of the discrete event simulation, at any time only one of the involved threads is actually running. The remaining threads are either idle or suspended. Thereby, suspended threads preserve their context information, hence the execution state of an event to be continued in the future. Since this information cannot be backed up and restored within standard Java, no equivalent solution could be realized within a single thread.
Besides handling sleep events, this mechanism also allows supporting events depending on each other through synchronization mechanisms. Typically, such situations would not exist within a standard discrete event simulator. However, since the events handled by this implementation are not specifically designed to be executed within such an environment, dependencies may arise. For instance, some event may produce data consumed by another task. In case the latter is executed prior to the producer event, the system would block since the data will never be provided. However, when executing events within external threads, this blocked situation is detected by the event loop thread and the simulation


continues its work. This corresponds exactly to the behaviour within a real-time based environment. In case a thread is waiting for some data, real time will progress as well.
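The detection of a blocked event-executing thread can be illustrated with a simplified sketch (hypothetical; the actual system monitors a full job hierarchy and uses its own scheduling): the event-loop thread polls the worker's state and proceeds once the worker has either finished or blocked.

```java
import java.util.concurrent.CountDownLatch;

// Simplified sketch of the monitoring idea: the event runs on a separate
// worker thread; the event-loop thread waits until the worker has either
// finished or blocked, and only then continues with the next event, which
// eventually delivers the wakeup call.
public class SleepEventDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch wakeup = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            try {
                System.out.println("event part 1");
                wakeup.await();   // "sleep" until a wakeup event fires
                System.out.println("event part 2");
            } catch (InterruptedException e) { /* ignore in this sketch */ }
        });
        worker.start();

        // event-loop thread: poll until the worker is blocked or done
        while (worker.getState() != Thread.State.WAITING
                && worker.getState() != Thread.State.TERMINATED) {
            Thread.sleep(1);
        }
        System.out.println("worker blocked, loop continues");

        // later, the scheduled wakeup event releases the worker
        wakeup.countDown();
        worker.join();
        System.out.println("done");
    }
}
```

The `CountDownLatch` stands in for the wakeup event; in the real engine the release is itself an entry in the event queue.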

Multithreaded Events
An additional modification was necessary to cope with a specific property of the handled events. For performance reasons, several of the involved tasks are composed of subtasks, which are executed in parallel within different threads. All those sub-jobs may depend on each other or may use the offered sleep operation support. Hence, the capability of the event loop thread to monitor a single task running within an external thread had to be extended to consider an entire job hierarchy. Thereby, the event loop is only allowed to continue with the next event if the current task has been completed or all its active synchronously invoked subtasks are blocked for any reason. Figure 5.6 illustrates a simple example of such a situation in which subtasks have to be considered to determine the state of an event execution.

Figure 5.6: Supporting Complex Event Tasks

To realize this extended event monitoring, additional information on the dependencies between the executed tasks and their sub-jobs has to be collected. Therefore, the necessary runtime data is transparently obtained whenever an event is starting another subtask. The results can be used to construct the job hierarchies needed for monitoring the execution of events.
All the event-monitoring activities are performed transparently for the actual tasks. Hence, the code built upon the provided implementation of the timing environment utilities is not required to offer any specific functionality. Therefore, the simulation-time based timing environment can be integrated without any difficulties. By using it instead of the basic real-time variant, simulations covering weeks can be performed within minutes.


Chapter 6
Experiments

Within this chapter, the abilities of the session management solution developed for this thesis will be demonstrated. Therefore, the simulation environment described within the previous chapter is used to confront the overall management framework, including the algorithm of chapter 4, with various scenarios.
The three experiments presented within this chapter are ordered according to their scope. The first of those focuses on the capability of the algorithm to balance the load assigned to server instances managing replicas of the same zone. Therefore, the maintenance of a single zone within a heterogeneous infrastructure is simulated and the resulting game-loop saturation values are observed. The narrow view of a single zone is widened by the following experiment to the scope of a medium-sized session maintained within a heterogeneous environment consisting of multiple grid sites. Thereby, besides other properties, the algorithm's behaviour when confronted with rapid load changes is investigated. The final experiment tests the algorithm's abilities within a real-world scenario. The aim of this experiment is to provide information about the expected behaviour of the system when managing actual game sessions under realistic conditions. Thereby, the results will demonstrate the potential savings that can be achieved by applying dynamic session management strategies for providing massively multiplayer online games.

6.1 The Weighting Mechanism

This experiment investigates the ability of the replication weighting mechanism covered within section 4.4.3 to balance the game-loop saturation within multiple replicas of the same zone. Therefore, the algorithm is confronted with a heterogeneous environment within which every node is equipped with processors of varying speed. Hence, when running server instances on different hosts, an uneven load distribution is required to achieve balanced saturation values. To show the algorithm's reactions, the weight adjustments and the resulting game-loop saturations are illustrated.


Besides the balancing of the game-loop saturations, this experiment also demonstrates the effects of the concept applied for adjusting the number of involved replicas as it has been described within section 4.4.3.

6.1.1 The Experiment Setup

The Game Session
Since the mechanism to be investigated by this experiment operates on a per-duty basis, the conducted simulations only needed to involve a session consisting of a single zone. For this zone, the simple artificial game session profile illustrated within figure 6.1 has been generated.

Figure 6.1: Game Session Profile used for the First Experiment

The profile quickly increases the number of clients within the simulated zone until a peak of one thousand clients is reached. After this, the clients leave as fast as they have arrived. The aim of this profile is not to model any real-world situation. Its objective is to force the algorithm to create additional replicas for the simulated zone and to balance the load associated with the corresponding processes. Further, the entire increase and decrease is carried out unnaturally fast, within 10 minutes. This short period allows investigating the load assignment and weighting changes in more detail since those usually occur within less than a minute. Since the decisions of the weighting mechanism are entirely based on snapshots of the system, the actual shape of the game session profile has no effect on the results as long as there is some load that can be distributed.

The Environment
Since the experiment aims at observing the load distribution between game server instances running on heterogeneous nodes, the environment consists


of a single site offering three nodes equipped with different processors. The corresponding specifications are listed within table 6.1.

Host     Core Speed [MIPS]   Number of Cores
Host A   8000                1
Host B   7000                2
Host C   5000                4

Table 6.1: Processor Specifications used for the First Experiment

To challenge the weighting mechanism, the provided nodes all offer a different number of cores. Further, every processor exhibits a different core speed. Hence, to achieve the same game-loop saturation within processes running on different nodes, load has to be distributed unevenly.

Load Model and Game Parameters
Since this experiment focuses on the game-loop saturation of server processes, the CPU load model used to derive the corresponding values is of great importance. The corresponding model parameters have been adjusted such that three almost fully loaded server instances are required to manage the maximum number of 1000 clients. Further, the parameters of the memory load model have been configured such that at most one server instance can be created per host. This way it could be ensured that the algorithm does not assign multiple server instances to the same node, thereby circumventing the necessity of balancing the load between instances running on processors of different speed. Finally, the parameters of the network load models have all been set to zero since those are not important for this experiment.
In addition, further game-specific properties had to be determined. The time required for starting up a new server instance has been set to 12 seconds. Further, removing instances requires 8 seconds and moving load between existing instances takes 4 seconds.

Algorithm Configuration
To perform simulations, the algorithm configuration needed to be specified for this experiment. Most of the parameters supported by the algorithm's implementation have no effect on the mechanism investigated by this experiment. Nevertheless, the upper and lower saturation thresholds as well as the saturation tolerance are important. The corresponding configuration values have been set to 90%, 80% and ±1%. Further, the local balancing interval has been set to the default value of 10 seconds.
This option is important since it


determines the frequency at which the load state is evaluated and weights are updated. Hence, beside the time needed to perform balancing operations, this property determines the time required to reach a balanced situation, sometimes referred to as equilibrium.

Varied Parameters
To challenge the weighting mechanism with different scenarios, the parallel fraction of the game loop is varied within this experiment (see the CPU load model described in section 5.1.3). Since all of the involved hosts offer a different number of cores, the load distribution ratio corresponding to equal saturation levels within all involved processes is heavily influenced by this factor. However, the value of this model parameter is not considered by the weighting mechanism. Therefore, it represents one of the hidden, unknown variables the abstract concept of modifying weights is trying to compensate for. For the experiment, results for 0%, 25%, 50% and 100% parallel executable game engines are evaluated.

Observed Metrics
To demonstrate the capability of the algorithm to balance game-loop saturations, the corresponding values of the involved server instances are observed. If the algorithm is able to perform its task, those values should eventually converge. Further, to illustrate the internal weight modifications, the sequences of weights derived by the investigated mechanism are presented. The resulting graphs will demonstrate the mechanism's ability to correct weights by learning from the effects of previous modifications.

6.1.2 Experiment Results

Figure 6.2 illustrates the measured game-loop saturations within the processes involved in managing the investigated zone, based on a strictly sequential execution. Every 10 seconds a sample has been taken.
In the beginning of the simulation, a single process on host A is handling the entire load. As soon as its saturation exceeds 90%, an additional replica on host B is started (t = 190 sec). It takes some time for the new replica to become available. During this time, the saturation values are low. However, as soon as the second process is ready, both of them exhibit the same game-loop saturation (t = 210 sec). This is caused by the fact that the initial weightings for new replicas are evaluated based on the core speed of their associated host and the weight/core-speed ratio of the already present processes. Thereby it is assumed that game servers are executing their


tasks strictly sequentially, which holds for almost all contemporary game engines. As a result, when managing game server implementations exposing a parallel fraction of 0%, the initial weights derived for new server instances correspond to the values needed for establishing balanced game-loop saturations.
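The described initial-weight derivation can be stated concretely; the numbers below are illustrative, not taken from the experiment:

```java
// Sketch of the described initial-weight derivation: a new replica inherits
// the weight/core-speed ratio of the already present processes.
public class InitialWeightDemo {
    public static void main(String[] args) {
        double weightA = 100;   // weight of the existing replica on host A (illustrative)
        double speedA  = 8000;  // core speed of host A [MIPS]
        double speedB  = 7000;  // core speed of host B [MIPS]

        // preserve the weight/core-speed ratio for the new replica on host B
        double weightB = speedB * (weightA / speedA);
        System.out.printf("initial weight for host B: %.1f%n", weightB); // 87.5
    }
}
```

For a sequential engine this ratio-preserving guess is already the equilibrium weight; for parallel engines it is only a starting point that the feedback loop must correct.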

Figure 6.2: Observed Game-Loop Saturations for sequential Game Loops

As the experiment continues, the average saturation of the two replicas trespasses the upper saturation threshold a second time and a third replica is added on host C (t = 240 sec). Again, the game-loop saturations match right after the new process becomes fully available.
After the peak of 1000 clients (t = 300 sec) has been crossed, the load is reduced continuously. As soon as it is safe to remove one of the involved replicas without thereby exceeding the lower saturation limit within the remaining server instances, the corresponding operation is carried out. Since the elimination of a replica requires some time, the resulting saturation temporarily remains even below this limit. Further, while reducing the number of replicas, the game-loop saturations remain balanced since the relative load ratio between the remaining server instances is preserved.

Figure 6.3: Weight Adjustments for sequential Game Loops


Figure 6.3 illustrates the sequence of weights derived during this first simulation. As has been described, the initial weights already match the weights corresponding to the equilibrium. Hence, no further modifications had to be applied.

Figure 6.4: Observed Game-Loop Saturations for 25% parallel Game Loops

Figure 6.5: Observed Game-Loop Saturations for 50% parallel Game Loops

Figure 6.5: Observed Game-Loop 50% parallel Game Loops Host A Saturations Host B Host for C 125% 100% 75%

50% 25% 0%

0

60

120

180

240

300

360

420

480

540

600

Session Time [sec]

Figure 6.6: Observed Game-Loop Saturations for 100% parallel Game Loops Figures 6.4 to 6.6 illustrate the saturation developments using alternative values for the parallel fraction parameter. The higher this value is the more out of balance are the initial weights derived for new replicas. Hence, more time and modification steps are required to reach the equilibrium. Nevertheless, this goal is eventually always achieved.


Within the 100% parallel case, the resources of the first two nodes are sufficient to manage the entire load. Therefore, no additional third process needed to be started.

Figure 6.7: Weight Adjustments for 25% parallel Game Loops

Figure 6.8: Weight Adjustments for 50% parallel Game Loops

Figure 6.9: Weight Adjustments for 100% parallel Game Loops

The corresponding weight modifications performed to reach the balanced state at the end of every simulation are shown within figures 6.7 to 6.9. Especially in the last case, the linearly growing weight modifications can be observed. Since the initial weight of the second process is far off the value required for establishing the equilibrium, both values are continuously updated by the control loop formed


by the saturation measurement, the weighting mechanism and the balancing operations until the resulting game-loop saturations match. Thereby, the growing rate at which weights are adjusted can be observed. The figure even illustrates a situation in which the weights are altered by too large a value within the final step, such that an additional correction in the opposite direction is required.
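The overshoot-and-correct behaviour can be reproduced with a generic feedback loop of this kind. The update rule below (doubling the step while the correction keeps its direction, resetting it after an overshoot) is a simplified stand-in for the thesis's actual weighting mechanism, and all numbers are illustrative:

```java
// Generic illustration of a measure-and-adjust control loop (a simplified
// stand-in, NOT the thesis's actual update rule). Host B runs a fully
// parallel engine on two cores, which is hidden from the mechanism, so the
// initial weights are off and must be corrected iteratively.
public class WeightLoopDemo {
    public static void main(String[] args) {
        final double effectiveSpeedA = 8000;      // host A: one core at 8000 MIPS
        final double effectiveSpeedB = 2 * 7000;  // host B: two cores at 7000 MIPS (hidden)
        double weightA = 100, weightB = 87.5;     // initial weights assume sequential engines
        double step = 2, lastSign = 0;
        int adjustments = 0;

        for (int cycle = 0; cycle < 100; cycle++) {
            // weights determine the load split; saturation ~ assigned share / speed
            double satA = weightA / effectiveSpeedA;
            double satB = weightB / effectiveSpeedB;
            if (Math.abs(satA - satB) / satA < 0.01) break;  // within tolerance
            double sign = Math.signum(satA - satB);
            step = (sign == lastSign) ? step * 2 : 2;  // grow step; reset after overshoot
            lastSign = sign;
            weightA -= sign * step;                    // shift weight toward the
            weightB += sign * step;                    // less saturated replica
            adjustments++;
        }
        System.out.println("balanced after " + adjustments + " adjustments");
    }
}
```

Running the sketch shows the same pattern as figure 6.9: growing steps, one overshoot, and a small correction in the opposite direction before the saturations match.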

6.1.3 Conclusion

The conducted experiments have shown that the weighting mechanism is capable of balancing the game-loop saturation of replicas managing the same zone within a heterogeneous environment. The mechanism is especially effective in case the game loop is executed sequentially. However, the experiments also demonstrated the ability of the concept to deal with hidden variables. Hence, the algorithm is capable of compensating for incorrect model assumptions by learning from the effects of previous weight modifications.
Finally, in case multithreaded game engines become common, the parallel fraction may also be considered within an improved variant of the implementation whenever deriving initial weights for new replicas. Thereby, the time required to reach a balanced state for parallel game server instances would be significantly reduced. However, the abstract high-level weighting mechanism would still be required to deal with additional hidden variables.

6.2 Rapid Load Changes

The second experiment to be covered within this chapter demonstrates the system's ability to manage entire game sessions under harsh conditions. Thereby, various aspects of the balancing problem are investigated. It shows the effects of interactions on the global level as well as the results achieved by the resource management on the site level of the balancing hierarchy. Further, the experiment has been designed to evaluate the presented solution's ability to deal with changing resource limitations. Therefore, the game profile and load models are tuned such that during some periods of the simulated session the memory requirements represent the limiting factor, while during others the CPU load forms the bottleneck. In addition, the system's capability of minimizing the number of involved nodes and hosters is demonstrated. Further, situations in which resources or replicas become overloaded are examined in detail. Finally, the influence of the prediction service offered by the underlying management layer is investigated within this section.


6.2.1 The Experiment Setup

The Game Session
For this experiment, an artificial game session profile has been generated. The profile covers 20 zones. Thereby, the number of clients within each of the simulated zones corresponds to the profile illustrated within figure 6.10.

Figure 6.10: Game Session Profile used for the Second Experiment

The profile describes a game session within which the number of assigned clients is raised from zero to full load within 15 minutes. After this, the same amount of time is spent on eliminating all of the participating clients again. This cycle is repeated twice. Just as within the previous experiment, this artificial pattern does not intend to model any real-world situation. Its purpose is to stress the algorithm such that various interesting aspects can be investigated.
Besides the rapid increase and decrease of the game load, the fact that all 20 zones expose the same profile introduces additional challenges. Due to this characteristic, all zone duties need to be extended by additional replicas at approximately the same time. Only the fact that the environment is somewhat heterogeneous introduces a minor dispersion. The same effect appears when reducing the number of replicas involved in managing a zone of the simulated game session.

The Environment
For the experiment, a moderate-sized, heterogeneous environment has been defined. The foundation is provided by two different types of nodes. Their characteristics are specified within table 6.2. Based on those nodes, three grid sites are assembled, each representing the domain of another hoster. The first of those consists of eight nodes of type A as well as seven of type B. The second site contains 12 nodes, 6 of each type, and the last hoster contributes 10 nodes of type A.


Resource                   Host Type A   Host Type B
Core Speed [MIPS]          8000          6000
Number of Cores            2             4
Memory [GiB]               2             4
In-Going Network [Gbps]    1             1
Out-Going Network [Gbps]   1             1

Table 6.2: Host Type Specifications used for the Second Experiment

Load Model and Game Parameters
The parameters for the involved load models have been adjusted such that, on the one hand, the resulting load values emulate somewhat realistic conditions while, on the other hand, they allow observing the aspects this experiment is focusing on. The derived parameter values are enumerated within table 6.3.

Parameter Name                       Symbol    Value
Game-Loop Frequency                  f         25 Hz
Parallel Fraction                    p         0%
Instructions per Client IO           icio      515000
Instructions per NPC/Item            ibi       8000
Instructions per Shadow Update       iu        10000
Instructions per Interaction Check   icheck    70
Instructions per Interaction         iint      2000
Interaction Probability              pint      60%
Interaction Check Complexity         T(n, m)   n × m
Memory for Core System               mcore     400 MiB
Memory per Client                    mc        25 KiB
Memory per NPC/Item                  mb        2 MiB
Data per Client Command              dcin      82 byte
Data per Client Update               dcout     160 byte
Data per Entity Update               deu       100 byte

Table 6.3: Load Model Parameters used for the Second Experiment

The defined memory load model uses a moderate amount of memory for the actual game engine and for the static information to be maintained for managing a zone. Compared to this, the data volume consumed per client is rather low. However, the selected value should be sufficient to store information like an avatar's position, its health state and its set of collected items. Finally, the memory required to maintain an NPC, which is controlled by artificial intelligence routines, has been chosen rather high. Nevertheless, the actual values are


not that important for this experiment. The resulting memory consumption characteristic in relation to other resource requirements, however, is.
The parameters for the network load values have been derived from a study conducted on the bandwidth requirements of Quake 3 (see [26, 27]). Finally, the CPU load model parameters have been adjusted such that three replicas running on processors matching the specification of host type B are capable of managing the maximum number of 1000 clients at a saturation of 85%.
Beside the load model parameters, several game properties have been fixed for this experiment. For one, the game loop performs its tasks sequentially at a tick rate of 25 Hz. Further, within the simulations conducted for this experiment, it requires 12 seconds to start up a new game server as well as 4 seconds to eliminate an existing instance. Finally, reassigning clients to match a specified load distribution ratio requires one second.

Algorithm Configuration
Table 6.4 summarizes the algorithm configuration used for conducting the experiments described within this section. Properties for which the default value has been used are omitted to reduce the number of rows within the table.

Parameter Name             Value
globalCapacityScaling      80%
resourceThreshold          95%
maxFillingRate             90%
upperSaturationThreshold   80%
lowerSaturationThreshold   79%

Table 6.4: Customized Algorithm Configuration for the Second Experiment

For the experiments, the lower and upper resource usage boundaries have been increased to 90% and 95%, respectively. Further, the upper game-loop saturation threshold has been set to 80% to compensate for the sharp increase of the number of clients within the game profile. Since increasing and decreasing load phases are strictly separated, the lower saturation threshold has been set to 79%. This way, the number of processes involved in managing a zone is reduced early and oscillating behaviour can be avoided. Finally, the global capacity scaling factor has been set to 80%.

Varied Parameters
To investigate the influence of resource usage predictions, the simulation is conducted twice based on almost equivalent configurations. However, within one of


the setups, the prediction feature is enabled using an offset of 2 minutes, while within the other no resource usage predictions at all are offered to the algorithm. Therefore, by comparing the results of the two simulations, the effect of predictions on the capabilities of the balancing system can be observed.

Observed Metrics
Within the simulations performed for this section, the amounts of resources consumed on all the involved nodes are recorded every 10 simulated seconds. In addition, the number of assigned clients and game entities as well as the current game-loop saturation is stored for every running game server instance at the same rate.
From the collected raw data, the number of involved nodes or hosters managing the simulated game session can be extracted. Further, the utilization of the allocated resources can be obtained. Thereby, the utilization util_x(t) of any resource x ∈ {CPU, Memory, NetIn, NetOut, ...} at some time t is defined by

    util_x(t) = use_x(t) / total_x                                          (6.1)

where use_x(t) represents the amount of resources of type x occupied at time t and total_x corresponds to the total available capacity. The overall utilization util(t) of a host is then derived by the following equation:

    util(t) = max(util_Memory(t), util_CPU(t), util_NetIn(t), util_NetOut(t))   (6.2)
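Equations 6.1 and 6.2 translate directly into code; the resource values below are illustrative, not measured data:

```java
// The utilization metrics of equations 6.1 and 6.2, with illustrative values.
public class UtilizationDemo {
    // equation 6.1: util_x(t) = use_x(t) / total_x
    static double util(double used, double total) {
        return used / total;
    }

    public static void main(String[] args) {
        double memory = util(1.5, 2.0);    // 1.5 of 2 GiB used   -> 0.75
        double cpu    = util(7200, 8000);  // 7200 of 8000 MIPS   -> 0.90
        double netIn  = util(0.2, 1.0);    // Gbps
        double netOut = util(0.3, 1.0);    // Gbps
        // equation 6.2: the most loaded resource determines the node utilization
        double overall = Math.max(Math.max(memory, cpu), Math.max(netIn, netOut));
        System.out.println("overall utilization: " + overall); // prints "overall utilization: 0.9"
    }
}
```

Taking the maximum reflects that the scarcest resource is the one that limits how much additional load the node can accept.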

In the ideal case, all resources should be constantly utilized at 100%. However, in a practical environment, where resources are not available at arbitrary granularity and load is influenced by unpredictable external factors like the behaviour of players, this optimal result cannot be achieved. Since a utilization of more than 100% corresponds to the fact that more resources are required than are actually available, full saturation must not be exceeded. Otherwise, the game experience for the players would start degrading. Hence, the objective of the balancing algorithm is to achieve resource utilizations as close to 100% as possible, thereby avoiding situations in which this limit is actually exceeded. Situations in which insufficient resources are allocated for parts of the maintained game session can as well be derived from the collected simulation data and are investigated by this experiment.
Beside the possibility of exceeding resource limits, game server instances might also become overloaded. The experiment has been designed to provoke such overload events to demonstrate the conditions under which they occur. Further, the experiment provides information on the way the algorithm resolves those events.


Formally, an overloaded resource event occurs whenever util(t) is larger than 100%. Overloaded replica events, on the other hand, are present whenever the game-loop saturation of a server instance exceeds the limit of 100% due to too many actively managed entities or clients.

6.2.2 Experiment Results

Node and Hoster Counts
Figure 6.11 illustrates the number of nodes involved in maintaining the simulated game session with and without resource prediction. As can be observed, in both cases the algorithm adjusts the number of involved nodes according to the present load. The various small spikes that can be observed, especially during phases of growing load, are caused by the make-before-break concept applied whenever moving a server instance from one node to another. Hence, whenever migrating a game server between hosts, the new one is brought online before the shutdown of the original instance is initiated. Therefore, for a short period, more server instances are present than required, thereby occupying a higher number of hosts.

Figure 6.11: Nodes Involved in Maintaining the Game Session

By comparing the two configurations, it can be observed that the variant using the prediction starts earlier to increase the number of involved nodes. Thereby, it is preparing for the predicted higher resource requirements, information that is not available within the alternative setup. The node reduction, however, is handled equally. This is caused by the fact that a prediction of a future lower load does not allow the algorithm to shut down server instances earlier since the current load still needs to be maintained. Hence, predictions only have an effect on the algorithm's behaviour during phases of growing load.
The same observation can be seen within figure 6.12, which illustrates the number of hosters involved in maintaining the entire game session. Since

123

the prediction-based configuration is sooner requiring more resources than its counterpart does, the session is earlier extended to another grid site. Still, the reductions are performed at the same time.


Figure 6.12: Hosters Involved in Maintaining the Game Session

Utilization

Figures 6.13 and 6.14 illustrate the utilization of the various resources considered by the session management system over time. For every time step, the corresponding values have been evaluated for all involved nodes, and the resulting average is shown in the graph. Both visualizations demonstrate the ability of the algorithm to deal with changing resource usage characteristics. While during the first and last few minutes of the simulation, as well as during a few minutes around half time, the memory consumption of the involved server instances represents the limiting factor, during the remaining periods the CPU load dominates. The figures thereby demonstrate the algorithm's ability to consider the most severe resource limitation at any time.


Figure 6.13: Resource Utilization without Prediction



Figure 6.14: Resource Utilization with Prediction

Figure 6.15 compares the average overall node utilization of the evaluated configurations, derived according to equation 6.2. It can be observed that during periods of increasing load the algorithm achieves lower resource utilization when resource predictions are considered, since extra capacities are reserved for the upcoming load.
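Equation 6.2 is defined earlier in the thesis and not reproduced here; a plausible sketch of such an overall-utilization metric follows. The aggregation chosen below, the maximum over resource types per node and the mean over all involved nodes, is an assumption made for illustration, matching the observation that the most severe resource limitation governs each node.

```python
# Hedged sketch of an overall-utilization metric in the spirit of equation 6.2;
# the exact aggregation (max over resources, mean over nodes) is an assumption.

def node_utilization(usage, capacity):
    # usage/capacity: dicts mapping resource type -> value;
    # the tightest resource determines the node's utilization
    return max(usage[r] / capacity[r] for r in capacity)

def overall_utilization(nodes):
    # nodes: list of (usage, capacity) pairs for all involved hosts
    return sum(node_utilization(u, c) for u, c in nodes) / len(nodes)

nodes = [
    ({"cpu": 0.9, "mem": 0.5}, {"cpu": 1.0, "mem": 1.0}),
    ({"cpu": 0.6, "mem": 0.7}, {"cpu": 1.0, "mem": 1.0}),
]
# node utilizations: 0.9 and 0.7 -> overall 0.8
```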


Figure 6.15: Overall Node Utilization Comparison

Both configurations are able to maintain the total resource utilization at approximately 80% most of the time. This level is influenced by two aspects. On the one hand, the algorithm configuration used for the experiments limits the maximum achievable utilization: since the upper resource threshold is set to 95%, no higher utilization can be reached in a stable state. On the other hand, the coarse-grained nature of the load items also limits the attainable resource utilization. Especially the CPU load caused by server instances exhibits this property. For instance, a sequential game server saturated at 80% uses 80% of the CPU time of a single-core machine. The remaining 20% of free resources could in principle be used for other game servers; however, load items small enough to fill this gap might not be available. Hence, the high ratio between load item volumes and bin capacities leads to situations in which significant amounts of free resources cannot be exploited, thereby reducing the maximum sustainable resource utilization.

Overload Events

Figure 6.16 illustrates the number of overload events observed during both simulations, presenting overloaded resources and replicas independently. Further, table 6.5 provides additional statistical information on the observed events.


Figure 6.16: Number of Overload Events observed with and without Predictions

As can be derived from table 6.5, every overload event could be resolved within at most 30-40 seconds. Although the period between succeeding local balancing cycles is set to 10 seconds, operations like starting up new server instances require significantly more time. Since every cycle waits for its balancing operations to be completed, iterations might need more than the configured interval to finish their tasks. Within this experiment, whenever a server has to be migrated from one host to another, at least 17 seconds have to be spent on starting the new instance, moving the load and shutting down the old one. As a result, overload events might be detected with a corresponding delay, and the actions applied to resolve the problem require an additional amount of time. Based on these observations it can be concluded that whenever some resource or replica becomes overloaded, it is discovered within the next local balancing cycle and the derived compensation action successfully resolves the situation. As illustrated in figure 6.16, predictions of future resource requirements allow the algorithm to avoid overload events almost entirely. Only a single event describing an overloaded node has been observed within the prediction-based simulation. Hence, the algorithm makes efficient use of the additionally provided information. However, as has been shown previously,


Overloaded Replicas
Property                           Without Pred.   With Pred.
Number of Overload Events          12              0
Number of Events during Increase   12              0
Number of Events during Decrease   0               0
Minimum Duration [sec]             20              -
Average Duration [sec]             20              -
Maximum Duration [sec]             20              -
Average Saturation during Events   105.3%          -

Overloaded Resources
Property                           Without Pred.   With Pred.
Number of Overload Events          56              1
Number of Events during Increase   54              0
Number of Events during Decrease   2               1
Minimum Duration [sec]             10              20
Average Duration [sec]             14.5            20
Maximum Duration [sec]             40              20
Average Utilization during Events  102.9%          110.0%
Overloaded Resource Type           always CPU      always CPU

Total
Property                           Without Pred.   With Pred.
Total Duration of Events [sec]     1050            20
Total Server Time [sec]            152 740         164 470
Error Ratio                        0.69%           0.01%

Table 6.5: Statistical Details on observed Overload Events
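The error ratio reported in table 6.5 relates the total duration of overload events to the total server time. Reproducing the reported values as a quick check (the helper name is an illustrative assumption):

```python
# The error ratio of table 6.5: total event duration over total server time.

def error_ratio(event_seconds, server_seconds):
    return event_seconds / server_seconds

without_pred = error_ratio(1050, 152740)   # ~0.69%
with_pred = error_ratio(20, 164470)        # ~0.01%
```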

this reduction comes at the price of reduced resource utilization during some phases. Further, even without considering predictions, the rapid load changes simulated for this scenario can be handled such that players would experience a deteriorated quality of service at most 0.69% of the time. This assumes that exceeding resource and game-loop limitations immediately results in a noticeable reduction of the gameplay quality.

Causes for Overload Events

Besides statistical information on the problems that occurred, the causes of the observed events could be derived from the additional logs produced by the simulator. Essentially, there are three situations potentially leading to such events.


The first problem potentially resulting in overloaded resources is caused by the fact that whenever a replica is eliminated, the load of the remaining server instances managing the same zone increases. Especially when eliminating one out of two replicas, the CPU time demand of the remaining instance approximately doubles. Hence, whenever a replica is removed, the load on the hosts managing the remaining game servers grows although no specific operation is performed on them, and their capacities are potentially exceeded. This effect, which only occurs during phases of decreasing load, is so far not anticipated by the algorithm. However, as can be seen in table 6.5, resources are rather seldom overloaded during the corresponding phases. Hence, the number of events caused by this unconsidered issue is limited.

Another chain of events potentially leading to overloaded replicas could be derived from the logs. The problem arises when moving a server process from a node equipped with fast CPU cores to another host offering slower processing units. Assuming the game loop is executed mostly sequentially, the corresponding loop saturation increases since processing the critical path requires more time on the slower node. Although the algorithm always ensures that the target node has sufficient free capacities for executing the number of instructions per second associated with the moved server instance, it is not tested whether the critical path can be computed fast enough. Generally, the problem can only occur within heterogeneous environments when moving load between nodes offering more than a single core. Further, the presence of more than one replica mitigates the problem since, at the time the new server is started, the weighting mechanism assigns only a reduced quantity of load to the new instance; the rest is distributed among the remaining nodes. Therefore, the problem is only relevant in situations where a single, almost fully loaded replica of a zone, based on a multithreaded game loop implementation, is moved between nodes equipped with multi-core processors of different speed. Problems due to this issue may occur during any phase, whenever the reshaping step is reducing the number of involved nodes or instances are offloaded from overloaded hosts.

Finally, the last reason for situations in which resources or replicas become overloaded is provided by rapid, unanticipated load increases: during one balancing cycle, the load on some resource is below the defined threshold, and within the next step, the available capacity is exceeded. Reliable resource usage predictions provide an effective countermeasure for such situations, as has been shown within this experiment. Alternatively, in case sudden load increases are typical for the managed game session, the corresponding threshold parameter values might be reduced to compensate for such events.
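The unchecked condition behind the second cause, the critical path no longer fitting into a tick on slower cores, can be sketched as an additional migration test. This is a proposed sketch, not the thesis implementation; all names and the instruction-based model are assumptions.

```python
# Sketch of the missing check discussed above: besides aggregate instruction
# throughput, a migration target must also process the mostly sequential
# critical path within one tick. All names are assumptions.

def migration_safe(total_ips, critical_path_ipt,
                   target_cores, target_core_speed_ips, tick_rate_hz):
    # capacity check as performed by the algorithm (aggregate throughput)
    capacity_ok = total_ips <= target_cores * target_core_speed_ips
    # additional check: the per-tick critical path must fit a single core
    critical_ok = critical_path_ipt <= target_core_speed_ips / tick_rate_hz
    return capacity_ok and critical_ok

# moving from one fast core to two slow cores: aggregate capacity suffices,
# but the sequential critical path no longer fits into a tick
```

For instance, with a total demand of 5e9 instructions/sec and a critical path of 1.9e8 instructions per tick at 25 Hz, a two-core 3000 MIPS target passes the capacity check but fails the critical-path check, while a single 6000 MIPS core passes both.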


6.2.3 Conclusion

Within this experiment, the ability of the algorithm to deal with rapid load changes on the various levels of the balancing hierarchy has been demonstrated. In addition, it could be shown that the algorithm is capable of dynamically adapting the number of involved hosts and hosters to the number of participating clients, even while the type of resource representing the limiting factor for assigning load items varies. Further, the impact of reliable resource usage predictions on both the resource utilization and the error rate has been investigated in detail. Finally, some of the remaining issues of the algorithm leading to overload events have been identified and discussed. Although the identified problems are only responsible for a minority of the overload events observed during the simulations, further improvements of the algorithm may include mechanisms to compensate for them.

6.3 A Real World Scenario

The final experiment presented within this chapter evaluates the algorithm's ability to manage realistic, real-world game sessions. Further, the potential cost reduction resulting from using dynamic session management techniques instead of static approaches should be demonstrated. For this experiment, all simulation parameters have been derived from real-world situations wherever possible to provide useful results. The overall scenario focuses on a fast-paced first-person shooter or real-time strategy game built upon the RTF framework (see 2.1.6) supporting both the zoning and the replication approach. A single game session consisting of 100 zones should be maintained using resources offered by the Amazon Elastic Compute Cloud (Amazon EC2, [7]).

6.3.1 The Experiment Setup

The Game Session

To incorporate realistic values for the number of players within the various zones, information collected from an actual MMOG is used. The authors of [3] have recorded the number of users connected to the various servers maintaining the popular online adventure game RuneScape [9] during the last two weeks of August 2007. The collected data has been generously provided to conduct the experiments described within this section. The offered traces cover the number of clients connected to 144 servers over a period of more than 15 days. Throughout this period, every two minutes the load


of the involved hosts has been evaluated and recorded. Figure 6.17 illustrates some example traces describing the number of clients managed by the observed servers. Apparently, RuneScape enforces a hard limit of 2000 players per server. None of the traces ever exceeds this upper boundary. A more detailed analysis of the collected data can be found within [3].
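Such two-minute samples can be resampled to arbitrary simulation times via linear interpolation between neighbouring measurements. A minimal sketch follows; the function name and trace representation are assumptions made for illustration.

```python
# Sketch of linearly interpolating client counts between trace samples.
# The representation as (time, count) pairs is an illustrative assumption.

def clients_at(samples, t):
    """samples: list of (time, client_count) pairs sorted by time."""
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return c0 + frac * (c1 - c0)
    raise ValueError("t outside trace")

trace = [(0, 100), (120, 160), (240, 130)]   # seconds, clients
clients_at(trace, 60)   # midway between 100 and 160 -> 130.0
```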


Figure 6.17: Example Client Count Graphs from RuneScape Traces [3]

For the experiment described within this section, it is assumed that each of the involved servers manages a different zone of the overall game map. Therefore, the trace of server X is used to simulate the number of clients within zone X. Further, intermediate values between the sampling points are linearly interpolated. The same technique is applied to close the gaps of missing measurement values, which are occasionally present within the traces. The resulting traces of the first 100 servers are then used to derive the game profile for the simulations conducted to evaluate the system's behaviour when confronted with real-world load patterns. Finally, a start-up transformation as illustrated within figure 5.3 is applied to the derived profile to introduce a gentle initialization phase. This artificial phase lasts for one hour and is not considered by any of the analyses covered within this section.

The Environment

As has been described in the introduction of this section, the experiment is intended to simulate the usage of Amazon's Compute Cloud for maintaining game sessions. Therefore, the specifications of the involved hosts are determined based on the instance type definitions provided by Amazon, which offers several alternative virtual host configurations representing different resource compositions. For instance, the default instance type provides 1.7 GB of memory, a CPU with 1 virtual core having a speed of 1 EC2 Compute Unit, and 160 GB of storage. Thereby, one EC2 Compute Unit provides the equivalent


CPU capacity of a 1.0 - 1.2 GHz 2007 Opteron or 2007 Xeon processor. For more details see [7]. For this experiment, the High-CPU medium instance type has been chosen. This type offers proportionally more CPU resources than memory and is intended for compute-intensive applications like the game to be simulated. It provides the standard 1.7 GB of memory combined with two virtual cores offering 2.5 EC2 Compute Units each. Amazon also offers an extra-large High-CPU instance type providing eight virtual cores of the same speed at the same price per core. However, the medium variant has been preferred to obtain a finer granularity when reserving resources.

Property             Value
Core Speed           5820 MIPS / 2.5 EC2 Compute Units
Number of Cores      2
Memory               1700 MiB
Ingoing Bandwidth    1 Gbps
Outgoing Bandwidth   1 Gbps

Table 6.6: Host Specification for the Real World Experiment

Table 6.6 contains the host specification used to describe one instance of the selected type. The various parameters have been retrieved from the Amazon EC2 web page and additional processor information published on [28]. Within the simulation environment used for this experiment, instances of the chosen machine type are offered by five different hosters. The first of those provides access to 90 instances, the remaining ones to 70, 60, 60 and 45 instances. Hence, within the simulated environment 325 machines are available. The separation into different hosters might model availability zones or even regions within the EC2 (for more details on those terms, interested readers may be referred to [7]). Alternatively, the simulated hosters may represent different companies offering EC2-like services for games.

Load Model and Game Parameters

The parameters for the load models have been derived from various sources. The most important CPU load model parameters, such as the interaction complexity or the number of instructions per client IO, have been derived from [5]. Within this paper, an implementation of a real-time strategy game demonstrates the replication concept (see section 2.1.4). Further, the paper includes results describing the number of clients that could be maintained with a given number of replicas and a tick rate of 25 Hz. The corresponding values have been evaluated within the paper on a Pentium 4 1.7 GHz. For the experiment


covered within this section, the CPU load model parameters have been adjusted such that the resulting model fits the published client count values. Table 6.7 enumerates the number of clients within a single zone that can be managed by a given number of replicas running on processors of the type used within [5] and on the CPUs offered by the selected instance type of the EC2.

                                    Number of Replicas
Processor                   MIPS    1     2     3     4     5
Intel Pentium 4, 1.7 GHz    5100    109   195   263   319   364
2.5 EC2 Compute Units       5820    126   225   302   367   419

Table 6.7: Maximum Number of supported Players

Since according to this model the full 2000 clients of the RuneScape traces cannot be supported with any number of replicas (1000 clients already require 501 replicas, and more cannot be supported since updating the shadow entities alone saturates the game loop), the game profiles derived from the RuneScape traces describing the changes in the user demand are scaled down to a maximum of 390 clients. Hence, to maintain the maximum load of a zone, five replicas are required within the simulated infrastructure, at which point the saturation of the involved game loops reaches 93%.

Values for the network model parameters have been retrieved from [26]. Within this technical report, the network traffic characteristics of the popular FPS game Quake 3 [27] have been investigated. The report analyses the data flow between server and individual clients in both directions. It shows that the traffic from the server to a client consists of messages with packet lengths ranging between 60 and 160 bytes. Those packets are sent at a very constant rate of 20 packets per second, which most likely corresponds to the tick rate of the game. For the experiment, the maximum packet size of 160 bytes has been chosen to compensate for the fact that the simulated game involves significantly more players, causing a higher number of changes within the virtual game world. Due to the previously determined tick rate of 25 Hz, this packet size results in an outgoing data rate of 32 kbit/sec. The report investigates the same properties for the traffic from the client to the server, deriving an average packet length of 65 bytes. However, unlike in the opposite direction, the packet rate varies between 30 and 50 packets per second, depending on the refresh rate of the applied graphics card. As a result, a data rate ranging between 12 and 19 kbit/sec was observed. For the experiment, a packet size of 82 bytes has been chosen, which, together with the tick rate of 25 Hz, results in a data rate of 16.4 kbit/sec between client and server.
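The data rates above follow directly from packet size and send rate; reproducing the arithmetic (the helper name is an illustrative assumption):

```python
# Data rate as packet size times send rate: bytes * 8 bits / 1000 = kbit.

def data_rate_kbit(packet_bytes, packets_per_sec):
    return packet_bytes * 8 * packets_per_sec / 1000

server_to_client = data_rate_kbit(160, 25)   # 32.0 kbit/sec
client_to_server = data_rate_kbit(82, 25)    # 16.4 kbit/sec
```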


The final parameter required for the network load model represents the amount of data transferred for updating a shadow entity. Since replication is not supported by Quake 3, the corresponding value cannot be derived the same way as the other parameters. Therefore, a value of 100 bytes has been chosen, which should be sufficient to contain all the information required to update the properties of a game entity that might change within a cycle. In any case, the network model parameters have only a minor influence on the experiment results since usually the CPU requirements represent the bottleneck for executing game server instances.

Finally, parameters for the memory model had to be determined. The memory requirement of the game engine together with the zone data has been set to 160 MiB. For every managed client, an additional 128 KiB are reserved, and for every NPC or item 300 KiB are used. Since the CPU load model has been specified such that multiple replicas are required most of the time to handle the game load, the memory model parameters have only a minor impact on the experiment results, just as the network parameters do.

As for the previous experiments, additional game properties needed to be fixed. It has been determined that within the corresponding simulations, it takes 12 seconds to start up a new game server instance and 8 seconds to eliminate an existing instance. Additionally, 4 seconds are required for moving load between replicas of the same zone. Finally, server instances process their tasks purely sequentially, like most contemporary game server implementations.

Algorithm Configuration

Several different algorithm configurations are compared within this experiment. Table 6.8 shows the common options used by all configurations. All the options are chosen such that the total number of hosts involved in maintaining the game session is constantly minimized. Based on those common options, various threshold values are simulated. Table 6.9 enumerates the eight configurations simulated to investigate the effects of the threshold parameters. Due to the different threshold values used within the various simulations, different resource utilization rates should be derived. Further, the dependency between threshold values, prediction and overload events can be investigated based on the chosen algorithm configurations.

Observed Metrics

The various algorithm configurations investigated as part of this experiment are confronted with the first 15 days of the scaled RuneScape traces. During the


Timing Parameters
Parameter Name                Value
globalBalancingInterval       60 seconds
localBalancingInterval        10 seconds

Global Balancing Parameters
Parameter Name                Value
globalCapacityScaling         90%
numHostersProbedOnExtension   5

Local Balancing Parameters
Parameter Name                Value
selectionStrategy             FirstFit
itemOrder                     Decreasing
binOrder                      DecreasingLoad
loadCompressionEnabled        true
saturationTolerance           1%

Session Starter Parameters
Parameter Name                Value
selectionStrategy             FirstFit
itemOrder                     Decreasing
binOrder                      DecreasingLoad
maxFillingRate                90%
loadItemScaling               110%

Table 6.8: Common Configuration Options for the third Experiment

                          Configurations
Parameter Name            90    90p   95    95p   98    98p   99    99p
resourceThreshold         90%   90%   95%   95%   98%   98%   99%   99%
maxFillingRate            80%   80%   85%   85%   90%   90%   95%   95%
upperSaturationThreshold  90%   90%   95%   95%   98%   98%   99%   99%
lowerSaturationThreshold  80%   80%   85%   85%   90%   90%   95%   95%
concessionThreshold       90%   90%   95%   95%   98%   98%   99%   99%
concessionLimit           90%   90%   95%   95%   98%   98%   99%   99%
2 min Prediction          off   on    off   on    off   on    off   on

Table 6.9: Simulated Algorithm Configurations

corresponding simulations, data regarding the saturation of the involved replicas as well as the resource usage on the various hosts is collected at a sampling


rate of 10 seconds. Due to this fine granularity, more than 80 million records are collected per simulation. The high density of the gathered information is required to ensure that overload events are reliably detected. From the collected data, the number of involved nodes, the resource utilization and details on encountered overload events are extracted as within the previous experiment.

Static and Ideal Assignments

Besides evaluating the defined algorithm configurations, the properties of a static and an ideal load assignment are derived. The static assignment considers the maximum load within the various zones throughout the simulated period and distributes the resulting number of required replicas among the available nodes such that a minimal number of nodes is needed. During the simulation, this assignment is never altered; hence, resources are not efficiently utilized most of the time. This static approach provides a baseline for determining the potential savings that can be obtained by applying dynamic strategies. The ideal assignment, on the other hand, computes a close-to-optimum server-to-host assignment for every single time step of the simulated game session, whereby every step is evaluated independently without considering the previous server assignment. The required mappings are derived using the FFD heuristic (see section 4.2.1), which, since all bins within the given environment offer the same capacity, provides results using not more than 122% of the actual optimum number of nodes. However, due to the characteristics of the emerging bin-packing problems, it can be assumed that the derived results are much closer to the actual optimum. The ideal assignment therefore provides an approximated lower boundary for the number of nodes to be involved in maintaining the simulated game session. Further, the derived ideal resource utilization values provide an upper boundary for the achievable results of any algorithm. Hence, the ideal assignment strategy can be considered as a reference line denoting the best possible solution with respect to resource usage.
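The FFD (first-fit decreasing) heuristic named above can be sketched as follows for identical bin capacities, as in this environment. Representing load items as scalar CPU fractions is a simplification for illustration; the actual load items are multi-dimensional.

```python
# Sketch of the FFD (first-fit decreasing) bin-packing heuristic used for
# the ideal assignment; scalar item sizes are a simplifying assumption.

def ffd(items, capacity):
    bins = []                                  # each bin: list of item sizes
    for item in sorted(items, reverse=True):   # largest items first
        for b in bins:                         # place into first bin that fits
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:                                  # no bin fits: open a new one
            bins.append([item])
    return bins
```

This also illustrates the granularity effect discussed in the previous experiment: three servers each saturating 80% of a core cannot share cores, so `ffd([0.8, 0.8, 0.8], 1.0)` opens three bins although the total load is only 2.4 cores.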

6.3.2 Experiment Results

Node Counts

Figure 6.18 illustrates the number of nodes required by the various configurations to maintain the simulated game session. Since the difference between a configuration and its prediction-based counterpart is marginal, only a single curve is shown per pair of configurations. However, in general, as has been demonstrated within the previous experiment, configurations


considering resource predictions tend to require more nodes during phases of increasing load.


Figure 6.18: Nodes Involved in Maintaining the Game Session

As can be observed, with the exception of the 90 and 90p configurations, all setups are capable of keeping the number of involved nodes constantly below the number required by the static allocation. Since at full load the game loop of the involved replicas is saturated to 93%, the 90% based configurations maintain 6 instead of 5 replicas per zone to manage the zone load. Therefore, the resource requirements of those configurations are significantly higher. In general, it can be observed that the closer the threshold values get to 100%, the lower the number of involved nodes is. Nevertheless, especially during the short phases of decreasing load following the daily peaks, too many nodes are in general involved in maintaining the game session when compared to the ideal case. This is most likely caused by the relatively low lower boundary for the resource usage and game loop saturation: even within the 99% based configurations, the lower limits are set to 95%. By reducing the range between upper and lower boundary, the gap between the ideal and the observed behaviour may be closed. Finally, figure 6.18 also shows the capability of the dynamic approach to deal with long-term changes in the user demand. The average number of players slightly declines by the end of the second week; hence, for those days, fewer resources are required. The dynamic approach presented within this thesis automatically adjusts the resource requirements to the actual load to adapt to the changed conditions.

Costs

Since within the Amazon EC2 instances can be allocated per hour, the average number of nodes required for maintaining the game session provides a close approximation of the costs caused by maintaining the game session. However,


this estimation does not consider the fact that whenever a new hour is started, the full fee for that period has to be paid. Due to the long time scale simulated within this experiment, the granularity of an hour is small enough to be considered of low impact. Nevertheless, the potential savings illustrated within figure 6.19 have to be considered as lower boundaries.

[Figure 6.19 values: Static 100.0%, 90/90p 78.8%, 95/95p 70.9%, 98/98p 68.2%, 99/99p 66.3%, Ideal 61.7%]

Figure 6.19: Relative Costs caused by Maintaining the Game Session

Again, since the difference between a configuration and its prediction-based counterpart is negligible, only a single value is shown per pair. The resulting costs are normalized to the costs that would be caused by a static assignment. As can be observed, applying dynamic session management techniques allows cutting costs by approximately one third for the simulated scenario.

Involved Hosters

Figure 6.20 provides an overview of the number of hosters involved in maintaining the game session. For instance, the static approach requires the services of three sites throughout the entire simulated period, whereas the dynamic configurations are able to reduce the number of involved hosters to two during extensive periods of the game. Thereby, costs for inter-site network traffic can be saved and communication delays between nodes within different sites managing adjacent zones are reduced. By applying the ideal assignment, it would even be possible to maintain the game session within a single site for 24% of the time. However, the other configurations always require at least two sites. The inability of the evaluated setups to concentrate the load on the nodes offered by a single hoster results from the relatively low globalCapacityScaling factor used throughout all configurations. It is set to 90%; hence, whenever moving load toward a hoster, at least 10% of its resources have to remain free. Unfortunately, in situations in which a single hoster would be sufficient for maintaining the entire game session, still more


            Static  90/90p  95/95p  98/98p  99/99p  Ideal
1 Hoster    0%      0%      0%      0%      0%      24%
2 Hosters   0%      20%     36%     39%     41%     48%
3 Hosters   100%    47%     63%     61%     59%     28%
4 Hosters   0%      32%     1%      0%      0%      0%

Figure 6.20: Fraction of Game Session spent using different Numbers of Hosters

than the allowed 90% of its nodes are required. Hence, during the simulations the selected configurations have never been able to concentrate the load within a single site. Nevertheless, by increasing the global capacity scaling parameter, a higher fraction of the game session should be maintainable involving a smaller number of hosters. Finally, it has to be pointed out that the derived values describing the number of involved sites heavily depend on the capacities offered by the various hosters. Hence, those results should only demonstrate the ability of the algorithm to concentrate load within a smaller subset of sites. In a real-world situation, when using static session management approaches requesting a fixed number of nodes, the infrastructure would most likely be adapted to be capable of running all server instances within a single site.

Resource Utilization

Figure 6.21 illustrates the total resource utilization achieved by the various configurations according to equation 6.2. Throughout all simulations, the CPU utilization represents the dominating element. As can be observed, the higher the resource and replication boundaries are set, the closer the utilization gets to the maximum achievable utilization represented by the ideal assignment. However, as with the number of involved nodes, the gap between the ideal and the simulated configurations is significantly larger during the short phases of reduced load after the daily peaks. As has been pointed out before, this behaviour is caused by the relatively low lower boundaries used within the simulations. Increasing those should improve the resource utilization during


Figure 6.21: Overall Node Utilization during Game Session Maintenance

the corresponding phases. On the other hand, increasing the threshold values in general also increases the risk of overloading resources, since fewer capacities are reserved for potentially growing load.

Overload Events

Figure 6.22 illustrates the number of overload events observed throughout the conducted simulations.

Figure 6.22: Number of Overload Events observed during the Simulations

Thereby, the number of overload events for the 90% and 95% based configurations is very low. With increasing resource usage boundaries, the number of events increases as well. Especially within the 99% based configurations, the number of events is significantly higher due to the small margin reserved for varying load. Additionally, figure 6.22 illustrates the positive effect of incorporating resource predictions. In general, by providing future resource requirements



to the algorithm, game session load can be managed while inducing almost no overloaded replicas. Further, within every pair of configurations, the setup considering resource predictions produces a lower number of overloaded-resource events. Nevertheless, the influence is gradually reduced the closer the boundaries move toward 100%. While for the 95% based configuration respecting predicted resource usage values reduces the number of corresponding overload events by approximately 48%, the same modification reduces the error rate for the 99% based configuration by only 22%. The corresponding values are illustrated within figure 6.23.

[Figure 6.23 data: reduction of overloaded-resource events per pair of configurations — 90/90p: 75.0%, 95/95p: 48.1%, 98/98p: 18.0%, 99/99p: 22.1%.]

Figure 6.23: Overloaded Resource Event Reduction by considering Predictions

Further statistical details on the observed events are presented within table 6.10. Besides the plain number of events, the table also provides hints about their duration and severity. Thereby, the accuracy of the measured durations is limited due to the chosen sampling rate of 10 seconds. However, increasing this rate would have produced an even larger amount of data per simulation. Nevertheless, general trends can be observed. Considering all simulations, overloaded-replica events could be resolved within approximately 15 seconds, while reducing the load of overloaded resources typically required more than 21 seconds. This inequality is caused by the different amounts of time required for executing compensating operations. While replica load issues can be resolved by moving load between existing replicas, overloaded resources require the migration of server instances. Since the latter operation type requires more time to be executed, overloaded-resource events are resolved more slowly.

Another observation based on the event statistics concerns their severity. Throughout most of the simulated configurations, the available capacities are not exceeded by more than 3-3.5% on average during overload events. Only for the single event that occurred while simulating the 90p configuration, as well as within the 99% based configurations, slightly higher values have been observed.
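The duration bookkeeping described above can be sketched as follows. The trace values below are hypothetical; the 10-second sampling interval matches the one used in the simulations, and, as noted, the measured durations are only accurate to that interval:

```python
def overload_events(samples, capacity, interval=10):
    """Derive overload events from a sampled load trace.

    Returns a list of (start time, duration) pairs in seconds; each
    sample is the load observed at one sampling step, so durations are
    only accurate to multiples of the sampling interval."""
    events = []
    start = None
    for i, load in enumerate(samples):
        if load > capacity:
            if start is None:          # overload begins at this sample
                start = i
        elif start is not None:        # overload ended before this sample
            events.append((start * interval, (i - start) * interval))
            start = None
    if start is not None:              # trace ends while still overloaded
        events.append((start * interval, (len(samples) - start) * interval))
    return events

# Hypothetical trace sampled every 10 seconds against a capacity of 100:
trace = [90, 95, 103, 102, 98, 99, 101, 97]
print(overload_events(trace, 100))  # → [(20, 20), (60, 10)]
```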


Finally, another speciality of the 99% based configurations is their exceptionally low average game load saturation during the corresponding overload events. This is caused by the fact that during the simulation the weighting mechanism described within section 4.4.3 may be confronted with replicas which are, on average, almost 99% saturated. Still, no additional replica will be added, since the corresponding threshold has not yet been exceeded. However, due to the granularity of the replica weight adjustments, the smallest modification may already result in a slightly overloaded replica. The resulting unacceptable situation will then be compensated within the next weighting round, since the tolerance range for the saturation is exceeded. However, in the meantime, an overload event has occurred. Hence, this balancing act is responsible for a certain fraction of the encountered overloaded-replica events.

Finally, table 6.10 compares the total time during which the game play experience may be deteriorated due to overloaded resources or replicas with the overall time services are offered to participating clients. Thereby, even for the configuration producing the highest number of overload events, more than 99.99% of the time players are maintained by server instances that are neither overloaded nor running on an over-allocated host.

6.3.3 Conclusion

The experiment described within this section has demonstrated the ability of the solution developed for this thesis to adapt the resource requirements of game sessions to real-world load patterns. In addition, the impact of resource and game loop boundaries on the resulting resource utilization and error rate has been illustrated. Further, the conducted simulations provided information pointing out additional ways for improving the algorithm's performance and setup. Nevertheless, further experiments need to be carried out to evaluate their potential.

Further, the results showed that the number of situations in which resource or game loop limits are exceeded could be kept at an acceptable level even when using high threshold values. Thereby, approximately one third of the costs for running game sessions can be saved compared to static session management approaches.

Further experiments may investigate the effect of modifying resource and saturation thresholds independently from each other. In addition, infrastructures based on Amazon's extra-large high-CPU instance type may be simulated to determine whether the unused resources of eight virtual cores are sufficient to run an additional server instance, thereby increasing the load utilization. Finally, alternative resource prediction approaches may be tested. For instance, the offset may be varied or inaccuracies may be added.


Overloaded Resources

Property                            90      90p     95      95p     98      98p     99      99p
Number of Overload Events           1       1       27      14      172     141     757     590
Minimum Duration [sec]              10      30      10      20      10      10      10      10
Average Duration [sec]              10      30      24.1    25      21.4    21.4    21.6    22.7
Maximum Duration [sec]              10      30      40      30      40      40      40      40
Avg. Utilization during Events      102.9%  104.5%  102.1%  101.8%  102.9%  103.5%  103.9%  104.7%

Overloaded Replicas

Property                            90      90p     95      95p     98      98p     99      99p
Number of Overload Events           4       0       20      0       163     0       1043    5
Minimum Duration [sec]              20      -       10      -       10      -       10      10
Average Duration [sec]              27.5    -       13.5    -       14.7    -       16.3    12
Maximum Duration [sec]              30      -       40      -       40      -       50      20
Avg. Saturation during Events       102.6%  -       102.0%  -       101.1%  -       100.6%  100.2%

Total

Property                            90      90p     95      95p     98      98p     99      99p
Total Duration of Events [sec]      120     30      920     350     6090    3020    33320   13480
Total Server Time [days]            5105.0  5113.7  4642.3  4652.4  4486.3  4498.1  4369.8  4385.0
Approximate Error Ratio             1:3.7M  1:14.7M 1:436K  1:1.1M  1:64K   1:128K  1:11K   1:28K

Table 6.10: Statistical Information on observed Overload Events


Chapter 7

Related Work

Besides the concepts covered within chapter 2, additional approaches for distributing and balancing game server workload have been proposed. A few of those are presented within this chapter.

Entity Centric Approach

A very different approach for distributing and balancing the load of game servers has been presented within [29]. Instead of subdividing virtual worlds into smaller regions to partition the entities to be managed by the various server instances, each in-game object is handled individually. For instance, for each avatar representing a player, the elements in its area of interest or aura are maintained individually. Thereby, regular messages including position and state updates as well as interaction commands are exchanged between the avatar and its nearby elements. This way, the environment information forwarded to the corresponding client is kept up to date. Further, less frequent messages including positional information are exchanged between entities without overlapping auras. This way, entities entering the area of interest of a unit are identified.

Due to the independent management of the game entities, no constraints are imposed on the client-to-server assignment. Hence, any client may be managed by any server. Therefore, load balancing can be realized using off-the-shelf NAT systems distributing incoming client connections among the available servers based on their current load state. Thereby, any standard load-balancing schema may be used. However, the client assignment needs to be sticky: clients have to remain assigned to their initially determined server, since only those servers hold the state information describing the corresponding avatars. Therefore, by handling clients individually, the same load-balancing solutions applied for distributing load among web servers may be used for game servers.
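The sticky assignment described above can be sketched as a least-connections dispatcher that pins each client to its first server. The server names and the least-loaded tie-breaking rule are illustrative assumptions of this sketch, not details taken from [29]:

```python
class StickyBalancer:
    """Sticky least-connections dispatch: a client is sent to the least
    loaded server on first contact and stays pinned to it afterwards,
    since only that server holds the state of its avatar."""

    def __init__(self, servers):
        self.load = {s: 0 for s in servers}  # connected clients per server
        self.assignment = {}                 # client id -> pinned server

    def dispatch(self, client):
        if client not in self.assignment:    # first contact: pick least loaded
            server = min(self.load, key=self.load.get)
            self.assignment[client] = server
            self.load[server] += 1
        return self.assignment[client]

balancer = StickyBalancer(["game-srv-1", "game-srv-2"])
print(balancer.dispatch("alice"))  # → game-srv-1
print(balancer.dispatch("bob"))    # → game-srv-2
print(balancer.dispatch("alice"))  # → game-srv-1 (sticky)
```

A NAT-level balancer, as proposed in [29], would maintain the same client-to-server mapping at the connection level.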
By managing individual entities instead of entire regions of games, constraints on the load assigned to the involved servers are eliminated. Hence, workload can be distributed much more easily among the available machines. Further, servers


may be dynamically added or removed to compensate for varying load. Within the experiments presented in [29], it could be demonstrated that this approach scales at least up to 10 servers managing 6000 clients for the evaluated game. However, the provided concept assumes low update frequencies. Within the experiments, between 3 and 5 update messages have been sent per second from the servers to their assigned clients. Hence, the proposed solution may be sufficient for realizing medium-sized role-playing games. However, it is not suitable for fast-paced real-time strategy or first-person shooter games, which have been the main objective of the RTF framework laying the foundation for this thesis.

Dynamic Game Regions

Several approaches propose the dynamic modification of region boundaries to balance the workload assigned to the servers managing the corresponding zones. A simple variant has been described within [30]. The proposed concept is based on two-dimensional game maps, which are subdivided by partitioning lines along one of the two axes. Each of the resulting areas is assigned to one of the available server instances. Further, the involved servers constantly monitor their load. If the workload exceeds certain thresholds, the server instances managing adjacent regions are contacted. If one of them has free resources, the common partitioning line separating the corresponding regions is moved such that a fraction of the load is migrated from the overloaded server to its neighbour. By applying this simple rule, sections causing high workloads tend to become smaller, while regions containing fewer entities grow. Hence, the workload gets balanced among the involved game servers.

The main issue of this approach is its limitation to a small number of servers. By including too many machines, a high number of long, slim regions would be generated, and the overhead for constantly migrating entities due to their movements within the world would harm performance. Consequently, the given approach does not scale to a larger number of server processes. Within the paper, simulation-based experiments including only five server instances have been conducted. Furthermore, improving the concept by partitioning the game map along both axes would increase the solution's scalability. However, this modification would require much more communication between servers to coordinate the movement of borderlines, since more than two processes are affected. Further, since each partitioning-line adjustment influences the load within more than two regions, an even load distribution can in general no longer be achieved.
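The boundary-shifting rule of [30] can be sketched in one dimension. The entity positions and the per-server capacity threshold below are hypothetical; entities to the right of the returned boundary belong to the neighbouring region:

```python
def rebalance_boundary(left, right, threshold):
    """Shift the shared partitioning line between two adjacent regions:
    while the left server is overloaded and its right neighbour has
    spare capacity, the entity closest to the line migrates across it.
    Returns the new region contents and the boundary position
    (entities with x >= boundary belong to the right region)."""
    left, right = sorted(left), sorted(right)
    while len(left) > threshold and len(right) < threshold:
        migrant = left.pop()        # entity closest to the partitioning line
        right.insert(0, migrant)
    boundary = right[0] if right else float("inf")
    return left, right, boundary

# Hypothetical entity x-positions; each server may hold at most 4 entities.
print(rebalance_boundary([1, 2, 3, 4, 5], [8, 9], 4))
# → ([1, 2, 3, 4], [5, 8, 9], 5)
```

Repeatedly applying this rule between each pair of neighbours yields exactly the behaviour described above: heavily loaded sections shrink while lightly loaded ones grow.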


Advanced Dynamic Game Regions

A much more sophisticated dynamic region management concept has been presented within [31]. As usual, every server maintains a subset of the game entities. Thereby, the smallest rectangle containing all the positions of the assigned objects is called the coverage region of a server. The game entities and clients are disseminated among the involved server instances such that, on the one hand, the load is equally distributed and, on the other hand, the resulting coverage regions overlap as little as possible. Since those regions change their size and position due to the movement of entities, constant adjustments are necessary to preserve the latter property. Thereby, the presented concept is closely related to the principles of R-tree data structures [32].

This advanced dynamic solution especially focuses on the possibility of clients clustering around dynamically chosen points of interest. Within such areas, rather small coverage regions will be generated, while sparsely populated parts of the game world may be covered by very large regions. Therefore, it provides an efficient possibility for balancing load between game servers. Within the paper, simulation-based experiments demonstrate the capability of this concept to efficiently balance clients among 64 servers. Further, the solution can be easily extended such that server instances are dynamically added or removed depending on the present workload. Finally, an additional benefit is provided by the fact that this approach relieves game developers from the burden of defining efficient region boundaries at design time. The downside of the concept is the increased complexity introduced by handling dynamic zone boundaries.
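The coverage-region bookkeeping behind this concept reduces to simple rectangle arithmetic; a minimal sketch with hypothetical positions (the actual assignment heuristics of [31] are considerably more involved):

```python
def coverage_region(positions):
    """Smallest axis-aligned rectangle (min_x, min_y, max_x, max_y)
    containing all entity positions managed by one server."""
    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]
    return (min(xs), min(ys), max(xs), max(ys))

def overlap_area(a, b):
    """Overlap between two coverage regions; the assignment described
    in [31] tries to keep this close to zero, much like an R-tree
    node split tries to minimize bounding-box overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, width) * max(0, height)

print(coverage_region([(1, 1), (4, 3), (2, 5)]))   # → (1, 1, 4, 5)
print(overlap_area((0, 0, 2, 2), (1, 1, 3, 3)))    # → 1
```

As entities move, their server recomputes its coverage region and entities are reassigned whenever the overlap between regions grows too large.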
For instance, while within systems based on static zoning approaches each server only needs to maintain connections to a fixed number of other processes managing adjacent zones, within this approach one zone might theoretically be adjacent to all others. Further, since the covered regions constantly change their size and position due to player movements, the corresponding information needs to be synchronized among all involved servers to properly support interactions across region boundaries. Finally, since the region managed by a server might wander around the entire game map over time, server instances need to be capable of reloading missing game world information after the process has been started. As a result, although this approach provides a powerful way of balancing workload between game servers, it comes at the price of an increased number of implementation issues.

The approach chosen by RTF, laying the foundation for the load management concept developed for this thesis, represents a more straightforward solution. Hence, game developers can more easily integrate the required functionality. Further, static zoning concepts do not produce any overhead due to changing


region boundaries. In particular, no frequently modified information needs to be shared among all the involved processes. Hence, approaches based on static game world partitioning provide an even higher scalability than dynamic solutions. However, load might be distributed less efficiently.

Locality Aware Load Management

Unlike the concepts covered so far, [13] describes an approach for managing game load based on a static game world partitioning. Hence, the handled problem is closely related to the one investigated by this thesis. It thereby assumes that this partitioning is transparent for players; hence, users do not experience any delay when being migrated from one server to another. In addition, cross-region interactions are presumed. Due to those properties, the paper concludes that a large number of small zones is feasible for all game maps, thereby providing a fine enough granularity to perform efficient load balancing. Therefore, the set of regions is dynamically mapped to the involved servers.

Moreover, to keep inter-server communication low, the spatial locality of zones is exploited. Since typically only server processes managing adjacent regions of a game world need to communicate with each other, assigning neighbouring zones to the same server instance or to topologically close hosts allows reducing the overall network traffic and the encountered message delays. The corresponding locality-aware dynamic zone mapping is thereby realized by a decentralized, heuristic algorithm performed by the involved game server processes. Thereby, ideally, all servers manage sets of connected zones, covering large coherent regions of the game world. The simulation-based experiments presented within [13] demonstrate the ability of this approach to handle situations in which a large number of players are moving toward one area or hot-spot within a virtual world.
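The locality goal — each server managing a connected patch of neighbouring zones — can be illustrated with a small centralized sketch. The algorithm in [13] is decentralized and heuristic; the zone names and the breadth-first patch growth below are assumptions of this illustration only:

```python
from collections import deque

def locality_aware_mapping(adjacency, num_servers):
    """Assign zones to servers as connected patches of neighbouring
    zones, grown breadth-first, so that most zone-to-zone communication
    stays within a single server."""
    zones = list(adjacency)
    share = -(-len(zones) // num_servers)      # ceil: zones per server
    unassigned = set(zones)
    mapping = {}
    for server in range(num_servers):
        if not unassigned:
            break
        # seed the patch with the first still-unassigned zone
        queue = deque([next(z for z in zones if z in unassigned)])
        taken = 0
        while queue and taken < share:
            zone = queue.popleft()
            if zone not in unassigned:
                continue
            unassigned.remove(zone)
            mapping[zone] = server
            taken += 1
            queue.extend(n for n in adjacency[zone] if n in unassigned)
    for zone in unassigned:                    # leftovers of disconnected maps
        mapping[zone] = num_servers - 1
    return mapping

# Hypothetical 2x2 grid of zones: A-B on top, C-D below.
grid = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(locality_aware_mapping(grid, 2))  # → {'A': 0, 'B': 0, 'C': 1, 'D': 1}
```

Each server ends up with a coherent half of the map, so cross-server traffic only flows along the single cut between the two patches.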
Thereby, the effects of applying locality-aware techniques within a LAN and a WAN based infrastructure have been evaluated separately. The results show that the difference between dynamic algorithms with and without spatial locality is minimal within LAN environments. However, preserving locality within a WAN can significantly improve the performance.

Within the paper, simulations have been conducted for a game environment producing two update messages per second for each client. Further, exceptionally large update packages have been used. Their length ranged from 4.5 KB for simulations involving 4 players up to 60 KB for 256 players. Considering the update rate, this produces between 4 and 50 times the traffic flowing from the server to the clients observed for Quake 3 [26]. Further, due to the low update frequency, CPU load never becomes a bottleneck. Hence, within


the conducted experiments, network load represents the dominating resource requirement.

Within the RTF framework, the necessary granularity for managing the workload of a zone is added by the replication concept. By adjusting the number of replicas, the load within the various server instances can be influenced. In addition, larger zones can be supported, thereby increasing the maximum range of sight of entities (see section 2.1.3). However, RTF does not provide any information on the spatial relation between the managed zones. Hence, this property cannot be exploited by the algorithm devised for this thesis. However, as has been demonstrated within [13], the negative effects of ignoring locality within LANs are low. Therefore, by keeping the number of hosters involved in managing a game session low, inter-server communication will mostly occur within grid sites; hence, WAN connections are less frequently used. Furthermore, by ensuring that all replicas of a zone are always managed within a single site, the solution proposed within this thesis exploits the implicit locality introduced by the replication concept.
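The traffic figures quoted above can be checked with back-of-envelope arithmetic. Note that the implied Quake 3 per-client baseline at the end is derived from the stated ratios, not a figure taken from [26]:

```python
# Per-client downstream traffic in the experiments of [13]:
# two update messages per second, packets between 4.5 KB (4 players)
# and 60 KB (256 players).
updates_per_sec = 2
small_pkt_kb, large_pkt_kb = 4.5, 60.0

low_rate = updates_per_sec * small_pkt_kb    # KB/s per client, 4 players
high_rate = updates_per_sec * large_pkt_kb   # KB/s per client, 256 players
print(low_rate, high_rate)                   # → 9.0 120.0

# The text states this is 4 to 50 times the Quake 3 downstream traffic,
# which implies a Quake 3 baseline of roughly:
print(low_rate / 4, high_rate / 50)          # → 2.25 2.4  (KB/s per client)
```

The two implied baselines nearly coincide, so the quoted 4x-50x range is internally consistent with the packet sizes and update rate.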


Chapter 8

Conclusion and Future Work

The aim of this thesis was the development of a flexible solution capable of coordinating multiplayer online game sessions on a global scale based on the services offered by the edutain@grid project. Therefore, a scalable, distributed session management infrastructure has been implemented, whose architecture is based on a flexible network of controllers. Thereby, the involved components communicate with each other using P2P technologies. In addition to providing support for administrative operations, including session management tasks and monitoring obligations, the infrastructure also provides a facade offering abstract access to all components throughout the system. This latter functionality is exposed to an implementation of an algorithm managing the resource requirements of active game sessions. The core responsibility of this exchangeable part of the system, running on top of all other services, is to decide where and when to run server instances contributing to the maintained sessions. In addition, it might balance the load assigned to server processes. An example algorithm fulfilling those obligations has been devised for this thesis. To the best knowledge of the author of this thesis, the proposed solution is the first approach using the combined power of the zoning and replication concepts to realize a dynamic load management service.

The capabilities of the presented solution have been evaluated through various experiments based on a simulated environment emulating the behaviour of game sessions. Thereby, the development of the simulator itself has been another challenge accomplished for this thesis. As the experiments have demonstrated, the proposed solution is capable of dynamically adapting the resources allocated for maintaining game sessions to match the actual requirements.
Thereby, due to the high variability of user demand throughout a day, the costs for maintaining MMOG sessions could be reduced by approximately one third compared to a simple static approach. At the same time, the game play experience of the end user is barely affected. Within every conducted simulation emulating realistic scenarios, more than 99.99% of the game session duties have been handled by server instances having sufficient resources at their disposal. Therefore, the corresponding processes should be able to provide the same quality of service as they would offer within a system based on static resource allocations. Hence, the proposed solution provides a viable alternative to contemporary static approaches.

Future Work

Based on the provided program package, many additional aspects may be investigated in the future. For one, experiments evaluating the effects of yet unconsidered algorithm parameters may be performed. Thereby, a special focus might be put on investigating the effects of adjusting resource and game loop saturation threshold boundaries independently from each other. In addition, as has been pointed out within the experiment chapter, the effect of the global capacity scaling parameter on the number of involved hosters might be investigated. Further, the offset used for resource predictions may be varied, as well as other timing parameters of the algorithm. Thereby, some relation between the time span required for starting or stopping server instances and the algorithm's timing parameters may be derived. The results may provide hints on choosing optimal values for the corresponding configuration options.

In addition to experimenting with yet unconsidered parameter combinations, alternative aspects of the algorithm's behaviour may be investigated. Besides the achieved resource utilization and error rates, the number of balancing operations may also be compared. Especially the number of server instance migrations might be of particular interest. Further, the algorithm's ability to manage multiple concurrent game sessions could be investigated. Thereby, the way sessions are distributed among the involved hosters may be of particular interest.

Besides performing additional experiments based on the provided solution, several extensions may be added to the presented system to fit the requirements of specific environments or to improve its usability.
For instance, a component capable of dynamically tuning algorithm parameters may be added. Based on the results derived from corresponding experiments, a mechanism could be developed that reduces thresholds during periods of growing load and increases them while load is stable or shrinking. These dynamic adjustments might potentially increase the resource utilization while keeping error rates low. In addition, the set of options to be determined by administrators would be significantly reduced.

Additionally, within an environment of hosters requesting different fees for their provided resources, a cost model describing those charges may be added


to the algorithm. Therefore, the extended bin-packing problem of section 4.2.2 may be further extended to consider the costs caused by assigning items to the available bins. In addition, the problem statement would have to be reformulated from using a minimal number of bins to using a subset of bins producing minimal costs. Furthermore, theoretical studies of the extended bin-packing problem and its application within the algorithm may be conducted.

A wide range of extensions might also be integrated on the grid level of the balancing hierarchy. Especially the way new hosters are chosen for extending game sessions may be customized for real-world applications. For instance, additional information on the hosters, like the geographical location of the corresponding data centres, could be incorporated to choose resources close to the already involved grid sites. Even further, the connection latency of players, which is an additional metric offered by the RTF framework, may be considered during the selection process to choose hosters close to the participating clients.

Within EC2-like environments, additional responsibilities may be assigned to the global balancing component. For instance, whenever a local balancer instance instructs its associated global counterpart to offload some of the assigned zone duties to other sites, the global instance may first try to obtain additional nodes for the locally maintained hoster site. Thereby, new on-demand instances of the cloud may be allocated. If this increase of the local capacities is successful, no duties have to be moved. Further, during the periodic reduction steps performed by the global balancer, the number of instances may be reduced if the overall utilization is small enough. Thereby, pricing regulations, like the fact that every started hour in which instances have been used has to be fully paid, can be considered.
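The cost-aware reformulation sketched above could, for instance, replace a first-fit placement step with a cheapest-fit rule. A minimal sketch, with hypothetical session loads and per-hoster costs (the actual extended bin-packing formulation of section 4.2.2 involves further constraints):

```python
def cheapest_fit(items, bins):
    """Cost-aware variant of the bin-packing placement step: instead of
    minimizing the number of bins, place each item (largest first) into
    the cheapest bin with enough remaining capacity.

    `items` maps item name -> size; `bins` maps bin name -> (capacity,
    cost). Returns item -> bin, or None if some item does not fit."""
    remaining = {name: cap for name, (cap, _) in bins.items()}
    by_cost = sorted(bins, key=lambda name: bins[name][1])
    placement = {}
    for item, size in sorted(items.items(), key=lambda kv: -kv[1]):
        for name in by_cost:                  # try cheapest bins first
            if remaining[name] >= size:
                remaining[name] -= size
                placement[item] = name
                break
        else:
            return None                       # item does not fit anywhere
    return placement

# Hypothetical zone loads and hoster bins as (capacity, cost per hour):
items = {"zoneA": 4, "zoneB": 3, "zoneC": 2}
bins = {"siteX": (5, 1.0), "siteY": (8, 2.5)}
print(cheapest_fit(items, bins))
# → {'zoneA': 'siteX', 'zoneB': 'siteY', 'zoneC': 'siteY'}
```

Note that such a greedy rule only approximates the reformulated objective of selecting the subset of bins with minimal total cost; an exact solution would require solving the corresponding combinatorial problem.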
Similarly, the global balancer may be extended to interact with resource managers controlling grid sites.

Since the balancing algorithm is isolated from the remaining elements of the provided software package, even entirely different approaches for managing game sessions may be evaluated by implementing alternative algorithms. Thereby, all of them may be evaluated and compared based on the same set of scenarios to derive comparable results.

Finally, an evaluation of the system's abilities within a real-world environment managing actual game sessions would provide definitive evidence of its proper operation. However, this would require access to a popular game built upon the RTF framework as well as to a corresponding infrastructure.


List of Figures

1.1   Total Active MMOG Subscriptions from [1]   9

2.1   Example Multiplayer Online Game Topologies   14
2.2   Example Multi-Server Topology   15
2.3   Basic Game Loop Schema   16
2.4   Work Rhythm of a 60% Saturated Game Loop   18
2.5   The Zoning Concept   20
2.6   The Principle of Overlapping Zones   21
2.7   The Replication Concept involving two Server Instances   23
2.8   The Instancing Concept involving two Server Instances   25
2.9   Data Entities Involved in the State Representation   28
2.10  Game Session Life Cycle   32
2.11  The Structure of the complete System State Representation   34

3.1   The Overall Architecture of the Session Management Solution   37
3.2   Phases of issued Balancing Operations   41
3.3   Internal Controller Architecture and Related Components   42
3.4   Message Routing Concept within the Algorithm Environment   44
3.5   Example of a composed Balancing Operation based on a Workflow   46
3.6   Distributed Registry Update Procedure   48
3.7   Server Architecture and Connected Components   49

4.1   The Basic Balancing Problem   52
4.2   The Extended Hierarchical Balancing Problem   54
4.3   Example Patterns produced by various Heuristics   61
4.4   The Load-Balancing Hierarchy   62
4.5   Internal Algorithm Organization   66
4.6   The Balancing Cycle Sequence   68
4.7   Filtering Background Load   71
4.8   Example Loop Saturations and Resulting Weight Modifications   72
4.9   Example Game Loop Saturation Boundaries   76
4.10  Offloading Zone Duties on Demand   84
4.11  Reducing the Number of Participants   84

5.1   The Simulation Environment within the Overall Architecture   100
5.2   Internal Organization of the Simulation Environment   101
5.3   Example Game Session Reshaping Transformation   104
5.4   Basic Concept of Discrete Event Simulations   108
5.5   Extended Event Simulation Concept   109
5.6   Supporting Complex Event Tasks   110

6.1   Game Session Profile used for First Experiment   112
6.2   Observed Game-Loop Saturations for sequential Game Loops   115
6.3   Weight Adjustments for sequential Game Loops   115
6.4   Observed Game-Loop Saturations for 25% parallel Game Loops   116
6.5   Observed Game-Loop Saturations for 50% parallel Game Loops   116
6.6   Observed Game-Loop Saturations for 100% parallel Game Loops   116
6.7   Weight Adjustments for 25% parallel Game Loops   117
6.8   Weight Adjustments for 50% parallel Game Loops   117
6.9   Weight Adjustments for 100% parallel Game Loops   117
6.10  Game Session Profile used for the Second Experiment   119
6.11  Nodes Involved in Maintaining the Game Session   123
6.12  Hosters Involved in Maintaining the Game Session   124
6.13  Resource Utilization without Prediction   124
6.14  Resource Utilization with Prediction   125
6.15  Overall Node Utilization Comparison   125
6.16  Number of Overload Events observed with and without Predictions   126
6.17  Example Client Count Graphs from RuneScape Traces [3]   130
6.18  Nodes Involved in Maintaining the Game Session   136
6.19  Relative Costs caused by Maintaining the Game Session   137
6.20  Fraction of Game Session spent using different Numbers of Hosters   138
6.21  Overall Node Utilization during Game Session Maintenance   139
6.22  Number of Overload Events observed during the Simulations   139
6.23  Overloaded Resource Event Reduction by considering Predictions   140

List of Tables

4.1   List of Timing Parameters   86
4.2   List of Global Balancer Parameters   87
4.3   List of Local Resource Management Parameters   87
4.4   List of Replica Management Parameters   88
4.5   List of Session Starter Parameters   89

6.1   Processor Specifications used for first Experiment   113
6.2   Host Type Specifications used for the Second Experiment   120
6.3   Load Model Parameters used for the Second Experiment   120
6.4   Customized Algorithm Configuration for the Second Experiment   121
6.5   Statistical Details on observed Overload Events   127
6.6   Host Specification for the Real World Experiment   131
6.7   Maximum Number of supported Players   132
6.8   Common Configuration Options for third Experiment   134
6.9   Simulated Algorithm Configurations   134
6.10  Statistical Information on observed Overload Events   142


List of Algorithms

1   Generic Bin Packing Algorithm   60
2   Weight Adjustment Algorithm   74
3   Resource Allocation Algorithm (simplified)   78
4   Generic Load Compression Heuristic   81


Bibliography [1] B.S. Woodcock. An Analysis of MMOG Subscription Growth, Version 23.0. mmogchart.com, April 2008. [2] World of Warcraft. http://www.worldofwarcraft.com. [3] Vlad Nae, Alexandru Iosup, Stefan Podlipnig, Radu Prodan, Dick Epema, and Thomas Fahringer. Efficient management of data center resources for massively multiplayer online games. In SC ’08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, pages 1–12, Piscataway, NJ, USA, 2008. IEEE Press. [4] Wentong Cai, Percival Xavier, Stephen J. Turner, and Bu-Sung Lee. A scalable architecture for supporting interactive games on the internet. In PADS ’02: Proceedings of the sixteenth workshop on Parallel and distributed simulation, pages 60–67, Washington, DC, USA, 2002. IEEE Computer Society. [5] Jens M¨ uller and Sergei Gorlatch. Rokkatan: scaling an rts game design to the massively multiplayer realm. Comput. Entertain., 4(3):11, 2006. [6] Eric Cronin, Burton Filstrup, and Anthony Kurc. A distributed multiplayer game server system. In University of Michigan, page 01, 2001. [7] Amazon Elastic Compute Cloud. http://aws.amazon.com/ec2/. [8] T. Fahringer, C. Anthes, A. Arragon, A. Lipaj, J. Muller-Iden, C. Rawlings, R. Prodan, and M. Surridge. The edutain@ grid project. LECTURE NOTES IN COMPUTER SCIENCE, 4685:182, 2007. [9] RuneScape. http://www.runescape.com/. [10] J. Muller and S. Gorlatch. GSM: A game scalability model for multiplayer real-time games. In Proceedings IEEE INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies, volume 3, 2005. [11] P. Bettner and M. Terrano. 1500 archers on a 28.8: Network programming in Age of Empires and beyond. Presented at GDC2001, 2:30p, 2001.


[12] Katherine L. Morse, Lubomir Bic, and Michael Dillencourt. Interest management in large-scale virtual environments. Presence: Teleoperators and Virtual Environments, 9(1):52–68, 2000.

[13] Jin Chen, Baohua Wu, Margaret Delap, Björn Knutsson, Honghui Lu, and Cristiana Amza. Locality aware dynamic load management for massively multiplayer games. In PPoPP '05: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 289–300, New York, NY, USA, 2005. ACM.

[14] James Tulip, James Bekkema, and Keith Nesbitt. Multi-threaded game engine design. In IE '06: Proceedings of the 3rd Australasian Conference on Interactive Entertainment, pages 9–14, Murdoch University, Australia, 2006. Murdoch University.

[15] Frank Glinka, Alexander Ploß, Jens Müller-Iden, and Sergei Gorlatch. RTF: a real-time framework for developing scalable multiplayer online games. In NetGames '07: Proceedings of the 6th ACM SIGCOMM Workshop on Network and System Support for Games, pages 81–86, New York, NY, USA, 2007. ACM.

[16] Vlad Nae, Radu Prodan, and Thomas Fahringer. Neural network-based load prediction for highly dynamic distributed online games. In Euro-Par '08: Proceedings of the 14th International Euro-Par Conference on Parallel Processing, pages 202–211, Berlin, Heidelberg, 2008. Springer-Verlag.

[17] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM '01: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pages 149–160, New York, NY, USA, 2001. ACM.

[18] Distributed and Mobile Systems Group, University of Bamberg. Open Chord. http://open-chord.sourceforge.net/, April 2008.

[19] JXTA community project. https://jxta.dev.java.net/.

[20] Vlad Nae, Herbert Jordan, Radu Prodan, and Thomas Fahringer. An information system for real-time online interactive applications. pages 361–370, 2009.

[21] S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons, 1990.


[22] G. Dósa. The tight bound of first fit decreasing bin-packing algorithm is FFD(I) ≤ 11/9 OPT(I) + 6/9. In Combinatorics, Algorithms, Probabilistic and Experimental Methodologies: First International Symposium, ESCAPE 2007, Hangzhou, China, April 7-9, 2007, Revised Selected Papers, page 1. Springer, 2007.

[23] Belabbas Yagoubi and Yahya Slimani. Dynamic load balancing strategy for grid computing. World Academy of Science, Engineering and Technology, 19(18):90–95, 2006.

[24] JUnit. http://www.junit.org/, 2009.

[25] G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, Spring Joint Computer Conference, pages 483–485. ACM, New York, NY, USA, 1967.

[26] M. Pozzobon. Quake 3 packet and traffic characteristics. Technical report, Swinburne University of Technology, 2002.

[27] Quake 3 Arena. http://www.idsoftware.com/games/quake/quake3-arena/.

[28] Amazon EC2 Instances and cpuinfo. http://www.cloudiquity.com/2009/01/amazon-ec2-instances-and-cpuinfo/, January 2009.

[29] Fengyun Lu, Simon Parkin, and Graham Morgan. Load balancing for massively multiplayer online games. In NetGames '06: Proceedings of the 5th ACM SIGCOMM Workshop on Network and System Support for Games, page 1, New York, NY, USA, 2006. ACM.

[30] Dugki Min, Donghoon Lee, Byungseok Park, and Eunmi Choi. A load balancing algorithm for a distributed multimedia game server architecture. In ICMCS '99: Proceedings of the IEEE International Conference on Multimedia Computing and Systems, page 882, Washington, DC, USA, 1999. IEEE Computer Society.

[31] Roman Chertov and Sonia Fahmy. Optimistic load balancing in a distributed virtual environment. In NOSSDAV '06: Proceedings of the 2006 International Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 1–6, New York, NY, USA, 2006. ACM.

[32] Antonin Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD '84: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, pages 47–57, New York, NY, USA, 1984. ACM.

