The Journal of Supercomputing manuscript No. (will be inserted by the editor)
Workload Balancing in Distributed Crowd Simulations: the Partitioning Method Guillermo Vigueras · Miguel Lozano · Juan M. ˜ Orduna
Received: date / Accepted: date
Abstract The simulation of large crowds of autonomous agents with a realistic behavior is still a challenge for several computer research communities. Distributed architectures can provide scalability to crowd simulations, but they require the use of efficient partitioning methods. Although Convex Hulls have been shown as very efficient structures for crowd partitioning, providing efficient workload balancing to large scale simulations is still an open issue. In this paper, we propose the integration of a workload balancing technique for crowd simulations within a partitioning method based on convex hulls. The region-based balancing technique re-assigns agents to servers using a criterion of distance. The performance evaluation results show that this technique ensures the saturation avoidance of the servers in an homogeneous distributed system. This feature can increase the scalability of crowd simulations. Keywords Distributed crowd simulations · Workload balancing 1 Introduction Crowd simulations model the motion of crowds and other flock-like groups as interacting particles that display different behaviors in 2D/3D scenes [11, 3]. Agent-based crowd simulations aim to capture the nature of a crowd as a collection of individuals, each of which can have their own goals, knowledge and behaviors [10]. These applications require both rendering visually plausible images of the virtual world and managing This work has been jointly supported by the Spanish MICINN, the European Commission FEDER funds, and the University of Valencia under grants Consolider-Ingenio 2010 CSD2006-00046, TIN2009-14475C04-04, and V SEGLES PIE. Guillermo Vigueras, Miguel Lozano, Juan M. Ordu˜na Departmento de Inform´atica Universidad de Valencia Tel.: +34 96 3544489 Fax: +34 96 3544768 E-mail: {Guillermo.Vigueras,Miguel.Lozano,
[email protected]} Present address: Avda. Vicent Andr´es Estell´es, s/n. 46007 -Burjassot (Valencia). Spain
2
the behavior of autonomous agents at interactive rates. In this sense, simulating the realistic behavior of large crowds of autonomous agents is still a challenge for several computer research communities [2, 13, 4, 9]. The graphical and behavioral requirements of these applications impose a computational cost that highly increases with the numbers of agents in the system, in such a way that a scalable design is crucial for these applications. Distributed computer architectures can provide the required scalability. In a previous work, we proposed an architecture that can simulate large crowds of autonomous agents at interactive rates [5]. In that architecture, the crowd system is composed of many Client Computers, that host agents, and one Action Server (AS), hosted in one computer, that is responsible for checking the actions (eg. collision detection) sent by agents. Although that proposal can simulate large crowds of autonomous agents, the AS still represents a bottleneck when the number of agents is increased. Later, we improved that architecture by parallelizing the AS in a distributed-server fashion. The simulation world was partitioned into subregions and each one assigned to one parallel AS. A scheme of this architecture [15] is shown in figure 1. This figure shows how the space occupied by crowd agents is partitioned into three subregions, and each one is assigned to one parallel AS (labeled in the figure as ASx ), each one hosted by one computer. The agents are execution threads assigned to one Client Computer (labeled in the figure as Clientx ) associated to the corresponding server, ASx . Each AS process hosts a copy of the Semantic Database. However, each AS exclusively manages the portion of the database representing the agents in its region. In order to guarantee the action consistency near the border of the different regions (see agentk in figure 1), the ASs can collect information about the surrounding regions by querying the servers managing the adjacent regions. Additionally, the associated Clients are notified about the changes produced by the agents located near the adjacent regions by the ASs managing those regions. This architecture allows to scale up the system, although it also requires an efficient partitioning method. This partitioning method should efficiently assign the agents to the existing servers in such a way that the number of messages exchanged among the servers is reduced and the system is well balanced.
Fig. 1 General scheme of the distributed crowd architecture
3
In previous works, we proposed Genetic Algorithms (GA) for partitioning crowd simulations, in order to improve the partitioning efficiency [6]. However, these and other region-based partitioning methods [9, 8, 16] add a significant overhead to the system. In order to reduce this overhead, we proposed the use of irregular shape regions (convex hulls) for solving the partitioning problem [14]. The performance of the method based on convex hulls is higher in terms of both fitness function values and execution times, but the usefulness of this method for real distributed systems is limited by the saturation of a few severs. Therefore, workload balancing arises as the main issue to be solved for achieving an efficient partitioning method in large scale distributed crowd simulations. In this paper, we propose to enhance the QHull partitioning method proposed in [14], ensuring that none of the server reaches saturation. In order to achieve this goal, we have modified that method by adding a workload balancing phase where servers re-assigns agents to servers using a criterion of distance. This phase ends when the standard deviation of the average number of avatars hosted in each server is reduced near zero. The performance evaluation results show that this technique ensures the saturation avoidance for all the servers in an homogenous distributed system. This feature can increase the scalability of crowd simulations. The rest of the paper is organized as follows: Section 2 describes the QHull method and it shows some load balancing problems that this method cannot solve for certain crowd behaviors. Section 3 describes the proposed enhancements for that partitioning method. Next, Section 4 shows the performance evaluation of the proposed method. Finally, Section 5 shows some concluding remarks. 2 The QHull Partitioning Method The partitioning problem to be solved in distributed crowd simulations consists of finding a near optimal partition of regions in the virtual world (containing all the agents in the system) that minimizes the number of agents near the borders of the regions. Also, this near optimal partition should balance the number of agents in each region. In order to model this problem, we defined the following fitness function to be minimized [6]: H(P) = ω1 · α (P) + ω2 · β (P), ω1 + ω2 = 1
(1)
The first term in this equation measures the number of agents in the resulting partition P whose surroundings (Area Of Interest or AOI [12] ) crosses the region boundaries. Each interaction of such agents will require the exclusive access of a single server to that AOI. As a result, it requires the exchange of lock requests among two or more servers, and this overhead must be minimized. Concretely, α (P) is computed as the sum of all the agents whose AOIs intersect two or more regions of the virtual world (that is, the number of agents whose AOIs reside in more than one regions, see agentk in Figure 1). β (P) is computed as the standard deviation of the average number of agents that each region contains. Therefore, β (P) measures how balanced the partition P is. Finally, ω1 and ω2 are weighting factors between 0 and 1 that can be tuned to change how the partitions are evaluated. We used H(P)
4
(with 0.6 and 0.4 for ω1 and ω2 values, respectively) as the global fitness function for measuring the quality of the partitions provided by the QHull method as well as other partitioning method evaluated as reference values [14]. The QHull partitioning method consists of the simulation servers periodically assigning each of their agents agk to the server controlling the ri that minimizes the following function: falloc (agk , ri ) = dstMC(agk , ri ) + nAgs(ri ) ∗ dstMC(agk , ri )
(2)
where falloc is the allocation function, nAgs(ri ) provides the number of agents in region ri , and dstMC(agk , ri ) corresponds to the Euclidean distance from agk to the center of mass of the region ri . The first term in falloc considers a spatial criterion and the second term balances the server workload. Every time a region is updated, the corresponding state (center of mass and number of agents) is sent to the neighbor servers. The QHull method implements the Quickhull algorithm [1] in order to reduce the area of the region controlled by each server, in such a way that the number of lock requests is reduced. As a result, this method provides partitions like the one shown in Figure 2 a). In this Figure, the virtual world consists of a 2D square, and agents are represented as dots. The partition provided by the QHull method consists of convex hull regions containing the whole agents in the virtual world. The agents in each region are assigned to a given server. The partitions provided by the QHull method minimize the number of lock requests and avoids a huge imbalance in the number of agents assigned to different servers.
Fig. 2 a) Partition provided by the QHull method for a given simulation cycle. b) Standard deviations provided by the QHull method for each server.
The QHull method provides a good performance for different movement patterns with respect to other existing methods [14]. However, we have detected eventual saturations of some servers in real systems during some simulations when using this partitioning method, particularly for those movement patterns that generate the highest workload (the sdown movement pattern, a situation where there is a single exit in a structured maze and all the agents should evacuate the maze). Since the system
5
is homogenous (all the computers in the system have the same features), we tried to find the reason for this behavior. Concretely, we studied the standard deviation provided for each server by the QHull method. Figure 2 b) shows these values (in percentage) for a representative simulation. In this Figure, the standard deviation has been computed as the average value minus the number of agents assigned to each server. Therefore, positive values mean that the partition has assigned less agents than the average value to that server. Figure 2 b) shows some negative peaks for the plots corresponding to server 3 and server 2 in the first part of the simulation. These peaks mean that these servers host a number of agents 30% higher than the average value. It cannot be guaranteed that these assignments do not exceed the computational bandwidth of the servers, leading to saturation. That is, the workload balancing performed by QHull method is not enough for some cases, and it should be enhanced in such a way that the partitioning method guarantees the saturation avoidance for all the servers.
3 Enhancing Workload Balancing In order to enhance the workload balancing performed by the QHull partitioning method, an additional step is performed when the QHull method finishes. In this phase, each server computes the distance of their agents to the neighbor hulls. Next, the agents assigned to each server are sorted in a list according to this distance criterion (nearest agents first). At this point, the underused servers (the servers hosting a number of agents lower than the average value) will ask a overloaded neighbor server (a server managing a neighbor region of the partition and hosting a number of agents higher than the average value) to migrate some agents. The agents selected for migration in the overloaded server will be the first agents in the sorted list. The reassignment of agents from an overloaded server to an underused server will finish when the number of agents hosted by the underused server reaches the average value of agents hosted by servers. If the overloaded server becomes underused as a result of the migrations, then it starts the process again with one of its overloaded neighbor servers. For some movement patterns, a special case occurs when an underused server manages a region with a single, underused neighbor server. In that case, the algorithm allows the transfer of agents from the underused neighbor server. The reason is that the assignor server will in turn start again the re-assigning process, acquiring agents from an overloaded neighbor server. Listing 1 shows a pseudocode of the enhanced partitioning method. A given function computes the workload that each server supports and classify servers either as overloaded or underused. They store for each server (in field s.imbalance) the difference (measured in number of agents) between the average value of agents hosted by servers and the number of agents hosted by that server. Another function computes which servers are the neighbors of each server (which servers control the surrounding regions) and stores for each server which neighbor servers are overloaded and underused. Finally, function load Balance performs the workload balancing procedure explained above. For each underused server, the inner while loop in function load Balance searches the agents in neighbor regions that are located at the shortest distances from the region controlled by that
6
server. These agents are migrated to the underused server. First, this search is performed in overloaded neighbor servers. If no suitable agents are found, then these agents are searched in the underused neighbor servers. The code below the inner while loop detects changes in the imbalance of servers. The outer while loop takes care of the special case commented above. l o a d B a l a n c e ( u S e r v e r s , o S e r v e r s ) / / U n d e r u s e d and O v e r l o a d e d s e r v e r s serversLoadChange = true while ( serversLoadChange ) { serversLoadChange = f a l s e f o r e a c h s i n u S e r v e r s do { while ( s . imbalance != 0) { bestAgent = g e t b e s t a g e n t f r o m s e r v e r ( s . oNeighbors ) bestNeighbor = g e t s e r v e r ( bestAgent ) i f ( b e s t A g e n t ==NULL) { bestAgent = g e t b e s t a g e n t f r o m s e r v e r ( s . uNeighbors ) bestNeighbor = g e t s e r v e r ( bestAgent ) } i f ( b e s t A g e n t ! =NULL) { bestNeighbor . erase ( bestAgent ) b e s t N e i g h b o r . i m b a l a n c e ++ s . add ( b e s t A g e n t ) } s . imbalance− − } i f ( servers load has Changed ( uServers , oServers ) ) calculate Servers Load ( uServers , oServers ) calculate Servers Neighborhood ( uServers , oServers ) serversLoadChange = true break } } Listing 1 Pseudocode for the workload balancing algorithm.
4 Performance Evaluation We propose the evaluation of the partitioning method by simulation. We have extracted the motion patterns of each agent off-line, and we have used this information as an input for the considered partitioning method. Concretely, we have evaluated the QHull method as well as the QHull method enhanced with the workload balancing phase. We have simulated a crowd simulation composed by 8000 autonomous agents to be distributed in five regions. We have evaluated different crowd scenarios, including evacuation and urban environments. The evacuation scenario consists of a structured 2-D world where there are several emergency exits. The autonomous agents must try to escape from the world as soon as possible. For this scenario, we have used the same well-known movement patterns considered for evaluating the QHull method [14]. However, due to space limitations we only present the detailed results for the worst case, the down movement pattern. This is the pattern that showed the highest
7
standard deviation in the number of agents hosted by the servers [14]. Figure 3 a) shows the fitness function values provided by each partitioning method for the down simulation. In this figure, the X-axis represents the simulation time in simulation cycles, and the Y-axis represents the fitness function values. For comparison purposes, we have measured the values provided by the QHull method and the enhanced QHull method, denoted as LB for Load Balancing method. In order to obtain results comparable to the ones shown in [14], we have used the same ω1 and ω2 values of 0.6 and 0.4, respectively. Figure 3 a) shows that the LB method provides better (lower) fitness function values during the whole simulation. However, the reduction in these values is not very significant. The reason for this low reduction are the ω1 and ω2 values. Since these results do no clearly show that the proposed method provide substantial benefits, we have separately measured the two terms in H(P) function provided by each method: the standard deviation from the average number of avatars hosted by the servers, and the number of lock requests that each partition generates.
Fig. 3 a) H(P) values provided by the QHull and the LB methods. b) Standard deviation of the number of avatars hosted by servers.
Figure 3 b) shows the standard deviation from the average number of avatars hosted by the servers. This Figure shows on the Y-axis the standard deviation measured as number of avatars. This Figure shows the proposed method eliminate the standard deviation for most of the simulation cycles. For those cycles where the QHull method provides standard deviations up to 1300 avatars, the proposed method provides deviations of around 200 avatars. These results show that the proposed method efficiently balances the workload, avoiding the saturation of any server if the system is composed of homogeneous servers. Although they are not shown here due to space limitations, we haved also checked the aggregated number of lock requests and the aggregated execution time required by each method for providing the assignments for all the servers during each simulation cycle. The results show that the cost for the workload balancing process does not represent an important overhead neither in terms of the number of locks requests nor in terms of execution time of the partitioning method.
8
5 Conclusions In this paper, we have proposed the integration of a workload balancing technique for crowd simulations within a partitioning method based on convex hulls. The regionbased balancing technique re-assigns agents to servers using a criterion of distance. The performance evaluation results show that this technique efficiently balances the workload and it does not significantly increase the number of lock requests that take place among servers, neither adds a significant overhead. As a result, it avoids the saturation of any server when applied to an homogeneous distributed system, increasing the scalability of crowd simulations.
References 1. Barber, C.B., Dobkin, D.P., Huhdanpaa, H.: The quickhull algorithm for convex hulls. ACM Trans. Math. Softw. 22(4), 469–483 (1996). DOI http://doi.acm.org/10.1145/235815.235821 2. Diller, D.E., Ferguson, W., Leung, A.M., Benyo, B., Foley, D.: Behavior modeling in commercial games. In: Behavior Representation in Modeling and Simulation (BRIMS) (2004) 3. Frank, T., Bernert, K., Pachler, K.: Dynamic load balancing for lagrangian particle tracking algorithms on mimd cluster computers. In: PARCO’2001 - Int. Conf. on Parallel Computing 2001, pp. 1–9 (2001) 4. Kruszewski, P.A.: A game-based cots system for simulating intelligent 3d agents. In: BRIMS ’05: Proceedings of the 2005 Behavior Representation in Modelling and Simulation Conference (2005) 5. Lozano, M., Morillo, P., Ordu˜na, J.M., Cavero, V., Vigueras, G.: A new system architecture for crowd simulation. J. Netw. Comput. Appl. 32(2), 474–482 (2009) 6. Lozano, M., Ordu˜na, J.M., Cavero, V.: A genetic approach for distributing semantic databases of crowd simulations. In: Proceedings of 21th. International Parallel and Distributed Symposium, p. 237. IEEE Computer Society Press (2007) 7. Morillo, P., Ordu˜na, J.M., Fern¿ndez, M., Duato, J.: Improving the performance of distributed virtual environment systems. IEEE Transactions on Parallel and Distributed Systems 16(7), 637–649 (2005) 8. Quinn, M.J., Metoyer, R.A., Hunter-Zaworski, K.: Parallel implementation of the social forces model. In: Proc. of 2nd. International Conference in Pedestrian and Evacuation Dynamics, pp. 63–74 (2003) 9. Reynolds, C.: Big fast crowds on ps3. In: sandbox ’06: Proceedings of the 2006 ACM SIGGRAPH symposium on Videogames, pp. 113–121. ACM, New York, NY, USA (2006) 10. Reynolds, C.W.: Flocks, herds and schools: A distributed behavioral model. In: SIGGRAPH ’87: Proceedings of the 14th annual conference on Computer graphics and interactive techniques, pp. 25– 34. ACM, New York, NY, USA (1987). DOI http://doi.acm.org/10.1145/37401.37406 11. Sims, K.: Particle animation and rendering using data parallel computation. In: SIGGRAPH ’90: Proceedings of the 17th annual conference on Computer graphics and interactive techniques, pp. 405– 413. ACM, New York, NY, USA (1990). DOI http://doi.acm.org/10.1145/97879.97923 12. Singhal, S., Zyda, M.: Networked Virtual Environments. ACM Press (1999) 13. Sung, M., Gleicher, M., Chenney, S.: Scalable behaviors for crowd simulations. In: Proceedings of Eurographics symposium on Computer animation, pp. 519–528. ACM Press (2004) 14. Vigueras, G., Lozano, M., Ordu˜na, J.M., Grimaldo, F.: Improving the performance of partitioning methods for crowd simulations. In: HIS ’08: Proceedings of the 2008 8th International Conference on Hybrid Intelligent Systems, pp. 102–107. IEEE Computer Society (2008) 15. Vigueras, G., Lozano, M., P´erez, C., Ordu˜na, J.: A scalable architecture for crowd simulation: Implementing a parallel action server. In: Proceedings of International Conference on Parallel Processing (ICPP), pp. 430–437. IEEE Computer Society, Los alamitos, CA, USA (2008) 16. Zhou, B., Zhou, S.: Parallel simulation of group behaviors. In: WSC ’04: Proceedings of the 36th conference on Winter simulation, pp. 364–370. Winter Simulation Conference (2004)