Games of War and Peace: Large Scale Simulation Over the Internet

Helder Batista IST / INESC [email protected]

Vasco Costa IST / INESC [email protected]

João Madeiras Pereira IST / INESC [email protected]

Abstract The simulation of complex and realistic interactive virtual environments is a challenging problem in various ways. From the visualization point of view the difficulties arise from the hardware limitations for presenting a complex scene at interactive frame rates. For communications the challenge is the management of a high number of participants without sacrificing the simulation’s performance. We analyze solutions that make possible the realization of simulations with high realism using consumer level equipment available today.

Figure 1. Screenshot with 105 fighters.

1. Introduction

In this paper we describe Games of War and Peace (GWP), a test bed for modeling and simulation being developed at IST with the support of INESC. GWP has been designed from the outset with a peer-to-peer architecture running on consumer level equipment connected to the Internet via dial-up modems. We have made use of industry standard APIs like OpenGL [1] and the HLA libRTI [2, 3]. The HLA was developed for the United States military to satisfy their need for a flexible and extensible simulation architecture. The military have been pioneers and have led the industry in the area of large-scale simulation. The usage of the HLA libRTI in our application is an example of the successful merging of military and entertainment technologies.

First, we describe our objectives and the software architecture as well as the choices behind it. Secondly, we provide some implementation details, giving the required background. Finally, we discuss the experimental results and future developments.

Figure 2. GWP software architecture.

2. Software architecture

Before we started design we defined the following objectives:
• GWP must be a distributed peer-to-peer architecture.
• GWP must support hundreds of users.
• GWP must run on consumer level hardware: we chose an IBM-compatible PC with an AMD Duron 800 MHz CPU, 256 MB of RAM and an NVIDIA GeForce2 MX based graphics board as our reference platform.
• GWP must be based on industry standard APIs: we chose Microsoft Windows 2000 as the operating system for the development platform due to its good support for the industry standard APIs OpenGL and HLA libRTI.

We decided to implement a simple flight simulation in order to test whether GWP could satisfy these requirements. Figure 2 shows the GWP software architecture. We opted to store high volume data such as triangle meshes and textures in a static database in order to reduce network traffic. Dynamic data includes the data used in the flight simulation. The graphics engine is based on OpenGL and supports continuous level of detail (CLOD) to reduce the amount of scene geometry sent to the graphics board. The input module processes input and updates the dynamic database accordingly. The world simulation module simulates the behavior of the entities in the simulation, such as fighters. The network module is based on HLA and provides support for dynamic data distribution based on events, which can include fighter movement, collisions, explosions, etc.

3. Visualization A graphics engine that supports simulations in a large scale must provide interactive frame rates when a very large number of objects are visible onscreen. This is so because the playability of a simulation is determined, among other factors, by its frame rate. We must keep the frame rate high enough to give the impression of fluid motion on the screen (i.e. 25 or more frames per second).

For the visualization of complex scenes at interactive frame rates, with adequate image quality, we can adjust the detail level of the objects in the scene according to parameters like the distance of an object to the viewer and the number of objects present in the scene. This can be achieved by so-called level of detail (LOD) algorithms. The simplest approach is to store versions of the same object at different detail levels, and choose which version to display according to parameters like the ones mentioned before. Although this is easy to implement, it has some disadvantages:
• The modelers have to build various versions of the objects.
• Multiple copies lead to a waste of memory.
• The viewer can easily perceive the transition between detail levels: this unpleasant effect is called popping.
• It cannot support large terrains due to size constraints. Terrains are fundamental to most large-scale simulations, especially flight simulations like ours.
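A minimal sketch of this discrete approach, where one of several precomputed versions of a model is chosen by viewer distance (the type names, face counts and distance thresholds below are illustrative, not values from GWP):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Discrete LOD: pick one of several precomputed versions of a model
// based on its distance to the viewer. Thresholds are illustrative.
struct LodSet {
    std::vector<std::size_t> face_counts;   // face_counts[0] = most detailed
    std::vector<float>       max_distance;  // switch distance per level
};

// Returns the index of the detail level to draw at the given distance.
std::size_t select_lod(const LodSet& lods, float distance) {
    for (std::size_t i = 0; i + 1 < lods.max_distance.size(); ++i)
        if (distance <= lods.max_distance[i])
            return i;
    return lods.max_distance.size() - 1;  // fall back to coarsest level
}
```

The popping effect mentioned above is exactly the visible jump that occurs when `select_lod` switches between indices from one frame to the next.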

There are better algorithms specific to large terrain visualization, like quad-tree based algorithms [4] and ROAM [5], but they don't solve the problem for generic meshes. Hugues Hoppe has developed a more generic algorithm called progressive meshes [6, 7, 8], which we have chosen to implement in our engine. The progressive meshes algorithm has the following advantages:
• Only the most detailed version of an object needs to be modeled.
• Continuous level of detail (CLOD), i.e. the mesh can be changed one vertex at a time, making the changes in the mesh much smoother, and the number of polygons is finely controlled.
• It is applicable to generic meshes, including terrains.
• It supports the subdivision of large terrains into smaller blocks that can be independently loaded and freed from memory.
• It supports the preservation of important visual features like material boundaries and texture seams.
• It supports irregular tessellation of terrains, which results in better quality using fewer triangles.
• It supports view dependent refinement, essential for large terrains because only a small part of the mesh is visible at any moment.
• It supports the creation of geomorphs, i.e. interpolation between levels of the mesh, smoothing the transitions between levels of detail and making them even harder to spot.
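To make the one-vertex-at-a-time idea concrete, here is a minimal sketch of resolving collapsed vertices to their currently active representatives; the `child`-pointer scheme and names are our simplified illustration of the technique, not the engine's actual code:

```cpp
#include <cassert>

// Sketch of the vertex-collapse hierarchy behind a progressive mesh:
// each vertex may be collapsed into a coarser "child" vertex, and a
// face is drawn with whichever representatives are currently active.
struct PmVertex {
    PmVertex* child = nullptr;  // coarser vertex this one collapses into
    bool collapsed = false;     // true once the edge collapse is applied
};

// Follow collapses to the vertex actually used for rendering now.
PmVertex* representative(PmVertex* v) {
    while (v->collapsed && v->child)
        v = v->child;
    return v;
}

// A face degenerates (and is skipped while drawing) when two of its
// corners map to the same representative after collapses.
bool face_degenerate(PmVertex* a, PmVertex* b, PmVertex* c) {
    PmVertex* ra = representative(a);
    PmVertex* rb = representative(b);
    PmVertex* rc = representative(c);
    return ra == rb || rb == rc || ra == rc;
}
```

Undoing a collapse (a vertex split) simply clears the `collapsed` flag, which is what makes the refinement continuous: each step changes the mesh by one vertex.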

While respecting the principles of the algorithm, an implementation can vary widely. We briefly discuss the most important points, focusing on aspects that differ from or are not discussed in the referenced papers.

3.1. Error function

One of the key aspects of the progressive mesh algorithm is the calculation of the visual distortion introduced by removing a vertex. The calculation of this error is nontrivial and can be done in various ways, generally with a trade-off between accuracy and speed. We have used a function of the form:

error = Σᵢ ( ‖N(gᵢ) − N(fᵢ)‖ + |Σⱼ A(gⱼ) − Σⱼ A(fⱼ)| ) × A(gᵢ) / Σₖ A(fₖ),  ∀i: fᵢ ∈ F, gᵢ ∈ G

where F is the set of original faces, G is the set of faces after the transformation, A(f) is the area of face f and N(f) is the normal of face f. The formula can be divided into three parts:
• A measure of the variation of the faces' orientation: preserves the appearance of the surface.
• A measure of the variation of the faces' area: preserves the area (and volume) of the mesh.
• A face weight: determines that changing larger faces introduces more error.

In each step we choose one of three hypotheses: collapsing to parent 1, collapsing to parent 2, or interpolating the parents' geometry. The one that causes the least error is chosen. In Figure 3 we can see that this function performs adequately while not being complex to implement or heavy to compute.
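A direct reading of this error function can be sketched as follows; the explicit one-to-one pairing of original and transformed faces, and the use of precomputed unit normals and areas, are our assumptions for the sketch:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

static float length(Vec3 v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// normals_f/areas_f describe the original faces F; normals_g/areas_g
// describe the faces G after the candidate transformation, in
// corresponding order. Returns the error of the transformation.
float collapse_error(const std::vector<Vec3>& normals_f,
                     const std::vector<float>& areas_f,
                     const std::vector<Vec3>& normals_g,
                     const std::vector<float>& areas_g) {
    float area_f = 0.0f, area_g = 0.0f;
    for (float a : areas_f) area_f += a;
    for (float a : areas_g) area_g += a;

    float error = 0.0f;
    for (std::size_t i = 0; i < normals_g.size(); ++i) {
        Vec3 d{normals_g[i].x - normals_f[i].x,
               normals_g[i].y - normals_f[i].y,
               normals_g[i].z - normals_f[i].z};
        // orientation change + total area change, weighted by the
        // face's share of the original area
        error += (length(d) + std::fabs(area_g - area_f))
                 * areas_g[i] / area_f;
    }
    return error;
}
```

With this form, an unchanged mesh scores zero and large tilting faces dominate the sum, matching the three parts described above.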

Figure 3. From left to right: original model (30860 faces); simplified to 3160 faces; simplified to 800 faces; simplified to 200 faces.

3.2. Data structures

These are the structures used to render the terrain at run time. The structures used for generic meshes (like the fighters) are similar, except that:
• The vertices have an extra attribute for the normal vector.
• The faces have an extra attribute to define the material.
• The active faces are grouped by material to speed up rendering.
• Any dynamic attributes are detached from the Vertex and Face structures, as we explain in the Optimizations section below.

struct Vertex {
    Point3f pos;
    Vertex *child;
    float   error;
    int32   ecol_number : 30;
    int32   collapsed   : 1;
    int32   dead        : 1;
    uint16  iactive;
};

struct Face {
    Vertex *base_vertices[3];
    uint16  iactive;
};

struct ActiveFace {
    Face   *static_face;
    Vertex *current_vertices[3];
};

struct ActiveVertex {
    Vertex *static_vertex;
    uint32  geomorphing : 1;
    uint32  step        : 15;
    uint32  numsteps    : 15;
    Point3f geosource;
    Point3f geotarget;
};

struct PM {
    Vertex *vertices;
    Face   *faces;
    Vector  active_vertices;
    Vector  active_faces;
};

Figure 4. Data structures for run-time rendering.

Large terrain construction

Hoppe [8] proposes a way to build and use very large terrains using a hierarchical progressive mesh structure. We use a different approach that is very simple and was easy to incorporate into the original algorithm. To subdivide a large terrain into independent blocks and avoid any visible cracks between them, we split the mesh into smaller blocks and run the normal simplification procedure on them, but after processing each block we keep the structure of its borders in memory. Since we know which blocks are neighbors, we then force the transformations on the border of a new block to replicate exactly the structure of the neighboring block. This way, at run time, when we perform a transformation on one of the borders we are guaranteed to perform exactly the same on the other, and the blocks stitch together perfectly, so no changes are needed to the run-time algorithm.

Optimizations

Since we can have multiple (and possibly a very large number of) instances of the same object in the scene, differing only in detail level, we naturally want to keep just one copy of the progressive mesh in memory. But adjusting the detail of that single progressive mesh to match every object's target level every frame is too slow to be feasible. To solve this problem we separated the dynamic data structures from the static ones, so that we can keep multiple copies of just the dynamic structures, which consume little memory. The changes in each object from frame to frame are therefore small, and performance is greatly increased.

Another optimization was made when drawing the fighters. The fighters are sorted according to their number of faces, and when drawing, if the current fighter has about the same number of faces as the previous one, we redraw the previous fighter instead, thereby reusing the cached data and increasing performance. To further improve performance, the data structures were designed to avoid memory allocation during run-time processing.
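The fighter-sorting reuse described above might be sketched like this; the 5% face-count tolerance and the `plan_draws` helper are illustrative assumptions, not GWP's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Sketch of the draw-reuse optimization: sort instances by face
// count, then reuse the previously submitted geometry whenever the
// counts are within a small tolerance (here, 5%).
struct Instance { int id; int faces; };

// Returns, for each instance in draw order, the id whose geometry is
// actually submitted (its own, or a reused neighbour's).
std::vector<int> plan_draws(std::vector<Instance> fighters) {
    std::sort(fighters.begin(), fighters.end(),
              [](const Instance& a, const Instance& b) {
                  return a.faces < b.faces;
              });
    std::vector<int> drawn_id;
    int prev_faces = -1, prev_id = -1;
    for (const Instance& f : fighters) {
        bool reuse = prev_id != -1 &&
                     std::abs(f.faces - prev_faces) * 20 < prev_faces;
        if (reuse) {
            drawn_id.push_back(prev_id);   // redraw cached geometry
        } else {
            drawn_id.push_back(f.id);      // submit this mesh
            prev_faces = f.faces;
            prev_id = f.id;
        }
    }
    return drawn_id;
}
```

Sorting first is what makes the reuse check cheap: similar face counts end up adjacent, so only consecutive instances need comparing.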

Figure 5. 2045x2045 terrain dataset. From left to right: height data; texture data; triangle mesh with 6000 triangles.

3.3. Test results

These tests were run on the reference platform. Figure 6 shows the performance of the engine when flying over a scene with only the terrain, for various triangle counts. The triangle counts are kept constant by adapting the mesh as necessary. The terrain used has 2045x2045 points and is divided into 16 independent 512x512 zones (Figure 5). Of the original 8 million triangles, we kept one eighth, i.e. 1 million; removing the rest introduces a negligible error, so they were discarded. The navigation was performed with geomorphing enabled (with a 2 second lifetime). The frame rates shown are approximate minimum values.

As said earlier, the goal is to maintain an acceptable frame rate even when many objects are visible. Therefore the triangle count cannot increase linearly with the number of objects. Nor should it be kept constant, since that would unnecessarily slow down the simulation when few objects are present and give little quality when many objects are present. So we use a heuristic to increase the polygon count non-linearly: when the number of objects doubles (in this case the objects are fighters in the scene) we increase the polygon count by about 1/4. This is enough to keep an acceptable graphical quality, since the detail of each fighter is also adjusted according to its distance from the viewer. The overall graphical quality can be selected by changing the number of triangles used to draw a scene with a single fighter (the base triangles). This heuristic can be approximated by:

triangles ≈ ∛n × base,  n = number of fighters

Figure 7 shows an example of the progression of the polygon count as the number of visible fighters increases, for different base numbers (notice that the scale on the number of fighters is logarithmic). Figure 8 shows the performance for a scene with a 6000 triangle terrain and various numbers of visible fighters, with the triangle base set at 1500 triangles (see Figure 1).
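The heuristic above can be written directly (the rounding choice is ours); note that doubling the number of fighters multiplies the budget by 2^(1/3) ≈ 1.26, i.e. roughly the 1/4 increase described in the text:

```cpp
#include <cassert>
#include <cmath>

// Scene-wide triangle budget: cbrt(n) * base, rounded to the
// nearest integer.
int triangle_budget(int num_fighters, int base_triangles) {
    double scale = std::cbrt(static_cast<double>(num_fighters));
    return static_cast<int>(std::lround(scale * base_triangles));
}
```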

Figure 6. Frame rate (fps) vs. number of terrain triangles (4000 to 14000).

Figure 7. Number of triangles vs. number of visible fighters (2 to 256, logarithmic scale) for base triangle counts of 1000, 1500 and 2000.

Figure 8. Frame rate (fps) vs. number of visible fighters (54 to 464).

4. Communications

Communication problems can be summed up in just two: bandwidth and latency. We want to support consumer-available long-distance communications. 56 Kbps modems are in widespread use today, while ADSL and cable modems still aren't widespread enough (especially in Europe). Therefore we must cater for the 56 Kbps users when establishing the available bandwidth and latency. Studies of multimedia applications indicate that humans become uncomfortable with inconsistencies on the order of 100 ms and that these become intolerable on the order of 200 ms [22]. For transatlantic connections over the Internet we can expect latencies between 200 and 400 ms. Even if the latency due to software and hardware can be reduced, we are ultimately limited by the speed of light when communicating over very large distances.

So these are our goals: we must hide the network latency so humans won't feel disturbed by it, and we must cater for 56 Kbps modems while simulating hundreds of objects. Traditionally, games like Dogfight and DOOM [16] used the naive approach of sending an update event per object per frame. While this approach works over local area networks (LANs), with their high bandwidth and low latency, over the wide Internet the latency alone will destroy the user experience. Pioneering efforts like SIMNET [16] and NPSNET [17, 18] alleviated these problems by introducing techniques like dead-reckoning [16] and multicasting [15].

Multicasting

Multicasting is used for communicating between groups of machines. Each machine is registered on a multicast group sharing an IP address. Whenever a packet is sent to the group address, the network routers take care of forwarding the packet to every machine in the group. By using multicasting we can therefore reduce the amount of bandwidth needed to communicate an event to several machines. The HLA libRTI supports multicasting.

Dead-Reckoning

Dead-reckoning techniques trade off some consistency in the simulation in order to fake lower latency and reduce bandwidth requirements. They do this by way of prediction. For example, one can predict the trajectory of a projectile given its initial position, velocity and acceleration and certain known constants like gravity. The more predictable an object or event is, the greater the gains achieved by using dead-reckoning. In order to ensure that data doesn't go wildly out of sync in the case of packet loss, dead-reckoning updates must use absolute values instead of differences between values.

For example when sending position updates we send the absolute position in world coordinates, instead of the difference between the current and last position.
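As a sketch, a receiving peer can extrapolate from the last absolute update with second-order prediction; the field names below are our illustration, not the GWP wire format:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// One absolute state update, as described above: never a delta.
struct StateUpdate {
    Vec3  pos;    // absolute world position at the update timestamp
    Vec3  vel;    // velocity at the update timestamp
    Vec3  acc;    // acceleration at the update timestamp
    float time;   // sender timestamp (synchronized clocks assumed)
};

// Predict where the entity is at time 'now' given its last update:
// p(t) = p0 + v*t + a*t^2/2.
Vec3 dead_reckon(const StateUpdate& u, float now) {
    float t = now - u.time;
    return {u.pos.x + u.vel.x * t + 0.5f * u.acc.x * t * t,
            u.pos.y + u.vel.y * t + 0.5f * u.acc.y * t * t,
            u.pos.z + u.vel.z * t + 0.5f * u.acc.z * t * t};
}
```

Because every update carries the absolute state, a lost packet only delays the correction; the next update that does arrive fully resynchronizes the prediction.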

4.1. Implementation

We decided to apply timestamps to position updates as in the PHBDR protocol [20, 21, 22]. This reduces the error between the actual and remotely displayed positions, but requires that the computers have synchronized clocks. Thankfully this can be achieved by using the Network Time Protocol (NTP). Windows 2000 includes a Simple Network Time Protocol (SNTP) implementation as part of the bundled services, which proved adequate to our needs. We send an update whenever the distance between the real position and the dead-reckoned position exceeds a small threshold. Orientation information is piggybacked on position updates. We implemented linear convergence to smooth the remotely displayed flight path. Using the HLA libRTI allowed us to implement the network module without bothering too much with low-level details.
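The update-threshold rule and linear convergence described above might be sketched as follows (1-D for brevity; the step scheme and any particular threshold value are assumptions, not GWP's actual parameters):

```cpp
#include <cassert>
#include <cmath>

// Sender side: emit an update only when the dead-reckoned position
// has drifted more than `threshold` metres from the real one.
bool should_send_update(float real_pos, float predicted_pos,
                        float threshold) {
    return std::fabs(real_pos - predicted_pos) > threshold;
}

// Receiver side: instead of snapping to the newly predicted position,
// move the displayed position toward it over the remaining frames,
// closing an equal fraction of the gap each step.
float converge_step(float displayed, float target, int steps_left) {
    if (steps_left <= 0) return target;
    return displayed + (target - displayed) / static_cast<float>(steps_left);
}
```

This is the trade-off measured in the test results: the non-zero threshold and the convergence period both add a little positional error in exchange for far fewer updates and a smoother displayed path.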

Figure 9. Flight path of a fighter.

Figure 10. Number of updates sent using the dead-reckoning vs. the naive approach.

4.2. Test results

We tested the dead-reckoning algorithm on the flight path in Figure 9. The fighter was still for approximately 1 second before traveling the path. The number of updates sent using dead-reckoning is an order of magnitude smaller than with the naive approach, as can be seen in Figure 10. This comes at a cost. The error in the remotely displayed position is slightly greater because we used a non-zero threshold between the real and remotely displayed positions, as can be seen in Figure 11.1. Linear convergence also increases error because the object takes longer to return to the real flight path, as seen in Figure 11.2. The introduction of latency increases error, as expected. However, once the update is received the remotely displayed fighter returns to the real flight path, as can be seen in Figure 11.3. The peak error in this path is slightly under 7 meters. When we have high latency the error is proportional to the speed of the fighter and the network latency.

Figure 11.1. Error with 0 ms latency and no convergence.

Figure 11.2. Error with 0 ms latency and linear convergence.

Figure 11.3. Error with random 200-400 ms latency and no convergence.

Figure 11.4. Error with random 200-400 ms latency and linear convergence.

5. Conclusions

In this paper we presented solutions and techniques to enable large scale simulations over the Internet using consumer level hardware. From the experimental results we can conclude that this is indeed possible, although still challenging due to the high latency and unreliability of the Internet [23]. The performance data of our graphics engine shows that today's hardware is capable of displaying large virtual environments with hundreds of visible objects with good visual quality. Multicasting and dead-reckoning combined greatly reduce bandwidth requirements while reducing the latency perceived by the user. However, dead-reckoning can't hide extremely high latencies, because the actions of remote users are unpredictable even if constrained.

For the future, further testing with a large number of geographically distributed machines is required. Also, the introduction of more types of objects (tanks, ground targets, etc.) would make the simulation more interesting and is easily supported by the architecture.

6. References

[1] OpenGL Architecture Review Board, OpenGL 1.2 Programming Guide, 3rd edition. Addison Wesley, 1999.
[2] Defense Modeling and Simulation Office, High Level Architecture Interface Specification Version 1.3. U.S. Department of Defense, April 1998.
[3] Defense Modeling and Simulation Office, HLA RTI-Next Generation Programmer's Guide Version 3.2. U.S. Department of Defense, September 2000.
[4] Peter Lindstrom et al., Real-time, continuous level of detail rendering of height fields. Proceedings of SIGGRAPH 1996, pp. 109-118.
[5] Mark Duchaineau et al., ROAMing Terrain: Real-time Optimally Adapting Meshes. IEEE Visualization 1997, pp. 81-88.
[6] Hugues Hoppe, Progressive Meshes. Proceedings of SIGGRAPH 1996, pp. 99-108.
[7] Hugues Hoppe, View-dependent refinement of progressive meshes. Proceedings of SIGGRAPH 1997, pp. 189-198.
[8] Hugues Hoppe, View-dependent level-of-detail control and its application to terrain rendering. IEEE Visualization 1998, pp. 35-42.
[9] Michael Garland and Paul S. Heckbert, Fast Triangular Approximation of Terrains and Height Fields. Carnegie Mellon University, May 2, 1997.
[10] Paul S. Heckbert and Michael Garland, Survey of Polygonal Surface Simplification Algorithms. Carnegie Mellon University, May 1, 1997.
[11] Mark J. Harris and Anselmo Lastra, Real-Time Cloud Rendering. To appear in Eurographics, 2001.
[12] Ulf Assarsson and Tomas Möller, Optimized View Frustum Culling Algorithms. Technical Report 99-3, Department of Computer Engineering, Chalmers University of Technology, March 1999.
[13] Ulf Assarsson and Tomas Möller, Optimized View Frustum Culling Algorithms for Bounding Boxes. Journal of Graphics Tools 5(1), pp. 9-22, 2000.
[14] Philip M. Hubbard, Approximating polyhedra with spheres for time-critical collision detection. ACM Transactions on Graphics 15(3), pp. 179-210, July 1996.
[15] Stephen E. Deering, Host extensions for IP multicasting. RFC 1112, August 1989.
[16] Sandeep K. Singhal and Michael J. Zyda, Networked Virtual Environments: Design and Implementation. ACM Press and Addison Wesley, 1999.
[17] Michael R. Macedonia, A Network Software Architecture for Large Scale Virtual Environments. Doctoral Thesis, Naval Postgraduate School, June 1995.
[18] Michael R. Macedonia, Donald P. Brutzman, Michael J. Zyda, David R. Pratt, Paul T. Barham, John Falby, John Locke, NPSNET: A Multi-Player 3D Virtual Environment over the Internet. Proceedings of the ACM 1995 Symposium on Interactive 3D Graphics, April 1995.
[19] Arthur R. Pope, The SIMNET network and protocols. Technical Report 7102, BBN Systems and Technologies, Cambridge, MA.
[20] Sandeep K. Singhal and Daniel R. Cheriton, Using a Position History-Based Protocol for Distributed Object Visualization. Technical Report STAN-CS-TR-94-1505, Department of Computer Science, Stanford University, pp. 3-8.
[21] Sandeep K. Singhal and Daniel R. Cheriton, Exploiting Position History for Efficient Remote Rendering in Networked Virtual Reality. Presence 4(2), Spring 1995, pp. 169-193.
[22] Sandeep K. Singhal, Effective Remote Modeling in Large-Scale Distributed Simulation and Visualization Environments. Doctoral Thesis, Stanford University, 1997.
[23] Stuart Cheshire, Latency and the Quest for Interactivity. http://www.stuartcheshire.org, November 1996.