parallel simulation of traffic in geneva using cellular automata

PARALLEL SIMULATION OF TRAFFIC IN GENEVA USING CELLULAR AUTOMATA ALEXANDRE DUPUIS AND BASTIEN CHOPARD



Abstract. Road traffic microsimulations based on the individual motion of all vehicles are now recognized as an important tool to describe, understand and manage road traffic. Cellular automata models are a very efficient way to implement car motion. This paper presents a detailed description of a parallel cellular automata traffic microsimulator. We discuss the data structure and the domain decomposition, and provide a detailed performance analysis for a problem of size n on p processors. We consider a realistic simulation for the city of Geneva and report some results concerning the global traffic behavior. Provided that enough processors are used, large-scale simulations can be performed in less time than the real process.

Key words. Road traffic microsimulations, cellular automata, parallel computing, performance analysis.

AMS subject classifications.

1. Introduction. A good understanding of road traffic is an important challenge for modern societies. It has a direct impact on our quality of life, since most people experience daily the inconvenience of traffic jams or the pernicious effect of pollution. Traffic implies considerable costs for the community, and a great deal of effort is devoted in every large city to reducing the trouble caused by an excess of cars. While traffic management is obviously a political problem, a complete and satisfactory scientific understanding of the phenomena is still lacking. Yet a reliable scientific description is crucial to study several scenarios and make good decisions. Depending on the question to be answered, different methodologies can be considered. For global planning, a description of the traffic flow in terms of a graph, together with standard optimization techniques, may give an adequate prediction [1]. On the other hand, in order to capture small-scale phenomena, a finer description may be desirable. The first attempt to model traffic flow as a physical process, in terms of equations, dates back to the 1960s [2]. Since then, many techniques have been developed, using either a continuous (fluid-like) approach or a description based on the dynamics of individual cars [3, 4]. In addition to the purely dynamical problem of describing a set of cars, it is crucial to know the number of cars traveling from each possible origin to each possible destination.

Microsimulations, based on the individual motion of all vehicles, are obviously closer to reality and offer a lot of flexibility. However, they require a large computing power to deal with a realistic number of cars. Massively parallel computers offer a way to address this problem [5, 6]. However, in spite of the increase of CPU power, there is a clear interest in devising a fast and efficient algorithm to compute the motion of each car. A method based on a cellular automata (CA) approach [7, 8] has recently gained a lot of interest. In this approach, cars are represented as points moving on a discretized road with only a small set of possible velocities and accelerations. Several parallel CA simulators exist and are used to predict traffic in real cities (e.g. TRANSIMS [9, 10]).

University of Geneva, CUI, 24, rue Général-Dufour, CH-1211 Geneva 4, Switzerland (fAlexandre.Dupuis/[email protected])


Some studies of the parallel implementation of traffic simulators are available [11, 12, 13], and references [4, 14] contain many of the recent developments in this field. This paper presents the CA microsimulator developed at the University of Geneva, as well as its parallel implementation and a detailed analysis of the run time performance. Our system is adapted to describe road traffic in a city. The dynamics of cars follows the standard CA models. Interacting one-dimensional CAs are used to represent the roads and the junctions in a two-dimensional network of arbitrary size and topology. At each road junction, one-dimensional CAs are interconnected through a rotary structure. Rotaries make it easy to solve priority problems in a synchronous updating scheme and provide a natural domain decomposition in which an efficient static load balancing can be obtained. Our model describes generic road crossings with the possibility to adjust the capacity and add traffic lights. However, there is no attempt to reproduce the exact structure of a junction. Therefore our model is better suited to studying emergent properties such as the jamming transition or the fluctuation of the travel time between two points in the city. We have considered traffic simulations for the case of the city of Geneva and its suburbs (4000 km of roads and more than 1000 crossings). Realistic behaviors are observed in a reasonable amount of time.

2. Cellular automata traffic model.

2.1. Basic models. A cellular automaton (CA) can be viewed as an idealized physical world in which space, time and all physical quantities are discrete [15]. CAs provide a numerically simple and efficient way to model and simulate a complex physical process by considering a description at the level of the basic components of the system. The evolution rules are set up so that only the essential features of the real interactions are taken into account. It is then observed that the collective behavior that emerges from the CA dynamics is identical, in the appropriate limit, to the real phenomenon [15].

According to this methodology, single-lane car traffic can be modeled as follows. The road is represented as a line of cells, each of them being occupied or not by a vehicle. All cars travel in the same direction (say to the right). Their positions are updated synchronously, in successive iterations (discrete time steps). During the motion, each car can be at rest or jump to the nearest neighbor site, along the direction of motion. The rule is simply that a car moves only if its destination cell is empty. The main point of this rule is that the drivers do not know whether the car in front will move or is blocked by another car. Therefore, the state of each cell $s_i \in \{0, 1\}$ is entirely determined by the occupancy of the cell itself and its two nearest neighbors $s_{i-1}$ and $s_{i+1}$. This dynamics can be summarized by the relation

(2.1)    $s_i(t+1) = s_{i-1}(t)\,(1 - s_i(t)) + s_i(t)\,s_{i+1}(t)$

where t denotes the iteration step.

A richer version of the above CA traffic model has been developed by Nagel and Schreckenberg [7, 4, 14]. The cars may have several possible velocities $u = 0, 1, 2, \ldots, u_{max}$. Let $u_i$ be the velocity of car i and $d_i$ the distance, along the road, separating cars i and i + 1. The updating rule is:

• The cars accelerate when possible: $u_i \to u_i' = u_i + 1$, if $u_i < u_{max}$.

• The cars slow down when required: $u_i' \to u_i'' = d_i - 1$, if $u_i' \geq d_i$.

• The cars have a random behavior: $u_i'' \to u_i''' = u_i'' - 1$, with probability $p_i$, if $u_i'' > 0$.

Fig. 2.1. In the left-hand situation, the vehicle Va has the choice of either of the two cells. Since the cell on its current lane is free, it chooses to move straight. In the middle scenario, vehicle Va sees vehicle Vb in the next cell. Thus, it decides to move to the other lane. This change is possible because it does not cause a conflict with another vehicle. This is no longer the case in the right-hand situation, in which vehicle Va has to stop. [Figure: three two-lane configurations of vehicles Va, Vb and Vc, shown at time t and at time t+1.]

• Finally, the cars move $u_i'''$ sites ahead.
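The four steps of the updating rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `nasch_step`, the circular road, and the use of a single braking probability `p_brake` for all cars are assumptions made here for the example.

```python
import random

def nasch_step(positions, velocities, road_length, u_max, p_brake, rng=random):
    """One synchronous Nagel-Schreckenberg update on a circular single-lane road.

    positions  -- cell index of each car, listed in driving order
                  (car i+1 is the car ahead of car i, cyclically)
    velocities -- current velocity of each car, in cells per step
    """
    n = len(positions)
    new_velocities = []
    for i in range(n):
        # d_i: distance along the road between car i and the car ahead of it.
        d = (positions[(i + 1) % n] - positions[i]) % road_length
        if n == 1:
            d = road_length          # a lone car has the whole ring in front of it
        u = velocities[i]
        if u < u_max:                # accelerate when possible
            u += 1
        if u >= d:                   # slow down when required
            u = d - 1
        if u > 0 and rng.random() < p_brake:   # random behavior
            u -= 1
        new_velocities.append(u)
    # All cars move simultaneously, using the velocities computed above.
    new_positions = [(x + u) % road_length
                     for x, u in zip(positions, new_velocities)]
    return new_positions, new_velocities
```

With `p_brake = 0` the update is deterministic, which makes the rule easy to check by hand on a short road; setting `u_max = 1` and `p_brake = 0` gives back the simple rule of equation (2.1).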

This rule captures some important behaviors of real traffic on a highway: velocity fluctuations due to a non-deterministic behavior of the drivers, and "stop-and-go" waves observed in the high-density traffic regime (i.e. some cars get stopped for no specific reason).

2.2. Urban traffic model. In this study we are interested in simulating the traffic in the city of Geneva, for which a description of the road network is available, as well as an origin-destination matrix. The models presented in the previous section can be adapted and augmented according to the situation to be simulated. Highway traffic and city traffic may not need the same "microscopic" ingredients. In particular, random velocity variations seem less important in the case of city traffic than they are for highway traffic. We shall consider the following simpler model. Each road segment is discretized into cells of constant length (7.5 meters long) to form a one-dimensional CA. In any given cell, there is at most one vehicle. Each vehicle will move if the next cell is free. We consider two possible velocities in order to distinguish between cars driving in the city (v ≤ 50 km/h) or in the suburbs (v > 50 km/h). The slow velocity corresponds to moving to the next adjacent cell, while vehicles with the faster speed travel two cells during one iteration. More than one lane can be present for a vehicle to travel on a given road in a given direction. A vehicle can change lane if this does not affect other vehicles, as shown in figure 2.1, which illustrates this rule in the case of a road with two lanes.

2.3. Road junctions. A problem to be solved when modeling traffic in a road network is to define the behavior of cars at crossings. Here we choose to model any junction as a rotary to which entering and exiting lanes are connected. The junctions are the vertices of the graph representing the city and the connecting road segments are the edges. Figure 2.2 gives an example of two junctions. A rotary is also a 1D CA, with periodic boundary conditions.
We have observed that the maximum flow of a rotary is proportional to its length. Thus, in order to adjust the capacity of each junction, the number M of cells in the rotary is chosen


Fig. 2.2. Schematic description of the road junctions of our model. Each crossing is implemented as a rotary. The white arrows show the traffic direction; the rotaries are traveled counterclockwise. Dots, squares and crosses indicate the junction, exit and entrance cells, respectively; black arrows show the input or output cells, according to their direction (see section 3). Here, we have α = 4. The shaded areas indicate the atomic entities (cells of the same color are stored on the same processor).

proportional to the number N of connecting lanes. In our model, we typically take M = αN, where α = 5. Note that nothing prevents a rotary from being made up of several parallel lanes. The advantage of representing a junction as a rotary is twofold: (i) the rule of motion for road segments and crossings can be implemented in the same way (see section 3); (ii) vehicles in rotaries always have priority over the other cars, which gives a natural and simple way to deal with concurrency problems in a situation where all cars move synchronously. Figure 2.3 illustrates these basic priority rules in the smallest possible rotary (M = 4, N = 8). As discussed below, one may add traffic lights, stop signs or priority signs on top of this rotary structure to constrain car interaction.

3. Model implementation. This section gives more detail on the implementation, data structures and domain decomposition we have used in our urban traffic simulator.

3.1. Data structure. Most two-dimensional applications using cellular automata implement the space as a regular grid (i.e. a matrix). However, if the application is not regular, this data structure is clearly unsuitable. With a complex city (e.g. the city of Geneva) a more sophisticated data structure must be implemented. Here we consider a set of one-dimensional CAs interconnected through rings of cells, or rotaries (see figure 2.2). The first data structure candidate is certainly a list, due to its flexibility and its intrinsic ability to model interconnected cells. However, as we are interested in an efficient implementation, we consider here a version in which all the cells are gathered in a large indexed vector, and such that each cell owns an index to its successors

Fig. 2.3. The smallest possible rotary and the "natural" priority rule. On the left: at time t0, vehicle Va checks cell number 4 and the adjacent position, cell number 3. Similarly, vehicle Vd checks locations 2 and 3. On the right: at time t1, vehicle Va can move but not vehicle Vd, because Ve is occupying place 2. [Figure: a four-cell rotary (cells 1 to 4) with vehicles Va to Ve, shown at times t0 and t1.]

and predecessors. In this way, the neighborhood of each cell is readily reachable and the whole data structure can be explored in random access. Of course, preliminary work is necessary to compute the indexes. Road cells that are adjacent along a given lane are labeled by consecutive indexes. At a crossing, or a lane change, a jump in the index space is generally required. Note that rotary cells are included in the same data structure, according to the same rule. The advantage of the above representation is that the non-regular geometry of the city is hidden (any complicated city is coded as a simple vector). Thus, a simple sweep is enough to visit all the cells and there is no unused allocated memory. In the next two sections we describe in more detail the structure of the cells of the automaton. They can be of different types, according to their functionality: in addition to the regular cells, we shall introduce junction cells, exit and entrance cells, and, finally, input and output cells (see figure 2.2).

3.2. Description of the cells. Each cell of the road network, whether belonging to a rotary or a regular lane, has a list of successors and a list of predecessors. The successor cells are the possible destinations of the vehicle at the next time step. A cell may have several successors because a vehicle may jump to another lane when two or more lanes are present or, in a junction, because the vehicle may exit at some specific locations or keep turning in the rotary. For the same reason, a cell may have more than one predecessor, because it can possibly receive cars from several cells. In our implementation, we introduce priority successors and priority predecessors. A priority successor is a destination cell to which the car can move provided only that this destination cell is not occupied, i.e. the car has priority over all other cars. Similarly, the priority predecessor of a cell is the cell from which a car is preferentially accepted.
On the other hand, non-priority successors can only be reached provided that they are free and that no other vehicle is also planning to move there. Therefore, the motion to a non-priority successor cell $c_s$ is conditioned on the checking of the priority predecessor of $c_s$. If several cars want to move to $c_s$ from the non-priority predecessors, no motion at all is allowed.
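The priority rules of this section can be sketched as follows. The function name `resolve_moves` and its data layout are hypothetical, chosen only for illustration. The sketch also assumes that a non-priority car yields whenever the priority predecessor of its target is occupied, whatever that car's own destination; the text does not spell out this detail.

```python
def resolve_moves(occupied, wishes, priority_pred):
    """Decide which cars may move in one synchronous step.

    occupied      -- list of bool, occupied[c] is True if cell c holds a car
    wishes        -- dict {source cell: desired successor cell}, one entry per car
    priority_pred -- dict {cell: its priority predecessor}; cells missing
                     from the dict have no priority predecessor
    Returns the set of source cells whose car is allowed to move.
    """
    # Count the cars approaching each target from non-priority predecessors.
    non_priority_requests = {}
    for src, dst in wishes.items():
        if priority_pred.get(dst) != src:
            non_priority_requests[dst] = non_priority_requests.get(dst, 0) + 1

    allowed = set()
    for src, dst in wishes.items():
        if occupied[dst]:
            continue                 # the destination cell must be empty
        prio = priority_pred.get(dst)
        if prio == src:
            allowed.add(src)         # priority move: a free cell is enough
        elif (prio is None or not occupied[prio]) \
                and non_priority_requests[dst] == 1:
            allowed.add(src)         # non-priority move: must yield and be alone
    return allowed
```

For instance, with a rotary cell 1 that is the priority predecessor of cell 2 and a junction cell 4 that also leads to cell 2, `resolve_moves([False, True, False, False, True, False], {1: 2, 4: 2}, {2: 1})` lets only the car in cell 1 move.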


A rotary makes it natural to define priority successors and priority predecessors. In this way, the last cell of an entering lane connected to a rotary is a special cell, called a junction cell (see fig. 2.2), which has no priority successor. Junction cells may also be equipped with stop signals or traffic lights. In the case of a stop signal, we impose on the car a waiting time $t = t_0 + t_r$, where $t_0$ is fixed (1 second) and $t_r$ is a random time in the range [0, 2] seconds. After the waiting time t, the above priority rule applies. In the case of traffic lights, we use two states (green and red) indicating the permission to go in (green) or the need to stop before the junction and wait (red). We also define exit cells as the cells of a rotary where the cars can leave the crossing, and entrance cells as the cells of a road segment where the cars leaving a rotary will arrive. Finally, we introduce two other types of special cells: the input and output cells. These cells are intended to inject vehicles into the network and remove them from it, respectively, and thus to implement boundary conditions. Input cells are cells that have a non-priority predecessor which, at time t, contains a car with probability p(t). The explicit time dependence is useful to implement a non-homogeneous traffic load. Likewise, output cells are cells having a successor which is always free.

3.3. Parallelization. In our parallel, distributed-memory implementation, the road network is partitioned across many processors. Therefore, the successor of some cells may be stored in a remote processor. In order to distribute the network over the various processors, we define an atomic entity, or building block, as a junction with its entrance roads. Such atomic entities are shown in gray in figure 2.2. The whole road network is then the assembly of these building blocks. All the cells of a building block are always stored in the same processor and never split across the system.
A car will jump to another processor only when taking those particular rotary exit cells having a remote successor. In other words, inter-processor communications are needed to transfer a vehicle from an exit cell to an entrance cell that is not located in the same processor. However, from a logical point of view, this motion is similar to that occurring between regular cells. The processing of remote successors and remote predecessors is just a bit more complicated: at the end of a motion step, each processor communicates with all the others by sending the corresponding vehicles and their attributes to the processors owning the destination cells. In this all-to-all personalized communication, the vehicles are packed according to the target processor, so as to avoid the latency cost of sending too many messages. Once this communication has completed, the destination processors reply to the source processors to indicate which cars can be accepted (i.e. whether the remote successor is empty). According to the way our building blocks are defined and distributed across the processors, the network topology is such that no entrance cell has more than one predecessor. Thus, one communication round is enough to implement the inter-processor motion.
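The packing and the accept/reject reply described above can be illustrated without any message-passing library. In this sketch the two functions stand for the sending and the receiving side; the names `pack_outgoing` and `accept_incoming`, and the representation of a vehicle as a `(vehicle_id, destination_cell)` pair, are assumptions made for the example.

```python
from collections import defaultdict

def pack_outgoing(vehicles, owner, my_rank):
    """Group vehicles leaving this processor into one message per target rank.

    vehicles -- list of (vehicle_id, destination_cell) pairs
    owner    -- dict {cell: rank of the processor storing that cell}
    """
    messages = defaultdict(list)
    for vehicle_id, dest_cell in vehicles:
        rank = owner[dest_cell]
        if rank != my_rank:          # local moves need no communication
            messages[rank].append((vehicle_id, dest_cell))
    return dict(messages)

def accept_incoming(message, occupied):
    """Destination side: accept each vehicle whose entrance cell is empty.

    occupied -- dict {cell: bool}, updated in place for accepted vehicles
    Returns the list of accepted vehicle ids (the reply to the source).
    """
    accepted = []
    for vehicle_id, dest_cell in message:
        if not occupied[dest_cell]:
            occupied[dest_cell] = True
            accepted.append(vehicle_id)
    return accepted
```

Building one list per target rank is what keeps the number of messages per step bounded by p − 1 per processor, independently of the number of vehicles crossing a boundary.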

4. Network and Traffic Load.

4.1. The case of the city of Geneva. The road network of the city of Geneva is shown in figure 4.1. For the sake of visualization, the network is split into eight parts. In the city center almost all streets are taken into account whereas, in the surroundings, only important commuting roads are represented. The full network comprises 3145 road segments and 1066 junctions for a total of about 4000 km. After

Fig. 4.1. The city of Geneva road network in eight areas. [Figure: map of the network split into areas 1 to 8.]

discretization, there are 560886 road cells, 3145 entrance and exit cells and 538 input or output cells.

4.2. Vehicle routing. In a typical traffic problem, every vehicle follows a given path which is prescribed by the so-called Origin-Destination (OD) matrix. This matrix contains, for each possible pair (A, B) of locations in the network (implemented as input and output cells), the average number of cars traveling from A to B during the rush hours. A realistic OD matrix is known for the case of Geneva. It contains 49423 entries and 85055 cars. However, the OD matrix has no information on the path followed by each vehicle. This path is defined by the sequence of crossroads the vehicle travels through. Therefore, a vehicle can be identified by its trip number and the crossroad it is currently on. We consider a static routing for the vehicles. This is a result of the following hypothesis: each driver has a favorite path which changes only if there is an accident or an unusual delay. In a usual traffic situation, this does not happen and we assume that each vehicle follows the same route every day. Finding every route for about 50000 OD pairs is an important problem, which is at the heart of traffic planning. Several sophisticated methods have been envisaged [1,

Fig. 4.2. Representation of the sorted occupation histogram after one iteration (solid line) and after ten iterations (dashed line). [Figure: flow [v/1.5h] on the vertical axis against road on the horizontal axis, for iteration 1 and iteration 10.]

14, 16, 17]. A basic principle that is used is that all paths going from A to B that are actually used take the same amount of time (Nash equilibrium). Traditionally, this "equilibrium" is obtained through an iterative method in which traffic is assigned, link costs are evaluated and traffic is re-assigned accordingly until convergence. Here our goal is not to solve this problem exactly for the case of Geneva but, rather, to have a credible set of paths in order to validate the behavior of our simulator in a real-size problem. Thus, instead of using the simulator itself to determine the best paths, we consider a simple implementation of the Nash equilibrium principle. We first use the Dijkstra algorithm [18] to find the shortest way to go from an origin node to a destination node through the graph representing the road network. The initial weight of each edge is taken as the time needed to travel the corresponding road segment, assuming free motion, i.e. the length divided by the speed limit. The result of this simple approach is to concentrate all the traffic on a few roads, leaving many of them free. For this reason, we consider an iterative algorithm which will scatter the traffic flow over the roads. Each iteration tries to find the shortest way as before but also computes the occupation histogram of each road. With this histogram, we compute a distance penalty for each road (see below). Virtually, after every iteration, the road lengths expand or shrink according to the distance penalty. Finally, after a few iterations (10 in our case), the procedure converges. Note that several paths can be obtained between the same origin-destination pair. To illustrate the necessity of our improvement over the plain Dijkstra method, figure 4.2 presents the sorted occupation histogram after one and ten iterations. According to the Nash equilibrium principle, the distance penalty should be set to the travel time which, in turn, depends on the local traffic.
Using a simple congestion model [19], the average time $\langle t \rangle$ to travel a road segment is given by $\langle t \rangle = \langle N \rangle / j$, where $\langle N \rangle$ is the average number of cars on the road and $j$ the traffic flow, given by $j = \rho v$. The quantity $\rho = \langle N \rangle / L$ is the average density along the road segment of length $L$ and $v$ the average car velocity, so that $\langle t \rangle = L / v$. In CA models, a relation can be found [20, 19] between $v$ and $\rho$, thus giving a way to assign a travel time to all road segments. Note that in these relations, the process of queue build-up is taken into account.
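The iterative route assignment of this section can be sketched as below. This is only an illustration under stated assumptions: the linear penalty `free_time * (1 + penalty * load)` is a hypothetical stand-in for the congestion-model travel time of [19, 20], the names `dijkstra_path` and `assign_routes` are invented here, and every destination is assumed to be reachable from its origin.

```python
import heapq

def dijkstra_path(graph, weights, origin, destination):
    """Shortest path as a list of edges (u, v); graph is {node: [neighbors]}."""
    best = {origin: 0.0}
    previous = {}
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == destination:
            break
        if dist > best.get(node, float("inf")):
            continue                 # stale queue entry
        for nxt in graph.get(node, []):
            candidate = dist + weights[(node, nxt)]
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                previous[nxt] = node
                heapq.heappush(queue, (candidate, nxt))
    path, node = [], destination
    while node != origin:            # walk back from the destination
        path.append((previous[node], node))
        node = previous[node]
    return path[::-1]

def assign_routes(graph, free_time, od_pairs, iterations=10, penalty=0.1):
    """Re-run Dijkstra with edge weights inflated on heavily used roads.

    free_time -- dict {edge: free-flow travel time (length / speed limit)}
    od_pairs  -- dict {(origin, destination): number of cars}
    """
    weights = dict(free_time)
    routes, load = {}, {}
    for _ in range(iterations):
        load = {edge: 0 for edge in free_time}       # occupation histogram
        for (a, b), cars in od_pairs.items():
            routes[(a, b)] = dijkstra_path(graph, weights, a, b)
            for edge in routes[(a, b)]:
                load[edge] += cars
        # Hypothetical penalty: travel time grows linearly with the load.
        weights = {e: free_time[e] * (1.0 + penalty * load[e])
                   for e in free_time}
    return routes, load
```

On a toy network with two alternative routes, the first iteration sends all cars along the shorter one and the second iteration moves them to the alternative once the penalty is applied. A scheme this naive keeps only the last route per OD pair and can oscillate between the two alternatives, which is one reason a real assignment would split an OD flow over several paths.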

Fig. 4.3. Representation of the probability distribution used during all of the measures for inserted vehicles. [Figure: insertion probability (levels p1 and p2) against time, with ticks at 0, I/3, 2I/3 and I.]

4.3. Routing to the next crossing. When the vehicles reach a crossing, they enter the rotary and select the appropriate exit, as specified by the routing procedure of the previous section. This is realized as follows. In our model, each car is characterized by a path number and each junction is labeled. A centralized table gives, for each path number and current junction label, the next junction to reach. The rotary cells contain the current junction label, and the particular cells that have a successor which exits from the rotary have an indication of the next junction that can be reached through this successor. With this additional data structure, the routing of a vehicle from one junction to the next can be obtained easily.

4.4. Vehicle insertion. The OD matrix gives the number of cars traveling from

A to B during the rush hour. However, it contains no information on the departure

time of each vehicle. Here, we assume that the vehicles leave their origin locations according to the peaked distribution shown in figure 4.3. This choice is arbitrary, but it is simple and reasonable. We also assume that the insertion period I is 45 minutes, so that almost all cars have reached their destination after 90 minutes, which is the estimated duration of the rush hour in Geneva. Finally, the factor $\beta = p_2 / p_1$, giving the relation between the two insertion probabilities $p_1$ and $p_2$ (see fig. 4.3), is set to 6, on the basis that the traffic pattern generated with this distribution seems realistic. Note again that we are not interested in making traffic predictions for the city of Geneva but, rather, in extracting generic properties of traffic in a large city and demonstrating that our approach is relevant provided that real data are available. At each origin point A on the network, $p_1$ and $p_2$ are determined with the additional condition that the number $N_{AB}$ of cars leaving A for a destination B (known from the OD matrix) is given by

$\int_0^I dt \, p_{AB}(t) = N_{AB}$

Thus, at time step t (time-of-day), a car leaves origin A for destination B with probability $p_{AB}(t)$. One finds

$p_1 = \frac{3 N_{AB}}{I(\beta + 2)}, \qquad p_2 = \frac{3 \beta N_{AB}}{I(\beta + 2)}$

Fig. 5.1. Expectation time and "risk" of two different trips. The horizontal axis corresponds to the departure time of a test vehicle within interval I. The dashed line shows the average driving time and the shaded region indicates the amplitude of the variation of this time (computed as the standard deviation). Note that the times shown here are quite realistic, thus giving an indirect validation of our simulations for the case of Geneva. [Figure: two panels, Trip 2 and Trip 3, each plotting travel time [minutes] against departure time [minutes], with the average travel time marked.]
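The insertion probabilities of section 4.4 can be computed as in the sketch below. It assumes, from the tick marks of figure 4.3 and from the normalization condition, that the profile is piecewise constant: $p_1$ during the first and last thirds of the period and $p_2$ during the middle third. The function names and the choice of one time step as the unit of time are assumptions made for the example.

```python
def insertion_probabilities(n_ab, period, ratio):
    """Return (p1, p2) with p2 = ratio * p1 and total expected insertions n_ab.

    n_ab   -- number of cars going from A to B (from the OD matrix)
    period -- insertion period I, in time steps
    ratio  -- p2 / p1 (set to 6 in the text)
    """
    p1 = 3.0 * n_ab / (period * (ratio + 2.0))
    return p1, ratio * p1

def insertion_probability(t, n_ab, period, ratio):
    """p_AB(t): p2 during the middle third of the period, p1 otherwise."""
    if not 0 <= t < period:
        return 0.0                   # no insertion outside the period
    p1, p2 = insertion_probabilities(n_ab, period, ratio)
    return p2 if period / 3.0 <= t < 2.0 * period / 3.0 else p1
```

Summing `insertion_probability(t, ...)` over all time steps of the period gives back `n_ab`, which is the discrete counterpart of the integral condition above.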

5. Simulation results. Due to the lack of data concerning the real evolution of the traffic state in the city of Geneva, we focused on the problem of measuring the time necessary for a test car to travel from a given origin A to a given destination B. This time is of direct interest to the drivers because it determines, for instance, when they must leave their house in order to be on time at work. It is also a quantity which is easily compared with reality by actually driving from A to B. The interesting fact is that the travel time is a fluctuating quantity. If one repeats the same trip under the same conditions (for instance the next day, at the same time), the drive is likely to be longer or shorter. This fact is well known from everyday experience and is also well reproduced in the CA model, because the probability distribution of the departure times gives the necessary randomness to produce fluctuations when the simulation is repeated. Our main result is that the amplitude of the variations of the travel times depends very much on the departure time of the test car and on its trip. In the simulations, we studied several trips corresponding to different traffic situations (crossing the city center, around the city, etc.). The measured times obtained from the simulation for two of these trips (labeled trips 2 and 3) are shown in figure 5.1. These times are in good agreement with real traffic measurements made by actually driving along the prescribed routes. For trip 3, the average time needed to reach the desired destination is not constant: it is maximal if the driver leaves 15 to 20 minutes after the start of the rush hour. It is minimal if the driver leaves at the very beginning or the very end of interval I. On the other hand, the average time for trip 2 is quite stable. These two situations differ by the fact that trip 3 uses heavily loaded sections with many crossings while trip 2 uses higher-capacity sections.
We also observe that, for trip 3, it is impossible to make accurate predictions on the time needed to reach the destination point. Variations up to 30% show up. We

Fig. 5.2. Dynamical flow diagram for $p_2 / p_1 = 6$. As time goes on ($t \in [0, I]$), the car density first increases and the upper branch of the diagram is formed; then, when the density decreases, the lower branch is measured. [Figure: average speed [km/h] against car density [%].]

call this variation the risk¹ associated with the trip (for a given departure time) to describe the fact that an expected outcome is likely not to occur. In practice, for trip 3, in which the variation is high, there is a large risk of arriving late at the destination, or of being too early, which may not be acceptable either. This also means that it is not possible to establish an accurate schedule for taxis or public transportation, unless dedicated lanes are available. Finally, figure 5.2 shows the dependence of $\langle v \rangle$, the average car velocity in the network, on the average car density $\rho$. Since the traffic load is not stationary but concentrated within about one and a half hours, the steady-state density-velocity diagram is no longer valid and must be replaced by a "dynamic" diagram which shows a significant hysteresis.

6. Parallel performance. In this section we study the run time performance of our traffic simulator. We propose a performance model which gives the execution time in terms of the number of processors p and the number of cars n.

6.1. Algorithm. Each iteration of the simulation is composed of three phases: (a) an injection stage in which random numbers decide, as explained in section 4.4, whether or not a vehicle is introduced at the selected input cells; (b) a motion stage where the vehicles are moved according to the rules of motion; (c) an output stage, where the output cells are scanned to remove the vehicles which, according to their travel plan, have reached their destination. All these phases allow for parallel processing. Synchronization of the processors is required for the motion phase. In a cellular automata system, one usually updates the state of each cell. However, in our case, the number of cars n is much less than the number of cells composing the network (typically by a factor of 10). Therefore, a systematic visit of all the cells should be avoided to increase performance. Thus our updating scheme focuses on the cars rather than on the cells. With this improvement, an increase of performance of 200% has been measured. We have implemented a location buffer which indicates, at every iteration, where the vehicles are located. Since vehicles may enter or leave each processor during the simulation, the location buffer does not have the same size at each iteration. The location buffer is implemented as a reallocatable buffer, with a garbage collection process consisting in

¹ In finance, the term risk is also used to describe the standard deviation of a random quantity.


recording the location of the first free place in the buffer. When the first free place is out of the buffer, a reallocation occurs.

6.2. Static load balancing. Here we choose to partition the network statically among the processors so that each of them owns approximately the same number of road cells. The problem is to distribute evenly each junction with its entry roads (our atomic building blocks). In a preliminary phase, the building blocks are sorted according to their size and sent to the processors in a round robin manner. In this way, we obtain a random distribution of the graph on the processors. The non-uniformity of the traffic load and the fact that vehicles move across the network is another source of load imbalance, which could be reduced only through a dynamic load balancing scheme. However, such a dynamic data repartitioning may require a global knowledge of the network and may be costly to implement due to the irregular nature of the system. Fortunately, with the above static domain decomposition, we observe that the vehicles are, on average, well distributed among the processors and that, as explained below, the overhead induced by the non-local communications is not too expensive.

6.3. Theoretical performance model. Traditionally, the performance of a parallel program is expressed through the relation between the number of processors and the execution time (such as speedup or efficiency curves). Here, we are rather interested in obtaining an analytical expression for the execution time and then checking whether the measured performance fits the theoretical model. Our study concerns an IBM SP2 machine with 14 processing elements. Such an expression can be derived by considering the different stages of the simulation. We shall assume a steady-state situation in which no new vehicle enters the network and no vehicle leaves it. Due to our load balancing algorithm, we shall also assume that the work is equally shared by all p processors.
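The static partitioning of section 6.2 amounts to sorting the building blocks by size and dealing them out in turn. A minimal sketch follows; the function name `distribute_blocks` is hypothetical, and the choice of descending order is an assumption, since the text only states that the blocks are sorted.

```python
def distribute_blocks(block_sizes, p):
    """Assign building blocks to p processors, largest first, in round robin.

    block_sizes -- dict {block id: number of cells in the block}
    Returns a list of p lists of block ids, one list per processor.
    """
    ordered = sorted(block_sizes, key=block_sizes.get, reverse=True)
    parts = [[] for _ in range(p)]
    for k, block in enumerate(ordered):
        parts[k % p].append(block)   # block k goes to processor k mod p
    return parts
```

Because consecutive blocks in the sorted order have similar sizes, each round of p blocks adds a similar number of cells to every processor, while neighboring junctions of the network end up scattered over the machine.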
The different stages that are taken into account are:

• Vehicle motion: supposing that the vehicles are perfectly distributed ($n/p$ vehicles in each processor), the execution time of the motion phase is $T_{motion} = \frac{n}{p} T_1$, where $T_1$ is the processing time for one car.

• Communications: We call $N_{exit}$ the total number of exit cells. We assume that they are evenly distributed so that there are $N_{exit}/p$ exit cells per processor. For each of the $p-1$ remote processors, a message is built with all vehicles bound for that processor. There is, on average, a fraction $1/p$ of the $N_{exit}/p$ exits for each remote processor. The communication takes $p-1$ steps: during step $k$, all processors $P_i$ send the corresponding message to processor $P_{i+k}$, where the processor index is defined modulo $p$. A response message is then sent to each originating processor to indicate which vehicles are accepted in the entrance cells. The total time for this communication is then $T_{comm} = (p-1)\left(\frac{N_{exit}}{p^2} T_2 + f\right)$. Here $T_2$ is the time necessary to send back and forth the information related to the car in each exit cell and $f$ is the latency time for the two messages. The above expression is an upper bound for the communication time. Strictly speaking, it should be weighted by the number of cars that actually move from one processor to another. Indeed, in our implementation, no message is sent if no vehicle is present in an exit cell. This requires sending a preliminary message to inform all processors of the number of messages they should expect. Therefore, the quantity $f$ also includes the time to send a message of fixed length.
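The $p-1$ step exchange pattern and the resulting upper bound can be illustrated with a short sketch. It only enumerates who sends to whom at each step and evaluates the formula for $T_{comm}$; it performs no message passing, and the function names and numerical arguments are illustrative assumptions.

```python
def exchange_schedule(p):
    """For each step k = 1..p-1, list the (sender, receiver) pairs P_i -> P_{(i+k) mod p}."""
    return [[(i, (i + k) % p) for i in range(p)] for k in range(1, p)]


def comm_time_upper_bound(p, n_exit, t2, f):
    """Upper bound T_comm = (p - 1) * (n_exit / p**2 * t2 + f)."""
    return (p - 1) * (n_exit / p**2 * t2 + f)


schedule = exchange_schedule(4)
for k, step in enumerate(schedule, start=1):
    print(f"step {k}: {step}")

# Arbitrary illustrative values, not measurements.
bound = comm_time_upper_bound(4, n_exit=16, t2=1.0, f=0.5)
```

At every step each processor sends exactly one message and receives exactly one, and after $p-1$ steps every ordered pair of distinct processors has been served once.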


In summary, the total execution time can then be expressed as

$$T(p,n) = \frac{n}{p} T_1 + (p-1)\left(\frac{N_{exit}}{p^2} T_2 + f\right)$$

When performing performance measurements, we observe a superlinear speedup, probably due to cache effects. This means that $T_{comp}(p,n) < T_{comp}(1,n)/p$, where $T_{comp}$ denotes the computation (vehicle motion) term $\frac{n}{p} T_1$ of the above expression. We thus have to modify this term as

$$T_{comp} = \alpha \frac{n}{p} T_{in} + (1-\alpha) \frac{n}{p} T_{out}$$

where $\alpha$ is the probability that the value to be processed is in the cache of the processor, and $T_{in}$ and $T_{out}$ are the processing times associated with the in- and out-cache data, respectively. Assuming that, for small values of $p$, the probability of being in the cache increases linearly with $p$, we write $\alpha = Cp$ and we obtain

$$T_{comp} = \frac{n}{p} T_{out} - (T_{out} - T_{in})\,C\,n$$
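For completeness, the intermediate step of this substitution can be written out as follows, with $\alpha$ denoting the in-cache probability and $\alpha = Cp$ the linear assumption stated above.

```latex
\begin{align*}
T_{comp} &= \alpha \frac{n}{p} T_{in} + (1-\alpha)\frac{n}{p} T_{out} \\
         &= \frac{n}{p} T_{out} - \alpha \frac{n}{p}\,(T_{out} - T_{in}) \\
         &= \frac{n}{p} T_{out} - (T_{out} - T_{in})\,C\,n
            \qquad \text{using } \alpha = Cp .
\end{align*}
```

The second term does not depend on $p$ and is negative whenever $T_{out} > T_{in}$, which is what makes the speedup superlinear. Since $\alpha$ is a probability, the linear form can only hold while $Cp \le 1$.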

Thus, in a more compact form, the run time complexity reads

$$T(p,n) = a(p)\,n + b(p) \qquad (6.1)$$

with

$$a(p) = \frac{c}{p} - d \qquad (6.2)$$

and

$$b(p) = \frac{p-1}{p^2}\,e + (p-1)\,f \qquad (6.3)$$

where $c = T_{out}$, $d = (T_{out} - T_{in})C$, $e = T_2 N_{exit}$ and $f$ are constant values.

6.4. Performance fit and discussion. In order to use the model described in equation (6.1), one needs to know the values $c$, $d$, $e$ and $f$. These quantities are obtained from actual performance measurements. For different numbers of processors, we measure the time $T(p,n)$ of one time step (i.e. the time to move $n$ vehicles). Figure 6.1 illustrates the behavior of $T(p,n)$ for various numbers of processors. Note the quasi-linear dependence of $T(p,n)$ on $n$. From the slope $a(p)$ and intercept $b(p)$ of the least square fits of figure 6.1, we may extract numerical values for $c$, $d$, $e$ and $f$. We obtain

$c = 1.43 \times 10^{-5}$ [s], $d = 4.81 \times 10^{-7}$ [s], $e = 2.33 \times 10^{-3}$ [s], $f = 1.66 \times 10^{-4}$ [s].

These quantities are machine dependent but we may expect that the form of expression (6.1) also holds for other distributed memory machines. The value we obtain for $f$ is four times larger than the latency time given by the manufacturer, which is compatible with the fact that $f$ corresponds to twice the latency of the communication network, plus the time to send a fixed size message. Similarly, $e$ can be related to the network bandwidth $w$ as

$$w = \frac{2 N_{exit} N_{bytes}}{e} = \frac{2 \times 3145 \times 8}{2.33 \times 10^{-3}} = 22 \times 10^{6}\ [\mathrm{B/s}]$$

[Figure 6.1: plot of the time for one iteration in seconds (vertical axis, 0 to 0.08) against the number of vehicles (horizontal axis, 0 to 2.5 x 10^4), for 4, 6, 8 and 12 PEs.]

Fig. 6.1. Relation between the number of vehicles and the time of one iteration for 4, 6, 8 and 12 processors. Measured values are represented by the different symbols whereas the solid lines are the relations computed with equation (6.1).

where $N_{bytes} = 8$ is the number of bytes necessary to code a vehicle and the factor of 2 comes from the fact that our expression contains messages in both ways. The effective bandwidth obtained in this way (22 MB/s) is compatible with the peak bandwidth (40 MB/s) that is expected from the manufacturer specifications. These results show the validity of our theoretical model, and equation (6.1) allows us to extrapolate the execution time for variable numbers of processors and vehicles.

6.5. Simulator speed. A standard way to measure the quality of a traffic simulator is to measure the ratio $R$ between the simulated time and the required CPU time. The quantity $R$ indicates how many times faster than reality the simulator runs. In our simulator the cell length is defined as $\ell = 7.5$ m and the slow velocity corresponds to $v_s = 50$ km/h. Thus, the simulated time of one iteration is $\ell/v_s = 0.54$ second (i.e. the time for a car to jump from one cell to another). The corresponding CPU time for one iteration is $T(p,n)$, as given in equation (6.1). Thus $R$ can be written as

$$R = \frac{\ell}{v_s\,T(p,n)} = \frac{0.54}{T(p,n)}$$

In table 6.1, the measured ratio $R$ is given for $n = 15000$ vehicles and for various numbers of processors. This value of $n$ is the typical order of magnitude of the number of cars simultaneously present in the city of Geneva during the rush hour. As discussed in the previous section, we observe a superlinear behavior of $R$ as a function of $p$, due to cache size effects and taken into account by $d$ in the performance model. The validity of our theoretical model is further confirmed in figure 6.2, where the values of table 6.1 and the expected values from equation (6.1) are shown together. Figure 6.2 also shows the extrapolation of our theoretical predictions to a larger number of processors $p$, for $n = 15000$ and $n = 25000$. We observe that a maximum of performance is reached for $p = p_{max}(n)$.
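The location of this maximum can be checked numerically from equation (6.1) and the fitted constants of Section 6.4. The sketch below is only an evaluation of the published formula; the function names and the search limit of 60 processors (the range of figure 6.2) are assumptions made for illustration.

```python
# Fitted constants reported in Section 6.4 (seconds, IBM SP2).
C_FIT = 1.43e-5
D_FIT = 4.81e-7
E_FIT = 2.33e-3
F_FIT = 1.66e-4
SIM_STEP = 0.54  # simulated seconds per iteration: 7.5 m at 50 km/h


def step_time(p, n):
    """CPU time of one iteration from equation (6.1): T(p, n) = a(p) n + b(p)."""
    a = C_FIT / p - D_FIT
    b = (p - 1) / p**2 * E_FIT + (p - 1) * F_FIT
    return a * n + b


def ratio(p, n):
    """Ratio R between simulated time and CPU time."""
    return SIM_STEP / step_time(p, n)


def p_max(n, p_limit=60):
    """Number of processors in 1..p_limit that minimizes T(p, n), hence maximizes R."""
    return min(range(1, p_limit + 1), key=lambda p: step_time(p, n))


for p in (4, 8, 14):
    print(p, round(ratio(p, 15000), 1))
print("p_max for n = 15000:", p_max(15000))
```

Note that $a(p)$ becomes negative once $p > c/d$ (about 30 with these constants), a sign that the linear cache assumption is being used outside its range.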
We have $p_{max} = 36$ for $n = 15000$ but this result should be taken with care: our performance model makes some simple assumptions on the cache memory behavior which will not scale up to any value of $p$. However, the fact that the performance decreases for $p$ large enough is generic of our implementation and can be explained as follows: if $p > p_{max}$ the communication


Number of processors p    Ratio R
2                         4.0
4                         10.8
6                         19.3
8                         28.4
12                        41.5
14                        49.1

Table 6.1. The ratio R between the simulated time and the CPU time, on the IBM SP2 parallel machine and for 15000 vehicles.

[Figure 6.2: plot of the ratio R (vertical axis, 0 to 180) against the number of processors (horizontal axis, 0 to 60); legend: Measured, n=15000, n=25000.]

Fig. 6.2. Relation between the number of processors p and the ratio R. The circles are the measured values and the solid and dashed lines are the expected values obtained from equation (6.1).

latency $(p-1)f$ of equation (6.1) dominates the execution time. At this stage, it is necessary to consider a more sophisticated partitioning of the road network to avoid non-local communications. The so-called graph partitioning technique [21] would be very appropriate for this problem. On the other hand, the traffic load would not be automatically well distributed as it is now by our round robin assignment method.

7. Conclusion. Several cellular automata based microsimulations for traffic exist [14]. Although their ability to model real traffic is often discussed, their implementation on a computer (parallel or not), as well as their numerical performance, is seldom described (see however [11]). Here we have presented the traffic simulator we have developed and applied to the case of the city of Geneva. Due to the lack of precise data, our simulations are not yet expected to match the exact traffic situation of Geneva. However, we have reported new and sensible results concerning the fluctuations of travel times in a city. We have quantified the concept of risk to describe the uncertainty associated with a varying travel time. We have also measured a dynamical flow diagram which contrasts with the usual steady state situation described in many traffic models. This paper mainly focuses on the parallel implementation and performance of our simulator, written in C++ and using the MPL message passing library of the IBM SP2. Our model is simple, yet it can deal with any road topology by using the concept of a rotary to represent any type of junction. The data structure consists of an ensemble of one-dimensional CA systems (one per road segment) that are interconnected through the rotaries.


From our data structure, we obtain a natural static domain decomposition, by associating with the processors our atomic CA structures, in a round robin fashion, from the largest to the smallest. In this way, processors may hold several disconnected CAs, which ensures a good average load balancing, in spite of the non-uniform vehicular traffic load. The communication overhead remains low because the ratio between the number of exit cells (for instance, at junctions) and the internal cells is low. Also, in our implementation, in order to decrease the latency, we pack in a single message all vehicles bound to the same remote processor. Finally, we present a detailed performance analysis of our simulator. A performance model based on the communication and computation operations is proposed. Actual measurements of the CPU time confirm the theoretical behavior and allow us to assign precise values to the model parameters. From this study, we can compute the time necessary to simulate $n$ vehicles on $p$ processors, for any $n$ and $p$. The ratio $R$ of the simulated time to CPU time shows that fast traffic simulations are possible on a parallel machine. Our values of $R$ compare well with other state of the art traffic simulators like, for instance, TRANSIMS [22].

Acknowledgments. We thank the State of Geneva for giving us access to the data used in this study, Mr. Blaise Deriaz for his explanations about them, and the anonymous referees of this paper for useful remarks and suggestions.

REFERENCES

[1] http://www.spiess.ch/emme2/.
[2] I. Prigogine and R. Herman. Kinetic Theory of Vehicular Traffic. American Elsevier, New York, 1971.
[3] Wilhelm Leutzbach. Introduction to the Theory of Traffic Flow. Springer-Verlag, 1988.
[4] D.E. Wolf, M. Schreckenberg, and A. Bachem, editors. Traffic and Granular Flow. World Scientific, 1996.
[5] G. Duncan, G. Cameron, and S. Druitt. Paramics-mp: a step nearer to vehicle reality. Technical report, Edinburgh Parallel Computing Center, 1995. ftp://ftp.epcc.ed.ak.uk/pub/paramics/papers/vehicular-reality.ps.z.
[6] Pierre-Antoine Queloz. Modele de trafic routier et simulateur massivement parallele. Master's thesis, University of Geneva, CUI, 1211 Geneva 4, Switzerland, 1995.
[7] K. Nagel and M. Schreckenberg. Cellular automaton model for freeway traffic. J. Physique I (Paris), 2:2221, 1992.
[8] A. Schadschneider and M. Schreckenberg. Cellular automaton models and traffic flow. J. Phys., A(26):L679, 1993.
[9] Transims. http://transims.tsasa.lanl.gov.
[10] http://trac.comphys.uni-duisburg.de/OLSIM/.
[11] M. Rickert. Traffic simulation on distributed memory computers. PhD thesis, University of Cologne, Cologne, Germany, 1998.
[12] K. Nagel, M. Rickert, and C.L. Barrett. Large-scale traffic simulations. In J. M. L. M. Palma and J. Dongarra, editors, Vector and Parallel Processing - VECPAR'96, volume 1215 of Lecture Notes in Computer Science, pages 380-402. Springer, 1997.
[13] M. Rickert and P. Wagner. Parallel real-time implementation of large-scale, route-plan-driven traffic simulation. Int. J. Mod. Phys. C, 7:133, 1996.
[14] M. Schreckenberg and D.E. Wolf, editors. Traffic and Granular Flow '97. Springer-Verlag, Singapore, 1998.
[15] B. Chopard and M. Droz. Cellular Automata Modeling of Physical Systems. Cambridge University Press, 1998.
[16] K. Nagel and C.L. Barrett. Using microsimulation feedback for trip adaptation for realistic traffic in Dallas. Int. J. Mod. Phys. C, 8(3):483-504, 1997.
[17] C. Gawron. An iterative algorithm to determine the dynamic user equilibrium in a traffic simulation model. Int. J. Mod. Phys. C, 9(3):393-407, 1998.
[18] V. Kumar, A. Grama, A. Gupta, and G. Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings, 1994.


[19] L.G. Tilstra and M.H. Ernst. Synchronous asymmetric exclusion processes. J. Phys. A, 31:5033-5063, 1998.
[20] B. Chopard, P. O. Luthi, and P.-A. Queloz. Cellular automata model of car traffic in two-dimensional street networks. J. Phys. A, 29:2325-2336, 1996.
[21] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 49(1):291-307, 1970.
[22] K. Nagel. Private communication.
