A GPU Based Traffic Parallel Simulation Module of Artificial Transportation Systems

Kai Wang 1,2, Zhen Shen 2,3, Member, IEEE

1. Center for Military Computational Experiments and Parallel Systems Technology, and College of Mechatronics Engineering and Automation, National University of Defense Technology (NUDT), Changsha, Hunan, China.

2. State Key Laboratory of Management and Control for Complex Systems, Beijing Engineering Research Center of Intelligent Systems and Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

3. Dongguan Research Institute of CASIA, Cloud Computing Center, Chinese Academy of Sciences, Songshan Lake, Dongguan 523808, Guangdong Province, China

kai.wang [email protected], [email protected]

Abstract-Traffic micro-simulation is an important tool in Intelligent Transportation Systems (ITS) research. In micro-simulation, a system can be built bottom up from the interactions of vehicle agents, road agents, traffic light agents, etc. The Artificial societies, Computational experiments, and Parallel execution (ACP) approach suggests integrating other metropolitan systems such as logistic, infrastructure, legal and regulatory, weather and environmental systems to build an Artificial Transportation System (ATS) to help solve ITS problems. This is reasonable, as the transportation system is a complex system that is affected by many systems interacting with each other. However, there is a challenge: the computing burden can be very heavy, as there can be many agents of different kinds interacting in parallel in ATS. In recent years, Graphics Processing Units (GPUs) have been applied successfully in many areas for parallel computing. Compared with the traditional CPU cluster, the GPU has an obvious advantage of low cost of hardware and electricity consumption. In this paper, we build a parallel traffic simulation module of ATS with GPU. The simulation results are reasonable and a maximum speedup factor of 105 is obtained compared with the CPU implementations.

Keywords-ACP, Artificial Transportation System, Traffic Micro-simulation, GPU, CUDA

This work is supported in part by NSFC 60921061, 70890084, 90920305, 90924302, 60904057, 60974095, and CAS 2F09N05, 2F09N06, 2F10E08, 2F10E10.

978-1-4673-2401-4/12/$31.00 ©2012 IEEE

I. INTRODUCTION

Traffic congestion is a publicly known worldwide problem that is more and more difficult to solve with the economic development, population growth and urbanization around the world. To take full advantage of the limited road resource to meet the increasing traffic demand, intelligent transportation systems (ITS) have been initiated, developed and deployed over the last three decades and have greatly improved transport safety, transport productivity and travel reliability [1]. To evaluate various traffic control and management measures of ITS, traffic simulation is taken as an important tool, as experiments on real traffic systems are usually very costly or even impossible due to economic or policy limitations. In recent years, micro-simulation methods such as Cellular Automata (CA) and Multi-Agent Systems (MAS) have become more and more popular since they can describe drivers' reactions to different control and management measures of ITS. Moreover, researchers have realized that it is not enough to focus only on directly related traffic activities to analyze the traffic systems. Various indirect facilities and activities such as weather, logistic, construction, management, and regulatory should also be modeled, since these factors can affect the drivers' decision-making process and driving behaviors. For the purpose of modeling, analysis, and control of the traffic systems, the ACP approach (Artificial societies, Computational experiments, and Parallel execution) was proposed, and it has been verified that the application of the ACP approach can significantly enhance and improve the reliability and performance of current ITS technology [1-7].

In ATS, the evolution of the states of the vehicle agents is calculated individually, which can model the details of the transportation systems. However, the computing time of evaluating or optimizing the transportation systems based on ATS increases very fast as the road network expands and the number of vehicles increases [8, 9], which reduces its practicability in solving many real-time evaluation and optimization problems.

As there are many vehicle agents interacting in a parallel fashion, and the computational experiments may also be performed in a parallel way, a parallel computing tool is needed. Graphics Processing Units (GPUs) can be employed to alleviate the computing burden of ATS since they have much better floating-point performance than CPUs. David Strippgen has verified the effectiveness of GPU for the traffic micro-simulation in the case that all the vehicles in the simulation move with a constant speed [10, 11].

In this paper, we extend the previous work and provide a GPU based parallel version of the traffic micro-simulation module of ATS. Building on the simple model in the same authors' earlier work [8, 9], we further add lane changing and turning path functions to the module.

II. REVIEW

A. ACP and ATS

The ACP approach was originally proposed to model, analyze, and control complex systems. The ACP approach consists of three steps: 1) modeling and representation using Artificial societies, 2) analysis and evaluation by Computational experiments, and 3) control and management through Parallel execution of real and artificial systems.

In the ACP approach, agents can "grow" an artificial transportation system (ATS) in a bottom up fashion. Micro-simulation is used to describe the interactions between agents, and macroscopic traffic phenomena can emerge from these interactions and then be observed and analyzed. Fig. 1 illustrates the framework of ATS, including traffic, logistic, construction, management, regulatory, social-economic, and environmental subsystems, where the traffic micro-simulation module of the traffic subsystem is the focus of this paper.

Fig. 1. Framework of ATS (modules shown in the Traffic Subsystem: Display and Result Analysis Module, Traffic Micro-simulation Module, Data Support Module)

In ATS, all agents' travel plans form the traffic demand, and they are performed in the traffic micro-simulation module to constitute the traffic flow of the road network, which stands for the traffic supply side. The micro-simulation process is run through iterations to reach the equilibrium between traffic supply and demand. In the first iteration, all agents perform their initial plans in the simulation. Then they adjust the plans according to the traffic situations fed back from the simulation to make their travels more efficient. The process is repeated through iterations until the equilibrium is obtained. The micro-simulation module is the core of the traffic subsystem and is also the most time-consuming one. Therefore, it is critical to parallelize this module to increase the overall execution speed of ATS.

B. Traffic Parallel Micro-simulation Systems

In the micro-simulation module, vehicles are modeled as agents and their movements on the road network of ATS are computed individually. For a large-scale road network, the number of agents in ATS can be up to millions, which makes the computing burden very heavy. Therefore, parallel computing techniques need to be introduced to increase the execution speed.

Currently, the most commonly used method of parallelizing traffic micro-simulation systems is to decompose the traffic network and simulate the divided portions on different processors of a multi-processor computer system separately and simultaneously. In the framework of this domain decomposition principle, the linkage between different portions, such as time synchronization and vehicle transfers at boundaries, should be maintained by the system to ensure the correctness of the simulation results of the whole road network. Time synchronization means that all the portions should be controlled by one simulation clock. When one processor completes a time step, it must wait for the other processors to complete the time step before executing the next one. Vehicle transfer means that vehicles reaching the boundary of a region must re-appear in another portion of the road network connected at the boundary. Moreover, the speed of simulation will be governed by the most heavily loaded processor, and by the processor speed if heterogeneous processors are used, since all the processors are controlled under time synchronization. This means that computing load balancing is also a critical problem for the parallelization.

In recent years, several traffic micro-simulation systems have employed parallel techniques, among which TRANSIMS [12], AIMSUN [13], PARAMICS [14], and MATSIM [15] are representatives. TRANSIMS uses a cellular automata (CA) technique for representing driving dynamics and employs the domain decomposition principle to parallelize the simulation process. The road network is partitioned into domains of approximately equal size and each CPU of the parallel computer is responsible for one of these domains. It is implemented on a Beowulf cluster and a real time ratio (RTR) of 100 can be obtained, which means that one simulates 100 seconds of the traffic scenario in one second of wall clock time [13]. AIMSUN uses multiple threads to process blocks of the road network, where the threads can be executed in parallel and one block can contain one or several intersections and their approaches. PARAMICS and MATSIM also employ the domain decomposition principle, where MATSIM uses a queue simulation algorithm, which means that the lanes are modeled as queues. MATSIM was implemented on a Beowulf cluster, but the results implied that the Ethernet network latencies make it difficult to gain a decent speedup by adding more clusters [10, 11]. Therefore, researchers have tried to rebuild MATSIM on a single computer with GPU, and a speedup of 67 compared with a highly optimized Java CPU version is ultimately obtained [10, 11]. However, the agents' driving behaviors such as route planning, car following, lane changing, and day-to-day travel plan adopting (agent learning) are not taken into consideration in that work.

C. GPU and CUDA

We give a brief introduction to GPU and Compute Unified Device Architecture (CUDA), following the previous work of the same authors [8, 9]. A GPU is a specialized circuit originally designed to offload graphics tasks from the CPU with the intention of performing them faster than the CPU can. In a personal computer, the GPU usually appears on the video card or the motherboard. Usually it has excellent floating point performance, with many cores working together to draw triangles and polygons on the screen. Because of this, people began to use it for parallel scientific computing. To facilitate development with GPU, NVIDIA developed CUDA, with which people can program in high-level languages such as C, C++ and Fortran.

In hardware, a GPU has many cores working together to realize parallel computing. The cores are called Streaming Processors (SP), and several cores (8 or 32 typically) are organized into a Streaming Multi-processor (SM). Each SM has its own memory, called shared memory, that all threads in it can access simultaneously, and all SMs share the global memory, constant memory and texture memory of the GPU.

In software, a typical GPU program consists of two parts: the CPU code, which controls the flow of the whole program and does the sequential work, and the GPU code, which does the parallel work. A function that executes on the GPU is typically called a "kernel". When a kernel is launched, multiple threads on the GPU, organized in two levels, are activated. The top level is called the "grid" and the other is called the "block". One grid can consist of at most 65535x65535 blocks and each block can consist of at most 512 or 1024 threads. The grid is then allocated to the GPU for parallel computing, with blocks allocated to different SMs in the GPU and threads allocated to different SPs in the SM. Fig. 2 illustrates the basic principle of the GPU parallel computing.

In this paper, we use a double-loop flow to implement the traffic micro-simulation module of ATS: the outer loop is the learning iteration and the inner loop is the simulation process. As Fig. 3 shows, the overall program flow is controlled by the CPU, and the iteration counter and the simulation clock are also refreshed at the CPU side, while the states of the agents are updated in parallel at the GPU side. As the travel activity planning module is out of the scope of this paper, we assume that every agent has only one trip in the simulation scenario to simplify the simulation process.
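The double-loop flow described above can be sketched on the host side in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `simulate` and `learn`, the two-route network, the congestion rule (speed falls as more agents share a route), and the re-planning rule (one agent leaves the slowest route per iteration) are all hypothetical stand-ins. The inner per-agent loop marks the place where a GPU version would launch a kernel with one thread per agent.

```python
def simulate(route_of, lengths, free_speed=10.0, capacity=5, dt=1.0, max_steps=10_000):
    """Inner loop: advance the simulation clock until every agent ends its single trip."""
    n_on = [route_of.count(r) for r in range(len(lengths))]
    # Hypothetical congestion rule: the more agents on a route, the lower the speed.
    speed = [free_speed / (1.0 + n / capacity) for n in n_on]
    position = [0.0] * len(route_of)
    travel_time = [None] * len(route_of)
    clock = 0.0
    for _ in range(max_steps):
        if all(t is not None for t in travel_time):
            break
        clock += dt  # the simulation clock is refreshed on the CPU side
        # On a GPU, this per-agent update would be one kernel launch (one thread per agent).
        for i, r in enumerate(route_of):
            if travel_time[i] is None:
                position[i] += speed[r] * dt
                if position[i] >= lengths[r]:
                    travel_time[i] = clock
    return travel_time


def learn(n_agents, lengths, free_speed=10.0, capacity=5, iterations=10, tol=1e-9):
    """Outer loop: repeat the simulation and re-plan until route weights balance."""
    route_of = [0] * n_agents  # initial plans: every agent takes route 0
    weights = []
    for _ in range(iterations):  # the iteration counter is kept on the CPU side
        times = simulate(route_of, lengths, free_speed, capacity)
        # Road weights fed back from the simulation: mean travel time per route,
        # or the free-flow time when no agent used the route.
        weights = []
        for r, length in enumerate(lengths):
            on_r = [t for t, rr in zip(times, route_of) if rr == r]
            weights.append(sum(on_r) / len(on_r) if on_r else length / free_speed)
        used = [r for r in range(len(lengths)) if r in route_of]
        worst = max(used, key=lambda r: weights[r])
        best = weights.index(min(weights))
        if weights[worst] - weights[best] < tol:
            break  # no agent can improve: treat this as the equilibrium
        # Hypothetical re-planning rule: one agent leaves the slowest used route.
        route_of[route_of.index(worst)] = best
    return route_of, weights


if __name__ == "__main__":
    routes, weights = learn(4, [100.0, 100.0], capacity=1)
    print(routes, weights)
```

With two routes of equal length, the agents in this toy model split evenly between them after a few learning iterations, which is the kind of supply and demand equilibrium the outer loop is meant to reach.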

[Fig. 2: the CPU codes (serial execution) copy data and launch a kernel; the GPU codes (parallel computing) run the active threads organized by grid and blocks. The grid shows blocks indexed (0,0) to (1,2), and a block shows threads indexed (0,0) to (1,2).]

[Flow chart residue, presumably Fig. 3. Recoverable labels: "CPU CODES", "Re-plan routes based on new weights of the roads", "Copying data ... GPU memory" (partly illegible).]
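The grid and block organization of threads described in this section can be emulated serially to show how each thread finds the agent it is responsible for. The sketch below is plain Python rather than CUDA, and the names `launch`, `global_index` and `move_kernel` are hypothetical. It assumes a two-dimensional grid of two-dimensional blocks, flattened in row-major order, which is the usual convention, and a constant-speed update as a stand-in for the real agent model.

```python
from itertools import product

MAX_THREADS_PER_BLOCK = 1024  # the text quotes 512 or 1024, depending on the GPU


def global_index(block_idx, thread_idx, grid_dim, block_dim):
    """Flatten a 2D block index and a 2D thread index into one agent index."""
    block_id = block_idx[1] * grid_dim[0] + block_idx[0]
    thread_id = thread_idx[1] * block_dim[0] + thread_idx[0]
    return block_id * (block_dim[0] * block_dim[1]) + thread_id


def launch(kernel, grid_dim, block_dim, *args):
    """Serial stand-in for a kernel launch: call kernel once per (block, thread) pair."""
    if block_dim[0] * block_dim[1] > MAX_THREADS_PER_BLOCK:
        raise ValueError("too many threads per block")
    for by, bx in product(range(grid_dim[1]), range(grid_dim[0])):
        for ty, tx in product(range(block_dim[1]), range(block_dim[0])):
            kernel((bx, by), (tx, ty), grid_dim, block_dim, *args)


def move_kernel(block_idx, thread_idx, grid_dim, block_dim, positions, speeds, dt):
    """Per-thread work: update the one agent this thread is responsible for."""
    i = global_index(block_idx, thread_idx, grid_dim, block_dim)
    if i < len(positions):  # guard: the grid may hold more threads than agents
        positions[i] += speeds[i] * dt


if __name__ == "__main__":
    positions = [0.0] * 30
    speeds = [float(i) for i in range(30)]
    # A 2x3 grid of 2x3 blocks gives 36 threads, enough to cover 30 agents.
    launch(move_kernel, (2, 3), (2, 3), positions, speeds, 1.0)
    print(positions[:5])
```

The bounds check in `move_kernel` matters because the number of agents is rarely an exact multiple of the block size, so the last block usually contains idle threads.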
