A GPU Based Traffic Parallel Simulation Module of Artificial Transportation Systems

Kai Wang, Zhen Shen, Member, IEEE

1. Center for Military Computational Experiments and Parallel Systems Technology, and College of Mechatronics Engineering and Automation, National University of Defense Technology (NUDT), Changsha, Hunan, China
2. State Key Laboratory of Management and Control for Complex Systems, Beijing Engineering Research Center of Intelligent Systems and Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
3. Dongguan Research Institute of CASIA, Cloud Computing Center, Chinese Academy of Sciences, Songshan Lake, Dongguan 523808, Guangdong Province, China

[email protected], [email protected]

Abstract—Traffic micro-simulation is an important tool in the Intelligent Transportation Systems (ITS) research. In the micro-simulation, a bottom-up system can be built up by the interactions of vehicle agents, road agents, traffic light agents, etc. The Artificial societies, Computational experiments, and Parallel execution (ACP) approach suggests integrating other metropolitan systems such as logistic, infrastructure, legal and regulatory, weather and environmental systems to build an Artificial Transportation System (ATS) to help solve ITS problems. This is reasonable as the transportation system is a complex system that is affected by many systems interacting with each other. However, there is a challenge that the computing burden can be very heavy, as there can be many agents of different kinds interacting in parallel in ATS. In recent years, Graphics Processing Units (GPUs) have been applied successfully in many areas for parallel computing. Compared with the traditional CPU cluster, GPU has an obvious advantage of low cost of hardware and electricity consumption. In this paper, we build a parallel traffic simulation module of ATS with GPU. The simulation results are reasonable and a maximum speedup factor of 105 is obtained compared with the CPU implementations.

Keywords—ACP, Artificial Transportation System, Traffic Micro-simulation, GPU, CUDA

This work is supported in part by NSFC 60921061, 70890084, 90920305, 90924302, 60904057, 60974095, and CAS 2F09N05, 2F09N06, 2F10E08, 2F10E10.
I. INTRODUCTION

Traffic congestion is a publicly known worldwide problem that is more and more difficult to solve with the economic development, population growth and urbanization around the world. To take full advantage of the limited road resource to meet the increasing traffic demand, intelligent transportation systems (ITS) have been initiated, developed and deployed over the last three decades and have greatly improved transport safety, transport productivity and travel reliability [1]. To evaluate various measurements of ITS traffic control and management, traffic simulation is taken as an important tool, as experiments on real traffic systems are usually very costly or even impossible due to economic or policy limitations. In recent years, micro-simulation methods such as Cellular Automata (CA) and Multi-Agent Systems (MAS) have become more and more popular since they can describe drivers' reactions to different control and management measures of ITS. Moreover, researchers have realized that it is not enough to focus only on directly related traffic activities when analyzing traffic systems. Various indirect facilities and activities such as weather, logistics, construction, management, and regulation should also be modeled, since these factors can affect drivers' decision-making processes and driving behaviors. For the purpose of modeling, analysis, and control of traffic systems, the ACP approach (Artificial societies, Computational experiments, and Parallel execution) was proposed, and it has been verified that the application of the ACP approach can significantly enhance and improve the reliability and performance of current ITS technology [1-7]. In an Artificial Transportation System (ATS), the evolution of the states of the vehicle agents is calculated individually, which can model the details of the transportation systems. However, the computing time of evaluating or optimizing the transportation systems based on ATS increases very fast as the road network expands and the number of vehicles increases [8, 9], which reduces its practicability in solving many real-time traffic control and management evaluation and optimization problems.

As there are many vehicle agents interacting in a parallel fashion and the computational experiments may be performed in a parallel way, a parallel computing tool is needed. Graphics Processing Units (GPUs) can be employed to alleviate the computing burden of ATS since they have much better floating-point performance than CPUs. David Strippgen has verified the effectiveness of GPU for traffic micro-simulation in the case that all the vehicles in the simulation move with a constant speed [10, 11].

In this paper, we extend the previous work and provide a GPU based parallel version of the traffic micro-simulation module of ATS. Based on the work of the same authors [8, 9] on a simple model, we further add lane changing and turning path functions to the module.

II. REVIEW

A. ACP and ATS

The ACP approach was originally proposed to model, analyze, and control complex systems. It consists of three steps: 1) modeling and representation using Artificial societies, 2) analysis and evaluation by Computational experiments, and 3) control and management through Parallel execution of real and artificial systems.
In the ACP approach, agents can "grow" an artificial transportation system (ATS) in a bottom-up fashion. Micro-simulation is used to describe the interactions between agents, and macroscopic traffic phenomena can emerge from these interactions and then be observed and analyzed. Fig. 1 illustrates the framework of ATS, including the traffic, logistic, construction, management, regulatory, social-economic, and environmental subsystems, where the traffic micro-simulation module of the traffic subsystem is the focus of this paper.

Fig. 1. Framework of ATS. (The traffic subsystem comprises a display and result analysis module, the traffic micro-simulation module, and a data support module.)

In ATS, all agents' travel plans form the traffic demand, and they are performed in the traffic micro-simulation module to constitute the traffic flow of the road network that stands for the traffic supply side. The micro-simulation process is run through iterations to reach the equilibrium between traffic supply and demand. In the first iteration, all agents perform their initial plans in the simulation. Then they adjust their plans according to the traffic situations fed back from the simulation to make their travels more efficient. The process is repeated through iterations until the equilibrium is obtained. The micro-simulation module is the core of the traffic subsystem and is also the most time-consuming one. Therefore, it is critical to parallelize this module to increase the whole execution speed of ATS.
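To make the iterative demand-supply adjustment above concrete, the following host-side sketch shows the day-to-day learning loop in a simplified form. The Agent and TrafficState types and the helpers simulate_one_day, adjust_plans, and has_converged are hypothetical placeholders for the corresponding ATS modules, not the actual implementation.

// Simplified sketch of the day-to-day learning loop that drives ATS toward the
// equilibrium between traffic demand and supply. All types and helper functions
// are hypothetical stand-ins for the modules described in the text.
#include <cmath>
#include <vector>

struct Agent        { int plan_id = 0; };               // placeholder travel plan
struct TrafficState { double avg_travel_time = 0.0; };  // placeholder feedback from the simulation

// Placeholder micro-simulation: performs all plans and returns the feedback.
TrafficState simulate_one_day(const std::vector<Agent>& agents) {
    return TrafficState{ static_cast<double>(agents.size()) };
}

// Placeholder plan adjustment: agents react to the fed-back traffic situation.
void adjust_plans(std::vector<Agent>& agents, const TrafficState& feedback) {
    for (auto& a : agents)
        if (feedback.avg_travel_time > 0.0) ++a.plan_id;
}

// Placeholder convergence test between two consecutive iterations.
bool has_converged(const TrafficState& prev, const TrafficState& cur) {
    return std::abs(prev.avg_travel_time - cur.avg_travel_time) < 1e-6;
}

void run_until_equilibrium(std::vector<Agent>& agents, int max_iterations) {
    TrafficState previous{};
    for (int it = 0; it < max_iterations; ++it) {
        TrafficState current = simulate_one_day(agents);        // supply side: simulate the plans
        if (it > 0 && has_converged(previous, current)) break;  // equilibrium reached
        adjust_plans(agents, current);                          // demand side reacts to the feedback
        previous = current;
    }
}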
B. Traffic Parallel Micro-simulation Systems

In the micro-simulation module, vehicles are modeled as agents and their movements on the road network of ATS are computed individually. For a large-scale road network, the number of agents in ATS can be up to millions, which makes the computing burden very heavy. Therefore, it is necessary to introduce parallel computing techniques to increase the execution speed.

Currently, the most commonly used method of parallelizing traffic micro-simulation systems is to decompose the traffic network and simulate the divided portions on different processors of a multi-processor computer system separately and simultaneously. In the framework of this domain decomposition principle, the linkage between different portions, such as time synchronization and vehicle transfers at the boundaries, should be maintained by the system to ensure the correctness of the simulation results of the whole road network. Time synchronization means that all the portions should be controlled by one simulation clock: when one processor completes a time step, it must wait for the other processors to complete the time step before executing the next one. Vehicle transfer means that vehicles reaching the boundary of a region must re-appear in the portion of the road network connected at that boundary. Moreover, the speed of the simulation will be governed by the heaviest-loaded processor, and by the slowest processor if heterogeneous processors are used, since all the processors are controlled under the time synchronization, which means that computing load balancing is also a critical problem for the parallelization.

In recent years, several traffic micro-simulation systems have employed such parallel techniques, among which TRANSIMS [12], AIMSUN [13], PARAMICS [14], and MATSIM [15] are representatives. TRANSIMS uses a cellular automata (CA) technique for representing driving dynamics and employs the domain decomposition principle to parallelize the simulation process. The road network is partitioned into domains of approximately equal size and each CPU of the parallel computer is responsible for one of these domains. It is implemented on a Beowulf cluster and a real time ratio (RTR) of 100 can be obtained, which means that 100 seconds of the traffic scenario are simulated in one second of wall clock time [13]. AIMSUN uses multiple threads to process blocks of the road network, where the threads can be executed in parallel and one block can contain one or several intersections and their approaches. PARAMICS and MATSIM also employ the domain decomposition principle, where MATSIM uses a queue simulation algorithm, which means that the lanes are modeled as queues. MATSIM was implemented on a Beowulf cluster, but the results implied that the Ethernet network latencies make it difficult to gain a decent speedup by adding more cluster nodes [10, 11]. Therefore, researchers have tried to rebuild MATSIM on a single computer with GPU, and a speedup of 67 compared with a highly optimized Java CPU version is ultimately obtained [10, 11]. However, the agents' driving behaviors such as route planning, car following, lane changing, and day-to-day travel plan adapting (agent learning) are not taken into consideration in that work.
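As a rough illustration of the time synchronization and vehicle transfer requirements described above, the sketch below steps several domains forward in lockstep with OpenMP, one thread per domain. The Domain structure and the helpers advance_vehicles and exchange_boundary_vehicles are hypothetical placeholders; a real domain-decomposed simulator (for example, running over MPI on a cluster) would provide its own equivalents.

// Sketch of the domain decomposition stepping scheme: one thread per domain and
// a single shared simulation clock. Domain, advance_vehicles and
// exchange_boundary_vehicles are hypothetical placeholders.
#include <omp.h>
#include <vector>

struct Domain { std::vector<int> vehicles, outbox, inbox; };

// Move the vehicles inside domain d for one time step; vehicles that reach the
// boundary are placed into d.outbox (placeholder body).
void advance_vehicles(Domain& d, int step) { (void)d; (void)step; }

// Pull vehicles from the neighbours' outboxes into domain d so that they
// re-appear in the connected portion of the network (placeholder body).
void exchange_boundary_vehicles(std::vector<Domain>& domains, int d) { (void)domains; (void)d; }

void run_domains_in_lockstep(std::vector<Domain>& domains, int total_steps) {
    const int n = static_cast<int>(domains.size());
    #pragma omp parallel num_threads(n)
    {
        const int d = omp_get_thread_num();
        for (int step = 0; step < total_steps; ++step) {
            advance_vehicles(domains[d], step);      // local work of this processor
            #pragma omp barrier                      // time synchronization: wait for all domains
            exchange_boundary_vehicles(domains, d);  // vehicle transfer at the boundaries
            #pragma omp barrier                      // all transfers visible before step + 1
        }
    }
}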
C. GPU and CUDA

We give a brief introduction to GPU and the Compute Unified Device Architecture (CUDA), following the previous work of the same authors [8, 9]. GPU is a specialized circuit originally designed to offload graphics tasks from the CPU, with the intention of performing them faster than the CPU can. In a personal computer, the GPU usually resides on the video card or the mother board. It usually has excellent floating-point performance, with many cores working together to draw triangles and polygons on the screen. Because of this, people began to use it for parallel scientific computing. To facilitate development with GPU, NVIDIA developed CUDA, with which people can program in high-level languages such as C, C++ and Fortran.

In hardware, a GPU has many cores working together to realize parallel computing. The cores are called Streaming Processors (SPs), and several cores (typically 8 or 32) are organized into a Streaming Multi-processor (SM). Each SM has its own memory, called shared memory, that all threads running on it can access simultaneously, and all SMs share the global memory, constant memory and texture memory of the GPU.

In software, a typical GPU program consists of two parts: the CPU codes that control the flow of the whole program and do the sequential work, and the GPU part that does the parallel work. A function that executes on the GPU is typically called a "kernel". When a kernel is launched, multiple threads on the GPU, organized at two levels, are activated. The top level is called the "grid" and the other is called the "block". One grid can consist of at most 65535×65535 blocks and each block can consist of at most 512 or 1024 threads. The grid is then allocated to the GPU for parallel computing, with blocks allocated to different SMs in the GPU and threads allocated to different SPs in the SM. Fig. 2 illustrates the basic principle of the GPU parallel computing.

Fig. 2. The basic principle of GPU parallel computing. (The CPU codes run serially and launch kernels with data copied to the GPU; the active GPU threads are organized by a grid and blocks.)
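To make the grid, block and thread terminology concrete, the short CUDA sketch below updates one vehicle position per thread and launches the kernel with a one-dimensional grid. The array layout and the update rule are illustrative assumptions only and are not taken from the ATS implementation.

// Minimal CUDA sketch: one thread per vehicle, threads organized into blocks
// and a grid as described above. The vehicle-update rule is a placeholder.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void update_positions(float* pos, const float* speed, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) pos[i] += speed[i] * dt;              // toy update of vehicle i
}

int main() {
    const int n = 1 << 20;                           // e.g. about one million vehicle agents
    std::vector<float> h_pos(n, 0.0f), h_speed(n, 10.0f);

    float *d_pos, *d_speed;
    cudaMalloc(&d_pos,   n * sizeof(float));
    cudaMalloc(&d_speed, n * sizeof(float));
    cudaMemcpy(d_pos,   h_pos.data(),   n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_speed, h_speed.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threadsPerBlock = 256;                 // each block holds at most 512 or 1024 threads
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // 1-D grid of blocks
    update_positions<<<blocks, threadsPerBlock>>>(d_pos, d_speed, n, 1.0f);
    cudaMemcpy(h_pos.data(), d_pos, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("pos[0] = %f\n", h_pos[0]);
    cudaFree(d_pos); cudaFree(d_speed);
    return 0;
}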
In this paper, we use a double-loop flow to implement the traffic micro-simulation module of ATS: the outer loop is the learning iteration and the inner loop is the simulation process. As Fig. 3 shows, the overall program flow is controlled by the CPU, and the iteration counter and the simulation clock are also refreshed at the CPU side, while the states of the agents are updated in parallel at the GPU side. As the travel activity planning module is out of the scope of this paper, we assume that every agent has only one trip in the simulation scenario to simplify the simulation process.

Fig. 3. Overall program flow of the GPU based traffic micro-simulation module. (Only fragments of the original flowchart are recoverable: the CPU codes launch kernels and copy data to and from GPU memory, and routes are re-planned based on the new weights of the roads.)
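A minimal host-side sketch of this double-loop flow is given below, assuming a hypothetical AgentState layout, a hypothetical kernel update_agents, and a hypothetical re-planning helper. It only mirrors the control structure, with the outer learning iteration and the simulation clock on the CPU and the per-agent state updates on the GPU; it is not the actual module code.

// Sketch of the double-loop flow: the CPU refreshes the iteration counter and
// the simulation clock, while the GPU updates the agent states in parallel.
// AgentState, update_agents and replan_routes_on_host are hypothetical.
#include <cuda_runtime.h>

struct AgentState { float position; float speed; int link; };

__global__ void update_agents(AgentState* agents, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) agents[i].position += agents[i].speed * dt;   // placeholder state update
}

// Placeholder: adjust the agents' plans from the new weights of the roads.
void replan_routes_on_host(AgentState* h_agents, int n) { (void)h_agents; (void)n; }

void run_module(AgentState* h_agents, int n, int iterations, int steps_per_day, float dt) {
    AgentState* d_agents;
    cudaMalloc(&d_agents, n * sizeof(AgentState));
    const int block = 256, grid = (n + block - 1) / block;

    for (int it = 0; it < iterations; ++it) {                 // outer loop: learning iteration (CPU)
        cudaMemcpy(d_agents, h_agents, n * sizeof(AgentState), cudaMemcpyHostToDevice);
        for (int step = 0; step < steps_per_day; ++step)      // inner loop: simulation clock (CPU)
            update_agents<<<grid, block>>>(d_agents, n, dt);  // agent states updated on the GPU
        cudaMemcpy(h_agents, d_agents, n * sizeof(AgentState), cudaMemcpyDeviceToHost);
        replan_routes_on_host(h_agents, n);                   // feedback for the next iteration
    }
    cudaDeviceSynchronize();
    cudaFree(d_agents);
}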