12th IFIP/IEEE IM 2011: Mini Conference

Performance Management of Java-based SIP Application Servers

Mauro Femminella, Emanuele Maccherani, Gianluca Reali
Department of Electronic and Information Engineering, University of Perugia, Perugia, Italy
{mauro.femminella,emanuele.maccherani,gianluca.reali}@diei.unipg.it
978-1-4244-9221-3/11/$26.00 ©2011 IEEE

Abstract- Within the activities of the Java APIs for Integrated Networks (JAIN), the Java community offers a set of standard and open APIs to create advanced telecommunications services. However, recent works have pointed out that Java-based implementations of the Session Initiation Protocol (SIP) stack, the fundamental signalling protocol in the convergent telephone-IP world, perform poorly if they are executed by large multi-core servers, typically used in data centres. The problem lies in the combination of the SIP protocol semantics with the Java language features, which does not allow fully exploiting the computing capabilities of multi-core architectures. To face this problem, we propose a solution to improve the throughput and signalling latency of applications implemented by using Java-based SIP application servers. It consists of the joint usage of virtualization and parallelization techniques. We have performed an extensive measurement campaign by using an open source application server compliant with the JSLEE (JAIN Service Logic Execution Environment) specifications. The rationale of this choice is that JSLEE application servers are currently regarded as very promising candidates for deploying telecom services. Results show that it is possible to improve performance in terms of throughput and signalling latency by running more instances of the JSLEE server in parallel, each of them in a separate virtual machine deployed on the same server. Throughput can increase by about 64% and, in the maximum throughput condition, the call setup latency can be nearly halved.

Keywords- Java; SIP; JSLEE; virtualization; parallelization

I. INTRODUCTION

Advanced telecom services are intrinsically asynchronous and should fulfill low latency, high throughput and high availability requirements. They may rely on different network protocols and interoperate with different software and hardware platforms. In terms of modularity and reusability, the development process can benefit from the usage of event oriented architectures, new levels of abstraction, and well defined logical components. In addition, the introduction of standard open interfaces, rather than proprietary ones, brings the great advantage of enabling services to support multiple networks and devices. It also allows involving a higher number of developers to speed up the service development.

The Java technology is a natural candidate for implementing telecom services. Thanks to its intrinsic features, such as platform and operating system (OS) independence, networking support, dynamic adaptability, and availability of standardized technologies and APIs, Java simplifies the service implementation. The Java Enterprise Edition (EE) is the industry standard for implementing enterprise-class services. In addition, two further Java application frameworks, namely the Java APIs for Integrated Networks (JAIN) Service Logic Execution Environment (SLEE) and SIP Servlets, allow creating, deploying, and managing advanced telecom services [1]. However, a recent paper [2] shows that Java implementations of the Session Initiation Protocol (SIP [10]) stack, which is the de facto standard signalling protocol for voice over IP (VoIP) and multimedia services, have severe scaling problems in large multi-core architectures. Since they currently represent the most widespread solution for implementing servers in data centres, this performance issue needs to be solved urgently.

The problems come from the interaction between SIP protocol semantics and Java language features, which causes unbalanced pipeline stages and lock contention. The same performance problem has already been illustrated in [3], where we have observed that the average CPU utilization never exceeds 50% when a server equipped with 8 CPU cores is used. In addition, in [2] the authors show that no effective solutions exist to solve this issue through best practice configurations. In fact, although some improvements can be obtained with extensive profiling and optimization, this procedure nullifies one of the most important features of the Java language, i.e. easy and rapid service development. On the other hand, applications must obviously exploit the hardware capabilities as much as possible, and it is clear that if more than half of the computing capabilities are unused on average, there are spare CPU resources to manage a larger workload.

The contribution of this paper is the proposal and analysis of an original usage of virtualization and parallelization techniques in order to better exploit the computing capabilities of servers hosting Java implementations of SIP application servers (ASs). Clearly, our goal is not to achieve 100% CPU utilization, but to increase the overall server throughput in terms of successfully supported SIP calls, while maintaining the setup latency below an acceptable threshold. Our results are somewhat counterintuitive. Through an extensive experimental campaign we have shown that a very convenient solution consists of introducing a hypervisor and using virtual machines (VMs) to host SIP ASs. In more detail, our proposal is to run multiple, identical VMs on the same physical server, each of them running a single instance of the Java-based AS. Despite the increased overhead due to virtualization, the peculiarity of the Java over SIP operation allows increasing the overall SIP
call throughput with such a virtualized setting. We use an open source JSLEE AS, Mobicents [4], running a SIP-based VoIP service performing several database queries during call lifetime. Our choice of a fairly complex test service is motivated by our conviction that the frequent approach of showing improvements with a very simple service, such as the answering service used in [2], is not enough to assess improvements in real service scenarios. Instead, using a JSLEE AS with a realistic VoIP service is reasonable, since JSLEE ASs are primary candidates for the deployment of application services in the new convergent telecom paradigms [6]. Thus, our tests represent a realistic benchmark to verify if the proposed solution can be effective in operation.

The paper is organized as follows. In Section II, we present an overview of the JSLEE specifications, focusing on the critical performance aspects. Section III describes the proposed deployment configurations and the VoIP service implemented to test them. Section IV presents the numerical results of the experimental campaign. Section V illustrates related works and, finally, Section VI draws our conclusions.

II. BACKGROUND

A. JSLEE specifications and available platforms

The JSLEE activity aims to specify a Java-based, event oriented container for the execution of carrier-grade telecom services [7]. The service logic is implemented in software components called Service Building Blocks (SBBs). A JSLEE AS creates a pool of SBB objects and manages them according to a well defined lifecycle. SBBs operate asynchronously by receiving, processing, and triggering events. They can be attached to data streams called Activity Contexts, by which they receive events from other entities. Also, SBBs may be linked together by parent-child relationships to implement the service logic in a modular fashion.

Events are internally managed by a functional element called Event Router, which delivers each event to the appropriate SBB. External network events, such as SIP messages, are translated into internal Java events by the so called Resource Adaptors (RAs). More generally, the set of implemented RAs constitutes an abstract interface layer that allows a JSLEE server to access external resources.

The Mobicents Communication Platform is an open source project, currently owned by Red Hat [4]. It includes a JSLEE, a Media Server, a Presence Server, and a SIP Servlet Server. Other commercial JSLEE implementations are OpenCloud Rhino [13], and Amdocs jNetX Convergent Service Platform [14]. In our experiments we have used the v.1.2.6 GA of Mobicents JSLEE (MSLEE). It comes with a SIP RA which is already compliant with the new JSLEE v.1.1 specifications [7]. The MSLEE includes several J2EE components, such as Container Managed Persistence (CMP) fields, which enable data persistence for SBB objects, Java Database Connectivity (JDBC) drivers, Java Management Extensions for the environment management and monitoring, and Java Naming and Directory Interface (JNDI), which offers lookup functions for service registration. The MSLEE is installed within the JBoss AS [5], which is a hosting environment offering special capabilities, such as service and JSLEE configuration management, deployment, and thread pooling.

B. Critical issues in system configuration

The Mobicents structure is characterized by an evident complexity, as it integrates the Java Virtual Machine (JVM), the JBoss AS, and the MSLEE. In order to achieve an efficient platform setup, some critical aspects must be considered. Below we briefly illustrate those we have faced in this work.

1) Java memory management

Being based on the Java technology, the MSLEE relies on the automatic Java Garbage Collector (GC) mechanism for memory cleaning [9]. The drawback is that the developer does not have full control over the GC behavior, which may even pause and delay the application being executed [1]. Such pauses may be critical for real-time telecom applications. In fact, the AS may freeze due to post-pause avalanche restarts (since during a pause it can accumulate many unprocessed messages [8]). The JVM v.6 includes different GCs, in order to meet the requirements of different applications [1]. We have selected the Parallel GC (see also [3]), which is the default and most efficient GC (i.e., it uses the lowest CPU time), even if it produces longer pauses in program execution. In particular, the test results described in [3] show that the Parallel GC exhibits better performance when used with the UDP transport protocol.

Another issue to be considered when Java is used for real time services is the amount of memory to be allocated to the Java heap. As suggested in [8], this is another tricky point, since having a large memory allocated to Java may reduce the frequency of GC phases, but each collection may last much longer due to the larger amount of heap to be cleaned.

2) Operating System type: 32 vs. 64 bit

Typically, when the hardware resources are suited to host a 64 bit operating system (OS), it is preferable over a 32 bit one. In fact, 64 bit OSs overcome the well known limitation of 32 bit OSs that each application may use up to 3 GB of RAM. Furthermore, a 64 bit OS may also increase the CPU efficiency. Nevertheless, 64 bit systems may still not be mature enough for all carrier-grade applications, whilst 32 bit systems are generally more stable. We have evaluated many Linux distributions, in both 32 and 64 bit versions. The 32 vs. 64 bit choice may impact performance, but it does not affect service implementation, since using either a 32 or a 64 bit OS basically means using a different kernel and a different JVM, whilst the Java code of JSLEE and application services is unchanged.

3) Database Transactions

In critical data transmission environments, transactions are typically used to preserve a computer framework (e.g. a database) in a known, consistent state, after system failures. As regards transactions, the MSLEE relies on the JBoss AS which natively supports the Java Transaction API (JTA), thus allowing the usage of any transaction manager implementation. JBoss is by default configured to use the so-called "JTA compatible in VM" transaction manager. In some services, such as banking transactions, a provider may need a transactional behavior to ensure that all operations made on a remote database remain consistent during all system operation, as well as after a system
failure. Such a guarantee comes with a large performance overhead, in the form of disk writing operations, which are directly related to the number of service transactions processed. Hence, it is of great importance for a provider to carefully evaluate whether to use transactions or not, in order to ensure maximum performance. In fact, for some services such as VoIP ones, transactions may not be needed, whereas performance requirements are typically very stringent. This is the case of call forwarding, redirect and control services. For these reasons, in our experiments JTA has been disabled.

III. PROPOSED SOLUTION AND TEST SERVICE DESCRIPTION

A. Application server deployment configurations

In this section, we present the three deployment solutions that we have evaluated by our experiments.

• Native configuration (Fig. 1.a): this is the classic AS deployment configuration, in which a single instance of the server is installed within a single JVM, and the OS runs directly on top of the server hardware.

• Parallel configuration (Fig. 1.b): this configuration is often used in Java-based AS deployments to improve scalability with respect to memory. In fact, it is well known that, even if 64-bit JVMs allow exploiting the availability of several GBs of RAM, an excessive Java heap allocation may cause performance degradation in real time systems due to large GC times, and usually ASs scale badly with the Java heap [20]. This solution consists of running multiple JVMs over a common OS installed directly onto the server hardware. Each JVM hosts an instance of JBoss, which includes an instance of MSLEE. Clearly, in order to correctly forward SIP calls to the appropriate MSLEE, it is necessary to introduce a SIP proxy acting as a call dispatcher. We have considered two versions of the parallel configuration. The first one, labeled "JVM", is the classical parallel deployment of n JVMs, each free to access all CPU cores without restrictions. In the second configuration, labeled "JVM-taskset", we have bound each JVM to a specific subset of CPU cores by using the system command "taskset", so that each core is used by a single JVM. The rationale of this choice is that Java-based SIP ASs scale poorly not only with the Java heap, but also with the number of CPU cores. Thus, this solution "emulates" multiple Java computing environments within the same OS.

• Virtualized configuration (Fig. 1.c): it consists of a hypervisor that virtualizes hardware resources and hosts some VMs. Within each VM, we replicate the native configuration (OS, JVM, JBoss, MSLEE AS). In the virtualized case, we have really separated and completely insulated computing environments, and MSLEE runs separately in each of them. In this case, we do not bind the virtual CPUs of each VM to real CPU cores, leaving the task of scheduling the VM access to computing resources to the hypervisor. This guarantees more degrees of freedom with respect to the previous solution. This configuration is labeled "VM".

Looking at Fig. 1, some considerations can be made. Without any particular performance indication, the most efficient configuration should be the first one, since it has the minimum overhead. In fact, the parallel configuration suffers from the presence of multiple JVM, JBoss, and MSLEE instances, which clearly imply additional CPU and memory consumption. This aspect is exacerbated in the third configuration, the virtualized one, which also includes the overhead of the hypervisor and OS instances. However, since in the case of Java-based SIP stacks the native configuration does not exploit all the available computing resources, we expect that the best performance is achieved by the second one, and in particular the "JVM-taskset" one. It has less overhead than the virtualized one. In addition, by binding each JVM to a specific set of CPU cores, it emulates a physical server with few cores, thus mitigating the mentioned scalability problem. We expect that the improvement of the "JVM" configuration is inferior, since allowing all JVMs to access all CPU cores may cause resource contention and locking events. However, by considering both pros and cons, we are convinced that the preferred configuration is the third one, since a completely virtualized server environment, including even the OS, can provide a degree of flexibility in server usage and services deployment [19] that by far compensates the small performance degradation expected [27]. In fact, a virtualized environment can allow server consolidation procedures for a more efficient hardware and energy usage, service migration through VM mobility, and so on. To the best of the authors' knowledge, there are no other proposals in the literature of using virtualization technologies (in particular bare metal hypervisors) to improve performance of a single AS through the deployment of multiple AS instances in different VMs on the same physical server. Instead, a common practice is server consolidation through virtualization, which avoids running different and underutilized ASs in separate physical servers. Clearly, the benefit of using the parallel or the virtualized solutions has to be enough to balance at least the cost of an additional entity, the SIP proxy. Nevertheless, it can be easily implemented even on unpretentious hardware using a SER-based implementation [17]. It is worth noting that a SIP proxy acting as a load balancer is needed also with native configurations in large architectures. We show the performance of all considered architectures in Section IV, highlighting the improvements of "JVM", "JVM-taskset", and "VM" over the native configuration.

B. Test service implementation

The Mobicents community has published a set of performance achievements [8], obtained through an automatic answering service in which the MSLEE acts like a simple User Agent Server (UAS). It is one of the examples included in the MSLEE package. It utilizes only one SBB which responds to the incoming call, completes the SIP three-way handshake, and, after a short timer expiration, sends a BYE request to the caller User Agent Client (UAC) that has initiated the call.
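As a rough illustration of the "JVM-taskset" configuration of Section III.A, the sketch below splits the CPU cores into disjoint groups, one per JVM, and builds the corresponding `taskset -c` command lines. The even, contiguous split and the launcher script name `./start_mslee_instance.sh` are assumptions made for the example; the paper does not state how cores were grouped or how the instances were started.

```python
from typing import List


def partition_cores(num_cores: int, num_instances: int) -> List[List[int]]:
    """Split core ids 0..num_cores-1 into contiguous, disjoint groups.

    Groups differ in size by at most one core, so no core is shared
    between two instances (the property "JVM-taskset" relies on).
    """
    if not 1 <= num_instances <= num_cores:
        raise ValueError("need 1 <= num_instances <= num_cores")
    base, extra = divmod(num_cores, num_instances)
    groups, start = [], 0
    for i in range(num_instances):
        size = base + (1 if i < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def taskset_command(cores: List[int], launch_cmd: str) -> str:
    """Prefix a launch command with 'taskset -c <cpu-list>'."""
    cpu_list = ",".join(str(c) for c in cores)
    return f"taskset -c {cpu_list} {launch_cmd}"


if __name__ == "__main__":
    # 8 cores shared by 4 JVMs; the launcher script is hypothetical.
    for idx, cores in enumerate(partition_cores(8, 4), start=1):
        print(taskset_command(cores, f"./start_mslee_instance.sh {idx}"))
```

With 8 cores and 8 instances each group contains a single core, which corresponds to the one-core-per-JVM setting discussed in Section IV.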
Figure 1. Application server deployment configurations: (a) native, (b) parallel, (c) virtualized.
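Both the parallel and the virtualized configurations of Fig. 1 need a call dispatcher in front of the MSLEE instances, and Section IV.A states that it applies a weighted round robin policy with weights proportional to the CPU cores of each JVM/VM. The sketch below shows one common way to realize such a policy, the "smooth" weighted round robin variant. It is only an illustration: the actual algorithm inside the SIP proxy is not described in the paper, and the instance names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield instance names forever, proportionally to their weights.

    Smooth variant: at every step each instance gains its weight, the
    instance with the highest running score is chosen, and its score is
    reduced by the sum of all weights.
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be a non-empty map of positive integers")
    total = sum(weights.values())
    current = {name: 0 for name in weights}
    while True:
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        yield chosen


if __name__ == "__main__":
    # Hypothetical setup: one instance with 2 cores, two instances with 1 core.
    dispatcher = weighted_round_robin({"mslee-1": 2, "mslee-2": 1, "mslee-3": 1})
    picks = [next(dispatcher) for _ in range(8)]
    print(picks)
    print(Counter(picks))
```

Over any full cycle each instance receives a share of new calls equal to its weight divided by the sum of the weights, so equal weights reduce to plain round robin.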
We have implemented a more complex SIP-based VoIP service to suitably characterize the MSLEE performance in a realistic telecom scenario. It could model an online charging system for pre-paid VoIP calls, which periodically updates the user credit in the subscriber profile by accessing a remote database (see also [16]). However, given the widespread usage of the SIP protocol beyond IP telephony, it could also be the skeleton of non-VoIP services. The MSLEE manages the entire signalling by implementing a call control service through a SIP Back-to-Back User Agent (B2BUA) architecture [10], which easily allows the introduction of a third party call control mechanism [15]. The MSLEE acts both as UAS and UAC, by splitting each call in two SIP dialogs over two distinct call-legs (caller-MSLEE and MSLEE-callee, see Fig. 2). This service implements two timers. One of them (Call Duration Timer - CDT) is used to control the maximum call duration, while the other one (Periodic Database Query Timer - PDQT) triggers periodic updates to the subscriber profile database.

As regards the internal MSLEE service operation, when the initial INVITE is received by the MSLEE, its event routing subsystem is invoked and creates a Selector root SBB. This component queries the database to retrieve the subscriber profile, which includes the values of the CDT and PDQT timers. Then, upon receiving the answer, it activates a child SBB called CallControl, and leaves the signalling control to it. In turn, this child SBB creates the second call leg towards the callee and establishes the media session between the two end points, starting both the CDT and the PDQT timers and querying the database upon each PDQT timeout. The call ends when the CDT expires and the CallControl SBB sends a BYE message on both call legs. The message exchange is illustrated in Fig. 2. Alternatively, each of the two end points can send a BYE to the other to close the call before CDT expiration. The service performs an additional, final database query on call termination, in order to update the user profile in the database. In the implementation used for this experimental campaign, the call is always closed by the MSLEE upon CDT expiration.

Figure 2. Signalling flow for the test service.

IV. NUMERICAL RESULTS

In what follows we first present the test bed used in the measurement campaign, then the results achieved by executing the MSLEE in all configurations illustrated in Section III.A. Finally, we compare and discuss the results obtained with the three deployment schemes.

A. Test bed description

The Mobicents JSLEE v.1.2.6 GA, deployed on the JBoss v.4.2.3 GA, has been installed on a Fujitsu-Siemens server PRIMERGY TX300 S4 with dual Intel Xeon E5410 @ 2.33 GHz (8 CPU cores) and 16 GB RAM. The 64 bit OS is the Novell Suse Linux Enterprise server x64 v.10.1. For executing 32 bit experiments, we have used OpenSuse Linux 11.1. In the virtualized setting, we used the ESXi 4.1 hypervisor [12]. The JVM is the v.1.6.x. In the MSLEE configuration, we have used the JAIN SIP RA v.1.2, tuning the number of threads to allow a greater number of events to be processed simultaneously. All logs have been disabled during test execution to improve performance. In addition, we have used the following tools:

• SIPp traffic generator [11], installed on two PCs with Ubuntu Linux v.9.10, acts as UAC and UAS endpoints.

• MySQL database v.5.0.51a and MySQL Connector/J v.5.1.6 JDBC for database access, deployed on a PC with Arch Linux x64 (kernel 2.6.27). Each database interrogation relies on an object-relational mapping by using the Hibernate technology, provided by JBoss AS.

• The call dispatcher is an OpenSER [17] proxy running on a PC with Arch Linux x64 (kernel 2.6.27). It forks SIP traffic among different MSLEEs according to a weighted round robin policy in the virtualized/parallel settings. The weight is proportional to the number of CPU cores allocated to each JVM/VM.

As for the traffic statistics, the SIPp UAC has been set to generate new SIP calls with a constant rate, referred to as λ, lasting 60 minutes for each λ value. The value for the CDT timer (i.e. maximum call duration) has been set equal to 3 minutes. The value for the PDQT timer has been set equal to 10 seconds. This means that 18 queries are issued for each call, plus two additional queries at the call setup (INVITE received) and upon CDT timer expiration (call tear down). We have considered the following performance metrics:

• Maximum call throughput, defined as the rate of successfully established SIP calls in calls per second (cps), with at least 95% of successfully handled calls.

• Session Request Delay (SRD [18], see Fig. 2). SRD is measured at the UAC and defined as the time interval from the initial INVITE to the first non-100 provisional response. It represents the latency experienced by the UAC for setting up a call. Since our SIPp UAS is configured with no pause between the 180 RINGING and the 200 OK messages, we have triggered the measuring timer at the reception of the 200 OK.

B. Performance of Native configuration

The initial set of experiments has been done to find the best configuration for the MSLEE when it runs in an OS natively installed in the physical server. We recall that this configuration uses UDP as transport protocol and the Parallel GC, and JTA is disabled. Fig. 3 shows the maximum call throughput as a function of the amount of RAM allocated to the JVM for 64 bit and 32 bit OSs. In the 64 bit case, the Java heap ranges from 2 GB to 15 GB, whereas for the 32 bit OS the maximum value is 2.5 GB. First let us consider the maximum throughput achieved by using the 64 bit OS. When the amount of memory allocated to the JVM increases from 2 to 7 GB, the relevant throughput increases. The major increase is observed when the memory allocated ranges from 4 to 6 GB, and it is almost negligible from 6 to 7 GB. Increasing the memory allocation beyond 7 GB does not produce any throughput increase, but rather a slight decrease, especially for values larger than 8 GB. This is in line with what the Mobicents team has stated in [8], that is, any increase of the memory allocation causes the GC to be executed less frequently, but increases the garbage collection time. In fact, cleaning a larger memory lengthens the relevant service pauses, which may cause an avalanche restart and a consequent server overload. A different consideration is needed for the 32 bit OS. It is interesting to note that, even if the amount of memory allocated to the JVM is only 2 or 2.5 GB (the maximum allowed with a 32 bit JVM), its performance is comparable with a 64 bit system with double the amount of memory allocated to the JVM. This result is not surprising, since 32 bit systems are much more stable and can benefit from a long history of optimization. However, when the Java heap is increased, the improvement with a 64 bit OS is not negligible (about 10%), although it requires about three times the memory of a 32 bit system using the same CPU. The conclusion is that, for a 64 bit OS, the optimal amount of memory allocated to the Java heap is a number of GBs equal to or slightly lower than the number of CPU cores, which in our case is 8. In addition, since 32 bit OSs are much less greedy of memory, they are better candidates for parallel or virtualized deployments.

Figure 3. Maximum throughput vs. amount of memory allocated to JVM.

Fig. 4 shows the average call setup signaling latency (i.e. SRD) as a function of the offered load. It can be seen that SRD increases almost linearly with the offered load up to the maximum throughput. Beyond this value, the server enters the overload status. In this condition (not shown to improve figure neatness), the latency needed to establish the few completed calls exhibits a steep increase, since the Java heap needs to be continuously cleaned and the MSLEE has to process the queued messages, thus causing timeouts and retransmissions, which further increase the server load and slow down the overall call processing.

Figure 4. SRD vs. offered load for different sizes of JVM.

The second comment is that, up to 50 cps, one of the best performing configurations is the 32 bit OS with 2.5 GB allocated to the Java heap. For a workload of 55 cps, which is the maximum call throughput for this configuration, the SRD has a sharp increase, doubling the value achieved at 50 cps. Finally, a further significant comment is that, in 64 bit configurations, the average SRD improves with the amount of memory allocated to the Java heap. This is quite evident for 4, 5 and 6 GB configurations. Performance of the 64 bit OS with 7 and 8 GB is almost equivalent, and the 7 GB configuration slightly outperforms the 8 GB one only at 60 cps (at this value the 8 GB setup reaches its maximum throughput). Any further increase of the Java heap does not cause significant performance changes, although this performance is slightly better than in all the other 64 bit configurations. This behavior can also be explained by resorting to the GC operation: the larger the amount of memory allocated to the JVM, the fewer the pauses due to the GC collections, and the longer their duration. Thus, the backlogged SIP messages may produce an avalanche restart causing a server collapse. This is the reason why the server does not achieve the best performance in terms of call throughput with 15 GB of memory allocated to the JVM.
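The database load implied by the timer settings of Section IV.A can be checked with some simple arithmetic. The sketch below is a back-of-the-envelope estimate, not part of the measurement tooling: it assumes a steady state in which every call is successful and lasts exactly CDT seconds, and it counts a PDQT timeout that coincides with CDT expiry as a periodic query, which reproduces the 18 periodic queries stated in the text. The function names are introduced here for illustration only.

```python
def queries_per_call(cdt_s: int, pdqt_s: int) -> int:
    """Database queries issued during one call.

    Periodic queries fire at every PDQT timeout up to and including CDT
    expiry; two more are issued at call setup and at call tear down.
    """
    if cdt_s <= 0 or pdqt_s <= 0:
        raise ValueError("timers must be positive")
    periodic = cdt_s // pdqt_s
    return periodic + 2


def concurrent_calls(rate_cps: float, cdt_s: int) -> float:
    """Steady-state number of active calls (Little's law: N = rate * duration)."""
    return rate_cps * cdt_s


def db_query_rate(rate_cps: float, cdt_s: int, pdqt_s: int) -> float:
    """Steady-state database queries per second generated by the service."""
    return rate_cps * queries_per_call(cdt_s, pdqt_s)


if __name__ == "__main__":
    cdt, pdqt = 180, 10      # 3 minutes and 10 seconds, as in Section IV.A
    rate = 61                # native 64 bit maximum throughput, in cps
    print(queries_per_call(cdt, pdqt))     # queries per call
    print(concurrent_calls(rate, cdt))     # calls active at the same time
    print(db_query_rate(rate, cdt, pdqt))  # database queries per second
```

Under these assumptions each call issues 20 queries, so a load of 61 cps keeps 10980 calls active and generates 1220 database queries per second.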
Finally, we have verified that the finding of [2] still holds even for a complex AS like Mobicents. To this end, we have set an increasing number of CPU cores to be used by the JVM in native configuration through the system command "taskset". The results achieved are shown in Fig. 5, where the maximum call throughput is plotted versus the number of used CPU cores, for both 32 and 64 bit OSs. We first analyze the 32 bit OS behavior. As expected, the maximum call throughput scales with CPU cores only up to 2 cores, then it slightly increases for 3 and 4 cores, and remains stable up to 8 cores. As for the 64 bit version, it exhibits a trend which is nearly linear up to 3 cores, then the slope decreases. However, passing from 1 to 2 cores the relevant throughput does not double, and the same happens with 3 cores. In conclusion, the 32 bit OS is the best candidate for parallelization/virtualization purposes, since, even with much less memory allocated to the Java heap, with the same number of CPU cores (up to 4), it outperforms the 64 bit version. By using the MSLEE on a 64 bit OS, performance still improves with a number of CPU cores beyond 4, but very slowly. Thus it exhibits poor scalability properties along with a high memory requirement.

Figure 5. Call throughput vs. number of CPU cores used by MSLEE.

C. Performance of Parallel/Virtualized configurations

In this sub-section, we analyze the results relevant to "JVM", "JVM-taskset", and "VM" configurations.

Fig. 6 shows the maximum call throughput values achieved by the three schemes as a function of the number of MSLEE instances. The first comment is that the intuition at the basis of this paper is correct: since Java-based SIP ASs scale badly with CPU cores, the best option to improve the exploitation of hardware resources is to use several ASs in parallel. Basically, all schemes outperform the native configuration, in both 32 bit (55 cps) and 64 bit (61 cps) versions. Up to 3 MSLEE instances, the configuration that shows the best results is the "VM" one, even if the difference with "JVM-taskset" is really small. We ascribe this result to the better insulation capabilities of the "VM" approach. From 2 up to 8 MSLEE instances, the "JVM" approach has very little improvement, since it passes from 80 cps (2 MSLEE instances) to 90 cps (4 and 8 MSLEE instances). Beyond 3 MSLEE instances, the best configuration is definitely the "JVM-taskset". This result is reasonable, since it causes less overhead than the "VM" one, and provides better CPU resource insulation with respect to "JVM". In particular, any increase of the number of VMs beyond 3 seems to cause an excessive overhead; thus, differently from "JVM" and "JVM-taskset", performance starts decreasing. The best result in terms of maximum throughput is reached by deploying 8 MSLEE instances, each using a single CPU core with 1.8 GB of memory allocated to the Java heap (all other configurations have 2.5 GB allocated to each JVM). In this condition, the maximum throughput reaches 125 cps.

Figure 6. Maximum call throughput vs. number of MSLEE instances.

On the other hand, considering only the maximum call throughput as performance metric is not suitable, since the SRD is also important. In fact, the delay in establishing a session cannot be arbitrarily high, especially if during an ongoing call it is necessary to redirect the call towards another participant (e.g. a media server or a person). Fig. 7 shows the average and 95th percentile of the SRD as a function of the number of MSLEE instances, evaluated in the maximum throughput condition. We recall that the 64 bit version (native configuration) in the maximum throughput condition (61 cps) achieves an average SRD equal to about 500 ms and a 95th percentile of the SRD equal to about 2 s. It is evident that most configurations exhibit setup delays that are not acceptable in real settings. To this end, the reader should bear in mind that the measured delay is relevant only to the component of SRD
accumulated inside the data center where the ASs run, since the UAC and UAS used in our tests represent inbound and outbound SIP proxies or PSTN gateways in the data center. Thus, in order to obtain the actual SRD values perceived by users, we have to add the delay contributions "caller-AS" and "AS-callee", which could be two segments of wide area networks and thus exhibit significant delays. To this end, Fig. 8 shows the maximum throughput for the three considered approaches as a function of the number of MSLEE instances, with the constraint of limiting the 95th percentile of the SRD to 200 and 500 ms. The results are very interesting. First of all, we observe that striving for maximum throughput may cause unacceptable SRD values. In fact, the configuration showing the best performance in Fig. 6 (8 MSLEE instances in the "JVM-taskset" setting) is now the worst. This happens because the use of a single core for executing GC causes excessive delays, which is not suitable for real time systems. Instead, the best configuration consists of using 4 MSLEE instances in the "JVM-taskset" configuration (110 cps with the 500 ms constraint and 105 cps with the 200 ms constraint). This is convincing, since a 32 bit system with 2 CPU cores scales very well (see Fig. 7). A further interesting result is that the "VM" approach can support up to 95 cps for both constraints. This means that, when the reasonable constraint on SRD is added, the performance in terms of maximum call throughput of the "JVM-taskset" and "VM" approaches gets closer. In fact, the performance loss of the "VM" approach with respect to the "JVM-taskset" one is 13.5% if the 95th percentile of SRD is bounded to 500 ms, and 9.5% with a constraint of 200 ms. In addition, the reader should bear in mind that running an AS [...] without server crash up to [...] our conclusion is that the preferable deployment is "VM" with 3 MSLEE instances.

V. RELATED WORK

A. JVM performance

Papers [24][25] analyze the JVM scalability on multi-core systems. They analyze the performance of benchmarks using various numbers of processor cores and application threads. They correlate low-level hardware performance to the number of JVM threads and system components, in order to observe potential bottlenecks. Lock contentions and memory stall cycles, produced by insufficient L2 cache and cache-to-cache transfers, are the main observed bottlenecks. The JVM includes a parameter called thread-local allocation buffer (TLAB) that limits these issues [24]. A further performance optimization [24] is achievable by using the Parallel GC configuration and a suitable ratio of new and old generation heap memory to reduce the overhead of minor memory collection. It can be set by using the NewRatio flag. The TLAB and Parallel GC options have already been used in our experiments, whereas NewRatio will be included in future work. However, from first tests, it seems that in our case its use mainly improves the SRD and does not have a large impact on maximum throughput.

B. OpenSER SIP Server Performance

OpenSER [17] is a modular SIP proxy server, call SIP router, and SIP registrar server written in the C language. It is widely adopted in VoIP environments and focuses on proxying and [...] and localization. Even if OpenSER can also benefit from our [...] internal architecture is not proxy-centric.

C. Web Server Performance

The performance scalability problem in multi-core environments is also relevant for other types of ASs, e.g. web servers.