The impact of multi-core processor on web server performance

Frane Urem and Želimir Mikulić
Department of Management, College of Šibenik
Trg A. Hebranga 11, 22000 Šibenik, Croatia
Phone: (+385) 22-311060; Fax: (+385) 22-216034
E-mail: [email protected]; [email protected]

Abstract: Single-processor systems are history; the world of multi-core and many-core systems is here. System performance scales with the number of cores only if system software and applications are designed to fully exploit the parallelism built into multi-core platforms. In this paper we try to answer the question of whether the web server is a good platform for exploiting the increased raw compute power that comes with added cores. We use a well-known queuing theory result to compute the average response time of a request at a web server and compare this simple performance model with performance scaling results from tests performed on today's real desktop systems.

I. INTRODUCTION

The key to web server performance is the capability to use all available hardware resources on a single system. Clients communicate with servers through many independent flows (or connections). If the web server's application processing and the associated network protocol processing of a flow are done exclusively on a single core, we expect minimal data sharing and synchronization between flows. That is why we expect web server and web application software to use flow-level parallelism to increase throughput with the number of CPU cores. The typical stack of layers present on every web server is shown in Figure 1.

Figure 1. Web server stack of layers (bottom to top): Hardware, Operating system, Virtual machine (JVM or .NET), Application server, Application.

Since flows respond to mutually independent client requests, they should scale easily with the number of cores by exploiting flow-level parallelism. To test this, we set up a test server running a well-tuned IIS 7.0 HTTP server on the Windows Server 2008 operating system. We tested the server with two and four cores, with pairs of cores sharing an L2 cache. The Windows Server 2008 kernel supports a parallelized network stack, and the IIS 7.0 web server is multi-threaded with one thread per connection.

II. A SIMPLE PERFORMANCE MODEL

We can use a well-known queuing theory result to compute the average response time of a request at a web server [1]. Assuming that requests arrive at the web server from a Poisson process, that a request's processing time at a server has a general distribution, and that a perfect load-balancer distributes the load equally among all cores in the CPU, we can use the M/G/1 queue (a queue with Poisson arrivals, arbitrarily distributed service times, and a single server) to compute the average response time. Figure 2 shows a web server with n identical CPU cores and a load-balancer that distributes the total incoming traffic of λ requests per second equally among all cores. Each of the n cores in the CPU has a processing capacity of X requests per second, so the web server's total capacity is nX. CPU utilization is computed as:

ρ = λw E[Ts] = λ / (nX)    (1)

The average queuing time Tq is computed as the sum of the average wait time Tw and the average service time Ts:

Tq = Tw + Ts    (2)

For the M/G/1 system, the average wait time Tw is computed as:

Tw = ρ Ts (1 + Cs²) / (2(1 − ρ))    (3)

Using (2) and (3), we can compute the average queuing time as:


Tq = Ts + ρ Ts (1 + Cs²) / (2(1 − ρ))    (4)
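The M/G/1 result in Equation 4 can be checked numerically. Below is a minimal Python sketch (the function name and its arguments are our own, not from the paper) that computes the average queuing time Tq from the average service time Ts, the utilization ρ, and the coefficient of variation Cs:

```python
def mg1_queuing_time(ts: float, rho: float, cs: float) -> float:
    """Average queuing time, Equation 4: Tq = Ts + rho*Ts*(1 + Cs^2) / (2*(1 - rho)).

    ts  -- average service time Ts [s]
    rho -- utilization (must satisfy rho < 1 for a stable queue)
    cs  -- coefficient of variation of the service time
    """
    if not 0 <= rho < 1:
        raise ValueError("queue is unstable unless 0 <= rho < 1")
    return ts + rho * ts * (1 + cs ** 2) / (2 * (1 - rho))


# With zero load (rho = 0) the queuing time is just the service time.
print(mg1_queuing_time(ts=0.1, rho=0.0, cs=0.0))  # 0.1
```

For Cs = 1 (exponential service) this reduces to the familiar M/M/1 result Tq = Ts / (1 − ρ).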

Equation 11 can be used to compute the theoretical maximum throughput of the complete CPU composed of n cores, each with processing capacity X:

λmax = nX    (12)

[Figure 2: load-balancer diagram. The total incoming traffic λ enters the load balancer, which forwards λ/n requests per second to each of the n cores, each with processing capacity X.]

If we insert Equations 6 and 7 into Equation 4, we obtain:

Tq = 1/X + (λ/(nX)) (1 + Cs²) / (2X (1 − λ/(nX)))    (13)

Figure 2. Multi-core web server architecture design includes n CPU cores, each with X requests/sec of capacity.

The symbols are:
λ – total traffic [requests/s]
X – processing capacity of one core [requests/s]
n – number of CPU cores
Cs – coefficient of variation of the service time, the ratio between the service time's standard deviation σTs and the average service time Ts:

Cs = σTs / Ts    (5)

If the processing capacity of one core is denoted X, then the average service time is:

Ts = 1/X    (6)

Little's law [4] lets us compute the average number of requests in the system as:

Lq = λ Tq    (14)

We can interpret Equation 13 in many ways to analyze the influence of system parameters on the system's response time. For example, we can analyze how the number of CPU cores or Cs (the coefficient of variation of the service time) influences the CPU's response time. Consider a system with two CPU cores (n = 2), Cs = 5 (a high value typical of web servers), and a per-core processing capacity X of 10 requests/sec. Plugging these values into Equation 13 for different values of total arrival traffic λ yields the response times shown in Figure 3. Figure 4 shows the variation in average response time as a function of utilization for the same system. The canonical performance characteristic occurs in all benchmark measurements (Figure 6); it lies below the theoretical throughput characteristic, with a ceiling that is controlled by the bottleneck resource in the system and can be computed from Equation 12.
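The worked example above can be reproduced with a short Python sketch (variable names are ours; the numbers n = 2, Cs = 5, and X = 10 requests/s come from the text). It evaluates Equation 13 for several arrival rates λ below the maximum throughput λmax = nX of Equation 12, and also computes the average number of requests in the system from Little's law (Equation 14):

```python
def response_time(lam: float, n: int, x: float, cs: float) -> float:
    """Equation 13: Tq = 1/X + (lam/(n*X)) * (1 + Cs^2) / (2*X*(1 - lam/(n*X)))."""
    rho = lam / (n * x)          # per-core utilization, Equation 1
    if rho >= 1:
        raise ValueError("unstable: lam must stay below n*X (Equation 12)")
    return 1 / x + rho * (1 + cs ** 2) / (2 * x * (1 - rho))


n, x, cs = 2, 10.0, 5.0          # two cores, 10 requests/s each, Cs = 5
lam_max = n * x                  # Equation 12: theoretical maximum throughput
for lam in (5.0, 10.0, 15.0, 19.0):
    tq = response_time(lam, n, x, cs)
    lq = lam * tq                # Little's law, Equation 14
    print(f"lam={lam:5.1f}  Tq={tq:7.3f} s  Lq={lq:8.2f}")
```

As λ approaches λmax = 20 requests/s, the utilization ρ approaches 1 and the response time grows without bound, which matches the knee visible in the theoretical curves of Figures 3 and 4.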

The arrival rate of requests for one core is:

λw = λ / n    (7)

The basic condition for stable service is ρ < 1 (equivalently, λ < nX).
