A Latency-Aware Algorithm for Dynamic Service Placement in Large-Scale Overlays

Jeroen Famaey, Wouter De Cock, Tim Wauters, Filip De Turck, Bart Dhoedt, and Piet Demeester
Department of Information Technology (INTEC), Ghent University - IBBT, Gaston Crommenlaan 8, B-9050 Gent, Belgium
Abstract—A generic and self-managing service hosting infrastructure provides a means to offer a large variety of services to users across the Internet. Such an infrastructure provides mechanisms to automatically allocate resources to services, discover the location of these services, and route client requests to a suitable service instance. In this paper, we propose a dynamic and latency-aware algorithm for assigning resources to services. Additionally, the proposed service hosting architecture and its protocols to support the service placement algorithm are described in detail. Extensive simulations were performed to compare our latency-aware algorithm to its latency-unaware variant in terms of system efficiency and scalability.

Index Terms—self-management, service placement, resource allocation, peer-to-peer, overlay networks
I. INTRODUCTION

In recent years, the exponential growth of the Internet has allowed an ever-increasing number of service providers to offer their services to more and more users. Currently, most of these services are offered on a private and dedicated infrastructure. However, we believe that offering many services on a shared and open infrastructure would provide advantages to service providers and users alike. For example, the infrastructure's resources could be dynamically reallocated between services, depending on their current load and popularity. Additionally, new services could easily be deployed on the existing infrastructure. For users, the infrastructure could automatically select the best service, depending on a number of metrics, such as network latency, available bandwidth or server load.

An important aspect of such a generic service hosting infrastructure is scalability in terms of users and services. This goal can be achieved by allowing each server in the infrastructure to manage itself and cooperate with other servers in a distributed, peer-to-peer (P2P) like fashion. Consequently, distributed mechanisms are needed to automatically construct the peer-to-peer overlay topology, place services on servers, discover service instances, and route client requests.

In this paper, we propose a new algorithm for allocating server resources to different services. This new algorithm is based on a service placement algorithm proposed by Adam et al. in [1], [2].

Jeroen Famaey is funded by the Institute for the Promotion of Innovation by Science and Technology Flanders (IWT-Vlaanderen). Tim Wauters is funded by the Fund for Scientific Research Flanders (FWO-Vlaanderen).
In contrast to the original algorithm, our algorithm takes into account the latency to the nearest instance of a service when deciding which services to place and how many resources to allocate to them. In their original design, the authors assumed servers were clustered together in data centers. In such a case, network latency between servers in the same cluster is negligible. We, however, assume that servers are spread across the Internet. In this case, underlying network characteristics, such as latency, do become important. Additionally, some changes to Adam's architecture, which are needed to support our new deployment scenario and algorithm, will be discussed.

Fig. 1 shows the proposed deployment scenario. The clients and servers communicate with each other via the Internet. Additionally, the servers are connected via a peer-to-peer overlay topology. All management and control messages are sent via this overlay network. Each server is responsible both for taking part in management tasks and for running a subset of the available services. The figure also shows how messages are routed between servers and clients, and among servers. Clients communicate directly with servers via the IP routing path on the Internet. Servers, on the other hand, communicate via the overlay network. An overlay hop between two servers corresponds to the IP routing path in the underlying network that connects these servers.

The rest of this paper is structured as follows. Related work concerning service hosting infrastructures and service placement algorithms is discussed in Section II. The designed service hosting architecture, with emphasis on the changes we made to support our new algorithm, is given in Section III. Subsequently, Adam's original algorithm and our latency-aware algorithm are described in Section IV. The algorithms are compared and evaluated in Section V. Finally, conclusions are drawn in Section VI.

II. RELATED WORK

Scalable service hosting infrastructures, based on peer-to-peer principles, have been studied in some detail [2], [3]. In the design of these infrastructures, it is assumed that all servers are clustered together in one or more data centers. In that case, the effect of the underlying physical network topology on performance is negligible, as dedicated high-bandwidth and low-latency links can be used inside the data centers.
Figure 1. The proposed deployment scenario: servers and clients are spread across the Internet; the dotted lines show the peer-to-peer overlay topology which connects the servers; the arrows denote how messages are routed between nodes
In contrast, we assume servers are spread across the Internet and connected via shared, possibly saturated, links. In this case, the available bandwidth and network latency do have a significant effect on performance.

Several algorithms have been proposed for autonomic resource allocation in generic service hosting infrastructures, in this context also known as service placement. In [4], we proposed several centralized, latency-aware algorithms that solve the service placement problem. Because these algorithms require information on the entire network topology, their scalability is limited. Another centralized service placement algorithm was proposed by Karve et al. in [5]. This algorithm maximizes satisfied demand and minimizes the number of placement changes. In [1], Adam et al. propose a decentralized variant of Karve's centralized algorithm. In contrast to our newly proposed algorithm, neither of these two algorithms takes into account latency or other characteristics of the underlying physical network.

III. SERVICE HOSTING ARCHITECTURE

In this section, the architecture of the service hosting infrastructure is discussed. The original architecture, as designed by Adam et al., is described briefly. A more detailed description is given of the changes we made. For a more in-depth overview of the original architecture, the reader is referred to [1], [2]. A schematic overview of the architecture is shown in Fig. 2. The different components, except service placement, are discussed in the rest of this section. Service placement is discussed in more detail in Section IV.

A. Overlay Topology Management

This component consists of two topology construction protocols, CYCLON and GoCast, which allow each server to maintain a list of overlay neighbors. Additionally, we have added an extra component not present in the original architecture.
Figure 2. The designed architecture: all components run on each server in a distributed fashion; service management components, such as service discovery and service placement, use the underlying overlay topology to exchange information with their neighbors
This dynamic node (de)activation component allows servers to independently activate or deactivate themselves at times of high or low load, respectively. Finally, the overlay topology management component contains extra components for estimating the latency to other servers and for measuring resources such as CPU, memory and bandwidth.

1) CYCLON: This is an epidemic protocol which maintains on every server a cache of predefined size C of active overlay servers [6]. Every server periodically initiates a shuffling protocol: it selects the server with the oldest timestamp in its cache and sends it the contents of its cache. That server then sends a reply with a copy of its own cache. Subsequently, both servers merge the two caches, keeping only the C newest entries.

2) GoCast: The actual overlay neighbors are selected using the GoCast protocol [7]. The protocol maintains two separate lists of neighbors: nearby neighbors and random neighbors. The sizes of these lists, respectively $G_n$ and $G_r$, are configurable. It has been shown that over time the list sizes on all servers converge to $G_n \pm 1$ and $G_r \pm 1$. Periodically, the GoCast protocol optimizes its neighbor lists by adding, removing or replacing neighbors. New neighbors, if needed, are chosen from the CYCLON cache. The server always attempts to use the servers with the lowest known latency as nearby neighbors.
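To make the shuffle step concrete, the following Python sketch illustrates one CYCLON shuffle as described above. It is only a sketch under our reading of the protocol; the `CacheEntry` class and the `send`/`receive` callbacks are hypothetical stand-ins for the cache representation and the overlay messaging layer, which the paper does not prescribe.

```python
import time

CACHE_SIZE = 10  # the CYCLON cache size C (set to 10 in Section V)

class CacheEntry:
    """One CYCLON cache entry describing a known active overlay server."""
    def __init__(self, address, timestamp):
        self.address = address
        self.timestamp = timestamp  # creation time of the entry

def shuffle_step(own_address, cache, send, receive):
    """One periodic CYCLON shuffle for the server at `own_address`."""
    if not cache:
        return cache
    # 1. Select the cached server with the oldest timestamp.
    oldest = min(cache, key=lambda e: e.timestamp)
    # 2. Send it our cache contents, including a fresh entry for ourselves.
    send(oldest.address, cache + [CacheEntry(own_address, time.time())])
    # 3. The selected server replies with a copy of its own cache.
    peer_cache = receive(oldest.address)
    # 4. Merge both caches and keep only the C newest entries.
    merged = {}
    for entry in cache + peer_cache:
        if entry.address == own_address:
            continue  # never cache ourselves
        known = merged.get(entry.address)
        if known is None or entry.timestamp > known.timestamp:
            merged[entry.address] = entry
    newest = sorted(merged.values(), key=lambda e: e.timestamp, reverse=True)
    return newest[:CACHE_SIZE]
```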
3) Latency Estimation: The GoCast protocol uses the nearest servers (in terms of latency) in the CYCLON cache as nearby neighbors. Therefore, a latency estimate is needed for all servers in the CYCLON cache. Many methods have been proposed to perform such estimations, such as GNP [8], IDMaps [9] and Vivaldi [10]. In this paper, we propose the use of Vivaldi, as it achieves high accuracy without the need for landmark servers, while causing only minimal overhead.

The Vivaldi protocol is incorporated into our architecture as follows. Whenever GoCast or CYCLON messages are exchanged, a timestamp and the sender's current Vivaldi coordinates are added. The receiver can use these to improve its own set of coordinates. This allows nodes to keep their coordinates up to date without much additional network traffic. Additionally, the CYCLON cache entries are extended to also contain the coordinates of the server in the entry. Whenever a server sends a CYCLON shuffle message, it adds a cache entry for itself as usual, but now it also adds its current Vivaldi coordinates. That way, every server can maintain the Vivaldi coordinates of all servers in its CYCLON cache, and can use these to estimate the latency to those servers.

4) Dynamic Node (De)Activation: This component allows servers to dynamically (de)activate themselves, depending on the current server and network load. It uses the resource monitor component to measure available server and network resources. This allows idle servers to be temporarily used for other purposes, or to be shut down to save energy. This component is not part of the original architecture, but was first proposed and described in our previous work [11].

B. Service Management

The service management component is responsible for managing and maintaining services. This consists of three subtasks: service placement, service discovery and request routing. The request routing component operates on top of service discovery, and uses the discovery information to route a request to a server capable of processing it. The discussion of service placement is postponed until Section IV.

1) Service Discovery: The goal of this component is to find out which services are running on which servers. For this purpose, each server maintains a forwarding table. This table contains, for each known service, a list of servers running this service. The size of this list is configurable, and it was shown by Adam et al. [1] that a size of 4 provides a good trade-off between solution quality and generated network load.

To disseminate placement information and fill the forwarding tables, Adam devised a scalable mechanism called selective update propagation. At the start of each placement cycle, which begins after running the service placement algorithm, the server broadcasts the list of applications it will run during the next cycle to its GoCast neighbors. Whenever a server receives such an update message, it adds the originator to the forwarding table for all the services in the list. If any of the server lists in the forwarding table has become too large, the server with the highest latency is removed from it. To make the mechanism more scalable, each server only forwards the entries in the update message which caused a change to its forwarding table. If the update message caused no changes, the server does not forward the message to its GoCast neighbors.
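As an illustration, the Python sketch below shows how a receiving server might apply such an update to its forwarding table and decide which entries to forward on. It is a sketch under the description above, not the authors' implementation; the function and parameter names are hypothetical, and the per-service limit of 4 entries follows the evaluation in [1].

```python
MAX_ENTRIES_PER_SERVICE = 4  # forwarding-table list size per service, as in [1]

def handle_update(forwarding_table, latencies, originator, services, forward):
    """Process a placement update announced by `originator` for `services`.

    forwarding_table: dict mapping a service to a list of servers running it.
    latencies: dict mapping a server to its estimated latency (Vivaldi or ping).
    forward: callback that relays the reduced update to the GoCast neighbors.
    """
    changed = []
    for service in services:
        entry = forwarding_table.setdefault(service, [])
        if originator in entry:
            continue  # already known, nothing changes
        entry.append(originator)
        # If the list grew too large, drop the server with the highest latency.
        if len(entry) > MAX_ENTRIES_PER_SERVICE:
            worst = max(entry, key=lambda s: latencies.get(s, float("inf")))
            entry.remove(worst)
        if originator in entry:  # the new entry survived the trim
            changed.append(service)
    # Forward only the entries that changed this table; stay silent otherwise.
    if changed:
        forward(originator, changed)
```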
As stated above, the service discovery mechanism requires latency information to decide which forwarding table entries to keep. Additionally, our new service placement algorithm needs this information as well. We therefore altered the service discovery protocol to disseminate latency information. When sending an update message, the originator adds a timestamp to it. Whenever a server receives this update message, it checks whether it already knows the latency to the originator of the message. If it does, no other action is required; otherwise, the timestamp is used to obtain a temporary estimate of the network latency and a ping message is sent to the originator to obtain a more exact measurement. Note that no ping message needs to be sent if the update message caused no changes to the forwarding table.

2) Request Routing: The request routing component operates on top of service discovery, making use of the information in the forwarding table. As the system is fully distributed and decentralized, a client may send its service requests to any overlay server. Whenever a server receives a request, it will determine the service type and attempt the following, in order:
1) If it is running the service and has enough free CPU resources, it will start processing the request.
2) If the server cannot handle the request and the list in its forwarding table for the service is not empty, it will forward the request to a random server in this list.
3) Otherwise, it will forward the request to a random GoCast neighbor.
Once a request has been forwarded a predefined number of times and still no server was found to process it, it will be dropped. Once a request has been processed, the processing server will send a reply directly to the client.
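A compact sketch of this routing logic follows. The `server` object and its attributes (running services, free CPU, forwarding table, GoCast neighbors) are hypothetical stand-ins for the state described above, and the hop limit is an assumed value, since the paper only states that it is predefined.

```python
import random

MAX_FORWARDS = 8  # hypothetical hop limit; the paper only says "predefined"

def route_request(server, request):
    """Handle an incoming client request on one overlay server."""
    service = request.service
    # 1) Process locally if the service runs here and enough CPU is free.
    if service in server.running and server.free_cpu() >= request.cpu_demand:
        server.process(request)
        return
    # Drop the request once it has been forwarded too many times.
    if request.hops >= MAX_FORWARDS:
        return
    request.hops += 1
    # 2) Forward to a random server known to run the service, if any.
    candidates = server.forwarding_table.get(service, [])
    if candidates:
        server.forward(request, random.choice(candidates))
        return
    # 3) Otherwise, forward to a random GoCast neighbor.
    server.forward(request, random.choice(server.gocast_neighbors))
```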
IV. SERVICE PLACEMENT ALGORITHMS

A. Problem Description

Given a set of servers and services, the goal of service placement is to decide which services will be run on which servers, without exceeding the available server resources. In order to simplify the problem description, it is assumed that all servers use the same processor architecture, and services thus consume the same amount of CPU resources on all servers. In the rest of this section, a more formal description of the service placement problem is given.

Given is a set of servers $N$ and a set of services $A$. Each server $n \in N$ has a memory and CPU capacity, respectively $\Gamma_n$ and $\Omega_n$. Each service $a \in A$ has a memory requirement $\gamma_a$ per instance, a CPU requirement $\omega_a$ per request, an execution time $\theta_a$ per request, and a priority $\pi_a$. Each server $n \in N$ executes a subset $R_n \subseteq A$ of all available services. Additionally, every server $n \in N$ has a set of GoCast neighbors $G_n = G_n^n \cup G_n^r$, which is equal to the union of its nearby and random GoCast neighbors. Let, for service $a \in A$ and server $n \in N$, $\rho^{processing}_{n,a,t}$ be the number of requests that are being processed at any given time $t$, $\rho^{satisfied}_{n,a,t}$ the number of requests that have started processing exactly at time $t$, and $\rho^{total}_{n,a,t}$ the total number of requests that arrived exactly at time $t$. The resource constraints can then be stated as follows:

$$\forall n \in N: \quad \Gamma_n \geq \sum_{a \in R_n} \gamma_a \qquad (1)$$

$$\forall n \in N, \forall t: \quad \Omega_n \geq \sum_{a \in R_n} \omega_a \times \rho^{processing}_{n,a,t} \qquad (2)$$
Eq. 1 stipulates that the total memory consumed by all services on a server cannot exceed the total memory available on that server. Additionally, the equation shows that memory is assumed to be load-independent: a service instance consumes the same amount of memory no matter how many requests it is processing. Eq. 2 defines the same constraint for CPU and shows that we assume CPU to be load-dependent. Memory is treated as load-independent because many applications consume a considerable amount of memory even when they are not processing any requests. Additionally, because of caching, current memory usage may depend on past instead of current request load. Consequently, we chose to use a load-independent upper bound for memory.

The goal of service placement is to maximize the satisfied demand over the entire network, while adhering to the resource constraints. The satisfied demand is equal to the total number of requests that have been processed or have started processing. This goal can be formally defined as

$$\max \sum_{n \in N} \sum_{a \in A} \sum_{t} \rho^{satisfied}_{n,a,t} \qquad (3)$$
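To make the formulation tangible, the sketch below checks the resource constraints of Eqs. 1 and 2 for a candidate placement and tallies the objective of Eq. 3. The data structures are hypothetical and simply mirror the notation above.

```python
def memory_feasible(placement, gamma, mem_capacity):
    """Eq. 1: the summed memory footprints on each server must fit Γ_n.

    placement: dict server -> set of services R_n; gamma: dict service -> γ_a.
    """
    return all(sum(gamma[a] for a in services) <= mem_capacity[n]
               for n, services in placement.items())

def cpu_feasible_at(placement, omega, cpu_capacity, processing):
    """Eq. 2 at one time instant: CPU used by requests in progress must fit Ω_n.

    processing: dict (server, service) -> number of requests being processed.
    """
    return all(sum(omega[a] * processing.get((n, a), 0) for a in services)
               <= cpu_capacity[n]
               for n, services in placement.items())

def satisfied_demand(satisfied):
    """Eq. 3: total requests that started processing, summed over servers,
    services and time instants. satisfied: dict (n, a, t) -> request count."""
    return sum(satisfied.values())
```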
B. Adam's Algorithm

We start with an overview of Adam's original algorithm, as an understanding of it is needed to follow our proposed latency-aware algorithm. For a more detailed description, the reader is again referred to [1], [2].

Every server uses the algorithm independently to decide which services it will run. It is executed periodically to allow the server to adapt to changes in client demand. As input, the algorithm receives a list of currently active services and a list of all known services. Additionally, it knows the total and satisfied number of requests the server and its GoCast neighbors received since the start of the last placement cycle, and the latency to the nearest other server running each service. Using this information, the algorithm attempts to improve the overall satisfied demand by replacing services in its active list with services from its inactive list. In the rest of this section, a more formal description of the algorithm is given, based on the pseudo-code shown in Algorithm 1.

The algorithm starts by calculating the list of inactive services (line 1). Subsequently, the lists of active and inactive services, respectively $R_n$ and $S_n$, are sorted. The services in $R_n$ are sorted in increasing order of utility, which is, for an active service $a$, defined as the CPU resources delivered to it, multiplied by its priority, i.e.

$$\pi_a \times \omega_a \times \theta_a \times \sum_{t \in T_n} \rho^{satisfied}_{n,a,t} \qquad (4)$$
With $T_n$ the time interval since the last execution of the placement algorithm on server $n$. The inactive services $S_n$ are sorted in decreasing order of utility, which is defined as the utility the server would add if it were to run service $a$, i.e.

$$\pi_a \times \omega_a \times \theta_a \times \sum_{g \in G_n \cup \{n\}} \sum_{t \in T_g} \left( \rho^{total}_{g,a,t} - \rho^{satisfied}_{g,a,t} \right) \qquad (5)$$
After sorting the lists, the memory used by the currently active services and the utility they provide is calculated (lines 3-4).
Algorithm 1 Pseudo-code for the service placement algorithm

procedure n.getPlacement()
 1: S_n ← A \ R_n
 2: sort(R_n); sort(S_n)
 3: Γ_n^used ← getUsedMemory(R_n)
 4: µ^cur ← getUtility(R_n)
 5: µ^best ← µ^cur
 6: for i = 0 to |R_n| do
 7:   R_n^new ← {}
 8:   if i ≥ 1 then
 9:     Γ_n^used ← Γ_n^used − getUsedMemory(R_n[i−1])
10:     µ^cur ← µ^cur − getUtility(R_n[i−1])
11:   end if
12:   for all s ∈ S_n do
13:     if Γ_n^used + getUsedMemory(R_n^new) + γ_s ≤ Γ_n then
14:       R_n^new ← R_n^new ∪ {s}
15:     end if
16:   end for
17:   if µ^cur + getUtility(R_n^new) > µ^best then
18:     µ^best ← µ^cur + getUtility(R_n^new)
19:     R_n^best ← R_n^new ∪ R_n[i..|R_n|−1]
20:   end if
21: end for
22: R_n ← R_n^best
The algorithm then runs through $|R_n| + 1$ iterations. During iteration $i$, it replaces the first $i$ services in $R_n$ with as many services as possible from $S_n$. During the first iteration ($i = 0$), it merely attempts to fit additional services on the server, without removing any active ones. At iteration $i$ the algorithm starts by removing the used memory and utility of the service at position $i-1$ of $R_n$ (lines 7-10). Subsequently, the algorithm iterates over all inactive services, starting with the one that would add the most utility if run (lines 11-15). If enough memory is available, the service is added to the list of new active services $R_n^{new}$ for that iteration (lines 12-14). Finally, it is checked whether this iteration is better than the previous ones (lines 16-19). The algorithm finishes by replacing the list of active services with the best found solution (line 21). When calculating the utility of previously inactive services (line 16), the algorithm does not use the total number of unsatisfied requests for this service in the server's neighborhood, as in Eq. 5, but the minimum of this value and the total number of requests that $n$ can actually satisfy for this service, based on its currently available CPU. It is clear that this algorithm has a time complexity of $O(|R_n| \times |S_n|)$.
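For readers who prefer executable code, the Python sketch below mirrors the structure of Algorithm 1. It is a simplified rendering under the description above, not the authors' implementation; the utility functions are passed in as callables so that the same loop can later be reused with the latency-aware utilities of the next subsection.

```python
def get_placement(active, inactive, gamma, mem_capacity,
                  utility_active, utility_inactive):
    """One placement decision for a single server, following Algorithm 1.

    active / inactive: lists of currently active and inactive services.
    gamma: dict service -> memory footprint γ_a; mem_capacity: Γ_n.
    utility_active / utility_inactive: callables returning the utility of
    keeping, respectively activating, a service (Eqs. 4-5 or Eqs. 6-7).
    """
    # Active services sorted by increasing, inactive by decreasing utility.
    active = sorted(active, key=utility_active)
    inactive = sorted(inactive, key=utility_inactive, reverse=True)

    used_mem = sum(gamma[s] for s in active)
    cur_util = sum(utility_active(s) for s in active)
    best_util, best_set = cur_util, list(active)

    for i in range(len(active) + 1):
        if i >= 1:
            # Tentatively drop the i-th lowest-utility active service.
            used_mem -= gamma[active[i - 1]]
            cur_util -= utility_active(active[i - 1])
        new_services, new_mem = [], 0.0
        for s in inactive:
            if used_mem + new_mem + gamma[s] <= mem_capacity:
                new_services.append(s)
                new_mem += gamma[s]
        new_util = cur_util + sum(utility_inactive(s) for s in new_services)
        if new_util > best_util:
            best_util = new_util
            best_set = new_services + active[i:]
    return best_set
```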
C. Latency-aware Algorithm

The pseudo-code for our latency-aware algorithm is identical to that given in Algorithm 1. The difference with the original algorithm lies in the way utility is calculated. Let us define $\delta_{n,a}$ as the latency between server $n$ and the nearest other server in $n$'s forwarding table for service $a$. If $n$ does not have an entry in its forwarding table for service $a$, we define $\delta_{n,a} = 2 \times \max_{s \in A} \delta_{n,s}$. Based on this new definition, the utility of an active service $a$ is defined as

$$\pi_a \times \delta_{n,a} \times \sum_{t \in T_n} \rho^{satisfied}_{n,a,t} \qquad (6)$$
Intuitively, this can be interpreted as the additional latency incurred by the requests passing through $n$ for service $a$ if it would no longer run the service. By sorting the active services in increasing order of this utility, those that would add little additional latency if they were no longer run by the server will be replaced first by the algorithm. The utility of an inactive service $a$ is calculated as follows:

$$\pi_a \times \delta_{n,a} \times \sum_{g \in G_n \cup \{n\}} \sum_{t \in T_g} \left( \rho^{total}_{g,a,t} - \rho^{satisfied}_{g,a,t} \right) \qquad (7)$$

This formula gives an estimate of how much the total request latency would be reduced if the server activates service $a$. As inactive services are sorted in decreasing order, those that would cause the biggest drop in request latency are given priority. The total utility thus gives a measure of the total latency caused by all requests. The algorithm attempts to minimize this latency by replacing active services that give only a small reduction with inactive ones that would cause a major gain.
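A minimal sketch of these latency-aware utilities (Eqs. 6 and 7) is given below, written so that it can be plugged into the placement loop sketched earlier; the data structures are hypothetical and simply mirror the notation above. In practice the extra parameters would be bound (e.g., via functools.partial) before handing the functions to the placement loop.

```python
def delta(a, forwarding_latency):
    """δ_{n,a}: latency from this server to the nearest forwarding-table entry
    for service a; if no entry exists, twice the largest known latency."""
    if a in forwarding_latency:
        return forwarding_latency[a]
    return 2 * max(forwarding_latency.values(), default=0)

def utility_active(a, pi, forwarding_latency, satisfied_local):
    """Eq. 6: priority × δ_{n,a} × demand this server satisfied for a."""
    return pi[a] * delta(a, forwarding_latency) * satisfied_local[a]

def utility_inactive(a, pi, forwarding_latency, total_nbh, satisfied_nbh):
    """Eq. 7: priority × δ_{n,a} × unsatisfied demand for a, summed over the
    server and its GoCast neighborhood."""
    return pi[a] * delta(a, forwarding_latency) * (total_nbh[a] - satisfied_nbh[a])
```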
D. Random Algorithm

This is a trivial algorithm that selects new active services at random during each run. It picks a service at random from the list of known services and, if there is enough memory left, adds it to the list of new active services. This process is repeated until no memory is left or all services have been picked. This algorithm is used as a benchmark for the performance of the other algorithms in Section V.

V. EVALUATION

In this section, the algorithms are evaluated in terms of the satisfied demand of the system and the network latency of client requests. The evaluation is performed using simulation results from the peer-to-peer discrete event simulator PlanetSim [12]. The service hosting architecture, as described in Section III, and the service placement algorithms, as described in Section IV, were entirely implemented in the simulation environment. Additionally, we extended PlanetSim with a means to model a physical underlay with link latencies and bandwidth constraints, as this is an important aspect in the evaluation of our algorithm and is not supplied by the standard PlanetSim implementation.

Statistical analysis was performed to interpret the simulation results. A one-way ANOVA [13] was used to compare several levels of a single factor. If an effect of the factor was detected, a Tukey test [13] was performed to determine which averages actually differed significantly. The 'homogeneity of variance' prerequisite for ANOVA was checked using a modified Levene test [13]. All statistics were performed using a 5% significance level.
A. Simulation Setup

The physical underlay topology, its link latencies and bandwidth constraints were generated using the BRITE topology generator [14], with Waxman's method. Overlay servers were selected from the underlay nodes at random. Messages sent between two overlay servers were routed via the shortest hop-count path in the underlying physical network. The available CPU $\Omega_n$ of every server was set to 1.5 GHz, while the required CPU $\omega_a$ of every service request was set to 0.1 GHz, so that every server could process 15 requests simultaneously. The processing time $\theta_a$ was set to 250 ms for all services, which means that a server could process 60 client requests per second. The memory capacity $\Gamma_n$ of servers was uniformly distributed over the set {1, 2, 3, 4} GB, while the memory requirement $\gamma_a$ of services was uniformly distributed over {0.4, 0.8, 1.2, 1.6} GB. This means that, on average, a server could run 2.5 services simultaneously.

In order to have comparable simulations, the following parameters were given the same values as in [1]. The CYCLON cache size $C$ was set to 10, while the GoCast nearby and random neighbor counts $G_n$ and $G_r$ were set to 3 and 1, respectively. Every node executed the CYCLON protocol every 5 s, GoCast every 1 s and the placement algorithm every 30 s. The list for each service in the forwarding tables was limited to 4 entries.

Client request patterns were generated using a Zipf-like distribution [15]. If services are sorted according to popularity, the percentage of all requests for the service with rank $i$ is then given by

$$\frac{i^{-\alpha}}{\sum_{n=1}^{N} n^{-\alpha}} \qquad (8)$$

with $N$ the total number of services. These requests are then divided at random among a set of randomly chosen clients (the number of clients is also determined by the percentage generated by Eq. 8). This means that more popular services will not only have more requests, but also more clients sending these requests, which we believe to be a realistic assumption. The $\alpha$ parameter of the Zipf-like distribution was set to 0.6, which has been shown to be a realistic value for content distribution services [15], [16]. All simulation runs lasted 400 s, of which the first 160 s served as a warm-up period during which no statistics were gathered. All results are averaged over 30 simulation runs.

B. System Efficiency

The goal of this simulation was to compare the performance of the latency-aware service placement algorithm and the original algorithm to each other and to a random placement. Performance is evaluated in terms of satisfied demand (the ratio of answered request count to total request count) and network latency of client requests. All results are shown as a function of the CPU load factor (CLF). This is the ratio between the required CPU for all client requests and the available CPU in the system, i.e.

$$\frac{\sum_{a \in A} \sum_{t \in T} \omega_a \times \theta_a \times \rho^{total}_{a,t}}{T \times \sum_{n \in N} \Omega_n} \qquad (9)$$

with $T$ the total simulation time interval, and $\rho^{total}_{a,t}$ the total number of requests for service $a$ sent at time $t$.
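As a small, self-contained illustration of the workload model, the sketch below generates the Zipf-like request shares of Eq. 8 and computes the CPU load factor of Eq. 9 for the parameter values listed above; the helper names are ours and not part of the simulator.

```python
ALPHA = 0.6          # Zipf exponent α used in the simulations
OMEGA = 0.1          # ω_a: CPU per request (GHz)
THETA = 0.250        # θ_a: processing time per request (s)
CPU_CAPACITY = 1.5   # Ω_n: CPU capacity per server (GHz)

def zipf_shares(num_services, alpha=ALPHA):
    """Eq. 8: fraction of all requests going to the service with rank i."""
    weights = [i ** -alpha for i in range(1, num_services + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def cpu_load_factor(requests_per_second, num_servers):
    """Eq. 9 for a stationary request rate: required CPU over available CPU."""
    required = requests_per_second * OMEGA * THETA  # GHz needed on average
    available = num_servers * CPU_CAPACITY          # GHz installed
    return required / available

# With 200 servers, 7200 requests per second corresponds to a CLF of 0.6;
# 12000 requests per second (60 per server) saturate the system at CLF 1.0.
if __name__ == "__main__":
    print(round(cpu_load_factor(7200, 200), 2))   # -> 0.6
    print(round(sum(zipf_shares(300)), 2))        # -> 1.0
```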
Figure 3. Performance of the algorithms as a function of the CLF (up to 2.0), for average satisfied demand (%) and average network latency (ms)
The generated underlay network consisted of 1000 nodes, of which 200 were chosen at random as overlay servers. Additionally, requests were generated for 300 different services.

Fig. 3 shows the simulation results. In terms of satisfied demand, the latency-aware algorithm LTCY performs better than Adam's algorithm ADAM (up to 6%) and the random algorithm RAND (up to 18%) for lower CPU loads (CLF ≤ 1.0). At higher CPU loads, LTCY and ADAM perform similarly. RAND, on the other hand, performs consistently worse, although at higher CPU loads the difference becomes smaller. In terms of network latency, we can again make a distinction between lower and higher CPU loads. ADAM performs best for all loads (from 6% better than LTCY at CLF = 0.2 up to 14% better at CLF = 2.0). RAND, on the other hand, becomes relatively better as the load increases; at higher CPU loads (CLF > 1.0) it even outperforms LTCY. At CLF = 0.2, LTCY is, on average, 15% faster than RAND, while at CLF = 2.0 RAND is 8% faster than LTCY.

Statistical analysis of the results shows that in terms of satisfied demand LTCY significantly outperforms ADAM for CLF ≤ 0.8, while ADAM performs significantly better for CLF ≥ 1.6, and both algorithms perform significantly better than RAND for any CLF. In terms of network latency, ADAM performs significantly better than LTCY and RAND, while LTCY performs significantly better than RAND for CLF ≤ 0.8 and RAND better than LTCY for CLF ≥ 1.2. From these results we can conclude that LTCY performs well in a non-overloaded scenario, but that CLF = 1.0 is a turning point for the performance of this algorithm.

At first sight, these results are not at all as expected: the algorithm that takes network latency into account actually performs worse in terms of latency, but mostly better in terms of satisfied demand. Nevertheless, these unintuitive results can be easily explained. The latency of the nearest service instance in Eq. 6 and Eq. 7 can be seen as a weight assigned to each service. As this weight becomes higher, the chances the server will run this service in the next placement cycle increase. Services for which no instance can be found get a much higher weight, and thus the chances they will be run by the server increase dramatically.
Therefore, the chances that unpopular services will be run at least somewhere in the network are much greater for LTCY than for ADAM. This will, at lower CPU loads at least, increase the satisfied demand. A direct consequence of this is that LTCY will start fewer instances of the most popular services. This will increase the average network latency of requests for these services, and thus also the average network latency of LTCY. Additionally, as there are fewer instances of popular services, they will become overloaded much faster. At higher CPU loads, the improvements in satisfied demand will thus be countered by this negative effect.

C. Scalability

In this section we assess the influence of a varying number of servers and services on the performance of the algorithms. This allows us to evaluate their scalability. For the simulations, we again used an underlay network with 1000 nodes. The CLF was set to 0.6. For a varying number of servers, the service count was set to 300. For a varying number of services, the server count was set to 200.

1) Servers: Fig. 4 shows the simulation results for a varying number of servers. The average satisfied demand increases with the server count, while the average network latency decreases. This is because a lower server count means a lower total number of service instances can be run in the network. When the number of available servers is adequate to start up enough service instances, both satisfied demand and network latency remain somewhat constant (server count ≥ 300). Additionally, these results confirm the results of the system efficiency simulation. LTCY generally performs best in terms of satisfied demand, while ADAM does so in terms of network latency. When the number of available servers is more than adequate (server count ≥ 300), the results of ADAM and LTCY for satisfied demand do however converge. For network latency, the difference between both algorithms is never more than 8 ms (a 4% difference), while in terms of satisfied demand LTCY performs up to 6% better than ADAM. Statistical analysis shows that in terms of satisfied demand LTCY significantly outperforms ADAM for any number of servers, while ADAM is significantly better than LTCY in terms of network latency.
Figure 4. Scalability of the algorithms for a server count up to 500, in terms of average satisfied demand (%) and average network latency (ms)

Figure 5. Scalability of the algorithms for a service count up to 500, in terms of average satisfied demand (%) and average network latency (ms)
Both algorithms perform significantly better than RAND, in terms of both satisfied demand and network latency.

2) Services: The simulation results for scalability in terms of the service count are shown in Fig. 5. In contrast to the results for the server count, the average satisfied demand decreases as the service count grows, while the average network latency increases. Nevertheless, the reason for this result is the same. If the total number of different services is low, there are enough servers available to start up enough instances of each service. As the service count grows, the number of available servers is no longer adequate to start up service instances for all services, so the satisfied demand drops and the network latency rises. Again, LTCY outperforms ADAM in terms of average satisfied demand (up to 6% better), while ADAM gives the best performance in terms of average network latency (at most 10 ms or 5% faster). The difference between the algorithms, both for satisfied demand and network latency, grows as the number of services increases. Statistically, LTCY performs significantly better than ADAM in terms of satisfied demand for more than 100 services. On the other hand, ADAM significantly outperforms LTCY in terms of network latency for any number of services. Both algorithms perform significantly better than RAND for any number of services, both in terms of satisfied demand and network latency.
VI. CONCLUSIONS

In this paper, we proposed a scalable, dynamic and decentralized service placement algorithm, which assigns server resources to a set of services. Unlike existing algorithms, it takes into account the network latency of the underlying physical network topology. This allowed us to explore a new deployment scenario, where servers are spread across the Internet instead of being clustered together in data centers. In this case, network latency between servers has a visible effect on the delay of client requests, and should thus be taken into account when deciding on which servers to place service instances. Additionally, a detailed description was given of the changes needed to an existing service hosting architecture and its protocols to support our new latency-aware service placement algorithm.

Extensive simulations were performed, using the PlanetSim peer-to-peer network simulator. These allowed us to draw several important conclusions concerning the use of latency information in service placement algorithms. First and foremost, contrary to what was expected, the latency-aware algorithm actually performs better than its latency-unaware variant in terms of satisfied demand, but worse in terms of the average network latency of client requests. Second, the proposed latency-aware algorithm performs well in a scenario where enough CPU resources are available, but its performance degenerates as these resources become scarce.
Third, as long as enough service instances can be run in the network, satisfied demand and network latency do not degenerate as more servers join the network. This allows us to conclude that the algorithms are in fact scalable in terms of the number of available servers. And finally, both the latency-aware and latency-unaware algorithms perform, in almost every case, significantly better than the trivial random service placement.

Currently, the PlanetSim simulation environment does not contain an implementation of our dynamic node activation mechanism, which is part of the overlay topology construction component. An evaluation of this component and its effect on the other components of the architecture will therefore be performed in future work.

ACKNOWLEDGMENTS

We would like to thank Constantin Adam and Rolf Stadler for their contributions to our simulator implementation.

REFERENCES

[1] C. Adam, R. Stadler, C. Tang, M. Steinder, and M. Spreitzer, "A service middleware that scales in system size and applications," in 10th IFIP/IEEE International Symposium on Integrated Management (IM'07), 2007, pp. 70–79.
[2] C. Adam and R. Stadler, "Service middleware for self-managing large-scale systems," IEEE Transactions on Network and Service Management (TNSM), vol. 4, no. 3, pp. 50–64, 2007.
[3] C. Tang, R. N. Chang, and E. So, "A distributed service management infrastructure for enterprise data centers based on peer-to-peer technology," in IEEE International Conference on Services Computing (SCC'06), 2006, pp. 52–59.
[4] J. Famaey, T. Wauters, F. De Turck, B. Dhoedt, and P. Demeester, "Towards efficient service placement and server selection for large-scale deployments," in 4th Advanced International Conference on Telecommunications (AICT'08), 2008, pp. 13–18.
[5] A. Karve, T. Kimbrel, G. Pacifici, M. Spreitzer, M. Steinder, M. Sviridenko, and A. Tantawi, "Dynamic placement for clustered web applications," in 15th International Conference on World Wide Web (WWW'06), 2006, pp. 595–604.
[6] S. Voulgaris, D. Gavidia, and M. van Steen, "CYCLON: Inexpensive membership management for unstructured P2P overlays," Journal of Network and Systems Management, vol. 13, no. 2, pp. 197–217, 2005.
[7] C. Tang, R. N. Chang, and C. Ward, "GoCast: Gossip-enhanced overlay multicast for fast and dependable group communication," in Conference on Dependable Systems and Networks (DSN 2005), 2005, pp. 140–149.
[8] T. S. E. Ng and H. Zhang, "Predicting internet network distance with coordinates-based approaches," in Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 1, 2002, pp. 170–179.
[9] P. Francis, S. Jamin, C. Jin, Y. Jin, D. Raz, Y. Shavitt, and L. Zhang, "IDMaps: A global internet host distance estimation service," IEEE/ACM Transactions on Networking, vol. 9, no. 5, pp. 525–540, 2001.
[10] F. Dabek, R. Cox, F. Kaashoek, and R. Morris, "Vivaldi: A decentralized network coordinate system," in ACM SIGCOMM Computer Communication Review, vol. 34, no. 4, 2004, pp. 15–26.
[11] J. Famaey, T. Wauters, F. De Turck, B. Dhoedt, and P. Demeester, "Dynamic overlay node activation algorithms for large-scale service deployments," in 19th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM'08), 2008.
[12] P. Garcia, C. Pairot, R. Mondejar, J. Pujol, H. Tejedor, and R. Rallo, "PlanetSim: A new overlay network simulation framework," Lecture Notes in Computer Science (LNCS), vol. 3437, pp. 123–137, 2005.
[13] T. Hill and P. Lewicki, Statistics: Methods and Applications. StatSoft, Inc., 2006.
[14] A. Medina, A. Lakhina, I. Matta, and J. Byers, "BRITE: An approach to universal topology generation," in Proceedings of the International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunications Systems (MASCOTS'01), 2001.
[15] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, "Web caching and Zipf-like distributions: Evidence and implications," in 18th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'99), vol. 1, 1999, pp. 126–134.
[16] P. Backx, T. Wauters, B. Dhoedt, and P. Demeester, "A comparison of peer-to-peer architectures," in Eurescom Summit 2002: Powerful Networks for Profitable Services, 2002, pp. 215–222.