PROVISIONING VIRTUAL IPTV DELIVERY NETWORKS USING HYBRID GENETIC ALGORITHM
Suliman Mohamed Fati
Rahmat Budiartu
Putra Sumari
Multimedia Research Group, School of Computer Sciences, Universiti Sains Malaysia, 11800, Pulau Pinang, Malaysia
Networked Computing Center, Surya University, Serpong, Indonesia
Multimedia Research Group, School of Computer Sciences, Universiti Sains Malaysia, 11800, Pulau Pinang, Malaysia
[email protected]
[email protected]
[email protected]
ABSTRACT Nowadays, IPTV services are delivered over physical delivery networks, which lack flexibility in control and management. Virtual delivery networks, as a cost-effective alternative, are likely to become the dominant delivery option in the near future because of their advantages for service providers. However, provisioning a virtual network is more complicated, as it combines resource allocation, resource dimensioning, and content replication, which are interdependent. In this paper, we investigate the problem of provisioning a virtual IPTV delivery network over a recent architecture, the peer-service area architecture. A hybrid Genetic Algorithm is used as the optimization tool to find the provisioning parameters, including storage, bandwidth, and CPU consumption. Experiments were conducted on two data sets with different popularity distributions. The experimental results show the impact of content popularity on the provisioning process.
Categories and Subject Descriptors H.5.1 [Multimedia Information Systems]: Video.
General Terms Algorithms, Performance.
Keywords IPTV, Virtual Delivery Networks, Content Popularity, VDN provisioning, Hybrid Genetic Algorithm.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IMCOM (ICUIMC)’14, January 9–11, 2014, Siem Reap, Cambodia. Copyright 2014 ACM 978-1-4503-2644-5 …$15.00.

1. INTRODUCTION In today’s IPTV delivery networks, a set of physical servers is placed in different geographic locations to serve as many customers as possible. Despite the distributed nature of delivery networks, the whole network is owned and managed by a single owner who has full control over it. In contrast, virtual delivery networks give the service provider a more cost-effective option for allocating contents with a high level of flexibility [1]. Moreover, the service provider is able to run his own virtual delivery network with full control without the need to build a demanding infrastructure. Thus, the trend in the near future is to deliver IPTV services over virtual delivery networks that are cost-effective and more flexible in control and administration. Likewise, the concept of the virtual network works side by side with the concept of Next Generation Networks (NGNs), which deliver different services over a single network according to the “All-over-IP” concept [2,3]. The main idea behind virtual delivery networks is "pay as you use", which makes them cost-effective. Another aspect of cost-effectiveness is the ability to adjust the capacity of resources and bandwidth on the fly, or at least periodically, according to fluctuations in demand. In addition, delivering IPTV services over virtual networks allows the service provider to benefit from the advantages of many physical networks [4]. This is done by allocating virtual servers, which are leased from different physical network providers, and connecting them via virtual links. However, such virtual networks require more effort in network provisioning. The provisioning process includes selecting the best virtual servers, selecting the appropriate links between these servers, and replicating the contents among these servers according to the listed constraints. In virtual networks, the service provider leases the resources he needs, when he needs them. Thus, the replica placement problem should be integrated with server placement to obtain cost-effective virtual delivery networks. This integration allows the service provider to adjust the required resources according to the required replication scheme. However, combining virtual server site selection with content replication in virtual network provisioning is challenging for the following reasons [1]:
• Deciding the type and amount of leased resources depends completely on the contents to be replicated and their popularity distribution.
• Content replication in present delivery networks assumes that the contents must be replicated among the available dedicated resources. In comparison, virtual networks aim to minimize the service cost by leasing resources rather than establishing physical ones. Leasing the resources therefore reduces the cost, so that the service is delivered with an acceptable QoS level at a minimum delivery cost.
Building a virtual network according to the peer-service area architecture, which was proposed in [5] and augmented in our previous work [6], is more sophisticated. In this architecture, the service provider has to lease a set of servers in each service area with appropriate interlinks. Over this virtual network, the contents must be replicated according to their expected load, which is estimated by our load prediction model [6]. To the best of our knowledge, no existing work addresses the provisioning of virtual IPTV delivery networks over this recent architecture. Thus, this paper addresses the virtual network provisioning issues to help service providers build their own optimal virtual delivery networks over heterogeneous substrate networks.
2. BACKGROUND AND RELATED WORK Many researchers have investigated the problem of building cost-effective delivery networks for the Internet or for Video on Demand (VoD). The main flaw of these studies is that they consider resource allocation and content allocation (also known as the replica placement problem) independently. Solving the two problems independently produces suboptimal solutions because replica locations directly affect the amount of leased resources, and vice versa [7]. Moreover, the allocation of storage and bandwidth resources at each location of a virtual network affects the optimal solution. Integrating replica placement with server placement has therefore been suggested as a framework for network provisioning in both physical and virtual networks. Nakaniwa et al. [8] introduced a mathematical model that joins content replication with server placement for the hierarchical architecture. Their aim was to maximize the reliability of the delivery network. They used integer programming to find a server and content allocation scheme that maximizes the reliability of all paths from each user to each file location. They applied their model to a real delivery network of an Internet service provider in Japan called "BBit-Japan", subject to delay, storage, and/or transmission cost constraints. Nguyen et al. [1] addressed virtual network provisioning for multimedia contents, aiming to build a cost-effective virtual delivery network for a distributed architecture over a substrate network. Provisioning such virtual networks is difficult because replica placement must be integrated with server allocation. They therefore formulated the provisioning problem as a combination of content replication and resource allocation.
To solve this problem, they proposed a Lagrangian heuristic algorithm. In a similar manner, Houidi et al. [4] proposed dynamic virtual network creation according to the load and the available resources. They provisioned their virtual network over multiple substrate networks using exact and heuristic algorithms. The proposed model is composed of a request splitting algorithm and an embedding algorithm. The splitting process helps the virtual network provider distribute the incoming requests among the available substrate network providers. The authors proposed two request splitting algorithms: a max-flow min-cut algorithm and a linear programming algorithm. Based on the splitting result, the embedding process assigns virtual nodes and links to each substrate network provider simultaneously. The authors formulated virtual network embedding as a mixed integer program. The aim of the proposed embedding algorithm is to increase the request acceptance rate and to decrease the infrastructure leasing cost. Li and Wu [5] proposed a heuristic algorithm that finds the optimal number of popular contents and the optimal number of servers to store them. Their model ignores content replication and load balancing.
3. VDN PROVISIONING PROBLEM FORMULATION The VDN provisioning process aims at building an optimal virtual network architecture while replicating IPTV contents over this network, so that content demand is satisfied with the available resources. This is an optimization problem, which requires knowledge of the available resources and of the content requirements in terms of processing power, storage space, and bandwidth capacity. Each IPTV content requires storage space, processing power, and bandwidth capacity in proportion to its popularity. Thus, replicating the contents according to their popularity distribution yields the optimal replication scheme as well as the optimal VDN topology. In the VDN provisioning process, the information about the available resources and the content requirements is gathered from the network/resource providers and the service provider, respectively. The service provider is responsible for carrying out the provisioning process based on his requirements. Upon completion, he sends the results to the resource provider to create the virtual topology. If a resource broker exists, the broker can complete the whole process using the information given by both parties. The general framework of the provisioning process can be summarized as follows:
• The information on the content popularity distribution and the demand for each video is gathered from the service provider.
• The information on the resources of each substrate network is provided by the resource providers.
• The desired optimization tool is employed to compute the optimal topology, including the replication pattern.
• The resulting topology is passed to the resource providers or the resource broker to create the virtual delivery network.
In this paper, the peer-service area architecture is considered for delivering IPTV services. This architecture consists of a set of service areas. Each area contains a server cluster with a load dispatcher in front of the cluster to redirect requests. Each customer belongs to his/her nearest service area. Thus, in each service area, the service provider must know the locations and capacity limits of the potential servers, the potential links among servers, and the leasing cost of each network provider. The aim of the service provider is to find, among multiple resource providers, the virtual network topology that minimizes the VDN provisioning cost. The optimal VDN topology includes the locations of the virtual servers and the virtual inter-links among them, in addition to the processing power, storage space, and bandwidth capacity at each location. To achieve such a topology, the service provider must know the replication degree of each video according to its popularity. Moreover, the content requirements in terms of processing power, storage, and bandwidth must be identified clearly, as these required resources and the location capacities are interdependent. This interdependency emphasizes the need to address the topology, resource dimensioning, and replica placement jointly to achieve the optimal VDN topology.
Therefore, the VDN provisioning problem over the peer-service area architecture can be modeled as a set of inter-connected subgraphs. Each subgraph covers one service area and consists of a set of nodes and the inter-links among these nodes. Each node in the subgraph represents a potential server location that is characterized by processing power, storage space, and bandwidth capacity. Each potential node j has a limit on its processing power $u_j$, storage space $m_j$, and bandwidth $b_j$.

To facilitate communication among the service areas, the subgraphs are connected using inter-service area links. These links enable the customers in any service area to redirect requests for unavailable content to another service area. The cost of these links can be obtained from the redirection process in the replication pattern, as explained later in this section. In the provisioning process, we distinguish between two terms: virtual server cost ($C_L$) and replication pattern cost ($C_R$).

3.1 Virtual Server Cost ($C_L$)
A single VDN can lease virtual servers from different physical network providers. Each physical network provider has its own pricing plan for leasing resources. The pricing plan includes the cost of initially configuring the virtual server, deploying software, and maintaining the virtual servers, including backup and security. Moreover, the cost of the inter-links between virtual servers should be considered. Therefore, the virtual server cost is the cost of leasing, deploying, maintaining, and interconnecting the virtual servers, which belong to different physical network providers.

3.2 Replication Pattern Cost ($C_R$)
Content replication represents the main challenge in the provisioning process. In this step, the service provider has to determine the best content replication, which in turn determines the best topology. The optimal topology must allocate the contents at minimum cost. For that, the provisioning process must involve an optimization step that produces the optimal network topology, taking into account the available resources and their cost as well as the available contents and their popularity distribution. The replication cost, the second part of the provisioning cost, consists of two parts: hosting cost and redirection cost. The hosting cost is the cost of replicating and serving the popular contents, including the storage cost, the processing cost, and the bandwidth cost. The redirection cost is the cost of requesting unpopular contents from other service areas when they are not replicated locally because of resource limitations. Content replication in the peer-service area architecture for IPTV can be summarized as follows: based on the user behavior profile in each service area, the number of copies of each object is computed. These copies are allocated among the cluster such that each copy is assigned to only one server, based on cost, expected load, server capacity, object popularity, and the users’ request distribution pattern. The result is a prescription of which contents are stored inside each service area, their replication degree, and where they are stored.

3.3 Decision Variables To formulate the VDN provisioning problem, we define two binary decision variables. The first binary variable decides which locations belong to the optimal topology, while the second one governs the replication pattern.

$$y_{js} = \begin{cases} 1 & \text{if site } j \text{ in service area } s \text{ is selected} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

$$x_{ijs} = \begin{cases} 1 & \text{if content } i \text{ is replicated into site } j \text{ in service area } s \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
The binary variable $x_{ijs}$ also indicates how many copies of a content are allocated in each service area, since $x_{ijs}$ records the presence of content i on each server inside the service area. Using the parameters and decision variables described in the previous sections, we can formulate the problem as a constrained mixed-integer linear programming model, which is NP-hard.

Minimize:
$$C_T = C_L + C_R \qquad (3)$$
Subject to:
$$\sum_{i \in N} m_i \, x_{ijs} \le m_j \quad \forall j \in PL,\ s \in A \qquad (4)$$
$$\sum_{i \in N} u_i \, x_{ijs} \le u_j \quad \forall j \in PL,\ s \in A \qquad (5)$$
$$\sum_{i \in N} b_i \, x_{ijs} \le b_j \quad \forall j \in PL,\ s \in A \qquad (6)$$
$$\sum_{j \in PL_s} x_{ijs} \le PL_s \quad \forall i \in N,\ s \in A \qquad (7)$$
$$\sum_{j \in PL_s} x_{ijs} = p_i \cdot PL_s \quad \forall i \in N,\ s \in A \qquad (8)$$
$$x_{ijs} \in \{0, 1\} \quad \forall i \in N,\ j \in PL,\ s \in A \qquad (9)$$
$$x_{ijs} = 0 \quad \forall i \in N,\ j \notin PL,\ s \in A \qquad (10)$$
$$x_{ijs} \le y_{js} \quad \forall i,\ j \in s \quad \text{(if site } j \text{ is selected, it should replicate the contents)} \qquad (11)$$
$$\sum_{j=1}^{PL_s} y_{js} = P, \quad \text{where } P \text{ is a predefined number of locations} \qquad (12)$$

In this model, the cost function in equation (3) is the sum of the virtual server cost and the replication cost. Constraints (4)-(6) enforce the storage, processing power, and bandwidth capacity, respectively, at each potential site: the total storage space, processing power, and bandwidth required by all contents replicated at server j should not exceed the capacity of server j. The number of copies of content i inside service area s has to be less than or equal to the number of virtual servers $PL_s$ in that service area, as depicted in equation (7), and is also bounded by the popularity $p_i$ of that content, as depicted in equation (8). Finally, the integrality and non-negativity constraints are presented in equations (9) and (10).
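The capacity constraints (4)-(6) and the site-count constraint (12) can be illustrated with a minimal sketch for a single service area. This is not the authors' implementation: the function name `is_feasible`, the array layout, and the sample numbers are hypothetical, and a site is treated as selected when it holds at least one replica.

```python
import numpy as np

def is_feasible(x, req, cap, num_selected):
    """Check one service area's replication matrix (illustrative only).

    x            -- binary matrix, x[i, j] = 1 if content i is replicated at site j
    req          -- per-content requirements, shape (N, 3): storage, CPU, bandwidth
    cap          -- per-site capacities, shape (PL, 3), same column order as req
    num_selected -- P, the predefined number of selected sites
    """
    x = np.asarray(x)
    # Constraints (4)-(6): the total demand placed on each site must fit its capacity.
    load = x.T @ req  # shape (PL, 3)
    if np.any(load > cap):
        return False
    # Constraint (12): exactly P sites hold at least one replica (assumed meaning of "selected").
    selected = x.any(axis=0)
    return int(selected.sum()) == num_selected

# Hypothetical data: 3 contents, 2 candidate sites.
req = np.array([[4.0, 1.0, 2.0], [2.0, 1.0, 1.0], [1.0, 0.5, 0.5]])
cap = np.array([[6.0, 2.0, 3.0], [6.0, 2.0, 3.0]])
x_ok = [[1, 0], [1, 0], [0, 1]]   # site 0 is exactly full, site 1 is lightly loaded
x_bad = [[1, 0], [1, 0], [1, 0]]  # site 0 exceeds its storage capacity
```

With these sample values, `is_feasible(x_ok, req, cap, 2)` accepts the first placement and `is_feasible(x_bad, req, cap, 1)` rejects the second because site 0 is overloaded.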
4. THE PROPOSED SOLUTION According to [9], no known algorithm with polynomial time complexity can solve NP-hard and NP-complete combinatorial optimization problems. In other words, no efficient algorithm is guaranteed to find the optimal solution for those problems. On the other hand, algorithms such as branch and bound, greedy algorithms, and dynamic programming suffer from the curse of dimensionality [9]. Moreover, local search algorithms such as hill climbing suffer from premature convergence [10]. For instance, a greedy algorithm stops at the nearest local optimum, which may be of low quality. The main goal of evolutionary algorithms is to overcome these problems and find a near-optimal solution. Evolutionary algorithms (e.g. the Genetic Algorithm) become faster and more effective once they are combined with heuristic or local search techniques [11]. Thus, we integrate the Genetic Algorithm with heuristic repair algorithms to reduce the search space and to improve the ability of the proposed model. The aim of this integration is to produce solutions that are as close to optimal as possible. To solve the VDN provisioning problem, we employ the hybrid genetic algorithm depicted in Figure 1. In this model, the objective is to minimize the hosting cost subject to the listed constraints. In VDN provisioning, there is a set of candidate locations for virtual servers in each service area. We have to select a subset of these locations and replicate the contents over them based on predefined criteria. For this problem, we employ binary encoding for the chromosomes: each chromosome is a string of ones and zeros. This binary string represents a candidate solution, in which each cell indicates a replica position in the service area.

Figure 1: Hybrid GA for VDN provisioning (flowchart; node labels: Start, Initialize population, HRR algorithm, Create new population, RSO algorithm, Evaluate, Select, Elitism, Crossover, Mutate, Terminate?, Feasible?, End)

For instance, if a VDN comprises 2 service areas with two potential locations each and the number of contents equals 5, then the total length of the chromosome equals 20 genes. Each cell in the chromosome holds a binary value $x_{ijs} \in \{0, 1\}$ that indicates whether or not object i is replicated at server j inside service area s. The position of a gene can be identified according to equation (13):

$$\mathrm{pos}(x_{ijs}) = \big((s - 1) \cdot PL_s + (j - 1)\big) \cdot N + i \qquad (13)$$

For example, gene 13 holds the replica of the 3rd content on the first server in the second service area. We can find this using equation (13) as follows:

$$\mathrm{pos}(3, 1, 2) = \big((2 - 1) \cdot 2 + (1 - 1)\big) \cdot 5 + 3 = 10 + 3 = 13$$

The initial population is generated randomly by selecting a random binary value for each gene in the chromosome. During the evolution process, a mating pool is implemented to select two chromosomes at a time. The uniform crossover operator with a standard crossover probability is applied to the selected pair. After that, the resulting offspring are mutated, with an extremely small mutation probability, to ensure diversity in the search space. During reproduction, the new offspring may lie in the region of infeasible solutions; therefore, two heuristic repair algorithms, the restricted search operator and the heuristic replica repair algorithm, are used to repair the offspring so that they obey the proposed constraints.

In our problem, we adapt the restricted search algorithm to choose a certain number of servers, where each server is represented as a string of binary values instead of a single gene. In other words, the restricted search operator works on a set of bulks instead of a set of genes. Each bulk consists of a set of sequential genes: each potential location j in service area s covers the genes from number $((s - 1) \cdot PL_s + (j - 1)) \cdot N + 1$ to number $((s - 1) \cdot PL_s + (j - 1)) \cdot N + N$. The restricted search operator implements constraint (12) as follows: depending on the predefined number of locations (P), it randomly chooses P bulks in each service area to remain unchanged. After that, the remaining bulks in the service area are set to zero. A location (i.e. virtual server) is thus unselected if all cells that belong to it equal zero. After applying the restricted search operator (RSO), which determines the number of selected servers, the Heuristic Replica Repair (HRR) algorithm checks each chromosome by counting the number of replicas of each video in each service area.
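The gene indexing of equation (13) and the bulk-wise behaviour of the restricted search operator can be sketched as follows. This is an illustrative reading of the description rather than the authors' code: the function names, the flat-list chromosome, and the simplifying assumption that every service area has the same number of candidate sites are hypothetical.

```python
import random

def gene_position(i, j, s, num_sites, num_contents):
    """1-based gene index of content i at site j in service area s, per equation (13)."""
    return ((s - 1) * num_sites + (j - 1)) * num_contents + i

def restricted_search(chromosome, num_areas, num_sites, num_contents, p, rng=random):
    """Keep p randomly chosen bulks (sites) per service area and zero the others.

    Assumes every service area has the same number of candidate sites.
    """
    repaired = list(chromosome)
    for s in range(1, num_areas + 1):
        kept = set(rng.sample(range(1, num_sites + 1), p))
        for j in range(1, num_sites + 1):
            if j in kept:
                continue
            # A bulk is the run of num_contents genes belonging to site j.
            start = gene_position(1, j, s, num_sites, num_contents) - 1  # 0-based
            repaired[start:start + num_contents] = [0] * num_contents
    return repaired

# The worked example from the text: 2 areas, 2 sites each, 5 contents.
example_index = gene_position(3, 1, 2, num_sites=2, num_contents=5)  # 13
repaired = restricted_search([1] * 20, num_areas=2, num_sites=2, num_contents=5, p=1)
```

Starting from an all-ones chromosome with P = 1, one bulk of five genes survives in each service area, so ten genes remain set after the repair.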
5. EXPERIMENTAL RESULTS To investigate the performance of the proposed provisioning model, we tested it on two empirical data sets. Both data sets are sampled according to a content popularity distribution. The popularity distributions used are the Zipf-like distribution (Zipf) and the exponential distribution (DF), which is presented in [5] and shown in equation (14).

$$p_i = \frac{e^{-((i-1)\Delta)^2/2}}{\sum_{k=1}^{N} e^{-((k-1)\Delta)^2/2}} \qquad (14)$$
The experimental results show that the popularity distribution affects provisioning parameters such as exploited storage, bandwidth, and CPU consumption. In this paper, we show the effect of the popularity distribution on the exploited storage space only.
Figure 2 depicts the popularity distribution of both data sets with skewness values 0.01, 0.04, and 0.08. According to this figure, the curves for small skewness values (i.e. 0.01 and 0.04) indicate that content popularity is distributed in a nearly uniform manner, which means that all contents have almost the same popularity value. However, the curve becomes more skewed as the skewness value increases. Notably, the exponential distribution becomes more skewed than the Zipf distribution as the skewness value increases (e.g. at a skewness value of 0.08). As depicted in the same figure, 36% and 66% of contents are popular for the exponential and Zipf distributions, respectively, when the skewness value equals 0.08.
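The near-uniform behaviour at small skewness values can be reproduced with a short sketch. The paper does not state the exact Zipf-like parameterization it uses, so the common form in which the popularity of the content at rank i is proportional to 1/i^skew is assumed here; the function name and the content count of 100 are hypothetical.

```python
def zipf_like_popularity(num_contents, skew):
    """Normalized Zipf-like popularities, assuming p_i proportional to 1 / i**skew."""
    weights = [1.0 / (rank ** skew) for rank in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# A low skewness value gives an almost flat curve; a high one concentrates demand.
flat = zipf_like_popularity(100, 0.01)
steep = zipf_like_popularity(100, 2.0)
```

Under this assumption, the most and least popular of 100 contents differ by less than 5% at a skewness of 0.01, whereas at a skewness of 2 the top-ranked content alone attracts more than half of the demand.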
Figure 2: Popularity distribution of both data sets (series: zipf0.01, zipf0.04, zipf0.08, df0.01, df0.04, df0.08)

Figure 3 plots the exploited storage space versus different skewness degrees for both popularity distributions. The figure shows that the exploited storage space tends to decrease as the skewness degree increases. However, the exponential distribution decreases faster than the Zipf distribution. In particular, over the skewness range 0.01 to 2, the Zipf distribution decreases from 58 GB to 44 GB, while the exponential distribution decreases from 82 GB to 22 GB.

Figure 3: Exploited storage versus skewness degrees (series: storage(zipf), storage(df))

This behaviour can be interpreted as follows. With the exponential distribution, the contents are close to each other in popularity, which leads to replicating more contents in the service area. Moreover, the number of replicas of each content is high because the popularity values are closer to each other than under the Zipf distribution at the same low skewness values. This increases the number of replicas of the replicated contents, as depicted in Figure 4. Consequently, the required storage space becomes high. From the same figure, the number of replicas for the Zipf distribution is less than that of the exponential distribution, which leads to less exploited storage space for Zipf-distributed contents.

Figure 4: Number of replicas at low skewness (series: zipf, df)

In contrast, the exploited storage space for both distributions decreases at higher skewness values (e.g. 1 and 2) because only a very small percentage of contents is replicated in the service area, as depicted in Figure 5. However, the exponential distribution exploits slightly less storage space than the Zipf distribution at higher skewness values. This is because the exponential distribution replicates a slightly smaller percentage of contents than the Zipf distribution, as depicted in Figure 5.

Figure 5: Number of replicas at high skewness (series: zipf(1), df(1))

CONCLUSION
Delivering IPTV services over virtual networks is the trend of the near future because of their advantages in terms of flexibility, full control, and cost-effectiveness. However, provisioning such delivery networks is challenging and needs a careful integration of virtual resource allocation, resource dimensioning, and content replication. In this paper, IPTV delivery networks over the peer-service area architecture are provisioned using a hybrid Genetic Algorithm. The experimental results indicate that content popularity is the dominant factor in the provisioning process. One direction of our future work is integrating request redirection with the provisioning process. Considering redirected requests can produce better results, since the workload is then estimated from the real workload rather than the anticipated workload.
REFERENCES
[1] Nguyen, H. V., Safaei, F., Boustead, P., and Chou, C. T. 2005. Provisioning overlay distribution networks. Comput. Netw. 49, 1 (September 2005), 103-118. DOI=10.1016/j.comnet.2005.04.001.
[2] Mikoczy, E., and Podhradský, P. 2009. Evolution of IPTV Architecture and Services towards NGN. In Recent Advances in Multimedia Signal Processing and Communications (231), Grgic, M., Delac, K., Ghanbari, M. (Eds.), Studies in Computational Intelligence, Springer Berlin / Heidelberg, pp. 315-339.
[3] Esaki, S., Kurokawa, A., Matsumoto, K. 2007. Overview of the Next Generation Network. NTT Technical Review 5(6).
[4] Houidi, I., Louati, W., Ameur, W., and Zeghlache, D. 2011. Virtual network provisioning across multiple substrate networks. Computer Networks 55(4): 1011-1023.
[5] Li, M., and Wu, C. 2010. A cost-effective resource allocation and management scheme for content networks supporting IPTV services. Computer Communications 33(1): 83-91.
[6] Gaber, S. M. A., and Sumari, P. 2012. Predictive and content-aware load balancing algorithm for peer-service area based IPTV networks. Multimedia Tools and Applications, Springer, 1-24.
[7] Thouin, F. 2007. Video-on-demand Equipment Allocation. Master thesis, McGill University, Montreal, Canada.
[8] Nakaniwa, A., and Ebara, H. 2007. Optimal allocation of cache servers and content files in content distribution networks. In Proceeding of European Conference on internet and multimedia systems and applications, ACTA Press, Anaheim, CA, USA, pp. 15-22.
[9] Yu, X., and Gen, M. 2010. Introduction to Evolutionary Algorithms, Springer.
[10] Kokash, N. 2005. An introduction to heuristic algorithms. Technical report, Department of Informatics and Telecommunications, University of Trento, Italy.
[11] Osorio, H., and Luis G. 2009. A genetic algorithm with repair and local search mechanisms able to find minimal length addition chains for small exponents. In the proceeding of the IEEE Congress on Evolutionary Computation (Trondheim, Norway, May 18-21, 2009). CEC'09. IEEE, 1422-1429. DOI= http://dx.doi.org/10.1109/CEC.2009.4983110