Cache Location in Tree Networks: Preliminary Results

Bauguion Pierre, Ben Ameur Walid & Gourdin Eric

Abstract

One popular approach to overcoming the congestion expected from the spectacular development of various multimedia applications consists in installing transparent caches at strategically chosen places inside telecommunication networks. The problem of locating caches is a difficult optimization problem, closely related to the p-median problem. In the case where the network is a tree, some cache location problems have already been investigated. In this paper, we propose to refine these models by taking into account a dynamic effect due to cache replacement policies. In our model, only the most popular contents are stored in the caches. The hierarchical effect of several successive caches is also captured by the model. A Mixed Integer Programming model and a Dynamic Programming algorithm are proposed and compared on a preliminary set of numerical experiments.

Bauguion Pierre
Orange Labs, France Telecom R&D, 38-40 bd du Général Leclerc, 92794 Issy-les-Moulineaux Cedex 9, e-mail: [email protected]

Ben Ameur Walid
Institut Telecom, Telecom SudParis, UMR CNRS 5157, 9 rue Charles Fourier, 91011 Evry, France, e-mail: [email protected]

Gourdin Eric
Orange Labs, France Telecom R&D, 38-40 bd du Général Leclerc, 92794 Issy-les-Moulineaux Cedex 9, e-mail: [email protected]

1 Introduction

One of the major trends in modern telecommunications is the spectacular development of a wide range of multimedia services. As a direct consequence, one can expect a huge increase in bandwidth consumption and storage requirements. Maintaining a reasonable level of QoS (Quality of Service) is probably one of the main challenges for telecommunication network operators and service providers. Many actors in the Internet community are considering potential solutions to cope with this traffic increase. The term CDN, for Content Delivery Network [12], seems to federate most of the current activities related to efficient content distribution. One key component of a CDN is a piece of equipment called a cache. A cache deployed in some transit node can intercept (i) a request made by clients for certain content, and (ii) the content itself when it is forwarded back from a central server. The cache stores some of the content (ii) according to a certain replacement policy and directly answers some of the requests (i) if it currently holds the required content. The replacement policy is an algorithm that tells the cache which content should be removed when new content has to be stored and the storage capacity limit is reached. The best known of these policies are LFU (Least Frequently Used) and LRU (Least Recently Used). Choosing the right policy seems to be essential for a cache to be efficient. Considerable effort has already been spent on analyzing, measuring and modeling the efficiency of a cache (the so-called hit ratio, i.e., the percentage of requests that can be answered by the cache) [1, 5, 6, 8, 13]. The effect of a cache is to reduce the load on upstream links and the delay in accessing content. Apart from its intrinsic efficiency, a cache must also be deployed at the right place in the network to be effective: if it is too close to the server, it will not relieve many links, and if it is too close to the clients, the number of caches to deploy might become huge. The location of caches can hence naturally be modeled as an optimization problem, and cache and content location problems have also been investigated quite intensively [2, 3, 4, 7, 10, 11].
Note that most of these optimization models extend a well-known location problem called the p-median problem [14], for which polynomial algorithms have been proposed in the case where the network is a tree [9, 15].

In this paper, we propose an algorithmic approach to determine an optimal architecture where transparent caches are installed in a tree network. The root of the tree represents the central server, which stores all contents, and each leaf represents a client (or an aggregated set of clients). We consider probabilistic requests, weighted by the number of clients making the same request. We also include capacity constraints on the links. Last but not least, each cache is assumed to contain the most popular contents, taking into account the fact that some content might already be stored in other caches installed in the subtree rooted at the cache.
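To make the notions of replacement policy and hit ratio from the introduction concrete, here is a minimal Python sketch. It is not taken from the paper: the function names, the tie-breaking rule in the LFU variant, and the toy request sequence are all illustrative assumptions. The LFU variant counts requests for every item seen so far, which is only one of several ways to implement that policy.

```python
from collections import Counter, OrderedDict


def lru_hit_ratio(requests, capacity):
    """Fraction of requests served by a cache that evicts the least recently used item."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for item in requests:
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used item
            cache[item] = True
    return hits / len(requests)


def lfu_hit_ratio(requests, capacity):
    """Fraction of requests served by a cache that evicts the least frequently used item."""
    counts = Counter()  # assumption: request counts are kept for all items seen so far
    cache = set()
    hits = 0
    for item in requests:
        counts[item] += 1
        if item in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            # assumption: ties are broken by the smallest item identifier
            victim = min(cache, key=lambda c: (counts[c], c))
            cache.remove(victim)
        cache.add(item)
    return hits / len(requests)


# Toy request sequence (hypothetical): item 1 is clearly the most popular.
requests = [1, 1, 2, 3, 1]
print(lru_hit_ratio(requests, capacity=2))  # 0.2
print(lfu_hit_ratio(requests, capacity=2))  # 0.4
```

On this sequence LFU keeps the popular item 1 in the cache while LRU evicts it, which is the kind of behavior the "most popular contents" assumption of the model abstracts.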

2 Problem Statement

We consider a telecommunication network that can be modeled by an arborescence (directed tree) with vertex set V and arc set A, rooted at a special node s ∈ V representing the central server. Without loss of generality, we assume throughout the paper that the service is a Video-on-Demand (VoD) service, so the data are videos. A subset T ⊂ V of vertices, called terminals, represents the clients of the service. For each vertex v ∈ V, we denote by δ+(v) the set of sons of v in the arborescence (the vertices that come immediately after v on the paths from s to the clients). If δ+(v) = ∅, then v is a leaf. We denote by L ⊂ V the set of leaves and assume that L = T (every leaf is a terminal and every terminal is a leaf). Let S = V \ T. We associate with each arc a ∈ A a weight w_a ≥ 0, which can represent either a distance or a unit cost (for transporting 1 Mb of data, for instance). We assume that there is a finite set D = (d^1, ..., d^m) of data (videos) available at the central server s. Each client j ∈ T associates with each video k ∈ K = {1, ..., m} a number p_j^k ≥ 0 representing the popularity of video k (which depends on the number of requests for that video). Recall that a client can represent an aggregation of many clients.

In the cache location problem, we have to decide where to install caches in the network. We assume that each installed cache has a capacity of p (it can store up to p videos) and a cost of c_v if it is installed at vertex v. In some instances, we might assume that all installation costs are the same: c_v = c for all v ∈ V. To model the behavior of the cache with respect to the dynamic arrival of requests, we assume that the videos present in a cache are the p most popular ones, the popularity being measured at each potential location (a tree vertex) and taking into account the placement of the other caches. The problem is illustrated on a small 7-node instance in Figure 1.

[Figure 1: the same tree drawn twice, with server S, internal nodes 1 to 3 and leaves 4 to 7. Each arc is labeled with the pair of popularities it carries, and each leaf has popularities [3,2].]

Fig. 1 A cache location problem instance with m = 2 and p = 1. The quantities below the leaves represent the popularities of the two videos. The unit bandwidth cost is equal to 1 and each cache has a cost of 10. On the left, a feasible solution where caches are installed on nodes 1, 2 and 6. The most popular video is stored in nodes 2 and 6, whereas the least popular one is stored in the root node. This solution has a cost of 61. An optimal solution, on the right, has a total cost of 56.
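The two costs quoted in the caption of Figure 1 can be checked with a short bottom-up evaluation. The sketch below is illustrative only: the function name `evaluate` and the data layout are assumptions, the topology (S → 1, 1 → {2, 3}, 2 → {4, 5}, 3 → {6, 7}) is reconstructed from the arc labels of the figure, and ties between equally popular videos are broken by the lowest video index, a rule the paper does not specify.

```python
def evaluate(children, weight, pop, caches, cache_cost, p, root):
    """Return (total cost, popularity vector sent upward by each vertex)."""
    flows = {}

    def send_up(v):
        kids = children.get(v, [])
        if kids:
            received = [send_up(u) for u in kids]
            potential = [sum(col) for col in zip(*received)]  # summed over the sons
        else:
            potential = list(pop[v])  # a leaf sends its own popularities
        if v in caches:
            # the cache absorbs the p most popular videos seen at v
            ranked = sorted(range(len(potential)), key=lambda k: (-potential[k], k))
            for k in ranked[:p]:
                potential[k] = 0
        flows[v] = potential
        return potential

    send_up(root)
    # weight[v] is the weight of the arc entering v; the root has no entering arc
    transport = sum(weight[v] * sum(f) for v, f in flows.items() if v != root)
    return cache_cost * len(caches) + transport, flows


# Instance of Figure 1 (topology reconstructed from the figure labels).
children = {"S": [1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
weight = {v: 1 for v in range(1, 8)}           # unit bandwidth cost
pop = {leaf: [3, 2] for leaf in (4, 5, 6, 7)}  # popularities of the two videos

left_cost, left_flows = evaluate(children, weight, pop, {1, 2, 6}, 10, 1, "S")
right_cost, right_flows = evaluate(children, weight, pop, {2, 3}, 10, 1, "S")
print(left_cost, right_cost)  # 61 56
```

In the left solution the cache at node 1 sees potential popularities [3, 8] and therefore stores the second video, which is why the globally least popular video ends up cached closest to the server.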

3 A Mixed Integer Programming Model

We use two sets of binary variables: the binary variable y_j is equal to 1 if and only if a cache is located at vertex j, and the binary variable x_j^k is equal to 1 if and only if video k is stored in a cache at node j. The flow variable f_v^k represents the popularity of video k at vertex v. This value results from the popularities expressed by the sons of v. Our cache location problem can then be modeled by the following MIP:

$$
\begin{aligned}
\min \quad & \sum_{j \in V} c_j y_j + \sum_{k \in K} \sum_{i \in V \setminus \{s\}} w_{(\delta^-(i),\, i)}\, f_i^k, & & (1)\\
\text{subject to:} \quad & x_j^k \le y_j, & j \in V,\ k \in K, \quad & (2)\\
& \sum_{k=1}^{m} x_j^k = p\, y_j, & j \in V, \quad & (3)\\
& f_i^k = p_i^k (1 - x_i^k), & i \in T,\ k \in K, \quad & (4)\\
& f_i^k = (1 - x_i^k) \sum_{j \in \delta^+(i)} f_j^k, & i \in S,\ k \in K, \quad & (5)\\
& x_j^k,\ y_j \in \{0, 1\},\ f_j^k \ge 0, & j \in V,\ k \in K. \quad & (6)
\end{aligned}
$$

The objective function combines the installation cost and the access cost. We use (δ−(i), i) to denote the arc from δ−(i), the parent of i, to i. Constraints (2) and (3) are standard location constraints stating that a video can only be stored at a node where a cache is installed and that a cache, when installed, holds p videos. Constraints (4) express the fact that the popularity of video k sent by a leaf node i to its parent is p_i^k if x_i^k = 0 and 0 otherwise. Constraints (5) play a similar role for Steiner nodes, propagating the popularities up the tree depending on the cache locations. Note that these constraints are non-linear, but they can easily be linearized using standard techniques.

Additional constraints should be introduced into the model to ensure that, at node i ∈ V, only the p most popular videos can be stored in a cache. To simplify the expressions, let P_i^k = Σ_{j∈δ+(i)} f_j^k be the potential popularity of video k at node i.

The new constraints should model the fact that, if P_i^k > P_i^{k'} (video k is more popular than video k' at node i) and x_i^{k'} = 1 (video k' is stored at node i), then we must have x_i^k = 1. This can be achieved by the following constraints:

$$
(x_i^k - x_i^{k'})\,(P_i^k - P_i^{k'}) \ge 0. \qquad (7)
$$

These constraints can also be linearized using standard techniques (for example, big-M linearization).
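As an illustration of what such linearizations can look like, here is one possible big-M form for constraints (5) and (7). It is a sketch and not necessarily the formulation used by the authors. The constants M_i^k and M_i are assumptions that do not appear in the paper: M_i^k is any valid upper bound on P_i^k (for instance, the total popularity of video k over the leaves of the subtree rooted at i), and M_i = max_k M_i^k.

```latex
% Constraints (5): f_i^k = (1 - x_i^k) P_i^k, with 0 <= P_i^k <= M_i^k (assumed bound)
\begin{align*}
f_i^k &\le P_i^k, \\
f_i^k &\le M_i^k \,(1 - x_i^k), \\
f_i^k &\ge P_i^k - M_i^k \, x_i^k, \\
f_i^k &\ge 0, \qquad i \in S,\ k \in K.
\end{align*}

% Constraints (7): (x_i^k - x_i^{k'}) (P_i^k - P_i^{k'}) >= 0, with M_i = \max_k M_i^k (assumed)
\begin{align*}
P_i^k - P_i^{k'} \ge -M_i \,\bigl(1 - x_i^k + x_i^{k'}\bigr),
\qquad k, k' \in K,\ k \ne k'.
\end{align*}
```

For the first group, x_i^k = 0 forces f_i^k = P_i^k and x_i^k = 1 forces f_i^k = 0. For the second, the right-hand side is 0 only when x_i^k = 1 and x_i^{k'} = 0, in which case video k must be at least as popular as video k'. In every other case the right-hand side is at most -M_i and the inequality is redundant. Writing the inequality for both orderings of each pair covers the symmetric case.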

4 A Dynamic Programming (DP) Approach

Several dynamic programming approaches have been proposed for location problems in trees (for the p-median problem [15] and for caches with a multicast protocol [11]). We propose a heuristic for our problem based on dynamic programming. Due to space limitations, we only give a sketch of the algorithm. For each node i, the possible values of the popularities f_i^k depend on the popularities propagated by its sons. Hence, if we know, for each son of i, all possible situations (depending on the caches installed in the subtree rooted at that son), we can build the set of all possible situations for i. To reduce the complexity of the procedure, many solutions are eliminated based on cost considerations and on domination between solutions. Unfortunately, this can also eliminate the optimal solution. However, our numerical experiments show that the best solution obtained by dynamic programming is very close to the optimal solution.
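The bottom-up idea sketched above can be illustrated in a few lines of Python. This is not the authors' algorithm, whose elimination rules are not detailed in the paper. It is a hedged sketch under the following assumptions: a "situation" at a vertex is a pair (cost of the subtree including the entering arc, popularity vector sent upward), all caches have the same cost, ties are broken by the lowest video index, and a situation is discarded when another one is no more expensive and sends no more popularity for every video. That dominance rule is a heuristic and can, in general, discard the optimal solution. All names are hypothetical.

```python
from itertools import product


def prune(states):
    """Drop every state beaten by another one on both cost and popularity vector."""
    unique = set(states)

    def beats(a, b):
        return a != b and a[0] <= b[0] and all(x <= y for x, y in zip(a[1], b[1]))

    return [s for s in unique if not any(beats(o, s) for o in unique)]


def subtree_states(v, children, weight, pop, cache_cost, p, root):
    """Non-dominated (cost, upward popularity vector) pairs for the subtree rooted at v."""
    kids = children.get(v, [])
    if kids:
        per_child = [subtree_states(u, children, weight, pop, cache_cost, p, root)
                     for u in kids]
        merged = []
        for combo in product(*per_child):  # one situation chosen for each son
            cost = sum(c for c, _ in combo)
            flow = tuple(sum(col) for col in zip(*(f for _, f in combo)))
            merged.append((cost, flow))
        merged = prune(merged)
    else:
        merged = [(0, tuple(pop[v]))]
    if v == root:  # the server already stores everything: no cache, no entering arc
        return merged
    states = []
    for cost, flow in merged:
        states.append((cost, flow))  # option 1: no cache at v
        ranked = sorted(range(len(flow)), key=lambda k: (-flow[k], k))[:p]
        cached = tuple(0 if k in ranked else f for k, f in enumerate(flow))
        states.append((cost + cache_cost, cached))  # option 2: cache at v
    w = weight[v]  # weight of the arc entering v
    return prune([(c + w * sum(f), f) for c, f in states])


def dp_best_cost(children, weight, pop, cache_cost, p, root):
    return min(c for c, _ in subtree_states(root, children, weight, pop,
                                            cache_cost, p, root))


# Instance of Figure 1 (topology reconstructed from the figure labels).
children = {"S": [1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
weight = {v: 1 for v in range(1, 8)}
pop = {leaf: [3, 2] for leaf in (4, 5, 6, 7)}
best = dp_best_cost(children, weight, pop, cache_cost=10, p=1, root="S")
print(best)  # 56
```

On the instance of Figure 1 this sketch recovers the optimal cost of 56 reported in the caption. The number of situations kept per vertex is what drives the running time, which is why an aggressive elimination rule matters in practice.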

5 Computational Experiments

In this section, we present some numerical results. The MIP formulation has been solved using a MIP solver (Xpress-MP), and the DP algorithm has been implemented. Table 1 shows the computing times needed to solve various instances using either the MIP solver or the DP algorithm. In these tests, p = 2 and m = |K| = 5. All trees have been generated with random parameters following a uniform distribution: number of sons in [1,5], edge capacity in [10,50], edge cost in [1,20]. Moreover, the cache cost and the popularities have been chosen so that the optimal solution is non-trivial (cache cost: 195, popularity between 1 and 5).

Table 1 MIP and DP results

|V|    Xpress    DP    Gap (%)
99     14s       1s    0
106    10s       1s    0
132    73s       1s    0.0008
134    5s        1s    0
236    85s       1s    0
255    76s       1s    0
366    234s      4s    0.004
400    3684s     1s    0

The relative gap between the optimal solution found by the MIP solver and the approximate solution provided by DP is also shown in Table 1. This gap is very small: it never exceeds 0.004% on these instances. Note that the MIP formulation is solved as is, without adding any valid inequalities. In addition, the upper bound provided by DP was used to help the solver find the optimal solution.
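For readers who want to build comparable test instances, the following sketch draws a random tree using the parameter ranges stated above. The paper does not describe the generation procedure itself, so everything else is an assumption: the breadth-first growth, the use of integer-valued uniform draws, the seed handling, and the function name `random_instance`.

```python
import random


def random_instance(n, m=5, seed=0):
    """Grow a random tree with n vertices (vertex 0 is the server) in breadth-first order."""
    rng = random.Random(seed)
    children = {0: []}
    cost, capacity = {}, {}  # indexed by the lower endpoint of each arc
    frontier = [0]
    while len(children) < n:
        v = frontier.pop(0)
        for _ in range(rng.randint(1, 5)):      # number of sons in [1, 5]
            if len(children) == n:
                break
            u = len(children)                   # next unused vertex identifier
            children[u] = []
            children[v].append(u)
            cost[u] = rng.randint(1, 20)        # edge cost in [1, 20]
            capacity[u] = rng.randint(10, 50)   # edge capacity in [10, 50]
            frontier.append(u)
    # popularity between 1 and 5 for each of the m videos, at every leaf
    pop = {v: [rng.randint(1, 5) for _ in range(m)]
           for v, kids in children.items() if not kids}
    return children, cost, capacity, pop


children, cost, capacity, pop = random_instance(99)
print(len(children), len(pop))  # number of vertices, number of leaves
```

Because the growth stops as soon as the vertex budget is reached, the last expanded vertex may receive fewer sons than were drawn for it, but every internal vertex still has between 1 and 5 sons.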


6 Conclusion

We have proposed a somewhat more accurate model for the optimal location of transparent caches in a tree, taking into account the propagation of content popularities. We have given a MIP formulation for this problem and provided a dynamic programming algorithm. The preliminary computational experiments show that both approaches are promising and might solve realistically sized instances. Strengthening the MIP formulation with valid inequalities is the next research step. Capacity constraints can also be taken into account, both in the MIP formulation and in the dynamic programming algorithm. Another generalization considers the case where several servers are available, with a rooted tree for each server. A node might then belong to several trees, which makes the problem more complicated.

References

1. Asit Dan and Don Towsley. An approximate analysis of the LRU and FIFO buffer replacement schemes. SIGMETRICS Perform. Eval. Rev., 18(1):143–152, 1990.
2. Pasquale Avella, Maurizio Boccia, Roberto Canonico, Donato Emma, Antonio Sforza, and Giorgio Ventre. Web cache location and network design in VPNs, 2003.
3. E. Cronin, S. Jamin, C. Danny, and R. Yuval. Constrained mirror placement on the Internet, 2002.
4. P.B. Danzig, R.S. Hall, and M.F. Schwartz. A case for caching file objects inside internetworks. SIGCOMM '93, pages 239–243, 1993.
5. Philippe Flajolet, Danièle Gardy, and Loÿs Thimonier. Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discrete Appl. Math., 39(3):207–229, 1992.
6. Erol Gelenbe. A unified approach to the evaluation of a class of replacement algorithms. IEEE Trans. Comput., 22(6):611–618, 1973.
7. S.L. Hakimi and E.F. Schmeichel. Locating replicas of a database on a network. Networks, 30(1):31–36, 1997.
8. P. R. Jelenkovic. Asymptotic approximation of the move-to-front search cost distribution and least-recently-used caching fault probabilities. Annals of Applied Probability, 9(2):430–464, 1999.
9. O. Kariv and L. Hakimi. An algorithmic approach to network location problems. II: The p-medians. SIAM J. Appl. Math., 37(3):539–560, 1979.
10. P. Krishnan, Danny Raz, and Yuval Shavitt. The cache location problem. IEEE/ACM Transactions on Networking, 8(5):568–582, 2000.
11. H. Luss. Optimal content distribution in video-on-demand tree networks. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 40(1), 2010.
12. A. M. K. Pathan and R. Buyya. A taxonomy and survey of CDNs. Technical report, The University of Melbourne, 2007.
13. Stefan Podlipnig and Laszlo Böszörmenyi. A survey of web cache replacement strategies. ACM Computing Surveys (CSUR), 35(4):374–398, 2003.
14. J. Reese. Solution methods for the p-median problem: An annotated bibliography. Networks, 19:125–142, 2006.
15. A. Tamir. An O(pn^2) algorithm for the p-median and other related problems on tree graphs. Operations Research Letters, 19:59–64, 1996.