Towards the Design of Optimal Data Redundancy Schemes for Heterogeneous Cloud Storage Infrastructures✩

Lluis Pamies-Juarez, Pedro García-López, Marc Sánchez-Artigas, Blas Herrera

Universitat Rovira i Virgili
Department of Computer Engineering and Maths
Av. Països Catalans 26, 43007 Tarragona, Spain

✩An early version of this work was presented at ICPP'09 [26].
Email addresses: [email protected] (Lluis Pamies-Juarez), [email protected] (Pedro García-López), [email protected] (Marc Sánchez-Artigas), [email protected] (Blas Herrera)

Abstract

Nowadays, data storage requirements from end-users are growing, demanding more capacity, more reliability and the capability to access information from anywhere. Cloud storage services meet this demand by providing transparent and reliable storage solutions. Most of these solutions are built on distributed infrastructures that rely on data redundancy to guarantee 100% data availability. Unfortunately, existing redundancy schemes very often assume that resources are homogeneous, an assumption that may increase storage costs in heterogeneous infrastructures, e.g., clouds built of voluntary resources. In this work, we analyze how distributed redundancy schemes can be optimally deployed over heterogeneous infrastructures. Specifically, we are interested in infrastructures where nodes present different online availabilities. Considering these heterogeneities, we present a mechanism to measure data availability more precisely than existing works. Using this mechanism, we infer the optimal data placement policy that reduces the redundancy used, and thus its associated overheads. In heterogeneous settings, our results show that data redundancy can be reduced by up to 70%.

Key words: cloud storage, distributed storage, data availability, erasure codes

1. Introduction

We are witnessing today the rapid proliferation of Cloud storage services as a means to provision reliable storage and backup of files. Amazon S3 [1] is a representative example, but so are Mosso [27], Wuala [38] and Cleversafe [10]. All of these services offer users clean and simple storage interfaces, hiding the details of the actual location and management of resources. Most of these clouds (e.g., [1, 27]) are built on well-provisioned and well-managed infrastructures, typically data centers, that are responsible for provisioning users with storage services. Very often, these data centers are controlled exclusively by cloud providers (e.g., Amazon, Google, Microsoft, etc.), whereas the user pays a price for the use of their resources. There is also the notion of a resource-performance guarantee between the cloud provider and the user, which ensures that users see the performance they expect to see.

Current cloud storage infrastructures are focused on providing users with easy interfaces and high-performance services. However, there are some classes of storage services for which the current cloud model may not fit well. For example, consider a research institute that wishes to freely share its results with other institutions as a "public service" (e.g., in the form of a digital repository to make protein research more accessible to scientists), but requires deployment resources. Since this service may not be commercial, the service deployers may not want to pay the cost of running the service. To host such services, Chandra and Weissman [7] proposed the idea of using voluntary resources (those donated by end-users in @home systems [2] and peer-to-peer (P2P) networks [33]) to form nebulas, i.e., more dispersed, less managed clouds. Nebulas draw on many of the ideas advanced in Grids, P2P systems and distributed data centers [9]. For Cloud storage, volunteer resources are attractive for several reasons:

• Scalability: Many volunteer systems (e.g., @home systems) consist of millions of hosts, thereby providing a large amount of resource capacity and scalability;

• Geographic dispersion: Volunteer resources are highly distributed (e.g., P2P systems); and

• Low cost of deployment: Volunteer resources are available for free or at very low cost, which implies that a large amount of disk capacity is available for storage services.

As an example, a large nebula may run a cloud service over an @home system such as Folding@home [32]. This platform has approximately 250,000 hosts providing over 1 Petaflop, which makes the Folding@home platform comparable to some of the fastest supercomputers in the world today. Since in these settings there is a high degree of heterogeneity, both in terms of storage and computational capacity, along with churn and failures, the provision of reliability poses a huge challenge to democratizing cloud computing.

This vision is also advocated by the developers of Open Cirrus [5]. Open Cirrus is a cloud testbed for the research community that federates heterogeneous distributed data centers to offer global services, such as sign-on, monitoring and storage. To be operative for the community, handling heterogeneity is therefore critical in these cloud platforms. Actually, heterogeneity will be one of the major challenges of cloud computing platforms in the future [35]. To meet this challenge, Chandra and Weissman [7] state that the solution resides in incorporating the reliability of nodes into service deployment decisions, in addition to basic criteria such as computational speed and storage capacity. For Cloud storage this means that the storage process must be aware of the stability of nodes when it stores data on them. Since preserving data availability requires redundancy and scattering

each object into multiple hosts, considering the heterogeneity of hosts is essential to optimize the amount of redundancy. An excess of redundancy results in a higher storage and communication burden, with no benefit for the user. Unfortunately, existing storage solutions [14, 25, 29, 37] have considered homogeneous settings, where all nodes are treated equally regarding their online/offline behavior. Although this model is appropriate for commercial clouds, it can be tricky for clouds built of heterogeneous hosts such as nebulas and distributed data centers.

Goals. The aim of this paper is to examine whether Cloud storage systems can reduce the required redundancy by considering heterogeneous node availabilities. At first glance, it seems quite intuitive that one can increase data availability by assigning more redundant information to the most stable nodes. However, if we take this approach to the extreme, i.e., by considering only the most stable nodes, we may experience a decrease in data availability for the simple reason that there are fewer hosts over which to distribute the same amount of redundancy. This illustrates the importance of finding an optimal trade-off between the number of hosts and their online availability, which is the novel contribution of this article. By finding the appropriate trade-off, cloud systems will be able to maximize the data availability they provide while reducing the redundancy required to do it. However, this maximization implies that availability-aware cloud systems should be able:

• To measure data availability accurately;

• To determine the optimal quantity of information to be assigned to each host; and

• To find the minimum data redundancy that maintains the targeted data availability.

Throughout the manuscript, we will build our results considering a redundancy scheme based on generic erasure codes. The main reason is that erasure codes are more flexible than traditional schemes based on replication, a feature that makes them very appropriate for heterogeneous platforms. Additionally, they are usually more effective in terms of data overhead [25, 29, 36] than replication.

Contributions. In this article, we address the above three aspects for building a heterogeneity-aware cloud storage system. Our three main contributions can be stated in the same fashion:

1. We develop an analytical framework to compute data availability in heterogeneous environments. Since it is computationally costly to calculate the data availability when the set of storage nodes grows, we propose a Monte Carlo method to estimate the real value in a less computationally expensive way.

2. Since determining the optimal amount of redundancy to be assigned to each host is computationally hard, we propose a novel heuristic based on Particle Swarm Optimization (PSO) [21]. From our results, we infer a simple allocation function to optimally find the minimum redundancy required.

3. Finally, we provide a simple iterative algorithm to determine the minimum redundancy required to guarantee the data availability requirements of different possible storage applications.

Our theoretical and simulation-based results show that a storage system implementing our heterogeneity-aware erasure code scheme could reduce redundancy by up to 70% in highly heterogeneous clouds.

The rest of the paper is organized as follows. Section 2 describes related work. In Sections 3 and 4 we analytically pose our storage model and the problem statement. Then, the successive sections address each of the three stated contributions. Section 5 describes how data availability can be measured in heterogeneous environments, gives the computational complexity of this measurement, and proposes a Monte Carlo method to approximate this value when it cannot be analytically measured. In Section 6 we find the optimal number of redundant blocks assigned to each storage node, and in Section 7 we find the data redundancy required to guarantee high data availability. Section 8 presents the benefits of heterogeneity-aware redundancy schemes. Finally, in Section 9 we present the conclusions and further research lines.

2. Related Work

Cloud storage services have gained popularity in recent years [1, 10, 27, 38]. These services allow users to store data off-site, masking the complexities of the infrastructure supporting the storage service. To store large amounts of data from thousands of users, cloud storage systems build their services over distributed storage infrastructures, which are more scalable and more reliable than centralized solutions. Different works have proposed distributed data storage infrastructures, some built on grid infrastructures [2, 13, 28], and some others built on P2P overlays [11, 17, 24, 38, 39]. To achieve the desired data availability, all these infrastructures need to store data with redundancy. However, since the use of redundancy increases storage and communication costs, optimizing this redundancy is key to keeping the system scalable.

By reducing redundancy, distributed storage systems can reduce storage and communication costs. However, there is a theoretical limit [3] beyond which communication costs cannot be reduced without increasing storage costs, and vice versa. Traditionally, storage systems focused on reaching this theoretical limit by balancing costs among peers [8, 12, 16, 22, 31] and designing redundancy schemes able to provide better communication-storage trade-offs [14, 15, 37]. Unfortunately, all these works build their solutions over a common constraint: they assume homogeneous properties for all the storage nodes in the system.

The main objective of this paper is to optimize distributed storage systems by considering the real heterogeneities present in cloud systems [5], or any other existing distributed systems. More specifically, we focus on reducing redundancy by considering the individual online/offline patterns of nodes. In [13], Deng and Wang considered node heterogeneities in grid storage systems. However, they did not exploit such heterogeneities to reduce the amount of data redundancy required. To the best of our knowledge, we are the first to exploit node heterogeneities in that way. In [26], we presented an early version of this work

where we exploited heterogeneities using simple heuristic approaches. In this paper, we go one step further and demonstrate how this simple heuristic can be inferred from an analytical optimization process. Additionally, in [26], the optimization was only evaluated on small sets of nodes. In this paper, we propose additional algorithms to extend our results to any node set size.

3. Data Storage Model

Throughout this paper, we will consider a distributed storage system working in its steady state: the number of nodes and the number of stored objects are kept constant. Let H represent the set of all the storage nodes in the system. Then, the storage process stores each data object in a small subset of nodes, N ⊂ H. In this work we do not treat how nodes in N are chosen. A possible solution is to ask a centralized directory service for a list of nodes with free storage capacity. However, for the sake of simplicity we will consider that N is a random sample of H. We will analyze the impact of the size of N in the next sections.

Additionally, let us denote by A(t) and U(t) the two subsets of N containing respectively the available and unavailable nodes at time t. Clearly, N = A(t) ∪ U(t), ∀t ≥ 0. Considering this notation, we define the mean online availability, a_i, of a node i as:

a_i = (1/τ_i) ∫_{t_{0,i}}^{t_{0,i}+τ_i} Pr[i ∈ A(t)] dt,  ∀i ∈ N,

where t_{0,i} is the instant when node i joined the system and τ_i its lifetime in the system. We assume that after monitoring each node for a long period of time, we can obtain a good estimate of the mean node availability a_i. It is beyond the scope of this work to discuss how these estimates are measured. One possible solution is to use a centralized entity which continuously monitors nodes and serves the computed estimators to the storage nodes that need this information.

Having defined the storage nodes, we will define the storage system by splitting it into two different processes: the storage process and the recovery process. The storage process uses a ⟨k, n⟩-erasure code to store objects in the system. This means that each data object is split into k equal-sized chunks. Then, these chunks are used to generate n redundant blocks (n ≥ k), which are finally stored in the set of storage nodes N. The redundancy ratio of this scheme, r, can be measured as r = n/k. For instance, a ⟨16, 48⟩ code yields a redundancy ratio r = 3.
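As a side note on the availability estimates assumed above, a node's mean online availability a_i can be computed from monitoring logs; a minimal sketch (Python; the function name and data layout are ours, not part of the system described):

import statements are unnecessary here; pure arithmetic suffices:

def mean_availability(online_intervals, t0, lifetime):
    """Empirical estimate of a_i: fraction of node i's lifetime
    [t0, t0 + lifetime] spent online, given (start, end) online intervals."""
    t_end = t0 + lifetime
    online = sum(max(0.0, min(end, t_end) - max(start, t0))
                 for start, end in online_intervals)
    return online / lifetime

# A node monitored for 100 hours, online during three sessions:
print(mean_availability([(0, 30), (40, 70), (90, 100)], t0=0, lifetime=100))  # 0.7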

To define the number of blocks that each node i ∈ N holds, the storage process uses the assignment function g : N → ℕ, where g(i) represents the number of redundant blocks assigned to node i. This function must ensure that Σ_{i∈N} g(i) = n, and g(i) ≥ 1, ∀i ∈ N.

Using erasure codes, the storage process needs to decide which redundancy r is required to store each data object. For that purpose, the storage process generates a large "pool" of redundant blocks so that n > |N|. Finally, these blocks are scattered and stored using the nodes in N. The storage process sets n to a static value and then varies k to obtain different redundancy ratios, and thus different data availabilities. According to the size of N and the value of n, the storage process can work in three different ways:

1. n > |N|: There are more redundant blocks than storage nodes. The storage process has to decide which nodes store more blocks.

2. n = |N|: There are exactly as many redundant blocks as storage nodes. The storage process stores one block on each node.

3. n < |N|: There are fewer redundant blocks than storage nodes. Since some nodes will not store any block, we can consider this case as having a smaller set of storage nodes N′ where n = |N′|. Then, this case reduces to case 2.

Further, the recovery process is responsible for retrieving k out of the n redundant blocks and reconstructing the original data object. In order to guarantee with high probability that the recovery process succeeds, the storage system must ensure that there are always at least k redundant blocks available, i.e., stored on online nodes. We will refer to this probability as data availability. Let G(t) = Σ_{i∈A(t)} g(i) be a discrete random variable representing the number of blocks stored on online nodes at instant t. Using this random variable, data availability d can be defined as:

d = Pr[G(t) ≥ k],  ∀t ≥ 0.  (1)

This data availability d represents the probability of finding at least k redundant blocks online. Since we cannot assume that nodes are always available, it is impossible to guarantee perfect data availability (d = 1). The aim of the storage process is to guarantee that the data availability d is higher than a desired availability d̂, such that d ≥ d̂. It means that the probability of finding at least k redundant blocks online is higher than d̂.

Normally d̂ would take values of several nines, e.g., 0.999 for three nines of availability. Nonetheless, in order to reduce the storage overheads, this data availability d should be achieved while keeping the data redundancy r as low as possible (since k ≤ n, then r ≥ 1).

4. Problem Statement

Existing storage systems do not consider the individual node availabilities, a_i, and use the mean node availability ā as the basis of their models. Recall that the mean node availability is given by,

ā = (1/|N|) Σ_{i∈N} a_i.  (2)

Without considering heterogeneities, all nodes are treated equally, and then the assignment function stores the same number of redundant blocks on all nodes. Doing the contrary would lead to the superfluous decision of choosing on which nodes to store more redundancy. Under homogeneous conditions, storage systems assume n = |N|. Then, the assignment function becomes g(i) = 1, ∀i ∈ N. However, as we will show in Section 8, this homogeneous strategy leads storage systems to imprecise results that could make them use more redundancy than is required. Optimizing this redundancy is crucial for any storage system aiming to reduce the system's costs: bandwidth and storage overheads.

In this paper, we study the impact of considering real availabilities in distributed storage systems and how this heterogeneity can be exploited to reduce the amount of redundancy required to achieve the desired availability d̂. Our methodology is divided into three main steps:

1. Providing an analytical framework to measure the real data availability, d, in a heterogeneous scenario.

2. Given a fixed data redundancy, designing an assignment function g which maximizes d.

3. Finding the minimum data redundancy that guarantees that the data availability obtained is greater than the desired one, d ≥ d̂.

We will address each of these steps in the following sections.

5. Measuring data availability

In this section, we provide the basic framework to measure data availability, d, in a specific storage scenario. As we previously defined, data availability is a probabilistic metric that depends on several parameters. Among all these parameters, we are interested in measuring d as a function of: (1) the erasure code's parameters k and n, (2) the assignment function g, and (3) the set of nodes N. Hence, we aim to obtain a function D to measure data availability, such that d = D(k, n, g, N).

In Section 5.1, we define D for a generic heterogeneous environment. Using this expression, we can measure d for all possible combinations of heterogeneous nodes. However, due to its inherent complexity, this function cannot be used when the set of storage nodes N contains more than 20 nodes, |N| > 20. In Sections 5.2 and 5.3 we provide two simplifications of the generic expression, applicable under two different assumptions. On the one hand, in Section 5.2, we give an expression for a scenario where node heterogeneities are not considered (e.g., when all node availabilities are very similar). On the other hand, in Section 5.3, we simplify the expression for heterogeneous environments where node availabilities can be represented by a small collection of values (clustered availabilities). Finally, when these assumptions cannot be used, we propose a Monte Carlo algorithm to approximate d for any set of storage nodes in Section 5.4.

5.1. Assuming Heterogeneous Node Availabilities

Let the power set of N, 2^N, denote the set of all possible combinations of online nodes. Let also A, A ∈ 2^N, represent one of these possible combinations. Then, we will refer as Q_A to the event that the combination A occurs. Since node availabilities are independent, we have that:

Pr[Q_A] = Π_{i∈A} a_i · Π_{i∈N\A} (1 − a_i).  (3)

Additionally, let L_k, L_k ⊂ 2^N, be the subset containing those combinations of available nodes which together store at least k redundant blocks. Then,

L_k = { A : A ∈ 2^N, Σ_{i∈A} g(i) ≥ k }.  (4)

Since we are analyzing the system in its steady state, we will refer to the number of online blocks at any moment, G(t), simply as G, with G = Σ_{i∈A} g(i). Using this formulation, we can rewrite equation (1) as:

d = Pr[G ≥ k] = Σ_{A∈2^N} Pr[G ≥ k | Q_A].  (5)

From the definition of L_k, equation (4), we have that:

Pr[G ≥ k | Q_A] = Pr[Q_A] if A ∈ L_k, and 0 otherwise,  (6)

and then, using equations (6) and (3), we can rewrite equation (5) as follows:

d = Σ_{A∈L_k} Pr[Q_A] = Σ_{A∈L_k} ( Π_{i∈A} a_i · Π_{i∈N\A} (1 − a_i) ),  (7)

which is the implementation of D for the generic heterogeneous case. The main drawback of the above implementation is the computation of L_k from 2^N when N is large.

Knuth treats in depth the subject of generating combinations in his monograph [23]. Although the algorithms to generate these combinations are not complex, |L_k| can become very large as N grows, making the measurement of d computationally intractable for large sets of storage nodes. We know from equation (4) that the number of combinations of available nodes in L_k increases when we decrease k. Then, the most computationally complex case of measuring d using (7) is when k = 1. We use this case for the complexity analysis. Let us consider the simplest scenario where g(i) = 1, ∀i ∈ N, and n = |N|. Then, L_1 = 2^N \ {∅}, and hence, |L_1| = 2^{|N|} − 1. This means that measuring data availability using the generic heterogeneous expression has O(2^n) complexity. Due to this, it is unfeasible to measure d with expression (7) in real applications using large N sets. On a typical desktop computer, this measurement does not finish in less than an hour for storage sets larger than 20 nodes.
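To fix ideas, a direct implementation of equation (7) might look as follows; it is a sketch in Python (function name and example values are ours), with the same exponential cost discussed above:

from itertools import combinations

def availability_exact(avail, assign, k):
    """Exact data availability d (eq. 7): sums Pr[Q_A] over every
    combination A of online nodes storing at least k blocks (the set L_k).
    avail[i] = a_i, assign[i] = g(i). Complexity O(2^|N|)."""
    nodes = range(len(avail))
    d = 0.0
    for size in range(len(avail) + 1):
        for subset in combinations(nodes, size):
            online = set(subset)
            if sum(assign[i] for i in online) < k:
                continue
            p = 1.0
            for i in nodes:
                p *= avail[i] if i in online else 1.0 - avail[i]
            d += p
    return d

# Tiny illustrative instance: 3 nodes, g = (2, 1, 1), k = 2.
print(availability_exact([0.9, 0.6, 0.5], [2, 1, 1], k=2))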

5.2. Assuming Homogeneous Node Availabilities

Previous works on distributed storage systems have assumed homogeneous node availabilities for all participant nodes. In this section, we show how the function used in equation (7) can be simplified when we assume homogeneous node availabilities.

Assumption 1 (Homogeneous Assumption). As described in Section 4, when we consider homogeneous node availabilities, we assume that n = |N| and that each node stores exactly one data block, g(i) = 1, ∀i ∈ N.

Considering the Homogeneous Assumption, we can redefine (in this particular case) L_k from equation (4) as:

L_k = { A : A ∈ 2^N, |A| ≥ k }.  (8)

Remark 1 (Mean Node Availability). In the homogeneous case, a_i = ā, ∀i ∈ N, where ā is the mean node availability defined in equation (2).

Under the Homogeneous Assumption, the complexity of the data availability measurement can be significantly reduced. First of all, we use Remark 1 to restate equation (3) as:

Pr[Q_A] = Π_{i∈A} ā · Π_{i∈N\A} (1 − ā) = ā^{|A|} (1 − ā)^{|N\A|}.  (9)

Let us define the subset L′_x as the subset containing the combinations of available nodes that altogether store exactly x blocks: L′_x = {L : L ∈ L_k, |L| = x}. Under the Homogeneous Assumption this subset satisfies |L′_x| = C(n, x), the binomial coefficient "n choose x". Using this subset, one can rewrite equation (7) for the homogeneous case as:

d = Σ_{A∈L_k} Pr[Q_A] = Σ_{x=k}^{n} Σ_{A∈L′_x} Pr[Q_A].  (10)

Then we can also formulate the following lemma:

Lemma 1. Pr[Q_{A_1}] = Pr[Q_{A_2}] for all A_1, A_2 ∈ L′_x and x ∈ [1, n].

Proof. We prove it by contradiction. Suppose Pr[Q_{A_1}] ≠ Pr[Q_{A_2}], that is,

ā^{|A_1|} (1 − ā)^{|N\A_1|} ≠ ā^{|A_2|} (1 − ā)^{|N\A_2|}.

However, since A_1, A_2 ∈ L′_x, by definition of L′_x, |A_1| = |A_2| = x, and then,

ā^x (1 − ā)^{|N|−x} ≠ ā^x (1 − ā)^{|N|−x},

which is a contradiction, and the lemma follows.

Using Lemma 1 and considering that |L′_x| = C(n, x), we can rewrite (10) as follows:

d = Σ_{x=k}^{n} C(n, x) Pr[Q_A] = Σ_{x=k}^{n} C(n, x) ā^x (1 − ā)^{n−x}.  (11)

This expression is the traditional availability expression used in previous studies [25, 29, 37]. Note that it corresponds to the complementary cumulative distribution function of the binomial distribution with probability ā and population n. Measuring data availability d with the homogeneous expression (11) has O(n) complexity.

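For reference, equation (11) is essentially a one-liner; a sketch in Python (the parameter values in the example are arbitrary):

from math import comb

def availability_homogeneous(k, n, a_mean):
    """Data availability d (eq. 11): complementary binomial CDF, i.e. the
    probability that at least k of the n blocks sit on online nodes."""
    return sum(comb(n, x) * a_mean**x * (1.0 - a_mean)**(n - x)
               for x in range(k, n + 1))

# Example: a <16, 32>-erasure code over nodes with mean availability 0.7.
print(availability_homogeneous(16, 32, 0.7))  # roughly 0.996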

5.3. Clustering Availability

As we showed in Section 5.1, measuring data availability can become computationally intractable for large N sets. The reason for this complexity is that all nodes have different availabilities, and then the number of different combinations of online nodes is very large. However, this complexity can be reduced when groups of nodes in N share the same online availability. This property could appear in distributed data centers where nodes within a data center are homogeneous (e.g., data centers built of racks with identical computers), but nodes from different data centers have different properties.

For example, let us consider a storage system with three storage nodes i_1, i_2 and i_3. The heterogeneous measurement should consider 8 different combinations of online nodes (2^3). However, when a_{i_1} = a_{i_2}, the assignment function (which only depends on the node availability) satisfies g(i_1) = g(i_2). In this case, it is easy to see that the online combinations A_1 = {i_1, i_3} and A_2 = {i_2, i_3} contain the same number of redundant blocks, i.e., g(i_1) + g(i_3) = g(i_2) + g(i_3), and occur with the same probability, Pr[Q_{A_1}] = Pr[Q_{A_2}] (equation (3)). In that case, the heterogeneous measurement method can detect that some online combinations occur with the same probability. In this particular case, {i_1} is equivalent to {i_2} and {i_1, i_3} to {i_2, i_3}, which reduces the total number of summands from 8 to 6.

This property can be used to reduce the measurement complexity by creating clusters of nodes with similar availabilities and by considering the centroid availability of each cluster as the online availability of all its members. This technique reduces the number of combinations of online nodes to consider. Let C denote the set of centroid availabilities. Then, for each centroid availability a ∈ C, N_a represents the a-cluster,

N_a = {i : i ∈ N, a_i = a ± ε},  (12)

where ε defines the maximum availability error inside the cluster. These cluster sets should verify that

N = ∪_{a∈C} N_a.  (13)

Assumption 2 (Cluster Homogeneity). Since all nodes in the same cluster are considered to have the same availability, the assignment function will treat them equally. It means that g(i_1) = g(i_2), ∀i_1, i_2 ∈ N_a. In this section, we will simply refer to g(a) as the number of redundant blocks stored in each node belonging to the a-cluster, N_a.

To represent all the possible combinations of online nodes in a cluster, we define the set of tuples Z_a. Each tuple in Z_a contains 1) the total number of nodes in the cluster, 2) the mean node availability in the cluster, and 3) the number of online nodes:

Z_a = {(|N_a|, a, x)}_{x=0}^{|N_a|}.


Since we consider that all nodes in a cluster have the same online availability, the number of online nodes in a cluster follows a binomial distribution with probability a and population |N_a|. For each z ∈ Z_a, we can measure the probability of finding z_3 online nodes in N_a as f(z_3; z_1, z_2), where f is the probability mass function (p.m.f.) of the binomial distribution, and z_i stands for the ith element in the tuple z.

In the same way that 2^N contained all the combinations of online nodes for the generic heterogeneous case, the Cartesian product Π_{a∈C} Z_a contains all the possible combinations of online nodes for the clustered case. Each combination of online nodes A, A ∈ Π_{a∈C} Z_a, contains |C| tuples defining the number of online nodes in each cluster. Then, Q_A represents the event that the combination of available nodes A happens. Since node availabilities are independent, and the number of online nodes in a cluster follows a binomial distribution, we can measure the probability of Q_A as:

Pr[Q_A] = Π_{z∈A} f(z_3; z_1, z_2).  (14)

We additionally define Z_k as the set of combinations where the available nodes store at least k redundant blocks:

Z_k = { A : A ∈ Π_{a∈C} Z_a, Σ_{z∈A} z_3 · g(z_2) ≥ k }.  (15)

Then, using this notation, we can rewrite equation (6) as follows:

Pr[G ≥ k | Q_A] = Pr[Q_A] if A ∈ Z_k, and 0 otherwise.  (16)

Finally, we can measure d using equation (5):

d = Σ_{A∈Z_k} [ Π_{z∈A} f(z_3; z_1, z_2) ] = Σ_{A∈Z_k} [ Π_{z∈A} C(z_1, z_3) z_2^{z_3} (1 − z_2)^{z_1−z_3} ].  (17)
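The clustered expression (17) can be evaluated by enumerating the per-cluster online-node counts; a sketch in Python (the cluster parameters in the example are made up for illustration):

from itertools import product
from math import comb

def availability_clustered(clusters, k):
    """Clustered data availability d (eq. 17). `clusters` maps a centroid
    availability a to a pair (|N_a|, g(a)); the loop enumerates the
    Cartesian product of per-cluster online-node counts (the sets Z_a)."""
    items = list(clusters.items())  # [(a, (size, g_a)), ...]
    d = 0.0
    for counts in product(*(range(size + 1) for _, (size, _) in items)):
        blocks = sum(x * g for x, (_, (_, g)) in zip(counts, items))
        if blocks < k:
            continue
        p = 1.0
        for x, (a, (size, _)) in zip(counts, items):
            p *= comb(size, x) * a**x * (1.0 - a)**(size - x)
        d += p
    return d

# Two illustrative clusters: 10 nodes at a=0.9 storing 2 blocks each,
# and 20 nodes at a=0.4 storing 1 block each (n = 40), with k = 15.
print(availability_clustered({0.9: (10, 2), 0.4: (20, 1)}, k=15))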

The complexity of measuring d in the clustered case depends on the size of Z_k. As in Section 5.1, the worst case happens when k = 1 and g(a) = 1, ∀a ∈ C. Under this constraint, Z_1 = Π_{a∈C} Z_a \ {(|N_a|, a, 0)}_{a∈C}. Since there are |N_a| combinations in Z_a with at least one online node, |Z_1| = Π_{a∈C} |N_a|. However, the complexity of measuring d still depends on how the clusters were built: the number of clusters and their sizes. To find the worst-case cluster scenario we need the following remark and lemma:

Remark 2. The inequality of arithmetic and geometric means [6] (also known as the AM-GM inequality) states that

(1/n) Σ_{i=1}^{n} x_i ≥ (Π_{i=1}^{n} x_i)^{1/n},

with equality only when x_1 = x_2 = · · · = x_n.



Figure 1: Computational Complexity: Number of summands required to measure d as a function of the number of clusters, c, for storage set sizes from 10 to 60 nodes. Note that c = 1 corresponds to the heterogeneous case.

Lemma 2. The worst scenario (where d is most complex to measure) is when all clusters have the same size, |N_a| = |N|/|C|, ∀a ∈ C.

Proof. When all clusters are equal-sized we have that

|Z_1| = Π_{a∈C} |N_a| = (|N|/|C|)^{|C|}.  (18)

We want to prove that this value is the greatest possible for |Z_1|, hence,

(|N|/|C|)^{|C|} ≥ Π_{a∈C} |N_a|,
|N|/|C| ≥ (Π_{a∈C} |N_a|)^{1/|C|}.

Then, since |N| = Σ_{a∈C} |N_a|,

(1/|C|) Σ_{a∈C} |N_a| ≥ (Π_{a∈C} |N_a|)^{1/|C|},

which is true by Remark 2, and the lemma follows.

Finally, considering Lemma 2 and eq. (18), it follows that the complexity of measuring data availability d under the clustered assumption is still O(2^n). Although both expressions for d, the heterogeneous expression (7) and the clustered expression (17), have complexity O(2^n), the hidden constants for the clustered case are smaller. In Figure 1 we evaluate the computational complexity of the clustered version for different numbers of clusters, c = |C|, and compare the number of summands required to measure d in both cases. Considering that on a typical desktop computer we were unable to measure d for sets larger than 20 nodes, the results show that for measuring d in sets of up to 50 nodes, four or fewer clusters may reduce computation time significantly. Although using fewer than 8 clusters could be useful for measuring d in sets from 20 to 50 nodes, we need other tools to measure d for larger node sets.

5.4. Monte Carlo Approximation

As we showed in previous sections, it is unfeasible to measure the exact data availability for large heterogeneous storage sets. However, if we want to exploit heterogeneity (finding the optimal assignment function g and the optimal redundancy) we need at least an approximate value for d. In this section, we use a Monte Carlo method to obtain this approximate value. The main idea behind this technique is to simulate the real behavior of the storage system and empirically measure the obtained data availability. In this method, we randomly generate a set, S_ω, of ω samples drawn from 2^N. This set contains ω possible combinations of online nodes. Each combination A ∈ S_ω is chosen considering the individual availabilities a_i of each node as follows: Pr[i ∈ A] = a_i, ∀A ∈ S_ω, ∀i ∈ N. Using this notation we can obtain an approximation of the real data availability d,

d_ω = |{ A : A ∈ S_ω, Σ_{i∈A} g(i) ≥ k }| / ω,  (19)

which tends to the real d value as the random sample grows, d = lim_{ω→∞} d_ω. Algorithm 1 reflects how this value can be easily measured using an iterative method. Since the computation time of d_ω is directly proportional to ω, we denote by ω̂ the minimum ω value that guarantees a low availability error ε:

ω̂ := min{ ω : (d_ω − d)² ≤ ε }.  (20)

To find ω̂ we initially set ω = 1 and measure the variance of a sample of 100 different d_ω values. Then we increase ω one by one until the variance of this sample is lower than ε, which ensures that equation (20) is satisfied. We can see the details of this process in Algorithm 1.


Algorithm 1 Measuring d_ω.

    successes ← 0
    iterations ← ω
    while iterations > 0 do
        blocks ← 0
        for i ∈ N do
            if rand() ≤ a_i then
                blocks ← blocks + g(i)
            end if
        end for
        if blocks ≥ k then
            successes ← successes + 1
        end if
        iterations ← iterations − 1
    end while
    d_ω ← successes/ω
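A direct Python transcription of Algorithm 1 follows (a sketch; the availabilities and block assignment in the example are invented for illustration):

import random

def estimate_availability(avail, assign, k, omega, seed=None):
    """Monte Carlo estimate d_omega (eq. 19): fraction of omega sampled
    online/offline configurations that hold at least k redundant blocks."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(omega):
        blocks = sum(g for a, g in zip(avail, assign) if rng.random() <= a)
        if blocks >= k:
            successes += 1
    return successes / omega

# Hypothetical 10-node set with a proportional block assignment (n = 100).
avail = [0.9, 0.8, 0.8, 0.7, 0.6, 0.5, 0.5, 0.4, 0.3, 0.2]
assign = [16, 14, 14, 12, 10, 9, 9, 7, 5, 4]
print(estimate_availability(avail, assign, k=50, omega=100_000, seed=1))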

5.5. Measuring Data Availability: A Rule of Thumb

In Sections 5.1, 5.2, 5.3 and 5.4 we made a critical analysis of the different methods used to measure data availability, d = D(k, n, g, N). However, we showed that each method has its own advantages and disadvantages. Here we give a rule of thumb on when to use each of them:

• Although the generic method described by equation (7) is the only mechanism to measure the exact data availability, it is only applicable to sets of nodes smaller than 15 or 20 (on typical desktop computers).

• When all the nodes have the same availability, or the heterogeneity among them is low, we can assume a homogeneous availability and use expression (11). This expression is highly scalable and can be used with very large sets of nodes.

• When the presence of heterogeneity is significant but there are several groups of nodes with similar availabilities, we can use the cluster-based expression given in eq. (17). Although this method has an exponential complexity, it can be used for large sets of nodes whenever the number of clusters remains relatively small.

• Finally, when the above methods are not appropriate, we need to use the method based on the Monte Carlo approximation: eq. (19).

6. Finding Optimal Assignment Function

Having solved the problem of how to measure data availability, in this section we face the problem of how to assign the n redundant blocks in order to maximize data availability. Unlike the homogeneous case, where each node is responsible for one block, in the heterogeneous case we will try to increase data availability by assigning more blocks to the more stable nodes. However, finding the optimal data assignment function g is another computationally hard task. Determining all the possible assignments of n redundant blocks to a set of nodes N is like computing (in number theory) all the compositions of n. Since this problem is known to have 2^{n−1} different assignments, determining the best assignment becomes computationally intractable for large sets of nodes N.

Lin et al. demonstrated in [25] that, in the homogeneous case, to maximize data availability there is a trade-off between node availability and the number of nodes used. We have also noticed that in the heterogeneous case finding this trade-off is key to determining the optimal assignment function g. In our scenario, we can increase the mean node availability by storing data only on the most stable nodes in N. However, this reduces the number of storage nodes used, which can compromise data availability. On the other hand, if we use all nodes in N, the mean node availability decreases, and then data availability does too. Therefore, the assignment function g should find the trade-off between using only stable nodes (obtaining a high mean node availability) and using all nodes (obtaining a low mean node availability). Unfortunately, because of the huge number of possible assignments, we will use a heuristic optimization algorithm to find the best assignment.

Optimization algorithms work by defining a search space and finding which point in this space maximizes a specified function. In our case, the function that we want to maximize is data availability, D(k, n, g, N), and the search space is all the possible implementations of the assignment function g. First, in Section 6.1, we describe the optimization algorithm. Then, in Section 6.2 we describe the search space required by the optimization algorithm. Finally, in Section 6.3, we infer the optimal function g from the optimization's results.

6.1. The Particle Swarm Optimizer

To find the optimal assignment function, we used a Particle Swarm Optimizer (PSO) [21]. PSO can be applied to virtually any problem that can be expressed in terms of an objective function for which an extremum must be found. PSO conducts its search using a population of particles that "fly" across the surface of the objective function. Information about promising regions of the function is shared between particles, allowing other particles to update their velocities to direct their motion towards fitter regions. The choice of PSO is not arbitrary: we chose it because research results have shown that it outperforms other nonlinear optimization techniques such as Simulated Annealing and Genetic Algorithms [21].

On the search space S, the ith particle is defined by two vectors in S: its position, p_i, and its velocity, v⃗_i. The initial positions and velocities are generated uniformly at random in the search space. At each step, the ith particle updates its velocity v⃗_i and position p_i using random multipliers, the personal best position, p_{i,best}, and the swarm's best experience, g_best, using the following equations:

v⃗_i = w · v⃗_i + ξ_1 r_1 (p_{i,best} − p_i) + ξ_2 r_2 (g_best − p_i),  (21)
p_i = p_i + v⃗_i,  (22)


where w is a parameter called the inertia weight, ξ_1 and ξ_2 are two positive constants, referred to as the "cognitive" and "social" parameters, respectively, and r_1 and r_2 are drawn from a random uniform distribution on [0, 1]. Informally, when all the particles collapse with zero velocity at a particular position in the search space, the swarm has converged.

The inertia weight is a user-specified parameter that controls the impact of the previous history of velocities on the current velocity. Hence, it resolves the trade-off between the global and local exploration abilities of the swarm. A large inertia weight value encourages global exploration (moving to previously unsearched areas of the space), while a small one favors local exploration. A suitable value for this coefficient provides the optimal balance between the global and local exploration abilities of the swarm, thereby improving the effectiveness of the algorithm. Previous experimental results suggest that it is preferable to initialize the inertia weight to a large value, giving priority to global exploration of the search space, and gradually decrease it to obtain refined solutions [30]. Consequently, we set the inertia weight using the following equation:

w = w_max − ((w_max − w_min) / i_max) · i,

where w_max and w_min are respectively the initial and final values of the inertia coefficient, i_max the maximum number of iterations, and i the current iteration.

At each iteration, each particle i evaluates the fitness function to rate its current position and updates, if required, p_{i,best} or g_best. Since each position represents a different block assignment, our fitness function is the data availability provided by this assignment. Since all positions p_i represent a point in R′, we need to find their representation in R in order to evaluate d. To do so we first obtain the discrete position of i, p̂_i, where p̂_{i,j} = round(p_{i,j}), ∀j ∈ 1 . . . |N| − 1, and then we use the transformation from equation (23) in order to find the real assignment.
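For concreteness, a minimal sketch of the particle update (Python; the function names are ours, and the fitness evaluation, personal/global best bookkeeping, and the rounding back to frame R described above are omitted):

import random

def inertia(i, i_max, w_min=0.5, w_max=0.75):
    """Linearly decaying inertia weight, as in Section 6.1."""
    return w_max - (w_max - w_min) / i_max * i

def pso_step(positions, velocities, pbest, gbest, w, xi1=1.0, xi2=1.0):
    """One PSO iteration (eqs. 21 and 22). Positions and velocities are
    lists of |N|-1 free coordinates, i.e. points on the plane pi_n in R'."""
    for p, v, pb in zip(positions, velocities, pbest):
        for j in range(len(p)):
            r1, r2 = random.random(), random.random()
            v[j] = w * v[j] + xi1 * r1 * (pb[j] - p[j]) + xi2 * r2 * (gbest[j] - p[j])
            p[j] += v[j]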

6.2. Defining the Search Space

Let R^{|N|} be a Cartesian |N|-space; that is, R^{|N|} is an affine and Euclidean |N|-dimensional space coordinated and oriented by the orthonormal affine frame R = (O; e⃗_1, . . . , e⃗_{|N|}). Then, assuming that N is an ordered set, the vector with all node assignments, [g(i)]_{i∈N}, corresponds to a point p ∈ ℕ^{|N|} on frame R with integer coordinates (x_1, x_2, . . . , x_{|N|}). Each component x_i corresponds to the number of redundant blocks assigned to the ith node in N.

From the whole search space R^{|N|}, we are only interested in the small subset of possible solutions that satisfy the requirement of the assignment function g: Σ_{i∈N} g(i) = n. This requirement restricts the search space S to the positive area of the hyperplane π_n,

π_n ≡ Σ_{i=1}^{|N|} x_i = n.

Figure 2: Example of the reference frame transformation from R to R′ with three storage nodes. Each axis represents the number of redundant blocks stored in each node. The assignment function has to assign 6 redundant blocks (n = 6). The hyperplane π_6 represents the search space containing all the possible assignments.

Unfortunately, we cannot apply a PSO algorithm directly by randomly setting the particles in S and giving them free movement. If we allowed this, PSO would randomly update the position and velocity of each particle in all of its dimensions. This would cause particles to move within all of R^{|N|}, generating positions out of S, and thus not compliant with the requirements of function g: particles should move within the positive area of π_n. In order to solve this drawback we move the reference frame within R^{|N|}, setting |N| − 1 vectors of the new frame within the plane π_n, thus freeing particles from one degree of freedom and keeping them always within π_n. Figure 2 depicts a simple example of the frame movement. The example represents a node set with only three nodes, N = {a, b, c}. Each axis represents the number of redundant blocks assigned to each of these nodes. Thanks to the transformation from frame R to frame R′, we can see how the block assignment (1, 3, 2) ∈ R corresponds to (1, 3, 0) in the new frame R′. Note that the interesting property here is that the third dimension in R′ is always zero.

Besides, because we are dealing with block assignments, we can only work with integer positions in R.

An interesting property of the transformation R → R′ is that when integer positions in R′ are transformed back to R, they always have integer coordinates in R too. This property allows us to move particles freely in R′ and round them to the nearest integer position each time we evaluate them through the fitness function.

Lemma 3 describes this property analytically. Before stating it, we first introduce the required geometrical background. If we write

u⃗_i = e⃗_i − e⃗_{|N|},   if 1 ≤ i ≤ |N| − 1,
u⃗_i = Σ_{j=1}^{|N|} e⃗_j,   if i = |N|,

we can easily find the vectorial equation of π_n:

π_n ≡ X = A + Σ_{i=1}^{|N|−1} λ_i u⃗_i,

where λ_i ∈ ℝ, and A is the intersection of π_n with the |N|th axis of coordinate system R; A = π_n ∩ r_{|N|}. The coordinates of point A on R are (0, 0, . . . , 0, n), and the vector u⃗_{|N|} is orthogonal to π_n.

Now, let (x′_1, x′_2, . . . , x′_{|N|}) be the coordinates of a point p ∈ R^{|N|} on the new affine frame R′ = (A; u⃗_1, u⃗_2, . . . , u⃗_{|N|}). Then, p ∈ π_n if and only if its last coordinate x′_{|N|} = 0.

Lemma 3. If a point p ∈ π_n has integer coordinates on frame R′, then p also has integer coordinates on frame R; that is, if we have

p = (x_1, x_2, . . . , x_{|N|})_R = (x′_1, x′_2, . . . , x′_{|N|−1}, 0)_{R′},

then

{x′_i}_{i=1}^{|N|} ⊂ ℤ ⇒ {x_i}_{i=1}^{|N|} ⊂ ℤ.

n

x|N | 0

where (0 . . . 0 n) represents the shift between R and R , and M  1 0 0 ···   1 0 ···  0   0 1 ···  0 M = .. .. . .  ..  . . . .    0 0 0 1  −1

−1 −1 18

−1

is the rotation and scale matrix given by:  1   1    1   ..  . .    1   1

Then, xi = x0i , with 1 ≤ i ≤ |N | − 1, |N |−1

x|N | = −

X

x0i + n;

i=1

and the lemma follows. 6.3. Deriving g from the PSO results To find the optimal assignment function, we run the PSO algorithm in several different scenarios with different set sizes and different node availabilities. In our experiments we set the PSO’s inertia parameters to wmin := 0.5, wmax := 0.75 and imax = 50, and the constants to ξ1 , ξ2 := 1. We used a population of 100 particles. Using this setup we run two different experiments. In the first experiment, we used a small set of storage nodes |N | = 10 and the generic analytical expression (7) in the fitness function. In the second experiment we made use of a larger storage set |N | = 100, and approximated d with the Monte Carlo method. Due the lack of real availability traces from distributed storage systems, we used random availabilities drawn from different beta distributions with mean values {0.25, 0.5, 0.75} and variances {0.01, 0.02, 0.03, 0.04}. To show the effects of the value of k, we tested four different k values for each experiment, k ∈ {n/5, n/3, n/2, n/1.5}, where n := 10 |N |. Figure 3 shows the optimal assignment found for |N | = 10, using the generic heterogeneous expression as the fitness function. We can observe that the optimal assignment tends to assign more redundant data to the stable nodes without discarding the low availability ones. Besides, this assignment also tends to be aligned along a line that passes through the origin of coordinates. The errors that we appreciate —points separated from this line— are he effect of using a non-deterministic assignment algorithm: PSO. Figure 4 shows the same results than Figure 3 but with a larger set of storage nodes |N | = 100, and using the Monte Carlo data availability measure as the fitness function. Although this time we used a doubled heuristic (PSO + Monte Carlo), the results tend to be analogously aligned. This time, since we used a larger set of storage nodes, the results appear less sparse. Again, some errors appear —points separated from the main line— because of the non-deterministic assignment algorithm: PSO. Since the number of assigned blocks to each node i ∈ N , g(i), depends only on its online availability, ai , we can directly infer from the experimental results that g could be expressed as a linear equation, g(i) = s ai + o. Since the origin of the resultant line is (0,0), we know that o = 0. Besides, since the total amount of assigned blocks should be equal to n, X i∈N

g(i) =

X i∈N

s ai = n ⇒ s = P 19

n i∈N

ai

,

35 mean=0.25 mean=0.5 mean=0.75

20

number of redundant blocks

number of redundant blocks

25

15 10 5 0

mean=0.25 mean=0.5 mean=0.75

30 25 20 15 10 5 0

0

0.2

0.4 0.6 node availability

0.8

1

0

0.2

(a) k = 2

0.8

1

0.8

1

(b) k = 3

30

70 mean=0.25 mean=0.5 mean=0.75

25

number of redundant blocks

number of redundant blocks

0.4 0.6 node availability

20 15 10 5 0

mean=0.25 mean=0.5 mean=0.75

60 50 40 30 20 10 0

0

0.2

0.4 0.6 node availability

0.8

1

0

(c) k = 5

0.2

0.4 0.6 node availability

(d) k = 6

Figure 3: Optimal assignments for the |N | = 10 experiment. Each point represents the number of redundant blocks assigned to a node, the availability of which is in the horizontal axis. Each sub-figure contains the assignments for the four different variances used.

20

[Figure 4: four scatter plots of the number of redundant blocks versus node availability, one per panel: (a) k = 20, (b) k = 33, (c) k = 50, (d) k = 66, with the same axes and series as Figure 3.]

Figure 4: Optimal assignments for the |N| = 100 experiment. Each point represents the number of redundant blocks assigned to a node whose availability is given on the horizontal axis. Each sub-figure contains the assignments for the four different variances used.


and then,

g(i) = (a_i / Σ_{j∈N} a_j) × n.  (24)

Equation (24) is the optimal assignment function derived from experimental observations. This simple function assigns to each node a fraction of redundant blocks proportional to the amount of availability it provides to the system. It is possible (in highly heterogeneous environments) that g(i) > k for some i ∈ N. Although this is unlikely to happen, we need to prevent a single node from storing more blocks than the number required to recover the original data object. To solve this we define the assignment function g′ as follows: g′(i) = min(g(i), k). It is interesting to note how a complex optimization problem has a simple but (as we will see in Section 8) effective solution.

Remark 3 (Large n Values). In order to allow the assignment function to assign redundant blocks proportionally to each node availability, n should be large enough to provide the desired assignment granularity. Even so, in some scenarios the assignment process will need to assign/deassign some redundant blocks ad hoc to guarantee that Σ_{i∈N} g(i) = n is satisfied.
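A minimal sketch of this allocation (Python; the availabilities in the example are invented) applying equation (24), the cap g′(i) = min(g(i), k), and the ad-hoc adjustment of Remark 3:

def proportional_assignment(avail, n, k):
    """Assign n redundant blocks proportionally to node availability
    (eq. 24), capped at k blocks per node (g') and at least 1 per node."""
    total = sum(avail)
    g = [min(k, max(1, int(a / total * n))) for a in avail]
    # Remark 3: flooring leaves some blocks unassigned; hand them out
    # ad hoc, most available nodes first, without exceeding the cap k.
    leftovers = n - sum(g)
    for i in sorted(range(len(avail)), key=lambda i: -avail[i]):
        if leftovers <= 0:
            break
        extra = min(leftovers, k - g[i])
        g[i] += extra
        leftovers -= extra
    return g

avail = [0.9, 0.7, 0.5, 0.3]                       # illustrative availabilities
print(proportional_assignment(avail, n=40, k=20))  # e.g. [16, 11, 8, 5]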

n k

ˆ However, since two different ratio that achieves d.

ˆ In the existing parameters: k and n, are involved in r, different pairs of k and n can be used to achieve d. literature, storage systems initially set k to a fixed value and increase n from n = k until they achieve the ˆ Instead, in this paper we will initially set n to a fixed value n = |N | · β, for a desired data availability d. ˆ By doing it we have large β, and we will decrease k from k = n until we achieve the desired availability d.

more redundant blocks than nodes in N , and we can easily store more blocks to those more stable nodes (Remark 3). Formally, once fixed n, the minimum redundancy required to achieve dˆ is the following k value: n o ˆ k ∈ [1, n] . k := max k : D(k, n, g, N ) ≥ d,

(25)

To measure k, let us state the following Lemma: Lemma 4. D(k, n, g, N ) ≥ D(k − 1, n, g, N ), ∀k ∈ [1, n]. Proof. From the generic data availability expression (equation 7) we can easily infer that function D, or its expression d, is monotonically increasing with the size of Lk . From equation 4 we can also easily infer that |Lk | ≤ |Lk−1 |, and so, the lemma follows.

22

Using Lemma 4, the optimal k can be computationally found k using Algorithm 2. Algorithm 2 L(x) 1: 2: 3: 4:

k ← |N |

while D(k, n, g, N ) < dˆ and k > 0 do k ←k−1 end while It is interesting to note that in some scenarios, due the low node availabilities or the low number of

ˆ or k = 0. In this storage nodes, the storage system could not be able to achieve the desired availability d, case, the storage process will only be able to store the file if it can find a more stable set of storage nodes ˆ N , increases its size, or simply reduce the desired availability d. 8. Redundancy Savings Our validation is focused on answering the following question: Is it worth to use a heterogeneous-aware redundancy scheme instead of a typical and simple homogeneous one? To answer this question we define the Redundancy Saving Ratio metric (RSR). This metric measures the savings in redundancy caused by considering a heterogeneous storage system instead of a homogeneous one. Let rhomo and rheter be the data redundancies that a homogeneous system and a heterogeneous system ˆ respectively. Provided that the heterogeneous needs in order to guarantee the desired data availability d, system under consideration is able to reduce the amount of redundancy, we define the Redundancy Saving Ratio as:  RSR =

1−

rheter rhomo

 × 100

(26)

For our evaluation, we again use two different sizes for the set of nodes: |N | = 10 and |N | = 100; In the first case we use the generic heterogeneous data availability expression (7) in all the optimization process. In the second case we use the Monte Carlo approximation (eq. 19). We run the experiments in both cases with three different desired availabilities dˆ = {0.9, 0.99, 0.999}. For the online availability of the nodes we use the same 12 distributions based on the Beta distribution we used in Section 6.3. Besides, we use three node availability traces from real distributed applications [18]. These traces consists of the availabilities of Planetlab nodes [34], Skype super-nodes [19], Microsoft desktop PCs [4], and Seti@Home’s desktop grid nodes [20]. For each simulation we find the optimal redundancy rhomo and rheter using Algorithm 2. For the homogeneous case, we assume g(i) = 1, ∀i ∈ N . For the heterogeneous case we use the optimal assignment (g from equation 24). We set n := 100|N | in all the experiments. 23

|N | = 10

                                |N| = 10                          |N| = 100
Trace            Mean   Var.   d̂=0.9    d̂=0.99   d̂=0.999      d̂=0.9    d̂=0.99   d̂=0.999
Skype [19]       0.544  0.105  42.24%   49.17%   73.15%        28.76%   31.80%   34.84%
Planetlab [34]   0.816  0.063  16.41%   19.60%   24.38%        10.00%   11.68%   13.69%
Microsoft [4]    0.738  0.061  16.39%   19.60%   48.77%        14.45%   16.24%   19.22%
Seti@Home [20]   0.552  0.109  33.14%   36.17%   41.03%        19.76%   22.15%   31.12%
Beta(0.25,0.01)  0.25   0.01   0%       n.a.     n.a.          4.54%    5.87%    0%
Beta(0.25,0.02)  0.25   0.02   0%       n.a.     n.a.          19.22%   19.99%   17.63%
Beta(0.25,0.03)  0.25   0.03   0%       n.a.     n.a.          27.57%   30.42%   31.56%
Beta(0.25,0.04)  0.25   0.04   0%       n.a.     n.a.          36.35%   40.72%   40.89%
Beta(0.5,0.01)   0.5    0.01   0%       0%       n.a.          0%       2.50%    5.26%
Beta(0.5,0.02)   0.5    0.02   0%       0%       n.a.          6.24%    7.14%    5.40%
Beta(0.5,0.03)   0.5    0.03   0%       0%       n.a.          10.00%   11.36%   9.75%
Beta(0.5,0.04)   0.5    0.04   0%       0%       n.a.          15.38%   14.89%   13.95%
Beta(0.75,0.01)  0.75   0.01   0%       0%       0%            1.41%    0%       1.59%
Beta(0.75,0.02)  0.75   0.02   0%       0%       0%            4.11%    5.71%    3.08%
Beta(0.75,0.03)  0.75   0.03   0%       0%       0%            5.33%    5.79%    7.46%
Beta(0.75,0.04)  0.75   0.04   0%       0%       0%            7.89%    8.33%    8.69%

Table 1: RSRs for real and synthetic availability traces. For the n.a. cases, the storage system was not able to achieve the desired data availability d̂, either in the heterogeneous or in the homogeneous optimization.

Table 1 shows the redundancy savings, RSR, for each of the different simulations. It is interesting to note that for the small set of nodes, |N| = 10, the scenarios with low heterogeneity are not able to reduce the required redundancy; however, in the most heterogeneous scenario redundancy is reduced by up to 73.15%. For the large set of nodes, |N| = 100, there are redundancy savings in almost all the scenarios, reaching up to 34.84%. Finally, from the overall results we can infer that the redundancy savings are maximized when the heterogeneity increases, or when the storage system requires a higher quality of storage service (desired availability d̂).

9. Conclusions and Further Work

Existing cloud storage services are designed and built on the assumption that all storage backends constitute a homogeneous set of distributed resources. This assumption leads these systems to consider a unique online availability for all nodes in the cloud, and then to optimize their data redundancy according to this assumption. However, as we showed, this assumption simplifies the way data availability is measured, but it introduces an error that causes an increase in data redundancy, and thus a loss in efficiency.

In this paper, we have studied how disregarding heterogeneities in node availabilities negatively affects the performance of heterogeneous cloud infrastructures. To this aim, we have presented an analytical framework with three main benefits: 1) an algorithm for measuring data availability in heterogeneous storage infrastructures; 2) an optimization algorithm to find the best way to assign redundant blocks to the set of storage nodes; and 3) a mechanism to determine the minimum data redundancy needed to achieve a desired quality of service.

One of the main results that arises from our framework is the data assignment function. We discovered that the best results come up when nodes are assigned an amount of redundancy proportional to their availabilities. We have shown how this solution can reduce data redundancy by up to 70% in highly heterogeneous scenarios. These results show that heterogeneity is an important aspect to consider in distributed storage infrastructures in general, and in heterogeneous clouds in particular.

Finally, since economic revenue is one of the key aspects of cloud storage services, we will extend our work to consider it in the future. By considering it, we could find better assignment functions to obtain storage systems that are not only optimal in their redundancy but also optimal in their cost.

Acknowledgments

We would like to express our gratitude to the anonymous reviewers for the insights and comments provided during the review process, which have greatly contributed to improving the quality of the original manuscript. This work has been partially funded by the Spanish Ministry of Science and Innovation through project P2PGRID, TIN2007-68050-C03-03.

References

[1] Amazon.com, 2009. Amazon S3. http://aws.amazon.com/s3.
[2] Anderson, D. P., 2004. BOINC: A system for public-resource computing and storage. In: Proceedings of the 5th IEEE/ACM Intl. Workshop on Grid Computing.
[3] Blake, C., Rodrigues, R., 2003. High availability, scalable storage, dynamic peer networks: Pick two. In: Proceedings of the 9th Workshop on Hot Topics in Operating Systems (HOTOS).
[4] Bolosky, W. J., Douceur, J. R., Ely, D., Theimer, M., 2000. Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs. In: Proceedings of the 2000 ACM SIGMETRICS Intl. Conference on Measurement and Modeling of Computer Systems.
[5] Campbell, R., Gupta, I., Heath, M., Ko, S. Y., Kozuch, M., Kunze, M., Kwan, T., Lai, K., Lee, H. Y., Lyons, M., Milojicic, D., O'Hallaron, D., Soh, Y. C., 2009. Open Cirrus cloud computing testbed: Federated data centers for open source systems and services research. In: Proceedings of the USENIX Workshop on Hot Topics in Cloud Computing (HotCloud'09).
[6] Cauchy, A. L., 1821. Cours d'analyse de l'École Royale Polytechnique, première partie: Analyse algébrique.
[7] Chandra, A., Weissman, J., 2009. Nebulas: Using distributed voluntary resources to build clouds. In: Proceedings of the USENIX Workshop on Hot Topics in Cloud Computing (HotCloud'09).
[8] Chun, B. G., Dabek, F., Haeberlen, A., Sit, E., Weatherspoon, H., Kaashoek, M. F., Kubiatowicz, J., 2006. Efficient replica maintenance for distributed storage systems. In: Symposium on Networked Systems Design and Implementation (NSDI).
[9] Church, K., Greenberg, A., Hamilton, J., 2008. On delivering embarrassingly distributed cloud services. In: Proceedings of the 7th Workshop on Hot Topics in Networks (HotNets).
[10] CleverSafe, 2010. Cleversafe. http://www.cleversafe.com.
[11] Cox, L., Noble, B., 2002. Pastiche: Making backup cheap and easy. In: Proceedings of the 5th USENIX Symposium on Operating Systems Design and Implementation (OSDI).
[12] Datta, A., Aberer, K., 2006. Internet-scale storage systems under churn: A study of the steady-state using Markov models. In: Proceedings of the 6th Intl. Conference on Peer-to-Peer Computing (P2P).
[13] Deng, Y., Wang, F., 2007. A heterogeneous storage grid enabled by grid service. SIGOPS Oper. Syst. Rev. 41 (1), 7–13.
[14] Dimakis, A., Godfrey, P., Wainwright, M., Ramchandran, K., 2007. Network coding for distributed storage systems. In: Proceedings of the 26th IEEE Intl. Conference on Computer Communications (INFOCOM).
[15] Duminuco, A., Biersack, E. W., 2008. Hierarchical codes: How to make erasure codes attractive for peer-to-peer storage systems. In: Proceedings of the 8th Intl. Conference on Peer-to-Peer Computing (P2P).
[16] Duminuco, A., Biersack, E. W., En-Najjary, T., 2007. Proactive replication in distributed storage systems using machine availability estimation. In: Proceedings of the 3rd CoNEXT Conference (CoNEXT).
[17] Landers, M., Zhang, H., Tan, K.-L., 2004. PeerStore: Better performance by relaxing in peer-to-peer backup. In: Proceedings of the 4th Intl. Conference on Peer-to-Peer Computing (P2P).
[18] Godfrey, B., 2010. Repository of availability traces. http://www.cs.berkeley.edu/pbg/availability/.
[19] Guha, S., Daswani, N., Jain, R., 2006. An experimental study of the Skype peer-to-peer VoIP system. In: Proceedings of the 5th Intl. Workshop on Peer-to-Peer Systems (IPTPS).
[20] Javadi, B., Kondo, D., Vincent, J., Anderson, D., 2009. Mining for statistical availability models in large-scale distributed systems: An empirical study of SETI@home. In: Proceedings of the 17th IEEE/ACM Intl. Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS).
[21] Kennedy, J., Eberhart, R., 1995. Particle swarm optimization. In: Proceedings of the IEEE Intl. Conference on Neural Networks.
[22] Bhagwan, R., Tati, K., Cheng, Y.-C., Savage, S., Voelker, G. M., 2004. Total Recall: System support for automated availability management. In: Symposium on Networked Systems Design and Implementation (NSDI).
[23] Knuth, D., 1998. The Art of Computer Programming, Vol. 3: Sorting and Searching, 2nd Edition. Addison-Wesley.
[24] Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Weimer, W., Wells, C., Zhao, B., 2000. OceanStore: An architecture for global-scale persistent storage. In: Proceedings of the 9th Intl. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
[25] Lin, W. K., Chiu, D. M., Lee, Y. B., 2004. Erasure code replication revisited. In: Proceedings of the 4th Intl. Conference on Peer-to-Peer Computing (P2P).
[26] Pamies-Juarez, L., García-López, P., Sánchez-Artigas, M., 2009. Heterogeneity-aware erasure codes for peer-to-peer storage systems. In: Proceedings of the 38th IEEE Intl. Conference on Parallel Processing (ICPP).
[27] Rackspace, 2009. Mosso. http://www.rackspacecloud.com/.
[28] Rajasekar, A., Wan, M., Moore, R., Kremenek, G., Guptil, T., 2003. Data grids, collections, and grid bricks. In: Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS'03). IEEE Computer Society, Washington, DC, USA, p. 2.
[29] Rodrigues, R., Liskov, B., 2005. High availability in DHTs: Erasure coding vs. replication. In: Proceedings of the 4th Intl. Workshop on Peer-to-Peer Systems (IPTPS).
[30] Shi, Y., Eberhart, R. C., 1998. Parameter selection in particle swarm optimization. In: Proceedings of the 7th Intl. Conference on Evolutionary Programming (EP).
[31] Sit, E., Haeberlen, A., Dabek, F., Chun, B., Weatherspoon, H., Morris, R., Kaashoek, M. F., Kubiatowicz, J., 2006. Proactive replication for data durability. In: Proceedings of the 5th Intl. Workshop on Peer-to-Peer Systems (IPTPS).
[32] stanford.edu, 2009. Folding@home: distributed computing project. http://folding.stanford.edu.
[33] Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., Balakrishnan, H., 2001. Chord: A scalable peer-to-peer lookup service for Internet applications. In: Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM).
[34] Stribling, J., 2010. PlanetLab all-pairs ping. http://infospect.planet-lab.org/pings.
[35] Vaquero, L. M., Rodero-Merino, L., Caceres, J., Lindner, M., 2009. A break in the clouds: Towards a cloud definition. SIGCOMM Comput. Commun. Rev. 39 (1), 50–55.
[36] Weatherspoon, H., Kubiatowicz, J. D., 2002. Erasure coding vs. replication: A quantitative comparison. In: Proceedings of the 1st Intl. Workshop on Peer-to-Peer Systems (IPTPS).
[37] Wu, F., Qiu, T., Chen, Y., Chen, G., 2005. Redundancy schemes for high availability in DHTs. In: Proceedings of the 3rd Intl. Symposium on Parallel and Distributed Processing and Applications (ISPA).
[38] WuaLa, 2010. Wuala. http://www.wuala.com.
[39] Zhang, Z., Lian, Q., 2002. Reperasure: Replication protocol using erasure-code in peer-to-peer storage network. In: Proceedings of the 21st Symposium on Reliable Distributed Systems (SRDS).
