Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks

Tobias Langner, Christian Schindelhauer, Alexander Souza

Computer Engineering and Networks Laboratory (TIK)
Department for Information Technology and Electrical Engineering
ETH Zurich, Switzerland

SOFSEM ’11, Nový Smokovec, 25 January 2011

T. Langner et al.

Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks

1

Motivation

Distributed Storage is Ubiquitous

- Storing files on multiple machines is a common habit
- P2P overlays basically represent storage networks
- File-hosting services like Amazon S3, Rapidshare, etc.
- Gigantic storage requirements by special applications (Google, Amazon, CERN)
- Most existing approaches specifically tailored for data centers with homogeneous and symmetric network connections
- Little attention to asymmetric bandwidths typical for end-user connections

Synopsis

1 Motivation
2 Problem Setting
3 Analytical Scaling
    Maximum Flow in Distribution Problems
    Total Data Function in 3D
    The Algorithm
4 Conclusion

Problem Setting

Distribution Network

[Figure: client c connected to the servers s_1, ..., s_i, ..., s_m; the connection to server s_i is labeled with upload bandwidth u_i and download bandwidth d_i.]

- Servers $s_i$ with $i \in S = \{1, \dots, m\}$, and one client $c$
- Each server $s_i$ is characterized by its upload bandwidth $u_i$ and download bandwidth $d_i$

Distribution Problem

How do we split a file $f$ of size $|f|$ into fragments for the servers to minimize the time for one upload and $n$ downloads?

[Figure: file f split into fragments x_1, ..., x_i, ..., x_m that are stored on the servers s_1, ..., s_i, ..., s_m.]

- Distribution vector $x = (x_1, \dots, x_m)$ with $\sum_{i=1}^{m} x_i = |f|$.
- $x_i$ is the size of the fragment of server $s_i$.
- Equivalent simplified problem with $|f| = n = 1$.

Transfer Times

The quality of a distribution $x$ is given by the transfer times:

Upload time:
\[ t_u(x) = \max_{i \in S} \left\{ \frac{x_i}{u_i} \right\} \]

Download time:
\[ t_d(x) = \max_{i \in S} \left\{ \frac{x_i}{d_i} \right\} \]

Total time (the factor $n$ disappears in the simplified problem with $n = 1$):
\[ \mathrm{val}(x) = t_u(x) + n \cdot t_d(x) \]
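The three transfer times translate directly into code. The following is a minimal sketch, not the authors' implementation; the function names and the sample bandwidths are illustrative assumptions, and all bandwidths are assumed to be positive.

```python
from typing import Sequence


def upload_time(x: Sequence[float], u: Sequence[float]) -> float:
    """t_u(x): the slowest fragment transfer determines the upload time."""
    return max(xi / ui for xi, ui in zip(x, u))


def download_time(x: Sequence[float], d: Sequence[float]) -> float:
    """t_d(x): the slowest fragment transfer determines the download time."""
    return max(xi / di for xi, di in zip(x, d))


def total_time(x: Sequence[float], u: Sequence[float],
               d: Sequence[float], n: int = 1) -> float:
    """val(x) = t_u(x) + n * t_d(x); n = 1 gives the simplified problem."""
    return upload_time(x, u) + n * download_time(x, d)


# Hypothetical instance: two servers, a file of size 1 split evenly.
u = [4.0, 1.0]  # upload bandwidths (assumed values)
d = [1.0, 2.0]  # download bandwidths (assumed values)
x = [0.5, 0.5]
print(upload_time(x, u), download_time(x, d), total_time(x, u, d))
```

In this instance server 2 limits the upload and server 1 limits the download, which is the kind of asymmetry the talk is concerned with.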

Objective

We are given a distribution network and a file $f$ with size 1.

[Figure: file f and the servers s_1, ..., s_i, ..., s_m.]

Objective: Find the optimal distribution $x^*$ that minimizes the total time $\mathrm{val}(x) = t_u(x) + t_d(x)$.

Important Properties

[Figure: the distribution network next to a plot, with both axes ticked from 1 to 6, of a region labeled S.]

- The distribution problem can be formulated as a linear program.
- ⇒ Its solution space is a convex region.
- ⇒ Every local minimum is global.

Non-Linear Optimization Problem

- Upload/download bandwidths $u_1$ to $u_m$ and $d_1$ to $d_m$
- File distribution $x = (x_1, \dots, x_m)$

Non-Linear Program

\[
\begin{aligned}
\text{minimize} \quad & \max\left\{\frac{x_1}{u_1}, \frac{x_2}{u_2}, \dots, \frac{x_m}{u_m}\right\} + \max\left\{\frac{x_1}{d_1}, \frac{x_2}{d_2}, \dots, \frac{x_m}{d_m}\right\} \\
\text{subject to} \quad & \sum_{i=1}^{m} x_i = 1 \\
& x_i \ge 0 \quad \text{for all } i \in \{1, \dots, m\}
\end{aligned}
\]

Linear Optimization Problem

Linear Program

\[
\begin{aligned}
\text{minimize} \quad & t_u + t_d \\
\text{subject to} \quad & \sum_{i=1}^{m} x_i = 1 \\
& \frac{x_i}{u_i} \le t_u \quad \text{for all } i \in \{1, \dots, m\} \\
& \frac{x_i}{d_i} \le t_d \quad \text{for all } i \in \{1, \dots, m\} \\
& x_i \ge 0 \quad \text{for all } i \in \{1, \dots, m\}
\end{aligned}
\]

Dimension: $m + 2$
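One way to see why the linear program matches the non-linear one is the following short derivation. It is added here for illustration and uses only the symbols defined above. Each family of per-server constraints is equivalent to a single bound on the maximum:

\[
\frac{x_i}{u_i} \le t_u \ \text{ for all } i \in \{1, \dots, m\}
\iff \max_{i \in S} \frac{x_i}{u_i} \le t_u ,
\qquad
\frac{x_i}{d_i} \le t_d \ \text{ for all } i \in \{1, \dots, m\}
\iff \max_{i \in S} \frac{x_i}{d_i} \le t_d .
\]

Because the objective minimizes $t_u + t_d$, neither variable is larger than necessary at an optimum, so both bounds hold with equality there:

\[
t_u = \max_{i \in S} \frac{x_i}{u_i}, \qquad t_d = \max_{i \in S} \frac{x_i}{d_i} .
\]

The stated dimension $m + 2$ corresponds to the $m$ fragment sizes $x_1, \dots, x_m$ plus the two time variables $t_u$ and $t_d$.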

Analytical Scaling

Maximum Flow in Distribution Problems

The Distribution Problem as Flow Problem

Assume we are given upload time $t_u$ and download time $t_d$.

[Figure: flow network from the client c (upload side) through the servers s_1, ..., s_i, ..., s_m back to the client c (download side); the upload edge to s_i has capacity t_u · u_i and the download edge from s_i has capacity t_d · d_i.]

- The maximal amount of data that can be uploaded within $t_u$ and downloaded within $t_d$ corresponds to a maximum flow.
- Value of maximum flow $f^*$ equals capacity of minimal cut:

\[ \mathrm{val}(f^*) = \sum_{i=1}^{m} \min\{t_u u_i, t_d d_i\} \]

Decision Predicate

There exists a distribution such that $f$ with $|f| = 1$ can be uploaded and downloaded in time $t_u$ and $t_d$, respectively, if

\[ \sum_{i=1}^{m} \min\{t_u u_i, t_d d_i\} \ge 1 \]

- We want to find the minimum values for $t_u$ and $t_d$ for which the above predicate holds.
- Substitute $t_d$ by $T - t_u$ to obtain the total data function:

\[ \delta_T(t_u) = \sum_{i=1}^{m} \min\{t_u u_i, (T - t_u) \cdot d_i\} \]
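The flow value, the decision predicate, and the substitution $t_d = T - t_u$ can be sketched in a few lines. This is an illustrative sketch only; the function names, the tolerance `eps`, and the sample bandwidths are assumptions that do not appear in the talk.

```python
from typing import Sequence


def max_flow_value(t_u: float, t_d: float,
                   u: Sequence[float], d: Sequence[float]) -> float:
    """Sum over all servers of min{t_u * u_i, t_d * d_i}."""
    return sum(min(t_u * ui, t_d * di) for ui, di in zip(u, d))


def is_feasible(t_u: float, t_d: float, u: Sequence[float],
                d: Sequence[float], eps: float = 1e-12) -> bool:
    """Decision predicate for a file of size 1 (eps: assumed float tolerance)."""
    return max_flow_value(t_u, t_d, u, d) >= 1.0 - eps


def total_data(T: float, t_u: float,
               u: Sequence[float], d: Sequence[float]) -> float:
    """delta_T(t_u): the flow value after substituting t_d = T - t_u."""
    return max_flow_value(t_u, T - t_u, u, d)


# Hypothetical instance with two servers (assumed bandwidths).
u = [3.0, 1.0]
d = [2.0, 1.0]
print(is_feasible(0.25, 0.375, u, d))  # enough time for the whole file
print(is_feasible(0.20, 0.300, u, d))  # only 0.8 of the file fits
print(total_data(1.0, 0.4, u, d))
```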

Total Data Function

The total data function for a fixed value of $T$:

\[ \delta_T(t_u) = \sum_{i=1}^{m} \min\{t_u u_i, (T - t_u) \cdot d_i\} \]

[Figure: plot of $\delta_T(t_u)$ over $t_u$ from 0 to $T$, built up from the per-server terms $\delta_{T,i}(t_u)$.]

- Can be evaluated in $O(m \log m)$ steps

Evaluation: The Total Data Function

[Figure: piecewise-linear plot of $\delta_T$ over $t_u$ from 0 to $T$; the breakpoints $p_1, p_2, p_3$ separate the intervals $I_1, \dots, I_4$, which have slopes $\sigma_1, \dots, \sigma_4$.]

- $\sigma_i$ is the slope in interval $I_i$.
- Recursion formula:

\[ \delta_T(p_i) = \delta_T(p_{i-1}) + \sigma_i \cdot (p_i - p_{i-1}) \]
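The recursion can be turned into a sweep over the sorted breakpoints. The sketch below rests on two facts that follow from the definition of $\delta_T$ but are not spelled out on the slides, so treat them as assumptions of this example: the term of server $s_i$ has its breakpoint where $t_u u_i = (T - t_u) d_i$, that is at $p = T d_i / (u_i + d_i)$, and its slope changes there from $u_i$ to $-d_i$. All names are hypothetical, and bandwidths are assumed to be positive.

```python
from typing import List, Sequence, Tuple


def evaluate_at_breakpoints(T: float, u: Sequence[float],
                            d: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (p_k, delta_T(p_k)) for the breakpoints p_k in increasing order."""
    # Each event is (breakpoint, slope drop) for one server; sorting the
    # events is the O(m log m) part of the evaluation.
    events = sorted((T * di / (ui + di), ui + di) for ui, di in zip(u, d))
    slope = float(sum(u))     # sigma_1: every term still grows with slope u_i
    prev_p, value = 0.0, 0.0  # delta_T(0) = 0
    points = []
    for p, drop in events:
        value += slope * (p - prev_p)  # recursion formula
        points.append((p, value))
        slope -= drop                  # this term now falls with slope -d_i
        prev_p = p
    return points


def maximize_total_data(T: float, u: Sequence[float],
                        d: Sequence[float]) -> Tuple[float, float]:
    """Return (t_u, delta_T(t_u)) at a breakpoint with the largest value."""
    return max(evaluate_at_breakpoints(T, u, d), key=lambda pv: pv[1])


# Hypothetical instance with two servers (assumed bandwidths), T = 1.
print(evaluate_at_breakpoints(1.0, [3.0, 1.0], [2.0, 1.0]))
print(maximize_total_data(1.0, [3.0, 1.0], [2.0, 1.0]))
```

Since every summand is concave and piecewise linear, their sum is too, so looking only at breakpoints is enough to find a maximum.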

Total Data Function in 3D

Scaling the Total Data Function

- The total data function $\delta_T(t_u)$ determines how much data can be uploaded to and downloaded again from the servers within time $T$ for different values of $t_u$.
- Varying values of $T$ only scale the total data function.

[Figure: plots of $\delta_T$ over $t_u$ for $T = T_1, \dots, T_5$.]

Two-Variable Total Data Function

- Interpret $\delta_T(t_u)$ as a two-variable function $\delta(T, t_u)$ now.
- To find the minimal value of $T$ with $\delta(T, t_u) \ge 1$ for some $t_u$, establish a straight line through the highest point of $\delta(T, t_u)$.
- Determine where it intersects the plane $\delta = 1$.

[Figure: 3D plot of $\delta$ over $T$ and $t_u$ with slices at $T_1, \dots, T_5$ and the level $\delta = 1$.]
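The straight-line construction can be justified by a scaling argument. The derivation below is a sketch based on the definition of $\delta$ and is not a formula shown on the slides; $\lambda$ and $\hat{t}_u$ are symbols introduced here. For any $\lambda > 0$,

\[
\delta(\lambda T, \lambda t_u)
= \sum_{i=1}^{m} \min\{\lambda t_u u_i, (\lambda T - \lambda t_u) \cdot d_i\}
= \lambda \cdot \delta(T, t_u) .
\]

Let $\hat{t}_u$ be a maximizer of $\delta(1, t_u)$. By the identity above, the highest point of the slice at any $T$ lies at $t_u = T \hat{t}_u$ and has the value $T \cdot \delta(1, \hat{t}_u)$, so the highest points of all slices lie on one line through the origin. Setting this value to 1 gives the intersection with the plane $\delta = 1$:

\[
T^* = \frac{1}{\delta(1, \hat{t}_u)}, \qquad
t_u^* = T^* \hat{t}_u, \qquad
t_d^* = T^* - t_u^* .
\]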

The Algorithm

Determining the Actual Distribution

- We determine the minimal value of $T$ as depicted.
- The decomposition of $T$ into $t_u$ and $t_d$ is given by the coordinates of the intersection point.

[Figure: 3D plot of $\delta$ over $T$ and $t_u$ with slices at $T_1, \dots, T_5$ and the level $\delta = 1$.]

Algorithm to determine a solution with these time bounds:

1. Iterate through all servers $s_i \in S$.
2. Assign to $s_i$ the data amount $x_i = \min\{t_u u_i, t_d d_i\}$.
3. Return the resulting distribution.

Through the feasibility predicate

\[ \sum_{i=1}^{m} \min\{t_u u_i, t_d d_i\} = 1 \]

we know that $\sum_{i=1}^{m} x_i = 1$ and thus $x$ is a valid distribution.
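The assignment step is small enough to write out directly. The following sketch assumes that the optimal time pair is already known; the function name is hypothetical, and the instance (bandwidths and the pair $t_u = 0.25$, $t_d = 0.375$, which is optimal for these bandwidths) is an assumed example rather than data from the talk.

```python
from typing import List, Sequence


def distribution(t_u: float, t_d: float,
                 u: Sequence[float], d: Sequence[float]) -> List[float]:
    """Assign x_i = min{t_u * u_i, t_d * d_i} to every server s_i."""
    return [min(t_u * ui, t_d * di) for ui, di in zip(u, d)]


# Hypothetical instance with two servers (assumed bandwidths).
u = [3.0, 1.0]
d = [2.0, 1.0]
t_u, t_d = 0.25, 0.375  # assumed optimal time pair for this instance

x = distribution(t_u, t_d, u, d)
print(x, sum(x))

# The fragments respect both time bounds on every server.
fits_upload = all(xi / ui <= t_u for xi, ui in zip(x, u))
fits_download = all(xi / di <= t_d for xi, di in zip(x, d))
print(fits_upload, fits_download)
```

Server 1 is limited by both bounds at once here, while server 2 is limited only by the upload time, which is why it receives the smaller fragment.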

Runtime Complexity

- Evaluating the total data function for a fixed value of $T$ runs in $O(m \log m)$ steps.
- Intersecting the straight line with the plane $\delta = 1$ is possible in constant time.
- Runtime complexity: $O(m \log m)$

Conclusion

Summary

- Formal introduction of a new distribution problem
- Formulation as linear program
- Derived total data function from flow formulation
- Presented Analytical Scaling algorithm to find optimal solutions

Future Directions

- Implementation of our ideas into real-world applications
- Cope with non-static, fluctuating bandwidths
- Respect limited storage capacity of servers
- Incorporation of redundancy to allow for failure tolerance

The End

Thank you for your attention! Questions?
