Jan 25, 2011 - The Total Data Function. δT tu. 0. T. δT (tu). T. Langner et al. Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks. 15 ...
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks Tobias Langner, Christian Schindelhauer, Alexander Souza Computer Engineering and Networks Laboratory (TIK) Department for Information Technology and Electrical Engineering ETH Zurich, Switzerland
SOFSEM ’11 Nov´y Smokovec 25 January, 2011
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
1
Motivation
Distributed Storage is Ubiquitous Motivation
Storing files on multiple machines is a common habit P2P overlays basically represent storage networks File-hosting services like Amazon S3, Rapidshare, etc.
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
2
Motivation
Distributed Storage is Ubiquitous Motivation
Storing files on multiple machines is a common habit P2P overlays basically represent storage networks File-hosting services like Amazon S3, Rapidshare, etc.
Gigantic storage requirements by special applications (Google, Amazon, CERN)
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
2
Motivation
Distributed Storage is Ubiquitous Motivation
Storing files on multiple machines is a common habit P2P overlays basically represent storage networks File-hosting services like Amazon S3, Rapidshare, etc.
Gigantic storage requirements by special applications (Google, Amazon, CERN) Most existing approaches specifically tailored for data centers with homogeneous and symmetric network connections.
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
2
Motivation
Distributed Storage is Ubiquitous Motivation
Storing files on multiple machines is a common habit P2P overlays basically represent storage networks File-hosting services like Amazon S3, Rapidshare, etc.
Gigantic storage requirements by special applications (Google, Amazon, CERN) Most existing approaches specifically tailored for data centers with homogeneous and symmetric network connections. Little attention to asymmetric bandwidths typical for end-user connections
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
2
Problem Setting
Synopsis
1
Motivation
2
Problem Setting
3
Analytical Scaling Maximum Flow in Distribution Problems Total Data Function in 3D The Algorithm
4
Conclusion
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
3
Problem Setting
Distribution Network Problem Setting
si ui s1
di sm
um
u1 c d1
dm
Servers si with i ∈ S = {1, . . . , m}, and one client c
Each server si is characterized by its upload bandwidth ui and download bandwidth di T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
4
Problem Setting
Distribution Problem Problem Setting
How do we split a file f of size |f | into fragments for the servers to minimize the time for one upload and n downloads? si
s1
sm f
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
5
Problem Setting
Distribution Problem Problem Setting
How do we split a file f of size |f | into fragments for the servers to minimize the time for one upload and n downloads? si
xi s1
sm x1
Distribution vector x = (x1 , . . . , xm ) with xi is the size of the fragment of server si . T. Langner et al.
xm
f
P
xi = |f |.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
5
Problem Setting
Distribution Problem Problem Setting
How do we split a file f of size |f | into fragments for the servers to minimize the time for one upload and n downloads? si
xi s1
sm x1
f
xm
P Distribution vector x = (x1 , . . . , xm ) with xi = |f |. xi is the size of the fragment of server si . Equivalent simplified problem with |f | = n = 1. T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
5
Problem Setting
Transfer Times Problem Setting
The quality of a distribution x is given by the transfer times: Upload time: tu (x) = max i∈S
T. Langner et al.
nx o i
ui
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
6
Problem Setting
Transfer Times Problem Setting
The quality of a distribution x is given by the transfer times: Upload time: tu (x) = max i∈S
Download time: td (x) = max i∈S
T. Langner et al.
nx o i
ui nx o i
di
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
6
Problem Setting
Transfer Times Problem Setting
The quality of a distribution x is given by the transfer times: Upload time: tu (x) = max i∈S
Download time: td (x) = max i∈S
nx o i
ui nx o i
di
Total time: val(x) = tu (x) + (n ·) td (x)
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
6
Problem Setting
Objective Problem Setting
We are given a distribution network and a file f with size 1. si
s1
sm f
Objective Find the optimal distribution x∗ that minimizes the total time val(x) = tu (x) + td (x) . T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
7
Problem Setting
Important Properties Problem Setting
6
si
5 4
s1
sm
⇒
3
S
2 1
f
1
2
3
4
5
6
The distribution problem can be formulated as linear program. ⇒ Its solution space is a convex region. ⇒ Every local minimum is global.
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
8
Problem Setting
Non-Linear Optimization Problem Linear Program
Upload/download bandwidths u1 to um and d1 to dm File distribution x = (x1 , . . . , xm )
Non-Linear Program minimize subject to
max m X
x1 x2 xm , ,..., u1 u2 um
+ max
x1 x2 xm , ,..., d1 d2 dm
xi = 1
i=1
xi ≥ 0
T. Langner et al.
for all i ∈ {1, . . . , m}
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
9
Problem Setting
Linear Optimization Problem Linear Program
Linear Program minimize subject to
tu + td m X xi = 1 i=1
xi ≤ tu ui xi ≤ td di xi ≥ 0
for all i ∈ {1, . . . , m} for all i ∈ {1, . . . , m} for all i ∈ {1, . . . , m}
Dimension: m + 2 T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
10
Analytical Scaling
Synopsis
1
Motivation
2
Problem Setting
3
Analytical Scaling Maximum Flow in Distribution Problems Total Data Function in 3D The Algorithm
4
Conclusion
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
11
Analytical Scaling
Maximum Flow in Distribution Problems
The Distribution Problem as Flow Problem Maximum Flow in Distribution Problems
Assume we are given upload time tu and download time td
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
12
Analytical Scaling
Maximum Flow in Distribution Problems
The Distribution Problem as Flow Problem Maximum Flow in Distribution Problems
Assume we are given upload time tu and download time td Download
Upload
c
T. Langner et al.
t u · u1
s1
td · d1
t u · ui
si
td · di
t u · um
sm
td · dm
c
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
12
Analytical Scaling
Maximum Flow in Distribution Problems
The Distribution Problem as Flow Problem Maximum Flow in Distribution Problems
Assume we are given upload time tu and download time td Download
Upload
c
t u · u1
s1
td · d1
t u · ui
si
td · di
t u · um
sm
td · dm
c
The maximal amount of data that can be uploaded within tu and downloaded within td corresponds to a maximum flow.
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
12
Analytical Scaling
Maximum Flow in Distribution Problems
The Distribution Problem as Flow Problem Maximum Flow in Distribution Problems
Assume we are given upload time tu and download time td Download
Upload
c
t u · u1
s1
td · d1
t u · ui
si
td · di
t u · um
sm
td · dm
c
The maximal amount of data that can be uploaded within tu and downloaded within td corresponds to a maximum flow. Value of maximum flow f ∗ equals capacity of minimal cut
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
12
Analytical Scaling
Maximum Flow in Distribution Problems
The Distribution Problem as Flow Problem Maximum Flow in Distribution Problems
Assume we are given upload time tu and download time td Download
Upload
c
t u · u1
s1
td · d1
t u · ui
si
td · di
t u · um
sm
td · dm
c
The maximal amount of data that can be uploaded within tu and downloaded within td corresponds to a maximum flow. Value of maximum flow f ∗ equals capacity of minimal cut m X ∗ val(f ) = min{tu ui , td di } i=1
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
12
Analytical Scaling
Maximum Flow in Distribution Problems
Decision Predicate Maximum Flow in Distribution Problems
There exists a distribution such that f with |f | = 1 can be uploaded and downloaded in time tu and td , respectively, if m X i=1
T. Langner et al.
min{tu ui , td di } ≥ 1
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
13
Analytical Scaling
Maximum Flow in Distribution Problems
Decision Predicate Maximum Flow in Distribution Problems
There exists a distribution such that f with |f | = 1 can be uploaded and downloaded in time tu and td , respectively, if m X i=1
min{tu ui , td di } ≥ 1
We want to find the minimum values for tu and td for which the above predicate holds.
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
13
Analytical Scaling
Maximum Flow in Distribution Problems
Decision Predicate Maximum Flow in Distribution Problems
There exists a distribution such that f with |f | = 1 can be uploaded and downloaded in time tu and td , respectively, if m X i=1
min{tu ui , td di } ≥ 1
We want to find the minimum values for tu and td for which the above predicate holds. Substitute td by T − tu to obtain the total data function. δT (tu ) =
m X i=1
T. Langner et al.
min{tu ui , (T − tu ) · di }
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
13
Analytical Scaling
Maximum Flow in Distribution Problems
Total Data Function Total Data Function
The total data function for a fixed value of T : δT (tu ) =
m X i=1
T. Langner et al.
min{tu ui , (T − tu ) · di }
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
14
Analytical Scaling
Maximum Flow in Distribution Problems
Total Data Function Total Data Function
The total data function for a fixed value of T : δT (tu ) =
m X i=1
min{tu ui , (T − tu ) · di }
δT
δT ,i (tu )
0
T. Langner et al.
T
tu
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
14
Analytical Scaling
Maximum Flow in Distribution Problems
Total Data Function Total Data Function
The total data function for a fixed value of T : δT (tu ) =
m X i=1
min{tu ui , (T − tu ) · di }
δT δT (tu )
0
T. Langner et al.
T
tu
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
14
Analytical Scaling
Maximum Flow in Distribution Problems
Total Data Function Total Data Function
The total data function for a fixed value of T : δT (tu ) =
m X i=1
min{tu ui , (T − tu ) · di }
δT δT (tu )
0
T
tu
Can be evaluated in O(m log m) steps T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
14
Analytical Scaling
Maximum Flow in Distribution Problems
Evaluation The Total Data Function
δT δT (tu )
0
T. Langner et al.
T
tu
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
15
Analytical Scaling
Maximum Flow in Distribution Problems
Evaluation The Total Data Function
δT
0
T. Langner et al.
I1
I2
p1
I3
p2
I4
p3
T
tu
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
15
Analytical Scaling
Maximum Flow in Distribution Problems
Evaluation The Total Data Function
δT
I1
I2
I3
I4
σ3
σ2
σ4
σ1
0
p1
p2
p3
T
tu
σi is slope in interval Ii .
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
15
Analytical Scaling
Maximum Flow in Distribution Problems
Evaluation The Total Data Function
δT
I1
I2
I3
I4
σ3
σ2
σ4
σ1
p1
0
p2
p3
T
tu
σi is slope in interval Ii . Recursion formula δT (pi ) = δT (pi−1 ) + σi · (pi − pi−1 ) T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
15
Analytical Scaling
Total Data Function in 3D
Scaling the Total Data Function Total Data Function in 3D
The total data function δT (tu ) determines how much data can be uploaded to and downloaded again from the servers within time T for different values of tu .
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
16
Analytical Scaling
Total Data Function in 3D
Scaling the Total Data Function Total Data Function in 3D
The total data function δT (tu ) determines how much data can be uploaded to and downloaded again from the servers within time T for different values of tu . Varying values of T only scale the total data function δT
0
T. Langner et al.
T1
T2
T3
T4
T5
tu
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
16
Analytical Scaling
Total Data Function in 3D
Two-Variable Total Data Function Total Data Function in 3D
Interpret δT (tu ) as two-variable function δ(T , tu ) now δ
T
T5 T4 T3 T1
T2 tu
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
17
Analytical Scaling
Total Data Function in 3D
Two-Variable Total Data Function Total Data Function in 3D
Interpret δT (tu ) as two-variable function δ(T , tu ) now δ
T
T5 T4 T3 T1
T2 tu
To find the minimal value of T with δ(T , tu ) ≥ 1 for some tu , establish straight line through the highest point of δ(T , tu ) T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
17
Analytical Scaling
Total Data Function in 3D
Two-Variable Total Data Function Total Data Function in 3D
Interpret δT (tu ) as two-variable function δ(T , tu ) now δ
T
T5 T4 T3
1 T1
T2 tu
To find the minimal value of T with δ(T , tu ) ≥ 1 for some tu , establish straight line through the highest point of δ(T , tu ) Determine where it intersects the plane δ = 1 T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
17
Analytical Scaling
The Algorithm
Determining the Actual Distribution The Algorithm
We determine the minimal value of T as depicted. δ
T
T5 T4 T3
1 T1
T2 tu
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
18
Analytical Scaling
The Algorithm
Determining the Actual Distribution The Algorithm
We determine the minimal value of T as depicted. δ
T
T5 T4 T3
1 T1
T2 tu
The decomposition of T into tu and td is given by the coordinates of the intersection point.
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
18
Analytical Scaling
The Algorithm
Determining the Actual Distribution The Algorithm
We determine the minimal value of T as depicted. δ
T
T5 T4 T3
1 T1
T2 tu
The decomposition of T into tu and td is given by the coordinates of the intersection point.
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
18
Analytical Scaling
The Algorithm
Determining the Actual Distribution The Algorithm
Algorithm to determine a solution with these time bounds: Iterate through all servers si ∈ S Assign to si the data amount xi = min{tu ui , td di } Return the resulting distribution
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
19
Analytical Scaling
The Algorithm
Determining the Actual Distribution The Algorithm
Algorithm to determine a solution with these time bounds: Iterate through all servers si ∈ S Assign to si the data amount xi = min{tu ui , td di } Return the resulting distribution
Through the feasibility predicate m X i=1
we know that
T. Langner et al.
P
min{tu ui , td di } = 1
xi = 1 and thus x is a valid distribution.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
19
Analytical Scaling
The Algorithm
Runtime Complexity The Algorithm
Evaluating the total data function for a fixed value of T runs in O(m log m) steps. δT
δT (tu )
0
T. Langner et al.
T
tu
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
20
Analytical Scaling
The Algorithm
Runtime Complexity The Algorithm
Evaluating the total data function for a fixed value of T runs in O(m log m) steps. δT
δT (tu )
0
tu
T
Intersecting the straight line with the plane δ = 1 is possible in constant time. δ
T
T5 T4 T3
1 T1
T2 tu
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
20
Analytical Scaling
The Algorithm
Runtime Complexity The Algorithm
Evaluating the total data function for a fixed value of T runs in O(m log m) steps. δT
δT (tu )
0
tu
T
Intersecting the straight line with the plane δ = 1 is possible in constant time. δ
T
T5 T4 T3
1 T1
T2 tu
Runtime complexity: O(m log m) T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
20
Conclusion
Synopsis
1
Motivation
2
Problem Setting
3
Analytical Scaling Maximum Flow in Distribution Problems Total Data Function in 3D The Algorithm
4
Conclusion
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
21
Conclusion
Summary Formal introduction of a new distribution problem
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
22
Conclusion
Summary Formal introduction of a new distribution problem Formulation as linear program
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
22
Conclusion
Summary Formal introduction of a new distribution problem Formulation as linear program Derived total data function from flow formulation
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
22
Conclusion
Summary Formal introduction of a new distribution problem Formulation as linear program Derived total data function from flow formulation Presented Analytical Scaling algorithm to find optimal solutions δ
T
T5 T4 T3
1 T1
T2 tu
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
22
Conclusion
Summary Formal introduction of a new distribution problem Formulation as linear program Derived total data function from flow formulation Presented Analytical Scaling algorithm to find optimal solutions δ
T
T5 T4 T3
1 T1
T2 tu
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
22
Conclusion
Future Directions
Implementation of our ideas into real-world applications
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
23
Conclusion
Future Directions
Implementation of our ideas into real-world applications Cope with non-static, fluctuating bandwidths
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
23
Conclusion
Future Directions
Implementation of our ideas into real-world applications Cope with non-static, fluctuating bandwidths Respect limited storage capacity of servers
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
23
Conclusion
Future Directions
Implementation of our ideas into real-world applications Cope with non-static, fluctuating bandwidths Respect limited storage capacity of servers Incorporation of redundancy to allow for failure tolerance
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
23
Conclusion
The End
Thank you for your attention! Questions?
T. Langner et al.
Optimal File-Distribution in Heterogeneous and Asymmetric Storage Networks
24