Scaling OSNs without Pains

The Little Engine(s) That Could: Scaling Online Social Networks
Josep M. Pujol, Vijay Erramilli, Georgos Siganos, Xiaoyuan Yang, Nikos Laoutaris, Parminder Chhabra and Pablo Rodriguez
Telefonica Research, Barcelona

ACM/SIGCOMM 2010 – New Delhi, India

Outline: Problem | Solution | Implementation | Conclusions


Scalability is a pain: the designers' dilemma

"Scalability is a desirable property of a system, a network, or a process, which indicates its ability to either handle growing amounts of work in a graceful manner or to be readily enlarged."

The designers' dilemma: wasted complexity and a long time to market vs. a short time to market but the risk of death by success.

Towards transparent scalability

The advent of the Cloud: virtualized machinery + gigabit networks.

Hardware scalability vs. application scalability: the Cloud provides the former, but not necessarily the latter.

Towards transparent scalability

Elastic resource allocation for the Presentation and the Logic layers: more machines when load increases.
• Presentation and logic components are stateless, therefore independent and duplicable on demand.
• Data backend components are interdependent, therefore not duplicable on demand.

Scaling the data backend...
• Full replication
  – Load of read requests decreases with the number of servers
  – State does not decrease with the number of servers
  – Maintains data locality: operations on the data (e.g. a query) can be resolved in a single server, so from the application's perspective the distributed deployment behaves like the centralized one.

[Figure: centralized setup (application logic on a single data store) vs. distributed setup (application logic in front of several fully replicated data stores).]

• Horizontal partitioning, or sharding
  – Load of read requests decreases with the number of servers
  – State decreases with the number of servers
  – Maintains data locality as long as… the splits are disjoint (independent)

Bad news for OSNs!

Why is sharding not enough for OSNs?
• Shards of an OSN can never be disjoint: operations on user i require access to the data of other users, at least those one hop away.
• From graph theory: if the social graph is a single connected component, there is no partition in which every node sits in the same shard as all of its neighbors.

Data locality cannot be maintained by partitioning a social network! (A brute-force check of this claim follows below.)
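A minimal sketch (in Python, my own illustration rather than anything from the talk) that checks the graph-theory claim exhaustively on a small connected graph: every non-trivial two-way split leaves at least one user with a neighbor in the other shard.

```python
# Brute-force check: for every non-trivial split of a connected graph into
# two shards, some node ends up with a neighbor in the other shard, i.e.
# perfect data locality is impossible without replication.
from itertools import combinations

nodes = range(6)
edges = [(i, (i + 1) % 6) for i in nodes]          # a 6-node ring (connected)

def locality_broken(shard_a):
    shard_a = set(shard_a)
    return any((u in shard_a) != (v in shard_a) for u, v in edges)

assert all(
    locality_broken(shard_a)
    for size in range(1, len(nodes))                # non-trivial splits only
    for shard_a in combinations(nodes, size)
)
print("every 2-way split of a connected graph breaks locality for some user")
```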

A very active area…
• Online Social Networks are massive systems spanning hundreds of machines in multiple datacenters.
• Partition and placement to minimize network traffic and delay:
  – Karagiannis et al., "Hermes: Clustering Users in Large-Scale Email Services", SoCC 2010
  – Agarwal et al., "Volley: Automated Data Placement for Geo-Distributed Cloud Services", NSDI 2010
• Partition and replication towards scalability:
  – Pujol et al., "Scaling Online Social Networks without Pains", NetDB/SOSP 2009

Outline: Problem | Solution | Implementation | Conclusions

OSN's operations 101

[Figure: a small social graph with users 0–9; user 3 is "Me" and its neighbors are "Friends".]

• Every user has an inbox; user 3 starts with Inbox = [].
• When a friend publishes a new piece of content (say key-101), the key is pushed into the inboxes of all her friends: user 3's inbox becomes [key-101].
• After a few more posts, Inbox = [key-146, key-121, key-105, key-101].
• "Let's see what's going on in the world": when user 3 reads her timeline, the application performs two steps:
  1) FETCH inbox [from server A]  ->  R = [key-146, key-121, key-105, key-101]
  2) FETCH text WHERE key IN R [from server ?]
• Where is the data?
  – If all the referenced content also lives in server A, then data locality!
  – If it is scattered across servers B, E, F, …, then no data locality!
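To make the two-step read concrete, here is a minimal sketch (my own Python illustration; names such as `Server`, `multi_get` and `fetch_timeline` are invented, not the talk's code). It fetches the inbox from the user's server, then multi-gets the referenced content from whichever servers hold it; without data locality the second step fans out, and its latency is dominated by the slowest server.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.store = {}                       # key -> value (inbox lists or content)

    def get(self, key):
        return self.store.get(key)

    def multi_get(self, keys):
        return {k: self.store[k] for k in keys if k in self.store}


def fetch_timeline(user, inbox_server, content_location):
    """content_location: key -> Server holding that piece of content."""
    # 1) FETCH inbox [from the user's server]
    r = inbox_server.get(f"inbox:{user}")     # e.g. [key-146, ..., key-101]

    # 2) FETCH text WHERE key IN R [from server ?]
    # Group keys by the server that holds them; with data locality this is a
    # single server, otherwise it fans out and the overall response time is
    # dominated by the worst-performing server.
    by_server = {}
    for key in r:
        by_server.setdefault(content_location[key], []).append(key)

    texts = {}
    for server, keys in by_server.items():
        texts.update(server.multi_get(keys))
    return [texts[k] for k in r]


# Usage: one server with locality vs. content spread over several servers.
a, b = Server("A"), Server("B")
a.store["inbox:3"] = ["key-105", "key-101"]
a.store["key-101"], b.store["key-105"] = "hello", "world"
print(fetch_timeline(3, a, {"key-101": a, "key-105": b}))   # ['world', 'hello']
```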

OSN's operations 101

1) Relational databases
   • Selects and joins across multiple shards of the database are possible, but performance is poor (e.g. MySQL Cluster, Oracle RAC).

2) Key-value stores (DHT)
   • More efficient than relational databases: multi-get primitives transparently fetch data from multiple servers.
   • But not a silver bullet:
     – you lose the SQL query language => programmatic queries
     – you lose the abstraction from data operations
     – you suffer from high traffic, eventually affecting performance:
       • incast issue
       • multi-get hole
       • latency dominated by the worst-performing server

Maintaining data locality

• Random partition (DHT-like): users are assigned to servers randomly.

[Figure: the 10-user social graph split across two servers at random; most of a user's friends end up on the other server.]

• How do we enforce data locality? Replication: for every user, place replicas of her friends on the server that holds her master copy (e.g. user #3 gets replicas of her remote friends next to her master; the figure marks these as "Master + Replica").
• That does it for user #3, but what about the others? The very same procedure…
• After a while… data locality for reads is guaranteed for any user!
• But random partitioning + replication can lead to high replication overheads: in this example it ends up being full replication!
• Can we do better?
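A small sketch (my own Python illustration, not from the talk) of how one could measure that overhead: given a partition of the social graph, count the replicas needed so that every user finds all of her friends on her own server.

```python
# Count the replicas required for read locality under a given partition.
import random
from collections import defaultdict

def replicas_for_locality(edges, partition):
    """partition: user -> server of the master copy.
    Returns the (user, server) replica placements needed so that every user
    finds all of her neighbors on her own server."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    replicas = set()
    for u, friends in neighbors.items():
        for v in friends:
            if partition[v] != partition[u]:
                replicas.add((v, partition[u]))   # replicate v next to u's master
    return replicas

# Example: a 10-user ring split randomly across 2 servers.
users = list(range(10))
edges = [(i, (i + 1) % 10) for i in users]
random.seed(0)
partition = {u: random.randrange(2) for u in users}
print(len(replicas_for_locality(edges, partition)),
      "extra replicas on top of the 10 masters")
```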

Maintaining data locality

• We can leverage the underlying social structure for the partition…

[Figure: the same graph re-partitioned along its social structure (Social Partition); the replicas needed across the cut are then added.]

• Social Partition and Replication: SPAR.
• … only two replicas are needed to achieve data locality in this example.

SPAR algorithm from 10,000 feet

"All partition algorithms are equal, but some are more equal than others"

1) Online (incremental)
   a) Dynamics of the social network (add/remove node, edge)
   b) Dynamics of the system (add/remove server)
2) Fast (and simple)
   a) Local information
   b) Hill-climbing heuristic
   c) Load balancing via back-pressure
3) Stable
   a) Avoid cascades
4) Effective
   a) Optimize for MIN_REPLICA (NP-hard)
   b) While maintaining a fixed number of replicas per user for redundancy (e.g. K = 2); a simplified sketch of the edge-arrival step follows below.
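The following is a much-simplified sketch (mine, in Python; names like `PartitionState` and `add_edge` are invented) of the flavor of greedy, local, hill-climbing step such an online algorithm can take when an edge (u, v) appears: compare the replica cost of leaving both masters in place against moving one of them to the other's server, and keep the cheapest option. The real SPAR algorithm additionally keeps the K redundant replicas per user and balances load across servers via back-pressure.

```python
# Simplified edge-arrival step: pick the cheapest of "no move",
# "move u to v's server" and "move v to u's server", in replicas needed.
from collections import defaultdict

class PartitionState:
    def __init__(self, master):
        self.master = dict(master)            # user -> server of the master copy
        self.neighbors = defaultdict(set)     # user -> set of friends

    def replicas_needed(self, u, server, placed=None):
        """Replicas required on `server` for u's reads to stay local there,
        assuming the hypothetical master moves in `placed` were applied."""
        placed = placed or {}
        return sum(1 for w in self.neighbors[u]
                   if placed.get(w, self.master[w]) != server)

    def add_edge(self, u, v):
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)
        su, sv = self.master[u], self.master[v]
        if su == sv:
            return                             # already co-located
        stay   = self.replicas_needed(u, su) + self.replicas_needed(v, sv)
        move_u = self.replicas_needed(u, sv) + self.replicas_needed(v, sv, {u: sv})
        move_v = self.replicas_needed(v, su) + self.replicas_needed(u, su, {v: su})
        # Hill-climb: only move a master if it strictly reduces the replica count.
        if move_u < stay and move_u <= move_v:
            self.master[u] = sv
        elif move_v < stay:
            self.master[v] = su

# Example: an edge arrives between users hosted on different servers.
state = PartitionState({1: "S1", 2: "S1", 3: "S2"})
state.add_edge(1, 2)
state.add_edge(2, 3)        # moving user 3 to S1 removes the need for replicas
print(state.master)
```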


Looks good on paper, but... let's try it for real

Real OSN data:
• Twitter: 2.4M users, 48M edges, 12M tweets over 15 days (Twitter as of Dec '08)
• Orkut (MPI): 3M users, 223M edges
• Facebook (MPI): 60K users, 500K edges

Partition algorithms:
• Random (DHT)
• METIS
• MO+ (modularity optimization + hack)
• SPAR online

How many replicas are generated? (Twitter)

             16 partitions                 128 partitions
             Rep. overhead   % over K=2    Rep. overhead   % over K=2
  SPAR       2.44            +22%          3.69            +84%
  MO+        3.04            +52%          5.14            +157%
  METIS      3.44            +72%          8.03            +302%
  Random     5.68            +184%         10.75           +434%

• SPAR needs 2.44 replicas per user on average: 2 to guarantee redundancy + 0.44 to guarantee data locality (+22%).
• Replication with random partitioning is too costly!
• Other social-based partitioning algorithms have higher replication overhead than SPAR.
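The "% over K=2" column can be reproduced from the replication overhead if we read it as the extra replicas beyond the K = 2 kept for redundancy; a quick check in Python (my own, just to make the column explicit):

```python
# Reproduces the "% over K=2" column for the 16-partition case, assuming it
# is the overhead beyond the K = 2 replicas kept for redundancy.
K = 2
for algo, overhead in [("SPAR", 2.44), ("MO+", 3.04), ("METIS", 3.44), ("Random", 5.68)]:
    print(f"{algo}: +{(overhead - K) / K:.0%} over K={K}")   # SPAR: +22% ... Random: +184%
```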


Outline: Problem | Solution | Implementation | Conclusions

The SPAR architecture
• Application, 3 tiers
• SPAR middleware (data-store specific)
• SPAR controller: partition manager and directory service

[Figure: the 3-tier application sits on top of the data-store-specific SPAR middleware, which is coordinated by the SPAR controller.]
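A rough sketch (my own Python illustration; `DirectoryService`, `SparMiddleware` and their methods are invented names, not the system's API) of the role these components play: the directory service knows which server holds each user's master, and the middleware routes every query for that user to that single server, where the replicas of her neighbors also live, so the application keeps issuing queries as if the backend were one machine.

```python
class DirectoryService:
    """Maps each user to the server that holds her master copy."""
    def __init__(self, master_of):
        self.master_of = dict(master_of)      # user -> server address

    def lookup(self, user):
        return self.master_of[user]


class SparMiddleware:
    """Sits between the application and the data stores (one per server)."""
    def __init__(self, directory, connections):
        self.directory = directory
        self.connections = connections        # server address -> data-store client

    def query(self, user, statement, params=()):
        # By construction, the user's data and replicas of all her friends'
        # data live on her master's server, so one local query is enough.
        server = self.directory.lookup(user)
        return self.connections[server].execute(statement, params)
```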

SPAR in the wild
• Twitter clone (Laconica, now StatusNet): centralized architecture, PHP + MySQL/Postgres
• Twitter data: Twitter as of end of 2008, 2.4M users, 12M tweets in 15 days
• The little engines: 16 commodity desktops (Pentium Duo at 2.33 GHz, 2 GB RAM) connected with a Gigabit Ethernet switch
• Test SPAR on top of: MySQL (v5.5) and Cassandra (v0.5.0)

SPAR on top of MySQL
• Different read request rates (application level):
  – 1 call to the LDS, plus 1 join on the notice and notice_inbox tables to get the last 20 tweets and their content
  – a user is queried only once per session (4 min)
• Throughput with the 99th percentile response time under 150 ms:
  – MySQL with full replication: 16 req/s
  – SPAR + MySQL: 2,500 req/s
• SPAR allows the data to be partitioned, improving both query times and cache hit ratios.
• Full replication suffers from a low cache hit ratio and very large tables (1B+).
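As a concrete illustration of that read, a sketch of the join (the column names and the use of sqlite3 are my assumptions for a self-contained example; the deployment used the StatusNet schema on MySQL):

```python
# Hypothetical version of the timeline read: one join between notice_inbox
# and notice returning the last 20 tweets for a user. With SPAR, the whole
# query is answered by the single server holding that user's master.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notice       (id INTEGER PRIMARY KEY, content TEXT, created INTEGER);
CREATE TABLE notice_inbox (user_id INTEGER, notice_id INTEGER);
""")

TIMELINE_SQL = """
SELECT n.id, n.content, n.created
FROM   notice_inbox AS ni
JOIN   notice       AS n ON n.id = ni.notice_id
WHERE  ni.user_id = ?
ORDER  BY n.created DESC
LIMIT  20;
"""

def read_timeline(conn, user_id):
    return conn.execute(TIMELINE_SQL, (user_id,)).fetchall()

print(read_timeline(conn, 3))    # [] until some notices are inserted
```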

SPAR on top of Cassandra
• Throughput with the 99th percentile response time under 100 ms:
  – Vanilla Cassandra: 200 req/s
  – SPAR + Cassandra: 800 req/s
• Network bandwidth is not an issue, but network I/O and CPU I/O are:
  – network delays are reduced (the worst-performing server no longer dominates the response time)
  – the memory hit ratio is increased (random partitioning destroys correlations)

Outline: Problem | Solution | Implementation | Conclusions

Conclusions

SPAR provides the means to achieve transparent scalability:
1) for applications using an RDBMS (not necessarily limited to OSNs), and
2) a performance boost for key-value stores, due to the reduction in network I/O.

Thanks!

[email protected] http://research.tid.es/jmps/ @solso
