Zoolander: Efficient Latency Management in NoSQL Stores

Aniket Chakrabarti¹, Christopher Stewart¹, Daiyi Yang², and Rean Griffith³
¹The Ohio State University, ²StumbleUpon.com, ³VMware

Comedic Analogy

Naive teacher:
"Mrs. Risky, please prepare an example of your student's work for Parent Night."
"Yes, sir. I will ask Max to do it. He is my best student."
Surprise! Max picks the weekend of Parent Night to try something new, forgetting to do his homework.
Poor results. Of course, parents are not empathetic. They are just angry: "Why isn't my kid learning anything here! I want a refund. I want free tuition. I want new teachers. I WANT!"

Smart teacher:
"Mrs. Predictable, please prepare an example of your student's work for Parent Night."
"Yes, sir. I will make every student do a project. Then I will pick the best for display."
Surprise! Marshall creates the best project, even though his last few projects were not very good.
Good results. Of course, parents are not happy ("How about the food at Parent Night?" "It was awful. I want..."); they just are not angry with Mrs. Predictable. #1 Teacher!

Latency Surprises are not Surprising

Service times in key-value stores exhibit heavy tails:
– Typical response times are very fast (1ms)
– Type of request (get or put) has little effect
– Occasional delays are relatively HUGE: 50ms for a write-buffer dump; 500ms for a DNS issue

Figure 2: We set up Apache Zookeeper [1] on 3 nodes in a private cluster and issued 100K writes with no concurrency. [Bar plot: total delay in secs (0 to 100) by percentile range: 0 to 50th, 50 to 90th, 90 to 99.9th. The slowest 10% of writes accounted for 20% of total delay.]

It's not just Zookeeper!
– The 99.9th percentile in Google BigTable is 31X the mean [2]
– For Cassandra on EC2, only 98% of reads complete within 3X of the mean
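To make "the slowest 10% account for 20% of total delay" concrete, here is a minimal sketch (ours, not the poster's; the 0.25% slow-request mix is a hypothetical choice tuned to roughly mimic Figure 2) that computes the slowest decile's share of total delay:

```python
import numpy as np

# Synthetic heavy-tailed latencies (seconds): mostly ~1ms responses,
# with a small fraction of 50ms write-buffer-dump delays. The 0.25%
# mix is our assumption, chosen to reproduce Figure 2's skew.
rng = np.random.default_rng(0)
latencies = np.where(rng.random(100_000) < 0.0025, 0.050, 0.001)

# Share of total delay contributed by the slowest 10% of requests.
slowest_decile = np.sort(latencies)[-len(latencies) // 10:]
share = slowest_decile.sum() / latencies.sum()
print(f"slowest 10% contribute {share:.0%} of total delay")  # ~20%
```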


NoSQL Stores – No Latency Surprises

– Narrow APIs for data access: get(key) or put(key,val) (see the sketch below)
– Scale throughput via scale-out
– Service level objectives now include latency, e.g., "99% of NoSQL puts must complete within 15ms"
– Low latency for every access remains challenging
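For reference, the narrow API amounts to an interface like the following (a hypothetical Python sketch; the method names follow the bullet above, not any published Zoolander client):

```python
from typing import Optional, Protocol

class KeyValueStore(Protocol):
    """The narrow NoSQL API: only get and put, no scans or joins."""

    def get(self, key: str) -> Optional[bytes]:
        """Return the value stored under key, or None if absent."""
        ...

    def put(self, key: str, val: bytes) -> None:
        """Store val under key, overwriting any previous value."""
        ...
```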



Key Problem: Periodic background jobs unexpectedly tie up resources, creating delays

Figure 1: An Internet service using NoSQL storage, 95% of the time. [Diagram: users 0, 1, ..., n-1, n issue View cart, Modify cart, and Search item requests through USR and Inventory services to NoSQL Storage.]

Setup: Cassandra flushes write buffers every second, and each flush takes 50ms, so flushes tie up the storage node roughly 5% of the time; hence the 95%/5% split between Figures 1 and 2.

Figure 2: An Internet service using NoSQL storage, 5% of the time. [Diagram: user requests must wait behind the write-buffer dumping daemon.]

Result: Angry users.



Zoolander: Middleware for NoSQL Stores

[Architecture figure: the Zoolander Manager takes SLO details, systems data, and monitoring/feedback into an analytic model, then adds/removes duplicates, adds/removes partitions, and remaps application data shards in the Zoolander Key-Value Store. Partitions #0 and #1 spread keys k1=v1 ... k6=v6 across duplicates (p=0, d=0), (p=0, d=1), ..., (p=0, d=n).]

1. Target SLO is an input
– Zoolander meets very strict, low-latency SLOs
– e.g., 99.99% of requests within 15ms (orders of magnitude better than the state of the art)
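Concretely, an SLO like the one above is just two numbers plus a compliance check. A minimal sketch (field and method names are our own, not Zoolander's):

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class LatencySLO:
    percentile: float  # e.g., 0.9999 for "99.99% of requests"
    bound_s: float     # e.g., 0.015 for "within 15ms"

    def met_by(self, latencies_s: Sequence[float]) -> bool:
        """True if at least `percentile` of the samples beat the bound."""
        within = sum(1 for t in latencies_s if t <= self.bound_s)
        return within / len(latencies_s) >= self.percentile

# "99.99% of requests within 15ms"
slo = LatencySLO(percentile=0.9999, bound_s=0.015)
```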


2. Selectively use replication for predictability
– Send all requests to all nodes; take the first response
– Adding nodes improves predictability, not throughput
– Mitigates surprises

[Diagram: gets for keys A and B. Partition #0 holds A="Apple" and Partition #1 holds B="Brae"; Replicas #0 and #1 each hold both A and B; with Zoolander, Duplicates #0 and #1 each hold both, every get goes to both duplicates, and the first response wins.]
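The read path for duplication might look like this minimal sketch (our illustration, not Zoolander's actual code; `replicas` is a hypothetical list of clients, one per duplicate, each exposing a blocking get()):

```python
import concurrent.futures as cf

def predictable_get(replicas, key, timeout=0.015):
    """Send the same get to every duplicate; return the first response.
    Stragglers keep running in the background and are simply ignored."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(r.get, key) for r in replicas]
        done, _ = cf.wait(futures, timeout=timeout,
                          return_when=cf.FIRST_COMPLETED)
        if not done:
            raise TimeoutError(f"no duplicate answered within {timeout}s")
        return next(iter(done)).result()
    finally:
        pool.shutdown(wait=False)  # do not block on slow duplicates

```

Because every duplicate serves every request, adding a duplicate cuts the chance that all copies are slow at once; that is why it improves predictability rather than throughput.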


Large-Scale Case Study

TripAdvisor published its Memcached workload: 700M accesses per day (in practice, 10^10 accesses per day is common). We leased 144 EC2 units for Zoolander + Cassandra; we hit 960M accesses per day.

At night, arrival rates drop by up to 50%, so servers could be turned off to save money. Instead, Zoolander replicated for predictability at night, mitigating surprises.

Result: Zoolander reduced SLO violations by 32%, but used twice as many servers.

[Figure: estimated cost, TripAdvisor workload. Relative cost (0.0 to 2.0) vs. cost of SLO violations x1000 (0.0001 to 1.0000, log scale), shown for a private cloud and EC2.]


3. Zoolander Manager adds nodes carefully (model driven)
– Efficient policy, i.e., meet the SLO using the fewest nodes
– Should we partition? Not under low utilization.
– Should we replicate for predictability? Only if it is clearly better than partitioning.

Figure 3: M/M/1 model. [Plot: latency in secs (0 to 0.06) vs. utilization (0.1 to 0.9), annotated "diminishing returns".]


Example: the SLO is 98% of puts within 15ms. How do we reach 99% or 99.99%? Add nodes!

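A back-of-the-envelope version of Figure 3's model (our sketch: in an M/M/1 queue the response time is exponential with rate mu - lambda, and we idealize d duplicates as d independent samples, which overstates the benefit but shows the shape of the trade-off):

```python
import math

def mm1_percentile(lam, mu, p):
    """p-th percentile of M/M/1 response time: -ln(1 - p) / (mu - lam)."""
    assert mu > lam, "queue must be stable"
    return -math.log(1.0 - p) / (mu - lam)

def duplicated_percentile(lam, mu, p, d):
    """First response among d duplicates, idealized as the minimum of d
    independent exponentials: the tail shrinks by a factor of d."""
    return mm1_percentile(lam, mu, p) / d

def partitioned_percentile(lam, mu, p, k):
    """Partitioning splits arrivals over k nodes, lowering utilization;
    when lam << mu this barely moves (mu - lam), hence partitioning
    helps little under low utilization."""
    return mm1_percentile(lam / k, mu, p)

# Example: 1ms mean service time (mu = 1000/s) at 70% utilization.
lam, mu = 700.0, 1000.0
print(mm1_percentile(lam, mu, 0.98) * 1e3)             # ~13ms: 98% within 15ms
print(duplicated_percentile(lam, mu, 0.99, 2) * 1e3)   # 99th pct, 2 duplicates
print(partitioned_percentile(lam, mu, 0.99, 2) * 1e3)  # 99th pct, 2 partitions

# "Add nodes": smallest number of duplicates meeting 99.99% within 15ms.
d = 1
while duplicated_percentile(lam, mu, 0.9999, d) > 0.015:
    d += 1
print(d)  # 3 under these assumptions
```

Whether duplication or partitioning wins depends on utilization, and that comparison is exactly the model-driven check the Manager runs before adding a node.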



Emerging Workloads = Bigger Impact

MapReduce services:
– SLO violations force managers to keep map/reduce nodes on pay-as-you-go clouds

Scientific computing services:
– Moving to the cloud, but they use barriers heavily
– Each SLO violation delays the whole workload (recall, delays are heavy-tailed)

References


[1] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In USENIX ATC, 2010.
[2] J. Dean. Achieving rapid response times in large online services. Presentation at UC Berkeley, 2012.