Figure 2: We set up Apache Zookeeper[1] on 3 nodes in a private cluster. We
issued 100K writes with no concurrency. The slowest 10% accounted for 20% of total
...
Zoolander: Efficient Latency Management in NoSQL Stores
Aniket Chakrabarti, Christopher Stewart, Daiyi Yang and Rean Griffith
The Ohio State University, StumbleUpon.com and VMWare
Max
Smart teacher
Surprise!
Mrs. Predictable, please prepare an example of your student's work for Parent Night.
Yes, sir. I will make every student do a project. Then I will pick the best for display.
Marshall creates the best project, even though his last few projects were not very good.
Service times in key value stores exhibit heavy tails
– Typical response times are very fast (1ms)
– Type of request (get or put) has little effect
– Occasional delays are relatively HUGE
• 50ms to 1ms buffer dump; 500ms for DNS issue
Figure 2: We set up Apache Zookeeper[1] on 3 nodes in a private cluster. We issued 100K writes, no concurrency
Good results
Of course, Parents are not happy. They just are not angry with Mrs. Predictable.
#1 Teacher
How about the food at Parent Night? It was awful. I want...
NoSQL Stores – No Latency Surprises
0 to 50th
Narrow APIs for data access: get(key) or put(key, val)
Scale throughput via scale-out
Zoolander Manager
Application data shards
“99% of NoSQL puts must complete within 15ms”
Key Problem: Periodic background jobs unexpectedly tie up resources, creating delays
Figure 1: An Internet service using NoSQL Storage, 95% of the time (users 0, 1, ..., n-1, n)
Setup: Cassandra flushes write buffers every second. Each flush takes 50ms.
Figure 2: An Internet service using NoSQL Storage, 5% of the time (users 0, 1, ..., n-1, n)
Result: Angry Users
User requests must wait
[Diagram labels: View cart, Modify cart, Search item, Inventory, USR, NoSQL Storage, Write buffer dumping daemon]
Duplicate (p=0, d=n)
We leased 144 EC2 units for Zoolander + Cassandra; we hit 960M accesses per day
At night, arrival rates drop by up to 50%; turn servers off to save money
At night, Zoolander replicated for predictability; mitigate surprises
Get B
[Figure: Estimated cost, TripAdvisor. Series: Private Cloud, EC2.]
Emerging Workloads = Bigger Impact
Map-Reduce Services:
– SLO violations force managers to keep map/reduce nodes on pay-as-you-go clouds
2. Selectively Use Replication for Predictability
– Send all requests to all nodes, take first response
– Adding nodes improves predictability, not throughput
– Mitigate surprises
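The "send to all nodes, take first response" idea can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not Zoolander's implementation: `FakeNode` is a hypothetical in-memory stand-in for a storage node, its fixed `delay_s` simulates service time, and `duplicated_get` is an illustrative name.

```python
import concurrent.futures
import time


class FakeNode:
    """Hypothetical in-memory stand-in for one duplicate holding a full copy of the data."""

    def __init__(self, data, delay_s):
        self.data = dict(data)
        self.delay_s = delay_s  # simulated service time (assumption, fixed per node)

    def get(self, key):
        time.sleep(self.delay_s)
        return self.data[key]


def duplicated_get(nodes, key):
    """Send the same get to every duplicate and return whichever response arrives first."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(nodes))
    futures = [pool.submit(node.get, key) for node in nodes]
    done, _ = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED
    )
    pool.shutdown(wait=False)  # do not block on the slower duplicates
    return next(iter(done)).result()


# One duplicate is stuck behind a simulated background job; the other is fast.
duplicates = [
    FakeNode({"A": "Apple", "B": "Brae"}, delay_s=0.2),
    FakeNode({"A": "Apple", "B": "Brae"}, delay_s=0.001),
]
start = time.perf_counter()
value = duplicated_get(duplicates, "A")
print(value, f"{(time.perf_counter() - start) * 1000:.1f} ms")
```

The caller sees roughly the latency of the fastest duplicate, while every duplicate still does the full work for every request. That is why adding duplicates helps predictability rather than throughput.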
Cost of SLO Violations (x1000)
Partition #1
1. Target SLO is an input
– Zoolander meets very strict, low latency SLOs
– 99.99% of requests within 15ms (orders of magnitude better than the state of the art)
Get A View cart
Duplicate (p=0, d=1)
TripAdvisor published its Memcached workload; 700M accesses per day
k4=v4 k5=v5
Service level objectives now include latency
Monitoring/Feedback
Partition #0 Duplicate (p=0, d=0)
Large-Scale Case Study
Result: Zoolander reduced SLO violations by 32%, but used twice as many servers.
Zoolander Key-Value Store
k2=v2 k3=v3
Low latency for every access remains challenging
add/remove duplicates
add/remove partitions
k6=v6
Systems data
Analytic Model
k1=v1
In practice, 10^10 accesses per day is common
90 to 99.9th
Zoolander: Middleware for NoSQL Stores
SLO details
50 to 90th
It's not just Zookeeper!
– 99.9th percentile in Google BigTable is 31X the mean[2]
– Cassandra on EC2: only 98% of reads within 3X of the mean
Apache Zookeeper
Diminishing returns
Utilization
– Should we replicate for predictability?
• Only if it is clearly better than partitioning
Percentile Range
The slowest 10% accounted for 20% of total delay.
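The share of total delay contributed by the slowest requests can be computed from any latency trace. The following is a minimal Python sketch, not the authors' analysis code: the function name `tail_share` is illustrative, and the sample trace is synthetic, chosen only so that the arithmetic is easy to follow.

```python
def tail_share(latencies, fraction=0.10):
    """Return the share of total delay contributed by the slowest `fraction` of requests."""
    if not latencies:
        raise ValueError("latencies must be non-empty")
    ordered = sorted(latencies, reverse=True)        # slowest first
    count = max(1, round(len(ordered) * fraction))   # number of requests in the slow tail
    return sum(ordered[:count]) / sum(ordered)


# Synthetic trace (illustrative only): 90 fast writes and 10 slow ones, in ms.
sample = [1.0] * 90 + [2.25] * 10
print(f"slowest 10% share of total delay: {tail_share(sample):.2f}")
```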
remap
Zoolander Manager adds nodes carefully, Model Driven
– Efficient policy, i.e., meet SLO using fewest nodes
– Should we partition?
• Not under low utilization
Figure 3: M/M/1 model
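The poster names an M/M/1 model but does not show its equations, so the following is only a sketch of how such a comparison might look. It uses the standard result that response time in a stable first-come-first-served M/M/1 queue is exponentially distributed with rate (service rate − arrival rate). It assumes that partitioning splits arrivals evenly across nodes, that each duplicate receives every request, and that duplicates behave as independent queues. All function names and numbers are illustrative assumptions.

```python
import math


def mm1_miss_probability(arrival_rate, service_rate, bound_s):
    """P(response time > bound_s) for a stable M/M/1 FCFS queue."""
    if arrival_rate >= service_rate:
        return 1.0  # unstable queue: treat the bound as always missed
    return math.exp(-(service_rate - arrival_rate) * bound_s)


def miss_with_partitioning(arrival_rate, service_rate, bound_s, nodes=2):
    """Each partition serves an equal share of the arrivals (assumption)."""
    return mm1_miss_probability(arrival_rate / nodes, service_rate, bound_s)


def miss_with_duplicates(arrival_rate, service_rate, bound_s, nodes=2):
    """Every duplicate serves all arrivals; a request misses only if all duplicates miss.
    Independence between duplicates is an idealization."""
    return mm1_miss_probability(arrival_rate, service_rate, bound_s) ** nodes


SERVICE_RATE = 1000.0  # requests/s per node, i.e. 1 ms mean service time (assumed)
BOUND_S = 0.015        # 15 ms latency bound

for utilization in (0.1, 0.5, 0.9):
    arrivals = utilization * SERVICE_RATE
    part = miss_with_partitioning(arrivals, SERVICE_RATE, BOUND_S)
    dup = miss_with_duplicates(arrivals, SERVICE_RATE, BOUND_S)
    better = "duplicate" if dup < part else "partition"
    print(f"utilization {utilization:.1f}: partition {part:.2e}, duplicate {dup:.2e} -> {better}")
```

Under these assumptions and with two nodes, duplication gives the smaller miss probability at low utilization and partitioning wins at high utilization, which is consistent with the rule of thumb that partitioning is not the right choice under low utilization.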
SLO is 98% of puts within 15ms – How do we reach 99% or 99.99%? Add nodes!
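A back-of-the-envelope calculation shows why adding nodes can move a 98% SLO toward 99% or 99.99%. The sketch below assumes that each duplicate meets the 15ms bound independently with the same probability, so a request misses only when every duplicate misses. That independence is an idealization, and real duplicates may share delays, so actual gains can be smaller. The function name `duplicates_needed` is illustrative.

```python
def duplicates_needed(p_single, target):
    """Smallest d such that 1 - (1 - p_single)**d >= target, assuming independent duplicates."""
    if not (0.0 < p_single < 1.0 and 0.0 < target < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    d = 1
    while 1.0 - (1.0 - p_single) ** d < target:
        d += 1
    return d


# One node meets the bound 98% of the time (the SLO stated above).
for target in (0.99, 0.9999):
    d = duplicates_needed(0.98, target)
    achieved = 1.0 - (1.0 - 0.98) ** d
    print(f"target {target}: {d} duplicates, idealized hit rate {achieved:.6f}")
```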
Latency (secs)
Mrs. Risky, please prepare an example of your student's work for Parent Night.
Yes, sir. I will ask Max to do it. Max is my best student.
Max picks the weekend of Parent Night to try something new, forgetting to do his homework.
Of course, Parents are not Empathetic. They are just angry.
Why isn't my kid learning anything here! I want a refund. I want free tuition. I want new teachers. I WANT!
Relative Cost
Poor results
Secs
Surprise!
Latency Surprises are not Surprising
Comedic Analogy
Naive teacher
Get A, Get B (request labels repeated across the diagram panels)
Scientific Computing Services:
– Moving to the cloud, but using barriers heavily
– Each violation delays the whole workload (recall, delays are heavy tailed)
References
Partition #0: A = "Apple"
Partition #1: B = "Brae"
Replica #0: A = "Apple", B = "Brae"
Replica #1: A = "Apple", B = "Brae"
Duplicate #0: A = "Apple", B = "Brae"
Duplicate #1: A = "Apple", B = "Brae"
[1] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. Zookeeper: Wait-free coordination for internet-scale systems. In USENIX ATC, 2010.
[2] J. Dean. Achieving rapid response times in large online services. Presentation at UC Berkeley, 2012.