Mass Storage Systems and Technologies (MSST), 2011 IEEE 27th Symposium on, MSST, 2011
Performance Modeling and Analysis of Flash-based Storage Devices
H. Howie Huang and Shan Li (George Washington University)
Alex Szalay and Andreas Terzis (Johns Hopkins University)
September 25, 2012
Contents
Introduction
Background
Black-box Performance Modeling for SSDs
Evaluation
Conclusion
Hanyang University
2012-09-25
Esos Lab. SSD Seminar
Introduction
Benefits of an accurate performance model
  Understand the state of the art of SSDs
  Serve as a research tool for exploring the design space
Previous models for HDDs cannot account for the characteristics of SSDs
Utilize a black-box modeling approach to analyze and evaluate SSD performance
  Latency, bandwidth, and throughput
Construct the black-box model using synthetic workloads and real-world traces on three SSDs, as well as an SSD RAID
Black-box Performance Modeling for SSDs
Performance tends to be highly correlated with the workload characteristics
A black-box model can be constructed in two steps
  Benchmark an SSD and collect the training data
  Use statistical methods to quantify the correlations
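The second step can be illustrated with a small sketch. It computes the Pearson correlation between one workload parameter and one measured metric. The slides do not say which statistic is used at this stage, so the `pearson` helper and the sample numbers are illustrative assumptions only.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical training samples, NOT measurements from the paper.
queue_depths = [1, 2, 4, 8]
latencies_ms = [1.1, 2.3, 3.9, 8.2]
print(f"corr(queue depth, latency) = {pearson(queue_depths, latencies_ms):.3f}")
```

A value near +1 or -1 suggests the parameter is worth keeping as a model input; a value near 0 suggests little linear dependence.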
Basic Model
p = f(wc)
  wc : Workload characteristics
  p : Predicted performance metric
  f : Performance function
A workload is defined as a stream of I/O requests
An HDD workload can be characterized by
  Read-write ratio – Percentage of writes in the request stream
  Request size – Number of bytes transferred to/from the storage device
  Queue depth – Number of outstanding I/Os
  Request randomness – Percentage of random accesses in the I/O request stream
Focus on three performance metrics
  Latency, bandwidth, and throughput in I/Os per second (IOPS)
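A minimal sketch of the basic model's inputs and outputs, assuming the four workload characteristics above are the only inputs. The names `Workload`, `as_vector`, `predict`, and `bandwidth_bytes_per_s` are hypothetical and do not come from the paper. The last helper uses the standard identity that bandwidth equals IOPS times request size when the request size is fixed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Workload:
    """Workload characteristics wc of the basic model (hypothetical layout)."""
    write_ratio: float   # fraction of writes, 0.0 to 1.0
    request_size: int    # bytes transferred per request
    queue_depth: int     # number of outstanding I/Os
    randomness: float    # fraction of random accesses, 0.0 to 1.0

    def as_vector(self) -> List[float]:
        """Flatten wc into the numeric vector a statistical model would consume."""
        return [self.write_ratio, float(self.request_size),
                float(self.queue_depth), self.randomness]


# p = f(wc): any callable mapping a workload to one predicted metric.
PerformanceFunction = Callable[[Workload], float]


def predict(f: PerformanceFunction, wc: Workload) -> float:
    """Apply a performance function f to workload characteristics wc."""
    return f(wc)


def bandwidth_bytes_per_s(iops: float, request_size: int) -> float:
    """Bandwidth implied by a throughput in IOPS at a fixed request size."""
    return iops * request_size
```

One performance function would be trained per metric, so latency, bandwidth, and throughput each get their own `f`.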
Experiment Setup
Experiments run on machines with an Intel Core 2 Duo 2.93 GHz CPU, 4 GB of memory, and Linux kernel 2.6
Tests cover three SSDs, one hard drive, and an SSD RAID
Training data is generated by a synthetic I/O workload generator
  Intel storage toolkit
  Each workload is run for one minute
  For each device, 12,000 one-minute workloads are run
  In total, this takes about 200 hours (about 8 days) to complete
Evaluation uses synthetic I/O requests and four real-world traces from OLTP applications and a web search engine
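The training time follows directly from the run count. The short check below reproduces the arithmetic; the constant names are illustrative.

```python
WORKLOADS_PER_DEVICE = 12_000   # one-minute workloads, from the slide
MINUTES_PER_WORKLOAD = 1

total_minutes = WORKLOADS_PER_DEVICE * MINUTES_PER_WORKLOAD
total_hours = total_minutes / 60
total_days = total_hours / 24

print(f"{total_hours:.0f} hours, about {total_days:.1f} days")
```

The 200-hour figure therefore corresponds to 12,000 one-minute runs executed back to back.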
Basic Model
Impact of the write ratio and queue depth on the latency, bandwidth, and throughput of the SSD
Default values when a parameter is not being analyzed
  Write ratio – 0% for read, or 100% for write
  Queue depth – 1
  Read size – 256 KB
  Write size – 256 KB
  All tests use 100% randomness for both writes and reads
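The one-parameter-at-a-time methodology above can be sketched as follows. `DEFAULTS` mirrors the slide's default values (using the read-only case for the write ratio), while the `sweep` helper and the swept values are hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Defaults from the slide; write_ratio is 0.0 for the read-only case
# and would be 1.0 for the write-only case.
DEFAULTS: Dict[str, Any] = {
    "write_ratio": 0.0,
    "queue_depth": 1,
    "read_size_kb": 256,
    "write_size_kb": 256,
    "randomness": 1.0,
}


def sweep(param: str, values: Iterable[Any],
          defaults: Dict[str, Any] = DEFAULTS) -> List[Dict[str, Any]]:
    """Vary one parameter while holding every other parameter at its default."""
    if param not in defaults:
        raise KeyError(f"unknown parameter: {param}")
    return [{**defaults, param: value} for value in values]


# Hypothetical sweep values, not the ones used in the paper.
for config in sweep("queue_depth", [1, 2, 4, 8]):
    print(config)
```

Each returned dictionary describes one benchmark run; `DEFAULTS` itself is never modified.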
Basic Model
Impact of write ratio on latency, bandwidth, and throughput
  SSDs outperform the HDD, especially when dealing with a lot of read requests
  Write ratio has a large influence on all three performance metrics
  As writes increase in the workload, latency increases while bandwidth and throughput decrease
  When there are more writes, the HDD outperforms SSD_S (Samsung)
[Figure: latency, bandwidth, and throughput versus write ratio; plot annotations: 150Mb/s, 10ms, 20MB/s, 2ms]
Basic Model
Impact of queue depth on latency, bandwidth, and throughput
  Queue depth significantly affects the latency
  Its impact on bandwidth and throughput is minimal
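One way to read this result is through Little's law: in steady state, the mean number of outstanding requests equals throughput times mean latency. If throughput stays roughly flat as the queue deepens, latency has to grow in proportion. The slide does not make this argument itself, so the sketch below is an interpretation, and `expected_latency_s` is a hypothetical helper.

```python
def expected_latency_s(queue_depth: int, iops: float) -> float:
    """Mean latency implied by Little's law: outstanding = throughput * latency."""
    if iops <= 0:
        raise ValueError("throughput must be positive")
    return queue_depth / iops


# Hypothetical device sustaining a flat 1,000 IOPS at every queue depth.
for depth in (1, 2, 4, 8):
    print(depth, expected_latency_s(depth, 1000.0))
```

Under this assumption, doubling the queue depth doubles the mean latency while bandwidth and throughput stay unchanged.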
Extended Model
Consider additional parameters to improve model accuracy
  Read and write stride – effect of request alignment
  Read and write size – SSDs are asymmetric in read/write performance
  Read and write randomness
Extended Model
Write size against latency, bandwidth, and throughput
  As the read/write size increases, latency increases
Extended Model
Impact of access patterns on latency
  When the write size changes under a random access pattern, SSD_S suffers performance degradation
Regression Tree
Apply statistical machine learning algorithms
  Use the least-squares approach to fit the linear model
  Construct a regression tree from the function
  Recursively split the input variables into leaf nodes to minimize the mean square error
  Leaf nodes provide the predicted values of the dependent variables as a constant function of the independent variables
  Best split – minimizes the mean square error among all training data at the leaf nodes
W.-Y. Loh, “GUIDE regression tree version 7.9,” 2009
[Figure: regression tree; each leaf gives the prediction of the bandwidth as a function of the workload characteristics listed on the path to it]
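The "best split" criterion can be sketched for a single input variable. This is a generic least-squares split with constant leaf predictions, not the GUIDE tool cited above, and `sse`, `best_split`, and the sample data are hypothetical. Minimizing the summed squared error of the two leaves is equivalent to minimizing the mean square error over the same training points.

```python
from typing import List, Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared errors when a leaf predicts the mean of its values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs: Sequence[float],
               ys: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, total_sse) for the split of x minimizing leaf SSE.

    Returns None when every x is identical and no split is possible.
    """
    pairs = sorted(zip(xs, ys))
    best: Optional[Tuple[float, float]] = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left: List[float] = [y for _, y in pairs[:i]]
        right: List[float] = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical samples: write size in KB versus bandwidth in MB/s.
sizes_kb = [4, 8, 64, 256]
bandwidth = [20.0, 20.0, 100.0, 100.0]
print(best_split(sizes_kb, bandwidth))
```

A full tree applies `best_split` recursively to each side, over every input variable, until a stopping rule such as a minimum leaf size is reached.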
Evaluation Metric
Mean Absolute Error (MAE) – mean of $|p - p'|$
  Difference between the observed performance $p$ and the predicted performance $p'$
Mean Relative Error (MRE) – mean of $|(p - p') / p|$
  Ratio between the absolute error and the observed performance
$R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ measures how well the performance is likely to be predicted by the model
  SSE – Sum of squares due to error, $\sum_i (p_i - p'_i)^2$
  SST – Total sum of squares, $\sum_i (p_i - \bar{p})^2$, where $\bar{p}$ is the mean observed performance
  $R^2$ – goodness-of-fit measure
A better model has
  Smaller MAE and MRE
  $R^2$ close to 1
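The three metrics can be computed as in the sketch below, using the standard definitions with SST taken around the mean of the observed values. The function names and sample numbers are illustrative, not results from the paper.

```python
from typing import Sequence


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted performance."""
    return sum(abs(p - q) for p, q in zip(observed, predicted)) / len(observed)


def mre(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean relative error; every observed value must be non-zero."""
    return sum(abs((p - q) / p) for p, q in zip(observed, predicted)) / len(observed)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Goodness of fit R^2 = 1 - SSE/SST; observed values must not all be equal."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((p - q) ** 2 for p, q in zip(observed, predicted))
    sst = sum((p - mean_obs) ** 2 for p in observed)
    return 1.0 - sse / sst


# Hypothetical observed and predicted bandwidths in MB/s.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
print(mae(observed, predicted), mre(observed, predicted), r_squared(observed, predicted))
```

MRE is often reported as a percentage, so a returned value of 0.1 corresponds to 10%.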
Microbenchmarks
[Table not recovered; slide callouts: "Very high", "HDD model does not predict well", "Increased significantly", "Better accuracy"]
Model Improvements
For MRE, all devices show improvements of close to or more than 60% across the three performance models
  For SSD_I, the improvement is 80%
Large increases in $R^2$ values for SSD_I and SSD_S
Prediction Accuracy of Eight Workloads
MRE values of rd_only and wr_only are between 0.4% and 7%
Most MRE values are less than 10%
Conclusion
Flash‐based solid‐state drives play an important role in today’s storage systems
An accurate performance model helps in understanding the state of the art of SSDs and in exploring the design space
A good black-box model can be constructed for SSDs using a regression tree