Save energy with the DB2 10.1 for Linux, UNIX, and Windows data compression feature

Reduce maintenance and make your database greener

Andriy Miranskyy ([email protected]), Software Engineer, IBM
Sedef Akinli Kocak ([email protected]), PhD Student, Ryerson University
Enzo Cialini ([email protected]), Senior Technical Staff Member, IBM
Dr. Ayse Basar Bener, Professor of Engineering, Ryerson University

07 February 2013

The IBM® DB2® for Linux®, UNIX®, and Windows® data compression feature allows you to store your data in a compact form. There are two well-known benefits of this approach: reduction of storage space and improvement of performance. In this article we describe a case study demonstrating a third benefit: reduction of electricity consumption per unit of work. As a result, data compression reduces the cost of database maintenance and makes your database "greener."
Introduction

Disk storage systems are often the most expensive components of a database solution. For large warehouses or databases with huge volumes of data, the cost of the storage subsystem can easily exceed the combined cost of the server hardware and the data server software. Therefore, even a small reduction in storage requirements can result in substantial cost savings for the entire database solution. Data compression reduces storage requirements, improves I/O efficiency, and provides quicker access to data on disk.
The latest release of DB2, version 10.1, introduces a new type of data compression: adaptive data compression. This feature combines a number of compression techniques, including table-wide and page-wide compression. (See the Resources section for a link to the data compression section of the DB2 documentation.) These techniques lead to a significant reduction of storage space. However, using this feature incurs CPU overhead associated with compressing and decompressing the data.

A positive side effect of this technology is that it speeds up I/O-intensive workloads, in spite of the CPU overhead. Reading data from disk into memory for processing is one of the slowest database operations. Storing compressed data on disk means that fewer I/O operations are needed to retrieve or store the same amount of data than with an uncompressed dataset. Therefore, for disk I/O-bound workloads (for instance, when the system waits or idles for data to be read from disk), query processing time can improve noticeably. Furthermore, DB2 keeps buffered data in memory in its compressed form, thereby consuming less memory than uncompressed data would. This immediately increases the amount of memory available to the database without any increase in physical memory capacity, which can further improve database performance for queries and other operations.

Compression can be turned on when a table is created by using the COMPRESS YES option. Alternatively, an administrator can enable compression on an existing table T by executing the SQL statement ALTER TABLE T COMPRESS YES. You can find additional details about using the data compression feature in the developerWorks article "Optimize storage with deep compression in DB2 10."

How does the compression feature affect energy consumption for a given workload?
To answer this question, we analyzed the total energy consumption of the software when processing a specific workload in two scenarios:

• Without the data compression feature
• With the adaptive data compression feature

We found that the compression feature increases average energy consumption per unit of time but decreases average consumption per statement, due to a significant reduction in execution time. For the workload under study, this led to a 34% reduction of energy consumption per unit of work. Details of the study are given below.
Workload description and case study setup

For our case study we measured the effects of two configurations on the amount of resources (time and electricity) needed to complete a certain workload:

• No compression of data
• Adaptive compression of data

Our reference workload is the TPC-H standard benchmark, created by the Transaction Processing Performance Council and used as an industry standard for measuring database performance. The workload consists of a set of business-oriented ad-hoc queries. The database
has been designed to have broad industry-wide relevance. See the TPC-H specification document for details.

Using the tools provided with the workload, we populated the database with 1GB of raw data and generated 240 distinct queries associated with this dataset. The queries were executed sequentially, in a circular fashion, for approximately two hours on a Lenovo ThinkPad T60 laptop with 3GB of RAM. The two-hour interval was selected to reduce measurement errors associated with the low precision (0.01kWh) of the energy meter (UPM EM100). Note: Some of the statements take a significant amount of time. For example, a query started at 1:59 can run for 5 minutes, finishing at 2:04.

We counted the number of statements executed in a given time interval and measured the amount of energy consumed by DB2 running in the two configurations: without compression, and with adaptive compression. The workload was executed against each configuration three times to estimate the measurement error (expressed as relative standard error). Note: Relative standard error is calculated as the sample estimate of the population standard deviation (here, over the 3 workload runs) divided by the square root of the sample size and by the sample mean.
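The relative standard error described in the note above is straightforward to compute. The sketch below is a minimal illustration; the three run values are hypothetical, not the article's raw measurements:

```python
import math

def relative_standard_error(runs):
    """Relative standard error: (sample std dev / sqrt(n)) / mean."""
    n = len(runs)
    mean = sum(runs) / n
    # Sample standard deviation (n - 1 in the denominator).
    std = math.sqrt(sum((x - mean) ** 2 for x in runs) / (n - 1))
    return (std / math.sqrt(n)) / mean

# Three hypothetical statement-count measurements for one configuration.
runs = [410, 415, 420]
print(f"RSE = {relative_standard_error(runs):.1%}")
```

With three runs per configuration, this is the ± figure reported alongside each measured value in Table 1.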
Benchmark results

Table 1 shows the measurement results. First, consider the average execution time per statement. As expected, the slowest configuration is the one with the compression feature disabled; we treat this configuration as the baseline. Compressing the data reduces the size of the tables by 61% and yields a 97% performance improvement over the baseline. The compression feature increases overall energy consumption by 29% (in comparison with the baseline); however, energy consumption per unit of work is reduced by 34%, thanks to the increased query throughput.

Based on these data, we conclude that enabling adaptive data compression has a mixed effect on energy consumption: consumption per unit of time goes up, but consumption per unit of work goes down. In theory, if the workload size is fixed, the hardware is dedicated only to this workload, workload performance is not critical, and space savings are irrelevant, it may be beneficial to run queries against the uncompressed version of the database. In practice these conditions are rarely met: workloads scale up, hardware is shared between multiple tasks (especially in virtualized or cloud computing environments), and space savings are critical.
Table 1. Workload and database characteristics with and without the adaptive compression feature (± denotes relative standard error)

Database feature                                     No compression    Adaptive compression
Average statement count per hour                     415 ± 1.8%        814 ± 4.0%
Average electricity consumption (kWh)                0.035 ± 0.5%      0.045 ± 0.0%
Space consumption (MB)                               1168.2            455.6
Compression ratio (see note 1 below)                 100%              61%
Watts per statement per second (see note 2 below)    302               199

Note 1: Compression ratio is defined as 1 – compressed size / uncompressed size.
Note 2: This metric is equivalent to the TPC "power per performance throughput" metric measured in Watts per transaction per second (W/tps) and is calculated as "Energy consumption" / "Work completed". See the TPC Energy Specification for details.
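As a sanity check, the derived rows of Table 1 can be reproduced from the measured rows. The sketch below uses the published hourly figures; small differences from the printed values (for example, 304 versus the printed 302) come from rounding of the inputs:

```python
# Reproduce Table 1's derived metrics from its measured rows.
KWH_TO_JOULES = 3.6e6  # 1 kWh = 3.6 million joules

def joules_per_statement(kwh_per_hour, statements_per_hour):
    """Note 2 metric: energy consumption / work completed (watt-seconds per statement)."""
    return kwh_per_hour * KWH_TO_JOULES / statements_per_hour

def compression_ratio(compressed_mb, uncompressed_mb):
    """Note 1 metric: 1 - compressed size / uncompressed size."""
    return 1 - compressed_mb / uncompressed_mb

no_comp = joules_per_statement(0.035, 415)   # ~304; the article prints 302
adaptive = joules_per_statement(0.045, 814)  # ~199
print(f"no compression: {no_comp:.0f} J/statement")
print(f"adaptive:       {adaptive:.0f} J/statement")
print(f"compression ratio: {compression_ratio(455.6, 1168.2):.0%}")
print(f"energy per statement reduced by {1 - adaptive / no_comp:.0%}")
```

This confirms the headline numbers: a 61% compression ratio, and roughly a 34% drop in energy consumed per statement despite the higher power draw.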
Threats to validity

This case study shows that the method can be successfully applied to a particular dataset. Following the paradigm of the "representative" or "typical" case, this suggests that the same approach may be extrapolated to other products.

Our test system is not designed for use in a production environment: it is a laptop (tuned to minimize electricity consumption at the expense of performance) running a consumer-grade operating system. However, we believe that the results can be extrapolated to a production system, since data compression is system-agnostic. The amount of savings for a production system will vary with the system's setup and with the associated workload. The savings should be more pronounced for a large system. For example, if an uncompressed system requires 100 hard drives to store the data and a compressed one requires 50, you can immediately save 50% of the energy consumed by your storage system by turning the unused hard drives off.
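That last point can be illustrated with a back-of-the-envelope estimate. The 8 W per-drive figure below is an assumption chosen for illustration, not a value measured in this study; note that the fractional saving is independent of the per-drive wattage:

```python
def storage_power_watts(num_drives, watts_per_drive):
    """Total power drawn by an array of identical, always-on drives."""
    return num_drives * watts_per_drive

WATTS_PER_DRIVE = 8.0  # assumed average draw per hard drive (illustrative)

uncompressed = storage_power_watts(100, WATTS_PER_DRIVE)  # 100-drive array
compressed = storage_power_watts(50, WATTS_PER_DRIVE)     # 50 drives after compression
saving = 1 - compressed / uncompressed
print(f"storage energy saving: {saving:.0%}")
```

Halving the drive count halves the storage subsystem's power draw, before even counting the per-statement savings measured above.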
Summary

Our case study shows that, in addition to saving space and improving performance, the DB2 adaptive compression feature can reduce energy consumption per unit of work. This reduces maintenance costs and makes your database "greener."
Resources

Learn

• Learn more about data compression in the Data compression and performance section of the DB2 for Linux, UNIX, and Windows Information Center.
• Get more information on using deep compression to minimize storage space and improve database performance in the article "Optimize storage with deep compression in DB2 10" (developerWorks, May 2012).
• Get information about the TPC-H standard benchmark from the TPC-H page at the Transaction Processing Performance Council (TPC) website.
• Learn about the Transaction Processing Performance Council (TPC), a non-profit corporation founded to define transaction processing and database benchmarks.
• Access the TPC Benchmark H (Decision Support) Standard Specification.
• Access the TPC Energy Specification.
• Follow developerWorks on Twitter.
• Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
• Visit the developerWorks resource page for DB2 for Linux, UNIX, and Windows to read articles and tutorials and connect to other resources to expand your DB2 skills.
• Expand your knowledge by reading more articles in IBM Data magazine.

Get products and technologies

• Download a trial version of DB2 for Linux, UNIX, and Windows.

Discuss

• Participate in the discussion forum for this content.
• Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.
About the authors

Andriy Miranskyy

Dr. Andriy Miranskyy is a software engineer in the IBM Information Management division at the IBM Toronto Software Laboratory. His work and research interests are in the area of mitigating risk in software engineering, focusing on software quality assurance, program comprehension, software requirements, software architecture, project risk management, and Green IT. Andriy received his Ph.D. in Applied Mathematics from the University of Western Ontario. He has 15 years of software engineering experience in the information management and pharmaceutical industries. Andriy has numerous publications and patents in the field of software engineering.
Sedef Akinli Kocak

Sedef Akinli Kocak is a PhD student in the Environmental Applied Science and Management Program at Ryerson University. Her research interests include information technologies and environmental sustainability, sustainable software, green IT, and green software development. Akinli Kocak has an MSc from the University of Maine and an MBA from Ankara University. Contact her at [email protected].
Enzo Cialini

Enzo Cialini is a Senior Technical Staff Member in the IBM Information Management Software division at the IBM Toronto Laboratory. He is currently the Chief Quality Assurance Architect responsible for management and technical test strategy for the DB2 LUW Engine (Warehouse and OLTP) and Customer Operational Profiling. Enzo joined IBM in 1991 and has been working on the DB2 development team since 1992. He has more than 20 years of experience in software development, testing, and support. Enzo is actively engaged with customers and the field in OLTP and Warehouse deployments, has authored The High Availability Guide for DB2, and has numerous patents and publications on DB2 and integration.
Dr. Ayse Basar Bener

Dr. Ayse Basar Bener is a professor in the Department of Mechanical and Industrial Engineering at Ryerson University. Her research interests are in building intelligent models for sustainable decision making under uncertainty. She has published more than 100 articles in journals and conferences. Dr. Bener holds a PhD in Information Systems from the London School of Economics. She is a member of AAAI, IEEE, and AIS.

© Copyright IBM Corporation 2013 (www.ibm.com/legal/copytrade.shtml)
Trademarks (www.ibm.com/developerworks/ibm/trademarks/)