SAS® 9.3® BI Case Study: Performance Scalability and Tuning on Servers with Intel® Xeon® Processor E7 Family
Ying-ping (Marie) Zhang, Intel Corporation, Chandler, Arizona

ABSTRACT
As dataset sizes and analytic complexity and integration grow exponentially in the business analytics field, hardware solutions (multi-core servers, solid state disks, increased memory capacity) evolve to face the challenge.
In this paper, we present a case study on performance scalability and tuning techniques for SAS 9.3 analytics applications to address questions SAS BI customers face, namely:
- Will performance scale if processors / cores are added to servers in an environment with increased dataset sizes and more applications?
- What factors inhibit performance, and what can be done about it?
Our results and analysis, obtained working with IBM x3850* 4- and 8-socket platforms with Intel Xeon E7 family processors, provide first-hand tangible insights for SAS customers as well as others running similar analytics workloads. In summary:
1) SAS 9.3 foundational application performance scales well with added processor count on Intel E7 platforms, provided the servers are well balanced and configured.
2) Proper storage and operating system settings can significantly improve I/O and unlock scalability; storage tuning work resulted in 3.0x gains, and OS configurations yielded 1.5x gains.
3) Adding memory can benefit analytics performance by further reducing I/O. In our study, we saw performance improve 2.1x by increasing memory from 128 GB to 1 TB.
4) Customers can simply and accurately measure memory scalability in SAS applications using the method described below.
INTRODUCTION
As dataset sizes and analytics complexity and integration are experiencing exponential growth in the business analytics field, organizations must adjust or scale their hardware and applications to accommodate this increased demand. The challenge they face is how to choose the right solution. The goal of this SAS 9.3 BI performance scalability study is to find out whether performance scales up as dataset sizes, the number of BI applications, and the number of processors increase. Additionally, it seeks to understand and limit performance bottlenecks for BI applications. This paper showcases improvements gained when the processor count is increased on a given platform, along with techniques used to mitigate limiting factors arising from terabyte-scale data transfer and I/O needs. Specific results include:
1) Comparison of performance scalability results between IBM x3850* 4- and 8-socket Intel Xeon E7 family platforms
2) Performance tuning tips for storage and Red Hat* Enterprise Linux* 6.1 OS settings
3) Introduction of a simple method to validate application memory scalability
Components used in our work include:

Component            Description
Platform             IBM x3850* platforms with multiple processor sockets of Intel Xeon processor E7 family
Sub-storage system   LSI 620J enclosures with Intel solid state disks and LSI SAS I/O controllers
Applications         SAS 9.3
Benchmark            SAS Mixed Business Analytics Workload 4.0

Table 1-1. Components used in the case study
Figure 1-1. Layout of IBM x3850* 4-Socket Platform

The layout of a single IBM x3850* node (4-socket) is shown in Figure 1-1. The tested IBM x3850* platforms were populated with 4 or 8 Intel Xeon E7 family processors. The 8-socket platform is a 2-node EXA scaling configuration with MAX5 Gen2, which can support up to 3 TB of memory.
The SAS system is composed of a large family of products and solutions. SAS 9.3 is a set of statistical analysis software and tools used for business intelligence, e.g. data / text mining, data quality, forecasting, modeling and predictive analytics. The SAS mixed business analytics workload (MBAW) is a workload developed by the SAS team that combines CPU-intensive and I/O-intensive SAS applications. It is intended to simulate about 60% of SAS users: mixed analytics workloads of various sizes are generated to exercise the CPU, RAM and I/O resources that SAS programs use heavily during typical program execution [1]. I/O is the most critical aspect of this workload. The tested SAS data are stored in file systems, and most accesses are sequential reads or writes. In the storage system, data are classified into three types: input, output and saswork.
EXPERIMENTAL FRAMEWORK
The performance metrics of SAS MBAW are the execution time (from the first job start to the last job end) and the real time (the sum of the run times of all SAS MBAW jobs). Table 2-1 shows the configuration of the 4- and 8-socket systems we used to run our performance experiments.
1. SYSTEM CONFIGURATION

System Configuration   IBM x3850* 4S Platform                      IBM x3850* 8S Platform
OS                     Red Hat* Enterprise Linux* 6.1              Red Hat* Enterprise Linux* 6.1
Application            SAS 9.3                                     SAS 9.3
Workload               SAS Mixed Business Analytics Workload 4.0   SAS Mixed Business Analytics Workload 4.0
Processor              4 x Intel Xeon E7-4870 processors           8 x Intel Xeon E7-4870 processors
Total Memory           1,024 GB                                    2,048 GB
HBA                    4 x LSI SAS I/O controllers                 8 x LSI SAS I/O controllers
Storage                4 x LSI 620J enclosures                     8 x LSI 620J enclosures
Disk                   60 x Intel 120 GB SSDs                      120 x Intel 120 GB SSDs

Table 2-1. System Configuration
2. EXPERIMENT DESIGN
The experiments were designed to measure the performance scalability between the 4-socket and 8-socket platforms:
- 4-Socket Platform: 80 sessions of MBAW (304 SAS jobs) running on 4 LSI SSD enclosures and 1 TB memory
- 8-Socket Platform: 160 sessions of MBAW (608 SAS jobs) running on 8 LSI SSD enclosures and 2 TB memory
Because the 8-socket platform has twice the number of cores and can handle twice as many SAS jobs as the 4-socket platform, the expected performance scalability in the absence of an I/O bottleneck is 2.0x.
EXPERIMENTAL RESULTS
1. PERFORMANCE SCALABILITY OF SAS MBAW BETWEEN 4- AND 8-SOCKET SERVERS
Figure 3-1. Performance Scalability at IBM x3850* 4- to 8-socket platforms

The results in Figure 3-1 demonstrate that the performance scalability of SAS MBAW from the 4-socket to the 8-socket platform is 2.0x, which is in line with the increase in the number of cores.
2. CPU UTILIZATION AND WORKLOAD CHARACTERIZATION
The CPU utilization graph in Figure 3-2 indicates that SAS MBAW is not a steady-state workload: CPU utilization is higher at the start and lower in the middle and at the end of the run.
Figure 3-2. CPU Utilization of SAS MBAW at 4- and 8-socket Platforms
[Plot: %CPU characterization of SAS MBAW at 4- and 8-socket Intel Xeon E7 family platforms. Series: 304 SAS jobs and 608 SAS jobs. X axis: run time (x10 sec). Y axis: CPU utilization (%). Source: Intel]
3. I/O THROUGHPUT
The I/O throughput curves in Figure 3-3 confirm that SAS MBAW is an I/O-intensive workload. The average I/O throughput in the first 5-7 minutes reaches up to 5.5 GB/s on the 8-socket platform.
Figure 3-3. I/O Throughput of SAS MBAW at IBM x3850* 4- and 8-socket platforms
[Plot: I/O throughput characterization of SAS MBAW at IBM x3850* 4- and 8-socket Intel Xeon E7 family platforms. X axis: run time (x10 sec). Y axis: I/O throughput (MB/s).]
TIPS FOR PERFORMANCE TUNING
1. TIPS FOR STORAGE SYSTEM TUNING
I/O is the most critical aspect for SAS BI applications. The minimum I/O throughput requirement of SAS MBAW is 50 MB/s per core. The SAS EEC team recommends 100 MB/s per core to keep the cores from waiting on I/O (e.g. 4 GB/s I/O throughput for a 40-core platform).

1) Baseline Configuration of the Storage System
Storage configuration No. 1, shown in Figure 4-1, is used as the baseline configuration:
- Create a single hardware RAID 0 partition in each SSD enclosure
- Group two RAID 0 partitions into one large logical volume (LVM) with 64K stripe size
- Create two XFS file systems: one for input and output data and another for SAS work data
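The baseline layout above could be scripted roughly as follows. This is a dry-run sketch rather than the commands used in the study: it only prints the LVM and XFS commands. The device paths, volume group names and mount points are hypothetical, and reading the layout as "two RAID 0 devices per striped logical volume, one XFS file system per volume" is an assumption.

```shell
#!/bin/sh
# Dry-run sketch of a baseline-style layout: prints commands, executes nothing.
# All device names, volume group names and mount points are hypothetical.

baseline_storage_plan() {
    # $1, $2: the two hardware RAID 0 block devices to group
    # $3: volume group name, $4: mount point for the XFS file system
    raid_a=$1; raid_b=$2; vg=$3; mnt=$4
    echo "pvcreate $raid_a $raid_b"
    echo "vgcreate $vg $raid_a $raid_b"
    # -i 2: stripe across both physical volumes; -I 64: 64 KiB stripe size
    echo "lvcreate -i 2 -I 64 -l 100%FREE -n lv_$vg $vg"
    echo "mkfs.xfs /dev/$vg/lv_$vg"
    echo "mount /dev/$vg/lv_$vg $mnt"
}

# One volume for input/output data, one for SAS work data.
baseline_storage_plan /dev/sdb /dev/sdc vg_data /sasdata
baseline_storage_plan /dev/sdd /dev/sde vg_work /saswork
```

Printing the plan first makes it easy to review the stripe parameters before running anything as root.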
Figure 4-1. Storage Configuration No. 1 on the IBM 4-socket Platform

With this configuration, the test results in Table 4-1 show that when the number of SAS jobs increased 6 times, performance dropped significantly because of an I/O bottleneck, and the real time increased 13.2 times.
4-Socket Platform with 4 SSD SAS Enclosures

             Number of   Real Time   User Time   System Time   Perf. Over Real Time
             SAS Jobs    (Sec)       (Sec)       (Sec)         (Jobs/Hour)
             38          8,729       8,289       673           15.7
             228         115,080     60,500      27,359        7.1
Difference   6.0         13.2        7.3         40.7          0.5

Table 4-1. Performance Scalability from 10 to 60 sessions
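The derived values in Table 4-1 can be rechecked from the raw columns. The sketch below assumes that jobs per hour is computed as jobs x 3600 / real time and that each "Difference" entry is the ratio of the larger run to the smaller run. Both assumptions are inferred from the table and are not stated in the text.

```shell
#!/bin/sh
# Recompute the derived values of Table 4-1 from its raw columns.

jobs_per_hour() {   # $1 = number of jobs, $2 = real time in seconds
    LC_ALL=C awk -v jobs="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", jobs * 3600 / secs }'
}

ratio() {           # $1 = value from the larger run, $2 = value from the smaller run
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

echo "jobs/hour at 38 jobs:  $(jobs_per_hour 38 8729)"
echo "jobs/hour at 228 jobs: $(jobs_per_hour 228 115080)"
echo "real time ratio:       $(ratio 115080 8729)"
echo "system time ratio:     $(ratio 27359 673)"
```

The system time grows far faster than the job count, which is consistent with the I/O bottleneck described above.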
2) Storage Tuning Step 1 – Dual RAID partitions
Figure 4-2. Storage Configuration No. 2 on the IBM 4-socket Platform

The first storage tuning step is to increase the number of RAID 0 partitions on each enclosure to allow parallel read/write access. This is configuration No. 2, shown in Figure 4-2:
- Divide the number of disks in half and create two hardware RAID 0 partitions at each enclosure
- Group R?0 into LVM0 and R?1 into LVM1 with 64K stripe size
- Create an XFS file system on LVM0 for input and output data
- Create an XFS file system on LVM1 for SAS work data
3) Storage Tuning Step 2 – Single LVM
Figure 4-3. Storage Configuration No. 3 on the IBM 4-socket Platform

The second storage tuning step is to increase the number of disks accessed by each I/O operation in order to increase bandwidth and reduce latency. This is configuration No. 3, shown in Figure 4-3:
- Divide the number of disks in half and create two hardware RAID 0 partitions for each enclosure
- Group all of the RAID 0 partitions into one LVM with 64K stripe size
- Create two XFS file systems on the LVM: one for input and output data and one for SAS work data
Figure 4-4. I/O Throughput Comparison between Three Different Storage Configurations

The total gain in average I/O throughput from storage tuning is up to 3.0x.
4) Storage Tuning Step 3 – Multiple RAID partitions
The tests with two RAID partitions raise a new question: what happens if we continue to increase the number of partitions in each enclosure?
Figure 4-5. Storage Configuration No. 4 and 5 on the IBM 8-socket Platform

To answer this, two new configurations with 2 and 4 RAID 0 partitions, shown in Figure 4-5, were designed and tested on the IBM 8-socket platform. The result in Figure 4-6 indicates that total I/O throughput improved 1.07x when the number of partitions was increased from 2 to 4 with the same number of disks.
Figure 4-6. I/O Throughput Comparison between Storage Configurations No. 4 to No. 5
TIPS FOR OS TUNING ON RED HAT* ENTERPRISE LINUX* 6.1
Because I/O is a key factor for SAS foundational applications, our OS performance tuning focused on I/O-related options only. The default OS settings were used as the baseline for the performance tuning.
1. OS TUNING STEP 1 – USE TUNED TOOL IN THE OS PACKAGE
The Tuned tool is a daemon that monitors the use of system components and dynamically tunes system settings based on that monitoring information. Tuned in RHEL 6 can detect device activity dynamically and adjust system settings for power and performance. It not only tunes the kernel, but also:
- Makes sure the devices use the deadline I/O scheduler
- Sets the dirty ratio to 40
- Remounts file systems with barriers disabled (if the enterprise-storage profile is used)
- Makes sure the CPU is running in performance mode
The following steps were used to enable the Tuned tool for data collection:
- Install the tuned tool with the command "yum install tuned"
- Use the command "tuned-adm profile enterprise-storage" to start Tuned with the enterprise-storage profile
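After applying the profile, it can be useful to confirm that the settings listed above actually took effect. The following read-only sketch reports the active I/O scheduler per device and the current dirty ratio. The `sd*` device glob is an assumption about device naming on the test system, and the helper names are hypothetical.

```shell
#!/bin/sh
# Read-only check of settings that the enterprise-storage profile is expected to change.
# To apply the profile first (as root):
#   yum install tuned
#   tuned-adm profile enterprise-storage
#   tuned-adm active

# The scheduler file lists all schedulers and marks the active one in brackets,
# e.g. "noop [deadline] cfq". This filter extracts the bracketed name.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

report_io_settings() {
    for f in /sys/block/sd*/queue/scheduler; do   # sd* naming is an assumption
        [ -r "$f" ] || continue
        echo "$f: $(active_scheduler < "$f")"
    done
    if [ -r /proc/sys/vm/dirty_ratio ]; then
        echo "vm.dirty_ratio: $(cat /proc/sys/vm/dirty_ratio)"
    fi
}

report_io_settings
```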
2. OS TUNING STEP 2 - INCREASE READ-AHEAD SUPPORT
The LUNs and logical volumes used by SAS file systems can be tuned by increasing read-ahead support.
Increase the read-ahead value to 32768 for all of the LUNs and logical volumes with:
    # blockdev --setra 32768 path-to-block-device
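Because the setting has to be applied to every LUN and logical volume, a small loop helps. The sketch below only prints the `blockdev` commands for a list of devices. It also converts the value to MiB, relying on the fact that `blockdev --setra` takes its argument in 512-byte sectors. The device paths are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: print the read-ahead commands for a list of block devices.

# blockdev --setra takes its value in 512-byte sectors; convert to MiB for reference.
setra_to_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

print_setra_commands() {   # $1 = read-ahead in sectors, remaining args = block devices
    ra=$1; shift
    for dev in "$@"; do
        echo "blockdev --setra $ra $dev"
    done
}

echo "32768 sectors = $(setra_to_mib 32768) MiB of read-ahead"
# Hypothetical device paths; replace with the LUNs and logical volumes in use.
print_setra_commands 32768 /dev/sdb /dev/sdc /dev/mapper/vg_data-lv_data
```

The current value for a device can be read back with `blockdev --getra`.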
3. OS TUNING STEP 3 - INCREASE THE I/O REQUEST QUEUE SIZE FOR THE SCHEDULER
The Linux I/O scheduler sorts incoming I/O requests in its request queue for optimization. Increase the I/O request queue size for the scheduler to 1024 with:
    # echo 1024 > /sys/block/sda/queue/nr_requests
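The command above covers a single device (sda). A sketch for applying the same value to several devices follows. The function name and its sysfs-root parameter are hypothetical. The root is a parameter so that the function can be tried against a scratch directory before being pointed at /sys.

```shell
#!/bin/sh
# Sketch: set nr_requests for several block devices.

set_nr_requests() {   # $1 = sysfs root (normally /sys), $2 = queue size, rest = device names
    sysroot=$1; depth=$2; shift 2
    for dev in "$@"; do
        f="$sysroot/block/$dev/queue/nr_requests"
        if [ -w "$f" ]; then
            echo "$depth" > "$f"
            echo "$dev: nr_requests=$(cat "$f")"
        else
            echo "$dev: $f not writable, skipped" >&2
        fi
    done
}

# Example (as root; device names are hypothetical):
#   set_nr_requests /sys 1024 sda sdb sdc sdd
```

Values written under /sys do not persist across reboots, so the call would need to be repeated at boot time.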
Figure 4-7. Performance Impact of different Red Hat* Linux* OS Tuning

The results in Figure 4-7 demonstrate that the total performance gain from OS tuning in our test environment is up to 1.5x. However, the improvement from OS tuning depends on the hardware configuration, such as memory capacity and the type of storage system.
LARGE MEMORY AND PERFORMANCE

IBM 8S WSM-EX2.4, 1TB, 160 Sessions

Effective Memory   Real Time         User Time   Sys Time   Jobs/Hour
Capacity (GB)      (low is better)                          (high is better)
128                584,809           224,185     198,041    3.7
256                411,183           166,743     69,428     5.3
512                350,904           164,281     52,099     6.2
1,024              281,176           157,102     41,769     7.8

Performance difference between 1,024 GB over 128 GB: 2.1X
Table 5-1. Results of Memory Scalability at IBM 8-socket Platform

Because SAS Analytics needs to handle massive amounts of data (e.g. about 4.5 TB of I/O data for the 160-session MBAW test), memory capacity plays an important role in performance. The test results in Table 5-1 show that performance improved 2.1x as the memory capacity increased from 128 GB to 1,024 GB.
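To see how the gain is distributed across the memory steps, the real-time column of Table 5-1 can be turned into speedups. The sketch below assumes speedup is defined as the ratio of real times. It prints, for each capacity, the cumulative speedup over the first row and the step speedup over the previous row. The function name is hypothetical.

```shell
#!/bin/sh
# Speedups from the real-time column of Table 5-1.

memory_speedups() {
    # stdin: lines of "<memory GB> <real time in seconds>", smallest memory first
    # stdout: "<memory GB> <speedup vs first row> <speedup vs previous row>"
    LC_ALL=C awk '
        NR == 1 { base = $2; prev = $2 }
        { printf "%d %.2f %.2f\n", $1, base / $2, prev / $2; prev = $2 }'
}

printf '%s\n' '128 584809' '256 411183' '512 350904' '1024 281176' | memory_speedups
```

On these numbers the largest single step is the first doubling, from 128 GB to 256 GB, though every further doubling still reduces real time.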
TIPS FOR MEMORY SCALABILITY TESTS
The following methods were used for memory scalability tests in this study to scale memory from 256 GB to 1 TB with 60 sessions of SAS MBAW on the IBM 4-socket platform:
1) Physically plug in or remove memory
2) Use the 'mem=??GB' kernel boot parameter to set the desired memory size
3) Reserve a number of huge pages: set aside a portion of physical memory in order to address it using a larger page size
                   Real Time                    User Time                    System Time
Effective System   mem=      Huge     Remove    mem=      Huge     Remove    mem=      Huge     Remove
Memory                       Pages    Memory              Pages    Memory              Pages    Memory
256 GB             233205    84878    85731     67203     60039    62846     63759     6936     8010
512 GB             127650    73518    75801     62024     60107    60245     23928     6580     6727
1024 GB            68175     68175    68175     59849     59849    59849     7303      7303     7303

Column key: mem= is Kernel Boot Parameter (mem=); Huge Pages is Reserve Huge Pages; Remove Memory is Remove Physical Memory.
Table 6-1. Performance Comparison of Memory Scalability with different Methods

The test results in Table 6-1 show large performance differences between the "use 'mem=' kernel boot parameter" and "remove physical memory" methods, because the former introduced a memory imbalance into the test, which led to inaccurate results. The results of the "reserve a number of huge pages" and "remove physical memory" methods are very close because both ensured an even and balanced memory distribution across all memory DIMMs. Using the "reserve a number of huge pages" method for memory scalability tests on SAS MBAW not only gives accurate results but can also be controlled automatically by scripts without rebooting the system. The command "echo n > /proc/sys/vm/nr_hugepages", where n = (total physical memory - effective system memory) * 1000 / 2, is used to reserve the huge pages. This keeps that amount of memory out of use as long as the workload does not use huge pages and transparent huge pages are disabled.
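The formula for n is easy to script. The sketch below assumes memory sizes are given in GB and that the divisor of 2 reflects a 2 MB huge page size. Both are inferences from the formula and are not stated explicitly. It prints the resulting command instead of running it, and the function name is hypothetical.

```shell
#!/bin/sh
# Compute how many huge pages to reserve so that only the desired
# "effective" memory remains available for the test.

hugepages_to_reserve() {   # $1 = total physical memory (GB), $2 = effective memory (GB)
    # Formula from the text: n = (total - effective) * 1000 / 2
    # (assumes sizes in GB and 2 MB huge pages)
    echo $(( ($1 - $2) * 1000 / 2 ))
}

for effective in 256 512 1024; do
    n=$(hugepages_to_reserve 1024 "$effective")
    echo "effective ${effective} GB: echo $n > /proc/sys/vm/nr_hugepages"
done
```

After writing the value as root, the HugePages_Total line in /proc/meminfo shows how many pages the kernel was actually able to reserve.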
CONCLUSION
This paper presented the results of running SAS 9.3 foundational applications on 4- and 8-socket Intel Xeon processor E7 family platforms, covering performance and memory scalability, storage system tuning tips, and Red Hat* Enterprise Linux* 6.1 OS tuning tips.
It demonstrates that:
1) SAS 9.3 foundational application performance scales well with added processor count on the Intel Xeon processor E7 family, provided servers are well balanced and configured.
2) Proper storage and operating system settings can significantly improve I/O and unlock scalability; storage tuning work resulted in 3.0x gains, and OS configurations yielded 1.5x gains.
3) Adding memory can benefit analytics performance by further reducing I/O. In our study, we saw performance improve 2.1x by increasing memory from 128 GB to 1 TB.
4) Customers can simply and accurately measure memory scalability in SAS applications using the "reserved huge page" method.
ACKNOWLEDGMENTS
I acknowledge the support from SAS, in particular Tom Keefer, for his instruction and guidance on the SAS MBAW. I appreciate the help of Red Hat, in particular Barry J. Marson, who provided suggestions and guidance on storage and OS tuning. I thank IBM, in particular John R. Encizo, for supporting us on the platform configuration. I recognize the team members at Intel, in particular Scott M. Seelig and Roger Herrick Jr, for the technical review of this paper. I also want to thank Debra King, Mark Matusiefsky and Carl Strickland from Intel for their support.
REFERENCE
[1] "Filesystem Performance Characterization for Red Hat Enterprise Linux + KVM", Barry Marson, Principal Performance Engineer, Red Hat Inc.; D. John Shakshober (Shak), Director Red Hat Performance Engineering, Red Hat Inc., http://www.redhat.com/summit/2011/presentations/summit/in_the_weeds/wednesday/shak_barry_w_0530_fileperf_summit2011.pdf
[2] "SAS on Red Hat Enterprise Linux 6 Tuning Guidelines", http://support.sas.com/resources/papers/proceedings11/72480_RHEL6_Tuning_Tips.pdf
[3] "Red Hat Enterprise Linux Performance Tuning Guide", http://www.eslim.co.kr/pds/pds/2/5/RHEL_Tuning_Guide.pdf
[4] "Tuned and ktune", http://www.linuxtopia.org/online_books/rhel6/rhel_6_power_management/rhel_6_power_management_Tuned.html
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Ying-ping (Marie) Zhang
Intel Corporation
MS CH7-401
5000 W. Chandler Blvd.
Chandler, AZ 85226
480-554-9145 (O)
Email: [email protected]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
NOTICES INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license.
Intel, Xeon, and the Intel logo are trademarks of Intel Corporation in the US and/or other countries. Copyright © 2012 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2®, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804