TECHNICAL BRIEF:
Running Highly Available, High Performance Databases in a SAN-Free Environment

Who should read this paper

Architects, application owners, and database owners who are designing mission-critical systems for performance without compromising on flexibility or data availability

Content

Synopsis
Introduction
Symantec Cluster File System High Availability
Intel® SSD Overview
Traditional Tier-1 Applications
High Performance
High Availability
Alternate Configurations
Key Performance Metrics
Summary
Appendix

Synopsis¹

This paper demonstrates how Symantec™ Cluster File System, with the Intel® Solid-State Drive DC S3700 Series, provides:

• 4x performance at ~80% reduction in the cost of SAN
• 90%+ of Oracle® Log Writer transactions completing in under 1 ms
• Full availability and redundancy of internal solid-state drives

Introduction

Traditional storage media in the high-capacity, high-performance data center has reached an inflection point. While the amount of shipped capacity continues to grow like clockwork, some claim that the SAN (Storage Area Network) is dead. So what's the real truth? As is usually the case, it lies somewhere in between. SAN arrays as we know them are being turned into a capacity tier by the increased adoption and performance advantage of solid-state storage. Looking at the key components of a Storage Area Network (server, network, and storage), there are ever-increasing improvements in performance and throughput. Servers are gaining more and faster processors, bigger and better RAM, and faster internals. Networks are increasing in speed and capacity with 40GbE and InfiniBand, while technologies like RDMA (Remote Direct Memory Access) reduce compute requirements on those bigger and better servers. The only laggard is the I/O latency of the SAN, with its dependence on the spinning hard drive. Latencies and seek times are stagnant, and the techniques used to squeeze additional IOPS out of "tier 1" arrays are producing diminishing returns, opening the door for direct-attached flash storage to meet those performance needs. Regardless of where you put flash to meet your service and performance level agreements, software to manage and optimize this new storage is essential. This paper outlines how Symantec™ Storage Foundation, combined with Intel® SSDs, expands the capabilities of in-server flash to provide higher performance and higher availability without the need for traditional SAN storage.

Symantec Cluster File System High Availability

Symantec™ Cluster File System High Availability (CFSHA) provides a distributed file system across 2 to 64 nodes to allow concurrent reads and writes from shared pools of storage. CFS utilizes proprietary techniques to provide high performance while maintaining data integrity. When combined with high availability products from Symantec, CFSHA also enables fast failover of mission-critical applications. Typical use cases of CFS include Oracle® RAC (Real Application Clusters), Clustered NFS (Network File System), business intelligence, and custom financial applications that require a high number of concurrent reads and writes. With the 6.1 release, Symantec™ Cluster File System (CFS) introduces the Flexible Storage Sharing (FSS) feature, which lets customers take advantage of the performance benefits of server-side SSDs while removing the barriers of server silos of direct-attached storage. FSS allows CFS customers to "share" internal drives across multiple nodes in a cluster to provide flexibility, enable availability, and reduce the CAPEX and OPEX of traditional shared-storage clustering.

Intel® SSD Overview

The Intel® SSD DC S3700 Series delivers high, consistent random read and write performance for demanding data center workloads. The DC S3700 Series can enable quick and consistent application response times. All this performance is delivered with lower active power consumption.

¹ Refer to appendix for full details. Results from Symantec testing with the following configuration - Primary Server: Intel® Xeon® processor E7-4870, 2.40GHz, 4 CPUs, 10 cores, 512GB memory, Intel® SSD DC S3700 800GB & 200GB, Mellanox® CX353-A, Red Hat® Enterprise Linux 6.3, Oracle® 11gR2 Enterprise, Symantec™ Cluster File System HA 6.1

The DC S3700 Series can deliver end-to-end data protection, which means your data is protected from the time it enters the drive to the time it leaves. The DC S3700 Series uses advanced error correction that enables data integrity by protecting against possible data corruption in the NAND, SRAM, and DRAM memory of the drive. It also helps protect data in transit through several techniques, including parity checks, cyclic redundancy checks (CRC), LBA tag validation, and XOR data protection, a mechanism of parity striping across the NAND devices. To further improve data protection, the DC S3700 Series uses an array of surplus NAND that caches data to help minimize potential data loss while keeping user operations at their most consistent level. Finally, for write-heavy applications, the DC S3700 Series provides industry-leading endurance with the ability to support 10 drive writes per day². Ten drive writes per day equals 14 Petabytes of data written to a single drive over 5 years.
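The endurance claim above is simple arithmetic and can be checked from the stated figures. A quick back-of-the-envelope sketch, assuming the 800 GB model and a 365-day year:

```python
# Verifying the endurance figure: 10 drive writes per day on an
# 800 GB drive, sustained for 5 years (assumed 365-day years).
DRIVE_CAPACITY_TB = 0.8      # 800 GB drive
DRIVE_WRITES_PER_DAY = 10
YEARS = 5
DAYS_PER_YEAR = 365

total_written_pb = (DRIVE_CAPACITY_TB * DRIVE_WRITES_PER_DAY
                    * DAYS_PER_YEAR * YEARS) / 1000.0

# 0.8 TB x 10 x 365 x 5 = 14,600 TB, i.e. ~14.6 PB, consistent
# with the ~14 PB figure quoted above.
print(f"{total_written_pb:.1f} PB written over {YEARS} years")
```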

Traditional Tier-1 Applications

Outlined in Figure 1 is a traditional Tier-1 application set-up in today's enterprise data center. At the top is a multi-host configuration, typically between 2 and 8 nodes. To enable application availability, various clustering technologies allow the applications, data, and stateful information to move from server to server during a disaster. Those nodes are interconnected, as well as connected to shared storage, through a series of switches, likely 4-8Gb Fibre Channel. The base of the application is the "Tier 1" array (EMC® VMAX/IBM® DS8000, etc.). This array provides storage to the application while also providing data redundancy and availability through RAID and replication.

Figure 1: OLTP on Tier 1 SAN

This Tier-1 application, whether it's developed internally or purchased from a third-party software vendor, typically requires two things:

• High Performance
• High Availability

The same applications, whether CRM, business intelligence, or manufacturing, typically rely on back-end databases that serve a mix of randomized reads and writes. This workload type is commonly referred to as Online Transaction Processing (OLTP). Maximizing the performance of an OLTP back-end can mean more productivity for your business and, potentially, greater revenue. Without high availability, however, performance numbers go out the window. If my sales order application is unavailable and I've missed opportunities due to unplanned downtime, the performance of the back-end storage is meaningless. Symantec Cluster File System and Intel can provide the best of both through a flexible, fast, and redundant architecture that can drive not only OLTP workloads, but any mission-critical applications that drive your business forward.

² Ten drive writes per day tested using the JEDEC JESD219 workload - http://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3700-spec.html

High Performance

For purposes of this document, we are using a single array (VMAX 20k) for the traditional SAN set-up. Additional techniques exist to squeeze more IOPS out of traditional HDD arrays: wide-striping and over-allocation have been shown to increase performance for IOP-sensitive workloads. Even with advanced techniques, however, IOPS for these arrays will be in the tens of thousands, with a much higher cost in CAPEX and OPEX due to the increased complexity. More arrays also add incremental costs for floor space, power, and cooling. The TCO equation becomes clear when running a TPC-like benchmark. Our "Tier-1" SAN configuration, utilizing Fibre Channel and HDD, generated a maximum of 81,800 transactions per minute at a prorated cost of $14.71/TPM. Looking at pure performance, the Intel® SSD DC S3700 Series drive can provide much higher IOPS and lower latency than possible with spinning disk. For mission-critical applications, however, there are barriers to fully utilizing this form of flash, including:

• Physical Barriers – Direct-attached storage, either spinning disk or SSD, lives in a single node. Any failure in that server removes access to that storage.
• Data Availability – Array-based tools commonly used for data availability don't apply to non-shared storage.

Flexible Storage Sharing within CFS, as seen in Figure 2, allows customers to remove these barriers and take advantage of high-performance, in-server Intel® SSDs. By allowing "shared nothing" clustering, CFS enables redundancy through mirroring, higher-performance striped volumes, and replication capabilities in local or global configurations, all without traditional SAN storage.

Figure 2: Cluster File System + Intel SSDs


Utilizing performance improvements in the network through a high-speed, RDMA-enabled InfiniBand interconnect, the resulting configuration provides 337,000 transactions per minute at a cost of $3.24/TPM. This architecture delivers a 4x performance gain at 22% of the cost of our SAN³. A faster scale-out storage architecture at lower cost? Don't be concerned: CFS 6.1 provides data redundancy as well as application availability without the need for traditional SAN. With CFS you can mirror REDO log files for redundancy, and mirror and stripe data files for redundancy and performance, while utilizing techniques within CFS to avoid "split-brain" or race conditions in a clustered set-up.
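The 4x and 22% figures follow directly from the two benchmark results quoted in this paper. A quick sketch using the paper's own reported numbers:

```python
# Deriving the "4x performance at ~22% of the cost" claim from the
# two reported benchmark results.
san_tpm, san_cost_per_tpm = 81_800, 14.71    # Tier-1 SAN configuration
fss_tpm, fss_cost_per_tpm = 337_000, 3.24    # CFS + Intel SSD (FSS, mirrored)

speedup = fss_tpm / san_tpm                        # ~4.1x
cost_ratio = fss_cost_per_tpm / san_cost_per_tpm   # ~0.22

print(f"speedup: {speedup:.1f}x, cost: {cost_ratio:.0%} of SAN")
# prints "speedup: 4.1x, cost: 22% of SAN"
```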

High Availability

The performance seen above is a dramatic improvement over a typical set-up, achieved while utilizing commodity equipment at a lower cost. With continued advances in solid-state drive technology, in both capacity and performance, the gap between SSD and spinning disk is expected to widen. For the enterprise to take full advantage of these capabilities, however, software is required to unlock the hardware by providing shared access and data redundancy. Direct-Attached Storage (DAS) needs exceptional high availability software to show its potential. The Flexible Storage Sharing technology within CFS allows server administrators to export disks (local, DAS, or SAN) to the members of the cluster. Once exported, any node can read and write from an exported drive just as if it were local or SAN storage. This technique, known as I/O Shipping, makes DAS-only configurations capable of running the critical applications in the data center. Once a node has "seen" an exported disk, Volume Manager (VxVM) can utilize it in any number of layouts, depending on a customer's SLA and best practices, including mirrored, striped, or concatenated. For this exercise, we are looking to provide a fully redundant system, so both the redo log and data volumes are set up as synchronous mirrors. Mirroring the data across our nodes, and across Intel® SSDs, helps ensure that in the case of a hardware failure, either a drive or a server, the application data is consistent and available. Should a server be rebooted or suffer a power failure, any pieces of the mirror, or plexes, that reside on the downed server will break off and become inactive. Once that server is powered on and part of the cluster again, those plexes will resync through VxVM technologies like Fast Mirror Resync. By implementing CFS, VxVM, and FSS in your mission-critical servers, the two key limitations of internal solid-state drives are removed. Server administrators can provide higher performance and high availability at a fraction of the cost of traditional SANs, with the control and flexibility to work around their application and business needs.

Alternate Configurations

That there are exceptions to every rule applies to the data center as much as to everyday life. In this case, some applications may require pure performance and have a higher tolerance for downtime. CFS provides the flexibility to move from one configuration to another strictly through software; no hardware re-architecture is required. To illustrate the pure performance characteristics of Intel® SSDs, we can take the same hardware shown in Figure 2 above, but rather than mirroring the devices, we will stripe across them. Volume striping is a long-utilized technique for getting better performance out of HDDs: by "striping" the I/O across multiple devices, we reduce the seeking and spinning of those drives while utilizing more physical spindles. The same approach is beneficial in the solid-state world, but instead of gaining performance by reducing the physical demands on HDDs, striping across SSDs takes advantage of the dedicated controllers while reducing I/O contention in the SSD write process. SSDs are engineered for significant parallelism, and CFS knows how to use that parallelism.

³ See Appendix for full cost breakdown
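To make the striping idea concrete, here is an illustrative sketch (not Symantec code) of how a striped volume maps sequential offsets round-robin across devices. The 64 KB stripe unit and four-device count are hypothetical parameters chosen for the example, not CFS defaults:

```python
# Toy model of volume striping: sequential stripe-unit-sized chunks
# are distributed round-robin across the devices, so concurrent I/Os
# land on different SSD controllers.
STRIPE_UNIT_KB = 64   # hypothetical stripe unit
N_DEVICES = 4         # 4-way stripe, as in the configuration above

def device_for_offset(offset_kb: int) -> int:
    """Return the device index holding the given volume offset."""
    stripe_column = offset_kb // STRIPE_UNIT_KB
    return stripe_column % N_DEVICES

# First 512 KB of the volume, in 64 KB chunks:
layout = [device_for_offset(kb) for kb in range(0, 512, 64)]
print(layout)  # [0, 1, 2, 3, 0, 1, 2, 3]
```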

Figure 3: Configuration Flexibility

Figure 3 outlines the layout for the performance configuration. Through CFS, you are able to use the remote Intel® devices to create 4-way striped volumes, using two drives from each node. These volumes are then placed under the redo logs and database files for our OLTP workload. We want to provide an architecture that is faster and cheaper than current options. If we need to bring a level of disaster recovery to our data, we can utilize the Symantec Replicator Option or snapshot techniques to move data to our disaster recovery sites. For pure performance, however, moving from mirrored to striped volumes increases our benchmark up to 577,000 transactions per minute.⁴

This is a 237,907 TPM increase (71%) over our highly available configuration and 495,177 TPM (roughly 7x) more than our SAN configuration! As a Database Administrator or Server Administrator, how much time and effort would it normally take to get a performance increase of that size for an application? Symantec enables Direct-Attached Storage to act like SAN, giving ultimate flexibility while maintaining data and application availability.

⁴ Refer to appendix for full cost breakdown

Key Performance Metrics

Mirror vs. Local

To maintain data redundancy, CFS mirrors both the redo and the data volumes across the "local" storage. Whenever a data mirror is involved, a typical question is the performance "penalty" of handling simultaneous writes across multiple disks: when mirroring data, the write is only as fast as the slowest disk. The flexibility of CFS allows us to quickly determine the performance differences, in both transactions and I/O, between local and mirrored volumes.

Figure 4: DB Transactions

Figure 4 outlines the difference in transactions per minute when mirroring a local device over RDMA. The OLTP workload used in this case study was 70% reads and 30% writes, so it is critical to look at how the mirrored volume handles writes compared to stand-alone. When using FSS, Symantec Cluster File System will automatically mirror local and remote devices to enable data redundancy. Figure 5 shows how the Oracle data volume processed writes across SAN, local, and mirrored SSD.

Figure 5: Data Volume Performance


Looking at the output, two things stand out:

• Both internal configurations handle significantly more write volume than SAN.
• In our two-way mirror we see a minor degradation, which is to be expected, as we are now writing the same data twice.
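The "write is only as fast as the slowest disk" behavior can be modeled in a few lines. This is a toy latency model under assumed per-device numbers, not a benchmark of CFS:

```python
# Toy model: a synchronous mirrored write completes only when every
# plex has acknowledged, so its latency is the max of the per-device
# latencies; a local (unmirrored) write sees only one device.
mirror_legs_ms = [0.4, 0.7]   # hypothetical per-SSD write latencies

def mirrored_write_latency(leg_latencies_ms):
    """Synchronous mirror: the slowest leg gates the write."""
    return max(leg_latencies_ms)

def local_write_latency(leg_latencies_ms):
    """Single local device: only the first leg is written."""
    return leg_latencies_ms[0]

penalty = mirrored_write_latency(mirror_legs_ms) - local_write_latency(mirror_legs_ms)
print(f"mirror penalty: {penalty:.1f} ms per write")  # mirror penalty: 0.3 ms per write
```

This is the minor write degradation visible in the two-way mirror above: reads are unaffected, but each write pays for its slowest leg.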

Log File I/O

Figure 6: Log Performance

Log files are a critical component of any OLTP workload: they preserve the consistency of the data and allow the system to be recovered in the case of a major outage. Log file writing and switching, however, are a key bottleneck in how many transactions can ultimately be processed by the database. Figure 6 shows how our mirrored SSD configuration for the redo log directory compares to the SAN configuration. Mirrored local solid-state drives show more than twice the performance of the SAN devices, with much more consistency. The improved performance of synchronizing REDO logs may also allow smaller log file sizes, which in turn can reduce recovery times. The improved write performance can also be seen when investigating the wait times within the Automatic Workload Repository (AWR) output for our Oracle instance.



Figure 7: Log Writer I/O Histogram

When looking at the latencies for the main log and DB writer processes, we can see drastic improvements in operation completion times using the histograms within the AWR output. Comparing the log writer I/O for local SSD and our SAN configuration, close to 90% of our log writer I/O completes in less than 1 millisecond, compared to 80% on SAN. The latency differences are even more pronounced in the DB writer I/O histogram: with our internal SSDs we complete 63% of I/Os in under 1 ms and 90% in under 2 ms, while our SAN environment completes only 26% in under 2 ms.

Figure 8: DB Writer I/O Histogram
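The cumulative "% under N ms" figures quoted from the AWR histograms can be reproduced from per-bucket I/O counts. The sketch below uses hypothetical counts chosen to match the internal-SSD DB writer figures (63% under 1 ms, 90% under 2 ms); it is not the paper's raw data:

```python
# Summarizing an AWR-style I/O latency histogram into cumulative
# "% of I/Os completing under N ms" figures.
buckets_ms = [1, 2, 4, 8]              # bucket upper bounds (ms)
io_counts  = [6300, 2700, 800, 200]    # hypothetical I/Os per bucket

def cumulative_pct(bounds, counts):
    """Map each bucket bound to the cumulative % of I/Os under it."""
    total = sum(counts)
    running, out = 0, {}
    for bound, count in zip(bounds, counts):
        running += count
        out[bound] = 100.0 * running / total
    return out

pct = cumulative_pct(buckets_ms, io_counts)
print(f"{pct[1]:.0f}% under 1 ms, {pct[2]:.0f}% under 2 ms")
# prints "63% under 1 ms, 90% under 2 ms"
```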

OLTP performance, and the number of transactions any system is able to produce, is a direct reflection of throughput and latency. In all respects and measurements, Symantec Cluster File System with the Intel® SSD DC S3700 Series provides drastic increases in both reads and writes for multiple components of a database workload. These improvements are the keys to driving transactions per minute to four times the rate of a SAN configuration.⁵

⁵ Results from Symantec testing with the configuration shown in Appendix Table 2


Summary

In-server solid-state drive technologies are enabling a dramatic shift in the way we look at the performance characteristics of applications, compared with what we are able to pull out of existing SAN technologies and techniques. Couple this with the increasing capabilities of the servers themselves, and a new set of opportunities is opening up for enterprise architects. To date, the opposing requirements of direct-attached physical storage and data availability have been barriers to unlocking the true potential of in-server SSDs. SAN storage is not dead, but it is not flourishing either, and it generally does not let flash and SSDs reach their full potential; SSDs should be as close to the processor as possible. SAN storage is transitioning into a capacity tier for long-term storage, regulatory compliance, and a repository for "Big Data" analytics. Data centers looking to take full advantage of the performance of SSD and flash to drive productivity and revenue, while reducing costs, need to look at new architectures. Symantec™ Cluster File System and Intel® SSDs allow the enterprise to deliver an alternative to SAN environments that provides dramatically better performance at a fraction of the cost, without sacrificing the high availability and disaster recovery capabilities required in a critical application and data center. For more information on this and other Symantec and Intel solutions for the data center, please see: http://go.symantec.com/storagefoundation


Appendix

Cost Analysis⁶

Table 1 - Overall Cost Analysis

Table 2 - Hardware Components Cost Breakdown

Table 3 - EMC VMAX Component Breakdown

⁶ Pro-rated cost based on overall capacity used for this exercise


Table 4 - Oracle Components Cost Breakdown

Table 5 - Symantec Component Breakdown



About Symantec

Symantec protects the world's information, and is a global leader in security, backup, and availability solutions. Our innovative products and services protect people and information in any environment – from the smallest mobile device, to the enterprise data center, to cloud-based systems. Our world-renowned expertise in protecting data, identities, and interactions gives our customers confidence in a connected world. More information is available at www.symantec.com or by connecting with Symantec at go.symantec.com/socialmedia.

For specific country offices and contact numbers, please visit our website.

Symantec World Headquarters
350 Ellis St.
Mountain View, CA 94043 USA
+1 (650) 527 8000
1 (800) 721 3934
www.symantec.com

Copyright © 2014 Symantec Corporation. All rights reserved. Symantec, the Symantec Logo, and the Checkmark Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. 1/2014 21327636