White Paper
CONTINUOUS DATA PROTECTION FOR MICROSOFT SQL SERVER 2008 R2 ENABLED BY EMC RECOVERPOINT, EMC REPLICATION MANAGER, AND VMWARE A Detailed Review
EMC SOLUTIONS GROUP

Abstract

This white paper describes the testing and validation of the multisite disaster recovery capabilities of VMware® vCenter™ Site Recovery Manager (vCenter SRM) with EMC® RecoverPoint and EMC Replication Manager using continuous data protection (CDP) and continuous remote replication (CRR) in a Microsoft SQL Server 2008 R2 environment.

August 2011
Copyright © 2011 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. VMware, VMware ESX, VMware vCenter, VMware vCenter Site Recovery Manager, and VMware vSphere are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other trademarks used herein are the property of their respective owners. Part Number: H8221.1
Continuous Data Protection for Microsoft SQL Server 2008 R2 Enabled by EMC RecoverPoint, EMC Replication Manager, and VMware
Table of contents

Executive summary
    Business case
    Solution overview
    Key results

Introduction
    Overview
    Purpose
    Scope
    Audience
    Terminology

Technology overview
    Introduction
    EMC CLARiiON CX4-960
    EMC RecoverPoint
    RecoverPoint splitter
    EMC Replication Manager
    EMC Data Protection Advisor Replication Analysis
    VMware vCenter Site Recovery Manager
    VMware vSphere

Configuration
    Overview
    Physical environment
    Hardware resources
    Software resources
    Environment profile

Storage design
    Overview
    Planning for recovery
    Storage layout
    Storage configuration
    CLARiiON storage allocation
    Storage pools
    RAID groups
    OLTP/DW – I/O differences
    Database data and log storage devices
    Protection storage
    Consistency group configuration

VMware design
    Overview
    Virtual machine allocations
    Data integrity in VMware ESX
    Support

Application design
    Overview
    Microsoft SQL Server layout

Protection design
    Overview
    RecoverPoint
        Local replication process (CDP)
        Remote replication process (CRR)
        Adding CLARiiON splitters
        Consistency groups
        Groups Sets
        Consistency group policies
        Journal sizing - protection windows
        Integrating with VMware vCenter
        RecoverPoint failover
        Different subnet
    Replication Manager
        Integration with Microsoft SQL Server
        Application sets and jobs for Microsoft SQL Server
        Microsoft SQL Server application consistency
        Configuring Replication Manager to communicate with vCenter and RecoverPoint
        Logical Volume Manager resignaturing
        Discovering the RecoverPoint appliance and CLARiiON array
        Recovering a Microsoft SQL Server user database
        Mounting a Microsoft SQL Server database
    Data Protection Advisor for Replication Analysis
        Data collection and discovery wizards
        Data collection and CLARiiON arrays
        Configuring Data Protection Advisor for Microsoft SQL Server monitoring
        Display and report
        View customization
    VMware vCenter SRM
        Integrating vCenter SRM with RecoverPoint
        Configuring the consistency group for management by vCenter SRM
        vCenter SRM solution protection
        Configuring vCenter SRM protection groups
        Modifying the startup priority of a virtual machine
        Customizing recovery site IP addresses
        Configuring vCenter SRM recovery plans
        Configuring vCenter SRM failover with RecoverPoint CLR

Testing and validation
    Test objectives
    Notes
    Testing methodology
    Test scenarios
    Test procedures

Test results
    Overview
    Baseline
    RecoverPoint compression
        TempDB databases considerations
    Local replication (CDP)
    Remote replication (CRR)
    Concurrent local and remote data protection (CLR)
    vCenter SRM
    Comparison
    Summary of test results

Conclusion
    Summary
    Findings

References
    White papers
    Product documentation
    Other documentation
Executive summary

Business case

Microsoft SQL Server often forms the foundation for today's most demanding, enterprise-level, transaction-based companies, with its rich feature set and ability to store data from structured, semi-structured, and unstructured documents. Data is one of the most valuable resources a company can own. With the explosion in both the volume of data being retained and its rate of change, administrators are challenged with ensuring constant access and protection against corruption or loss caused by hardware, environmental, or man-made disasters. The introduction of virtualization and the move to the cloud further complicate this challenge.

Online transaction processing (OLTP) systems running on a SQL Server platform are among the most common data processing systems in today's businesses. The availability requirements of OLTP systems are very demanding. Downtime can mean the failure of critical business processes, effectively halting business operations and leading to lost revenue, regulatory fines, and potentially lost customers. The loss of even a few megabytes of data in recovery can amount to hundreds or thousands of lost transactions, affecting the business in terms of financial revenue, operational effectiveness, and legal obligation.

With databases being so important to application data integrity, database administrators (DBAs) are concerned about SQL Server support in virtualized environments. SQL Server database administrators want to design and deploy a SQL-based OLTP infrastructure that:
• Reduces the cost of storing vast amounts of data
• Provides redundancy and high availability throughout the entire system
• Reduces I/O and locking contention for better application performance
• Ensures 24-hour, 7-day access to critical business data
• Achieves enterprise-level performance for transactional latency and user concurrency (the key success criteria for OLTP database systems)

Solution overview

The solution described in this white paper is a simple and cost-effective way to meet these challenges. It lets you implement local and remote replication without interrupting production environments, and recover data to practically any point in time.

This white paper documents the use of EMC® RecoverPoint, EMC Replication Manager, and VMware® vCenter™ Site Recovery Manager (vCenter SRM) as a combined protection package to provide business continuity and disaster recovery for an enterprise environment running Microsoft SQL Server 2008 R2. The solution tests and validates both local and remote protection scenarios for SQL Server running online transaction processing and data warehousing workloads. It demonstrates the ability to support both low recovery time objectives (RTOs) and low recovery point objectives (RPOs), and provides a digital video recorder (DVR)-like ability to roll back from any disaster.
SQL Server deployments in VMware environments are supported by Microsoft under the Server Virtualization Validation Program (SVVP).

Key results

This solution validates the following scenarios:
• Discover and document design considerations when RecoverPoint, Replication Manager, VMware vSphere™ 4.1, vCenter SRM, EMC Data Protection Advisor Replication Analysis (DPA/RA), and Microsoft SQL Server 2008 R2 are integrated into a single solution.
• Detail the manual failover process using RecoverPoint for remote failover to different subnets.
• Successfully orchestrate a site failover using vCenter SRM and RecoverPoint replication technology.
• Use RecoverPoint to provide crash consistency for local and remote sites to any point in time. In this solution, EMC recovered a database with an RPO of 1 second and an RTO of less than 4 minutes.
• Successfully create application-consistent copies of SQL Server using RecoverPoint concurrent local and remote protection (CLR) and Replication Manager.
Introduction

Overview

The purpose of this white paper is to validate the integration of EMC and VMware technologies by providing industry-leading business continuity and disaster recovery for critical enterprise business databases, such as SQL Server databases. The white paper validates the integration of vCenter SRM in the solution in order to provide rapid site recovery over extended distances using EMC RecoverPoint’s replication technology.

Replication Manager clearly illustrates its value in simplifying and automating application consistency and near-instant recovery of a virtualized Microsoft SQL Server 2008 R2 environment on an EMC mid-range storage array. In this use case, the EMC CLARiiON® CX4-960 storage array was used. This solution would also work with an EMC VNX5700™ or VNX7500™ array.

In this solution, EMC characterizes and validates an enterprise SQL Server environment with multiple SQL Server virtual machines with varying workloads, running on top of VMware vSphere 4.1. EMC validated Replication Manager as a key provider for both local and remote automated SQL Server (recoverable) snapshots and crash-consistent (restartable) bookmarks. This white paper also details the procedures involved in both local and remote restart for SQL Server.

For local high availability, EMC used a two-node VMware ESX® cluster in a virtual configuration using virtual machine file system (VMFS) volumes. For site resiliency, a mirror copy of the ESX cluster with two nodes was installed on the disaster recovery site. For IP configurations, both same-subnet and different-subnet topologies were characterized for site-to-site networking, showing that the solution can adapt to either situation.
Purpose

The purposes of this white paper are to:
• Validate EMC technology in providing rapid recovery and integration points into VMware vSphere 4.1 infrastructures
• Validate SQL Server manual and automated recovery (restartable and recoverable)
• Validate SQL Server automated business continuity using RecoverPoint with vCenter SRM

Scope

The scope of this white paper includes:
• Validating both local and remote Replication Manager automated SQL Server snapshots and crash-consistent bookmarks
• Measuring scalability and functionality of RecoverPoint’s replication technology
• Describing the process for both local and remote recovery (involving Replication Manager, RecoverPoint, and SQL Server):
    • Documenting manual virtual machine recovery process using RecoverPoint
    • Validating vCenter SRM integration with RecoverPoint
    • Documenting the process of automated virtual machine recovery using vCenter SRM integration with RecoverPoint
    • Demonstrating DPA/RA functionality and value by monitoring and reporting on the health of the environment
Audience
This white paper is intended for EMC employees, partners, and customers, including IT planners, storage architects, SQL Server database administrators, and EMC field personnel who are tasked with deploying such a solution in a customer environment. It is assumed that the reader is familiar with the various components of the solution.
Terminology
This white paper includes the terminology defined in Table 1.

Table 1. Terminology

| Term | Definition |
|---|---|
| Array-based write splitter | RecoverPoint supports several types of write splitters: array-based, intelligent fabric-based, and host-based. For this solution, EMC used an array-based write splitter that runs inside the storage processors on the CLARiiON CX4 array. In this case, the splitter function is carried out by the storage processor; a KDriver is not installed on the host. The array-based write splitter is supported on VNX series and EMC Symmetrix VMAXe™ series arrays and with CLARiiON CX3 or CX4 series arrays running FLARE 26, 28, 29, and 30. |
| Image access | Image access refers to providing a target-side host application the opportunity to write data to the target-side replication volumes, while still keeping track of source changes. Image access can be physical (also known as logged), which provides access to the actual physical volumes, or virtual, with rapid access to a virtual image of the same volumes. |
| Repository volume | A repository volume is defined on the SAN-attached storage at each site for each RecoverPoint cluster. The repository volume serves all RecoverPoint Appliances (RPAs) of the particular cluster and the splitters associated with that cluster. It stores configuration information about the RPAs and consistency groups; this enables a properly functioning RPA to seamlessly assume the replication activities of a failing RPA from the same cluster. Additional copies of the repository volume are stored on the local hard disks of the first two RPAs. This means that if the SAN-attached repository volume is unavailable, there will not be any impact on the RecoverPoint system, and it will continue to replicate normally. |
| Virtual machine file system (VMFS) | VMFS is a clustered file system used to store virtual machine disk images, including snapshots. Multiple servers can read/write the same file system simultaneously, while individual virtual machine files are locked. VMFS volumes can be logically "grown" (non-destructively increased in size) by spanning multiple VMFS volumes together. |
| vCPU | A virtual CPU (vCPU) is a processor within a virtual machine. |
Technology overview

Introduction

This section provides an overview of the primary technologies used in this solution:
• EMC CLARiiON CX4-960
• EMC RecoverPoint
• EMC Replication Manager
• EMC Data Protection Advisor
• VMware vCenter Site Recovery Manager
• VMware vSphere
EMC CLARiiON CX4-960
EMC CLARiiON CX4-960 is a high-end, enterprise storage array comprised of a system bay that includes storage processor enclosures, storage processors, disk array enclosures, and separate storage bays that can scale up to 960 disk drives. CX4-960 arrays support multiple drive technologies, including Flash, Fibre Channel (FC), and SATA drives, and the full range of RAID types.
EMC RecoverPoint
EMC RecoverPoint provides a cost-effective way to protect data at local and remote sites, with a centralized management interface that allows the recovery of data to practically any point in time. EMC RecoverPoint/SE, used in this solution, is the entry-level offering that allows replication and continuous data protection for the VNX series and the CLARiiON CX3 and CX4 arrays.
• RecoverPoint continuous data protection (CDP) provides block-level local replication between LUNs in the same SAN using technology that journals every write, enabling recovery to any point in time.
• RecoverPoint continuous remote replication (CRR) provides block-level remote replication between LUNs in two different SANs using technology that journals groups of writes for recovery to a significant point in time.
• RecoverPoint concurrent local and remote replication (CLR) provides simultaneous block-level local and remote replication for LUNs, with one copy residing locally on the same SAN with every write journaled, while a second copy resides on the remote SAN with significant groups of writes journaled. Recovery of the local or remote copy is possible without affecting the other copy.
RecoverPoint’s ability to map RTOs and RPOs to policies assigned to consistency groups allows a granular and flexible approach to assigning different levels of priority to the data being replicated. RecoverPoint provides a formidable solution to protect the environment.
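The difference between journaling every write (CDP) and journaling groups of writes (CRR) can be illustrated with a toy model. The sketch below is not RecoverPoint code or its API; the `Journal` class, the `group_size` parameter, and the `image_at` method are hypothetical names introduced only for illustration, and the model ignores journal capacity, compression, and consistency groups.

```python
from bisect import bisect_right


class Journal:
    """Toy write journal holding an ordered list of recovery points.

    group_size=1 records a recovery point for every write (CDP-like).
    A larger group_size records one recovery point per group of writes
    (CRR-like). This is an illustrative assumption, not product behavior.
    """

    def __init__(self, group_size=1):
        self.group_size = group_size
        self.points = []      # list of (timestamp, {block: data})
        self._pending = {}    # writes not yet closed into a recovery point
        self._count = 0

    def write(self, timestamp, block, data):
        """Record one application write; close a recovery point when the group is full."""
        self._pending[block] = data
        self._count += 1
        if self._count % self.group_size == 0:
            self.points.append((timestamp, dict(self._pending)))
            self._pending.clear()

    def image_at(self, timestamp):
        """Rebuild the volume image from all recovery points at or before timestamp."""
        times = [t for t, _ in self.points]
        upto = bisect_right(times, timestamp)
        image = {}
        for _, changes in self.points[:upto]:
            image.update(changes)
        return image


# Four writes to two blocks, one per second.
writes = [(1, "A", "a1"), (2, "B", "b1"), (3, "A", "a2"), (4, "B", "b2")]
cdp = Journal(group_size=1)   # every write is a recovery point
crr = Journal(group_size=2)   # every second write closes a recovery point
for t, block, data in writes:
    cdp.write(t, block, data)
    crr.write(t, block, data)

print(cdp.image_at(3))  # {'A': 'a2', 'B': 'b1'}
print(crr.image_at(3))  # {'A': 'a1', 'B': 'b1'}
```

In this model the CDP-like journal can return the image exactly as of the third write, while the CRR-like journal can only return the last completed group, which is what "a significant point in time" means in practice.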
RecoverPoint protects and supports replication of data that applications are writing to local SAN-attached storage. RecoverPoint uses existing FC infrastructure to integrate seamlessly with existing host applications and data storage subsystems. For long distances, RecoverPoint uses either FC for metro area networks or IP for wide area networks (WAN) to send data.

RecoverPoint splitter

RecoverPoint uses lightweight splitting technology, on the application server, in the fabric, or in the array, to mirror application writes to the RecoverPoint cluster. CLARiiON arrays have integrated RecoverPoint splitters (array-based write splitters) that operate in each storage processor and ensure that the RecoverPoint Appliance (RPA) receives a copy of each write.

The array-based splitter is the most effective write splitter for VMware replication because it enables replication of VMFS and raw device mapping physical (RDM/P) volumes without the cost or complexity of additional hardware. The splitter supports both FC and iSCSI volumes presented by the CLARiiON arrays to any host, including an ESX Server. For this solution, EMC used an array-based write splitter that runs inside the storage processors on the CLARiiON CX4 array.

EMC Replication Manager
EMC Replication Manager manages EMC point-in-time replication technologies through a centralized management console. Replication Manager coordinates the entire data replication process, from discovery and configuration to the management of multiple application-consistent, disk-based replicas. You can auto-discover your replication environment and streamline management by scheduling, recording, and cataloguing replica information, including auto-expiration. With Replication Manager, you can put the right data in the right place at the right time, on demand or based on schedules and policies that you define. This application-centric product allows you to simplify replica management with application consistency.
EMC Data Protection Advisor Replication Analysis
EMC DPA/RA provides the visibility necessary for your organization to understand all of the details in managing large and complex data protection environments. DPA/RA software automatically monitors, analyzes, and provides alerts on many aspects of a business’s IT infrastructure. In this solution, EMC used DPA/RA to analyze and monitor the replication environment.
VMware vCenter Site Recovery Manager
VMware vCenter SRM is a disaster recovery management and automation solution for VMware vSphere. vCenter SRM accelerates recovery by automating the recovery process and simplifies management of disaster recovery plans by making disaster recovery an integrated element of managing your VMware virtual infrastructure. vCenter SRM integrates tightly with VMware vSphere, VMware vCenter Server, and storage replication (in this solution, RecoverPoint) to make failover and recovery rapid, reliable, and manageable. vCenter SRM enables businesses to take the risk and worry out of disaster recovery, as well as expand protection to all of their important systems and applications.

VMware vSphere
VMware vSphere uses the power of virtualization to transform data centers into simplified cloud computing infrastructures, and enables IT organizations to deliver flexible and reliable IT services. vSphere virtualizes and aggregates the underlying physical hardware resources across multiple systems and provides pools of virtual resources to the data center. As a cloud operating system, vSphere manages large collections of infrastructure (such as CPUs, storage, and networking) as a seamless and dynamic operating environment, and also manages the complexity of a data center.
Configuration Overview
This white paper characterizes and validates a multisite disaster recovery solution for a fully virtualized Microsoft SQL Server 2008 R2 environment using RecoverPoint, Replication Manager, and DPA/RA. VMware provides the virtualization layer, and vCenter SRM provides automated site failover. The solution covers the provision of:
• Continuous data protection (CDP)
• Continuous remote replication (CRR)
• Concurrent local and remote replication (CLR)
Physical environment
Figure 1 shows the overall physical architecture of the environment.
Figure 1. Physical architecture of the solution environment
Hardware resources
Table 2 shows the hardware resources used in this solution.
Table 2. Hardware
Equipment | Quantity | Configuration
Storage platform | 2 | CLARiiON CX4-960
Fibre switch | 2 | 8 Gb, 48 port
Network switch | 3 | 1 Gb switch, 48 port
ESX host machines | 5 | 2 x 24 core/128 GB memory (Production – 7 virtual machines); 2 x 16 core/64 GB memory (Disaster recovery – 7 virtual machines); 1 x 16 core/64 GB memory (Load Servers – 4 virtual machines)
RecoverPoint appliances | 4 | Gen-4 RPAs
Distance simulator | 1 | Delay 0 ms, 1 ms, 4 ms, 8 ms, 16 ms, 32 ms, 64 ms
Software resources
Table 3 shows the software resources used in this solution.
Table 3. Software
Description | Quantity | Version | Purpose
Windows Server 2008 R2 Enterprise Edition | 13 | 2008 R2 x64 | Server OS
Microsoft SQL Server 2008 R2 Enterprise Edition | 4 | 2008 R2 | Database software
Microsoft SQL Server 2008 SP2 Standard Edition | 2 | 2008 SP2 | vCenter and vCenter SRM databases
VMware vSphere ESX | 4 | 4.1 GA B260247 | Hypervisor hosting all virtual machines
VMware vCenter | 2 | 4.1 GA B259021 | Management of VMware vSphere
VMware vCenter Site Recovery Manager | 2 | 4.1.1 | Managing failover and failback of virtual machines
EMC PowerPath/VE | 5 | 5.4 | Multipathing solution
EMC RecoverPoint/SE | 4 | 3.3 SP2 | EMC replication software, installed on each of the 4 RPAs
EMC Replication Manager | 1 | 5.3.2 | Replication management software
EMC Data Protection Advisor | 1 | 5.7.1 | Environment monitoring software
EMC CLARiiON FLARE | 2 | 30 | CLARiiON Operating Environment
EMC Virtual Storage Integrator | 2 | 4.0.1 | Storage and mapping
EMC RecoverPoint Adapter for VMware vCenter SRM | 2 | 1.0.3 | EMC software for integrating RecoverPoint and vCenter SRM
EMC Solutions Enabler | 8 | 7.2 | CLARiiON management software
EMC Admsnap | 4 | 2.3 | Command line executable program
EMC Navisphere CLI | 6 | 7.30.0.4.75 | CLI software to manage the CLARiiON storage array
Environment profile
This solution was validated with the environment profile listed in Table 4.
Table 4. Environment profile
Profile characteristic | Quantity/Type/Size
VMware ESX Server | 4
EMC RecoverPoint appliances | 4 (2 at each site)
Domain controllers | 2 virtual machines
Distance emulator | 1
Microsoft SQL Server 2008 R2 | 4 (2 x OLTP, 2 x data warehouse); 2 (mount virtual machines)
Microsoft SQL Server 2008 SP2 | 2 (VMware vCenter database)
EMC Replication Manager and DPA/RA | 1 virtual machine
VMware vCenter SRM | 2 virtual machines
OLTP Database 1 | 75,000 users/TPC-E-like/780 GB
OLTP Database 2 | 50,000 users/TPC-E-like/500 GB
Data warehouse Database 1 | TPC-H-like/2.5 TB
Data warehouse Database 2 | TPC-H-like/1 TB
Storage design
Overview
Virtualization improves the efficiency and availability of IT resources and applications. It allows a rapid response to change in the business environment and provides flexibility in meeting the challenges of the modern IT data center. However, to maintain flexibility and granularity of recovery in a virtualized infrastructure, it is critical that the backend layout and configuration of the storage is correct.
Planning for recovery
Plan for flexible and granular data recovery from the very start of designing your environment. The effort this takes is well worth the gain: a flexible and highly granular set of recovery options.
Storage layout
In this solution, EMC shows how the careful layout of the storage LUNs and their relationship to VMFS volumes and NTFS volumes provides a more granular level of recovery. As shown in Figure 2, each LUN containing Boot, OS Pagefile, TempDB, SQL System Databases, SQL Data, or SQL Log files is mapped to its own VMFS datastore. Each datastore is mapped one-to-one to a VMDK file, and each VMDK file is mapped to an individual Windows NTFS volume. This maintains a one-to-one relationship throughout the storage layer, up to the presentation of volumes to the Windows guest operating system. Related volumes are grouped together: Boot and OS Pagefile volumes are grouped because they have a natural relationship, and TempDB and SQL System Databases are grouped as the system volumes. SQL Data files and SQL Log files each have a separate LUN-VMFS-VMDK-NTFS chain. These are provided individually for each production database to achieve the level of granular recovery that is documented in this white paper.
Figure 2. VMware storage layout
Storage configuration
Table 5 shows the production storage configuration for the solution.
Table 5. Production array storage configurations
RAID type | Pool/RAID group | Disk configuration | Purpose
RAID 1/0 FC | FC Pool | 32 * FC 600 GB | OLTP database data files
RAID 6 SATA | SATA Pool | 16 * SATA-2 1 TB | Data Warehouse (DW) database data files
RAID 5 FC | FC RAID group | 5 * FC 600 GB | Virtual machine operating systems, Page files, and RecoverPoint Repository Volume
RAID 5 FC | FC RAID group | 5 * FC 600 GB | OLTP and DW SQL Server System (Master, Model, MSDB * 4)
RAID 1/0 FC | FC RAID group | 4 * FC 600 GB | OLTP database logs
RAID 1/0 FC | FC RAID group | 4 * FC 600 GB | DW database logs
RAID 1/0 FC | FC RAID group | 4 * FC 600 GB | OLTP and DW temp database and logs
RAID 5 FC | FC Pool | 30 * FC 600 GB | CDP Replicas and CRR RecoverPoint local journals
RAID 1/0 SATA | SATA RAID group | 12 * SATA-2 1 TB | CDP Replicas of DW database data files
RAID 1/0 FC | FC RAID group | 4 * FC 600 GB | CDP RecoverPoint journals
Table 6 shows the recovery site storage configuration for the solution.
Table 6. Recovery site storage configurations
RAID type | Pool/RAID group | Disk configuration | Purpose
RAID 1/0 FC | FC Pool | 32 * FC 600 GB | OLTP database data files
RAID 6 SATA | SATA Pool | 16 * SATA-2 1 TB | DW database data files
RAID 5 FC | FC RAID group | 5 * FC 600 GB | Virtual machine operating systems, Page files, and RecoverPoint Repository Volume
RAID 5 FC | FC RAID group | 5 * FC 600 GB | OLTP and DW SQL Server System (Master, Model, MSDB * 4)
RAID 1/0 FC | FC RAID group | 4 * FC 600 GB | OLTP and DW database logs
RAID 1/0 FC | FC RAID group | 4 * FC 600 GB | DW database logs
RAID 1/0 FC | FC RAID group | 4 * FC 600 GB | OLTP and DW temp database and logs
RAID 1/0 FC | FC Pool | 8 * FC 600 GB | CRR RecoverPoint remote journals
This separation allows you to service the application requirements. As the results show, a journal that is properly configured and sized for the level of data change being replicated is important for minimizing the impact on production during replication. The recovery site array configuration consists of replicas, journals, and repository volumes.
CLARiiON storage allocation
Table 7 details the storage LUNs provisioned for the solution from the production CLARiiON array. In addition to these production LUNs, volumes replicated with CLR require a target volume of the same size on both the local array and the remote array, along with CDP journals on the production array and CRR journals on both the production and remote arrays.
Table 7. Production storage LUNs
LUN name | RAID type | Pool/RAID group | User capacity (GB)
OLTP Database 1 data files | RAID1/0 | Pool 0 OLTP | 1,500
OLTP Database 2 data files | RAID1/0 | Pool 0 OLTP | 1,024
DW Database 1 data file 1 | RAID6 | Pool 1 OLAP | 2,000
DW Database 1 data file 2 | RAID6 | Pool 1 OLAP | 2,000
DW Database 2 data file 1 | RAID6 | Pool 1 OLAP | 750
DW Database 2 data file 2 | RAID6 | Pool 1 OLAP | 750
Domain controller virtual machine | RAID5 | RAID Group 0 | 45
DW Database 1 page file | RAID5 | RAID Group 0 | 60
DW Database 2 page file | RAID5 | RAID Group 0 | 60
DW Database 2 boot volume | RAID5 | RAID Group 0 | 100
OLTP Database 1 boot volume | RAID5 | RAID Group 0 | 100
OLTP Database 2 page file | RAID5 | RAID Group 0 | 60
OLTP Database 2 boot volume | RAID5 | RAID Group 0 | 100
RecoverPoint Repository Volume | RAID5 | RAID Group 0 | 3
Virtual center virtual machine | RAID5 | RAID Group 0 | 75
DW Database 1 boot volume | RAID5 | RAID Group 0 | 100
Replication Manager and DPA/RA virtual machine | RAID5 | RAID Group 0 | 65
OLTP Database 1 page file | RAID5 | RAID Group 0 | 60
OLTP Database 1 SQL Server system databases | RAID5 | RAID Group 1 | 20
DW Database 1 SQL Server system databases | RAID5 | RAID Group 1 | 20
OLTP Database 2 SQL Server system databases | RAID5 | RAID Group 1 | 20
DW Database 2 SQL Server system databases | RAID5 | RAID Group 1 | 20
DW Database 2 TEMP DB and logs | RAID1/0 | RAID Group 2 | 220
OLTP Database 2 TEMP DB and logs | RAID1/0 | RAID Group 2 | 220
DW Database 1 TEMP DB and logs | RAID1/0 | RAID Group 2 | 220
OLTP Database 1 TEMP DB and logs | RAID1/0 | RAID Group 2 | 220
OLTP Database 2 transaction logs | RAID1/0 | RAID Group 7 | 220
OLTP Database 1 transaction logs | RAID1/0 | RAID Group 7 | 220
DW Database 2 transaction logs | RAID1/0 | RAID Group 8 | 220
DW Database 1 transaction logs | RAID1/0 | RAID Group 8 | 500
Storage pools
EMC storage pools were selected for appropriate workloads to simplify the storage planning and design for those storage elements.
RAID groups
RAID groups were selected for specific I/O workloads that were extremely latency-sensitive and had a high proportion of writes. Deterministic performance for these workloads was achieved through spindle isolation, in preference to storage pools, which service many different I/O workloads.
OLTP/DW – I/O differences
The solution uses an OLTP database with an 80:20 read-to-write ratio. The data warehouse workload was predominantly read-heavy.
Database data and log storage devices
Database data file LUNs are isolated from database log file LUNs to enable granular recovery at the LUN level. Log LUNs and their disk spindles are isolated from database data file LUNs so that logging is not affected by operations such as database maintenance.
Protection storage
Table 8 shows the configuration of each protection storage element for RecoverPoint.
Table 8. Protection storage configuration
RecoverPoint storage type | Details
Target journals | CDP/CRR journals on RAID groups (high I/O rate)
Source journals | Source journals on pools (RAID 1/0 OLTP, RAID 6 DW) (lower I/O rate)
Target volumes | OLTP: RAID 1/0 – high incoming I/O rates; DW: RAID 6 – fewer writes, sequential rather than random I/O patterns
Consistency group configuration
Table 9 shows the journal sizing and the consistency grouping used for each virtual machine.
Table 9. RecoverPoint consistency groups journal configuration
Server | Consistency group | Consistency group volumes | CDP journal sizing | CRR local journal sizing | CRR remote journal sizing
OLTP Server1 (TPCE1) | OLTP DB CG | SQL DBs and Logs | 275 | 275 | 275
OLTP Server1 (TPCE1) | OLTP OS CG | OS and Page File | 10 | 10 | 10
OLTP Server1 (TPCE1) | OLTP TEMP-DB CG | Temp DB and System DBs | 10 | 10 | 10
OLTP Server2 (TPCE2) | OLTP DB CG | SQL DBs and Logs | 175 | 175 | 175
OLTP Server2 (TPCE2) | OLTP OS CG | OS and Page File | 10 | 10 | 10
OLTP Server2 (TPCE2) | OLTP TEMP-DB CG | Temp DB and System DBs | 10 | 10 | 10
DW Server1 (TPCH1) | OLAP DB CG | SQL DBs and Logs | 50 | 50 | 50
DW Server1 (TPCH1) | OLAP OS CG | OS and Page File | 10 | 10 | 10
DW Server1 (TPCH1) | OLAP TEMP-DB CG | Temp DB and System DBs | 30 | 30 | 30
DW Server2 (TPCH2) | OLAP DB CG | SQL DBs and Logs | 45 | 45 | 45
DW Server2 (TPCH2) | OLAP OS CG | OS and Page File | 10 | 10 | 10
DW Server2 (TPCH2) | OLAP TEMP-DB CG | Temp DB and System DBs | 10 | 10 | 10
VMware design
Overview
When deploying Microsoft SQL Server 2008 R2 in a virtual environment, you need to make several design decisions, such as specific database requirements in relation to CPU, memory, and storage in order to maintain or improve database performance. In this white paper, EMC provides guidance to help you virtualize SQL Server using VMware vSphere.
Virtual machine allocations
EMC deployed SQL Server virtual machines with the configuration settings shown in Table 10.
Table 10. Virtual machine configurations
Name | Role | Qty | vCPUs | Memory | Disks (VMDKs) | vSCSI Controller
TCE-SQL-TPCH1 | DW Server | 1 | 4 | 24 GB reserved | 60 GB OS | 0:0 (LSI Logic SAS)
 | | | | | 50 GB Page File | 0:1 (Paravirtual)
 | | | | | 500 GB Logs | 1:2 (Paravirtual)
 | | | | | 5 GB SQL System | 1:3 (Paravirtual)
 | | | | | 200 GB Temp DB & Logs | 1:4 (Paravirtual)
 | | | | | 1.75 TB OLAP DB1 | 2:5 (Paravirtual)
 | | | | | 1.75 TB OLAP DB2 | 2:6 (Paravirtual)
TCE-SQL-TPCE1 | OLTP Server – 75,000 user database | 1 | 4 | 32 GB reserved | 60 GB OS | 0:0 (LSI Logic SAS)
 | | | | | 50 GB Page File | 0:1 (Paravirtual)
 | | | | | 110 GB Logs | 1:2 (Paravirtual)
 | | | | | 5 GB SQL System | 1:3 (Paravirtual)
 | | | | | 200 GB Temp DB & Logs | 1:4 (Paravirtual)
 | | | | | 1.5 TB OLTP 75 K | 2:5 (Paravirtual)
TCE-SQL-TPCH2 | DW Server | 1 | 4 | 24 GB reserved | 60 GB OS | 0:0 (LSI Logic SAS)
 | | | | | 50 GB Page File | 0:1 (Paravirtual)
 | | | | | 200 GB Logs | 1:2 (Paravirtual)
 | | | | | 5 GB SQL System | 1:3 (Paravirtual)
 | | | | | 200 GB Temp DB & Logs | 1:4 (Paravirtual)
 | | | | | 700 GB OLAP DB1 | 2:5 (Paravirtual)
 | | | | | 700 GB OLAP DB2 | 2:6 (Paravirtual)
TCE-SQL-TPCE2 | OLTP Server – 50,000 user database | 1 | 4 | 32 GB reserved | 60 GB OS | 0:0 (LSI Logic SAS)
 | | | | | 50 GB Page File | 0:1 (Paravirtual)
 | | | | | 110 GB Logs | 1:2 (Paravirtual)
 | | | | | 5 GB SQL System | 1:3 (Paravirtual)
 | | | | | 200 GB Temp DB & Logs | 1:4 (Paravirtual)
 | | | | | 1 TB OLTP 50 K | 2:5 (Paravirtual)
Storage design and performance are key to any SQL Server deployment. In this solution, the VMDKs of the SQL Server virtual machines were allocated to dedicated VMFS datastores. In turn, these dedicated VMFS datastores had a one-to-one mapping to dedicated LUNs on the CLARiiON array. This layout was chosen primarily for recoverability and performance.
For the Microsoft Windows Server 2008 R2 OS VMDKs, EMC used the default LSI Logic SAS adapter. For the more I/O-intensive virtual disks, Paravirtual SCSI (PVSCSI) adapters were used, which are more efficient and recommended for these types of workloads. VMFS partitions created using vSphere Client are automatically aligned on 64 KB boundaries, as recommended by storage and operating system vendors.
In VMware environments, Replication Manager requires that SCSI target IDs are unique across all SCSI controllers on the virtual machine; that is, the SCSI target of the virtual disk being replicated must not be used on other SCSI controllers. Replication Manager uses the target to identify volumes during recovery. In the “vSCSI Controller” column of Table 10, the number before the colon is the controller and the number after the colon is the target. Each target appears on only one controller.
Following VMware’s recommendation for SQL Server, EMC set a static memory reservation in ESX for each SQL Server virtual machine, as shown in Figure 3, to prevent memory from being ballooned.
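The SCSI target rule described above lends itself to a simple automated check. The following Python sketch is illustrative only and is not part of Replication Manager or any VMware tool; the function name `find_shared_targets` and the dictionary layout are assumptions made for this example. It takes the controller:target pairs listed for TCE-SQL-TPCH1 in Table 10 and reports any target that appears on more than one controller.

```python
from collections import defaultdict


def find_shared_targets(disks):
    """Return {target: [controllers]} for targets used on more than one controller.

    `disks` maps a disk label to a "controller:target" string, in the
    notation used by the vSCSI Controller column of Table 10.
    (Hypothetical helper written for this illustration.)
    """
    controllers_by_target = defaultdict(set)
    for address in disks.values():
        controller, target = (int(part) for part in address.split(":"))
        controllers_by_target[target].add(controller)
    return {
        target: sorted(controllers)
        for target, controllers in controllers_by_target.items()
        if len(controllers) > 1
    }


# Disk layout of TCE-SQL-TPCH1 as listed in Table 10.
tpch1 = {
    "OS": "0:0",
    "Page File": "0:1",
    "Logs": "1:2",
    "SQL System": "1:3",
    "Temp DB & Logs": "1:4",
    "OLAP DB1": "2:5",
    "OLAP DB2": "2:6",
}

conflicts = find_shared_targets(tpch1)
print(conflicts or "all SCSI targets are unique across controllers")
```

A layout that reused a target, for example a disk at 1:2 and another at 2:2, would be returned as a conflict for target 2.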
Figure 3. Virtual machine memory allocation
Data integrity in VMware ESX
One of the biggest concerns in a virtual environment, where a hypervisor resides between the database’s OS and the storage hardware, is whether data integrity is maintained. You must ensure that the SQL Server deployments in a virtual environment are I/O crash consistent. I/O crash consistency requires that the correct order of writes is maintained so that an application can restart properly in the event of a server crash. Crash consistency ensures that I/O acknowledged as successful to the application is stored persistently on disk. VMware ESX, which uses a hypervisor architecture, guarantees I/O crash consistency. SQL Server instances that run on Windows Server virtual machines on VMware ESX have the same crash-consistency guarantees as applications running on physical machines.
For more information on I/O crash consistency, see VMware KB Article 1008542 Storage IO crash consistency with VMware products. All of EMC’s storage platforms adhere to the Microsoft SQL Server I/O Reliability program. For more information, see the following:
• EMC Mid-Range Storage and the Microsoft SQL Server I/O Reliability Program
• Microsoft SQL Server I/O Reliability Program Overview
Support
You may have concerns about the support of your Microsoft SQL Server environments in a virtualized environment. Microsoft supports SQL Server virtual machines in a VMware environment under its Server Virtualization Validation Program (SVVP). For more information, see Windows Server Virtualization Validation Program.
Application design
Overview
To guarantee the performance of Microsoft SQL Server 2008 R2 databases, data files and log files must be separated and isolated to storage pools or RAID groups that contain enough spindles to service the given workloads. Similar consideration should be applied when designing storage for the purpose of recovery. This allows RecoverPoint to deliver highly granular levels of protection.
Microsoft SQL Server layout
Data files and the transaction log are separated and isolated to dedicated spindles to guarantee performance. In this solution, all the data files are located on one volume; if required, data files can be isolated to separate Windows NTFS volumes with their respective file groups spanning multiple volumes. The relationship between data files and file groups is shown in Figure 4. The important consideration is that critical databases should be isolated to dedicated volumes. This allows consistency groups to be configured for individual databases with the priority that the data set requires.
Figure 4 shows the storage layout for a SQL Server database, which includes the following Windows NTFS volumes for a SQL Server instance:
• C & P for OS and PageFile respectively
• F & S, which are TempDB and SQL System databases
• G & H, which host the SQL Server database
Figure 4. Storage layout for SQL Server database
The separation of the database data files and the transaction log has been maintained. When guaranteed performance is required, dedicated spindles should be provided for LUNs. A SQL Server transaction cannot be committed until the transaction has been made durable, that is, written and committed to the SQL Server Log file. Separating random I/O (data file patterns) from latency-sensitive (synchronous I/O operation) write-sequential I/O (log files) ensures best performance from the underlying physical disks maintaining the database.
Protection design
Overview
This section describes the configuration required between RecoverPoint, Replication Manager, DPA, and vCenter SRM when implementing protection for the solution.
RecoverPoint
Local replication process (CDP)
Figure 5 shows RecoverPoint’s continuous data protection (CDP) process, which synchronously replicates data from the production (source) volumes to the local target volumes while maintaining reversible recovery through journaling storage.
Figure 5. EMC RecoverPoint local replication (CDP) data flow
The CDP process consists of the following steps:
1. The application server issues a write to a LUN that is being protected by RecoverPoint. The write is intercepted by the RecoverPoint splitter.
2. The splitter splits the write and simultaneously sends it to the RecoverPoint appliance (RPA) and to the production volume.
3. Writes are acknowledged back from the RPA and the production LUN.
4. The RPA writes data to the journal volume along with time stamp and bookmark metadata.
5. Once the data is safely stored in the journal, write-order-consistent data is distributed to the local replica.
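The CDP steps above can be pictured with a small in-memory model. This Python sketch is a teaching aid under stated assumptions, not RecoverPoint code: the `Appliance` and `JournalEntry` classes and the `split_write` function are hypothetical names, volumes are modeled as dictionaries keyed by block offset, and bookmark metadata is omitted for brevity. It shows why journaling writes in sequence before distribution leaves the replica write-order consistent with production.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class JournalEntry:
    sequence: int     # preserves write order
    timestamp: float  # time stamp stored with the write
    offset: int       # block address on the protected LUN
    data: bytes


@dataclass
class Appliance:
    """Toy stand-in for an RPA with a journal and a local replica."""
    pending: List[Tuple[float, int, bytes]] = field(default_factory=list)
    journal: List[JournalEntry] = field(default_factory=list)
    replica: Dict[int, bytes] = field(default_factory=dict)
    distributed: int = 0  # journal entries already applied to the replica

    def receive(self, timestamp: float, offset: int, data: bytes) -> bool:
        self.pending.append((timestamp, offset, data))
        return True  # step 3: acknowledge the write

    def write_journal(self) -> None:
        # Step 4: store pending writes in the journal in arrival order.
        for timestamp, offset, data in self.pending:
            entry = JournalEntry(len(self.journal), timestamp, offset, data)
            self.journal.append(entry)
        self.pending.clear()

    def distribute(self) -> None:
        # Step 5: apply journaled writes to the replica in sequence order.
        for entry in self.journal[self.distributed:]:
            self.replica[entry.offset] = entry.data
        self.distributed = len(self.journal)


def split_write(production: Dict[int, bytes], rpa: Appliance,
                timestamp: float, offset: int, data: bytes) -> bool:
    # Steps 1-2: the splitter sends the write to production and to the RPA.
    production[offset] = data
    return rpa.receive(timestamp, offset, data)


production: Dict[int, bytes] = {}
rpa = Appliance()
split_write(production, rpa, 1.0, 0, b"first")
split_write(production, rpa, 2.0, 0, b"second")  # overwrites the same block
rpa.write_journal()
rpa.distribute()
print(rpa.replica == production)
```

Because the journal keeps both writes to the same block in order, the model could also roll the replica back to the state after the first write, which is the property the journal provides in the real product.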
Remote replication process (CRR)
Figure 6 shows RecoverPoint’s continuous remote replication (CRR) process, which replicates blocks of data to the CLARiiON array at the remote site. The data is replicated either synchronously over a Fibre Channel connection (up to 200 km/4 ms) or asynchronously over an IP link (tested in this solution up to 64 ms/6,400 km round trip).
Figure 6. RecoverPoint remote replication (CRR) data flow
The CRR process consists of the following steps:
1. The application server issues a write to a LUN that is being protected by RecoverPoint. The write is intercepted by the RecoverPoint splitter.
2. The splitter splits the write and simultaneously sends it to the production volume and to the local RPA.
3. When the RPA receives the write, the RPA immediately acknowledges it back to the server, unless synchronous remote replication is in effect. With synchronous replication, the acknowledgment (ACK) is delayed until the write has been received at the remote site.
4. When the write is received by the local RPA, it is bundled with other writes, sequenced, and time stamped. The package is then compressed and transmitted with a checksum for delivery over IP to the remote RPA.
5. When the package is received at the remote site, the remote RPA verifies the checksum, to ensure the package was not corrupted in transmission, and decompresses the data.
6. The RPA writes the data to the journal volume at the remote site.
7. Once the data has been written to the journal volume, it is distributed to the remote volumes. Write order is preserved during this distribution.
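Steps 4 and 5 of the CRR process (bundle, sequence, time stamp, compress, checksum, then verify and decompress) follow a common pattern that can be sketched with the Python standard library. This is an illustration only: the source does not describe RecoverPoint's wire format, compression algorithm, or checksum, so the use of JSON, `zlib.compress`, and `zlib.crc32` here, along with the function names `build_package` and `open_package`, are assumptions made for the example.

```python
import json
import zlib


def build_package(writes, timestamp):
    """Bundle writes, sequence and time stamp them, compress, and add a checksum.

    `writes` is a list of (offset, data) tuples in arrival order.
    """
    records = [
        {"seq": seq, "ts": timestamp, "offset": offset, "data": data.hex()}
        for seq, (offset, data) in enumerate(writes)
    ]
    payload = zlib.compress(json.dumps(records).encode("utf-8"))
    return {"checksum": zlib.crc32(payload), "payload": payload}


def open_package(package):
    """Verify the checksum, decompress, and return writes in sequence order."""
    if zlib.crc32(package["payload"]) != package["checksum"]:
        raise ValueError("package corrupted in transmission")
    records = json.loads(zlib.decompress(package["payload"]).decode("utf-8"))
    records.sort(key=lambda record: record["seq"])
    return [(r["seq"], r["offset"], bytes.fromhex(r["data"])) for r in records]


writes = [(0, b"begin tran"), (8, b"update row"), (16, b"commit")]
package = build_package(writes, timestamp=1.0)
received = open_package(package)
print([offset for _, offset, _ in received])
```

The receiving side rejects a damaged package before anything reaches the remote journal, and the sequence numbers let it preserve write order when distributing to the remote volumes.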
Adding CLARiiON splitters
An array-based splitter is a device or process that is responsible for splitting the incoming write I/O from the host to both the intended production LUN and the RecoverPoint appliance (and onwards to the target LUN). CLARiiON arrays have a splitter process already built in, which only needs to be enabled through a license.
To configure RecoverPoint to access the splitter on the CLARiiON array, run the Add New Splitter wizard in the RecoverPoint Management Application. As shown in Figure 7, the splitters are added for both local and remote arrays.
Figure 7. Adding new splitters
Consistency groups
A consistency group is a logical container within RecoverPoint that ensures that all devices within it are consistent (write-order fidelity) with each other. RecoverPoint version 3.3 supports up to 128 consistency groups. Consistency groups have a number of settings to suit the data types being replicated and how best to replicate them. You can customize consistency group policy settings to set priorities for replication, such as resource allocation. This prioritization determines the amount of bandwidth allocated to the consistency group in relation to all other consistency groups.
When configuring consistency groups for the solution, volumes relating to the different virtual machine functions were placed in their own dedicated consistency groups. As a result, each virtual machine consisted of three consistency groups. EMC then defined the policies based on volume requirements:
• OS/Page File – Operating system and page file volumes, which have a natural relationship, were kept together. Because these volumes have a very low change rate, set the Resource Allocation Priority to Normal as shown in Figure 8.
Figure 8. Normal priority for OS/Page File
• TempDB/SystemDBs – TempDB is replicated so that you can bring up the entire virtual machine at the remote site using either the manual recovery steps or an automated process with vCenter SRM. The high volume of processing in the data warehousing databases generates large amounts of TempDB activity. Since TempDB is recreated when the SQL Server instance is restarted, it can be considered expendable, so set the Resource Allocation Priority to Low as shown in Figure 9.
Figure 9. Low priority for TempDB/SystemDBs
• Data/Logs – User databases run online transaction processing workloads and hold the most important data in the environment, so set the Resource Allocation Priority to Critical as shown in Figure 10. This ensures that replication of this data occurs ahead of consistency groups with lower priorities.
Figure 10. Critical priority for Data/Logs
Figure 11 shows the relationship between the Windows NTFS volumes, their consistency groups, and the consistency group policy settings defined for both local and remote policies.
Figure 11. Windows volumes to RecoverPoint consistency group mapping
For each consistency group that uses CLR, three journals were set up: two at the production site to support both CDP and CRR, and one at the recovery site for CRR.
Group Sets
The RecoverPoint Group Sets feature allows the bookmarking of consistent recovery points across multiple consistency groups. A Group Set contains the chosen consistency groups that constitute a single virtual machine, as shown in Figure 12. An appropriately named bookmark entry is created for each group.
Figure 12. Group Sets
RecoverPoint makes it simple to create a Group Set that periodically tags each consistency group’s image with the same bookmark. Once the images have been recovered back to the appropriate point in time, the virtual machine is considered consistent and integrity is ensured. In this solution, a Group Set bookmark was created every 15 minutes. (The bookmark interval can range from one second up to several hours.) The production copies for each consistency group in a Group Set must reside on the same site. The Group Set members can have different target copies, such as local, remote, or both.
Using Group Sets makes it easy for you to identify the correct image to access across all members. In the example shown in Figure 13, the bookmark TPCE1 GROUP 11741 is selected as the image to access for each of the three groups.
Figure 13. Enabling image access
To access a copy of the replicated data, perform the Enable Image Access step in the RecoverPoint console on each of the required consistency groups, for as long as access is needed. When the selected image is accessed for all groups in the Group Set, the RecoverPoint Management Console (Figure 14) shows that all three consistency groups have been rolled back to the same target image. At this point, all volumes at the disaster recovery site required for the TPCE1 virtual machine are fully consistent, and the server can be restarted in a crash-consistent manner.
Figure 14. RecoverPoint Management Console – consistency groups
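The idea behind selecting one Group Set bookmark for every consistency group can be expressed as a set intersection. The sketch below is a hypothetical illustration, not a RecoverPoint interface: the function `common_bookmarks` and the input dictionary are assumptions, and only the bookmark name TPCE1 GROUP 11741 comes from this solution (the other bookmark names are invented for the example). Given each group's bookmark list ordered from oldest to newest, it returns the bookmarks present in every group, so the last entry is the most recent image that is safe to access across the whole virtual machine.

```python
def common_bookmarks(bookmarks_by_group):
    """Return bookmarks present in every consistency group, oldest first.

    `bookmarks_by_group` maps a consistency group name to its list of
    bookmark names ordered from oldest to newest.
    """
    groups = list(bookmarks_by_group.values())
    if not groups:
        return []
    shared = set(groups[0]).intersection(*groups[1:])
    return [name for name in groups[0] if name in shared]


# Consistency groups of the TPCE1 virtual machine (names from Table 9).
# Bookmark names other than "TPCE1 GROUP 11741" are made up for this example.
tpce1 = {
    "OLTP DB CG": ["TPCE1 GROUP 11739", "TPCE1 GROUP 11740", "TPCE1 GROUP 11741"],
    "OLTP OS CG": ["TPCE1 GROUP 11740", "TPCE1 GROUP 11741"],
    "OLTP TEMP-DB CG": ["TPCE1 GROUP 11740", "TPCE1 GROUP 11741", "manual test"],
}

candidates = common_bookmarks(tpce1)
print(candidates[-1] if candidates else "no common bookmark")
```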
Consistency group policies
As previously mentioned, RecoverPoint supports many options for replication. The main options are explained below:
• Asynchronous – In asynchronous replication mode, the host application initiates a write and does not wait for an acknowledgement from the remote RPA before initiating the next write. Asynchronous replication is supported over FC and IP networks.
• Synchronous – In synchronous replication mode, the host application initiates a write and then waits for an acknowledgement from the remote RPA before initiating the next write. Synchronous replication is not the default and must be specified by the user. Synchronous replication is supported over FC.
You must perform the following steps to configure RecoverPoint to replicate VMware File Systems (VMFS) correctly. You perform these procedures while creating the consistency groups for the LUNs to be used by VMware.
1. Verify that reservation support is enabled.
a. Open the RecoverPoint Management Application.
b. In the left navigation pane, select the consistency group.
c. In the Components pane, locate the Advanced section. Verify that Reservations support is checked.
2. In the Advanced section of the Components pane, set Host OS.
Journal sizing - protection windows
A very important consideration is the sizing of your RecoverPoint journals. They must have the correct performance characteristics to handle the total write performance of the LUN being protected. They must also have the capacity to store all the writes of the protected LUN. The two most important questions to ask are:
• What change rate does the source LUN generate?
• What retention window is required?
To calculate the journal capacity, you must measure the rate of change on the production LUNs. Perfmon counters are set on each of the SQL Servers to capture the write bandwidth in megabytes per second (MB/s). On the CLARiiON array, data per second values are found with the Unisphere Analyzer. The Unisphere Analyzer provides good historic data, while the Stats tab gives a point-in-time window to see what each Storage Processor is doing.
The Journal Volume Sizing formula is:
\[
\text{Journal size} = \frac{(\text{data per second}) \times (\text{required rollback time in seconds})}{1 - \text{target side log size}} \times 1.05
\]
Twenty percent of the journal must be reserved for the target side log and five percent for internal system needs. For example, to support a 24-hour rollback requirement (86,400 seconds), with 5 megabits per second (Mb/s) of new data writes to the replication volumes in a consistency group, the calculation is:
\[
\frac{5 \times 86{,}400}{1 - 0.2} \times 1.05 = 567{,}000\ \text{Mb} = 69.213\ \text{GB}\ (\sim 70\ \text{GB})
\]
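The journal sizing arithmetic is easy to script when several consistency groups need to be sized. The Python sketch below reproduces the worked example; the function name `journal_size_megabits` and its parameter names are chosen for this illustration, and the conversion to gigabytes assumes 8 bits per byte and 1,024 MB per GB, which matches the figure quoted in the example.

```python
def journal_size_megabits(change_rate_mbps, rollback_seconds,
                          target_side_log=0.20, system_overhead=0.05):
    """Journal size in megabits for a given change rate and rollback window.

    change_rate_mbps: new data written per second, in megabits per second.
    rollback_seconds: required rollback (protection) window, in seconds.
    target_side_log:  fraction of the journal reserved for the target side log.
    system_overhead:  extra fraction reserved for internal system needs.
    """
    usable_fraction = 1 - target_side_log
    return change_rate_mbps * rollback_seconds / usable_fraction * (1 + system_overhead)


# Worked example from the text: 5 Mb/s of new writes, 24-hour rollback.
size_mb = journal_size_megabits(5, 24 * 60 * 60)
size_gb = size_mb / 8 / 1024  # megabits -> megabytes -> gigabytes
print(f"{size_mb:,.0f} Mb is about {size_gb:.1f} GB")
```

Rounding the result up, as the example does (about 70 GB), leaves a small margin on top of the reserved percentages.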
For the solution, all journals were sized to enable RecoverPoint to roll back at least seven days.
Integrating with VMware vCenter
The vCenter Servers view displays data from the vCenter Server through the RecoverPoint graphical user interface (GUI). In addition to displaying ESX Servers and all their virtual machines, datastores, and RDM drives, the vCenter Servers view also displays the replication status of each volume. The protection status of every virtual machine is measured multiple times per hour. This window is updated when a new virtual machine is created or the protection status of a virtual machine changes. The vCenter Servers view is for monitoring only (read-only).
For example, as shown in Figure 15, the RecoverPoint Management Application shows that all of the relevant volumes for TPCE1 and TPCH2 are successfully replicated. Included are the respective consistency groups, the copy being replicated, and the associated replication sets.
Figure 15.
vCenter Servers view in RecoverPoint Management Application
RecoverPoint failover
If a virtual machine crashes, you can use RecoverPoint’s failover process to replicate data from the production site to the disaster recovery site.
Figure 16.
RecoverPoint failover process
In this scenario (as shown in Figure 16), use the following steps for failover when a virtual machine crashes at the production site:
1. Enable image access to the latest bookmark. This provides read/write image access of the CRR copy to the remote ESX Servers and allows the mount of the VMFS volumes on the remote vCenter.
2. In the RecoverPoint Management Application or with the Unisphere Management Console, choose failover to the remote replica.
3. Rescan all storage on the remote ESX Server through the remote vCenter console.
4. Right-click on the VMX file in the OS LUN VMFS datastore to register the virtual machine, then choose to inventory the virtual machine.
5. Power up the virtual machine and connect the vNIC to the network to re-enable IP access to the database.
The admin can then fail back the virtual machine by shutting down the virtual machine on the remote site and repeating the steps to complete failback to the production site.
This solution allows for full portability of your SQL Server Instances between sites.
Different subnet
If the virtual machine is being failed over to a vSphere cluster on a different subnet (for example, from 10.10.10.x to 10.20.20.x), you need to create a distributed switch on the production ESX cluster with the same properties as the actual disaster-recovery-site distributed virtual switch. To configure for failover to a different subnet, assign a vNIC on this dummy switch to the SQL Server virtual machines in production. As shown in Figure 17, a dummy switch is created at the production site vCenter Server. The virtual machine is then configured with a second vNIC on this dummy virtual disaster recovery switch. This allows the virtual machine to fail over seamlessly to the remote site without any additional network configuration on the disaster recovery side.
Dummy DR switch configured for second NIC on virtual machine to allow for failover to different subnet on DR side. No physical NIC is connected to this switch.
Figure 17.
vNIC is configured and connected
As shown in Figure 18, the disaster recovery site vNIC is configured and connected, but the network is unidentified. This is because there is no physical NIC connected to the specified (sqluce1.com) network.
Unidentified network This is because no physical NIC is connected to the DR switch on production. When virtual machine is failed over to DR, it will be connected to sqluce1.com Domain.
Figure 18.
Prod site connected to sqluce1.com network
When the virtual machine is failed over to the disaster recovery side, this NIC will be connected to the specified (sqluce1.com) network, as shown in Figure 19. The production site vNIC will then have an unidentified network because the production switch configured at the disaster recovery site is only there for configuration purposes and is not live on the network.
sqluce1.com network This is because a physical NIC is connected to the DR switch on the DR site. When the TPCE1 virtual machine is failed over to DR, it is connected to the sqluce1.com domain through the “DR Site” NIC.
Figure 19.
Disaster recovery site connected to sqluce1.com network
Replication Manager
Integration with Microsoft SQL Server
To support SQL Server, Replication Manager uses Microsoft SQL Server’s Virtual Device Interface (VDI) snapshot API to provide online, rapid, application-consistent snapshots of very active enterprise-class SQL Server databases with negligible host overhead. Using a simple wizard-driven interface, Replication Manager enables you to:
• Specify which Instances, databases, and corresponding file groups to replicate
• Ensure that the data can be replicated safely and quickly
• Return the database to normal operation after creating the replica
• Mount or recover a database on another host so it can be used for other operations such as testing, reporting, or data mining
• Quickly recover a database on the production host in the event of data corruption
Application sets and jobs for Microsoft SQL Server
Replication Manager uses the concept of application sets as containers to define what data to protect (for example, Database 1) and jobs as a way to protect that data (for example, RecoverPoint bookmark image).
Microsoft SQL Server application consistency
In this solution, the Replication Manager jobs were configured to protect SQL Server user databases using the Replica Type option Full, Online with advanced recovery (using VDI), as shown in Figure 20. This option replicates the entire database and transaction log. This replica type is typically used when the replica will be considered a backup of the database or when the replica will be mounted in order to use a third-party product to create a backup of the database. To bring the database forward to a point in time that is newer than the replica, this replica type allows you to restore transaction logs, assuming you have backed up those transaction logs. Replication Manager uses VDI-enabled snapshots to create this replica type, guaranteeing application-consistent data.
Figure 20.
Advanced replication settings—consistency method
Note
The system databases (master, MSDB, model) should not be located on the same volume as user databases. Microsoft SQL Server does not support using VDI and snapshot technology to restore system databases.
Configuring Replication Manager to communicate with vCenter and RecoverPoint
Replication Manager can replicate, mount, and restore, at the LUN level, a VMFS datastore that resides on an ESX Server managed by vCenter. Since all operations are performed through vCenter, neither Replication Manager nor its required software needs to be installed on a virtual machine or on the ESX Server where the VMFS resides. Operations are sent from a proxy host that is either a physical host or a separate virtual host. The Replication Manager VMware proxy host used in this solution shares the same virtual machine as the Replication Manager Server. The Replication Manager VMware proxy host must be registered with the Replication Manager Server, and the credentials for the vCenter that manages the proxy host must be provided. The Replication Manager VMware proxy host communicates with vCenter over port 443. Once this is complete, Replication Manager can map the associated VMFS volumes to LUNs.
Replication Manager can also discover the LUNs replicated by RecoverPoint. Communication to RecoverPoint is done through the Replication Manager agent installed on the production and mount virtual machines. You can configure both of these communication options within Replication Manager using the Options tab, shown in Figure 21, for each Replication Manager host.
Figure 21.
Replication Manager—Options tab
Logical Volume Manager resignaturing
VMFS replication requires that Logical Volume Manager (LVM) resignature be enabled on both the production and mount ESX Servers. LVM resignature allows VMware to write a new signature to the LUNs when necessary. This switch must be enabled for Replication Manager so that VMFS on the replicated LUNs can be made visible to the ESX Server. This setting should also be enabled on the production ESX Server in order to restore to that ESX Server at any time. The following command must be issued on ESX Servers used to mount replicas:
• esxcfg-advcfg -s 1 /LVM/EnableResignature
For more information on this topic, refer to the EMC Replication Manager Administrators Guide, VMware Setup section.
Discovering the RecoverPoint appliance and CLARiiON array
This solution uses the CLARiiON RecoverPoint splitter, so discovering the CLARiiON array within Replication Manager also discovers the RecoverPoint storage. You need to enter your CLARiiON credentials (as shown in Figure 22) to complete this discovery task.
Figure 22.
CLARiiON credentials for discovering RPA
Once you configure the credentials for at least one Replication Manager host agent, a CLARiiON array discovery operation also detects the RecoverPoint splitter.
Recovering a Microsoft SQL Server user database
In this solution, EMC simulated a disaster on a live production database and tested the solution by recovering to a specific point in time. As shown in Figure 23, a table from an OLTP database is deleted at 14:16:00. This table, the Accounts Permissions Table, is critical to the functionality of the database. Users cannot access data without it, so at 14:16:00 the business that the database serves is down.
Figure 23.
Table from an OLTP database being deleted
EMC also deleted the database to simulate human error, so the entire database is lost at 14:16:15 hours as shown in Figure 24.
Figure 24.
At 14:16:15, database is deleted
To recover the database, Replication Manager is the only interface required by the user. It coordinates all operations across all levels of the solution stack (SQL Server, Windows Server, VMware, EMC CLARiiON, and RecoverPoint) to orchestrate the recovery process. Using a wizard-driven interface is useful in a panic situation, such as recovery of a business-critical transactional database, to ensure all best practices are systematically followed for a successful restore. To prove the effectiveness of this solution, EMC did a selective restore from the CRR copy at the remote site and wound the clock back to 14:15:59, just one second before the disaster occurred. Replication Manager accessed an image of the database at the exact time specified and recovered the database, which resulted in an RPO of one second (as shown in Figure 25). EMC chose to recover all files and file groups for the database.
Figure 25.
Replication Manager console—RPO of 1 second
The recovery operation took 3 minutes 26 seconds. After Replication Manager finishes its recovery process, it leaves the database detached so that the DBA can attach it manually. The DBA then has the opportunity to ensure the integrity of the database before allowing access to users. As shown in Figure 26, with the database attached, online users can again access data at 14:21:45, and the business unit is operating again.
Figure 26.
Database attached, recovered, and online
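The timeline of this recovery test can be checked with simple timestamp arithmetic. The sketch below uses only the clock times reported above; the calendar date is an arbitrary placeholder (an assumption, since the source gives times only), and the variable names are introduced here for illustration.

```python
from datetime import datetime, timedelta

# Arbitrary placeholder date (assumption): the source reports clock times only.
TEST_DAY = "2011-01-01"


def at(clock):
    """Build a datetime on the placeholder day from an HH:MM:SS string."""
    return datetime.strptime(f"{TEST_DAY} {clock}", "%Y-%m-%d %H:%M:%S")


table_deleted = at("14:16:00")      # critical table deleted (Figure 23)
recovery_point = at("14:15:59")     # image selected for the restore
database_online = at("14:21:45")    # users can access data again (Figure 26)
recovery_duration = timedelta(minutes=3, seconds=26)  # reported restore time

# Data written after the recovery point and before the disaster is lost.
rpo = table_deleted - recovery_point

# Elapsed time from the disaster until users regained access, which also
# covers the manual attach and integrity check performed by the DBA.
elapsed_outage = database_online - table_deleted

print("RPO:", rpo)
print("Recovery operation:", recovery_duration)
print("Disaster to users back online:", elapsed_outage)
```

The recovery operation itself stays under four minutes, while the elapsed time until users are back online is longer because it includes the manual attach step.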
This solution gave us an RPO of one second and an RTO of less than four minutes. This level of recovery is extremely powerful and allows you to commit to strict service level agreements (SLAs). It allows you to recover business-critical, highly transactional OLTP databases easily and quickly.
Mounting a Microsoft SQL Server database
Replicas can be mounted on other hosts and used for backup, reporting, or testing. The virtual machine used for mounting the SQL Server replica must have an RDM volume assigned to it in order to issue direct SCSI instructions to the array. Be sure to specify the mount host in the Replication Manager job, as shown in Figure 27.
Figure 27.
Specify the mount host
The solution names the mount host tce-sql-drsql1. The replica being mounted is the CRR copy of the production TPCE2 virtual machine database. EMC then chose an alternate mount path as shown in Figure 28.
Figure 28.
Alternate mount path
EMC set the recovery type to Recovery, which instructs the restore operation to roll back any uncommitted transactions. After the recovery process, the database is ready for use. During this process, the production database remains unaffected.
Data Protection Advisor for Replication Analysis
DPA/RA automates the collection of data from applications, hosts, and arrays, constantly monitoring for exposures and alerting on potential missed SLAs and gaps in the protection objectives.
Data collection and discovery wizards
Monitoring devices and applications is automated by the Data Collection wizard and the Discovery wizard, which configure DPA/RA using a series of questions about the device or application to monitor. After a device or application is defined in a wizard, one or more nodes are automatically added to the Configuration view and data monitoring by the collector starts.
Data collection and CLARiiON arrays
In this solution, CLARiiON arrays are monitored remotely from a collector running on the DPA/RA server. The CLARiiON is monitored for recoverability and analysis reporting.
Discovering CLARiiON arrays from DPA/RA requires EMC Solutions Enabler to be installed. To install Solutions Enabler, use the following steps:
1. Install Solutions Enabler on the DPA/RA server.
2. Create a text file with the following CLARiiON information, one line per CLARiiON (in this solution, the file name is Clar.txt):
3. To register the CLARiiON, run the following command on the DPA/RA server:
4. To verify the CLARiiON was added successfully, run the following command:
To discover CLARiiON storage arrays, use the following steps:
1. From the DPA/RA toolbar, select Tools, then choose Discovery Wizard.
2. Select Storage Arrays, then click Next to proceed to the Import Source panel. The Discovery wizard displays a list of all of the storage arrays.
3. Select the storage arrays that you want to import and click Next.
4. Select a Schedule for the recoverability data gathering request and click Finish.
Configuring Data Protection Advisor for Microsoft SQL Server monitoring
To configure DPA/RA for SQL Server monitoring, use the following steps:
1. From the DPA/RA toolbar, select Tools, then choose Data Collection Wizard.
2. Click Host and click Next. The Host Details panel appears.
3. Enter the name and a description of the host. State the OS running on the host. In this solution, Microsoft SQL Server 2008 R2 was running on Windows 2008 R2.
4. Under Collector Location, for Is there or will there be a Collector installed on the Host?, choose Yes.
5. To gather CPU performance and memory utilization data from the OS, under Data Gathering:
   a. For Do you want to gather system information?, choose Yes.
   b. For Do you want to monitor applications on this host?, choose Yes.
   c. Check Microsoft SQL Server.
6. To add a SQL Server instance, in the Data Gathering pane, click Add. The Add SQL Server Instance dialog box appears.
Figure 29 shows the TPCE1 SQL Server is successfully added to the DPA/RA configuration.
Figure 29.
DPA/RA database server view
Display and report
DPA/RA provides an intuitive graphical map of the relationship between the host and storage. DPA/RA presents the recoverability gaps and exposures by using reports and views that can be used to resolve issues. There are numerous replication error conditions that DPA/RA can monitor. In this solution, EMC tested a database that was added to the TPCE1 SQL Server. The scheduled reporting utility detected that the database was not being protected and reported this as an exposure. Figure 30 shows the configuration for setting up a scheduled report, while Figure 31 shows the exposure details for TPCE1. This maps the storage to RecoverPoint and then into the SQL Server virtual machine residing on an ESX cluster.
Figure 30.
Scheduled Report Editor showing monitored SQL Servers
Figure 31.
Exposure details for TPCE1 SQL Server
From this, EMC determined that the solution was missing an application volume, the replica was incomplete, and the application might not be recoverable.
View customization
Because DPA/RA can monitor and alert on many aspects of an IT environment, defining a view for an individual or department is quite important. To define a customized view, use the following steps:
1. In DPA, choose View, then choose New Policy. Under Properties, in the Name field, type your view name. For example, this creates a new configuration view called Cork.
2. Copy the entire SQL Server configuration view and paste to Cork view.
3. Choose View, then click on the previously created configuration Cork.
4. Within the Cork configuration, right-click Cork, then click Paste. This copies the SQL Server view to the Cork configuration so that you only see the SQL Server environment.
Defining who can view this configuration is done through User Properties as shown in Figure 32. LDAP authentication can be used by configuring connection details to a domain controller.
Figure 32.
Defined user view
VMware vCenter SRM
Integrating vCenter SRM with RecoverPoint
vCenter SRM reduces the RTO for disaster recovery and relies on block-based replication to reduce the RPO for disaster recovery. The RecoverPoint SRA is used to map the vCenter SRM requests into the appropriate RecoverPoint actions. vCenter SRM and RecoverPoint automate the virtual machine recovery process, which makes it as simple as pressing a single button. The user has no interaction with the RecoverPoint console; instead, vCenter SRM automates the whole failover process.
The integration between RecoverPoint and vCenter SRM is controlled by the RecoverPoint Storage Replication Adapter (SRA). RecoverPoint is responsible for replicating all changes from the production LUNs to the remote replicas at the disaster recovery site. The RecoverPoint SRA is installed on the same servers that are running the vCenter Server and the vCenter SRM plug-in at the production and disaster recovery sites. RecoverPoint SRA supports vCenter SRM functions, such as failover and failover testing, by using RecoverPoint as the replication engine.
Configuring the consistency group for management by vCenter SRM
After the consistency group is created and vCenter SRM is installed, you need to configure the consistency group for management by vCenter SRM. You do this using the policy settings in the RecoverPoint Management Application, as shown in Figure 33.
Figure 33.
Configuring the consistency group for management by vCenter SRM
vCenter SRM solution protection
In this solution, vCenter SRM is protecting the SQL Server virtual machines. Replication Manager is required on the production site for local protection, mount, and restore. The disaster recovery site contains its own vCenter and Active Directory virtual servers, so there is no requirement to replicate these. Figure 34 shows how vCenter SRM protects the SQL Servers using RecoverPoint integration by automating the required steps.
Figure 34.
vCenter SRM protection procedure for production site
vCenter SRM requires configuration on both the production and recovery sites. The production site requires the following configuration:
• Connection to establish vCenter SRM communication between vCenter Servers
• Array managers to detect replicated devices
• Inventory mappings for site-specific folder, network, and resource mappings
• Protection groups to organize virtual machines on their respective datastores for recovery
The recovery site requires that you configure a recovery plan by creating an automated run book of the recovery process.
Configuring vCenter SRM protection groups
A RecoverPoint consistency group is a data set of SAN-attached storage volumes at the production site and disaster recovery site. A vCenter SRM protection group is a group of virtual machines that are failed over together (during testing and actual failover).
When vCenter SRM performs failover, it instructs RecoverPoint to operate on all the LUNs of all the virtual machines in the protection group. RecoverPoint, on the other hand, uses consistency groups to define groups of LUNs that are replicated together. After successfully configuring the connection, array managers, and inventory mappings, you need to configure the protection groups as shown in Figure 35.
Figure 35.
Configuration of protection groups in vSphere Client
For this solution, four individual protection groups were created: two for the TPCE virtual machines and two for the TPCH virtual machines. Within each protection group, you can specify the recovery priority to be applied to each of the virtual machines.
Modifying the startup priority of a virtual machine
It may not be desirable to have all virtual machines boot simultaneously on recovery. Therefore, EMC configured the TPCE virtual machines for high priority, as they are the most critical to the business (as shown in Figure 36), and left the TPCH virtual machines on a low priority setting.
Figure 36.
Selecting recovery priority for virtual machines
Customizing recovery site IP addresses
When failing over to a different data center, some adjustments are required to the host IP settings due to the infrastructure differences. When failing over an entire configuration, this can involve updating the settings for multiple virtual machines. vCenter SRM provides a bulk IP customization utility, dr-ip-customizer.exe, for automatically updating IP settings for recovered virtual machines. The utility generates a CSV file containing the IP settings for all the virtual machines that are configured for vCenter SRM failover. You can edit this file to specify the recovery site IP settings, then run the utility again to upload the new settings to the recovery site vCenter server. For the solution, the utility was used to update the recovery site IP settings as follows:
1. Log on to the vCenter server at the recovery site.
2. Run the dr-ip-customizer.exe utility, and specify the name and location for the CSV file as shown.
3. Edit the CSV file to provide the IP settings for the virtual machines at the recovery site. The following image shows the edited file for this solution.
4. Run the utility to upload the new settings to the recovery site vCenter server as shown.
Note
If you delete or recreate a protection group, you must repeat this process to reapply the IP customizations.
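When many virtual machines move between subnets, the recovery site addresses entered into the CSV file can be derived programmatically. The sketch below shows only the address translation step and does not reproduce the CSV layout generated by dr-ip-customizer.exe. The function name is hypothetical, the /24 prefix length is an assumption, and the subnets are taken from the 10.10.10.x to 10.20.20.x example earlier in this document.

```python
import ipaddress


def remap_ip(address, source_net, target_net):
    """Map a host address from source_net to the same host offset in target_net."""
    src = ipaddress.ip_network(source_net)
    dst = ipaddress.ip_network(target_net)
    ip = ipaddress.ip_address(address)
    if ip not in src:
        raise ValueError(f"{address} is not in {source_net}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and target networks must be the same size")
    # Keep the host part of the address and swap the network part.
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


if __name__ == "__main__":
    # Hypothetical production address; the /24 masks are an assumption.
    print(remap_ip("10.10.10.25", "10.10.10.0/24", "10.20.20.0/24"))
```

Keeping the host offset unchanged is one possible design choice; a site with a different addressing plan would need an explicit mapping table instead.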
Configuring vCenter SRM recovery plans
Recovery plans are located at the recovery site and define steps for recovering virtual machines. vCenter SRM recovery plans can use the RecoverPoint image access capability to non-disruptively test the failover process. This ensures that the secondary image is consistent and usable. Testing disaster recovery plans is critical to ensure recovery is reliable. Traditionally, this was a complex, time-consuming, and costly exercise. vCenter SRM overcomes these obstacles by enabling realistic, frequent tests of recovery plans and eliminating common causes of failures during recovery. By including multiple protection groups in a single recovery plan, all of the associated virtual machines are available to recover as part of that single recovery plan. Figure 37 shows the first step in running the recovery plan.
Figure 37.
Running the recovery plan
As shown in Figure 38, the prioritized virtual machine is being recovered (powered on) before the other virtual machines contained in the same recovery plan.
Figure 38.
Prioritizing virtual machine recovery
When the failover process finishes, vCenter SRM displays a summary report of the recovery, as shown in Figure 39.
Figure 39.
Failover summary report
Configuring vCenter SRM failover with RecoverPoint CLR
After vCenter SRM successfully completes the recovery plan, and all systems are operational again, you must complete the following manual steps to resume full CRR replication back from the disaster recovery site to the production site:
1. Ensure that the group is in maintenance mode and is being managed by RecoverPoint, with SRM only monitoring.
2. Set the RecoverPoint Remote Replica copy as production on the disaster recovery site as shown.
3. Because CDP was also present on the production site before failover, remove one of the replica data copies on the production site, as shown.
After setting the CRR copy as your production copy of the data, you are prompted to choose a copy of the data on the production site to be removed. You must then decide whether to use the production copy or CDP copy as the target for CRR, which is done by removing the unneeded copy as shown in Figure 40.
Figure 40.
Removing unneeded copy of the data
As part of a RecoverPoint CLR configuration, the production site previously hosted both the production and CDP data copies. The new RecoverPoint replication configuration is CRR, so only one target copy of the data is possible on the production site. These settings do not affect the recovery of the virtual machines on the disaster recovery site. They are specific to RecoverPoint and are required for configuring new CRR relationships back to the production site. If a disaster has rendered the production site unreachable, these steps are not necessary until communications with the production site are restored. If communications with the production site are still available after the recovery, these steps can be scripted in the RecoverPoint CLI. As part of a controlled failover, these commands can be included as a post-script operation in the vCenter SRM recovery plan. The result of reconfiguring the consistency group is a straightforward RecoverPoint CRR replication from the disaster recovery site to the production site, as shown in Figure 41.
Figure 41.
Reconfiguring of consistency group
Once a failback completes and the production site resumes production, a full reconfiguration and resynchronization of the CDP copies is required.
Note
It is possible to configure CDP on the disaster recovery site and remain in this configuration, if appropriate.
Testing and validation
Test objectives
Testing of this solution validates the functionality of RecoverPoint, Replication Manager, vCenter SRM, and Data Protection Advisor when run as a combined protection suite that allows local, remote, and a combination of local and remote protection.
Notes
Benchmark results are highly dependent upon workload, specific application requirements, and system design and implementation. Relative system performance will vary as a result of these and other factors. Therefore, this workload should not be used as a substitute for a specific customer application benchmark when critical capacity planning and/or product evaluation decisions are contemplated. All performance data contained in this report was obtained in a rigorously controlled environment. Results obtained in other operating environments may vary significantly. EMC Corporation does not warrant or represent that a user can or will achieve similar performance expressed in transactions per minute.
Testing methodology
Testing methodology required TPCE-like (OLTP) and TPCH-like (DW) type workloads to be run against four target databases, two OLTP and two Data Warehouse databases.
Test scenarios
EMC used a number of scenarios to test the solution. These included:
• Baseline testing with no replication in place
• Local protection only (CDP)
• Remote protection only (CRR)
• Concurrent local and remote protection (CLR)
Test procedures
Testing was conducted by running concurrent TPCE-like and TPCH-like workloads against the target databases. For testing remote protection scenarios, distance was simulated by applying a delay latency equivalent to 1 ms per 100 kilometers.
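The distance rule used for the remote protection tests reduces to a one-line conversion. The following Python sketch is illustrative only: the function name is an assumption, and the distances in the usage example are hypothetical, since the source does not state which distances were simulated.

```python
MS_PER_100_KM = 1.0  # delay rule stated in the test procedure


def simulated_delay_ms(distance_km):
    """Return the link delay in milliseconds for a simulated site distance."""
    if distance_km < 0:
        raise ValueError("distance_km must not be negative")
    return distance_km / 100.0 * MS_PER_100_KM


if __name__ == "__main__":
    # Hypothetical distances, for illustration only.
    for km in (100, 500, 1000):
        print(f"{km} km -> {simulated_delay_ms(km)} ms")
```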
Test results
Overview
The results presented in this section show a baseline for the environment prior to any replication being introduced. Local replication is then introduced, followed by remote replication, after which the two are combined to provide both local and remote replication at the same time. The workload consisted of two OLTP databases running a TPC-E-like load and two Data Warehouse databases running a TPC-H-like workload. During testing, latencies were monitored to show performance on SQL1, the largest database (75,000 users). Both OLTP databases, SQL1 and SQL2, were given priority during the replication process by setting the resource allocation policies of their database data and log file consistency groups to Critical.
Baseline
Figure 42 shows the baseline results for OLTP database SQL1. Workload was generated against the database, and the SQL Server logical disk perfmon counters Avg. Disk/sec Read and Avg. Disk/sec Write were monitored to establish a baseline. All workload settings were then recorded and used consistently during subsequent testing.
Figure 42. Baseline results for OLTP database SQL1
For data, Avg. Disk/sec read is 10 milliseconds and Avg. Disk/sec write is 5 milliseconds; the log file has an Avg. Disk/sec write of 2 milliseconds. These results are in accordance with current SQL Server best practices.
RecoverPoint compression
RecoverPoint compression algorithms and bandwidth policy management reduce congestion of IP or FC links between sites. Figure 43 shows the RecoverPoint WAN optimization test results.
Figure 43. Optimizing WAN with RecoverPoint
In this solution, EMC observed an average combined write rate of 177.9 megabits (Mb). This breaks down as:
• OS and PageFile: 3.6 Mb
• SQL System DBs and SQL TempDBs: 151 Mb
• User Databases: 23.4 Mb
The percentage of the total site writes being replicated is:
• OS and PageFile makes up 2 percent
• SQL System DBs and SQL TempDBs makes up 85 percent
• User Databases makes up 13 percent
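The percentages above follow directly from the per-group write rates. The short sketch below reproduces that arithmetic from the figures reported in this section; the variable and function names are illustrative only.

```python
# Write rates (Mb) reported for each group of replicated volumes.
write_rates_mb = {
    "OS and PageFile": 3.6,
    "SQL System DBs and SQL TempDBs": 151.0,
    "User Databases": 23.4,
}

# Average combined write rate (Mb) reported for the site.
TOTAL_MB = 177.9


def share_percent(rate_mb: float, total_mb: float) -> int:
    """Return a group's share of the total, rounded to a whole percent."""
    return round(100 * rate_mb / total_mb)


shares = {name: share_percent(rate, TOTAL_MB)
          for name, rate in write_rates_mb.items()}

for name, pct in shares.items():
    print(f"{name}: {pct} percent")
```

The three rounded shares come to 2, 85, and 13 percent, matching the list above.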
TempDB databases considerations
User databases, which represent critical data, account for only 13 percent of replicated data. This contrasts heavily with 85 percent for the SQL System DBs and TempDBs. Investigation revealed that this consisted primarily of TempDB activity from the two data warehouse SQL Server instances. TempDB can be classed as expendable data because it is recreated whenever the SQL Server instance is restarted. This highlights a consideration: not all volumes may need to be included in the replication process if the environment has a limited intersite link. In this solution, EMC’s approach to this type of scenario was to bring the whole virtual machine up after failover, because not replicating the TempDB would require additional manual steps during the recovery process. After registering the virtual machine at the remote site, you would need to edit the virtual machine settings to attach the TempDB VMDK file before powering up. RecoverPoint’s ability to assign resource allocations to consistency groups allowed EMC to assign normal priority to the OS and PageFile consistency group, low priority to the SQL System and TempDBs consistency group, and critical priority to the user database consistency group, ensuring that the critical data, the user databases, is given priority.
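The priority assignments described above can be pictured as a simple mapping from consistency group to priority level. The sketch below is only an illustrative data model of that relative ranking; the class and function names are hypothetical, and it does not represent the RecoverPoint interface or how RecoverPoint allocates resources internally.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Relative ranking of the three priority levels named in the text."""
    LOW = 1
    NORMAL = 2
    CRITICAL = 3


# Assignments as described for this solution.
consistency_groups = {
    "OS and PageFile": Priority.NORMAL,
    "SQL System and TempDBs": Priority.LOW,
    "User databases": Priority.CRITICAL,
}


def by_priority(groups: dict) -> list:
    """Return group names ordered from highest to lowest priority."""
    return sorted(groups, key=lambda name: groups[name], reverse=True)


for name in by_priority(consistency_groups):
    print(f"{consistency_groups[name].name:<8} {name}")
```

Listing the groups this way makes it easy to confirm that the user databases rank ahead of the expendable TempDB data.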
Looking at the total write rate of the site compared with the data being replicated, an average compression ratio of approximately 4:1 was seen, as 177.9 Mb is compressed to just 43 Mb. This highlights the RPA’s ability, through compression algorithms and bandwidth policy management, to reduce the data being replicated. RecoverPoint 3.4 software introduces the ability to perform deduplication within the replication process, which may further reduce the amount of data replicated.
Local replication (CDP)
RecoverPoint replication is introduced to the environment to provide local protection. RPAs provide synchronous replication, where the array-based splitter splits all the writes and simultaneously sends them to the RecoverPoint appliance and to the production volumes. In Figure 44, results show negligible impact to disk latencies on the environment; even though RecoverPoint is maintaining full local protection, Avg. Disk/sec read and Avg. Disk/sec write remain consistent.
Figure 44. CDP disk latencies
Remote replication (CRR)
The environment was reconfigured to provide remote protection with RecoverPoint CRR running in asynchronous mode. To test latency across varying distances, a distance emulator was introduced to the environment. The device was configured to provide a delay latency starting at 0 milliseconds (zero distance) and rising in controlled increments to 64 milliseconds, equivalent to 6,400 kilometers. Figure 45 shows a rise to 11 milliseconds in Avg. Disk/sec read for data as remote replication is introduced with a configured delay of 1 millisecond. This fluctuation disappears as the delay is increased through 4, 8, 16, 32, and 64 milliseconds.
Response times for the database log file remain consistent at 2 milliseconds Avg. Disk/sec write.
Figure 45. CRR disk latencies
Note: Synchronous replication during remote replication is possible, depending on WAN bandwidth and the level of data being replicated. Generally, local replication is configured in a synchronous mode and remote replication is configured in asynchronous mode.
Concurrent local and remote data protection (CLR)
Next, the RPAs were configured for CLR, which provides local and remote data protection, with local protection using synchronous mode and remote protection using asynchronous mode. Figure 46 shows how data Avg. Disk/sec reads remain consistent, with a fluctuating rise of 1 millisecond, which is negligible for Microsoft SQL Server application latencies.
Figure 46. RecoverPoint CLR data protection enabled
Log Avg. Disk/sec writes do increase. In Figure 46, you can see an increase of 1 millisecond with zero distance configured. This increases to a peak of 5 milliseconds at a configured delay of 5 milliseconds (500 km), before dropping to remain consistent at 3 milliseconds as the delay is increased through to 64 milliseconds (6,400 km). This highlights the RPA’s ability, through compression algorithms and bandwidth policy management, to react to increases in delay latency. With resource allocation policies set to critical for data and log volumes, RecoverPoint gives priority to this critical data, which helps maintain application performance.
vCenter SRM
In Table 11, you can see the latency in milliseconds and the distance in kilometers. These are the values EMC used to test vCenter SRM failover using a distance simulator. You can see the times taken to fail over the virtual machines. The recovery plan consisted of the four SQL Server virtual machines (TPCH1, TPCH2, TPCE1, TPCE2). The failover was performed while the virtual machines were under load.
Table 11. SQL Server production environment failover times
Latency    Distance    Site failover time
0 ms       0 km        10 minutes 12 seconds
4 ms       800 km      10 minutes 34 seconds
16 ms      3,200 km    10 minutes 39 seconds
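To put the failover times in Table 11 into perspective, the sketch below converts them to seconds and computes how much the time grew between the 0 ms and 16 ms tests. It uses only the values from the table; the helper name and the parsing approach are illustrative assumptions.

```python
# Site failover times from Table 11, keyed by configured latency in ms.
failover_times = {
    0: "10 minutes 12 seconds",
    4: "10 minutes 34 seconds",
    16: "10 minutes 39 seconds",
}


def to_seconds(text: str) -> int:
    """Parse a string of the form 'M minutes S seconds' into seconds."""
    parts = text.split()
    return int(parts[0]) * 60 + int(parts[2])


seconds = {latency: to_seconds(t) for latency, t in failover_times.items()}

increase_s = seconds[16] - seconds[0]
increase_pct = 100 * increase_s / seconds[0]

print(seconds)
print(f"Increase from 0 ms to 16 ms: {increase_s} s ({increase_pct:.1f} percent)")
```

On these figures, raising the configured latency from 0 ms to 16 ms added 27 seconds to the site failover time.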
Comparison
The aim of this white paper is to show RecoverPoint functionality in a given environment. If adequate resources are provided and allocation priorities are correctly configured, the effect of introducing RecoverPoint into the environment is minimized. During testing, the main (critical) databases, SQL1 and SQL2, saw comparatively little change in latency. This was mainly because the LUNs were sized correctly and appropriate spindle counts were allocated to handle the workloads applied to this environment. RecoverPoint’s ability to set allocation priority for consistency groups also ensured that critical data was given priority, which allowed latencies for the critical databases to be maintained.
Summary of test results
Testing showed RecoverPoint’s ability to prioritize the replication of data based on the resource allocation settings assigned to consistency groups. This ability to prioritize replication allows DBAs to maintain latency and response times for critical applications while providing a highly granular level of recovery should a disaster affect production data or the production site.
Conclusion
Summary
This white paper details a continuous data protection solution for virtualized Microsoft SQL Server 2008 R2 environments. The solution uses EMC RecoverPoint and VMware vCenter Site Recovery Manager for automated failover to the disaster recovery site, with EMC Replication Manager to restore crash-consistent and application-consistent copies of the database.
In a SQL Server environment, RecoverPoint replication technology provides granular point-in-time, crash-consistent recovery of a SQL Server database. Consistent recovery points are maintained by using RecoverPoint’s journaling technology. This allows for DVR-like any-point-in-time recovery of SQL Server databases. Replication Manager can use SQL Server’s VDI technology to schedule and run jobs that create application-consistent copies of SQL Server databases.
In the event of a production site disaster, vCenter Site Recovery Manager ensures that a predefined and consistent recovery process is followed, which eliminates complex manual recovery steps and enables nondisruptive testing of recovery plans.
Findings
The key findings of the solution include:
• RecoverPoint offers an application-agnostic replication solution, which provides DVR-like recovery for databases such as Microsoft SQL Server, operating systems, and virtual machines. RecoverPoint also integrates with VMware ESX Server and vCenter Site Recovery Manager to provide support for both public and private cloud environments.
• RecoverPoint allows you to define protection RPO policies based on the criticality of your applications. Furthermore, you can use WAN optimization features, such as compression and deduplication, to reduce the amount of replicated data sent across the network. The four RecoverPoint appliances used in this solution have a small footprint and integrate easily for recovering SQL Server.
• Replication Manager delivers automated replica management with application consistency for assured recovery as well as the ability to create replicas of VMFS datastores.
• Data Protection Advisor for replication analysis provides visibility and analysis of all recovery points in your environment, which means that you know you can always recover your applications.
• vCenter Site Recovery Manager provides automated disaster recovery for fast and efficient recovery of critical applications such as SQL Server, by simplifying recovery and eliminating human error from the process.
References
White papers
For additional information, see the white papers listed below.
• Improving VMware Disaster Recovery with EMC RecoverPoint—Applied Technology
• Using EMC RecoverPoint Concurrent Local and Remote for Operational and Disaster Recovery—Applied Technology
• EMC Business Continuity for VMware vSphere 4 Enabled by EMC RecoverPoint, Replication Manager, and VMware vCenter Site Recovery Manager—A Detailed Review
Product documentation
For additional information, see the product documents listed below.
• EMC RecoverPoint Site Recovery Manager Failback Plug-in—Technical Notes
• EMC Replication Manager Administration Guide
• EMC RecoverPoint Administration Guide
• EMC Data Protection Advisor Administration Guide
Other documentation
For additional information, see the VMware documents listed below.
• Getting Started with VMware vCenter Site Recovery Manager 4.0 and Later
• VMware vCenter Site Recovery Manager Administration Guide