White Paper
TUNING THE VMWARE NATIVE MULTIPATHING PERFORMANCE OF VMWARE VSPHERE HOSTS CONNECTED TO EMC SYMMETRIX STORAGE
Abstract

This white paper presents the results from tests conducted to understand the performance-enhancing features of VMware® Native Multipathing (NMP) technology when utilized with EMC® Symmetrix® storage. The performance characteristics of VMware Native Multipathing technology with the Round Robin path selection policy are analyzed for various types of workloads on the EMC Symmetrix VMAX™ storage system.

November 2010
Copyright © 2010 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided “as is”. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. VMware, ESX, vCenter, and vSphere are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other trademarks used herein are the property of their respective owners. Part Number h8119
Tuning the VMware Native Multipathing Performance of VMware vSphere Hosts Connected to EMC Symmetrix Storage
Table of Contents

Executive summary
    Purpose
    Audience
Multipathing and load balancing with the Pluggable Storage Architecture
    Native Multipathing Plug-in
    NMP path selection profiles
    Managing NMP in vSphere Client
VMware Native Multipathing performance analysis
    Key components
Workload simulation software
Physical architecture
    Architecture diagram
Environment profile
    Hardware resources
    Virtual machine setup
    Symmetrix device information
Software resources
Test design and validation
    Test plan
    Test parameters
VMware Native Multipathing performance analysis tests and results
    Device configuration
    Test steps
Results
Conclusion
    Findings
    Recommendations
Executive summary

The VMware vSphere™ platform provides a generic multipathing plug-in (MPP), called the Native Multipathing Plug-in (NMP), which, among many other important functions, processes I/O requests to logical devices (LUNs) by selecting an optimal physical path for each request. The path selection is based on the policy that is set for that particular logical device. Four policies influence the I/O path that NMP selects:

• Most Recently Used (MRU) – Selects the path most recently used to send I/O to a device
• Round Robin (RR) – Uses the MRU target selection policy and any HBA selection policy to select paths
• Fixed – Uses one path, the active path
• Custom – Sets the LUN to expect a custom policy

The Round Robin I/O selection policy guarantees that all paths to a LUN are used. Customers looking to efficiently utilize the resources between the ESX® host and the LUN use the Round Robin policy. The Round Robin policy has an I/O operation limit parameter that determines the number of I/Os that go down each path before switching to the next path. The default value of this parameter is 1000, and it can be tuned. Customers can achieve the optimal I/O throughput and response times for different types of workloads by adjusting the value of the Round Robin I/O parameter.

Combined with the other features of NMP, such as the ability to handle failures and request retries and to manage physical paths, the RR path selection policy can be very effective in providing a highly resilient, available, and efficient multipath environment.
Purpose

This white paper evaluates the impact of tuning the Round Robin I/O parameter on various workloads imposed on the EMC® Symmetrix VMAX™ storage system and provides best-practice recommendations for selecting the value of the Round Robin I/O parameter for the evaluated workloads.
Audience This white paper is intended for VMware administrators, server administrators, and storage administrators responsible for creating, managing, and provisioning a VMware® vSphere 4.x environment on EMC Symmetrix®. The white paper assumes the reader is familiar with VMware technology, EMC Symmetrix, and related software.
Multipathing and load balancing with the Pluggable Storage Architecture

To ensure maximum storage resource availability and performance, a data center infrastructure must:

• Provide multiple physical data paths between the server and the storage resources
• Allow path rerouting around problems such as failed components
• Balance traffic loads across multiple physical paths

To maintain a constant connection between a virtualized host and its storage, a technique called multipathing is used. Multipathing maintains more than one physical path for data between the host and storage device. If any element in the SAN such as an adapter, switch, or cable fails, the virtualized server host can switch to another physical path that does not use the failed component. The process of path switching to avoid failed components is known as path failover.

In addition to path failover, multipathing provides load balancing. Load balancing is the process of distributing workloads across multiple physical paths to reduce or remove potential traffic bottlenecks. The ability to dynamically multipath and load balance through the use of the native or third-party storage vendor multipathing software is a feature of VMware vSphere.

The Pluggable Storage Architecture (PSA) is a modular storage construct that allows storage partners (such as EMC with PowerPath®/VE) to write a plug-in to best leverage the unique capabilities of their storage arrays. These modules can interface with the storage array to continuously resolve the best path selection, as well as make use of redundant paths to greatly increase performance and reliability of I/O from the ESX host to storage.
Native Multipathing Plug-in

In the absence of storage partner-provided path management software, the Native Multipathing (NMP) driver supplied by VMware provides multipathing and load balancing of I/O. NMP can be configured to support basic failover, Most Recently Used, and Round Robin configurations. The Round Robin policy, unlike in VMware Infrastructure 3, is now fully supported in vSphere 4.x, and is the recommended NMP setting for Symmetrix devices. NMP and other multipathing plug-ins are designed to be able to coexist on an ESX host; nevertheless, multiple plug-ins cannot manage the same device simultaneously. To address this, VMware created the concept of claim rules. Claim rules are used to assign storage devices to the proper MPP. When an ESX host boots or performs a rescan, the ESX host discovers all physical paths to the storage devices visible to the host. Using the claim rules configuration file, the ESX host determines which multipathing module will be responsible for managing a specific storage device. Claim rules are numbered. For each physical path, the ESX host processes the claim rules starting with the lowest number first. The attributes of the physical path are compared with the path specification in the claim rule. If there
is a match, the ESX host assigns the MPP specified in the claim rule to manage the physical path. This assignment process continues until all physical paths are claimed by an MPP. Figure 1 has a sample claim rules list.
Figure 1. Default claim rules with NMP

The figure shows the default rule set created after initial installation of ESX. Rules are applied in numerical order, meaning that rule 10 has a higher priority and is applied before rule 11. Rule 65535, as shown in Figure 1, cannot be removed and effectively claims all devices for NMP if no other options exist. If changes to the claim rules are needed after installation, devices can be manually unclaimed and the rules can be reloaded without a reboot as long as the device is not open for I/O. It is a best practice to make claim rule changes after installation but before the immediate postinstallation reboot. An administrator can choose to modify the claim rules in order to have NMP manage EMC or non-EMC devices. It is important to note that after initial installation of an MPP, claim rules do not go into effect until after the vSphere host is rebooted following the installation. For instructions on changing claim rules, consult the VMware vSphere 4 SAN Configuration Guide available from VMware.com.
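The claim rule processing described in this section (lowest rule number first, first match wins, with rule 65535 acting as a catch-all for NMP) can be illustrated with a minimal Python sketch. This is a simplified model, not the ESX implementation: the `ClaimRule` class, the `claim_path` function, the vendor-only match, and the `EXAMPLE_MPP` and `EXAMPLE_VENDOR` names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ClaimRule:
    """Hypothetical, simplified claim rule: matches on vendor only."""
    number: int
    plugin: str
    vendor: Optional[str] = None  # None means "match any path"


def claim_path(path_vendor: str, rules: List[ClaimRule]) -> str:
    """Return the plug-in of the lowest-numbered rule that matches the path."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.vendor is None or rule.vendor == path_vendor:
            return rule.plugin
    raise LookupError("no claim rule matched the path")


# Illustrative rule set: one vendor-specific rule plus the catch-all rule 65535.
rules = [
    ClaimRule(65535, "NMP"),
    ClaimRule(101, "EXAMPLE_MPP", vendor="EXAMPLE_VENDOR"),
]

print(claim_path("EXAMPLE_VENDOR", rules))  # claimed by the vendor-specific rule
print(claim_path("OTHER_VENDOR", rules))    # falls through to the catch-all rule
```

In this model a path from any vendor without a more specific rule ends up managed by NMP, which mirrors the role of rule 65535 described above.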
NMP path selection profiles VMware NMP supports three path selection policies (PSP) by default. Figure 2, Figure 3, and Figure 4 show the path selection policies as, respectively, Most Recently Used (MRU), Fixed, and Round Robin (RR).
Most Recently Used - Selects the first working path discovered at system boot time. If this path becomes unavailable, the ESX host switches to an alternative path and continues to use the new path while it is available. This is the default policy for Logical Unit Numbers (LUNs) presented from an active/passive array. ESX does not switch back to the previous path when that path becomes available again; it remains on the working path until that path fails for any reason.
Figure 2. Most Recently Used path selection policy
Fixed - Uses the designated preferred path flag, if it has been configured. Otherwise, it uses the first working path discovered at system boot time. If the ESX host cannot use the preferred path or it becomes unavailable, ESX selects an alternative available path. The ESX host automatically returns to the previously defined preferred path as soon as it becomes available again. This is the default policy for LUNs presented from an active/active storage array.
Figure 3. Fixed path selection policy
Round Robin - Uses an automatic path selection rotating through all available paths, enabling the distribution of load across the configured paths. For active/passive storage arrays, only the paths to the active controller will be used in the Round Robin policy. For active/active storage arrays, all paths will be used in the Round Robin policy. This can be seen in Figure 4, in which all paths to the Symmetrix devices have been marked as active and handling I/O.
Figure 4. Round Robin path selection policy
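The difference in failback behavior between the Most Recently Used and Fixed policies can be sketched as a small simulation. This is an illustrative model only, assuming a two-path device and a simple list of availability snapshots; the function names `next_path` and `replay` and the path names are hypothetical and do not correspond to any VMware API.

```python
def next_path(policy, current, preferred, paths, up):
    """Pick the path to use given the set of currently available paths.

    policy    -- "mru" or "fixed"
    current   -- path in use before this availability change
    preferred -- preferred path (only honored by the "fixed" policy)
    paths     -- all configured paths, in discovery order
    up        -- set of paths that are currently available
    """
    candidates = [p for p in paths if p in up]
    if not candidates:
        raise RuntimeError("no available path")
    # Fixed: always return to the preferred path when it is available.
    if policy == "fixed" and preferred in up:
        return preferred
    # Both policies: stay on the working path while it remains available.
    if current in up:
        return current
    # Otherwise fail over to the first available alternative.
    return candidates[0]


def replay(policy, paths, start, events):
    """Apply a sequence of availability snapshots and record the chosen path."""
    current = start
    history = []
    for up in events:
        current = next_path(policy, current, start, paths, up)
        history.append(current)
    return history


paths = ["path_a", "path_b"]
# path_a is healthy, then fails, then becomes available again.
events = [{"path_a", "path_b"}, {"path_b"}, {"path_a", "path_b"}]

print(replay("mru", paths, "path_a", events))    # ['path_a', 'path_b', 'path_b']
print(replay("fixed", paths, "path_a", events))  # ['path_a', 'path_b', 'path_a']
```

With MRU the host stays on the failover path after the original path recovers, whereas with Fixed it returns to the preferred path, as described above.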
Managing NMP in vSphere Client Claim rules and claiming operations must all be done through the CLI, but the ability to choose the NMP multipathing policy can be performed in the vSphere Client. By default, Symmetrix devices being managed by NMP will be set to the policy of “fixed.” This policy is not optimal and does not take advantage of the ability of the Symmetrix array to actively process I/Os from multiple paths to devices, thus leading to unused, wasted resources. Therefore, EMC recommends setting the NMP policy of all
Symmetrix devices to “Round Robin” to maximize throughput. Round Robin uses an automatic path selection rotating through all available paths and enabling the distribution of the load across those paths. The simplest way to ensure that all Symmetrix devices are set to Round Robin is to change the default policy setting from the “Fixed” policy to the “Round Robin” policy. This should be performed before presenting any Symmetrix devices to ESX to ensure that all of the devices use this policy from the beginning. This can be executed through the use of the service console or through vSphere CLI (recommended) by issuing the command:

esxcli nmp satp setdefaultpsp -s VMW_SATP_SYMM -P VMW_PSP_RR
From then on, all Symmetrix devices will be, by default, set to “Round Robin.” Alternatively, the path selection policy for each device can be manually changed in the vSphere Client as portrayed in Figure 5.
Figure 5. NMP policy selection in vSphere Client
VMware Native Multipathing performance analysis As discussed previously, VMware Native Multipathing is the native MPP that provides path management to ESX hosts. It is a built-in kernel module on the vSphere host providing advanced multipathing capabilities including dynamic load balancing and automatic failover to the vSphere hosts. As per the best practice recommendation above, the NMP path selection policy should be set to Round Robin. The NMP Round Robin path selection policy has a parameter known as the “I/O operation limit” that controls the number of I/Os sent down each path before switching to the next path. This parameter has a default value of 1000 and can be tuned to any positive non-zero value. For example, setting the value of this parameter to a value of 1 will change the behavior of NMP to switch paths after sending down 1 I/O on a path. The rest of this paper investigates and analyzes if there are any potential performance benefits in tuning the I/O operation limit parameter for the NMP Round Robin policy on various workloads.
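The effect of the I/O operation limit on path switching can be sketched with a small model. The following Python snippet is an assumption-laden illustration, not NMP's actual scheduler: it simply sends `io_operation_limit` consecutive I/Os down one path before moving to the next, using four hypothetical path names to match the four paths to the LUN used in this study.

```python
from collections import Counter


def round_robin_paths(num_ios, paths, io_operation_limit=1000):
    """Return the path chosen for each of num_ios consecutive I/Os.

    The path changes after every io_operation_limit I/Os and wraps around
    once the last path has been used.
    """
    if io_operation_limit < 1:
        raise ValueError("io_operation_limit must be a positive integer")
    return [
        paths[(i // io_operation_limit) % len(paths)]
        for i in range(num_ios)
    ]


paths = ["path_0", "path_1", "path_2", "path_3"]  # hypothetical path names

# A run of 1500 I/Os with the default limit touches only two paths.
print(Counter(round_robin_paths(1500, paths, io_operation_limit=1000)))

# With a limit of 1 the same 1500 I/Os are spread evenly over all four paths.
print(Counter(round_robin_paths(1500, paths, io_operation_limit=1)))
```

In this simplified model, a short run of I/Os is concentrated on a few paths at the default limit and spread across every path at a limit of 1. Whether that translates into better throughput or response time for a given workload is what the tests in this paper measure.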
Key components

For this performance analysis study, VMware Native Multipathing was deployed in a virtualized environment that included:

• 2 ESX servers
• A Symmetrix VMAX system
• vSphere vCenter™

The ESX hosts were configured to utilize the VMware Native Multipathing software with the Round Robin path switching policy. The setup also used Iometer to generate the workload. Specific details of the hardware and software used in the test environment are discussed in the following sections.
Workload simulation software Iometer workload simulation software was used for conducting a performance analysis of VMware Native Multipathing. Iometer is an I/O generation tool that allows users to configure various parameters such as I/O block size, type of I/O (random, sequential, read, write, and so on), and direct I/Os to appropriate devices. For testing the performance characteristics of NMP our tests used Iometer to generate I/Os to the virtual disks of various virtual machines that are hosted on a 1.2 TB datastore that is mapped to a LUN on the Symmetrix VMAX storage system. For more details on Iometer and its usage please refer to the documents on www.iometer.org.
Physical architecture Architecture diagram Figure 6 depicts the overall physical architecture of the test environment.
[Diagram labels: Virtual Center Management Console; ESX Host A and ESX Host B, each with an Emulex dual-port FC HBA; two Fibre Channel switches; Symmetrix LUN]
Figure 6. Physical architecture of test environment

Each host has the same storage adapter configuration as shown in the diagram. Both hosts (A and B) were configured to use VMware NMP.
Environment profile Hardware resources The hardware used in the performance analysis environment is listed in the following table.
Table 1. Hardware resources

Equipment                    Quantity  Configuration
EMC Symmetrix VMAX,          1         1 x 1.2 TB LUN, RAID 5 (3+1), 4 paths to the LUN, hosting the virtual disks on each VM to which I/Os were issued
Enginuity™ version 5874                1 x 1 TB LUN, RAID 5 (3+1), for storing virtual machine boot disks and configuration files
                                       4 x 4 Gb/s FC front-end ports
Dell PowerEdge 2950          2         Xeon Core 4 dual processor
                                       16 GB RAM
                                       Emulex HBA, dual-port, 4 Gb/s
Fibre Channel switch         2         Brocade 5300 (maximum speed of 8 Gb/s per port)
Further details about how the resources were utilized are discussed in the following section.
Virtual machine setup Figure 7 shows the virtual machine organization and allocation.
[Diagram labels: VM1 through VM16; Iometer VM; STP VM; Boot Disk Datastore (1 TB); Application Datastore (1.2 TB)]
Figure 7. Layout of the virtual environment with VMs and datastores

As seen in Figure 7, the environment was configured with 16 virtual machines (VM1 through VM16), each of which mapped a portion of the “Application Datastore” as a virtual disk. In addition, there were two other virtual machines. One, the Iometer VM, was used solely to run the Iometer GUI, which connected to the workload driver component of the Iometer software (dynamo) running on all the other VMs. The other, the STP VM, was created to monitor the performance of the Symmetrix VMAX system by running the storstpd daemon program. The 16 virtual machines (VM1 through VM16) were created and utilized according to the following rules:

• 8 VMs ran Windows Server 2003 and 8 VMs ran Red Hat Enterprise Linux, with 4 of each assigned to ESX Host A and 4 of each assigned to ESX Host B
• 8 VMs were assigned to ESX Host A and 8 VMs were assigned to ESX Host B

VMware DRS was not deployed in this configuration, to ensure that no workload migrated from one ESX host to another during the test.
Symmetrix device information

In this setup two datastores were utilized by all the virtual machines:

• Boot Disk Datastore – This datastore was created from a Symmetrix LUN that is 1 TB in size. All virtual machine configurations and the virtual disk for booting each virtual machine resided on this datastore.
• Application Datastore – This 1.2 TB datastore was created from a Symmetrix LUN, which in turn is a Symmetrix concatenated metadevice with 150 GB meta members bound to a virtually provisioned device pool. Although the device is bound to a VP pool, the space is committed upfront. This ensured predictable performance while providing the benefits of wide striping offered by Virtual Provisioning™. Each VM used to generate the workload had a 75 GB virtual disk on the datastore. This architecture ensured that I/Os were spread evenly across the front-end and back-end devices of the Symmetrix system.

The architecture and nature of the datastores described are representative of a typical datacenter environment where there are many VMs that store their virtual disks on a single datastore and allocate disk resources from a single large disk pool.
Software resources

The software used in the performance analysis study is listed in the following table:

Table 2. Components of the software environment

Software                       Version
vSphere Enterprise Plus        4.0 (build 258672)
vCenter Server                 4.0.0 (build 258672)
Iometer                        2006.07.27
Symmetrix VMAX Enginuity       5874
Symmetrix Solutions Enabler    7.1
Test design and validation This section outlines the test plan and implementation for the multipath performance analysis using VMware NMP. Performance tests will show that tuning the Round Robin I/O operation limit parameter for various types of workloads can improve I/O performance (I/O throughput of the VM as well as I/O response times) of a virtual machine that is running a particular workload type.
Test plan

As outlined above, the test bed with its various components (virtual machines, datastores, software, and so on) was created. The test plan consists of two generic steps:

1. Generate various I/O workloads, such as random small block read, and test system performance using VMware NMP, setting the I/O selection policy to Round Robin on the Application Datastore and using the default I/O operation limit parameter.
2. Repeat the above step, changing the Round Robin I/O operation limit parameter to various values, and find the value of the parameter that provides the optimal performance for a particular workload.

Both of these steps were repeated for a variety of workloads that are representative of real-world application I/Os.
Test parameters

The NMP performance analysis test examined I/O throughput performance at different points of the solution stack (namely the virtual machines, the ESX hosts, and the Symmetrix VMAX storage system) by tuning the Round Robin I/O operation limit parameter to different values between 1 and 1000. The following I/O parameters were changed in Iometer to generate a variety of workloads:

• Block size – Small Block (4K), Large Block (32K or 64K)
• I/O type – Read, Write, and distribution of reads/writes in any workload
• Workload type – Random, Sequential, Custom (OLTP type 2)
• I/O bursts – Inject bursts of I/O into workloads by varying the size and number of I/Os
The resulting combinations are listed in the table.

Table 3. Workload combination matrix

Test type                      Block size  I/O type  Workload type
Small Block Random Read        4K          Read      Random
Small Block Random Write       4K          Write     Random
Small Block Sequential Read    4K          Read      Sequential
Small Block Sequential Write   4K          Write     Sequential
Large Block Random Read        32K/64K     Read      Random
Large Block Random Write       32K/64K     Write     Random
Large Block Sequential Read    32K/64K     Read      Sequential
Large Block Sequential Write   32K/64K     Write     Sequential
The eight possible combinations of the parameters listed here represent the various extremes of the I/O mix that can be generated from real applications. In addition to the combinations listed, a workload that is representative of the I/O pattern generated by an OLTP application was used to characterize the performance behavior of virtualized databases running on a VMware infrastructure. Furthermore, since real workloads tend to be bursty, the Iometer configuration was changed to inject various amounts of bursts into the simulated workloads cited above.
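The eight combinations in the workload matrix are simply the product of two block sizes, two access patterns, and two I/O types. A minimal Python sketch of that enumeration follows; the dictionary keys and the `workload_matrix` function name are illustrative assumptions and are not part of Iometer or of the test harness used in this study.

```python
from itertools import product

# Parameter values taken from the workload combination matrix.
BLOCK_SIZES = {"Small Block": "4K", "Large Block": "32K/64K"}
PATTERNS = ["Random", "Sequential"]
IO_TYPES = ["Read", "Write"]


def workload_matrix():
    """Enumerate every block size / pattern / I/O type combination."""
    rows = []
    for (label, size), pattern, io_type in product(
        BLOCK_SIZES.items(), PATTERNS, IO_TYPES
    ):
        rows.append(
            {
                "test_type": f"{label} {pattern} {io_type}",
                "block_size": size,
                "io_type": io_type,
                "workload_type": pattern,
            }
        )
    return rows


for row in workload_matrix():
    print(row["test_type"], row["block_size"], row["io_type"], row["workload_type"])
```

The OLTP-style workload and the burst injection described above are additional cases layered on top of these eight base combinations and are not modeled here.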
VMware Native Multipathing performance analysis tests and results This section describes the performance results of the test conditions when executed using the VMware VMkernel multipathing plug-in and the default VMware NMP.
Device configuration The device configuration is shown in the following figure (Figure 8). Note that by default, VMware NMP will be the owner of all devices if there is no third-party MPP such as EMC PowerPath/VE installed.
Figure 8. Path information of LUNs mapped to an ESX host

The default path selection policy (PSP) for the Symmetrix storage array type plug-in (SATP) is Fixed. The PSP for the Symmetrix SATP can be changed to the Round Robin policy. When this change is performed, VMware NMP channels I/Os in a Round Robin fashion across all active paths. However, the frequency of the path switching is controlled by the I/O operation limit parameter, which, by default, is set to 1000. That value can be changed with the following commands:
esxcli nmp device setpolicy --device --psp VMW_PSP_RR
esxcli nmp roundrobin setconfig --device --iops 1 --type iops

The results after changing the parameter value to 1 are shown in Figure 9.
Figure 9. CLI output display reflecting the Round Robin I/O parameter set to 1

The following performance tests use the Round Robin path selection profile and tune the I/O operation limit parameter to various values between 1 and 1000 to determine a value that yields optimal performance.
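When the same two commands have to be issued for several devices or several I/O operation limit values, it can help to generate the command lines programmatically. The sketch below only builds and prints the command strings from the Device configuration section and does not execute anything. The function name `round_robin_commands` and the device identifier `naa.EXAMPLE_DEVICE_ID` are hypothetical placeholders, and the commands should be verified against the esxcli version in use before being run.

```python
import shlex


def round_robin_commands(device_id, iops=1):
    """Build the two esxcli command lines for one device.

    device_id -- device identifier passed to --device (placeholder here)
    iops      -- desired Round Robin I/O operation limit (positive integer)
    """
    if iops < 1:
        raise ValueError("iops must be a positive integer")
    return [
        ["esxcli", "nmp", "device", "setpolicy",
         "--device", device_id, "--psp", "VMW_PSP_RR"],
        ["esxcli", "nmp", "roundrobin", "setconfig",
         "--device", device_id, "--iops", str(iops), "--type", "iops"],
    ]


# Hypothetical device identifier; substitute the identifier reported by the host.
device = "naa.EXAMPLE_DEVICE_ID"

for argv in round_robin_commands(device, iops=1):
    # Quote each argument so the printed line is safe to paste into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Generating the strings rather than running them keeps the sketch side-effect free. An administrator can review the output before applying it on a host.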
Test steps The following table illustrates the steps used when conducting the tests.
Table 4. High-level test procedure

Step  Action
1     Run the Iometer GUI program on the Iometer VM and connect to the Iometer (dynamo) programs on all 16 VMs by running the dynamo program on all VMs.
      • In the Iometer GUI client, define the workload by setting the following parameters:
          I/O block size – 4K
          I/O distribution – 100% random read
          I/O burst – If generating bursts of I/Os, define the number of I/Os per burst and the block size
      • Select the appropriate VM disks in the Iometer GUI and assign the workload definition to the disks.
      • Set the amount of time the test runs to 1 hour.
      • Initiate the test in Iometer and record the results.
2     Record performance data of the ESX server.
      • Run ESXTOP.
      • Collect esxtop data periodically by running esxtop in batch mode with a collection interval of 30 seconds and 720 iterations: esxtop -b -d 30 -n 720
3     Record performance data of the Symmetrix using storstpd (the Symmetrix STP daemon).
      1. Run storstpd.
      2. Collect performance data of the Symmetrix such as Front End Utilization, Back End Utilization, and I/O throughput.
4     Collect and record all results.
      • Collect the IOPS throughput and response time metrics from Iometer, ESXTOP, and the Symmetrix storstpd daemon.

It is important to note that this procedure illustrates the test with a 4K block size and 100% random read distribution. The same procedure was repeated for all the workload types listed in Table 3. All the virtual machines were dedicated to running the Iometer tests and did not have any other applications running. The procedure above was applied to each workload with the Round Robin I/O operation limit parameter set to values of 1, 20, 50, 300, and 1000 (the default), using the commands described in the Device configuration section, and both the I/O throughput and the response times were measured.
Results

This section discusses the results of the tests performed on the workloads described previously. The results are represented mostly as bar charts
and line charts, with each bar or line in a chart representing an IOPS throughput value, response time, or utilization rate for a particular value of the NMP I/O operation limit parameter. The bar charts categorize the output in three ways: VM output (IOPS measured at the VM level), ESX output (IOPS measured at the ESX host level), and VMAX output (IOPS measured using tools such as STP).

The I/O throughput values represented by the bars are relative; that is, they are not the actual measured absolute numbers and are only useful for relative comparison of the performance attributes at various values of the NMP parameter. The baseline for the comparison is the value of the output (either IOPS or response time) when the I/O operation limit parameter is at its default value of 1000. This baseline is always 1, which means that all the results below are normalized to the value measured at the default setting. For example, in Figure 10, the IOPS throughput from Iometer is 1 for an I/O operation limit value of 1000 (the default), whereas it is 0.98 for a value of 20, which implies that IOPS throughput at a value of 20 is 2 percent lower than the IOPS obtained at a value of 1000. Similarly, in Figure 11, the response time at 1000 is 1, whereas it is 1.02 at a value of 20, which implies that the response time at a value of 20 is 2 percent higher (worse) than at a value of 1000.

It is important to note that the results are very specific to the test environment outlined previously. The results can vary depending on the user’s specific environment and type of workloads along with other applications and constraints that may exist in such an environment.

Small Block Random Read I/O - The results from the small block (4K) random read (100 percent random) I/O test are shown in Figure 10 and Figure 11. It can be seen from Figure 10 that the total numbers of I/Os observed at the VM, the ESX hosts, and the VMAX are in agreement. This is an indication of the validity of the measurement tools and utilities that were used for the analysis. It can also be seen from the graphs that tuning the NMP I/O operation limit parameter to various values does not yield significant performance improvements in measured I/O throughput, which stays within a margin of +/- 5 percent of what was measured with the default value of 1000. Further, the I/O response times are also within a margin of +/- 5 percent, with the best response time observed at an NMP I/O operation limit value of 1. Therefore, it can be safely concluded that in vSphere environments that are subject to predominantly small block random read workloads, tuning the Round Robin I/O operation limit parameter has no significant impact on the performance characteristics of the system. In addition to the regular small block random I/O, to simulate real application workloads, I/O bursts were injected from four of the 16 VMs. The difference in the performance characteristics (not shown in the figures) with or without bursts was minimal.
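The normalization used for the charts in this section divides each measurement by the value obtained at the default I/O operation limit of 1000. A brief Python sketch of that arithmetic is shown below. The raw IOPS figures are made-up illustrative numbers, not measurements from this study, and the `normalize` and `percent_change` helper names are hypothetical.

```python
def normalize(measurements, baseline_key=1000):
    """Scale every measurement so that the baseline setting equals 1."""
    baseline = measurements[baseline_key]
    return {limit: value / baseline for limit, value in measurements.items()}


def percent_change(normalized_value):
    """Percent difference of a normalized value relative to the baseline of 1."""
    return (normalized_value - 1.0) * 100.0


# Hypothetical raw IOPS keyed by I/O operation limit (illustrative values only).
raw_iops = {1000: 50000.0, 20: 49000.0}

normalized = normalize(raw_iops)
print(normalized[1000])                          # baseline is always 1.0
print(normalized[20])                            # 0.98
print(round(percent_change(normalized[20]), 2))  # -2.0, that is, 2 percent lower
```

The same helpers apply to response times, where a normalized value above 1 indicates a slower (worse) response than at the default setting.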
[Figure 10 chart: Small Block (4K) Random Read I/O. Y-axis: Normalized IOPS. X-axis (IOPS Category): Iometer IOPS per VM, Iometer total IOPS throughput, ESXTOP IOPS, VMAX Front End IOPS. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 10. IOPS throughput for Small Block Random Read I/O

[Figure 11 chart: Small Block (4K) Random Read I/O Response Times. Y-axis: Normalized Response Time. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 11. Response times for Small Block Random Read I/O

Small Block Random Write I/O - The results from the small block (4K) random write (100 percent random) I/O test are shown in Figure 12. The graph shows that tuning the NMP I/O operation limit parameter to various values does not yield significant improvements in measured I/O throughput; the results are within a margin of +/- 8 percent of the result at the default value (1000). The I/O response times, although not shown, were also within a similar margin (+/- 8 percent), with the best response time observed at an NMP I/O operation limit value of 1. Therefore, it can be concluded that in vSphere environments that are subject to predominantly small block random write workloads, changing the Round Robin I/O operation limit parameter has no significant impact on the performance characteristics of the system.

[Figure 12 chart: Small Block (4K) Random Write I/O. Y-axis: Normalized IOPS. X-axis (IOPS Category): Iometer IOPS per VM, Iometer total IOPS throughput, ESXTOP IOPS, VMAX Front End IOPS. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 12. IOPS throughput for Small Block Random Write I/O

Small Block Sequential Read I/O - The results from the small block (4K) sequential read (100 percent sequential) I/O test are shown in Figure 13, Figure 14, and Figure 15. The graphs show that tuning the I/O operation limit parameter yields significant improvements in measured I/O throughput. The I/O response times, shown in Figure 14, were also generally improved by tuning the parameter. The results show that a value of 1 yields the best throughput and response times. However, it should be noted that the utilization rates of the Symmetrix front-end CPUs and the ESX host CPU increase when the parameter is tuned to a value of 1. Figure 15 shows the Symmetrix VMAX front-end CPU utilization at various values of the I/O operation limit parameter; the utilization rate is significantly higher when the parameter is set to 1. In real customer environments, the increased utilization would translate to additional useful work performed by the applications running inside the virtual machines. For example, if a Microsoft SQL Server instance is hosted by the virtual machine, the additional I/O workload would be generated by the processing of additional client connections. Nonetheless, it is important to note that if the environment has many applications that are all running on the same ESX host and accessing the Symmetrix, the performance of all the applications may be limited by the resources available on both the ESX host and the Symmetrix VMAX.
[Figure 13 chart: Small Block (4K) Sequential Read I/O. Y-axis: Normalized IOPS. X-axis (IOPS Category): Iometer IOPS per VM, Iometer total IOPS throughput, ESXTOP IOPS, VMAX Front End IOPS. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 13. IOPS throughput for Small Block Sequential Read I/O

[Figure 14 chart: Small Block (4K) Sequential Read I/O Response Times. Y-axis: Normalized Response Time. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 14. Response times for Small Block Sequential Read I/O
[Figure 15 chart: Symmetrix VMAX Front End CPU Utilization for Small Block (4K) Sequential Read I/O. Y-axis: Normalized % Utilization. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 15. Symmetrix VMAX front-end CPU utilization for small block sequential read I/O

Small Block Sequential Write I/O - The results from the small block (4K) sequential write (100 percent sequential) I/O test are shown in Figure 16. The graph shows that tuning the I/O operation limit parameter yields significant improvements in measured I/O throughput. The I/O response times (not shown) were also generally improved by tuning the parameter. The results show that a value of 1 yields the best throughput and response times. However, it should be noted that the Symmetrix front-end utilization and the ESX host CPU utilization both increase when the parameter is tuned to a value of 1. In real customer environments, the increased utilization would translate to additional useful work performed by the applications running inside the virtual machines. For example, if a Microsoft SQL Server instance is hosted by the virtual machine, the additional I/O workload would be generated by the processing of additional client connections. Nonetheless, it is important to note that if the environment has many applications that are all running on the same ESX host and accessing the Symmetrix, the performance of all the applications may be limited by the resources available on both the ESX host and the Symmetrix VMAX.
[Figure 16 chart: Small Block (4K) Sequential Write I/O. Y-axis: Normalized IOPS. X-axis (IOPS Category): Iometer IOPS per VM, Iometer total IOPS throughput, ESXTOP IOPS, VMAX Front End IOPS. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 16. IOPS throughput for Small Block Sequential Write I/O

Large Block Random Read I/O - The results from the large block (64K) random read (100 percent random) I/O test are shown in Figure 17. The figure shows that tuning the I/O operation limit parameter to various values does not yield significant improvements in measured I/O throughput; the results are within a margin of +/- 8 percent of the result at the default value of 1000. The I/O response times, although not presented in this document, were also within a margin of +/- 8 percent, with the best response time observed at an NMP I/O operation limit value of 1. Therefore, it can be concluded that in vSphere environments that are subject to predominantly large block random read workloads, changing the Round Robin I/O operation limit parameter has no significant impact on the performance characteristics of the system.
[Figure 17 chart: Large Block (64K) Random Read I/O. Y-axis: Normalized IOPS. X-axis (IOPS Category): Iometer IOPS per VM, Iometer total IOPS throughput, ESXTOP IOPS, VMAX Front End IOPS. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 17. IOPS throughput for Large Block Random Read I/O

Large Block Random Write I/O - The results from the large block (64K) random write (100 percent random) I/O test are shown in Figure 18. The figure shows that tuning the I/O operation limit parameter to various values does not yield significant improvements in measured I/O throughput; the results are within a margin of +/- 5 percent of the result at the default value of 1000. Although not shown as a chart, the I/O response times were also within a margin of +/- 5 percent, with the best response time observed at an NMP I/O operation limit value of 1. Therefore, it can be concluded that in vSphere environments that are subject to predominantly large block random write workloads, changing the Round Robin I/O operation limit parameter has no significant impact on the performance characteristics of the system.
[Figure 18 chart: Large Block (64K) Random Write I/O. Y-axis: Normalized IOPS. X-axis (IOPS Category): Iometer IOPS per VM, Iometer total IOPS throughput, ESXTOP IOPS, VMAX Front End IOPS. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 18. IOPS throughput for Large Block Random Write I/O

Large Block Sequential Read I/O - The results from the large block (32K) sequential read (100 percent sequential) I/O test are shown in Figure 19. The graph shows that tuning the I/O operation limit parameter yields significant improvements in measured I/O throughput. The I/O response times (not shown) also improved when the parameter was tuned. The same behavior was observed when the block size was changed to 64K. The results show that a value of 1 yields the best throughput and response times. However, it should be noted that the Symmetrix front-end utilization and the ESX host CPU utilization both increase when the parameter is tuned to a value of 1. In real customer environments, the increased utilization would translate to additional useful work performed by the applications running inside the virtual machines. For example, if a Microsoft SQL Server instance is hosted by the virtual machine, the additional I/O workload would be generated by the processing of additional client connections. Nonetheless, it is important to note that if the environment has many applications that are all running on the same ESX host and accessing the Symmetrix, the performance of all the applications may be limited by the resources available on both the ESX host and the Symmetrix VMAX.
[Figure 19 chart: Large Block (32K) Sequential Read I/O. Y-axis: Normalized IOPS. X-axis (IOPS Category): Iometer IOPS per VM, Iometer total IOPS throughput, ESXTOP IOPS, VMAX Front End IOPS. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 19. IOPS throughput for Large Block Sequential Read I/O

Large Block Sequential Read I/O with I/O Bursts - The results from the large block (64K) sequential read (100 percent sequential) I/O test with injected bursts are shown in Figure 20. Bursts are very common in VMware environments, and for these tests we simulated bursts of 10 I/Os and 100 I/Os from four VMs in the environment. The graphs show that tuning the I/O operation limit parameter yields significant improvements in measured I/O throughput. The I/O response times (not shown) were also generally improved by tuning the parameter. The results show that a value of 1 yields the best throughput and response times. However, it should be noted that the Symmetrix front-end utilization and the ESX host CPU utilization both increase when the parameter is tuned to a value of 1. In real customer environments, the increased utilization would translate to additional useful work performed by the applications running inside the virtual machines. For example, if a Microsoft SQL Server instance is hosted by the virtual machine, the additional I/O workload would be generated by the processing of additional client connections. Nonetheless, it is important to note that if the environment has many applications that are all running on the same ESX host and accessing the Symmetrix, the performance of all the applications may be limited by the resources available on both the ESX host and the Symmetrix VMAX.
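One way to picture why a lower I/O operation limit helps bursty sequential workloads is a toy model of Round Robin path switching. The sketch below is a deliberate simplification and not VMware's implementation: it assumes a single device, a hypothetical count of four active paths, and a switch to the next path after exactly the configured number of I/Os. The burst size of 100 I/Os matches one of the burst sizes used in the tests.

```python
def simulate_round_robin(num_ios, num_paths, io_operation_limit):
    """Return the number of I/Os sent down each path in a simplified model.

    The model moves to the next path after io_operation_limit I/Os have been
    issued on the current path, and wraps around after the last path.
    """
    per_path = [0] * num_paths
    current_path = 0
    sent_on_current_path = 0
    for _ in range(num_ios):
        if sent_on_current_path == io_operation_limit:
            current_path = (current_path + 1) % num_paths
            sent_on_current_path = 0
        per_path[current_path] += 1
        sent_on_current_path += 1
    return per_path


# A burst of 100 I/Os over 4 paths (the path count is an assumption).
for limit in (1000, 20, 1):
    print(limit, simulate_round_robin(num_ios=100, num_paths=4,
                                      io_operation_limit=limit))
# 1000 [100, 0, 0, 0]
# 20 [40, 20, 20, 20]
# 1 [25, 25, 25, 25]
```

In this model a burst shorter than the limit stays on a single path, whereas a limit of 1 spreads it evenly across all paths.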
[Figure 20 chart, upper panel: Large Block (64K) Sequential Read I/O with Bursts of Size 10. Y-axis: Normalized IOPS. X-axis (IOPS Category): Iometer IOPS per VM, Iometer total IOPS throughput, ESXTOP IOPS, VMAX Front End IOPS. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
[Figure 20 chart, lower panel: Large Block (64K) Sequential Read I/O with Bursts of Size 100. Same axes and series as the upper panel.]
Figure 20. IOPS throughput for Large Block Sequential Read I/O with Bursts

Large Block Sequential Write I/O - The results from the large block (64K) sequential write (100 percent sequential) I/O with injected bursts are shown in Figure 21. The graph shows that tuning the I/O operation limit parameter yields significant improvements in measured I/O throughput. The I/O response times, although not shown, were also improved by tuning the parameter. The results show that a value of 1 yields the best throughput and response times. However, it should be noted that the Symmetrix front-end utilization and the ESX host CPU utilization both increase when the parameter is tuned to a value of 1. In real customer environments, the increased utilization would translate to additional useful work performed by the applications running inside the virtual machines. For example, if a Microsoft SQL Server instance is hosted by the virtual machine, the additional I/O workload would be generated by the processing of additional client connections. Nonetheless, it is important to note that if the environment has many applications that are all running on the same ESX host and accessing the Symmetrix, the performance of all the applications may be limited by the resources available on both the ESX host and the Symmetrix VMAX.

[Figure 21 chart: Large Block (64K) Sequential Write I/O. Y-axis: Normalized IOPS. X-axis (IOPS Category): Iometer IOPS per VM, Iometer total IOPS throughput, ESXTOP IOPS, VMAX Front End IOPS. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 21. IOPS throughput for Large Block Sequential Write I/O

OLTP Type 2 Workload - This test is representative of an OLTP type 2 workload, which is the workload pattern generated by transaction processing systems running applications such as Oracle. The workload has the following composition:
• 65 percent random reads with an 8K block size
• 15 percent random writes with an 8K block size
• 10 percent sequential reads with a 64K block size
• 10 percent sequential writes with a 64K block size

This test mixes the sequence of the workloads above so that each VM generates the same workload but in a different order, and measures the performance differences when the Round Robin I/O operation limit parameter is adjusted. The results from the test are shown in Figure 22 and Figure 23. The results indicate that tuning the I/O operation limit parameter to a value of 1 yields the best response time as well as the best throughput. However, the improvement over the default value of 1000 is only marginal, and all results are within +/- 7 percent. This result is not surprising considering that the major component of the I/O workload in an OLTP application is random in nature.
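The statement that the OLTP type 2 workload is mostly random can be checked with simple arithmetic on the composition listed above. The sketch below only restates those four components; the variable names are assumptions made for this illustration.

```python
# (share of I/Os in percent, block size in KB, access pattern, operation),
# taken from the workload composition listed in the text.
OLTP2_MIX = [
    (65, 8, "random", "read"),
    (15, 8, "random", "write"),
    (10, 64, "sequential", "read"),
    (10, 64, "sequential", "write"),
]

# Share of I/O operations that are random, and share that are reads.
random_share = sum(pct for pct, _, pattern, _ in OLTP2_MIX if pattern == "random")
read_share = sum(pct for pct, _, _, op in OLTP2_MIX if op == "read")

# Average I/O size, weighting each block size by its share of operations.
average_block_kb = sum(pct * block_kb for pct, block_kb, _, _ in OLTP2_MIX) / 100

print(random_share)      # 80
print(read_share)        # 75
print(average_block_kb)  # 19.2
```

Counted by I/O operations, 80 percent of the mix is random, which is consistent with the OLTP results tracking the random workload results rather than the sequential ones.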
[Figure 22 chart: OLTP2 Type I/O Workload. Y-axis: Normalized IOPS. X-axis (IOPS Category): Iometer IOPS per VM, Iometer total IOPS throughput, ESXTOP IOPS, VMAX Front End IOPS. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 22. IOPS throughput for an OLTP type 2 I/O workload

[Figure 23 chart: OLTP2 I/O Response Times. Y-axis: Normalized Response Time. Series: NMP 1000, NMP 1, NMP 20, NMP 50, NMP 300.]
Figure 23. Response times for OLTP2 workloads
Conclusion

Performance tests show that VMware Native Multipathing with the Round Robin path selection policy provides improved performance for certain types of workloads when the I/O operation limit parameter is tuned. Although a low value of the I/O operation limit parameter yielded the best I/O throughput in every workload tested, the improvement is significant only for a few specific workloads. Moreover, tuning the parameter to lower values increases CPU utilization on the ESX host, the Symmetrix VMAX front-end directors, and the Symmetrix VMAX back-end directors.
Findings

The following results were determined using the stated test plan and methodology:

• Random workloads - In the case of random workloads (small and large block reads and writes, with or without I/O bursts), tuning the I/O operation limit parameter makes little difference in either I/O throughput or response times. However, setting the I/O operation limit parameter to 1 yields the best throughput and the best response time.

• Sequential workloads - Tuning the I/O operation limit parameter to various values between 1 and 1000 makes a significant difference in the case of sequential workloads. Setting the parameter to a value of 1 yields the best throughput and the best response time. However, when the parameter is set to 1 rather than the default value of 1000, the CPU utilization of the ESX host, as well as the utilization of the front-end and back-end directors of the Symmetrix VMAX system, increases. When tuning this parameter to a value of 1, users must be aware of this increase in utilization and the effects it may have on other applications running on the same ESX host and Symmetrix VMAX array.

• OLTP type 2 workloads - OLTP type 2 workloads are similar to random workloads in the sense that tuning the I/O operation limit parameter does not yield significant performance improvements. The findings are not surprising since a significant portion of the OLTP workload is random in nature.
Recommendations

These tests and results show that tuning the Round Robin I/O operation limit parameter can significantly improve the performance of sequential workloads. Customers with predominantly sequential workloads should therefore set the parameter to a small value, preferably 1. For sequential workloads, it should be noted that while setting the parameter to 1 yields the best IOPS throughput, it increases the CPU utilization on both the ESX host and the Symmetrix VMAX system, which can impact other applications.

Customers who have predominantly random and OLTP type 2 workloads in their environments should not change the I/O operation limit parameter from the default value, because the changes in throughput and response time at non-default values are insignificant.

Customers who have varied workloads that change with time, or who do not have a complete understanding of the workloads in their vSphere environments, should set the I/O operation limit parameter for the Round Robin policy to 1. This ensures the best possible performance independent of the instantaneous characteristics of the workload being generated from the vSphere systems.
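The recommendations above can be summarized as a small decision helper. This is a sketch only: the workload labels and function names are hypothetical, and the device identifier is a placeholder. The `esxcli` syntax it assembles is the form documented for ESXi 5.x and later; the vSphere 4.x releases current when this paper was written used a different command namespace, so the exact command should be checked against the documentation for the release in use. The helper only builds the argument list and does not execute anything.

```python
DEFAULT_LIMIT = 1000  # default I/O operation limit stated in the text


def recommended_io_operation_limit(workload):
    """Map a workload label to the limit recommended in the text.

    The labels are hypothetical names chosen for this sketch.
    """
    if workload in ("random", "oltp2"):
        return DEFAULT_LIMIT          # leave the default unchanged
    if workload in ("sequential", "varied", "unknown"):
        return 1                      # small value, preferably 1
    raise ValueError(f"unrecognized workload label: {workload!r}")


def build_set_limit_command(device_id, limit):
    """Build (but do not run) an esxcli argument list in ESXi 5.x+ syntax."""
    if limit < 1:
        raise ValueError("the I/O operation limit must be at least 1")
    return [
        "esxcli", "storage", "nmp", "psp", "roundrobin", "deviceconfig", "set",
        f"--device={device_id}",
        "--type=iops",
        f"--iops={limit}",
    ]


limit = recommended_io_operation_limit("sequential")
# "naa.EXAMPLE_DEVICE_ID" is a placeholder, not a real device identifier.
print(" ".join(build_set_limit_command("naa.EXAMPLE_DEVICE_ID", limit)))
```

Because the setting applies per device, the command would be repeated for each Symmetrix device that uses the Round Robin policy.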