Cloud optimization using IBM Power platform


Dipankar Sarma, Pradipta Kumar Banerjee, Vaidyanathan Srinivasan, Sudipta Biswas, Madhavan Srinivasan, Hemant Shaw
Linux Technology Center, IBM Systems
December 2015

© Copyright IBM Corporation, 2015

Table of contents
Abstract
Introduction
Architecture overview
Installation and setup details
    Test hardware
    Setup
        Firmware
        Linux kernel
        Performance Co-Pilot
        OpenStack
OpenStack workflow validation
Summary
Acknowledgment
Resources
About the authors
Trademarks and special notices


Abstract
IBM Power servers are designed for big data, provide superior cloud economics, and enable open innovation. This paper describes how to use IBM Power platform features to create innovative and differentiated cloud solutions. Specifically, it describes how OpenStack can make use of the platform's rich instrumentation capabilities to optimize cloud orchestration. If you are building a cloud infrastructure, you can use the information in this paper to build advanced monitoring and orchestration capabilities that provide customer value.

Introduction
Currently, workload provisioning and runtime optimization in OpenStack are based on the capacity of the resources (processor, memory, and disk) available on the compute nodes. For resource-intensive workloads, decisions based on capacity alone might hurt performance. For example, for a memory-intensive workload such as analytics, a compute node might have sufficient free memory capacity while its memory bus bandwidth is already saturated, which degrades the workload's performance. In other words, there are scenarios where the full processor capacity of a compute node in a cloud infrastructure cannot be used to run workloads optimally because of their resource usage characteristics.

New features in the hardware performance monitoring framework make rich instrumentation data continuously available. This makes it feasible to use that data in cloud management software for optimal workload provisioning and runtime optimization. Examples of such instrumentation data include memory bus bandwidth, I/O bus bandwidth, and processor execution efficiency (cycles per instruction). Traditionally, such data was available from the hardware, but retrieving it was invasive and constant monitoring affected workload behavior. Hence, the metrics were used only for debugging or workload performance analysis, not for quick decision making. Hardware features on the IBM® Power® platform, coupled with firmware and operating system enhancements, have opened new possibilities for using this data at the cloud orchestration layer for quick decision making.

The OpenPOWER platform enables open innovation by allowing business partners and users to customize low-level firmware and Linux® components to extract the platform data in any format suitable for cloud exploitation. As an example, this paper describes the changes required in various layers of the firmware and Linux. It also serves as a reference for further experimentation and research in this area. With this background, let us look at how one such hardware metric, host memory bandwidth, can be used with OpenStack.


Architecture overview
The following figure provides a high-level architectural overview of the components involved in making host memory bandwidth data available to an external consumer such as OpenStack.

Figure 1: Architecture overview diagram depicting usage of platform metrics in applications
(Diagram components: applications, OpenStack, and Ganglia on top of a local/remote hardware instrumentation API; Linux/KVM with Performance Co-Pilot and the hardware instrumentation framework; POWER8 cores, memory, disk I/O, network I/O, and Performance Monitor Counters. Uses shown: memory and I/O hotspot detection, and VM resource utilization for guest instances.)

The following figure shows a more detailed view in the context of KVM and OpenStack.

Figure 2: Architecture overview diagram depicting usage of platform metrics in OpenStack
(Diagram components: customer VMs (VM1, VM2) with vCPUs on the Linux/KVM hypervisor, which runs Performance Co-Pilot and the OpenStack node agents; hardware comprising POWER8 cores, memory, PCI I/O, accelerators, and Performance Monitoring Counters. Uses shown: memory and I/O hotspot detection, and VM resource utilization.)

Updates to the following components of the stack (bottom up) are required to enable the use of host memory bandwidth in OpenStack:

1) Firmware: Enhancements are required in the firmware and microcode layer to collect the hardware data and export it to the Linux OS. The firmware build for an OpenPOWER platform, such as the TYAN TN71-BP012 (refer: http://www.tyan.com/solutions/tyan_openpower_system.html), the Power® System S812LC (refer: ibm.com/systems/power/hardware/s812l-s822l/browse.html), or Rackspace Barreleye (refer: http://blog.rackspace.com/openpower-open-compute-barreleye/), consists of a PNOR image with various modules.


The following two modules required enhancements to extract platform data:
• IBM POWER8™ on-chip controller (OCC) firmware: changes are required to allow the on-chip controller to copy memory bandwidth information to host memory.
• Open Power Abstraction Layer (OPAL) firmware: this layer needs to provide memory mapping information to the Linux kernel.

2) Linux kernel: New device drivers are required in the Linux kernel to discover the new instrumentation data available in reserved memory and to export it to user space through the standard Linux perf framework.

3) Performance Co-Pilot (PCP): PCP can aggregate data from the Linux kernel using the perf API and export it in a consumable format. The kernel perf interfaces export individual counters, so middleware such as PCP is needed to aggregate the information and to provide Python bindings and an application programming interface (API) for consumption in OpenStack. The PCP enablement consists of two parts: libpfm and PCP.

4) OpenStack: Changes in the OpenStack layer are in two parts: (1) the Nova compute agent running on the node and (2) the scheduler and filter. Memory bandwidth is a host-level metric and is measured at the granularity of a nonuniform memory access (NUMA) node. OpenStack provides a scheduler filter (NUMATopologyFilter) to find the most appropriate NUMA node for guest deployments that require their resources, such as processor and memory, to always be local to a given NUMA node. Hence, filtering NUMA nodes based on their memory bandwidth consumption was a natural extension of this filter. The Nova compute agent on the node has been enhanced to discover the data available from PCP and export it to the OpenStack controller database. The OpenStack scheduler can then base its orchestration decisions on this data. A graphical depiction of how memory bandwidth is used during VM provisioning is shown in the following figure.


Figure 3: Depiction of OpenStack instance provisioning workflow using host memory bandwidth
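The bandwidth-aware filtering step can be illustrated with a small, self-contained Python sketch. It is not the code in the sudswas/nova repository: the NumaCell class, the pick_numa_cell function, the 0.8 utilization threshold, and all per-node numbers are hypothetical, and only the 216.0 maximum is taken from the hinv.mem_bw.max output shown later in this paper. The sketch first applies a capacity check and then drops NUMA nodes whose memory bandwidth is close to saturation, preferring the least loaded remaining node. It assumes m1.medium keeps its default size of 2 vCPUs and 4096 MB of memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NumaCell:
    """Hypothetical per-NUMA-node view available to the scheduler."""
    node_id: int
    free_vcpus: int
    free_memory_mb: int
    mem_bw_used: float  # current bandwidth consumption
    mem_bw_max: float   # maximum bandwidth, in the same units as mem_bw_used


def fits_capacity(cell: NumaCell, vcpus: int, memory_mb: int) -> bool:
    """Capacity-only check: enough free vCPUs and memory on this node."""
    return cell.free_vcpus >= vcpus and cell.free_memory_mb >= memory_mb


def bandwidth_ratio(cell: NumaCell) -> float:
    """Fraction of the node's maximum memory bandwidth currently in use."""
    if cell.mem_bw_max <= 0:
        return 1.0  # unknown maximum: treat the node as fully loaded
    return cell.mem_bw_used / cell.mem_bw_max


def pick_numa_cell(cells: List[NumaCell], vcpus: int, memory_mb: int,
                   max_bw_ratio: float = 0.8) -> Optional[NumaCell]:
    """Return the least bandwidth-loaded node that also passes the capacity check.

    Nodes above max_bw_ratio are rejected even if they have free capacity.
    """
    candidates = [c for c in cells
                  if fits_capacity(c, vcpus, memory_mb)
                  and bandwidth_ratio(c) <= max_bw_ratio]
    if not candidates:
        return None
    return min(candidates, key=bandwidth_ratio)


# Hypothetical host: node 0 has the most free memory but a nearly saturated bus.
cells = [
    NumaCell(0, free_vcpus=16, free_memory_mb=65536, mem_bw_used=200.0, mem_bw_max=216.0),
    NumaCell(16, free_vcpus=8, free_memory_mb=16384, mem_bw_used=40.0, mem_bw_max=216.0),
    NumaCell(17, free_vcpus=8, free_memory_mb=16384, mem_bw_used=60.0, mem_bw_max=216.0),
]

chosen = pick_numa_cell(cells, vcpus=2, memory_mb=4096)  # m1.medium default size
print(chosen.node_id if chosen else None)  # 16
```

Filtering on a threshold and then ranking by utilization keeps the capacity rule intact: a node is never chosen only because its memory bus is idle. Whether the actual filter uses a threshold, a ranking, or both is not described in this paper.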

Installation and setup details
This section gives an overview of the steps required to set up everything on an OpenPOWER server.

Test hardware
All these steps have been carried out on IBM POWER8 processor-based OpenPOWER servers: TYAN TN71-BP012 (refer: http://www.tyan.com/solutions/tyan_openpower_system.html) and Power® System S812LC (refer: ibm.com/systems/power/hardware/s812l-s822l/browse.html).


Setup
The following sections explain how to enable the experimental code changes at the various layers, build the end-to-end monitoring framework for host instrumentation data, and use it to optimize OpenStack provisioning. A simple automation script to build all the components is available at https://github.com/sudswas/membw-automation-scripts

Note that these are experimental enablements intended as a proof of concept and to gather feedback on the design elements. Upstreaming of the changes to the various open source components is ongoing.

Firmware
A modified version of the op-build script, available at https://github.com/maddy-kerneldev/op-build.git, includes the following changes for building a new firmware PNOR image (host firmware):
1. OCC firmware changes
2. Hostboot firmware changes
3. OPAL firmware changes
Building from this op-build repository automatically pulls in the respective changes that are required to make host memory bandwidth data available to Linux. Detailed instructions to build and use OpenPOWER firmware are available at: http://jk.ozlabs.org/blog/post/159/customising-openpower-firmware/

Linux kernel
The Linux kernel needs enhancements in the perf driver to retrieve the platform instrumentation data. The perf changes against the v4.3 kernel tree are hosted at https://github.com/maddy-kerneldev/linux.git. You can use your current Linux distribution's configuration file and build the changes hosted in this kernel tree.
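After booting the patched kernel, one way to confirm that events are exported through perf is to inspect the perf sysfs tree, where each PMU under /sys/bus/event_source/devices may publish an events/ directory with one file per named event. The Python sketch below assumes only that standard layout; the PMU and event names registered by this particular kernel branch are not stated in this paper, so the hypothetical find_pmu_events helper simply searches for a case-insensitive substring such as "mem".

```python
import os
from typing import Dict, List

PERF_SYSFS_ROOT = "/sys/bus/event_source/devices"


def find_pmu_events(pattern: str, root: str = PERF_SYSFS_ROOT) -> Dict[str, List[str]]:
    """Map each PMU under root to the names of its events containing pattern.

    Matching is a case-insensitive substring test on file names in <pmu>/events/.
    Returns an empty dict if root does not exist (for example, on a non-Linux host).
    """
    matches: Dict[str, List[str]] = {}
    if not os.path.isdir(root):
        return matches
    needle = pattern.lower()
    for pmu in sorted(os.listdir(root)):
        events_dir = os.path.join(root, pmu, "events")
        if not os.path.isdir(events_dir):
            continue  # not every PMU publishes named events
        names = sorted(n for n in os.listdir(events_dir) if needle in n.lower())
        if names:
            matches[pmu] = names
    return matches


if __name__ == "__main__":
    for pmu, events in find_pmu_events("mem").items():
        print(f"{pmu}: {', '.join(events)}")
```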

Performance Co-Pilot
The libpfm changes required for exposing host memory bandwidth metrics are available at git://git.code.sf.net/u/hkshaw1990/perfmon2. After a successful build and installation, you can use the following command to validate whether host memory bandwidth data is available through libpfm. For example:

    # ./check_events | grep MEM
    [198, bandwidth, "POWERPC_NEST_MEM_BW"]
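When automating this validation, the check_events output can be checked programmatically. The following Python sketch assumes that each event line has exactly the form shown above ([index, group, "NAME"]), which may differ from the output of an unmodified libpfm build; the parse_event_lines and has_event helpers are hypothetical.

```python
import re
from typing import List, Tuple

# One event per line, e.g.: [198, bandwidth, "POWERPC_NEST_MEM_BW"]
EVENT_LINE = re.compile(r'^\[\s*(\d+)\s*,\s*([^,\s]+)\s*,\s*"([^"]+)"\s*\]$')


def parse_event_lines(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, group, event_name) for every line matching EVENT_LINE."""
    events = []
    for line in text.splitlines():
        match = EVENT_LINE.match(line.strip())
        if match:
            events.append((int(match.group(1)), match.group(2), match.group(3)))
    return events


def has_event(text: str, name: str) -> bool:
    """True if an event with exactly this name appears in the output."""
    return any(event_name == name for _, _, event_name in parse_event_lines(text))


sample = '[198, bandwidth, "POWERPC_NEST_MEM_BW"]'
print(has_event(sample, "POWERPC_NEST_MEM_BW"))  # True
```

In a build script, the captured output of ./check_events could be passed to has_event instead of the hard-coded sample line.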


The PCP changes for exposing host memory bandwidth to external consumers are available at https://github.com/hkshaw1990/pcp.git. After building and installing PCP, perform the following steps to verify that host memory bandwidth is reported by PCP.

1. Start the PCP services using the following command:

    # service pcp restart

2. Run the pminfo command to check the memory bandwidth-related counters:

    # pminfo | grep mem_bw
    hinv.mem_bw.max
    # pminfo | grep MEM
    perfevent.hwcounters.bandwidth__MEM_BW.dutycycle
    perfevent.hwcounters.bandwidth__MEM_BW.value

3. Run the pmval command to see the values:

    # pmval perfevent.hwcounters.bandwidth__MEM_BW.value
    metric:    perfevent.hwcounters.bandwidth__MEM_BW.value
    host:      hab01.in.ibm.com
    semantics: cumulative counter (converting to rate)
    units:     count (converting to count / sec)
    samples:   all

                  cpu0
                   0.0
                   4.3
                  10.2

    # pmval hinv.mem_bw.max
    metric:    hinv.mem_bw.max
    host:      hab01.in.ibm.com
    semantics: instantaneous value
    units:     none
    samples:   all

                 node0
                 216.0
                 216.0
                 216.0
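The pmval output above reports the bandwidth metric as a cumulative counter that pmval converts to a rate. That conversion is the counter delta divided by the elapsed time between two samples, as in the following Python sketch. The counter_rates helper and the sample values are hypothetical, and treating a decreasing counter as a reset that yields 0.0 is a simplifying assumption of this sketch.

```python
from typing import List, Sequence, Tuple


def counter_rates(samples: Sequence[Tuple[float, float]]) -> List[float]:
    """Convert (timestamp_seconds, cumulative_value) samples into per-second rates.

    Each rate is (value delta) / (time delta) between consecutive samples.
    A decreasing value is treated as a counter reset and reported as 0.0
    (a simplification made by this sketch).
    """
    rates: List[float] = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append(max(v1 - v0, 0) / elapsed)
    return rates


# Hypothetical one-second samples of a cumulative bandwidth counter.
samples = [(0.0, 1000), (1.0, 1000), (2.0, 1043), (3.0, 1145)]
print(counter_rates(samples))  # [0.0, 43.0, 102.0]
```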


OpenStack
The minimum OpenStack version required to use host memory bandwidth is Liberty. The following steps assume a working OpenStack setup with POWER8 compute nodes. Changes are required on both the OpenStack controller and the compute nodes; they are available at https://github.com/sudswas/nova.git. The compute-side changes introduce a new configuration element in the nova.conf file. Add the following line to the nova.conf file:

    compute_monitors=numa_mem_bw.virt_driver

The controller-side changes are related to nova-scheduler. The existing nova-scheduler NUMATopologyFilter is updated to use host memory bandwidth to identify target compute nodes for instance creation.
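To check that a compute node is configured for the new monitor, the compute_monitors setting can be read back from nova.conf. The Python sketch below assumes that the option sits in the [DEFAULT] section and that each comma-separated entry has the form <namespace>.<plugin>, as in numa_mem_bw.virt_driver; the load_compute_monitors helper is hypothetical and is not part of Nova.

```python
import configparser
from typing import List, Tuple


def load_compute_monitors(conf_text: str) -> List[Tuple[str, str]]:
    """Return (namespace, plugin) pairs listed in [DEFAULT] compute_monitors."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    raw = parser.get("DEFAULT", "compute_monitors", fallback="")
    monitors: List[Tuple[str, str]] = []
    for entry in (item.strip() for item in raw.split(",")):
        if not entry:
            continue
        namespace, sep, plugin = entry.partition(".")
        if not sep or not namespace or not plugin:
            raise ValueError(f"expected <namespace>.<plugin>, got {entry!r}")
        monitors.append((namespace, plugin))
    return monitors


conf = """
[DEFAULT]
compute_monitors=numa_mem_bw.virt_driver
"""
print(load_compute_monitors(conf))  # [('numa_mem_bw', 'virt_driver')]
```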

OpenStack workflow validation
Perform the following steps for OpenStack workflow validation.
1. Create a flavor specifying the NUMA topology. In this case, the m1.medium flavor was edited to have the extra specification hw:numa_nodes=1.

Figure 4: OpenStack flavor with NUMA specification

The following figure shows the compute node table in the Nova database. The current memory bandwidth values show that bandwidth consumption is lower on NUMA nodes 16 and 17.

Figure 5: OpenStack Nova DB showing memory bandwidth values


2. Launch a VM from the OpenStack UI using the m1.medium flavor.

Figure 6: OpenStack instance creation request page

The following figure shows the list of CPUs belonging to the NUMA nodes on the host, highlighting the list of CPUs for node 16.


Figure 7: NUMA topology of the host

The following figure illustrates that the newly created VM is bound to NUMA node 16.

Figure 8: OpenStack instance description file
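The check illustrated by the last two figures can also be scripted. The following Python sketch assumes that the instance description is a libvirt domain XML document in which guest memory placement appears as a numatune/memory element with a nodeset attribute, and vCPU placement as a cpuset attribute on the vcpu or cputune/vcpupin elements. On the host, a node's CPU list can be read from /sys/devices/system/node/node16/cpulist. The helper names, the sample XML, and the CPU range 128-159 are hypothetical, and the parser does not handle libvirt's ^ exclusion syntax.

```python
import xml.etree.ElementTree as ET
from typing import Set


def parse_cpulist(spec: str) -> Set[int]:
    """Parse a cpulist/nodeset string such as "0-3,8,10-11" into a set of ints."""
    result: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            result.update(range(int(low), int(high) + 1))
        else:
            result.add(int(part))
    return result


def instance_on_node(domain_xml: str, node_id: int, node_cpus: str) -> bool:
    """True if guest memory is bound only to node_id and every vCPU cpuset
    lies within node_cpus (the host's cpulist for that node)."""
    root = ET.fromstring(domain_xml)
    memory = root.find("./numatune/memory")
    if memory is None or parse_cpulist(memory.get("nodeset", "")) != {node_id}:
        return False
    allowed = parse_cpulist(node_cpus)
    elements = [root.find("./vcpu")] + root.findall("./cputune/vcpupin")
    cpusets = [e.get("cpuset") for e in elements if e is not None and e.get("cpuset")]
    return bool(cpusets) and all(parse_cpulist(s) <= allowed for s in cpusets)


# Hypothetical excerpt of an instance description (libvirt domain XML).
SAMPLE_XML = """<domain type="kvm">
  <vcpu placement="static" cpuset="128-135">2</vcpu>
  <numatune>
    <memory mode="strict" nodeset="16"/>
  </numatune>
</domain>"""

print(instance_on_node(SAMPLE_XML, 16, "128-159"))  # True
```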


Summary
This paper demonstrated the end-to-end use of host platform instrumentation data (for example, memory bandwidth) to optimize instance provisioning in an OpenStack environment. We hope this gives you an idea of the numerous possibilities of using host platform instrumentation data for cloud optimization. In the future, this technology is expected to be available by default in Linux distributions running on IBM Power Systems servers.

Acknowledgment
This paper would not have been possible without the guidance and involvement of a larger group of people. The team would like to thank the following people for their guidance and help in making this possible:

• Balaram Sinharoy, Power Systems Technology, IBM
• Steve Fields, Chief Engineer of Power Systems, IBM
• Ronald Kala, DE, Processor Development
• Todd Rosedahl, Chief Energy Management Engineer on POWER
• Ananth M G, RAS Architect, Linux Technology Center
• Aneesh KV, Linux and KVM on Power developer, Linux Technology Center
• Sudipto Ghosh, Project Manager, Linux Technology Center
• Shilpasri Bhat, Linux kernel developer, Linux Technology Center
• Shivaprasad Bhat, Libvirt developer, Linux Technology Center


Resources
The following websites provide useful references to supplement the information contained in this paper:
• IBM Systems on PartnerWorld: ibm.com/partnerworld/systems
• IBM Power Systems Information Center: http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp
• IBM Redbooks: ibm.com/redbooks
• IBM Publications Center: www.elink.ibmlink.ibm.com/public/applications/publications/cgibin/pbi.cgi?CTY=US
• Customizing OpenPOWER firmware: http://jk.ozlabs.org/blog/post/159/customising-openpower-firmware/
• Performance Co-Pilot: http://www.pcp.io/
• OpenStack: https://www.openstack.org/
• OpenPOWER overview: http://openpowerfoundation.org/
• Open Power Abstraction Layer (OPAL): https://github.com/open-power


About the authors
Dipankar Sarma is a Distinguished Engineer in the Linux Technology Center, IBM Systems Group. You can reach him at [email protected]

Pradipta Kumar Banerjee is the Power Cloud and Docker architect in the Linux Technology Center, IBM Systems Group. You can reach him at [email protected]

Vaidyanathan Srinivasan is an OPAL developer and platform architect in the Linux Technology Center, IBM Systems Group. You can reach him at [email protected]

Sudipta Biswas is an OpenStack developer in the Linux Technology Center, IBM Systems Group, focusing on OpenStack Compute (Nova). He is responsible for upstreaming the OpenStack changes required for host metric exploitation. You can reach him at [email protected]

Madhavan Srinivasan is a kernel developer in the Linux Technology Center, IBM Systems Group, focusing on platform bring-up. He is responsible for upstreaming the kernel and firmware changes. You can reach him at [email protected]

Hemanth Shaw is a kernel developer in the Linux Technology Center, IBM Systems Group, focusing on RAS features. He is responsible for upstreaming the PCP code changes. You can reach him at [email protected]


Trademarks and special notices
© Copyright IBM Corporation 2015. References in this document to IBM products or services do not imply that IBM intends to make them available in every country. IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" www.ibm.com/legal/copytrade.shtml. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others. Information is provided "AS IS" without warranty of any kind. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local IBM office or IBM authorized reseller for the full text of the specific Statement of Direction. Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. Photographs shown are of engineering prototypes. Changes may be incorporated in production models.


Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.
