arXiv:1710.01794v1 [quant-ph] 4 Oct 2017
Extreme-Scale Programming Model for Quantum Acceleration within High Performance Computing

Alexander J. McCaskey(1,2), Eugene F. Dumitrescu(1,3), Dmitry Liakh(1,5), Mengsu Chen(6,7), Wu-chun Feng(7), Travis S. Humble(1,3,4)

(1) Quantum Computing Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
(2) Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
(3) Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
(4) Bredesen Center for Interdisciplinary Research, University of Tennessee, Knoxville, Tennessee, USA
(5) Oak Ridge Leadership Computing Facility, Scientific Computing, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
(6) Department of Physics, Virginia Tech, Blacksburg, VA, USA
(7) Department of Computer Science, Virginia Tech, Blacksburg, VA, USA

E-mail: [email protected]

Original: 4 October 2017

Abstract. Heterogeneous high-performance computing (HPC) systems offer novel architectures that accommodate specialized processors for accelerating specific workloads. Near-term quantum computing technologies are poised to benefit applications as wide-ranging as quantum chemistry, machine learning, and optimization. A novel approach to scaling these applications with future heterogeneous HPC is to enhance conventional computing systems with quantum processor accelerators. We present the eXtreme-scale ACCelerator programming model (XACC) to enable near-term quantum acceleration within existing conventional HPC applications and workflows. We design and demonstrate the XACC programming model within the C++ language by following a coprocessor machine model akin to the design of OpenCL or CUDA for GPUs, while taking into account the subtleties and complexities inherent to the interplay between conventional and quantum processing hardware. The XACC framework provides a high-level API that enables applications to offload computational work, represented as quantum kernels, for execution on an attached quantum accelerator. Our approach is agnostic to both the quantum programming language and the quantum processor hardware, which enables quantum programs to be ported across multiple processors for benchmarking, verification and validation, and performance studies. These targets include a set of virtual numerical simulators as well as actual quantum processing units. The XACC programming model and its reference implementation may serve as a foundation for future HPC-ready applications, data structures, and libraries using conventional-quantum hybrid computing.
Keywords: Quantum Computing, Quantum Programming, Software Engineering, High Performance Computing, Accelerated Computing
1. Introduction

High-performance computing (HPC) architectures continue to make strides in the use of specialized devices as computational accelerators, and future HPC designs are expected to take advantage of extreme-scale heterogeneity. Among many potential accelerators, a quantum processing unit (QPU) represents a unique type of coprocessing device that leverages the information-theoretic principles of quantum physics. Recently, there have been several advances in the development of prototype QPUs, including a quantum annealing processor from D-Wave [1] and a 16-qubit gate-model chip from IBM [2], and a growing number of small-scale QPUs are expected to come online within the next five years. The emergence of these devices raises the question of how they may be used within conventional computing environments. The complex infrastructure for early QPUs is likely to limit their usage to state-of-the-art HPC systems [3], where unique algorithms that take advantage of both conventional and quantum computing concepts may be realized [4]. For example, this hybrid computing paradigm is poised to broadly benefit scientific applications that are ubiquitous within research fields such as modeling and simulation of quantum many-body systems, applied numerical mathematics, and data analytics [5].

Developing methods for integrating quantum algorithms into conventional programming models is an outstanding technical challenge. While several efforts in quantum programming have focused on formal language development, embedded domain-specific languages [6], integrated development environments [7], and low-level compiler tools [8], there has yet to be a concerted effort to provide a standard and general-purpose mechanism for integrating quantum programs into the sophisticated and complex HPC environments that are the staple of modern scientific computing. This lack of software infrastructure and absence of open-source standards represents a major roadblock to programming future quantum-enhanced computing systems. An open, high-level standard for hybrid programming and program execution could enable and facilitate broader adoption of quantum computing by domain computational scientists.

The problem of generalizing HPC programming paradigms to include new computational accelerators is not without precedent. The integration of GPUs into HPC systems was also a challenge for many large-scale scientific applications because of the fundamentally different way programmers interact with the hardware. Hardware-specific solutions provide language extensions, like the CUDA library from NVIDIA [9], that enable programming natively in the local dialect. Hardware-agnostic solutions, like the Open Computing Language (OpenCL), define a hybrid programming specification for offloading work to attached accelerators (GPUs, MIC, FPGA, etc.) in a manner that masks or abstracts the underlying hardware type [10]. These hardware-agnostic approaches have proved useful because they retain a wide degree of flexibility for the programmer by automating those aspects of compilation that are overly complex. Despite these prior efforts, we anticipate that programming models for QPUs will be even more challenging due to the radically different physical features and behaviors brought by quantum computing, as well as the wide variety of hardware types (superconducting, ion traps, adiabatic superconducting, etc.). Therefore, it is necessary to program quantum computers in a hardware-agnostic manner.
Extending the GPU analogy, hardware-specific solutions like CUDA are unlikely to make sense in the near term for enabling quantum acceleration, because a better technology could easily be introduced and force a rewrite of the quantum-accelerated code.
Existing approaches for interfacing domain computational scientists with quantum acceleration have progressed extensively over the last few years. A number of quantum programming languages have been developed, and novel development efforts are currently under way to provide high-level mechanisms for writing, compiling, and executing quantum code. State-of-the-art approaches provide embedded domain-specific languages for quantum program expression. Efforts from Rigetti [11], Microsoft [12], Google [13], and IBM [14] have followed this strategy and currently offer open-source Python libraries for quantum programming alongside existing code, with some providing built-in extensibility for QPU or simulation backends. These efforts are great steps in the development of modern quantum computer science. An unintended downside, however, is that each programming framework or approach must supply its own implementations of the hardware backends that interpret and execute the programming and compilation result. This lack of programming integration is an impediment to large-scale adoption of quantum computing by domain computational scientists, as it forces the use of one method or programming framework when others may prove better for the problem at hand.

We present a specification for a programming model that integrates QPU devices into conventional computing. We describe the eXtreme-scale ACCelerator (XACC) programming model as a general-purpose and open programming model and software framework for accelerating conventional high-performance computing with quantum acceleration in a manner that promotes quantum language (embedded or stand-alone) and hardware (gate or adiabatic model) interoperability. This programming model targets C++ and the C++11 standard, with extensions to other languages achieved through appropriate language bindings. XACC defines interfaces and abstractions that enable hybrid conventional-quantum programming at a level that is accessible to domain computational scientists. The model borrows concepts from CPU-GPU heterogeneous programming and enables language interoperability in a manner similar to the LLVM compiler framework. XACC interfaces provide extensible mechanisms for describing quantum kernels, compilers, programs, and accelerators in a manner that is agnostic to the quantum language and/or hardware.

The remainder of this paper is organized as follows: first, we detail XACC's hybrid conventional-quantum platform and memory models. Then we outline the specifics of the programming model: a detailed description of the interfaces and concepts that enable the execution of quantum programs alongside existing computational workflows. Finally, we demonstrate the overall workflow and utility of XACC through an open-source reference implementation of the XACC interfaces and concepts, Eclipse XACC.

2. Extreme-Scale Acceleration

We discuss concepts that are unique to an HPC system with quantum accelerators in terms of the platform and memory models. The platform model describes the hardware components used by the hybrid application and how these components behave in relation to one another, while the memory model details program management and the movement of data between these components. These model abstractions drive the design and implementation of the XACC programming model, which specifies an API to offload computations to an attached quantum accelerator.
2.1. Hybrid Computing Platform and Memory Model

Current and planned near-term quantum computing technologies provide a relatively small number of quantum register elements and lack sufficient error correction to sustain fault-tolerant operations. However, these pre-threshold devices demonstrate sufficient hardware control to support programmable sequences of operations that may be used as primitive quantum accelerators within an existing computing context [5]. Only a few of these early QPUs are publicly available, and almost all are located remotely from the application programmer and end user. These limitations impact how hardware components interact with each other and how the programmer may view the system. We handle these considerations within the XACC platform model by using a client-server model [15] in which programmers of the conventional computing system are on the client side and the quantum accelerator system is on the server side.

Figure 1: Graphical representation of the XACC platform model.

As shown in Fig. 1, XACC defines three non-trivial hardware actors in this model: (1) the Host CPU, (2) the Device CPU, and (3) the quantum Accelerator itself. The Host CPU drives the interaction of applications with an attached quantum Accelerator. Hybrid applications are written and compiled on the Host CPU, and execution of embedded, compiled quantum subroutines or functions is pushed to the Device CPU through a remote service invocation (an HTTP POST, for example). The results of that execution are returned to the Host CPU system via another remote invocation (an HTTP GET, for example). The role of the Device CPU is to listen for execution requests and then drive the execution of the compiled quantum program using vendor-supplied Accelerator APIs. It also keeps track of a bit register that is populated with qubit measurement results. These bits can then be returned to the Host CPU, post-processed, and used as input to the rest of the computation.

The cardinality of these hardware actors is of particular interest to the XACC platform model. The XACC programming model can be leveraged for both serial computing and massively parallel, distributed high-performance computing. Therefore, the cardinality of the Host CPU is 1..* (one to many), as shown in Figure 1: a given hybrid execution may involve one or many Host CPUs, corresponding to the many cores available in a given HPC application. Furthermore, one could imagine enhancing a computation with many quantum Accelerators as well.
In the near term this will be unlikely, but future systems may have a number of quantum Accelerators available to the compute cores, as we do with GPUs in existing heterogeneous computing. Therefore, the cardinality of Device CPUs and quantum Accelerators is also 1..* (one to many): XACC allows multiple cores to have access to multiple quantum Accelerators.

The XACC memory model builds on the platform model to ensure that client-side applications that offload work to an attached quantum accelerator can retrieve the results. As discussed in the XACC platform model, it is the responsibility of the Device CPU to keep track of all qubit measurement results and store them in a bit array. Since the Accelerator system may be remote, there must be some remote synchronization mechanism to ensure that these results are available to the application at runtime. Two other constraints drive the design of the XACC memory model: (1) the probabilistic nature of quantum computing and (2) the lack of quantum hardware fast-feedback technology. The probabilistic nature of physical QPUs implies that the XACC memory model must take into account the need to gather an ensemble of qubit measurement results. This ensemble must be available to hybrid application programmers for the easy computation of the statistics that provide a quantum program result. Fast feedback is the ability of the QPU to execute, halt, and pass control back to the CPU, which may then execute more quantum code on the current QPU state. Near-term QPUs are not expected to support this because of their limited coherence times. Therefore, our memory model must take this into account and only allow qubit measurement requests within quantum programs. XACC assumes that the QPU is reset to a hardware-specified default state before each execution.

We address these constraints within the XACC memory model by defining an AcceleratorBuffer concept on the Host CPU, or client side, of the platform model. This abstraction models a register of qubits that is available to the programmer, and it masks the complexity inherent to the QPU's remote location. The AcceleratorBuffer is responsible for storing the ensemble of qubit measurement results. The data it contains may be leveraged by programmers to compute expectation values and other statistical quantities of interest, as sketched below.
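As a concrete illustration of how a client might consume this ensemble, the following C++ sketch estimates a single-qubit Z expectation value from a list of measured bit strings. It assumes the measurement ensemble has already been copied out of an AcceleratorBuffer into a std::vector of strings; the accessor used to do that is omitted because its exact name is implementation-specific.

    // Sketch: estimating <Z> on qubit 0 from an ensemble of measurement
    // bit strings pulled from an AcceleratorBuffer. Assumes bit 0 of each
    // string is the measurement of qubit 0.
    #include <string>
    #include <vector>

    double expectationZ0(const std::vector<std::string>& measurements) {
      if (measurements.empty()) return 0.0;
      double sum = 0.0;
      for (const auto& bits : measurements) {
        // '0' outcome contributes +1, '1' outcome contributes -1.
        sum += (bits[0] == '0') ? 1.0 : -1.0;
      }
      return sum / static_cast<double>(measurements.size());
    }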
2.2. Hybrid Programming Model

The XACC programming model is designed to enable the expression of quantum algorithms alongside existing code in a quantum language-agnostic manner. Furthermore, the compiled result of the expressed quantum algorithm is designed to be amenable to execution on any quantum hardware with appropriately implemented device-driver wrappers. To achieve this, XACC defines six main concepts: quantum kernels, an accelerator intermediate representation, transformations on the intermediate representation, compilers, accelerators, and programs. These concepts enable an expressive API for offloading quantum algorithm execution to an attached quantum accelerator.

2.2.1. Quantum Kernels XACC requires that clients express code intended for quantum acceleration in a manner similar to CUDA or OpenCL for GPU acceleration: code must be expressed as stand-alone quantum kernels. Figure 2 describes the requirements for a kernel to be valid in XACC. At its core, an XACC quantum kernel is represented by a function in C++. This function must take as its first argument the AcceleratorBuffer instance representing the qubit register that the kernel operates on. A quantum kernel, therefore, is a programmatic representation of the unitary operations applied to a quantum register of qubits, e.g., represented by a quantum circuit, or described as a quantum annealing process. Furthermore, in XACC, quantum kernels do not specify a return type; all information about the results of a quantum kernel's operation is gathered from the AcceleratorBuffer's ensemble of qubit measurements (see Section 2.1). Quantum kernels in XACC are differentiated from conventional library calls using the qpu keyword. A minimal sketch of this kernel form is shown below.
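The following hedged sketch shows the general shape of such a kernel; the gate calls in the body assume a Scaffold-style embedded language (see Section 3.1.2), and both the gate names and the kernel name are illustrative rather than prescribed by the specification.

    // Hedged sketch of an XACC quantum kernel: a C++ function marked with
    // the qpu keyword, returning nothing, taking the AcceleratorBuffer
    // (qubit register) first and any runtime parameters after it. The gate
    // calls assume a Scaffold-like kernel language.
    qpu void ansatz(AcceleratorBuffer& qreg, double theta) {
       X(qreg[0]);             // prepare a reference state
       Ry(qreg[1], theta);     // parameterized rotation, bound at runtime
       CNOT(qreg[1], qreg[0]);
       Measure(qreg[0], 0);    // results populate the buffer's measurement ensemble
    }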
Figure 2: XACC requirements for quantum kernels.

The function body of an XACC quantum kernel may be expressed in any available quantum programming language, where an available language is one for which the XACC implementation provides a valid Compiler implementation (see Section 2.2.3). Finally, quantum kernels may take any number of kernel arguments that drive the overall execution of the quantum code. This enables parameterized quantum circuits that may be evaluated at runtime.

2.2.2. Quantum Intermediate Representation In order to promote interoperability and programmability across the wide range of available QPU types and quantum programming languages (embedded or stand-alone), there must be some common, standard low-level program representation that is simple to understand and manipulate. An example of this in the conventional computing world is LLVM, a compiler infrastructure that maps a programming language to an intermediate representation that can be used to perform hardware-dependent and hardware-independent optimizations and to generate native, hardware-specific executable code [16]. This representation enables efficient language and hardware interoperability. Similarly for quantum computing, with the variety of available QPU types (superconducting, ion trap, adiabatic, etc.) and quantum programming languages (Scaffold, QCL, Quipper, Quil, etc.), there is a strong need for a standard low-level intermediate representation that serves as the glue between languages and hardware (see Figure 3). A standard in this regard would enable a wide range of quantum programming tools and provide early users the benefit of programming their domain-specific algorithms in a manner that best suits their research and application. It will also enable the execution of those programmed
algorithms on a number of differing hardware types.
Figure 3: Diagram describing the XACC programming model layered architecture, as well as its language and hardware interoperability.

XACC defines a novel intermediate representation (IR) infrastructure that promotes the overall integration of existing programming techniques and hardware realizations. The IR specification provides four main forms for use by clients: (1) an in-memory representation and API, (2) an on-disk, persistent representation, (3) a human-readable quantum assembly representation, and (4) a graph representation (for example, a quantum circuit or tensor network for gate model computing, and an Ising Hamiltonian graph for quantum annealing). This specification enables efficient analysis and isomorphic transformation of quantum kernel code and provides a common representation for executing code written in any quantum language on any available quantum hardware, given constraints on the model of quantum computing being leveraged; XACC does not enable execution across quantum computing models. The specification for the IR infrastructure interfaces is shown in Figure 4 using the Unified Modeling Language (UML).

The foundation of the XACC IR specification is the Instruction interface, which abstracts the concept of an executable instruction for an attached Accelerator. Instructions have a unique name and a reference to the accelerator bits that they operate on. Instructions can operate on one or many accelerator bits and can be in an enabled or disabled state to aid in the definition of conditional branching. Instructions can also be parameterized: each Instruction may optionally keep track of one or many InstructionParameters, which are essentially a variant data structure that can be of type float, double, int, string, or complex. XACC defines a Function interface to express kernels as compositions of instructions. The Function interface is a derivation of the Instruction interface that itself contains Instructions; the Instruction/Function combination models the familiar composite design pattern [17]. In XACC, kernels to be executed on an attached accelerator are modeled as an n-ary tree with Function references as nodes and Instruction references as leaves.

XACC defines a container for Functions as the XACC IR interface. This interface provides an abstraction to model a list of compiled Functions, with the ability to map those Functions to both an assembly-like, human-readable string and
a graph data structure. For the case of gate model quantum acceleration, the graph models the quantum circuit and provides a convenient data structure for isomorphic transformations and analysis. For quantum annealing, the graph structure can model the Ising Hamiltonian parameters that form the machine-level instructions for the quantum computation. The structures just described give us three of the four forms that the XACC intermediate representation provides: the in-memory data structure representation and API, the human-readable assembly representation, and the graph representation. To provide an on-disk representation, the IR interface exposes load and persist methods that take a file path to read from and write to, respectively. In this way, IR instances that are generated from a given set of kernels can be persisted and reused, enabling faster just-in-time compilation.

Figure 4: Architecture for the XACC IR interfaces.

Due to the tree-like nature of the XACC IR infrastructure, where each node can be one of many different types (subclasses of Function and Instruction, for example Hadamard, CNOT, etc., for gate model computing), there is a need for an extensible mechanism for walking this tree and performing subclass-specific tasks at each node. XACC defines an InstructionVisitor for this purpose, which models the familiar visitor pattern [17]. To implement this design pattern, the InstructionVisitor provides a visit method for each exposed Instruction subtype, and each Instruction implements an accept method that takes as input an InstructionVisitor instance. Since each accept method has type-specific information about the Instruction, the correct visit method is invoked on the InstructionVisitor, and therefore type-specific routines are run for the given Instruction.

The XACC IR interfaces provide a unique way to describe quantum algorithms through an in-memory, object model representation. Building on this idea, XACC defines an AlgorithmGenerator interface, which enables the generation of XACC Function implementations in an extensible manner. This interface provides a generate method, implemented by subclasses, that takes as input the bits the algorithm should operate on and produces a Function representation of the algorithm. This provides a unique way to expose common quantum algorithmic primitives, such as quantum Fourier transforms and phase estimation, to XACC programmers. Specifically, this generator service can be leveraged at compile time to search for common algorithm subroutine invocations and replace them with IR Function instances that can be executed by XACC Accelerators.
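To make the composite and visitor concepts concrete, here is a heavily simplified C++ sketch of the Instruction/Function/InstructionVisitor relationship; the class shapes follow the description above, but the member names and signatures are illustrative and not the exact Eclipse XACC API.

    // Simplified sketch of the XACC IR composite and visitor concepts.
    #include <iostream>
    #include <memory>
    #include <string>
    #include <utility>
    #include <vector>

    struct Hadamard;                        // concrete gate, defined below
    struct InstructionVisitor {             // one visit() overload per subtype
      virtual void visit(Hadamard&) = 0;
      virtual ~InstructionVisitor() = default;
    };

    struct Instruction {                    // named operation on accelerator bits
      virtual std::string name() const = 0;
      virtual void accept(InstructionVisitor&) = 0;
      virtual ~Instruction() = default;
    };

    struct Hadamard : Instruction {         // a leaf of the IR tree
      int qubit;
      explicit Hadamard(int q) : qubit(q) {}
      std::string name() const override { return "H"; }
      void accept(InstructionVisitor& v) override { v.visit(*this); }
    };

    struct Function : Instruction {         // composite: Instructions of Instructions
      std::string fname;
      std::vector<std::shared_ptr<Instruction>> children;
      explicit Function(std::string n) : fname(std::move(n)) {}
      std::string name() const override { return fname; }
      void accept(InstructionVisitor& v) override {
        for (auto& child : children) child->accept(v);  // walk the n-ary tree
      }
    };

    struct Printer : InstructionVisitor {   // type-specific work per node
      void visit(Hadamard& h) override { std::cout << "H " << h.qubit << "\n"; }
    };

A Compiler populates such a Function tree from kernel source code, and visitors of this style are the hook used later (Section 3.1.3) to map IR to hardware-specific assembly strings.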
2.2.3. Quantum Language Compilers To provide extensibility in quantum programming languages (QPLs), XACC describes an interface for QPL compilers, simply called the Compiler interface. At its core, this interface provides a compile method that subclasses implement to take quantum kernel source code as input and produce a valid instance of the XACC IR. Derived Compilers are free to perform quantum compilation in any way they see fit, as long as they return a valid IR instance. This compile mechanism can also be provided with information about the targeted accelerator, enabling hardware-specific details to be present at compile time and thus influence the way compilation is performed. For example, quantum compilation methods often require information about the hardware connectivity graph; XACC and its compiler mechanism ensure this type of hardware-specific information is available at compile time.

Figure 5: A UML view of the Compiler infrastructure in XACC. Compilers provide compile and translate mechanisms, while Preprocessors enable kernel source code analysis and isomorphic transformation.

Compilers also provide a translate method to enable quantum language source-to-source translation. This method takes as input an IR Function instance and produces an equivalent source string in the Compiler's quantum programming language. The overall workflow for XACC source-to-source translation relies on the flexibility of the XACC IR specification: kernel source code is compiled with its appropriate Compiler instance, the Function IR instance produced by that mechanism is passed to the translate method of the Compiler for the language being generated, and the implementation of the translate method maps the IR Function Instructions to language-specific source code and returns it.

In addition to the Compiler interface, the concept of compilation in XACC also defines a Preprocessor interface. Preprocessors are executed before compilation and take as input the source code to analyze and process, the Compiler reference for the kernel language, and the accelerator being targeted for execution. Using this data, Preprocessors can perform operations on the kernel source string to produce a new kernel source string. All modifications made by a Preprocessor should be isomorphic in nature, i.e., the resultant kernel source code should, upon execution, produce the same result as the provided kernel source code. An example of the Preprocessor's utility would be searching kernel source code for certain keywords describing a desired algorithm to be executed on a set of bits and replacing that line of code with a source-code representation of the algorithm. A Preprocessor like this would alleviate tedious programming tasks for users. A sketch of these two interfaces is shown below.
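For orientation, here is a hedged sketch of what the Compiler and Preprocessor concepts could look like as C++ interfaces; the argument lists and type names are simplified assumptions, not the exact Eclipse XACC signatures.

    // Hedged sketch of the Compiler and Preprocessor concepts. The real
    // interfaces carry more context; names here are simplified.
    #include <memory>
    #include <string>

    class IR;           // container of compiled Functions (Section 2.2.2)
    class Function;     // a compiled kernel
    class Accelerator;  // target hardware abstraction (Section 2.2.5)

    class Compiler {
    public:
      virtual ~Compiler() = default;
      // Map kernel source to IR, with target-specific details available.
      virtual std::shared_ptr<IR> compile(const std::string& src,
                                          std::shared_ptr<Accelerator> acc) = 0;
      // Map a compiled Function back to this Compiler's source language.
      virtual std::string translate(std::shared_ptr<Function> kernel) = 0;
    };

    class Preprocessor {
    public:
      virtual ~Preprocessor() = default;
      // Produce a new, behaviorally equivalent kernel source string.
      virtual std::string process(const std::string& src,
                                  std::shared_ptr<Compiler> compiler,
                                  std::shared_ptr<Accelerator> acc) = 0;
    };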
2.2.4. Intermediate Representation Transformations The native assembly generator component in Figure 3 plays the important role of providing an extensible hook for modifying the generated intermediate representation so that it is amenable to execution on the desired quantum hardware. XACC defines an IRTransformation interface that provides a method for taking a valid IR instance and generating a modified, optimized, or generally transformed isomorphic IR instance. Accelerator implementations can provide realizations of this interface that are executed by the backend native assembly generator to ensure the compiled IR instance can be executed on the hardware. For example, a hardware implementation that does not provide a physical implementation of a given gate could expose an IRTransformation that searches for all instances of that gate instruction and replaces them with some other gate or set of gates that achieves the same functionality, thereby ensuring the new IR instance is isomorphic to the provided IR instance.

2.2.5. Accelerators The inevitable near-term variability in quantum hardware designs and implementations forces any heterogeneous programming model for quantum acceleration within existing workflows to be extensible in the hardware types it interacts with. XACC is no exception, and it therefore provides an interface for injecting custom accelerator instances. This Accelerator concept provides an extensible abstraction for the injection of current and future quantum accelerator hardware.

Figure 6: A view of the Accelerator interface in XACC.

Accelerators provide an initialize mechanism for implementors to handle any startup or loading procedures that need to happen before execution on the device. This can include, for example, creating remote connections to the Device CPU / Accelerator system, or retrieving qubit connectivity information to inform and affect kernel code compilation. Accelerators expose a mechanism for creating instances of AcceleratorBuffers, which provide clients of XACC with a handle on measurement results. Additionally, as seen in the previous section, Accelerator implementations can provide any necessary transformations on the compiled IR instances; these transformations are run after compilation has taken place but before execution begins. Accelerators also provide a method for exposing the bit connectivity of the hardware. For example, the D-Wave QPU has a very specific qubit connectivity structure, which plays an important role in mapping programs onto the hardware. The getAcceleratorConnectivity method can be used by compilers to aid in the compilation or mapping of high-level problems onto the Accelerator. Finally, Accelerators expose an execute method that takes as input the AcceleratorBuffer to be operated on and the Function instance representing the quantum kernel to be executed. Realizations of this interface are responsible for leveraging these data structures to effect execution on their desired hardware or simulator. It is intended that Accelerator implementations leverage vendor-supplied APIs to perform this execution. All execute implementations are responsible for updating the AcceleratorBuffer with measurement results. A sketch of this interface is given below.
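A hedged sketch of the Accelerator concept follows; the method names mirror the description above, but the precise signatures are illustrative assumptions rather than the exact Eclipse XACC interface.

    // Hedged sketch of the Accelerator concept described in the text.
    #include <memory>
    #include <string>
    #include <utility>
    #include <vector>

    class AcceleratorBuffer;  // qubit register handle + measurement results
    class Function;           // compiled quantum kernel (IR)
    class IRTransformation;   // isomorphic IR rewrite (Section 2.2.4)

    class Accelerator {
    public:
      virtual ~Accelerator() = default;
      // One-time startup: open remote connections, query device metadata, etc.
      virtual void initialize() = 0;
      // Allocate a named register of nBits qubits for clients to operate on.
      virtual std::shared_ptr<AcceleratorBuffer>
      createBuffer(const std::string& name, int nBits) = 0;
      // Hardware-required rewrites, run after compilation, before execution.
      virtual std::vector<std::shared_ptr<IRTransformation>>
      getIRTransformations() = 0;
      // Physical qubit connectivity, e.g. for placement and embedding decisions.
      virtual std::vector<std::pair<int, int>> getAcceleratorConnectivity() = 0;
      // Run the kernel and populate the buffer with measurement results.
      virtual void execute(std::shared_ptr<AcceleratorBuffer> buffer,
                           std::shared_ptr<Function> kernel) = 0;
    };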
2.2.6. Programs and Execution Workflow The main entry point for interaction with the XACC programming model and API is the concept of a Program. The XACC Program orchestrates the entire quantum code compilation process and provides users with an executable functor or lambda that effects the execution of the quantum code on the desired Accelerator. Figure 7 gives a high-level view of this workflow represented as a UML sequence diagram.

Figure 7: The execution workflow for a hybrid conventional-quantum program in XACC.

The execution workflow starts with a call to the XACC framework to get a reference to the desired Accelerator. With that Accelerator, users can request an allocation of bits to operate on, represented as an AcceleratorBuffer instance. Then, to begin the compilation process for the quantum kernel source code, users instantiate an XACC Program. All Programs are instantiated with a reference to the desired Accelerator so that the compilation process may leverage hardware-specific information. Programs also take as input the source code to be compiled, an already constructed XACC IR instance, or a reference to one or more files containing a persisted XACC IR instance for fast loading at runtime. Once a Program is created, users can compile the source code through a public build method. This method handles the creation of the appropriate language-specific Compiler, the execution of any desired Preprocessors, the actual compilation, and the execution of any required IRTransformations. The result is an IR instance that is stored by the Program. Once built, users can request a kernel by name, or get all compiled kernels, as lambda or functor objects. These functors handle the interaction of the IR, the AcceleratorBuffer, and the Accelerator's execution mechanism. Execution is kicked off by invoking the kernel functor
(in C++, a call to the operator()(Args...) method). Execution occurs on the Accelerator, and measurement results are stored in the AcceleratorBuffer, to which the user holds a reference. Therefore, once the execution is complete, the results are available to the user and ready for post-processing and use in the rest of the workflow. A sketch of this end-to-end flow is shown below.
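The following hedged sketch summarizes the client-side workflow just described; the xacc:: helper calls and Program methods mirror the concepts in this section, but their exact names and signatures should be treated as illustrative.

    // Hedged sketch of the hybrid execution workflow described above.
    // Helper names (xacc::initialize, xacc::getAccelerator) and the Program
    // methods mirror the concepts in the text; treat them as illustrative.
    #include <memory>
    #include <string>

    void runHybridWorkflow(const std::string& kernelSource) {
      xacc::initialize();                              // load available plugins
      auto qpu = xacc::getAccelerator("tnqvm");        // pick a backend
      auto buffer = qpu->createBuffer("qreg", 2);      // allocate qubits

      xacc::Program program(qpu, kernelSource);        // source + target accelerator
      program.build();                                 // preprocess, compile, transform

      auto kernel = program.getKernel("ansatz");       // compiled kernel functor
      kernel(buffer, 3.14);                            // execute on the accelerator

      // Measurement results now live in the buffer for post-processing.
    }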
3. XACC Reference Implementation for Quantum Computing

We have developed an open-source reference implementation [18] of the XACC programming model, Eclipse XACC [19]. In this section, we discuss the Eclipse XACC implementations of the core XACC interfaces and abstractions. These implementations enable programming and acceleration for both gate model quantum computing and quantum annealing.

3.1. Gate Model Quantum Computing

3.1.1. Intermediate Representation For gate model quantum computing we have developed specialized implementations of the XACC IR infrastructure: the GateInstruction, GateFunction, and GateQIR. Each GateInstruction is a realization of the XACC Instruction interface that models a typical quantum mechanical unitary gate operating on a register of qubits. Examples include the Pauli gates, the Hadamard and CNOT gates, and general rotations about the X, Y, and Z axes. As such, each GateInstruction keeps track of its unique name, the set of bits (one or many) it operates on, and a list of InstructionParameters, which may be of type int, float, double, complex, or string. For example, a rotation gate keeps track of one InstructionParameter modeling the rotation angle, and this angle can be either a float, a double, or a string variable. This variant-like InstructionParameter is unique in that it enables ahead-of-time compilation of parameterized circuits that can be evaluated with concrete values at runtime. GateInstructions can be set to an enabled or disabled state at runtime to enable conditional branching of quantum kernels based on qubit measurement results. Eclipse XACC currently provides the following built-in GateInstruction subtypes: Hadamard, CNOT, Measure, Rx, Ry, Rz, X, Y, Z, SWAP, and CPhase.

The GateFunction is a realization of the XACC Function interface that models a collection of GateInstructions. Furthermore, since an XACC Function is an Instruction, GateFunctions can contain other GateFunctions, leading to a tree structure of GateInstructions. Each node in this tree is a GateFunction, while each leaf is a specific GateInstruction. GateFunctions may also be parameterized with string-variable InstructionParameters to model function arguments. These arguments are parsed and passed to all internal GateInstructions that depend on that InstructionParameter string variable. The GateFunction provides an evaluateVariableParameters method that is invoked at runtime to update InstructionParameters with runtime values. The GateFunction also enables the insertion or removal of GateInstructions at runtime, a key feature for any IRTransformations that optimize or isomorphically transform IR for execution on Accelerators.

Finally, Eclipse XACC provides an IR implementation, called GateQIR, which models a collection of GateFunctions. This class acts as a container for compiled XACC quantum kernels, and it implements the four XACC IR characteristics discussed in Section 2.2.2: GateQIR can be persisted to a human-readable QASM string, written to and read from file as JSON, can produce a graph representation of itself, and is itself an in-memory model of the quantum kernels making up a given hybrid classical-quantum execution.

3.1.2. Compilers Eclipse XACC has gate model Compiler support for a number of currently available languages: ScaffoldCompiler, QuilCompiler, and ProjectQCompiler. Each of these enables the integration of various quantum programming efforts with any available gate model XACC Accelerator.
Figure 8: Currently available gate-model quantum Compilers and Accelerators in Eclipse XACC.

The ScaffoldCompiler allows users to write quantum kernels in the Scaffold quantum programming language [8]. This language is embedded in C++ and builds on the extensible Clang compiler infrastructure, and as such fits very well with the XACC C++ programming model. To expose the Scaffold Clang and LLVM bindings to XACC, the ScaffoldCompiler leverages a custom Clang Abstract Syntax Tree (AST) Consumer, which enables custom routines to be run at each node of the AST representing the quantum kernel being compiled. At each of these nodes, the Consumer takes information about the current node, such as the gate name, gate parameters, and qubits, and creates a new GateInstruction or GateFunction. It walks the tree, creating these XACC IR constructs, and upon finishing adds all GateFunctions to a new GateQIR instance, which the ScaffoldCompiler returns to the XACC framework.

For interoperability and integration with other Python-based quantum programming frameworks, Eclipse XACC provides the QuilCompiler and ProjectQCompiler implementations, which target the Rigetti pyQuil and ProjectQ programming frameworks, respectively. Both of these Compilers are simple implementations that map Quil [20] and the ProjectQ [21] QASM format to the XACC IR. Both work by tokenizing the incoming quantum kernel code and mapping all instructions to XACC GateInstructions. These simple implementations are key to enabling execution of Python-based pyQuil and ProjectQ quantum programs on any attached XACC Accelerator. The sketch below shows the kind of kernel the QuilCompiler accepts.
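For illustration, here is a hedged sketch of a kernel whose body is written in Quil rather than Scaffold; the Quil instructions are standard, while the kernel wrapper follows the XACC kernel rules from Section 2.2.1 and the kernel name is illustrative.

    // Hedged sketch: the same XACC kernel form, but with a Quil-language
    // body that the QuilCompiler would tokenize into GateInstructions.
    qpu void bell(AcceleratorBuffer& qreg) {
       H 0
       CNOT 0 1
       MEASURE 0 [0]
       MEASURE 1 [1]
    }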
3.1.3. Accelerators Eclipse XACC has gate model Accelerator support for a number of currently available quantum processors. We have developed Accelerator implementations that target the IBM Quantum Experience 5- and 16-qubit QPUs, the Rigetti Forest QVM, and a tensor network quantum circuit simulator we have developed, called TNQVM [22]. The IBM and Rigetti Accelerators target remote QPUs and/or simulators and rely on remote REST client invocations to drive the execution of the QPU resource. To do this, Eclipse XACC relies on the open-source CppRestSDK from Microsoft [23], which provides convenient data structures for executing HTTP POST/GET calls to the IBM and Rigetti servers. The IBM and Rigetti Accelerator implementations each have an associated InstructionVisitor implementation. These visitors process the incoming IR tree and convert it to the QASM-like string required by the remotely hosted quantum resource. For example, the QuilVisitor maps the XACC IR to a Quil string, while the OpenQasmVisitor maps the XACC IR to the IBM OpenQASM [24] specification. For our tensor network simulator, we provide a visitor implementation that maps IR to an MPS tensor state data structure that can be executed by an available tensor algebra library. A sketch of such a visitor is shown below.
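The following hedged sketch shows a visitor that maps the simplified IR from the earlier Section 2.2.2 sketch to a Quil-like assembly string, in the spirit of the QuilVisitor described above; it reuses the illustrative Hadamard and InstructionVisitor types defined there, not the exact Eclipse XACC classes.

    // Hedged sketch of a visitor emitting a Quil-like string from the
    // simplified IR types introduced in the Section 2.2.2 sketch.
    #include <sstream>
    #include <string>

    struct QuilLikeVisitor : InstructionVisitor {
      std::ostringstream quil;
      void visit(Hadamard& h) override {
        quil << "H " << h.qubit << "\n";   // emit one Quil line per gate
      }
      std::string str() const { return quil.str(); }
    };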
Figure 9: C++ header file, h2vqe.hpp, containing the quantum kernels for the H2 VQE computation.

3.1.4. Use Case - Variational Quantum Eigensolver An excellent demonstration of the XACC programming model as applied to gate model quantum computing is the variational quantum eigensolver (VQE) algorithm [4]. This algorithm relies explicitly on both classical and quantum resources, and it involves quantum source code that is parameterized by a number of variational parameters. Furthermore, the algorithm is trivially parallelizable in the number of QPU executions needed. Here we demonstrate the utility of the Eclipse XACC API by writing C++
code that computes the expectation values of the individual terms in the molecular Hamiltonian for molecular hydrogen, as put forth in [25]. This example demonstrates how to program quantum kernels using the Scaffold programming language, with variable runtime circuit parameterization and the ability to target any gate model XACC Accelerator. The code is shown in Figures 9 and 10.

Figure 10: Source code for computing the expectation values of the terms in the H2 molecular Hamiltonian.

Figure 9 shows a typical C++ header file, called h2vqe.hpp. It is a collection of XACC quantum kernels, all annotated with the required qpu keyword, all parameterized on the register of qubits being operated on (the AcceleratorBuffer instance), and all parameterized by the variational parameter theta. The first kernel represents the state preparation circuit and serves to initialize the attached quantum Accelerator to some non-trivial state parameterized by theta. The remaining kernels each represent a term in the molecular Hamiltonian found in [25], where each is essentially a one- or two-qubit measurement with the appropriate change of basis for the term it represents. Each of these kernels invokes the statePrep kernel as its first instruction.

Figure 10 demonstrates the runtime compilation and execution of these kernels using the Eclipse XACC API. First the framework is initialized, a step which is key for gathering and making available all possible XACC plugins (Accelerators, Compilers, etc.). Next, clients get a reference to the desired Accelerator, here the IBM Accelerator. This step does not require the Accelerator name string ("ibm" here); if it is left out, the framework delegates to the accelerator command-line argument passed at runtime. In this way, users can dynamically run XACC quantum programs on any available Accelerator without having to re-compile their code. Next, clients allocate
an AcceleratorBuffer, here the qubitReg variable instance modeling the two qubits we need to model molecular hydrogen. Then a Program instance is created, taking the Accelerator and the kernel source code (here a file stream for h2vqe.hpp) as input. The Program is then built, which orchestrates the compilation of the quantum kernels and generates the XACC IR that the Program internally controls. This IR is a tree of GateFunctions and GateInstructions that represents the quantum kernels in Figure 9. Clients get a reference to those compiled, executable kernels through the Program's getKernels method. With these kernels in hand, we loop over all values of theta between -pi and pi. For each value of theta we execute the compiled kernels, retrieve the expectation value, and store it to an output CSV file. A sketch of the kind of kernels contained in h2vqe.hpp is shown below.
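Since Figure 9 is referenced but reproduced only as an image, the following hedged sketch shows the general shape such kernels might take: a statePrep kernel plus one representative measurement kernel. The specific gate sequence is illustrative and is not the exact circuit from [25].

    // Hedged sketch of the structure of the h2vqe.hpp kernels described
    // above: a parameterized state-preparation kernel plus a term kernel
    // that measures in a given basis. Gate sequence is illustrative only.
    qpu void statePrep(AcceleratorBuffer& qreg, double theta) {
       X(qreg[0]);
       Ry(qreg[1], theta);       // variational parameter bound at runtime
       CNOT(qreg[1], qreg[0]);
    }

    qpu void measureZ0Z1(AcceleratorBuffer& qreg, double theta) {
       statePrep(qreg, theta);   // every term kernel starts from statePrep
       Measure(qreg[0], 0);      // Z-basis measurement of both qubits
       Measure(qreg[1], 1);
    }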
3.2. Quantum Annealing

3.2.1. Intermediate Representation We have developed specialized implementations of the XACC IR infrastructure for quantum annealing, specifically targeting the D-Wave QPU. Eclipse XACC provides the DWQMI, DWKernel, and DWIR classes, which implement Instruction, Function, and IR, respectively. DWQMI implements the Instruction interface to model a quantum machine instruction for a QPU that implements quantum annealing. Its bits implementation returns two non-negative integers: if the integers are equal, the DWQMI models a qubit bias value; if they are not equal, the DWQMI models a qubit-coupler value. In this way, a list of DWQMI instructions, or a DWKernel, forms a graph of logical problem nodes and couplers - an Ising representation of a quadratic unconstrained binary optimization problem. DWIR then simply serves as a container for DWKernels.

Figure 11: UML model for the DWQMICompiler showing its use of the EmbeddingAlgorithm and ParameterSetting extension points.

3.2.2. Compiler Eclipse XACC provides a Compiler implementation for high-level programs targeted at the D-Wave QPU. The DWQMICompiler implements the Compiler compile method to map quadratic unconstrained binary optimization problems to the D-Wave hardware. This compiler makes use of two extension points: (1) the EmbeddingAlgorithm interface, which provides a mechanism for computing minor graph embeddings of the problem graph into the QPU hardware graph [26], and (2) the ParameterSetter interface, which, given a valid embedding, sets the Ising Hamiltonian parameters that the QPU executes. Note that these interfaces enable runtime extensibility of minor graph embedding and parameter setting; therefore,
these can be updated at execution time through command-line arguments. This enables efficient performance studies for various compilation workflows [27]. The DWQMICompiler takes as input kernel source code that describes quantum machine instructions. Specifically, it takes kernels formatted as a list of triples, where the first two elements are the qubit indices for the quantum machine instruction and the third element is the bias or coupling value for that machine instruction. These machine instructions may be specified prior to minor graph embedding, i.e., they describe the logical problem graph; in that case, the DWQMICompiler delegates to the appropriate EmbeddingAlgorithm and ParameterSetter at compile time.

3.2.3. Accelerator Eclipse XACC provides an Accelerator, the DWAccelerator, that drives execution on a remote D-Wave QPU, specifically on a remote D-Wave Qubist server. To achieve this, it leverages the Microsoft CppRestSDK (in the same way the Rigetti and IBM Accelerators do), with remote JSON posts to a user-specified D-Wave Qubist server. The DWAccelerator starts by requesting information about the available QPUs at the remote Qubist server and picks out the pre-programmed ranges for the qubit bias and coupling values. With those ranges, it takes the incoming XACC IR Function instance and normalizes all DWQMI Instruction values. It then builds an appropriate JSON string with these values and the associated QPU runtime parameters (annealing time, number of reads, etc.), and posts it to the server. The results are pulled back through an HTTP GET and stored on the client-owned AcceleratorBuffer for future post-processing and analysis.

3.2.4. Use Case - Simple Integer Prime Factorization on D-Wave Here we demonstrate the use of the Eclipse XACC quantum annealing implementations by using the D-Wave QPU to factor 15 into 5 and 3. The quantum kernel for factoring 15 on the D-Wave QPU, and the associated code required to compile and execute it using the XACC API, are shown in Figure 12. As stated before, and now shown in Figure 12, the DWQMICompiler takes as input kernels structured as a newline-separated list of quantum machine instructions; these, as in this case, can be instructions specified prior to embedding. The compilation and execution workflow starts by getting a reference to the D-Wave Accelerator, which gives the user access to all remotely hosted D-Wave Solvers (physical and virtual resources). Next, users request that an AcceleratorBuffer be allocated, which gives them a reference to the D-Wave QPU qubits as well as to all resultant data after execution. Then a Program is created with a reference to the Accelerator and the source code, and the Kernel for execution of the compiled result is requested (this kicks off the minor graph embedding and parameter setting workflow). With a reference to the Kernel, users simply execute on the AcceleratorBuffer instance, which populates that instance with the resultant data (energies, measurement bit strings, etc.). The bit string corresponding to the minimum energy can then be used to reconstruct the binary representation of the factors of 15, namely 5 and 3. A sketch of the kernel form and driver is shown below.
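Since Figure 12 is referenced but reproduced only as an image, the following hedged sketch shows the shape of such a kernel and driver; the bias and coupler values are placeholders rather than the actual factoring-15 Hamiltonian, and the kernelSourceFor helper is hypothetical.

    // Hedged sketch of a D-Wave-style XACC kernel and driver. Each kernel
    // line is a (qubit_i, qubit_j, value) triple: a bias when i == j, a
    // coupler when i != j. Numeric values are placeholders only.
    qpu void factor15(AcceleratorBuffer& qreg) {
       0 0 20
       1 1 50
       0 1 -3
    }

    // Driver: compile and execute the annealing kernel (illustrative names).
    void runFactoring() {
      auto dwave = xacc::getAccelerator("dwave");
      auto buffer = dwave->createBuffer("qubits");
      xacc::Program program(dwave, kernelSourceFor("factoring15.hpp"));  // hypothetical helper
      auto kernel = program.getKernel("factor15");  // triggers embedding + parameter setting
      kernel(buffer);   // energies and bit strings now live in the buffer
    }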
Figure 12: (Left) C++ header file, factoring15.hpp, containing the quantum kernel for factoring 15 on a D-Wave QPU, and (Right) Eclipse XACC source code orchestrating compilation and execution of the factor15 kernel on the D-Wave QPU.

4. Discussion

We have presented a novel programming framework that permits the integration of QPUs within HPC systems for the purpose of quantum acceleration. We have demonstrated a high-level set of interfaces and programming concepts that support QPU acceleration akin to existing GPU acceleration. These interfaces enable domain computational scientists to migrate existing scientific computing code to early QPU devices while retaining prior programming investments. The XACC programming model embodies six major concepts: the specification of quantum acceleration using quantum kernels, the generation of a novel quantum intermediate representation, the integration of quantum compilers and preprocessors into existing workflows, the isomorphic transformation of the quantum intermediate representation, the allocation of compiled programs to QPU accelerators, and the construction of hybrid system programs. These concepts form an API that supports the development of higher-level quantum data structures, libraries, and applications, and that makes quantum acceleration approachable to existing domain computational scientists.
5. Acknowledgements

This work has been supported by the ORNL Director's Research and Development Fund (LDRD) and the Department of Energy Early Career Research Program. This manuscript has been authored by UT-Battelle, LLC, under Contract No. DEAC0500OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for the United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.

References

[1] E. Gibney, "D-Wave upgrade: How scientists are using the world's most controversial quantum computer," Nature News, vol. 541, no. 7638, p. 447, Jan. 2017. [Online]. Available: http://www.nature.com/news/d-wave-upgrade-how-scientists-are-using-the-world-s-most-controversial-quantum-computer-1.21353
[2] "IBM builds its most powerful universal quantum computing processors." [Online]. Available: https://phys.org/news/2017-05-ibm-powerful-universal-quantum-processors.html
[3] K. A. Britt and T. S. Humble, "High-performance computing with quantum processing units," ACM Journal on Emerging Technologies in Computing Systems (JETC), vol. 13, no. 3, p. 39, 2017.
[4] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, "The theory of variational hybrid quantum-classical algorithms," New Journal of Physics, vol. 18, no. 2, p. 023023, 2016. [Online]. Available: http://stacks.iop.org/1367-2630/18/i=2/a=023023
[5] A. Aspuru-Guzik et al., "ASCR Workshop on Quantum Computing for Science," Department of Energy Office of Science Advanced Scientific Computing Research Program, Tech. Rep., 2015.
[6] A. S. Green, P. L. Lumsdaine, N. J. Ross, P. Selinger, and B. Valiron, "Quipper: a scalable quantum programming language," in ACM SIGPLAN Notices, vol. 48, no. 6. ACM, 2013, pp. 333-342.
[7] T. S. Humble, A. J. McCaskey, R. S. Bennink, J. J. Billings, E. F. D'Azevedo, B. D. Sullivan, C. F. Klymko, and H. Seddiqi, "An Integrated Programming and Development Environment for Adiabatic Quantum Optimization," Computational Science & Discovery, vol. 7, no. 1, p. 015006, 2014. [Online]. Available: http://stacks.iop.org/1749-4699/7/i=1/a=015006
[8] A. Javadi-Abhari, S. Patil, D. Kudrow, J. Heckey, A. Lvov, F. T. Chong, and M. Martonosi, "ScaffCC: a framework for compilation and analysis of quantum computing programs," in Proceedings of the 11th ACM Conference on Computing Frontiers. ACM, 2014, p. 1.
[9] J. Nickolls, I. Buck, M. Garland, and K. Skadron, "Scalable parallel programming with CUDA," Queue, vol. 6, no. 2, pp. 40-53, Mar. 2008. [Online]. Available: http://doi.acm.org/10.1145/1365490.1365500
[10] J. E. Stone, D. Gohara, and G. Shi, "OpenCL: A parallel programming standard for heterogeneous computing systems," IEEE Des. Test, vol. 12, no. 3, pp. 66-73, May 2010. [Online]. Available: http://dx.doi.org/10.1109/MCSE.2010.69
[11] R. S. Smith, M. J. Curtis, and W. J. Zeng, "A practical quantum instruction set architecture," 2016.
[12] D. S. Steiger, T. Häner, and M. Troyer, "ProjectQ: An Open Source Software Framework for Quantum Computing," ArXiv e-prints, Dec. 2016.
[13] J. McClean, T. Häner, D. Steiger, and R. Babbush, "FermiLib v0.1," Feb. 2017. [Online]. Available: http://www.osti.gov/scitech/servlets/purl/1345517
[14] A. W. Cross, L. S. Bishop, J. A. Smolin, and J. M. Gambetta, "Open Quantum Assembly Language," ArXiv e-prints, Jul. 2017.
[15] A. Sinha, "Client-server computing," Commun. ACM, vol. 35, no. 7, pp. 77-98, Jul. 1992. [Online]. Available: http://doi.acm.org/10.1145/129902.129908
[16] C. Lattner and V. Adve, "LLVM: A compilation framework for lifelong program analysis & transformation," in Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, ser. CGO '04. Washington, DC, USA: IEEE Computer Society, 2004, pp. 75-. [Online]. Available: http://dl.acm.org/citation.cfm?id=977395.977673
[17] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-oriented Software. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1995.
[18] A. McCaskey, "XACC - eXtreme-scale ACCelerator programming framework," https://github.com/ORNL-QCI/xacc, 2017.
[19] "XACC eXtreme-scale ACCelerator programming framework," https://projects.eclipse.org/proposals/eclipse-xacc.
[20] R. S. Smith, M. J. Curtis, and W. J. Zeng, "A Practical Quantum Instruction Set Architecture," ArXiv e-prints, Aug. 2016.
[21] D. S. Steiger, T. Häner, and M. Troyer, "ProjectQ: An Open Source Software Framework for Quantum Computing," ArXiv e-prints, Dec. 2016.
[22] A. McCaskey and M. Chen, "TNQVM - Tensor Network Quantum Virtual Machine," https://github.com/ORNL-QCI/tnqvm, 2017.
[23] "CppRestSDK," https://github.com/Microsoft/cpprestsdk, 2017.
[24] A. W. Cross, L. S. Bishop, J. A. Smolin, and J. M. Gambetta, "Open Quantum Assembly Language," ArXiv e-prints, Jul. 2017.
[25] P. J. J. O'Malley, R. Babbush, I. D. Kivlichan, J. Romero, J. R. McClean, R. Barends, J. Kelly, P. Roushan, A. Tranter, N. Ding, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, A. G. Fowler, E. Jeffrey, E. Lucero, A. Megrant, J. Y. Mutus, M. Neeley, C. Neill, C. Quintana, D. Sank, A. Vainsencher, J. Wenner, T. C. White, P. V. Coveney, P. J. Love, H. Neven, A. Aspuru-Guzik, and J. M. Martinis, "Scalable quantum simulation of molecular energies," Phys. Rev. X, vol. 6, p. 031007, Jul. 2016. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevX.6.031007
[26] C. Klymko, B. D. Sullivan, and T. S. Humble, "Adiabatic quantum programming: minor embedding with hard faults," Quantum Information Processing, vol. 13, no. 3, pp. 709-729, Mar. 2014. [Online]. Available: https://doi.org/10.1007/s11128-013-0683-9
[27] T. S. Humble, A. J. McCaskey, J. Schrock, H. Seddiqi, K. A. Britt, and N. Imam, "Performance models for split-execution computing systems," 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 545-554, 2016.