Utilizing a Heterogeneous Computing Cluster & IBM InfoSphere Streams: Real-time Digital Autocorrelation Spectrometer for Radio Astronomy
by M. S. Mahmoud
A dissertation submitted in partial fulfillment of the requirements for the degree of Bachelor of Mathematical Sciences (Honours)
Summer 2010
School of Computing and Mathematical Sciences
Faculty of Design & Creative Technologies
AUT University

Primary Supervisor: Dr Andrew Ensor
Astronomy Supervisor: Professor Sergei Gulyaev
Acknowledgements

My thanks to Dr A. Ensor and Prof. S. Gulyaev for their valued guidance and support with this dissertation. The assistance from members of the Institute for Radio Astronomy and Space Research (IRASR) has also been very much appreciated. Finally I would like to thank the IBM Corporation for providing the computing equipment, technical assistance and granting access to early releases of InfoSphere Streams.
Abstract

Ambitious radio astronomical observatories such as the proposed Square Kilometer Array (SKA) international project will require currently unavailable exascale computing power to process the huge volumes of data generated by thousands of antennae and millions of receptors. Here, to contribute towards building capability for the future SKA telescope, an integrated approach is utilized with the AUT University 12m radio telescope, the KAREN network and stream computing. Stream computing is the fundamental strategy utilized by most custom-built high performance hardware signal processing systems. Fundamentally, a stream-centric software paradigm may provide the needed performance to replace current rigid computing resources, significantly reducing the costs of design, development and maintenance, as well as the electric power requirements of custom-built solutions. A digital autocorrelation spectrometer pipeline was written using the SPADE stream application language. It was successfully deployed to a heterogeneous computing cluster comprising x86 and PowerPC Cell BE architectures. Analogue data from the radio telescope was digitised then streamed to the IBM Blade Center using the KAREN network. The stream application produced correct power spectral density results. Performance tests with real radio astronomical data allowed identification of the main barriers to system performance. In particular, it was shown that the stream application may be capable of achieving real-time performance if higher performance broadband inter-node communication links were available.
Contents

Introduction                                                         1

1  HPC Requirements for Radio Astronomy                              5
   1.1  Radio Astronomy Telescopes                                   5
   1.2  Cross-Correlating and Integrating                            8
   1.3  HPC Requirements for Modern Radio Telescopes                10

2  Heterogeneous Computing and Parallelization Middleware           13
   2.1  Heterogeneous Computing Systems                             13
   2.2  Parallelization Middleware                                  16
   2.3  InfoSphere Streams                                          18
   2.4  Stream Processing Application Declarative Engine (SPADE)    22

3  Implementation of a Real-Time Autocorrelation Spectrometer       29
   3.1  Introduction                                                29
   3.2  Cell B.E. Fast Fourier Transform Server                     31
   3.3  Defining the Auto-Correlation Pipeline using SPADE          36
   3.4  Computer Network Optimizations                              40

4  Results and Discussion                                           43
   4.1  Correctness of the Auto-Correlation Output                  43
   4.2  Performance Results                                         44
   4.3  Discussion of the Results                                   53

5  Conclusion                                                       59
   5.1  Research Outcomes                                           59
   5.2  Limitations and Recommendations                             60
   5.3  Future Work                                                 62

References                                                          65

A  Code                                                             69
Introduction

The Square Kilometre Array (SKA) is a revolutionary new radio telescope with an unprecedentedly large collecting area comprising thousands of dishes and antennae. The SKA will be 50-100 times more sensitive than any radio telescope or array previously built. Continuing exponential developments in both computing and radio frequency devices will have made it possible and affordable for the SKA to be built by the mid-2020s. When built, the SKA will be capable of answering fundamental questions about the Universe, its constituent parts, its origin and evolution. One of the main challenges of the SKA is the estimated requirement of currently unavailable Exaflops computing power to process the huge volumes of data in real-time. To address this challenge, the SKA computing will have to utilize a stream computing approach. Stream computing is the fundamental strategy used by most custom-built high performance hardware signal processing systems. Fundamentally, a stream-centric software paradigm may provide the needed performance to replace current rigid computing resources, significantly reducing the costs of design, development and maintenance, as well as the electric power requirements of custom-built solutions.

Conventionally, high performance signal processing systems are constructed as custom-built hardware to provide the needed performance. Whilst custom-built hardware does meet today's performance requirements, it fails to satisfy future requirements when an upgrade is necessary, which is usually expensive due to non-recoverable engineering costs. Thus the trend is changing towards
using parallel software applications executing on dedicated general purpose computer clusters [11]. Software solutions are flexible and can be easily maintained and upgraded in comparison to specialist hardware. Unfortunately, current mainstream parallelization frameworks and middleware such as the Message Passing Interface (MPI) and OpenMP lack the scalability and flexibility required to commence addressing the SKA computing problem. However, in the research domain a new breed of parallelization middleware has begun to emerge, based on a stream-centric computing paradigm. One such middleware has been released by the IBM Thomas J. Watson Research Center and is known as InfoSphere Streams. InfoSphere Streams is designed to be a real-time, distributed and adaptive data-stream processing middleware capable of ingesting and analyzing huge volumes of continuously streaming data, as well as being capable of automatic reconfiguration of system resources to satisfy changing requirements and objectives [38].

This dissertation aims to explore the use of InfoSphere Streams and deploy it to a heterogeneous computing resource composed of x86 and PowerPC Cell BE CPU architectures. A real-time autocorrelation spectrometer was chosen to be implemented, to analyze data from a single radio telescope, with the intent of better understanding the appropriateness of the technologies involved.

Chapter 1 commences by giving a brief introduction to the field of Radio Astronomy and the motivations for studying electromagnetic emissions of an extraterrestrial origin. This is followed by an explanation of the correlation process, and in particular the auto-correlation process, for producing the power spectral density of the signal received from a radio telescope. Chapter 1 concludes by outlining the high performance requirements for current and future radio telescopes, especially considering the processing needs required for ingesting and analyzing data from very large arrays of antennae.

Chapter 2 attempts to justify a possible solution for handling the immense processing requirements, which utilizes various commodity computing
architectures in a configuration that is best suited for the given stages of a stream processing pipeline. Achieving an optimal configuration, or close to it, imposes requirements on software parallelization techniques to offer dynamic runtime services such as quality of service control, load-balancing, automatic code generation and seamless scalability. One such parallelization framework that claims to offer dynamic runtime services to best utilise various computing resources is InfoSphere Streams. Chapter 2 introduces and outlines InfoSphere Streams and its functionality, and explains the many facets of its stream-centric programming approach with the Stream Processing Application Declarative Engine (SPADE) language.

Chapter 3 explains how a real-time auto-correlation pipeline was constructed with SPADE and other software components. It also describes the deployment of the SPADE application and its complementary software components to a computing cluster comprising x86 and Cell BE CPU architectures. The chapter also explains various network optimizations used to improve the bandwidth use of communication links between the physical cluster nodes.

In Chapters 4 and 5 correctness and performance results are presented and discussed; from these results, limitations are identified and conclusions are drawn.
Chapter 1
HPC Requirements for Radio Astronomy

1.1 Radio Astronomy Telescopes

The field of Radio Astronomy is concerned with the reception and analysis of naturally occurring extraterrestrial radio wavelength electromagnetic emissions. Astronomers use radio telescopes to detect, identify, measure and investigate a variety of celestial events and objects over time (see Figure 1.1). Scientific interpretation of extraterrestrial radio wave emissions enables astronomers to study the physics, chemistry and evolution of radio sources (such as quasars, pulsars, active galactic nuclei, supernova remnants, star formation regions, etc.) as well as the nature of the Universe as a whole [1]. A very important advantage of radio waves is their ability to penetrate the Earth's atmosphere, allowing ground-based radio observations in contrast to much more costly space-borne observations (see Figure 1.2 & Table 1.1). Moreover radio waves also penetrate through galactic dust and gas, allowing the study of objects located in high density gas and dust regions (such as the galactic plane and galactic center) where optical observations are often impossible [2].

Figure 1.1 Seyfert galaxy (3C219) imaged in the radio spectrum. Main radio features of this source include: an unresolved core at the center of the galaxy, a partial jet at one end of the core and extended hot-spots in both lobes. (Image courtesy of NRAO/AUI)
Figure 1.2 The radio window detectable by earth-based observatories is quite broad in comparison to the visible optical window. To detect wavelengths at which the Earth's atmosphere and ionosphere are opaque requires space-borne stations. (Image courtesy of NRAO/AUI)
Observatory   Operator   Observing spectrum window   Estimated capital cost (in 2010 US$)^1
Chandra       NASA       X-ray                       $2.6 billion^2
VLA           NRAO       Radio                       $420 million^3

Table 1.1 Although this table shows that Earth-based stations are less costly than space-based stations, it does not negate the necessity and usefulness of observing in all bands of the electromagnetic spectrum. The table suggests that the current trend is heading towards constructing much larger and more sophisticated cost-effective Earth-based stations.
1. Based on inflation data from Statistical Abstracts of the United States.
2. Obtained from Harvard University (http://chandra.harvard.edu/resources/faq/chandra/chandra-8.html).
3. Obtained from NRAO (http://www.vla.nrao.edu/genpub/overview/).

Since radio wave emissions are of an extraterrestrial origin, the strength of the signal received by a radio telescope (flux density) is very weak. Consequently the principal requirement for a radio telescope antenna receiving system is very high sensitivity, also referred to as low system temperature/noise [1, p. 7-0]. Greater antenna sensitivity contributes to better angular resolution, meaning the ability to better distinguish or resolve between signals emanating from very distant sources [2]. Initially greater sensitivity was achieved by constructing single dish-type antennae with a very large diameter. However it became readily apparent that the cost of constructing dish-type antennae increases with diameter D, estimated to be of the order D^2.7 [3]. As a result of the prohibitive cost of increasing the diameter of a single antenna, the strategy changed to constructing arrays of antennae comprised of cost-efficient small diameter dish-type antennae, or in some cases less complex antenna types such as tile arrays or Yagi antennae (see Figure 1.3). The array of antennae can be aggregated into one virtual antenna with a greater resolving power than any of its physical lower resolving power antenna elements by means of cross-correlating and integrating the digitized signals arising from each receiving element using some digital signal processing computing system [4].

Figure 1.3 (a) Eight-element Yagi antenna array at Dover Heights, Sydney, Australia (image courtesy of CSIRO). (b) Tile antenna array station, part of LOFAR, at Effelsberg, Germany (image courtesy of the Max Planck Institute). (c) The Allen Telescope Array (ATA), California, U.S.A. (image courtesy of the SETI Institute).

Arraying antennae overcame the limitations of a single antenna and allowed greater resolving power. However the arraying strategy introduced a new complication in terms of increased data volumes and processing requirements. This can be readily observed by considering a single antenna receiving system producing a signal sampled at a frequency of 64MHz with a sample quantization of 8 bits; the amount of data such a receiver system would produce in 10 seconds from a single polarization is 640MB. Hence an array of n antennae operating with the same specifications would produce n times 640MB every 10 seconds. Ultimately the limitation imposed on antenna arrays is bound by an interplay of how much digitized signal data can be locally stored (recorded) or transferred over digital networks versus how quickly it can be processed by the digital signal processing computing system. The LOFAR outrigger in Scandinavia (LOIS) [5] and the proposed Square Kilometer Array (SKA) [6] are two very prominent examples of antenna array systems comprising thousands of receiving elements. The data flow produced by arrays with such a large number of receiving elements can total many Terabits per second, effectively requiring high performance digital networks along with Teraflops to Exaflops scale computer processing power [7].
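For clarity, the arithmetic behind these figures (assuming one byte per 8-bit sample and a single polarization) is:

64 \times 10^{6} \ \text{samples/s} \times 1 \ \text{byte/sample} = 64 \ \text{MB/s}, \qquad 64 \ \text{MB/s} \times 10 \ \text{s} = 640 \ \text{MB},

so an array of n such antennae produces n \times 640 \ \text{MB} every 10 seconds of raw digitized signal.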
1.2 Cross-Correlating and Integrating

There are many radio telescope digital signal processing applications that rely on cross-correlation and integration as an initial step. For example, both spectroscopy and interferometry are very commonly used Radio Astronomy analysis techniques that use cross-correlation and integration as an initial step [8]. This dissertation concentrates on a special case of cross-correlation known as auto-correlation. Auto-correlation is the technique in which the digitized signal from a single telescope is cross-correlated with itself and then integrated to produce an estimate of the power spectral density of the received signal originating from an extraterrestrial source or event. It is worth mentioning at this point that, since the output of the receiving system consists of the radio source signal plus the noise of the receiving system itself, the main purpose of integrating for a suitable period of time is to improve the signal-to-noise ratio. In essence, frequency channels in the power spectrum that contain no periodic coherent signal grow in amplitude more slowly as the noise is averaged out due to its random nature, whereas channels that do contain periodic coherent signals gain in amplitude faster due to their non-random nature, thereby increasing the signal-to-noise ratio.
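Quantitatively (a standard property of averaging independent noise, stated here for completeness rather than taken from the original text): if N power spectral density strips are accumulated, the random fluctuations in each channel average down roughly as 1/\sqrt{N} while a coherent signal's contribution remains fixed, so

\mathrm{SNR}(N) \approx \sqrt{N} \cdot \mathrm{SNR}(1),

with the exact constant depending on the receiver bandwidth and integration time (the radiometer equation expresses the same scaling).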
The auto-correlation of a signal can be used to evaluate the signal's power spectral density (PSD) via the Wiener-Khinchin theorem, which states that "the power spectral density of a wide-sense-stationary random process is the Fourier transform of the corresponding auto-correlation function" [9]. Hence the power spectral density S_{xx}(f) of a signal x(t) is given by:

S_{xx}(f) = \int r_{xx}(\tau) \, e^{-i 2\pi f \tau} \, d\tau \qquad (1)

where

r_{xx}(\tau) = \int_{-\infty}^{\infty} x(t) \, x(t - \tau) \, dt \qquad (2)
is called the auto-correlation function of x(t). From the above it is readily observable that auto-correlation is closely related to the concept of convolution; that is, the received digital signal is simply convolved with itself to produce the power spectral density. Convolving the signal with itself is difficult if not impossible when one considers that the time series digital data arising from the signal function x(t) in most cases runs over a very long time period τ. Essentially, a greater time period τ equates to more storage requirements as well as more time taken to produce the final analysis of results. Since τ can represent a very long period of time (in some cases hundreds of hours) it is impractical to delay computations until t = τ. To reduce the complexity of convolution and allow computations to occur immediately, the time series samples from the digitized signal need to be transformed from the time domain to the frequency domain. This transformation is accomplished using a Fourier transform. The frequency domain allows samples belonging to the same time interval t to be multiplied, and consequently permits computations to occur immediately.

In practice, once a suitable channel resolution n is selected, 2n contiguous samples are fast Fourier transformed, which produces 2n frequency domain samples. Each frequency domain sample is multiplied by its complex conjugate, then half of the samples (n of them, the first or second half depending on the desired Nyquist window [10]) are kept, resulting in the PSD. Naturally, if the PSD is inverse fast Fourier transformed, the result is the auto-correlation of the signal, in accordance with the Wiener-Khinchin theorem. The entire power spectral density for the time period τ is obtained by integrating and averaging. Integration and averaging involves element-wise addition of each power spectral density strip of n frequency domain samples, followed by element-wise division by the number of PSD strips accumulated so far. At any time t < τ the obtained power spectral density is the best estimate so far.
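To make the per-block processing just described concrete, the following minimal sketch (plain C, illustrative only) performs the spectrometer arithmetic: transform a block of 2n samples, multiply each frequency-domain sample by its complex conjugate, keep one Nyquist window of n channels, and integrate and average the strips. A naive O(N^2) DFT stands in for the optimized FFT that the actual pipeline runs on the Cell BE, and the channel count, block count and toy input tone are arbitrary choices for illustration.

    /* Illustrative sketch of the auto-correlation spectrometer arithmetic.
       A naive O(N^2) DFT stands in for the optimized FFT used in the real
       pipeline; channel count and input tone are arbitrary.
       Build with: cc -std=c99 psd_sketch.c -lm */
    #include <complex.h>
    #include <math.h>
    #include <stdio.h>

    #define PI     3.14159265358979323846
    #define N_CHAN 4                 /* channel resolution n            */
    #define BLOCK  (2 * N_CHAN)      /* 2n time-domain samples per FFT  */

    /* Naive discrete Fourier transform of one real block. */
    static void dft(const double *x, double complex *X)
    {
        for (int k = 0; k < BLOCK; ++k) {
            X[k] = 0.0;
            for (int t = 0; t < BLOCK; ++t)
                X[k] += x[t] * cexp(-2.0 * PI * I * k * t / BLOCK);
        }
    }

    int main(void)
    {
        double psd[N_CHAN] = {0.0};  /* running (integrated) PSD estimate */
        int strips = 0;              /* number of PSD strips accumulated  */

        /* Pretend these 2n-sample blocks arrive from the digitizer stream. */
        for (int block = 0; block < 1000; ++block) {
            double samples[BLOCK];
            double complex spectrum[BLOCK];
            for (int t = 0; t < BLOCK; ++t)  /* toy tone, 1/8 cycle per sample */
                samples[t] = cos(2.0 * PI * 0.125 * (block * BLOCK + t));

            dft(samples, spectrum);

            /* Multiply each frequency-domain sample by its complex conjugate
               and keep the first n bins (one Nyquist window), integrating
               element-wise into the running PSD. */
            for (int k = 0; k < N_CHAN; ++k)
                psd[k] += creal(spectrum[k] * conj(spectrum[k]));
            ++strips;
        }

        /* Element-wise division by the strip count gives the running average,
           the best PSD estimate obtained so far. */
        for (int k = 0; k < N_CHAN; ++k)
            printf("channel %d: %f\n", k, psd[k] / strips);
        return 0;
    }

For this noiseless toy input the averaged spectrum concentrates its power in a single channel, as expected for a coherent tone; in a real observation the remaining channels would carry the integrated noise floor.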
1.3 HPC Requirements for Modern Radio Telescopes

The Square Kilometer Array (SKA) will comprise thousands of antennae generating up to 80Gbps of data per antenna, resulting in an aggregate data rate of 100 Tbps to 1 Pbps [5]. In order to transport and analyze the huge volume of data produced by a complex radio telescope system such as the SKA, parallel software executed by high performance computing and networking facilities capable of processing and transporting huge volumes of data at very high rates is required. It is not feasible, with respect to both cost and time, to store and later process such an enormous volume of data. Intuitively it is desirable to transport and process the data flow in real-time, or as close as possible to real-time, in order to avoid data storage and produce the scientific results in a timely and cost effective manner.

Traditionally, high performance real-time computing systems are purpose-built hardware systems, especially for applications such as real-time signal processing that involve high data flow rates. Purpose-built computing systems achieve optimization, however this comes at an expense associated with
customization and specialization. Also considering that technology continuously evolves, computing systems must eventually be upgraded, and upgrading purpose-built computing systems usually leads to non-recoverable engineering costs [11]. This expense can be mainly attributed to the rigid-design nature of purpose-built hardware systems. It is much more economically favorable to construct high performance computing systems and networking facilities by interconnecting commodity computing hardware resources capable of running parallel software [37]. Software design is flexible in nature and is therefore generally more cost effective to construct and maintain in comparison to purpose-built hardware. Although software design is flexible, one should also aim to minimise the need for redesign. This implies that parallel software should ideally be designed in a way that is agnostic to the type of hardware architecture, so that significant re-design is avoided if the software, having been constructed for a specific commodity hardware architecture, must later move to a superior yet different architecture. Furthermore, it is also important to consider that some commodity computing hardware architectures are more suited to certain problem types than others, thus constructing computing clusters that offer a variety of commodity hardware architectures may allow for some optimizations to be made. Ultimately the parallel programming environment responsible for executing the parallel software should be able to dynamically optimize its deployment or reconfiguration in response to changes that may arise anywhere from the operating environment to the science requirements [12]. The demand for cutting-edge high performance computing and networks has led to modern radio astronomy telescope antenna arrays being referred to as digital IT telescopes.
Chapter 2
Heterogeneous Computing & Parallel Middleware

2.1 Heterogeneous Computing Systems

Heterogeneous computing is the utilisation of different hardware architectures to process and manage computations. This is achieved in a micro sense by integrating different CPU architectures onto a single silicon wafer, or in a macro sense by interconnecting different CPU architectures. Combining different hardware architectures on such a broad spectrum offers a dynamic computing infrastructure capable of providing specialised computational services best suited for the current problem environment [13].

Sony, Toshiba and IBM Cell Broadband Engine (STI Cell BE) [14]

One prominent example of a heterogeneous architecture at the micro-level is the STI Cell BE. The Cell BE is a single-chip multi-core CPU that is comprised of nine processing elements (each element contains a single homogeneous processing unit) which can share or access the entire available memory. Moreover the Cell BE processor (see Figure 2.1) is considered to be a micro heterogeneous computing architecture since the processing elements are specialized into two types and distributed into one Power Processing Element (PPE) and eight Synergistic Processing Elements (SPEs).
Figure 2.1 Sony Toshiba IBM Cell Broadband Engine. (Image courtesy of Érick Luiz Wutke Ribeiro)
The PPE encapsulates a 64-bit, dual-threaded PowerPC processing unit intended for control-intensive operations. In contrast, each SPE encapsulates a 128-bit single-core RISC processing unit that supports single-instruction multiple-data (SIMD) operations and is intended for running compute-intensive operations. The SPEs are processing elements independent of each other, and each SPE executes a single-threaded program at a time. Each SPE has complete access to main memory via multiple direct memory access (DMA) transfers. There is however a bidirectional dependency between the PPE and the SPEs: the SPEs rely upon the PPE to run the operating system, or more generally the main thread of control for an application, whilst the PPE delegates the compute-intensive operations required by the application to the SPEs. Both the PPE and SPEs can be programmed using high-level languages, for example C/C++. The SPE's instruction set provides comprehensive support for SIMD operations. The use of SIMD operations is not compulsory, however it is advantageous to use them whenever possible for significant computational performance gains. To further emphasise SIMD, the PPE's standard PowerPC instruction set is also augmented with vectorized SIMD operations, complementing the instruction-level parallelism provided by the SPEs' 128-bit architecture.
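A rough, hypothetical sketch of this programming model is shown below: an SPE-side routine DMAs a block of samples from main memory into the local store and then scales it with SIMD multiplies. It assumes the Cell SDK headers spu_mfcio.h and spu_intrinsics.h and the spu-gcc toolchain; the buffer size, DMA tag handling and scale factor are illustrative choices only and are not taken from the FFT server described later in this dissertation.

    /* Hypothetical SPE-side routine: fetch a chunk of samples from main
       memory into the local store via DMA, then scale it with SIMD multiplies.
       Assumes the Cell SDK (spu_mfcio.h, spu_intrinsics.h) and spu-gcc. */
    #include <spu_mfcio.h>
    #include <spu_intrinsics.h>

    #define CHUNK 4096  /* bytes per DMA transfer (at most 16 KB, 16-byte aligned) */

    static float ls_buffer[CHUNK / sizeof(float)] __attribute__((aligned(128)));

    void scale_chunk(unsigned long long effective_address, unsigned int tag)
    {
        /* Queue an asynchronous DMA get from main memory into the local store;
           up to 16 such transfers may be in flight per SPE. */
        mfc_get(ls_buffer, effective_address, CHUNK, tag, 0, 0);

        /* Block until every transfer issued with this tag has completed. */
        mfc_write_tag_mask(1 << tag);
        mfc_read_tag_status_all();

        /* Process four single-precision floats per SIMD instruction. */
        vec_float4 scale = spu_splats(2.0f);
        vec_float4 *v = (vec_float4 *)ls_buffer;
        for (unsigned int i = 0; i < CHUNK / sizeof(vec_float4); ++i)
            v[i] = spu_mul(v[i], scale);
    }

In the dissertation's pipeline the computation performed on the fetched block is of course the FFT rather than a simple scaling, and double-buffered DMA would normally be used so that transfers overlap with computation.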
Currently the Cell/B.E. ranks among the highest of CPUs in terms of computational performance per unit of power, measured in flops/watt. This is mainly due to the Cell/B.E.'s multi-core specialization that arises from its heterogeneous architecture. From a programmer's perspective the Cell/B.E. provides a single processing core with two threads, and eight additional specialized processing cores, each having a single thread of execution and its own local store (LS) of 256KB. The most significant difference from conventional multi-threaded programming arises from how the SPEs access main memory. The PPE accesses main memory just as any conventional CPU, by using standard load and store instructions, which transfer data from main memory to a private register file whose contents can be cached if needed. The SPEs, however, access main memory via DMA commands that transfer data between main memory and the SPE's LS. An SPE may have up to 16 asynchronous DMA transfers in progress at any one time; enabling multiple asynchronous DMA transfers between an SPE's LS and main memory allows an SPE to transfer data faster to and from its LS. The main motivation for this mechanism is to overcome the memory latency that hinders the performance of software. Memory latency, measured in processor cycles, has been increasing dramatically over the years, and software performance is predominantly adversely affected by the CPU idling for up to several hundred cycles when a load instruction misses the local caches, rather than by insufficient computational resources.

Blade Systems [15]

A common technique to achieve heterogeneous computing on a macro-level is the utilisation of Blade Systems. A Blade Server is a functional computer with minimal components in comparison to a standard rack-mount server; at the very least it requires power and networking components. A Blade Center is a chassis capable of holding multiple Blade Servers (see Figure 2.2).
The chassis provides non-computing services, for example power, networking, storage, various interconnects and management. Essentially a Blade System is designed to save space and minimise power usage by removing non-computing service components and placing them in one place where they can be shared by Blade Servers.
Figure 2.2 (a) IBM Blade Center H Chassis. (b) IBM HS-12 Blade Server. (c) IBM QS-22 Blade Server. (Images courtesy of IBM)
Blade Centers can also be interconnected to form more complex systems. Moreover, a Blade Center may enclose Blade Servers of different hardware architectures, as in IBM's RoadRunner heterogeneous Tri-Blade architecture [16]. Fundamentally, the modular approach adopted by Blade Systems allows for a more diverse supercomputing infrastructure that is heterogeneous, scalable and upgradable. This dissertation utilizes a Blade Center that encloses Blade Servers of two hardware architectures: Intel Xeon and Cell BE.
2.2 Parallelization Middleware

Middleware is software that lies between application code and the run-time infrastructure. Parallelization middleware is a software framework that allows a compute-intensive workload to be distributed across, and executed by, multiple computing resources. Usually the computing resources are a cluster of interconnected computer nodes, however they can also be a single computing node
capable of performing simultaneous tasks. The main intention of distributing the workload is that it can be performed in parallel, achieving a speed-up in overall workload execution. There are many types of parallelization frameworks: some offer an API with supporting libraries, while others offer more comprehensive frameworks comprising a parallel programming language along with a parallel execution environment. Conventionally, parallelization from a programming language aspect is achieved in an explicit and/or implicit fashion [17]. In explicit parallel programming the responsibility of scheduling which parts of the parallel program execute on which physical computing nodes is entirely up to the programmer, which allows flexibility at the expense of more programming effort. For example MPI (Message Passing Interface) [18] provides an API and libraries for constructing parallel programs that execute on multiple physical computing nodes. Implicit parallel programming offers the programmer a set of compiler directives for identifying parallel sections of the program; whilst this reduces the programming overhead, it does so at the expense of flexibility. OpenMP [19] is an example of implicit parallel programming; however OpenMP is limited to a single physical computing node capable of executing concurrent processes, and OpenMP compiler directives can only be applied to certain constructs such as for loops. Usually a combination is used: an explicit parallelization technique to distribute tasks to nodes, and an implicit technique to achieve parallelization within a node (a minimal sketch of the implicit style follows below).

In this dissertation we conduct an initial probe into the possibility of utilizing a parallel computing middleware capable of dynamically scheduling and executing computations on a heterogeneous computing system, as a possible solution for handling the current and future high performance computing needs of modern real-time radio telescope antenna systems.
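As a point of reference for the implicit, directive-based style just mentioned, the following generic OpenMP sketch (illustrative only, not code from this dissertation) marks a single for loop as parallel; the runtime then splits its iterations across the threads available on one node.

    /* Minimal sketch of implicit parallelization with an OpenMP directive.
       Build with: cc -fopenmp openmp_sketch.c */
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double x[N], y[N];
        double sum = 0.0;

        for (int i = 0; i < N; ++i)
            x[i] = (double)i;

        /* One compiler directive marks the loop as parallel; the iterations
           are divided among the threads of this single node. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; ++i) {
            y[i] = 2.0 * x[i];
            sum += y[i];
        }

        printf("sum = %f\n", sum);
        return 0;
    }

Compiled without the OpenMP flag the directive is simply ignored and the loop runs serially, which illustrates the low programming overhead of the implicit style; the price is less control over work placement than explicit message passing with MPI offers.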
The chosen parallel computing middleware, known as InfoSphere Streams, offers a comprehensive framework that gives the programmer both explicit and implicit ways of implementing a parallel program using a stream-centric programming paradigm. A stream-centric approach is a departure from traditional processing, which involves posing queries to more or less static data. A stream-centric computing paradigm supposes the data set is continuously changing over time, hence continuous results are produced from long-running queries (see Figure 2.3). InfoSphere Streams takes the stream-centric paradigm a step further by allowing modification of long-running queries over time [20].
Figure 2.3 (a) Static data (store and process later). (b) Streaming data (process data immediately as it arises). (Image courtesy of IBM)
2.3 InfoSphere Streams

InfoSphere Streams (or Streams for short) is a data stream management system (DSMS) middleware designed to ingest, filter, analyze and correlate enormous amounts of data incoming from an unlimited number of data stream sources. Streams' intention is to enable organizations and institutions to rapidly respond to their respective changing environments without the need to store and later process huge amounts of data. Streams aims to fulfil its intention by achieving the following objectives [21]:

• The ability to scale up or cluster to a wide variety of hardware architectures as the demand for more processing power increases.
• Provide automation for the handling of data streams that is responsive to changing user requirements as well as data and system resource availability.
• Incremental tasking for changing data schemes and types.
• Secure transmission of data streams at all system levels, along with comprehensive auditing of the execution environment.