FAST 2D AND 3D IMAGE PROCESSING WITH OPENCL

Daniel Oliveira Dantas, Helton Danilo Passos Leal, Davy Oliveira Barros Sousa
Universidade Federal de Sergipe
Departamento de Computação

ABSTRACT

Image processing with GPUs requires the use of an API such as OpenCL or CUDA. A higher level library that hides these APIs is a better option if the programmer does not need to fine tune or implement his own image processing operations. Among recent libraries that support OpenCL are OpenCLIPP and OpenCV, although neither supports 3D images. Using OpenCL, however, is not as simple as programming CPUs, and many API calls are needed to prepare the environment before calling the image processing function (shader). In this article we describe a library with a code generator that, from a few directives merged into a shader's source code, generates a wrapper with all the OpenCL API calls needed before calling the shader, thus simplifying the maintenance of an image processing library. The proposed library performs better than the OpenCV, CImg and ITK libraries for all the tested operators.

Index Terms— OpenCL, OpenGL, 3D image processing.

1. INTRODUCTION

Real time video processing and fast 3D image processing can be quite challenging because of the processing power needed. For specific applications, such as MPEG compression/decompression, there are integrated circuits capable of executing them. For non-specific applications, however, we rely on generic CPUs or modern programmable GPUs. New CPUs are also capable of parallel processing, and four to eight core CPUs are common today. Another option is the APU (Accelerated Processing Unit), usually made by AMD. The advantage of GPUs is their capacity for parallel processing, useful for pixel and voxel processing. There are at least three languages that can be used to program GPUs: GLSL, CUDA and OpenCL. GLSL was designed with graphics processing in mind, so it needs some hacking to be used for generic processing.
CUDA, differently from GLSL, is designed for generic processing, but is compatible only with NVIDIA GPUs. OpenCL, besides being designed to tackle generic programming problems, is compatible with most big GPU brands, multicore CPUs and also with APUs [1]. So, for both its design and its compatibility, it seems to be the best language for our purpose, which is fast image processing.

978-1-4799-8339-1/15/$31.00 ©2015 IEEE


However, using OpenCL is not as straightforward as the high level languages for CPU programming we are used to working with. A series of API calls is needed for tasks not directly related to the image processing itself. Before actually processing the image, we have to: detect devices compatible with OpenCL in the computer, initialize the OpenCL context and execution queue, compile and link the code that will run on the GPU (shader), transfer the image and parameters to the GPU, etc.

To help the programmer harness parallel processing power, many tools have been developed in recent years. OpenMP is an API with a set of compiler directives, routines and variables to specify high level parallelism; it runs on CPUs and APUs [2]. Some image and video processing libraries with GPU compatible routines have also been developed, such as GPUCV [3] and OpenVidia [4], but they are no longer actively maintained. OpenCV [5] seems to be the most widely used and actively maintained image processing library with GPU support. OpenCLIPP [6] is another example of an open source image processing library with GPU support. CUVIlib [7] is a library that supports GPU processing, although it is not free. None of them supports 3D image processing. Among the libraries that do support 3D images are ITK [8] and CImg [9]. CImg does not support GPU processing; ITK supports GPU processing of some operations. Another example is the VisionGL library [10, 11], which is open source, supports GPU processing with GLSL and CUDA, and has a tool to generate wrapper code automatically.

The objective of this work is to extend the VisionGL library [10, 11] in order to support OpenCL operators, both 2D and 3D; to facilitate the implementation of image and video processing functions, also known as shaders; to facilitate the integration of shaders written in different languages; and to maximize processing speed. We will refer to the extended library as VCL.
To reach the aforementioned goals, a wrapper code generator was created that automatically generates C wrapper code from the code written in OpenCL, augmented with some directives. The generated wrapper code is responsible for: uploading the images to the GPU only when necessary; compiling and linking the shaders on demand, only once per program execution; and calling the image processing shaders. The generated wrapper code also keeps track of the context that contains the most up-to-date version of the image, thus avoiding unnecessary transfers.

ICIP 2015

When using the VCL library, the user only needs to call a function to initialize the contexts, read one or more images, and start calling the wrapper functions. The user can alternate CUDA, GLSL and OpenCL operators without worrying about memory copies or about initializing the different contexts, just as when using OpenCV, for example. The VCL library is about as fast as the OpenCV-OpenCL module, with the advantage of generating the GPU shader wrapper code. It has fewer functions than OpenCV, but stores two-dimensional images in a format compatible with OpenCV, allowing its use when needed. A few operators were added as proof of concept and a benchmark was run to compare the running times of the VCL library with other available libraries: 2D operators were compared with OpenCV running on the CPU and with the OpenCV-OpenCL module running on the GPU; 3D operators were compared with CImg and ITK running on the CPU, and with the ITK-OpenCL module running on the GPU.

2. METHODS

The library has a structure called vglImage that has pointers to the image stored in each supported context: RAM, OpenCL, CUDA and GLSL. The supported formats are two- and three-dimensional images of one, three (RGB) and four (RGBA) channels. The library has a script that generates the wrapper code with the API calls needed before calling the shader. This wrapper code is written in the C language, and is created in such a way that all the user needs to do to apply an operation is to call a single function with its parameters. The wrapper function is responsible for compiling and linking the shader source code, sending the image and parameters to the GPU, and calling the shader. The image data is sent to the GPU only when necessary, to avoid transfer costs. Each shader is compiled and linked only when called for the first time. To support OpenCL, a link to an image in the OpenCL context was added to the vglImage structure.
Besides that, a pointer that supports three-dimensional arrays was added to store the voxels in RAM. A small array stores the size of each dimension and an integer value stores the number of dimensions. With these changes it is now possible to use three-dimensional images. Loading three-dimensional images can be done by reading a sequence of two-dimensional images (a stack), or by using a high level function that loads DICOM images with the GDCM [12] or DCMTK [13] libraries.

A new wrapper code generator script was created to generate the code needed by OpenCL shaders. High level functions that detect OpenCL compatible devices, compile and link shader code, and initialize the OpenCL environment were also added to the library. To process 3D images, some shaders were created: a 3D blur shader with a hard coded mask, an erosion shader and a convolution shader, both with parameterized masks, a threshold shader and a negation shader.

2.1. Wrapper code generation

The code is generated by a script directly from the shader source code, augmented with some directives. The output of the script is a pair of files, one with the C functions and another with the headers. There are two kinds of directives recognizable by the script. The first kind is a documentation comment compatible with doxygen, as in line 1 of Figure 1. This comment is copied verbatim to the output C source file. The second kind, the ARRAY directive, as in line 3 of Figure 1, is used to define the size of arrays passed as parameters, for example, convolution masks.

To generate the wrapper code, it is necessary to know whether an image is used for input, output or both. This information can be inferred from the access qualifiers placed before the image3d_t data type. There are three kinds of access qualifiers: read_only, write_only and read_write. Depending on the access qualifier, a different code is generated for a given image. In line 4 of Figure 1, there are two parameters used to pass images to the shader, both with access qualifiers.

The mapping between shader parameters and C language wrapper function parameters is one to one. Each parameter of the shader with type image2d_t or image3d_t is mapped to a parameter of the wrapper function with type vglImage. Parameters with type int or float are mapped to parameters of the same type in the wrapper function. Parameters with type int* or float* are also mapped to parameters of the same type, but they need an ARRAY directive to define the size of the array that will be uploaded to the GPU. Examples of these types of parameters can be seen in line 5 of Figure 1. Every parameter must be uploaded to the GPU by means of API calls.

2.2. Automatic context tracking

To track the most up-to-date context, each image stores a bitfield called InContext. In the proposed implementation, there are four valid contexts: RAM, GL, CUDA and CL. Each of these contexts is represented by a different bit. There is also a blank context, represented by all bits being equal to zero. The blank context is used when an image is allocated without storing any information in it; when the context is blank, there is no need to do memory copies. This context is used when we need to allocate an image that will later store the result of some operation. Once the operation is finished, the context will assume some value different from blank.

When an image is read from disk or captured from a camera, the data is stored in the RAM context. After an upload to a GPU context, the new context is added to the bitfield.

 1  /** Convolution of src image by mask. Result is stored in dst image. */
 2
 3  // ARRAY convolution_window [window_size_x*window_size_y*window_size_z]
 4  __kernel void vglCl3dConvolution(__read_only image3d_t img_input, __write_only image3d_t img_output,
 5          __constant float* convolution_window, int window_size_x, int window_size_y, int window_size_z){
 6      int4 p = (int4)(get_global_id(0), get_global_id(1), get_global_id(2), 0);
 7      const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
 8                            CLK_ADDRESS_CLAMP_TO_EDGE |
 9                            CLK_FILTER_NEAREST;
10      float4 result = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
11      int radius_x = floor((float)window_size_x / 2.0f);
12      int radius_y = floor((float)window_size_y / 2.0f);
13      int radius_z = floor((float)window_size_z / 2.0f);
14      int c = 0;
15      for(int k = -radius_z; k <= radius_z; k++){
16          for(int j = -radius_y; j <= radius_y; j++){
17              for(int i = -radius_x; i <= radius_x; i++){
18                  float4 pixel = read_imagef(img_input, smp, p + (int4)(i, j, k, 0));
19                  result += convolution_window[c++] * pixel;
20              }
21          }
22      }
23      write_imagef(img_output, p, result);
24  }

Fig. 1. OpenCL source code of the 3D convolution shader, augmented with wrapper generation directives.
