FAST MULTIDIMENSIONAL IMAGE PROCESSING WITH OPENCL

Daniel Oliveira Dantas, Helton Danilo Passos Leal, Davy Oliveira Barros Sousa

Universidade Federal de Sergipe
Departamento de Computação

ABSTRACT

Multidimensional image data, i.e., images with three or more dimensions, are used in many areas of science. Multidimensional image processing is supported in Python and MATLAB. VisionGL is an open source library that provides a set of image processing functions and can help the programmer by automatically generating code. The objective of this work is to augment VisionGL by adding multidimensional image processing support with OpenCL, for high performance through the use of GPUs. Benchmarking experiments were run with window and point operations to compare Python, MATLAB and VisionGL when processing 1D to 5D images. As a result, speedups of up to two orders of magnitude were obtained.

Index Terms: Multicore processing, OpenCL

1. INTRODUCTION

Multidimensional image data, i.e., images with three or more dimensions, are used in many areas of science, such as astronomy, remote sensing, medical imaging, and optical and electron microscopy. Hyperspectral astronomical images [1] and satellite images [2] can have hundreds of bands, where each band represents a different wavelength. Medical images obtained by Computerized Tomography (CT), Positron Emission Tomography (PET) and Magnetic Resonance Imaging (MRI) scans usually have three dimensions. Flow MRI adds a dimension that stores fluid flow direction in three spatial axes [3]. Optoacoustic imaging generates three-dimensional images plus an additional wavelength dimension [4]. Current microscopy techniques can generate three-dimensional images plus a wavelength dimension [5]. Thus, by adding a time dimension to these images, we reach four or even five dimensions.

Multidimensional data processing is supported by tools that are either proprietary, such as MATLAB, or free, such as Python.
MATLAB supports multidimensional matrices and images, and can perform convolutions, morphological operations and many other operations on both CPU and GPU [6]. However, when using GPUs, MATLAB supports only two-dimensional kernels or structuring elements, and requires NVIDIA GPUs. Multidimensional image support in Python is provided by the scipy.ndimage library and the numpy.ndarray data type. Many image filters are available, such as convolution, median, morphological filters and others [7].

VisionGL [8, 9] is a freely available library that processes two- and three-dimensional data with GPUs. It has a wrapper code generator to ease the task of creating functions that run on GPUs. It also tracks and automatically controls the transfer between RAM and GPU contexts to eliminate unnecessary transfer costs. The library can process two-dimensional images with GLSL, CUDA or OpenCL functions. Three-dimensional images are supported only by the OpenCL module of the library.

The objective of this work is to augment VisionGL to support multidimensional image processing with OpenCL. Benchmarking experiments were run with window and point operations to compare Python, MATLAB and VisionGL performance when processing 1D to 5D images. As a result, speedups of up to two orders of magnitude were obtained.

This paper presents, in Section 2, a generalization of mathematical morphology structuring elements to n dimensions. Section 3 describes the methodology used to implement support for four and higher dimensions. Section 4 presents the results obtained and Section 5, the conclusion.

2. MATHEMATICAL MORPHOLOGY

Mathematical morphology defines a collection of useful image processing operations. Dilation is one of them [10]. Dilation in grayscale images is defined as follows.

$$(f \oplus S)(x) = \max\{ f(x + z) : z \in S \} \qquad (1)$$

where $f$ is a grayscale image, $S$ is a set of pixel coordinates centered at the origin, $x$ is the n-dimensional coordinate vector of a pixel in the input image, and $z$ is the n-dimensional coordinate vector of a pixel in $S$. A two-dimensional horizontal line of size 3 can be represented by the set {(−1, 0), (0, 0), (1, 0)}, a vertical one by the set {(0, −1), (0, 0), (0, 1)}, and an elementary cross by the set {(−1, 0), (1, 0), (0, 0), (0, −1), (0, 1)}.

A common structuring element is the box. Let $A = \{-1, 0, 1\}$; the two-dimensional box $B_2$ is given by the Cartesian product $A \times A$. The box can be generalized to any dimension $n$ as $B_n = A^n$, with cardinality $3^n$.
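As an illustration of Eq. (1), the following minimal Python sketch dilates a tiny 2D image by the line, cross and box structuring elements given above. It is not part of VisionGL: the function names `box` and `dilate`, the dictionary-based image representation, and the choice to ignore neighbors that fall outside the image are assumptions made only for this example.

```python
from itertools import product

A = (-1, 0, 1)

def box(n):
    """n-dimensional box B_n = A^n as a set of coordinate tuples."""
    return set(product(A, repeat=n))

def dilate(image, shape, se):
    """Grayscale dilation of Eq. (1) on an image stored as {coordinate: value}.

    Assumption: neighbors that fall outside the image are ignored.
    """
    out = {}
    for x in product(*(range(s) for s in shape)):
        neighbors = (tuple(a + b for a, b in zip(x, z)) for z in se)
        out[x] = max(image[p] for p in neighbors if p in image)
    return out

# Structuring elements from the text, as (x, y) offsets.
h_line = {(-1, 0), (0, 0), (1, 0)}
cross = {(-1, 0), (1, 0), (0, 0), (0, -1), (0, 1)}

# 3x3 image with a single bright pixel in the center; coordinates are (x, y).
shape = (3, 3)
image = {x: 0 for x in product(range(3), range(3))}
image[(1, 1)] = 9

by_line = dilate(image, shape, h_line)    # bright pixel spreads along x only
by_cross = dilate(image, shape, cross)    # spreads to the 4-neighbors
by_box = dilate(image, shape, box(2))     # spreads to the whole 3x3 image
```

Because each of these structuring elements is symmetric, the bright pixel spreads into exactly the shape of the structuring element.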
The dilation by the box can be optimized by separating it into n dilations by linear structuring elements of size 3, each one along a different axis from dimension 1 to n. With this separation, the total number of pixels in the structuring elements used grows linearly with n instead of exponentially ($3n$ versus $3^n$ pixels) [11]. In the 1D dilation by the box, the optimization has no effect, as a single structuring element is used, which is exactly the 1D box. An optimized 2D dilation by the box uses two structuring elements, one 1x3 and the other 3x1. The optimized 3D dilation uses three structuring elements: 3x1x1, 1x3x1 and 1x1x3. Higher dimensions are analogous.

Another common structuring element is the cross, defined in two dimensions as the pixel at the origin plus its neighbors in the 4-adjacency, totaling 5 pixels. Just like the box, it can be generalized to other dimensions. For the sake of comparison, we define the 1D cross as equal to the set $A$, with 3 pixels. The two- and higher-dimensional cross can be defined as the union of the 4-neighborhoods of a pixel in all possible orientations. An n-dimensional cross is denoted by $C_n$ and has $2n + 1$ pixels. Below are some $C_n$ examples:

$$
\begin{aligned}
C_1 &= \{(0), (\pm 1)\} \\
C_2 &= \{(0, 0), (\pm 1, 0), (0, \pm 1)\} \\
C_3 &= \{(0, 0, 0), (\pm 1, 0, 0), (0, \pm 1, 0), (0, 0, \pm 1)\} \\
&\;\;\vdots \\
C_n &= \{(0, \ldots, 0), (\pm 1, 0, \ldots, 0), \ldots, (0, \ldots, 0, \pm 1)\}
\end{aligned}
\qquad (2)
$$
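The separation of the box and the construction of the cross can be checked on a small example. The Python sketch below is only illustrative and is not the VisionGL implementation: the helper names (`line`, `cross`, `dilate_separated_box`), the dictionary-based image and the policy of ignoring out-of-image neighbors are assumptions of this example.

```python
from itertools import product

def box(n):
    """n-dimensional box B_n with 3**n pixels."""
    return set(product((-1, 0, 1), repeat=n))

def line(n, axis):
    """Linear structuring element of size 3 along one axis of an n-D space."""
    return {tuple(d if k == axis else 0 for k in range(n)) for d in (-1, 0, 1)}

def cross(n):
    """n-dimensional cross C_n: the origin plus +-1 along each axis."""
    return set().union(*(line(n, axis) for axis in range(n)))

def dilate(image, se):
    """Grayscale dilation; out-of-image neighbors are ignored (assumption)."""
    out = {}
    for x in image:
        neighbors = (tuple(a + b for a, b in zip(x, z)) for z in se)
        out[x] = max(image[p] for p in neighbors if p in image)
    return out

def dilate_separated_box(image, n):
    """Dilation by B_n computed as n successive dilations by lines of size 3."""
    for axis in range(n):
        image = dilate(image, line(n, axis))
    return image

# Small 4x3x2 test image with arbitrary but deterministic values.
shape = (4, 3, 2)
image = {x: (7 * x[0] + 13 * x[1] + 29 * x[2]) % 31
         for x in product(*(range(s) for s in shape))}

same = dilate(image, box(3)) == dilate_separated_box(image, 3)

# Pixels visited per output pixel: full box, separated box, cross.
for n in range(1, 6):
    print(n, len(box(n)), 3 * n, len(cross(n)))
```

On this rectangular test image the separated dilation gives the same result as the dilation by the full box, while visiting $3n$ instead of $3^n$ neighbors per output pixel.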
In this paper, the morphological operations used to obtain the benchmarks were the dilation by the separated n-dimensional box and the dilation by the n-dimensional cross.

3. METHODS

The VisionGL library provides a set of tools that help in using GPUs to process images. Wrapper code generation and context tracking are two features of the library. It supports image processing on GPUs using the languages GLSL, CUDA and OpenCL. To add multidimensional image support to VisionGL, OpenCL was chosen because it is more flexible than GLSL and has much broader compatibility than CUDA, which runs only on NVIDIA GPUs.

To handle images and textures, OpenCL has the built-in data types image2d_t and image3d_t, called OpenCL Images. These data types are useful for image processing. With them, clamp-to-edge can be done automatically, a useful feature for window operations such as convolutions and morphological filters. Access to pixels by their coordinates is provided by the built-in function read_imagef. Another feature is the possibility of storing a one-, two- or four-channel pixel in a variable and operating on it.
Drawbacks of the OpenCL Image are that it supports data with no more than three dimensions, and that image width, height and depth are limited to a few thousand pixels each. So an alternative is required for working with big images.

An alternative to the OpenCL Image is the OpenCL Buffer. The Buffer does not support channels, automatic clamping, or built-in pixel access through pixel coordinates. The Buffer is stored as a linear array and, as such, the shader has to calculate the linear pixel index from its coordinates. Despite these disadvantages, the supported size is big enough to store multidimensional images as desired, so it suits our needs.

As a proof of concept, we added two kinds of image operations to the VisionGL library: point and window operations. Point operations are very simple, as the output pixel depends on a single input pixel. Window operations on a pixel are more complicated, and require looking up the values of its neighbors by calculating their indices from coordinates. Listing 1 shows an implementation of a multidimensional convolution in OpenCL.

We store images with up to 10 dimensions in raster format inside a one-dimensional array. Support for more dimensions requires recompiling the library. Dimension 1 is width, dimension 2 is height, dimension 3 is depth and so on. Dimension 0 can be used to represent image channels. The variable img_shape (Listing 1, line 8) of type VglClShape stores the shape of the image. The shape structure stores the number of dimensions, the total image size, the size of each dimension and the offset needed to move one pixel along each dimension. The offsets are precalculated to save time when doing window operations on the GPU. If we have a pixel $p$ with coordinates $c = (c_1, c_2, \ldots, c_n)$ and offsets $o = (o_1, o_2, \ldots, o_n)$, the linear index of the pixel $p$ is $c_1 o_1 + c_2 o_2 + \ldots + c_n o_n$.
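The index arithmetic can be sketched in a few lines of Python. This is an illustration only, not VisionGL code: the function names are hypothetical, the lists are 0-based (entry 0 is the width, whereas the text numbers the width as dimension 1), and a single-channel image is assumed, so that moving one pixel along the width is a step of 1 in the linear array.

```python
def offsets(shape):
    """Offsets for a raster layout where the first dimension (width) varies fastest.

    Assumption: one channel, so the offset of the width dimension is 1.
    """
    offs = []
    step = 1
    for size in shape:
        offs.append(step)
        step *= size
    return offs

def linear_index(coords, offs):
    """Linear index c_1*o_1 + c_2*o_2 + ... + c_n*o_n."""
    return sum(c * o for c, o in zip(coords, offs))

def coordinates(index, offs):
    """Inverse mapping: peel off coordinates from the highest dimension down."""
    coords = [0] * len(offs)
    rem = index
    for d in range(len(offs) - 1, -1, -1):
        coords[d], rem = divmod(rem, offs[d])
    return tuple(coords)

shape = (256, 256, 2, 24, 7)           # a 5D image shape
offs = offsets(shape)                  # [1, 256, 65536, 131072, 3145728]
idx = linear_index((10, 20, 1, 3, 5), offs)
back = coordinates(idx, offs)          # recovers (10, 20, 1, 3, 5)
```

The inverse mapping uses the same divide-and-subtract scheme, from the highest dimension down, that a shader needs in order to turn its linear work-item index back into pixel coordinates.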
The variable win (Listing 1, line 9) of type VglClStrEl stores the convolution window, or the structuring element when doing a morphological operation. This structure stores the same kind of information as VglClShape, but related to the geometry of the window. It also stores a data array of floats with the weights of the convolution window, or a nonzero value where the pixel belongs to the structuring element. Notice that in Listing 1, line 28, there is a test that checks whether a pixel of the convolution window is zero. The same kind of test allows the morphological operations to skip pixels not belonging to the structuring element.

3.1. Wrapper code generation

The VisionGL library has an automatic wrapper code generator that generates a C++ function and its prototype from the OpenCL shader source code. We will refer to functions that run on the GPU as shaders. Usually, each parameter in the shader maps to its counterpart in the wrapper function. Images in the shader are mapped to images of type VglImage in the wrapper function; scalars are mapped to scalars; and structuring elements of type VglClStrEl are mapped to structures of type VglStrEl. Arrays are also mapped to arrays, but require an ARRAY directive inside the shader specifying the array size. Inside the wrapper code, all API calls needed to execute the shader are made, including shader compilation, parameter specification, upload of image data to GPU memory when necessary, enqueueing of shader execution and specification of the so-called work size.

When dealing with two- and three-dimensional OpenCL Images, the image shape does not matter, as access to a pixel neighborhood is done transparently by calls to the built-in function read_imagef. Automatic clamping, i.e., truncating coordinates when accessing pixels outside the image, is also provided. On the other hand, multidimensional images are stored in an OpenCL Buffer, and there is no way to infer their shape. So the implementation of window operations requires the multidimensional image shape to be passed as a parameter to the shader. A new directive called SHAPE was added to specify from which VglImage received as input the shape will be obtained. The shape parameter appears only in the shader, being omitted from the wrapper function parameter list. Line 2 of Listing 1 shows an example of the SHAPE directive. The first parameter indicates the variable name in the OpenCL shader, and the second gives the C++ expression that should be used in the wrapper function to generate the shape structure.

     1  /** N-dimensional convolution */
     2  //SHAPE img_shape (img_input->vglShape->asVglClShape())
     3  #include "vglClShape.h"
     4  #include "vglClStrEl.h"
     5  __kernel void vglClNdConvolution(
     6      __global unsigned char* img_input,
     7      __global unsigned char* img_output,
     8      __constant VglClShape* img_shape,
     9      __constant VglClStrEl* win)
    10  {
    11      int coord = ( get_global_id(2) * get_global_size(1) * get_global_size(0) ) +
    12                  ( get_global_id(1) * get_global_size(0) ) + ( get_global_id(0) );
    13      int irem = coord;
    14      int idim, i, d, off;
    15      float result = 0.0;
    16      int img_coord[VGL_ARR_SHAPE_SIZE];
    17      int win_coord[VGL_ARR_SHAPE_SIZE];
    18      for(d = img_shape->ndim; d >= 1; d--)  // linear index -> coordinates
    19      {
    20          off = img_shape->offset[d];
    21          idim = irem / off;
    22          irem = irem - idim * off;
    23          img_coord[d] = idim - ((win->shape[d] - 1) / 2);  // window origin
    24      }
    25      int conv_coord = 0;
    26      for(i = 0; i < win->size; i++)
    27      {
    28          if (!(win->data[i] == 0))  // skip zero weights
    29          {
    30              irem = i;
    31              conv_coord = 0;
    32              for(d = img_shape->ndim; d > win->ndim; d--)
    33              {
    34                  conv_coord += img_shape->offset[d] * img_coord[d];
    35              }
    36              for(d = win->ndim; d >= 1; d--)
    37              {
    38                  off = win->offset[d];
    39                  idim = irem / off;
    40                  irem = irem - idim * off;
    41                  win_coord[d] = idim + img_coord[d];
    42                  win_coord[d] = clamp(win_coord[d], 0, img_shape->shape[d]-1);  // clamp to edge
    43                  conv_coord += img_shape->offset[d] * win_coord[d];
    44              }
    45              result += img_input[conv_coord] * win->data[i];
    46          }
    47      }
    48      img_output[coord] = result;
    49  }

Listing 1. Multidimensional convolution shader in OpenCL

3.2. Input and output

In VisionGL, multidimensional images are loaded and saved as a stack of two-dimensional images numbered sequentially. The sizes of the first and second dimensions are by default obtained from the images, but the sizes of the third and higher dimensions must be provided by the user. Saving the image is straightforward: all frames are written to disk automatically as two-dimensional images numbered sequentially. Three-dimensional image formats are optionally supported by linking to the libraries GDCM [12], DCMTK [13] or libTIFF [14].

4. RESULTS
To assess the performance of VisionGL with OpenCL Buffer for multidimensional image support, i.e., VisionGL-BUF, we compared it with tools capable of processing nD images with nD window operations. The tools that met this criterion were Python and MATLAB. Although MATLAB can optionally do image processing on NVIDIA GPUs, only 2D kernels and structuring elements are supported [6]. Python and MATLAB both ran on the CPU. VisionGL-BUF, using shaders implemented in OpenCL, ran on the GPU. The machine used to generate the benchmarks had an Intel Core i5 4590, 8 GB of RAM at 1666 MHz, and a Radeon R9 270X GPU with 2 GB of memory.

The input image used to generate the benchmarks was the mitosis-5d dataset obtained from the ImageJ website [15]. The image has 8 bits per pixel and five dimensions, 256x256x2x24x7, totaling 22 MBytes. To evaluate the costs of operations in dimensions other than the original, the same image was reshaped to 4D (256x256x2x168), 3D (256x256x336), 2D (256x86016) and 1D (22020096). All the results are shown in seconds.

In the benchmark, three kinds of window operations and five kinds of point operations were used. Window operations were evaluated in five different dimensions, from 1D to 5D, always on an image with the same number of dimensions. Let n be the dimension of the image being used. The first window operation was a convolution by an nD kernel with floating point values. The window was a mean filter with size $3^n$. Results are shown in Figure 1. The second window operation was a dilation by a box with size $3^n$ separated into n dilations by structuring elements of size 3, as explained in Section 2. Results are shown in Figure 2. The third window operation was a dilation by an nD cross, as defined in Section 2. Results are shown in Figure 3.

The five point operations tested were image negation, thresholding, copy, copy from CPU to GPU, and copy from GPU to CPU.
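For checking results on very small inputs, the mean-filter convolution used in the first benchmark can be written as a slow CPU reference. The pure-Python sketch below is not the benchmarked code and is not part of VisionGL. It assumes a single-channel image stored as a flat list with the first dimension varying fastest, clamp-to-edge borders, and a window that is applied without flipping, which makes no difference for a mean filter. The function names are hypothetical.

```python
from itertools import product

def offsets(shape):
    """Raster offsets with the first dimension varying fastest."""
    offs, step = [], 1
    for size in shape:
        offs.append(step)
        step *= size
    return offs

def convolve_nd(data, shape, weights, win_shape):
    """Clamp-to-edge convolution of a flat raster image by a flat window.

    data and weights are flat lists whose first dimension varies fastest;
    shape and win_shape must have the same number of dimensions.
    """
    offs = offsets(shape)
    out = []
    # reversed() makes the first dimension the fastest-varying one.
    for rx in product(*(range(s) for s in reversed(shape))):
        x = rx[::-1]
        acc = 0.0
        win_coords = product(*(range(s) for s in reversed(win_shape)))
        for w, rz in zip(weights, win_coords):
            if w == 0:
                continue  # skip pixels outside the window support
            z = rz[::-1]
            idx = 0
            for d in range(len(shape)):
                c = x[d] + z[d] - (win_shape[d] - 1) // 2
                c = min(max(c, 0), shape[d] - 1)  # clamp to edge
                idx += c * offs[d]
            acc += data[idx] * w
        out.append(acc)
    return out

def mean_window(n):
    """Mean filter of size 3**n as (flat weights, window shape)."""
    size = 3 ** n
    return [1.0 / size] * size, (3,) * n

weights, win_shape = mean_window(1)
smoothed = convolve_nd([0, 3, 6, 9], (4,), weights, win_shape)
# smoothed is approximately [1.0, 3.0, 6.0, 8.0]
```

The same function accepts any number of dimensions, so it can be used on a few pixels of a reshaped image to compare outputs across the 1D to 5D cases.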
Fig. 1. Convolution by nD mask in seconds.
Fig. 2. Dilation by nD separated box in seconds.
Fig. 3. Dilation by nD cross in seconds.
Fig. 4. Time of point operations in seconds.
As can be seen in Figures 1, 2 and 3, VisionGL-BUF is faster than both Python and MATLAB. In dilation by box and by cross, it is between one and two orders of magnitude faster than Python. MATLAB performs better than Python but is still slower than VisionGL-BUF by about one order of magnitude. For two-dimensional images, MATLAB shows times closer to VisionGL-BUF, probably due to some internal optimization, but is still slower. In convolution, VisionGL-BUF is much faster than Python and MATLAB in 1D, by about two orders of magnitude. As the number of dimensions grows, all performances tend to become closer to each other.

As seen in Figure 4, for the point operations of image negation and thresholding, VisionGL-BUF was one order of magnitude faster than MATLAB and two orders of magnitude faster than Python. The copy operation was compared only with Python, as MATLAB has a lazy evaluation implementation that postpones copies until necessary.

One may wonder why not use only VisionGL with OpenCL Buffer, as it supports any number of dimensions. It turns out that VisionGL with OpenCL Image, which is compatible only with 2D and 3D images, performs much better than VisionGL-BUF, as shown by the VisionGL-IMG points in Figures 1, 2 and 3.
As VisionGL was the only library tested on the GPU, it is the only one showing times of copies between CPU and GPU memory.

5. CONCLUSIONS

When faced with the need to process higher-dimensional images, the options are few, and the available ones are not very fast. As an alternative, we present here the VisionGL library, augmented with functions to load, process and save images with four or more dimensions. The proposed library has shaders written in OpenCL, compatible with most current GPUs and CPUs, accelerating image processing by up to about two orders of magnitude when compared with Python and MATLAB. VisionGL can also be easily extended. After creating a new custom OpenCL shader, its wrapper code can be automatically generated by a Perl script included in the library. The library is freely available from GitHub [16]. Future work may include functions to calculate the FFT and other transforms. Updating the library to use OpenCL 2.0 is also desirable.

Acknowledgements: The authors thank Edward Iamamoto and David Martins Jr. for their comments on the text.
6. REFERENCES

[1] N. Hagen and M. W. Kudenov, “Review of snapshot spectral imaging technologies,” Optical Engineering, vol. 52, no. 9, September 2013.

[2] M. Govender, K. Chetty, and H. Bulcock, “A review of hyperspectral remote sensing and its application in vegetation and water resource studies,” Water SA, vol. 33, no. 2, pp. 145–151, April 2007.

[3] M. Markl et al., “4D Flow MRI,” Journal of Magnetic Resonance Imaging, vol. 36, pp. 1015–1036, 2012.

[4] X. L. Deán-Ben and D. Razansky, “Adding fifth dimension to optoacoustic imaging: volumetric time-resolved spectrally enriched tomography,” Light: Science & Applications, vol. 3, 2014.

[5] N. Bonnet, “Some trends in microscope image processing,” Micron, vol. 35, pp. 635–653, 2004.

[6] “MATLAB Image Processing Toolbox,” http://www.mathworks.com/help/images/.

[7] “Python Multi-dimensional image processing,” http://docs.scipy.org/doc/scipy/reference/ndimage.html.

[8] D. O. Dantas, H. D. P. Leal, and D. O. B. Sousa, “Fast 2D and 3D image processing with OpenCL,” in ICIP ’15: IEEE International Conference on Image Processing Proceedings. 2015, pp. 4858–4862, IEEE.

[9] D. O. Dantas and J. Barrera, “Automatic generation of wrapper code for video processing functions,” Learning and Nonlinear Models, vol. 9, no. 2, pp. 130–137, 2011.

[10] E. R. Dougherty and R. A. Lotufo, Hands-on Morphological Image Processing (SPIE Tutorial Texts in Optical Engineering Vol. TT59), SPIE Publications, July 2003.

[11] P. Soille, Morphological Image Analysis: Principles and Applications, Springer, 1999, p. 57: Composition.

[12] M. Malaterre et al., GDCM Reference Manual, http://gdcm.sourceforge.net/gdcm.pdf, first edition, 2008.

[13] M. Eichelberg et al., “Ten years of medical imaging standardization and prototypical implementation: the DICOM standard and the OFFIS DICOM Toolkit (DCMTK),” in Medical Imaging 2004: PACS and Imaging Informatics, Osman M. Ratib and H. K. Huang, Eds. SPIE, 2004, vol. 5371, pp. 57–68, ISBN 0-8194-5284-X.

[14] “LibTIFF TIFF Library and Utilities,” http://www.libtiff.org/.

[15] W. S. Rasband, “ImageJ, U. S. National Institutes of Health,” http://imagej.nih.gov/ij/, 1997-2015.

[16] “The VisionGL Library,” http://github.com/ddantas/visiongl/.