Network: Computation in Neural Systems 2012, 1–20, Early Online

Network Downloaded from informahealthcare.com by Universita Studi Genova on 11/02/12 For personal use only.

Real-time simulation of large-scale neural architectures for visual features computation based on GPU

MANUELA CHESSA, VALENTINA BIANCHI, MASSIMO ZAMPETTI, SILVIO P. SABATINI, & FABIO SOLARI Department of Informatics, Bioengineering, Robotics, and Systems Engineering, University of Genoa, Italy (Received 15 June 2012; accepted 3 October 2012)

Abstract
The intrinsic parallelism of visual neural architectures based on distributed hierarchical layers is well suited to be implemented on the multi-core architectures of modern graphics cards. The design strategies that allow us to optimally take advantage of such parallelism, in order to efficiently map on GPU the hierarchy of layers and the canonical neural computations, are proposed. Specifically, the advantages of a cortical map-like representation of the data are exploited. Moreover, a GPU implementation of a novel neural architecture for the computation of binocular disparity from stereo image pairs, based on populations of binocular energy neurons, is presented. The implemented neural model achieves good performances in terms of reliability of the disparity estimates and a near real-time execution speed, thus demonstrating the effectiveness of the devised design strategies. The proposed approach is valid in general, since the neural building blocks we implemented are a common basis for the modeling of visual neural functionalities.

Keywords: visual neural model, population coding, binocular energy model, disparity computation, GPGPU

Correspondence: Manuela Chessa, Department of Informatics, Bioengineering, Robotics, and Systems Engineering, viale Causa 13, 16145, Genoa, Italy. E-mail: [email protected] ISSN 0954-898X print/ISSN 1361-6536 online/02/040001–20 © 2012 Informa Healthcare Ltd. DOI: 10.3109/0954898X.2012.737500

Introduction

The neural architectures for the coding and the estimation of complex visual features, such as binocular disparity and optic flow, have reached the required reliability and accuracy to be used in real-world situations. Nevertheless, their effective application in Computer Vision and Robotics is hampered by their poor performances in terms of execution time. The Graphics Processing Units (GPUs),



outperforming the conventional processor architectures in many fields of applications (Kloeckner et al. 2012), can be a valid solution to achieve efficient implementations of neural architectures for several real-world applications. In this paper, we analyze how such GPUs can be used to efficiently implement neural architectures for accurate visual feature estimations and real-time performances. We show that in the presence of hierarchical neural architectures with a large number of units, and a high-dimensional parameter space, it is necessary to properly choose the data structures and the related algorithms in order to obtain an effective simulation of such complex neural networks. In particular, we show that such devised design strategies produce better performances in terms of execution time and a simplified implementation of the neural models. Moreover, to avoid the development of customized programming code and the re-writing of existing functions, we decide to use high-level libraries that are the de-facto standard in the Computer Vision field, i.e. the OpenCV library,1 in order to take advantage of some implemented basic functionalities (e.g. convolutions, sums, multiplications). Since the hierarchical architectures of layers of neural units and the canonical neural computations (that operate over such layers) are a common basis for the modeling of visual neural functionalities (Douglas and Martin 1991; Serre et al. 2007; Kouh and Poggio 2008; Carandini and Heeger 2012), our approach for implementing them on GPUs is valid in general, especially when working with a large amount of data and a large parameter space. The main contributions of the paper are:
(i) To devise design strategies that allow us to fully exploit the GPU architecture for visual feature estimation through neural models, and to efficiently implement the building blocks for a software neural substrate of computational hierarchy architectures that are able to solve complex visual tasks.
(ii) To implement on GPU a novel distributed population coding based on binocular energy neurons for disparity estimation. The preliminary work for the bio-inspired estimation of binocular disparity, described in Chessa et al. (2009a), has been greatly improved in order to obtain a neural model that takes into account the canonical neural circuits. It is worth noting that the current implementation reaches near real-time performances on a single off-the-shelf GPU by simulating a neural architecture with 5.8 × 10^7 units.
The paper is organized as follows: in Section ‘‘Related works’’ we present the state-of-the-art concerning the use of GPUs for simulating neural architectures. The issues about neural computation on GPUs and the proposed design strategies are analyzed in Section ‘‘Neural computation on GPU’’. Our neural model for disparity computation and its implementation are presented in Section ‘‘Population code for disparity computation’’, and the results and the comparisons with other approaches are discussed in Section ‘‘Results’’. The conclusions are presented in Section ‘‘Conclusions’’.

Related works

In the literature, several simulation frameworks for neural architectures on GPUs are presented. The implementation on graphics cards of biologically motivated classifiers and feature descriptors, which model the ‘‘What’’ pathway of the visual cortex, is described in Woodbeck et al. (2008), Brumby et al. (2010) and Nere et al. (2011). These works demonstrate the efficacy of GPUs in simulating



such kinds of cortical networks. Moreover, a real-time implementation of Bayesian algorithms for robotic multisensory perception is described in Ferreira et al. (2011). Since the development of GPU code could be extremely challenging, several authors propose programming frameworks that work as an abstraction layer between the programmer and the graphics card API. In particular, a simulation environment for a large-scale model of spiking neurons for cortical areas V1, V4 and MT, that achieves near real-time performances, is described in Richert et al. (2011). A more general programming framework for simulating cortical models, such as convolutional networks and models of spiking neurons, is presented in Mutch et al. (2010). These two last approaches are based on Python and Matlab interfaces, respectively, thus they do not require any low level GPU programming, e.g. by addressing CUDA SDK.2 Two recent approaches provide metaprogramming techniques for the run-time GPU code generation, through Python interfaces (Kloeckner et al. 2012; Pinto and Cox 2012). Nevertheless, in these works the automatic code generation handles the interface towards the hardware details, but disregards the higher level routines and functionalities, which are still to be implemented by the user. Although in the literature there are several attempts to obtain fast implementations of such functionalities on the GPUs,3 such as the two-dimensional convolution between an image and a filter (Wang and Shi 2010), it is preferable to rely on stable and maintained libraries, in order to avoid compatibility issues. In this paper, we choose to use the OpenCV library, since it provides a large collection of both computer vision algorithms and basic functionalities, such as convolutions and per-element matrix operations, that can be used as building blocks to develop neural architectures. Moreover, such a library offers both C/C++ and Python interfaces. Here, we consider the C++ interface.
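As a minimal illustration of the kind of basic building block the text relies on, a two-dimensional convolution between an image and a filter can be sketched in plain C++; this is a stand-in for the corresponding library primitive, and the function name, storage layout, and zero-padding choice are ours, not OpenCV's:

```cpp
#include <vector>
#include <cassert>

// Minimal 2D convolution (zero-padded borders) between an image and a
// filter kernel, both stored row-major. A plain-C++ stand-in for the
// per-element library primitives discussed in the text.
std::vector<float> conv2d(const std::vector<float>& img, int w, int h,
                          const std::vector<float>& ker, int kw, int kh) {
    std::vector<float> out(w * h, 0.0f);
    int cx = kw / 2, cy = kh / 2;            // kernel center
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int j = 0; j < kh; ++j)
                for (int i = 0; i < kw; ++i) {
                    int sx = x + i - cx, sy = y + j - cy;
                    if (sx >= 0 && sx < w && sy >= 0 && sy < h)
                        acc += img[sy * w + sx] * ker[j * kw + i];
                }
            out[y * w + x] = acc;
        }
    return out;
}
```

For instance, convolving a constant image with a 3×3 box filter yields the neighborhood sum at each interior pixel, with smaller sums at the zero-padded borders.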
As we mentioned before, we present, as a case study, a neural model to compute binocular disparity. It is worth noting that many approaches in the literature address disparity estimation on GPU (Humenberger et al. 2010; Zhao and Taubin 2011; Pauwels et al. 2012), though they are not based on neural architectures. Comparisons with such approaches are presented in Section ‘‘Comparison with other approaches’’.

Neural computation on GPU

Computation in distributed networks

Visual features can be extracted from image sequences by using distributed bio-inspired architectures that resort to populations of tuned cells. In the literature, many authors analyze different approaches to design those neural populations, and to properly combine their responses in order to obtain reliable information from the visual signal (Pouget et al. 2003). In distributed representations, or population codes, the information is encoded by the activity pattern of hierarchical layered architectures of simple S and complex C units. Each S unit yields a weighted sum of the afferent input signals from the previous layer through its tuning profile, whereas each C unit performs a non-linear operation, such as squaring and maximum, over its input. Each cell in a layer is



characterized by its topographical position and by a broad tuning in its parameter space (e.g., orientation and frequency), and the selectivity to a specific feature (e.g., elemental vision attributes, such as oriented edges, direction of motion, texture, and binocular disparity (Adelson and Bergen 1991)) emerges, in the next layer, from the combination of the responses of groups of cells from the previous layer. In the literature, several models based on hierarchies of S and C units have been proposed to describe both the ventral visual pathway (‘‘What’’ stream) (Serre et al. 2007), and the dorsal visual pathway (‘‘Where’’ stream) (Chen and Qian 2004). The functional models of the intra- and inter-layer processing are based on a set of canonical neural computations that can be applied to solve different problems. The canonical neural computations that can be considered to describe the functional models of the visual cortex are summarized in the following (Kouh and Poggio 2008; Carandini and Heeger 2012):
– Linear filtering is the weighted sum performed through linear receptive fields (RFs), and it is used to describe the neural responses in different areas of the visual system, such as the primary visual cortex V1 (Movshon et al. 1978) and the Middle Temporal area MT (Rust et al. 2006).
– Energy model is a squaring operation applied on a pool of neurons characterized by the same orientation tuning but with different phase tuning. Such a model describes the phase invariance that can be observed in the complex cells of V1 (Adelson and Bergen 1985; Fleet et al. 1996).
– Divisive normalization is a mechanism that normalizes the response of a neuron by the summed activity of a pool of neurons. Such a computation is used in the retina to obtain light adaptation (Solomon et al. 2006), in area V1 (Heeger 1992), and in area MT (Simoncelli and Heeger 1998), where it explains the non-linear properties of neurons.
Normalization is also used to remove noise from the responses of the units of a population code, thus improving the quality of the encoded information (Deneve et al. 1999).
– Soft-thresholding is used to obtain narrow tuning curves, thus maintaining sensor selectivity. Such a feed-forward circuit can be considered as a valid alternative to the lateral inhibition mechanisms (Priebe and Ferster 2008).
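These four canonical computations can be sketched numerically as follows. The code is a plain-C++ illustration; the pool sizes, the semi-saturation constant sigma, and the threshold theta are hypothetical choices of ours, not values from the paper, and the paper's actual implementation operates on OpenCV GPU matrices rather than std::vector:

```cpp
#include <vector>
#include <numeric>

// Linear filtering: weighted sum of the afferent inputs through a tuning
// profile (the linear receptive field of an S unit).
float linear_filter(const std::vector<float>& input,
                    const std::vector<float>& weights) {
    return std::inner_product(input.begin(), input.end(), weights.begin(), 0.0f);
}

// Energy model: sum of squared responses over a pool of units sharing the
// same orientation tuning but different phase tuning (phase invariance).
float energy(const std::vector<float>& phase_pool) {
    float e = 0.0f;
    for (float r : phase_pool) e += r * r;
    return e;
}

// Divisive normalization: each response divided by the summed activity of
// the pool; sigma is a hypothetical semi-saturation constant.
std::vector<float> divisive_norm(const std::vector<float>& pool, float sigma) {
    float total = std::accumulate(pool.begin(), pool.end(), 0.0f);
    std::vector<float> out(pool.size());
    for (std::size_t i = 0; i < pool.size(); ++i)
        out[i] = pool[i] / (sigma + total);
    return out;
}

// Soft-thresholding: suppresses weak responses, narrowing tuning curves.
float soft_threshold(float r, float theta) {
    return r > theta ? r - theta : 0.0f;
}
```

In the hierarchy described above, linear filtering implements the S units, while the energy, normalization, and thresholding stages are the non-linear C-unit operations applied over pools of S responses.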

Design strategies for neural computation on GPU

The intrinsic parallelism of a distributed network can be efficiently mapped onto the hardware architecture of modern GPUs, since graphics cards are SIMT (Single-Instruction, Multiple-Thread) architectures. A SIMT architecture is similar to a SIMD (Single-Instruction, Multiple-Data) one, where a single instruction controls multiple processing elements. Moreover, it is worth noting that the device memory is allocated as a linear memory (NVIDIA 2012). By considering these features of the GPU architecture and the well-known advantages provided by vectorized computation techniques (Brette and Goodman 2011), the array-of-pointers data structure should be replaced by two-dimensional arrays. Thus, in this paper, we propose that, to optimally exploit the computing capabilities and the memory layout of the GPUs, both the spatial position and the parameter space of the cells’ responses of a layer should be mapped onto a two-dimensional array that resembles the cortical map organization (Everson et al. 1998) (see Figure 1).
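One way to realize such a mapping is to give each (orientation, frequency) pair its own tile of the topographic map, arranged in a grid so that the whole layer becomes one contiguous two-dimensional buffer. The addressing scheme below is an illustrative sketch under that assumption; the tile ordering and the dimensions are our own choices, not taken from the paper:

```cpp
// Illustrative cortical-map layout: all the unit responses of a layer are
// tiled into one large row-major 2D array. Each (orientation, frequency)
// pair owns a W x H tile; tiles form a grid of ORI columns by FREQ rows,
// so a GPU kernel or a single per-element library call can sweep the
// whole layer linearly.
struct MapLayout {
    int W, H;        // topographic (image) size
    int ORI, FREQ;   // parameter-space size (illustrative)

    int mappedWidth()  const { return W * ORI; }
    int mappedHeight() const { return H * FREQ; }

    // Linear offset of unit (x, y, o, f) in the row-major mapped array.
    int index(int x, int y, int o, int f) const {
        int col = o * W + x;   // tile column selected by orientation
        int row = f * H + y;   // tile row selected by frequency
        return row * mappedWidth() + col;
    }
};
```

With W = 640, H = 480, ORI = 8, FREQ = 5 (the sizes used in the paper's non-vectorized example), the mapped array is 5120 × 2400, and every canonical computation becomes a single operation over that one matrix.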


Figure 1. (Left) The standard description of the cells’ responses of a layer through an array of pointers: each location in the topographic space points to an array of responses in the parameter space. (Right) The representation of the cells’ responses of a layer suitable to be implemented on a GPU: both the spatial position and the parameter space share the same two-dimensional mapped space.

An operation, e.g. the computation of the square value, performed over all the elements in an array of pointers can be written in a non-vectorized form (see Figure 1, left) as follows, by considering the OpenCV library and the C++ language:

int x = 640, y = 480, ori = 8, freq = 5;
gpu::GpuMat* T = new gpu::GpuMat[x*y];
for(int i = 0; i < x*y; i++)
    [...]

[...]

disp->disparity_population_code(srcR, srcL, dispH, dispV);
// Device to host memory transfer
Mat dispH_host, dispV_host;
dispH.download(dispH_host);
dispV.download(dispV_host);
imwrite("dispH.png", dispH_host);
imwrite("dispV.png", dispV_host);
delete disp;
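For comparison, the vectorized counterpart of such a per-element operation sweeps one contiguous buffer in a single pass. The sketch below uses a plain CPU std::vector as a stand-in for the single large GPU matrix of the mapped representation; on the actual device, one per-element library call (or one kernel launch) over the whole mapped matrix plays the same role as this loop:

```cpp
#include <vector>

// Squaring every unit response of a layer stored as one contiguous
// row-major buffer (the "mapped space" of Figure 1, right). One linear
// sweep over a single array replaces x*y separate per-location arrays
// and, on the GPU, x*y separate kernel launches.
void square_layer(std::vector<float>& mapped) {
    for (float& v : mapped) v *= v;
}
```

The design payoff is that the number of device operations no longer grows with the number of topographic locations: the whole layer is processed by one call, which is what makes the cortical-map representation efficient on SIMT hardware.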
