Efficient low-level vision program design using Sub-machine-code Genetic Programming

Giovanni Adorni, Stefano Cagnoni, and Monica Mordonini

Dip. Informatica, Sistemistica e Telematica, Università di Genova
Dip. Ingegneria dell'Informazione, Università di Parma

Abstract. Sub-machine-code Genetic Programming (SmcGP) is a variant of GP aimed at exploiting the intrinsic parallelism of sequential CPUs. The paper describes an approach to low-level vision algorithm design for real-time applications by means of SmcGP. The SmcGP-based design of two processing modules of a license-plate recognition system is considered as a case study to show the potential of the approach. The paper reports results obtained in recognizing the very low-resolution binary patterns that have to be classified in such an application, along with preliminary results obtained using SmcGP to design a license-plate extraction algorithm.

1 Introduction

In machine vision applications, and especially at the low-level image pre-processing stage, the same operations often have to be performed at one time on small sets of neighboring binary data. This happens, for example, in binary-pattern recognition or in the analysis of edge images, when the relevant feature is the presence of edges rather than their intensity. Compacting such data sets into a long computer word makes it possible to process binary image data in a parallel or quasi-parallel fashion on sequential machines. This is the approach taken by the multimedia extensions of PC processors (e.g., Intel MMX and AMD 3DNow technologies). However, designing algorithms that can effectively exploit that property to perform complex tasks is not always easy. This makes automatic code generation approaches, such as Genetic Programming (GP), very appealing.

Genetic Programming [1, 2] is an Evolutionary Computation (EC) paradigm in which individuals are programs, typically encoded by syntactic trees or, equivalently, prefix-notation functions like LISP functions. GP is usually much more computationally intensive than Genetic Algorithms (GAs), although the two evolutionary paradigms share the same basic algorithm. The higher requirements in terms of computing resources with respect to GAs are essentially due to the much wider search space and to the higher complexity of the decoding process and of the crossover and mutation operators. Therefore, there is great interest in developing new variants of GP that improve the computational efficiency of the paradigm.

Sub-machine-code GP (SmcGP) [3, 4] is a GP variant aimed at exploiting the intrinsic parallelism of sequential CPUs. Inside a sequential N-bit CPU (where typically N=32 or N=64), each bitwise operation on integers is performed by concurrently activating N logic gates of the same kind. Therefore, the application of a sequence of bitwise logical operators to an integer can be seen as the parallel execution of a program on N 1-bit operands, according to the Single Instruction Multiple Data (SIMD) paradigm. Such an approach can, on the one hand, speed up GP by making it possible to evaluate several fitness cases at the same time, if the teaching input of the fitness cases is binary. On the other hand, when the input patterns are long bit strings, SmcGP also makes it possible to process an N-bit subset of each input at one time. In this case, besides speeding up evolution, SmcGP produces programs that are intrinsically parallel and therefore also computationally very efficient when applied in practice.

We have experimented with SmcGP within a project aimed at developing a vision-based car-plate recognition system (APACHE, Automated PArking CHEck) [5, 6]. The system comprises three modules (see figure 1): i) plate detection, in which a region of interest containing a plate is extracted from the input image; ii) character extraction, in which characters and other symbols that compose the plate are isolated and rescaled; and iii) symbol recognition, in which the symbols extracted by the previous module are classified. To evaluate the potential of SmcGP in that application domain, we are using the technique to automatically re-design, and possibly computationally optimize, the three modules of the system. In the following sections we describe our approach and report and comment on the results achieved by the genetically-evolved functions.
The tests have been performed: i) on a set of traveling car images; ii) on the same sets of binary patterns on which the presently running symbol recognition module of APACHE (an LVQ neural net trained with a modified algorithm [7, 8]) had been trained.

[Figure 1 diagram. Block labels: Input Image, License-plate detection, Symbol segmentation, Symbols, Symbol Classification, Classification Output; the processing blocks are grouped under the label Vision Algorithm.]

Figure 1: Architecture of the APACHE system.
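The SIMD view of bitwise operators described in this section can be illustrated with a minimal C sketch. The Boolean function f(a, b, c) = (a AND b) XOR c and all function names are hypothetical and chosen only for illustration; N=32 is assumed, so uint32_t is used as the machine word.

```c
#include <stdint.h>

/* Hypothetical 1-bit program f(a, b, c) = (a AND b) XOR c, applied to
   32 independent 1-bit operands packed into each 32-bit word. */
static uint32_t f_parallel(uint32_t a, uint32_t b, uint32_t c)
{
    /* Each bitwise operator acts on all 32 bit positions at once. */
    return (a & b) ^ c;
}

/* The same program applied to a single 1-bit operand triple. */
static unsigned f_single(unsigned a, unsigned b, unsigned c)
{
    return (a & b) ^ c;
}

/* Counts the bit positions at which the word-parallel result differs
   from 32 separate 1-bit evaluations. */
static int count_mismatches(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t out = f_parallel(a, b, c);
    int mismatches = 0;

    for (int i = 0; i < 32; i++) {
        unsigned bit = f_single((a >> i) & 1u, (b >> i) & 1u, (c >> i) & 1u);
        if (((out >> i) & 1u) != bit)
            mismatches++;
    }
    return mismatches;
}
```

For any inputs, count_mismatches should return 0: one call to f_parallel replaces 32 single-bit evaluations, which is the source of the speed-up.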

2 License-plate recognition

The APACHE system is designed as a set of cascaded modules, each of which is responsible for one step of the process. The plate-segmentation module detects the position of the vehicle plate within the input image and extracts it based on horizontal gradient statistics; the symbol-segmentation module singles out the symbols in the plate using prior knowledge about plate syntax and recovering possible pixel disconnections through a filling algorithm; the symbol-recognition module is based on a Learning Vector Quantization neural network that classifies the symbols, after each character has been rescaled down to a [...] pixel two-dimensional binary pattern.

The system input is an 8-bit grey-level image with a resolution of [...] pixels, received from a camera that acquires a scene whose width may vary from slightly more than the width of a car (about [...] meters), in those cases when it is possible to focus on a car rear with good confidence, to the width of a road/parking-entrance lane (typically about [...] meters). The scene can be pictured from a height that may vary from [...] to [...] meters, at a distance of about [...] meters from the car location that is optimum for classification. The system output is the recognized symbol string. Figure 2 is a screenshot of the system user interface, which provides control and setup functions and displays the partial results obtained by the pre-processing modules along with the final classification results.

Figure 2: The user interface of the APACHE system. The acquired frame is shown in the upper left corner; in the image below, the detected plate area is highlighted with a rectangular box. On the right, from top to bottom: the binarized gradient image and the extracted plate area, after and before the recovery of the extracted characters by the connected-region filling algorithm, respectively. On the bottom, the textbox displays the final results of the classification.

3 Experimental setup

The GP-designed programs were evolved using lil-gp1.01 [11], a popular package that implements Koza-like (i.e., LISP-like or tree-like) GP. As shown in table 1, the function set was composed of the main bitwise boolean operators, the logical NOT (which outputs a word whose LSB is 1 and whose other bits are 0 if the argument is zero, and a word whose bits are all 0 otherwise), and a set of circular shift operators (the LSB is considered to be adjacent to the MSB) with variable shift direction (left or right) and amount (1, 2, or 4 bits). The terminal set was composed of four unsigned long integers into which the input data is encoded, and two unsigned long integer ephemeral random constants (ERCs) [1, 2], which can take values within the whole range of 32-bit unsigned integers and in the range [0,16), respectively.

Function set

Function  Arity  Notes
AND       2      bitwise AND
OR        2      bitwise OR
XOR       2      bitwise XOR
NOT       1      bitwise NOT
N32       1      logical NOT (C language ! operator)
SH        1      circular shift, left or right, by 1, 2, or 4 bits

Terminal set

Terminal  Type  Notes
pat[0]    term  input subset 1
pat[1]    term  input subset 2
pat[2]    term  input subset 3
pat[3]    term  input subset 4
R1        ERC   full-range unsigned long
R2        ERC   unsigned long in the range [0,16)

Table 1: The function and terminal sets used to evolve the programs
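As an illustration of how the primitives in table 1 map onto C, the following sketch gives one possible implementation of N32 and of the circular shifts, together with a made-up program in infix form. The names n32, rotl32, rotr32 and example_program are assumptions, not identifiers from the original code, and example_program is not an evolved result. uint32_t is used in place of unsigned long so that the word stays 32 bits wide on platforms where unsigned long is wider.

```c
#include <stdint.h>

/* N32: logical NOT, as the C ! operator. Returns 1 if x is zero,
   0 otherwise. */
static uint32_t n32(uint32_t x)
{
    return (uint32_t)!x;
}

/* Circular left shift: bits leaving the MSB re-enter at the LSB. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;                          /* keep the amount in 0..31 */
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Circular right shift: bits leaving the LSB re-enter at the MSB. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Hypothetical program over the terminal set, written in infix form.
   pat[] holds the four input words and r1 stands for an ERC value. */
static uint32_t example_program(const uint32_t pat[4], uint32_t r1)
{
    return (pat[0] & rotl32(pat[1], 1)) ^ (~pat[2] | (pat[3] & r1));
}
```

Masking the shift amount avoids the undefined behaviour of shifting a 32-bit value by 32 positions in C.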

Due to the closure requirement of GP, the output of each program, which operates on unsigned long words, is an unsigned long as well. However, what is needed is a single-bit output. If one considers the unsigned long output as the output of 32 different (even if not independent) binary functions, the output bit can be chosen as the one that maximises the fitness. Therefore, decoding each individual in the population implies two steps. In the first one, the function encoded by the corresponding tree is decoded. In the second one, each bit of the 32-bit word obtained in the first step is considered as the output of the program and fitness is evaluated accordingly. The fitness of each individual is therefore equal to the maximum fitness obtained in the second step.

Evolution was applied to a population of 1000 individuals, with a crossover rate of 65%, a mutation rate of 5% and a reproduction rate of 5%. Tournament selection was used with tournament size equal to [...]. Each program was evolved for 1000 generations. At least two runs for each program were performed. The resulting programs were converted from prefix notation to infix notation, and translated into C functions to allow for their compilation and optimization. The experiments were run on a 32-bit architecture (600 MHz Pentium III PC running Linux 2.2 using the gcc 2.95 C compiler).

3.1 SmcGP-based plate detection

The goal of this experiment was to produce a binary license-plate segmentation program that outputs 1 if the input data belongs to the license plate and 0 otherwise. The ideal output of this program is therefore a binary image in which only the pixels within the license-plate area are set to 1, while all others are set to 0. Since the evolved programs are expected to operate on binary images, a set of suitable images was produced by pre-processing 130 images acquired by the APACHE system with the same algorithm used in the license-plate recognition system. It consists of a horizontal gradient computation, followed by adaptive local thresholding on a [...] pixel neighborhood, where the threshold is set equal to the average gradient intensity within the neighborhood.

To build the training set, the plate position in the 130 images was manually annotated. A random set of 5190 sub-images of size 32 × 4 pixels was extracted from regions of the 130 images where the binarized gradient image was not null. Each example is therefore made up of four 32-bit words, in which the content of the four rows of the corresponding sub-image is encoded, and of a teaching input, whose value is 1 if the upper left pixel of the sub-image belongs to the license plate, and 0 otherwise.
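The encoding of a training example into four words plus a teaching input can be sketched as follows. This is a minimal illustration under several assumptions: the sub-image is taken to be 32 columns by 4 rows (one 32-bit word per row), images are stored row-major with one byte per pixel, the leftmost pixel goes into the most significant bit, and the names example_t and pack_example are hypothetical.

```c
#include <stdint.h>

#define SUB_W 32   /* assumed sub-image width: one bit per column */
#define SUB_H 4    /* assumed sub-image height: one word per row  */

typedef struct {
    uint32_t pat[SUB_H];   /* packed rows of the binary sub-image    */
    unsigned target;       /* teaching input: 1 = plate, 0 = no plate */
} example_t;

/* Packs the SUB_W x SUB_H window whose upper left corner is (x0, y0).
   binary_img and plate_mask are row-major, one byte per pixel, with
   img_w pixels per row; a nonzero byte means "set". The caller must
   ensure that the window lies inside the image. */
static example_t pack_example(const unsigned char *binary_img,
                              const unsigned char *plate_mask,
                              int img_w, int x0, int y0)
{
    example_t ex;

    for (int r = 0; r < SUB_H; r++) {
        uint32_t word = 0;
        for (int c = 0; c < SUB_W; c++) {
            word <<= 1;                /* leftmost pixel ends up in the MSB */
            if (binary_img[(y0 + r) * img_w + (x0 + c)])
                word |= 1u;
        }
        ex.pat[r] = word;
    }
    /* The teaching input refers to the upper left pixel of the window. */
    ex.target = plate_mask[y0 * img_w + x0] ? 1u : 0u;
    return ex;
}
```

The bit order within a word is a free design choice, provided the same convention is used when the evolved program is applied to new images.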

The fitness function was defined as [...]
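The two-step decoding described in section 3, in which each of the 32 output bits is scored as a candidate output and the best one is kept, can be sketched in C as follows. The sketch scores each bit by the number of fitness cases it classifies correctly; this scoring is an assumption made for illustration and is not necessarily the fitness function used in the experiments. All type and function names, including demo_program, are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t pat[4];   /* four packed input words          */
    unsigned target;   /* teaching input, 0 or 1           */
} fitness_case_t;

typedef struct {
    int    best_bit;   /* index of the selected output bit */
    size_t best_hits;  /* cases that bit classifies right  */
} bit_choice_t;

typedef uint32_t (*program_fn)(const uint32_t pat[4]);

/* Stand-in for a decoded individual; not an evolved program. */
static uint32_t demo_program(const uint32_t pat[4])
{
    return pat[0] & pat[1];
}

/* Runs the program on every case, counts for each of the 32 output
   bits how often it equals the teaching input, and returns the bit
   with the highest count (the lowest index wins ties). */
static bit_choice_t best_output_bit(program_fn prog,
                                    const fitness_case_t *cases, size_t n)
{
    size_t hits[32] = {0};

    for (size_t i = 0; i < n; i++) {
        uint32_t out = prog(cases[i].pat);
        /* Replicate the target over the word, then mark agreeing bits. */
        uint32_t target_word = cases[i].target ? 0xFFFFFFFFu : 0u;
        uint32_t agree = ~(out ^ target_word);
        for (int b = 0; b < 32; b++)
            hits[b] += (agree >> b) & 1u;
    }

    bit_choice_t choice = {0, hits[0]};
    for (int b = 1; b < 32; b++) {
        if (hits[b] > choice.best_hits) {
            choice.best_bit = b;
            choice.best_hits = hits[b];
        }
    }
    return choice;
}
```

Because all 32 candidate outputs come from a single evaluation of the program, choosing the best bit adds only a counting pass over the output word of each case.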
