A TRANSLATION, ROTATION AND SCALE INVARIANT TRANSFORM FOR GREY SCALE IMAGES: A PARALLEL IMPLEMENTATION

R. Davoli♣, F. Tamburini♦ and R. Gaioni♣

Abstract

This paper presents a numerical algorithm which realises a translation, rotation and scale invariant transform for single object grey scale images. This kind of algorithm is valid for all those applications which need to deal with shape information, independent of the orientation, distance and position of an object. The algorithm we present takes its mathematical foundation from the log-polar transform [Casasent, Psaltis (1976)]. These analytical results cannot be mapped directly into an algorithm: they involve generalised integrals that are hard to compute and the numerical implementation has to take account of discretisation errors which may propagate and lead to meaningless results. The aim of this paper is to present an algorithm which overcomes the major numerical problems and a massively parallel implementation of this transform based on the SGI/Cray T3E architecture. The results presented here are an evolution of an earlier work [Davoli, Tamburini, (1993)] both in terms of numerical computations and in terms of algorithm implementation.

1. Introduction

Many applications depend on object recognition. Robotic applications need to recognise mechanical parts in order to perform automatic actions on them. Other typical industrial applications of this kind of algorithm are those used to catalogue objects. In biology, it is important to be able to recognise cells whose characteristics depend on their shape. In X-ray slides, echography and tomography, opaque zones can be symptoms of different diseases depending on their shape. Other applications might be military, astronomical and so on.

When an electronic system digitises an object, the distance, position and orientation of the desired object in the picture are not generally known a priori. Our approach to this problem is to split the recognition process into a translation, rotation and scale invariant transform, which eliminates the extraneous factors from the image, followed by a pattern recognition algorithm. The essential features of a transform algorithm for the first phase of this process can be summarised as follows:
• It has to maintain shape information. The result must be only shape dependent.
• It has to be robust: additive noise resulting from digitisation errors must be masked out, up to a reasonable level.
• It has to be flexible: most recognition applications today are tailored to specific problems, whereas such an algorithm has to be usable for a wide range of recognition problems.
• Computation should be performed quickly: speed is a critical factor for some applications. As the number of processors increases, the computational load must be distributed among them efficiently.

This paper is organised as follows: section 2 illustrates the related works, section 3 presents the analytical foundations, section 4 describes the algorithm implementation. Section 5 presents some test results on geometrical images as well as some considerations on PE load balancing.

♣ Department of Computer Science, University of Bologna, ITALY. E-mail: {davoli, gaioni}@cs.unibo.it
♦ CILTA, University of Bologna, ITALY. E-mail: [email protected]

2. Related works

Bi-dimensional shape recognition algorithms and methodologies are well documented in the literature. A lot of research has been carried out on this topic, using many different approaches. Xu and Yang (1990) used multidimensional orthogonal polynomials; He and Kundu (1991) and Sekita, Kurita and Otsu (1992) used autoregressive models. Neither of these approaches is well suited to grey scale images: in fact these methods take as input the boundary lines of objects. Other methods are based on the numerical Mellin transform of data. Zwicke and Kiss (1983) used a Mellin transform to recognise the shapes of ships from radar signals; Altmann and Reitbock (1984) used a scale and translation invariant transform. A complete recognition system is presented in Wechsler and Zimmerman (1988). Its pattern matching algorithm is an associative memory, and the pre-processing transform provides rotation and scale invariance. Other researchers, such as Fuller et al. (1996), use a rotation and translation invariant transform to recognise virus shapes in microscopy images. None of the above algorithms achieves a full translation, rotation and scale invariant transform. The critical point in obtaining such a feature is that the algorithm needs more than one transform step, with a consequent propagation of computational errors.

3. Analytical foundations

The log-polar transform can be seen as the composition of four operations: a first Fourier Transform (FT), a rect-to-polar conversion, a log-scale along the radius axis, and a second FT. The modulus of a FT is translation-invariant, i.e. it does not change when the image in the domain is moved, provided that scale and orientation are preserved. When an image is rotated the FT is rotated by the same angle, and when an image is scaled by a factor, say k, the FT is also scaled but by a different factor (1/k).
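The translation invariance of the FT modulus can be checked on a small discrete example. The following is only a sketch in plain Python, assuming a one-dimensional signal and a cyclic shift as the discrete counterpart of moving the image; the helper names `dft` and `cyclic_shift` and the sample values are illustrative and do not come from the original implementation.

```python
import cmath

def dft(samples):
    """Naive discrete Fourier transform of a sequence (O(N^2))."""
    n = len(samples)
    return [
        sum(s * cmath.exp(-2j * cmath.pi * k * t / n) for t, s in enumerate(samples))
        for k in range(n)
    ]

def cyclic_shift(samples, shift):
    """Move every sample `shift` positions to the right, wrapping around."""
    n = len(samples)
    return [samples[(t - shift) % n] for t in range(n)]

# Illustrative signal: a small "object" on an empty background.
signal = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0]
moved = cyclic_shift(signal, 3)

# The shift only multiplies each coefficient by a unit-modulus phase factor,
# so the moduli of the two spectra agree up to rounding.
modulus_original = [abs(c) for c in dft(signal)]
modulus_moved = [abs(c) for c in dft(moved)]
max_difference = max(abs(a - b) for a, b in zip(modulus_original, modulus_moved))
```

The same property is what the final FT along the θ axis relies on: once rotation has been turned into a shift along θ, taking the modulus of the transform removes it.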
If the FT is computed using polar coordinates, each rotation becomes a translation along the angle axis θ; log-scaling the radius axis likewise turns scale changes of the original image into translations. Thus, rotation and scale have been changed into translations after the first three steps of the transform. The fourth phase then yields the required degree of invariance: translation, rotation and scale. A very good degree of accuracy can be achieved by performing the first three steps of the log-polar transformation as a single step. Given the well-known formula of a FT:

$$FT_f(u,v) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x,y)\, e^{-i(xu+yv)}\, dx\, dy \qquad (1)$$

in (ρ,θ) coordinates, where ρ is the log-scaled radius and θ is the angle, it is:

$$LP_f(\rho,\theta) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x,y)\, e^{-i e^{\rho}(x\sin\theta + y\cos\theta)}\, dx\, dy \qquad (2)$$
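The effect of the (ρ,θ) coordinates can be illustrated on a single point of the frequency plane. The sketch below assumes the convention implied by (2), u = e^ρ sin θ and v = e^ρ cos θ; the function names `to_log_polar` and `rotate` are hypothetical helpers introduced only for this illustration.

```python
import math

def to_log_polar(u, v):
    """Return (rho, theta) such that u = e^rho * sin(theta), v = e^rho * cos(theta)."""
    return math.log(math.hypot(u, v)), math.atan2(u, v)

def rotate(u, v, alpha):
    """Rotate the point (u, v) counter-clockwise by the angle alpha."""
    return (u * math.cos(alpha) - v * math.sin(alpha),
            u * math.sin(alpha) + v * math.cos(alpha))

rho, theta = to_log_polar(1.0, 2.0)

# Scaling the frequency plane by a factor s moves the point by log(s) along rho
# and leaves theta untouched.
rho_scaled, theta_scaled = to_log_polar(3.0 * 1.0, 3.0 * 2.0)

# Rotating by alpha leaves rho untouched and moves the point by alpha along theta
# (towards smaller theta with the convention used here).
rho_rotated, theta_rotated = to_log_polar(*rotate(1.0, 2.0, 0.3))
```

Both operations therefore appear as pure translations in the (ρ,θ) plane, which is what allows a further FT modulus to remove them.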

This technique, however, is not sufficient to build a real algorithm which implements the transform: the accuracy of the discrete approximation of the integrals and the efficiency of the computation are issues to be addressed carefully. The input image is a discrete image, an NxN grid w of real values (grey scale pixels), so that f(x, y) = w(int(x), int(y)). The result of the transform must be discretised in a similar manner on a grid of values. Let us define:

$$\theta = \theta_0 + \frac{2\pi}{N}\, m; \qquad \rho = \rho_0 + \frac{n}{N}$$

A direct substitution of m and n into (2) would lead to poor accuracy for high values of n (or ρ). The FT involves the computation of an integral of the form:

$$\int_{-\infty}^{+\infty} f(x)\, e^{-i e^{\rho} K(\theta) x}\, dx = \int_{-\infty}^{+\infty} f(x)\left(\cos(e^{\rho} K(\theta) x) - i \sin(e^{\rho} K(\theta) x)\right) dx \qquad (3)$$

and a naive discretisation of x would give inconsistent values, because the result depends strongly on the discretisation points. Here K(θ) stands for sin θ or cos θ, the factors multiplying x and y in (2). Considering f(x) as a step function, having constant value in each interval [int(x), int(x)+1[, an integral of the form (3) can be rewritten as:

$$\int_{-\infty}^{+\infty} f(x)\, e^{-i e^{\rho} K(\theta) x}\, dx = \sum_{x=1}^{N} f(x) \int_{x}^{x+1} e^{-i e^{\rho} K(\theta) t}\, dt = \sum_{x=1}^{N} f(x)\, \frac{e^{-i e^{\rho} K(\theta)(x+1)} - e^{-i e^{\rho} K(\theta) x}}{-i e^{\rho} K(\theta)}$$

By applying the same method to (2), the first three phases can be computed as follows:

$$LP_f(m,n) = \sum_{x=1}^{N}\sum_{y=1}^{N} w(x,y)\, D(n,m)\, \bigl(G(n,m,x+1) - G(n,m,x)\bigr) \Bigl(G\bigl(n, m+\tfrac{N}{4}, y+1\bigr) - G\bigl(n, m+\tfrac{N}{4}, y\bigr)\Bigr)$$

where:

$$C(m) = e^{-i e^{\rho_0} \sin\left(\theta_0 + \frac{2\pi}{N} m\right)}, \qquad G(n,m,x) = C(m)^{\,e^{n/N} x}, \qquad D(n,m) = \frac{d(m)}{e^{2\left(\rho_0 + \frac{n}{N}\right)}}, \qquad d(m) = -\frac{1}{\sin\left(\theta_0 + \frac{2\pi}{N} m\right)\cos\left(\theta_0 + \frac{2\pi}{N} m\right)} \qquad (4)$$
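The closed-form evaluation over unit pixels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: pixels are indexed from 0 and cover the cells [x, x+1[ (the text sums from 1 to N), the image is a list of rows `w[y][x]`, the transform is evaluated directly at one (ρ,θ) pair without the recursive tables, and the function names are hypothetical. The cell integral is written in the algebraically equivalent form e^{-ia(x+1/2)} sin(a/2)/(a/2), which avoids cancellation when a = e^ρ K(θ) is very small.

```python
import cmath
import math

def cell_integral(a, x):
    """Integral of exp(-i*a*t) for t in [x, x+1].

    Equal to (exp(-i*a*(x+1)) - exp(-i*a*x)) / (-i*a), rewritten so that it
    stays accurate (and defined) when a is tiny or zero.
    """
    half = 0.5 * a
    sinc = 1.0 if half == 0.0 else math.sin(half) / half
    return cmath.exp(-1j * a * (x + 0.5)) * sinc

def lp_sample(w, rho, theta):
    """Integral (2) for a piecewise-constant image, evaluated cell by cell."""
    a_x = math.exp(rho) * math.sin(theta)   # factor multiplying x in (2)
    a_y = math.exp(rho) * math.cos(theta)   # factor multiplying y in (2)
    total = 0j
    for y, row in enumerate(w):
        integral_y = cell_integral(a_y, y)
        for x, value in enumerate(row):
            total += value * cell_integral(a_x, x) * integral_y
    return total
```

As ρ tends to −∞ every cell integral tends to 1, so `lp_sample` tends to the plain sum of the pixel values; this is the reference value LPf(−∞) used later in the initial point search.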

The functions C, d, D and G have been defined in (4) not only to make the formulae more readable by avoiding repeated parts, but also to show how the method can be implemented efficiently. In fact, it is possible to define G and D recursively as follows:

$$G(n+1,m,x) = G(n,m,x)^{\,e^{1/N}}, \qquad G(n,m,x+1) = G(n,m,x)\, G(n,m,1), \qquad D(n+1,m) = \frac{D(n,m)}{e^{2/N}} \qquad (5)$$
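How the recursions save work can be sketched as follows. This is a hedged illustration rather than the original C code: G is written directly as exp(−i e^ρ sin θ · x), which is what the definition in (4) expands to, and only the recursion in x and the recursion for the radial factor e^{−2(ρ0+n/N)} of D are exercised. The function names are hypothetical.

```python
import cmath
import math

def g_direct(rho, theta, x):
    """G evaluated from scratch: exp(-i * e^rho * sin(theta) * x)."""
    return cmath.exp(-1j * math.exp(rho) * math.sin(theta) * x)

def g_table(rho, theta, size):
    """G for x = 0..size via G(x+1) = G(x) * G(1): one exponential, then products."""
    step = g_direct(rho, theta, 1)
    table = [1 + 0j]
    for _ in range(size):
        table.append(table[-1] * step)
    return table

def radial_factors(rho0, n_samples):
    """exp(-2*(rho0 + n/N)) for n = 0..N-1 via D(n+1) = D(n) / e^(2/N)."""
    ratio = math.exp(2.0 / n_samples)
    factors = [math.exp(-2.0 * rho0)]
    for _ in range(n_samples - 1):
        factors.append(factors[-1] / ratio)
    return factors
```

With these tables a whole row of G values costs one complex exponential plus one multiplication per pixel. The recursion in n is not shown because a complex power taken on the principal branch only matches the definition while the accumulated phase stays within (−π, π], so a real implementation would need to handle that step with care.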

The recursive definitions in (5) speed up the computation of the transform by re-using partial computations.

4. Algorithm implementation

The computation of the Log-polar transform is divided into three phases:
• Initial point search (ρ0) for scaling invariance.
• LPf(ρ0 + ρ, θ) computation for ρ = 0..N−1, θ = 0..π.
• FT on the θ axis to remove rotation.

4.1 Initial point search

The techniques outlined in the previous sections are not sufficient to build a real algorithm which implements the transform we need. The output of the first numerical step, given by (4) and (5), is defined on an infinite stripe of height π. The FT row computations involve generalised integrals (as seen above). To achieve greater accuracy and better performance, the analytical algorithm needs to be changed; a different kind of scale invariant transform has to be used. Another drawback in using or modifying the original method for scale invariance is that no meaningful FT information on discrete data can be obtained beyond a square having the same size as the input grid. In fact, the value of the FT at each of these points represents the amplitude and phase of a component having more than one oscillation per pixel. The methods presented in section 3 remove such components from the FT.

The numerical algorithm needs to "cut" a finite interval of the stripe, in a scale independent way, containing the essential information of the picture shape. All the rows of the LP transform output (on a discrete NxN grid of real positive input data) can be proven to be eventually monotonic as ρ → −∞. More precisely: ∀θ ∈ [0, 2π[, LPf(ρ,θ) is monotonically decreasing ∀ρ ∈ [−∞, log(2/8)[. Consequently the mean over θ¹, LPf(ρ), is also monotonically decreasing in the same interval. The starting point ρ0 is computed using an exponential function (precisely y = 1 − a exp(b/N)) to interpolate the mean values LPf(ρ). Geometrically, LPf(ρ) represents the mean of LPf(ρ,θ) on the circular region shown in the following figure.

[Figure: circular region in the (u, v) plane, with radius ρ and angle θ.]

The process starts by computing LPf(−∞) and LPf(2/8); then, using successive exponential interpolations in order to minimise the number of mean computations, it finds the value ρ0 for which LPf(ρ0) / LPf(−∞) = Ξ, where Ξ is a predefined constant.

The computation of each value LPf(ρ) is divided among all available PEs. Each PE (say x) computes the subcolumn LPf(ρ, θ = x*N/NPE..(x+1)*N/NPE − 1) and completes the sum for θ = 0..π by exchanging its partial sum with the other PEs using a butterfly communication scheme. This method achieves an optimal distribution of the computation among all PEs and minimises their communications (see fig. 1).
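The search for ρ0 amounts to solving LPf(ρ0) / LPf(−∞) = Ξ for a monotonically decreasing mean. The sketch below deliberately swaps in plain bisection for the exponential interpolation described in the text, which is designed to need fewer mean evaluations; it also assumes a finite bracketing interval `[rho_low, rho_high]` supplied by the caller. The function name and all parameters are hypothetical.

```python
def find_rho0(mean_lp, limit_value, xi, rho_low, rho_high, tolerance=1e-9):
    """Return rho0 such that mean_lp(rho0) / limit_value is (approximately) xi.

    mean_lp     -- callable giving the mean over theta at a given rho; assumed
                   to be monotonically decreasing on [rho_low, rho_high]
    limit_value -- the mean for rho -> -infinity
    xi          -- the predefined ratio to reach
    """
    target = xi * limit_value
    if not (mean_lp(rho_high) <= target <= mean_lp(rho_low)):
        raise ValueError("target is not bracketed by the search interval")
    while rho_high - rho_low > tolerance:
        middle = 0.5 * (rho_low + rho_high)
        if mean_lp(middle) > target:
            rho_low = middle    # mean still above the target: move to larger rho
        else:
            rho_high = middle   # mean at or below the target: move to smaller rho
    return 0.5 * (rho_low + rho_high)
```

Each bisection step costs one evaluation of the mean, that is one full column of the LP transform, which is why an interpolation scheme that converges in fewer evaluations is attractive in practice.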

[Figure: each of PE0, PE1, PE2 and PE3 computes a partial sum (∑) for each of the mean values LPf(ρ0), LPf(ρx), ..., LPf(ρz).]

Fig. 1: The work distribution for the initial point search.
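The work distribution and the butterfly exchange can be simulated sequentially. The sketch below is not MPI code: the PEs are simply positions in a list, the number of PEs is assumed to be a power of two, and the function names are hypothetical. It shows the block of θ samples owned by each PE and the pairwise exchange pattern in which PE r talks to PE r XOR 1, then r XOR 2, then r XOR 4, and so on.

```python
def theta_block(pe, n_theta, n_pe):
    """Indices of the theta samples owned by one PE: pe*N/NPE .. (pe+1)*N/NPE - 1."""
    return range(pe * n_theta // n_pe, (pe + 1) * n_theta // n_pe)

def butterfly_allreduce(partials):
    """Simulate a butterfly sum: after log2(P) rounds every PE holds the total."""
    n_pe = len(partials)
    if n_pe & (n_pe - 1):
        raise ValueError("this sketch assumes a power-of-two number of PEs")
    values = list(partials)
    distance = 1
    while distance < n_pe:
        # In each round PE r adds the running sum held by PE (r XOR distance).
        values = [values[r] + values[r ^ distance] for r in range(n_pe)]
        distance *= 2
    return values

def mean_over_theta(column, n_pe):
    """Mean of one LP column as seen by every PE after the exchange."""
    n_theta = len(column)
    partials = [sum(column[i] for i in theta_block(pe, n_theta, n_pe))
                for pe in range(n_pe)]
    return [total / n_theta for total in butterfly_allreduce(partials)]
```

With P PEs the exchange takes log2(P) rounds and leaves the complete sum on every PE, so no separate broadcast of the mean is needed before the next interpolation step.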

¹ LPf(ρ) is the mean value of the LP function computed on a finite set of points. For the purpose of this paragraph it can be regarded either as a continuous or as a discrete mean.

4.2 LP matrix computation

Once a starting point ρ0 has been found, as described above, it is possible to compute the entire matrix LPf(ρ0 + ρ, θ) (ρ = 0..N, θ = 0..π) by distributing the work among all processors. The computation is divided in such a way that all the processors compute the same number of LP matrix rows, so PEx computes LPf(ρ0 + ρ, θ = x*N/NPE..(x+1)*N/NPE − 1).

4.3 FT on θ axis

When the matrix LPf(ρ0 + ρ, θ) has been computed, it is necessary to apply a FT to each of its columns. Again the work is divided among all PEs in such a way that PEx computes FTf(ρ0 + x*N/NPE .. ρ0 + (x+1)*N/NPE, θ) using a Fast Fourier Transform algorithm.

4.4 Code optimisation

The code has been written in C. The first step was the creation of a fast sequential algorithm which takes into account all the formulae outlined in section 3 in order to re-use all the intermediate results. Then, once the code had been ported onto the T3E architecture at CINECA, a number of further optimisations were introduced:
• First of all, the code was made parallel, following the description given above. Particular attention has been paid to load balancing among PEs. As outlined in the next section, the results are very good: the maximum idle time exhibited by any PE is a very small fraction of the total elapsed time.
• Code inlining in the core of the program, where most of the computation time is spent, was a very successful technique. Inlining the functions for complex number manipulation (not available as intrinsics in C) gave a speed-up of a factor of 6.
• Instructing the compiler to perform loop unrolling and cache alignment for some data structures sped up the program even further.

5. Experimental results

Figure 2 shows five artificial test images, containing three squares in random positions, a circle and a triangle, together with their log-polar transforms. Although the square images are very different in position, rotation and scale, their transformed images are very similar, whereas the other object images have very different transforms.

Fig. 2: Artificial test images and their transforms.

The efficiency of the parallel implementation has been tested using two kinds of measurement: the first examines the speed-up of the program with respect to the number of PEs involved in the computation, while the second shows the level of load balancing and the activity of each PE during the computation. With respect to speed-up (see fig. 3), our method exhibits good scaling behaviour provided that the computed images are big enough; if they are not, the communication among PEs dominates the computation time and prevents good scaling. However, a quasi-optimal speed-up is obtained for images of reasonable size, useful in real situations: as shown in fig. 3, at N=128 the program already exhibits quite a good speed-up as a function of the number of PEs involved in the computation. Increasing the number of PEs too much results in poor scaling behaviour.

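[Plot: elapsed time in seconds (vertical axis, 0 to 140) against the number of PEs (horizontal axis, 1 to 128), with curves for 32x32, 64x64, 128x128, 256x256 and 512x512 images and for the ideal S(p).]

Fig. 3: Speed-up test results. Elapsed time vs. number of PEs involved in the computation.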

The last test performed is a load balancing and activity classification for each PE. As shown in fig. 4, about 97% of the elapsed time is spent doing parallel computation on each PE, with very good load balancing among PEs. The values reported in fig. 4 are valid only for 256x256 pixel images using 32 PEs, but the quasi-optimal load balancing among PEs obtained in this test extends to all other dimensions of the problem.

Activity                                      PE0       PEn (slave PEs)
Parallel work (CPU)                           97.69%    96.34%
MPI communication                             0.76%     0.19%
I/O                                           1.55%     -
Idle time (mainly spent waiting PE0 I/O)      -         3.46%

Fig. 4: Load balancing test results for a 256x256 image using 32 PEs.

6. Conclusions

Pattern recognition programs for images and video often make use of a pre-processing transform to distinguish shape information from noise information like size, position and orientation. The Log-polar transform, in its mathematical description, is a good instrument for this kind of pre-processing, but it is very computationally intensive and unsuitable for on-line processing of images or video frames of any useful size. In the implementation presented here the original method of the Log-polar transform has been modified to reduce the sequential computational load by re-using partial results, as well as to achieve both numerical stability and efficient parallel execution. As an example, using 32 PEs of a Cray T3E it is possible to transform a pipeline of about 10 images of size 64x64 pixels, or 5 images of size 128x128, per second. The algorithm also scales well, and larger images can be processed on-line by a larger number of processors.

References

Altmann J. and Reitbock H. (1984). “A fast correlation method for scale and translation invariant pattern recognition.”, IEEE Tr. Patt. Analysis Machine Intelligence, 6 (1), p. 47-57.
Casasent D. and Psaltis D. (1976). “Position, rotation and scale invariant optical correlations.”, Applied Optics, 15, p. 1793-1799.
Davoli R. and Tamburini F. (1993). “DATA algorithm: a numerical method to extract shape information from gray scale images”. Image Analysis and Synthesis, Pölzleitner W. and Wengen E, p. 219-230.
Frosini P. (1991). “Measuring shapes by size functions”, Intelligent Robots and Computer Vision X: Algorithms and Techniques, SPIE Proc., 1607, p. 122-133.
Fuller S.D., Butcher S.J., Cheng R.H. and Baker T.S. (1996). “Three-Dimensional Reconstruction of Icosahedral Particles - The uncommon Line”, Journal of Structural Biology, 116 (1), p. 48-55.
He Y. and Kundu A. (1991). “2-D Shape classification using hidden Markov model.”, IEEE Tr. Patt. Analysis Machine Intelligence, 13 (11), p. 1172-1185.
Sekita I., Kurita T. and Otsu N. (1992). “Complex autoregressive model for shape recognition.”, IEEE Tr. Patt. Analysis Machine Intelligence, 14 (4), p. 489-496.
Xu J. and Yang Y-H. (1990). “Generalized multidimensional orthogonal polynomials with applications to shape analysis.”, IEEE Tr. Patt. Analysis Machine Intelligence, 12 (9), p. 906-913.
Wechsler H. and Zimmerman G.L. (1988). “2-D invariant object recognition using distributed associative memory.”, IEEE Tr. Patt. Analysis Machine Intelligence, 10 (6), p. 811-821.
Zwicke P.E. and Kiss I. jr. (1983). “A new implementation of the Mellin transform and its application to radar classification of ships.”, IEEE Tr. Patt. Analysis Machine Intelligence, 5 (2), p. 191-199.