Colour-Based Model Pruning For Efficient ARG Object Recognition

Alireza Ahmadyfard and Josef Kittler
Center for Vision Speech and Signal Processing
University of Surrey, Guildford, GU2 7XH, UK
{A.Ahmadyfard, [email protected]

Abstract

In this paper we address the problem of object recognition from 2D views. A new approach is proposed which combines the recognition systems based on Attributed Relational Graph (ARG) matching[2] and the Multimodal Neighbourhood Signature (MNS) method[7]. In the new system we use the MNS method as a pre-matching stage to prune the number of model candidates. The ARG method then identifies the best model among the candidates through a relaxation labelling process. The results of experiments show a considerable gain in the ARG matching speed. Interestingly, as a result of the reduction in the entropy of labelling by virtue of model pruning, the recognition rate for extreme object views also improves.

1 Introduction

The recognition of objects from their 2D image is one of the crucial tasks in computer vision. The methods which address this problem from the matching point of view can be classified into two categories: contextual and non-contextual. Methods in the first category establish a correspondence between the extracted features of an object in the scene and its model images by considering geometric constraints imposed in the two image planes. In contrast, in non-contextual matching the features of an object in the test image are directly compared with those in the object model based simply on the similarities of the features[8]. Although non-contextual matching is much faster than the contextual approach[3], in many realistic situations it fails to recognise the object in the scene unambiguously. This is particularly true when objects to be recognised have a similar appearance, when the background clutter in the image changes, or in the case of occlusion. In this paper we propose to take advantage of the two matching approaches in one recognition system to achieve an accurate and fast matching process. We combine the object recognition method [1] based on the Attributed Relational Graph (ARG) representation[1, 2] with the colour-based approach of Matas et al[7] to prune the number of model hypotheses for the objects in the test image. In the next two sections the ARG method[2] and the MNS method[7] are briefly overviewed. The experimental results with the proposed approach are reported in Section 4. In the last section we draw conclusions.

Figure 1. Binary measurements associated with a pair of regions

2 ARG method

In this method an object, or more specifically an image of the object, is represented in terms of its segmented regions. The segmentation is accomplished based on colour homogeneity of the pixels using the region growing method[6]. Each extracted region R_i is characterised individually by its (Y U V) colour vector, and we refer to this description as the unary measurement vector x_i. The relationship between a pair of regions R_i, R_j is described using geometric and colour measurements which constitute a so-called binary measurement vector A_ij, defined as follows. Consider a pair of regions R_i and R_j in fig 1. The line which connects the centroid points c_i and c_j intersects the region boundaries at a_i, b_i, a_j and b_j. Under the affine transformation assumed here, the ratio of segments on a line remains invariant. Using this property, we define m1 = |c_i a_i| / |c_j a_j| and m2 = |c_i b_i| / |c_j b_j| as two elements of the binary measurement vector. In addition, the area ratio AreaRatio = A_i / A_j and the distance between colour vectors ColourDis = |C_i - C_j| are used as complementary components of the binary measurement vector A_ij. Using the extracted regions and the associated measurement vectors we construct an Attributed Relational Graph

1051-4651/02 $17.00 (c) 2002 IEEE

in which a graph node O_i represents region R_i. The measurement vector x_i is the node's unary attribute. The binary measurement vector A_ij describes the link between the pair of nodes O_i, O_j. Using this approach an object is modelled in the recognition system by an attributed relational graph constructed from its representative image. The graphs of all objects in the model database are collected in a composite model graph. The content of an imaged scene is interpreted by constructing an ARG referred to as the scene graph. Scene objects are then identified by matching the model and scene graphs using the relaxation labelling technique[4], which has been modified for the object recognition application[2]. We allocate a label to each node of the scene graph. The set Θ = {θ_1, θ_2, ..., θ_N} denotes the scene labels, where θ_i is the label for node O_i. Similarly we use Ω = {ω_0, ω_1, ..., ω_M} as the label set for the nodes of the composite model graph, where ω_0 is the null label assigned to the scene nodes for which no other label in Ω is appropriate[4]. The contextual information in a graph is conveyed to a node from a small neighbourhood. In this regard, node O_j is a neighbour of O_i if the Euclidean distance between the associated regions is below a predefined threshold. We use the set N_i to refer to the nodes in the neighbourhood of O_i. Similarly, the labels in the neighbourhood of ω_α are referred to by the set Ω_α. By labelling we mean the assignment of a proper label from Ω to each node of the scene graph. In this regard, P(θ_i = ω_α) denotes the probability that node O_i in the scene graph takes label ω_α. Obviously the majority of labels in Ω are not admissible for O_i. Therefore in the first stage of matching we compile a list of admissible labels for each scene node O_i, denoted by Ω_i. This list is constructed by measuring the mean square error between the unary measurement vector for scene node O_i and the unary measurement vectors of all nodes in the model graph.
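The two measurement stages described above can be illustrated with a short sketch: computing the binary measurement vector of fig 1 for a region pair, and the first-stage admissible-label pruning by unary mean square error. This is a minimal illustration, not the authors' implementation; the function names, the input conventions (precomputed centroids, boundary intersection points and YUV vectors) and the MSE threshold are assumptions.

```python
import numpy as np

def binary_measurements(c_i, a_i, b_i, c_j, a_j, b_j, area_i, area_j, col_i, col_j):
    """Binary measurement vector A_ij for a region pair (see fig 1).
    c_*: region centroids; a_*, b_*: intersections of the centroid line
    with the region boundaries (assumed precomputed); col_*: colour vectors."""
    c_i, a_i, b_i, c_j, a_j, b_j = map(np.asarray, (c_i, a_i, b_i, c_j, a_j, b_j))
    # Ratios of segment lengths along the centroid line: affine invariant.
    m1 = np.linalg.norm(a_i - c_i) / np.linalg.norm(a_j - c_j)
    m2 = np.linalg.norm(b_i - c_i) / np.linalg.norm(b_j - c_j)
    area_ratio = area_i / area_j
    colour_dis = float(np.linalg.norm(np.asarray(col_i) - np.asarray(col_j)))
    return np.array([m1, m2, area_ratio, colour_dis])

def admissible_labels(scene_unary, model_unary, max_mse):
    """First matching stage: for each scene node keep only the model labels
    whose unary (colour) vectors are close in the mean-square-error sense."""
    lists = []
    for x in np.asarray(scene_unary):
        mse = np.mean((np.asarray(model_unary) - x) ** 2, axis=1)
        lists.append(np.flatnonzero(mse < max_mse).tolist())
    return lists
```

For two regions related by a pure translation the segment ratios m1 and m2 both come out as 1, which is the invariance the construction relies on.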
In the second stage of matching the modified labelling probability updating formula is applied[2]:

P^{(n+1)}(\theta_i = \omega_\alpha) = \frac{P^{(n)}(\theta_i = \omega_\alpha)\, Q^{(n)}(\theta_i = \omega_\alpha)}{\sum_{\omega_\beta \in \Omega_i} P^{(n)}(\theta_i = \omega_\beta)\, Q^{(n)}(\theta_i = \omega_\beta)}    (1)

Q^{(n)}(\theta_i = \omega_\alpha) = \prod_{j \in N_i} \Big[ \sum_{\omega_\beta \in \{\Omega_j \cap \Omega_\alpha\}} P^{(n)}(\theta_j = \omega_\beta)\, P(A_{ij} \mid \theta_i = \omega_\alpha, \theta_j = \omega_\beta) + \epsilon \sum_{\omega_\beta \in \Omega_j \setminus \{\Omega_j \cap \Omega_\alpha\}} P^{(n)}(\theta_j = \omega_\beta) \Big]    (2)

The relaxation labelling technique updates the labelling probabilities in an iterative manner using the contextual information provided by the nodes of the graph. In this formulation Q(θ_i = ω_α) is the support function which measures the consistency of the label assignments to the scene nodes in the neighbourhood of O_i, assuming O_i takes label ω_α. The labelling consistency is expressed as a function of the binary measurement vectors associated with the centre node O_i and its neighbours. The support function consists of two parts: the first part measures the contribution from the neighbours (the main support) and the second part is added to balance the number of contributing terms via the other labels in Ω[2]. ε is a parameter which plays the role of the binary relation distribution function P(A_ij | θ_i = ω_α, θ_j = ω_β) when the model nodes ω_α and ω_β are not neighbours. Upon termination of the relaxation labelling process, we have a list of correspondences between the nodes of the scene and model graphs.
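The normalised update of eq. (1) can be sketched in a few lines. This is a minimal illustration of the update rule only: computing the support function of eq. (2) from the binary relation distributions is omitted, and the function name and array layout are assumptions.

```python
import numpy as np

def relaxation_step(P, Q):
    """One iteration of eq. (1): the new probability P(theta_i = omega_a)
    is proportional to P * Q, renormalised over each node's admissible
    labels. P and Q are (num_scene_nodes, num_labels) arrays; the support
    Q would be supplied by eq. (2)."""
    unnorm = P * Q
    return unnorm / unnorm.sum(axis=1, keepdims=True)
```

Iterating this step concentrates probability mass on the labels whose neighbourhood support is consistently high, which is what drives the scene nodes towards a coherent labelling.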

3 MNS method

In the MNS method proposed by Matas et al[7] an image is described using a number of local invariant colour features computed on the image's multimodal neighbourhoods. In the first step of the MNS representation, the image plane is covered by a set of overlapping windows. For every neighbourhood defined in this manner, the modes of the colour distribution are computed with the mean shift algorithm[5]. The neighbourhoods are then categorised according to their modality as unimodal, bimodal, trimodal, etc. The invariant features are only computed from the multimodal neighbourhoods. For every pair of mode colours m_i and m_j in a multimodal neighbourhood the 6-dimensional vector v = (m_i, m_j) (in the RGB^2 domain) is constructed. The computed vectors are then clustered in the RGB^2 space using the mean shift algorithm[5]. As the output of this process, for each detected cluster its representative vector is stored. The collection of all cluster representatives constitutes the image signature.

At query time the signature of a test image is matched to each of the model signatures separately. In this process each model is given a score according to the dissimilarity between its signature and the associated test image signature. The models are then rank ordered according to their scores. The details of the matching between a test signature D and a model signature Q are as follows. Consider the test and model signatures as sets of features D = {f_D^i : i = 1..m} and Q = {f_Q^j : j = 1..n}. Recall that each feature in these sets is a 6-dimensional vector in the RGB^2 space. For every pair f_D^i, f_Q^j the distance d(f_D^i, f_Q^j) = d_ij is used as the similarity measure between the two features. Now the test and model signatures D and Q are considered as a bipartite graph where the edge between the pair of nodes i and j is weighted by the distance d_ij (d_ij = d_ji). In this regard, a match association function u(i) : Q -> {0} ∪ D is defined as a mapping of each model feature i to a proper test feature, or to 0 in case none of the test features matches. In the same manner a test association function v(j) : D -> {0} ∪ Q maps each test feature in D to a feature in Q or to 0. A threshold Th is used to define the maximum allowed distance between two matched features. The algorithm is summarised as follows:

1. Set u(i) = 0 and v(j) = 0 for all i, j.
2. From each signature compute the invariant features f_D^i, f_Q^j according to the colour change model dictated by the application.
3. Compute all pairwise distances d_ij = d(f_D^i, f_Q^j) between the test and model features.
4. Set u(i) = j, v(j) = i if d_ij < d_kl and d_ij < Th for all k, l with u(k) = 0 and v(l) = 0.
5. Compute the signature dissimilarity as

\delta(D, Q) = \sum_{i : u(i) \neq 0} d_{i, u(i)} + Th \cdot |\{i : u(i) = 0\}|

The function δ measures the dissimilarity between the test and model signatures. This measurement consists of two parts: the first part measures how well the features of the candidate model are matched to the test features, whereas the second part is added to penalise any unmatched model features. Finally the dissimilarity measurements provided for all models are taken into account to rank the models in the order of their similarities to the test image.
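The matching steps above can be sketched as follows. The greedy globally-smallest-distance-first pairing is our reading of step 4, and a penalty of Th per unmatched model feature is an assumption consistent with the two-part dissimilarity just described; the function name and array conventions are illustrative only.

```python
import numpy as np

def mns_dissimilarity(test_feats, model_feats, th):
    """Greedy one-to-one matching between signature features, then the
    two-part dissimilarity: matched distances plus a penalty for each
    unmatched model feature. test_feats: (m, 6); model_feats: (n, 6)."""
    # Step 3: all pairwise distances between test and model features.
    d = np.linalg.norm(test_feats[:, None, :] - model_feats[None, :, :], axis=2)
    u = {}           # model index -> matched test index
    used_test = set()
    # Step 4 (greedy reading): repeatedly take the smallest remaining
    # distance below the threshold between two still-unmatched features.
    for i, j in sorted(((i, j) for i in range(len(test_feats))
                        for j in range(len(model_feats))), key=lambda p: d[p]):
        if d[i, j] >= th:
            break                       # everything after is too far
        if j not in u and i not in used_test:
            u[j] = i
            used_test.add(i)
    # Step 5: matched-distance sum plus penalty for unmatched model features.
    matched_cost = sum(d[i, j] for j, i in u.items())
    unmatched = len(model_feats) - len(u)
    return matched_cost + th * unmatched
```

A model whose signature features all find close test partners scores near zero; features with no partner within Th each add a fixed penalty, so incomplete matches rank behind complete ones.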

4 Using MNS to prune the model candidates

As the complexity of labelling in the ARG method directly depends on the size of the graph involved in a match, any effort to reduce the graph complexity would speed up the matching process. It is worth recalling that during ARG matching, before labelling and at the end of each iteration, we prune inadmissible labels from the candidate list of labels for each node in the test graph. We expect to achieve a further gain in the matching speed by pruning the model candidates using the MNS method. This new system is referred to as the MNS-ARG method.

We designed an experiment to demonstrate the effect of model pruning on the performance of the ARG method. We compared the ARG with the MNS-ARG method from the recognition rate and the recognition speed points of view. The experiment was conducted on the SOIL-47 (Surrey Object Image Library) database, which contains 47 objects, each of which has been imaged from 21 viewing angles spanning a range of up to 90 degrees. The database is available online[9]. In this experiment we model each object using its frontal image while the other 20 views of the objects are used as test images. The size of the images used in this experiment is 288 x 360 pixels.

For each test image we applied the MNS method to rank the candidate models matched to it. In this regard, different rank orders were selected to evaluate the MNS matching capability. In fig 2 we plot the percentage of cases in which the candidate rank order includes the correct model as a function of the object pose.

Figure 2. The likelihood of the correct model being in the rank order

Figure 3. The correct recognition rate (rank 1 after the ARG matching)

The results illustrate that the recognition rate (rank 1) for the MNS method is not very high. As expected, when the rank order increases it becomes more likely that the correct model is included in the candidate rank list. For instance, for rank order 15, in more than 95% of cases we have the correct model among the candidates. The ARG method was then applied to identify the object model based on the rank order list selected by the MNS method. In fig 3 the recognition rate for the MNS-ARG method is plotted as a function of object pose for different rank orders. The results show a good recognition performance for the case when the rank order is more than 10.

The results in figs 2 and 3 show that, apart from the extreme object views, the recognition rate is limited by the MNS performance. For extreme object views, as expected, the MNS-ARG method fails to recognise the object, but notably all candidates are rejected (the miss-classification rate is as low as 10%). This rate is remarkable in comparison with the miss-classification rate in MNS, which is up to 75% (fig 2). The failure to recognise objects from their extreme views is due to the significant distortion of the segmented regions. In these situations ARG is not able to establish correspondence between the test image and the correct model. To demonstrate the effect of model pruning on the recognition rate of the ARG method, the associated rates for ARG (without model pruning) and the MNS-ARG method for rank 15 are plotted in fig 4. As a base line we added the matching performance of the MNS method for rank order 15 to this graph. The results show that the model pruning improves the recognition rate for extreme object views. In

these situations the hypotheses at a node of the test graph do not receive good support from its neighbours (the problem of distortion in the image regions). Moreover, a large number of labels involved in matching increases the entropy of labelling. Consequently it is less likely for a test node to take its proper label (instead of the null label). Referring to the result in fig 4, the recognition rate of MNS-ARG for some object views is occasionally slightly lower than that of the ARG method. This failure is due to the absence of the correct model among the candidates in the rank order list.

We now consider the computational advantage of the model pruning. As the model images in both the ARG and MNS methods are represented off-line, we do not consider the cost of model construction in the recognition system processing. In the ARG method the recognition task consists of two stages: the representation of the test image in ARG form and the graph matching. We refer to the associated process times as tGR and tGM respectively. By analogy, MNS matching also involves two stages: the extraction of the image MNS signature and the signature matching. The corresponding process times are referred to as tSR and tSM respectively. The total recognition time for the ARG matching is TARG = tGR + tGM. When we deploy MNS for model pruning, the total MNS-ARG process time is TMNS-ARG = tSR + tSM + tGR + tGM. In fig 5 we plot the average process time which the ARG and MNS-ARG methods take to recognise the object in a test image. The results demonstrate that the speed gain obtained by pruning the model list is significant. For instance, considering MNS-ARG with rank order 15, the recognition time is about 18 seconds, which is less than half of the recognition time for the ARG method. Note that this gain in speed is achieved without losing recognition rate.
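The overall MNS-ARG pipeline amounts to a few lines of control flow: rank all models with the cheap MNS dissimilarity, keep only the top-k candidates, and run the expensive ARG matching on that pruned list alone. In this sketch `mns_score` and `arg_match` are hypothetical placeholders standing in for the two subsystems, not the authors' interfaces.

```python
def recognise(test_image, models, mns_score, arg_match, k=15):
    """MNS-ARG pipeline sketch.
    mns_score(test, model) -> dissimilarity (lower = more similar);
    arg_match(test, candidates) -> best model from the pruned list;
    k is the rank order (number of candidates kept after pruning)."""
    ranked = sorted(models, key=lambda m: mns_score(test_image, m))
    candidates = ranked[:k]              # prune: keep the k best MNS matches
    return arg_match(test_image, candidates)
```

Since ARG matching cost grows with the number of candidate model labels, shrinking the candidate list from all 47 models to k = 15 is where the reported halving of the recognition time comes from.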

Figure 4. The percentage of correct recognition for ARG and MNS-ARG methods (rank size 15)

Figure 5. The average time for test image recognition with the ARG and MNS-ARG methods

5 Conclusion

The problem of object recognition from 2D views was addressed. A new object recognition system based on a combination of the Attributed Relational Graph (ARG) and Multimodal Neighbourhood Signature (MNS) methods was proposed. In the proposed system we first perform non-contextual matching using MNS to prune the number of candidate models. In the next stage ARG matching is applied to identify the correct model for the object in a test image. The results of experiments showed a considerable gain in matching speed. As another benefit of model pruning, the results showed an improvement in the recognition rate for extreme object views.

References

[1] A.R. Ahmadyfard and J. Kittler. Region-based object recognition: Pruning multiple representations and hypotheses. In Proceedings of BMVC, pages 745-754, 2000.
[2] A.R. Ahmadyfard and J. Kittler. Enhancement of ARG object recognition method. To appear in EUSIPCO 2002.
[3] A.P. Berman and L.G. Shapiro. Efficient content-based retrieval: Experimental results. In Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Databases, pages 55-61, 1999.
[4] W.J. Christmas, J. Kittler, and M. Petrou. Structural matching in computer vision using probabilistic relaxation. IEEE Transactions on PAMI, pages 749-764, 1995.
[5] D. Comaniciu and P. Meer. Mean shift analysis and applications. In Proceedings of ICCV, pages 1197-1203, 1999.

[6] R. Haralick and L. Shapiro. Image segmentation techniques. Computer Vision, Graphics and Image Processing, pages 100-132, 1985.
[7] J. Matas, D. Koubaroulis, and J. Kittler. Colour image retrieval and object recognition using the multimodal neighbourhood signature. In Proceedings of ECCV, pages 48-64, 2000.
[8] M.J. Swain and D.H. Ballard. Colour indexing. Intl. Journal of Computer Vision, 7(1):11-32, 1991.
[9] www.ee.surrey.ac.uk/CVSSP/demos/colour/soil47/.

