Exploiting Spatial Context in Image Region Labelling Using Fuzzy Constraint Reasoning

Carsten Saathoff and Steffen Staab
ISWeb - Information Systems and Semantic Web
University of Koblenz-Landau, Germany
http://isweb.uni-koblenz.de/

Abstract

We present an approach for integrating explicit knowledge about the spatial context of objects into image region labelling. Our approach is based on spatial prototypes that represent the typical arrangement of objects in images. We use Fuzzy Constraint Satisfaction Problems as the underlying formal model for producing a labelling that is consistent with the spatial constraints of the prototypes.

1 Introduction

In order to leverage semantic retrieval of images, formal annotations are required. Region-level annotations improve the global annotation accuracy [3] and provide additional and more detailed information about the image contents. However, only few approaches try to exploit the relations between different regions of an image, such as their spatial arrangement, although recent publications [8] have shown their potential to improve the accuracy of region labelling. So far, mainly graph models such as conditional random fields have been proposed to model the spatial features, but these require large amounts of training data. In this paper, we present an approach for exploiting explicitly represented spatial constraints, so-called spatial prototypes, in image region labelling using Fuzzy Constraint Satisfaction Problems. They provide a model for solving systems of mutually constrained variables and, due to their foundation in fuzzy set theory, also account for the uncertainty inherent in image understanding. They provide a model with explicit knowledge that, as we will show, is suitable to aid the task of region labelling. An evaluation of our approach indicates that we achieve a labelling accuracy comparable to the approaches just mentioned, while requiring less training data. Core to our approach is the semi-automatic acquisition of constraints by mining the manual image annotations of the training data set.

2 Overall analysis framework

Figure 1 depicts the parts of the overall image processing in our system that are relevant for the purpose of region labelling in this paper (cf. [2] for a more detailed system description).

Figure 1. The overall analysis chain: constraint acquisition, hypothesis generation, spatial relations extraction, and spatial reasoning.

The overall framework constitutes two phases. First, the background knowledge is created during a constraint acquisition step based on a set of labelled example images (cf. Section 5). In this step we model a set of spatial prototypes using a semi-automatic approach. A spatial prototype defines the legal arrangements of objects within an image as a set of examples, which are later used to create the constraints within the spatial reasoning step.

Figure 2. Hypothesis set generation: (a) input image, (b) hypothesis sets.

The second phase is the image analysis procedure itself. In this phase, the input image is first processed for hypothesis set generation, i.e. the input image is segmented and each resulting region is classified. For each label, a support vector machine (SVM) was trained on the characteristic features of each segment. Each SVM provides a confidence score in the interval [0, 1] based on the distance to the separating hyperplane. An example segmentation with simplified hypotheses is depicted in Figure 2. In order to integrate the spatial context within our fuzzy constraint reasoning approach, we determine the spatial relations between the regions using the spatial relations extraction module (Section 3). Finally, we transform the hypothesis sets, spatial relations and spatial prototypes into a fuzzy constraint satisfaction problem (Section 4). The solution to this problem is a good approximation of an optimal, spatially aware labelling of the image segments.
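As an illustration of this step, the decision values of the per-label SVMs could be mapped to [0, 1] confidence scores as in the following minimal Python sketch; the logistic squashing and its parameters are assumptions for illustration rather than the exact calibration used in our system:

import math

def confidence(decision_value, a=1.0, b=0.0):
    # Squash the signed distance to the separating hyperplane into [0, 1].
    # In practice, a and b would be fitted on held-out data (Platt-style
    # scaling); the defaults here are purely illustrative.
    return 1.0 / (1.0 + math.exp(-(a * decision_value + b)))

def hypothesis_set(decision_values):
    # decision_values: dict mapping each label to the decision value of the
    # one-vs-all SVM trained for that label.
    return {label: confidence(d) for label, d in decision_values.items()}

# Example: a region that is most likely sky, possibly water, unlikely road.
print(hypothesis_set({"sky": 1.8, "water": 0.2, "road": -2.5}))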

3 Spatial relations extraction

Within our region labelling procedure we consider 6 relative and 2 absolute spatial relations to model the spatial arrangement of the regions within an image. The relative spatial relations are either directional (above, below, left, right) or topological (contains, adjacent), and the absolute spatial relations are above-all and below-all.

Figure 3. Definition of the (a) directional and (b) absolute spatial relations.

The directional relations are computed based on the centres of the minimal bounding boxes containing the regions. We have illustrated the definition of the directional relations in Fig. 3a. Based on the angle α we determine in which area the centre of the related region lies, and instantiate the corresponding directional relation. For containment we determine whether the bounding box of one region is fully contained in the bounding box of another region, and instantiate the relation if this is the case. Finally, two regions are adjacent if they share at least one pair of adjacent pixels. Computing whether a region is above-all or below-all is again based on the centre of the bounding box. We include the regions with the highest centre and for which the centre lies above a certain threshold, which is given relative to the image size. An example is depicted in Figure 3b.
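For illustration, the extraction of the directional and absolute relations from region bounding boxes could be sketched as below. The sector widths default to the best-performing setting reported in Section 6 (160° for above/below, 40° for left/right), the above-all threshold value is hypothetical, and the pixel-level adjacency test is omitted because it requires the region masks:

import math

def centre(bbox):
    # bbox = (xmin, ymin, xmax, ymax); the image y-axis points downwards.
    return ((bbox[0] + bbox[2]) / 2.0, (bbox[1] + bbox[3]) / 2.0)

def directional(bbox_a, bbox_b, vertical_angle=160.0):
    # Relation of region b with respect to region a, based on the angle alpha
    # between the bounding-box centres (cf. Fig. 3a).
    (xa, ya), (xb, yb) = centre(bbox_a), centre(bbox_b)
    angle = math.degrees(math.atan2(ya - yb, xb - xa)) % 360.0
    half = vertical_angle / 2.0
    if 90.0 - half <= angle <= 90.0 + half:
        return "above"
    if 270.0 - half <= angle <= 270.0 + half:
        return "below"
    return "right" if angle < 90.0 - half or angle > 270.0 + half else "left"

def contains(bbox_a, bbox_b):
    # Region a contains region b if b's bounding box lies fully inside a's.
    return (bbox_a[0] <= bbox_b[0] and bbox_a[1] <= bbox_b[1]
            and bbox_a[2] >= bbox_b[2] and bbox_a[3] >= bbox_b[3])

def above_all(bbox, image_height, threshold=0.2):
    # A region is a candidate for above-all if its centre lies above a
    # threshold given relative to the image height (0.2 is illustrative); in
    # the full procedure, only the regions with the highest centres qualify.
    return centre(bbox)[1] < threshold * image_height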

4 Exploiting spatial features using Fuzzy Constraint Satisfaction Problems

We transform the segmented and labelled image along with the spatial prototypes into a Fuzzy Constraint Satisfaction Problem. In the following, we will first introduce Fuzzy Constraint Satisfaction Problems as a formal model and then discuss the transformation. Our definition is based on [5], extended with fuzzy domains.

4.1 Fuzzy Constraint Satisfaction Problems

A Fuzzy Constraint Satisfaction Problem consists of an ordered set of fuzzy variables V = {v_1, ..., v_k}, each associated with the crisp domain L = {l_1, ..., l_n} and a membership function µ_i : L → [0, 1]. The value µ_i(l), l ∈ L, is called the degree of satisfaction of the variable for the assignment v_i = l. Further, we define a set of fuzzy constraints C = {c_1, ..., c_m}. Each constraint c_j is defined on a set of variables v_1, ..., v_q ∈ V, and we interpret a constraint as a fuzzy relation c_j : L^q → [0, 1], which we call the domain of the constraint. The value c(l_1, ..., l_q), with v_i = l_i, is called the degree of satisfaction of the variable assignment l_1, ..., l_q for the constraint c. In case c(l_1, ..., l_q) = 1 we say that the constraint is fully satisfied, and if c(l_1, ..., l_q) = 0 we say it is fully violated.

The purpose of fuzzy constraint reasoning is to obtain a variable assignment that is optimal with respect to the degrees of satisfaction of the variables and constraints. The quality of a solution is measured using a global evaluation function, the joint degree of satisfaction. We first define the joint degree of satisfaction of a variable, which determines to what degree the value assigned to that variable satisfies the problem. Let P = {l_1, ..., l_k}, k ≤ |V|, be a partial solution of the problem, with v_i = l_i. Let C_i^+ ⊆ C be the set of fully instantiated constraints containing the variable v_i, i.e. each constraint c ∈ C_i^+ is only defined on variables v_j with l_j ∈ P. Further, let c without explicitly specified labels stand for the degree of satisfaction of c given the current partial solution. Finally, let C_i^- ⊆ C be the set of partially instantiated constraints on v_i, i.e. constraints for which at least one variable has no value assigned. We then define the joint degree of satisfaction as

dos(v_i) := \frac{1}{\omega + 1} \left( \frac{1}{|C_i^+| + |C_i^-|} \Big( \sum_{c \in C_i^+} c + |C_i^-| \Big) + \omega \, \mu_i(l_i) \right),

in which ω is a weight used to control the influence of the degree of satisfaction of the variable assignment on the joint degree. In this definition we overestimate the degree of satisfaction of partially instantiated constraints by counting each of them as fully satisfied.

We now define the joint degree of satisfaction for a complete Fuzzy Constraint Satisfaction Problem. Let J := {dos(v_{i_1}), ..., dos(v_{i_n})} be an ordered multiset of the joint degrees of satisfaction of all variables in V, with ∀v_{i_k}, v_{i_l} ∈ V, k < l : dos(v_{i_k}) ≤ dos(v_{i_l}). The joint degree of satisfaction of a variable that is not yet assigned a value is overestimated to 1. We can now define a lexicographic order >_L on these multisets. Let J = {γ_1, ..., γ_k} and J' = {δ_1, ..., δ_k} be multisets. Then J >_L J' iff ∃i ≤ k : ∀j < i : γ_j = δ_j and γ_i > δ_i. If we have two (partial) solutions P, Q to a Fuzzy Constraint Satisfaction Problem with corresponding joint degrees of satisfaction J_P, J_Q, then solution P is better than Q iff J_P >_L J_Q. Based on these definitions, a Fuzzy Constraint Satisfaction Problem can be solved efficiently using algorithms like branch and bound.
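To make these definitions concrete, the following Python sketch computes the joint degree of satisfaction of a single variable and compares two (partial) solutions via the lexicographic order. The data structures (dictionaries for solutions, (scope, relation) pairs for constraints) and the handling of unconstrained variables are illustrative assumptions, not the implementation used in our system.

def dos(i, solution, constraints, mu, omega=1.0):
    # solution: dict mapping variable index -> assigned label (may be partial)
    # constraints: list of (scope, relation) pairs; scope is a tuple of variable
    #   indices, relation maps a tuple of labels to a value in [0, 1]
    # mu: dict mapping variable index -> membership function (label -> [0, 1])
    fully, partial = [], 0
    for scope, relation in constraints:
        if i not in scope:
            continue
        if all(v in solution for v in scope):
            fully.append(relation(tuple(solution[v] for v in scope)))
        else:
            partial += 1  # partially instantiated constraints are overestimated as 1
    n = len(fully) + partial
    constraint_part = (sum(fully) + partial) / n if n > 0 else 1.0  # assumption: no constraints -> 1
    return (constraint_part + omega * mu[i](solution[i])) / (omega + 1.0)

def better(J_p, J_q):
    # Lexicographic comparison of the ascending multisets of joint degrees of
    # satisfaction; unassigned variables should already be overestimated to 1.
    return sorted(J_p) > sorted(J_q)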

4.2 Representing image region labelling as a FCSP

We will now discuss the representation of the regions, the hypothesis sets and the spatial constraints as a Fuzzy Constraint Satisfaction Problem. Let L be the set of labels and S = {s_1, ..., s_m} the set of regions. Each region s_i is associated with a membership function θ_i : L → [0, 1], which models the hypothesis set, i.e. the confidence values obtained during classification. Further, let R = {r_1, ..., r_k} be the set of spatial relations. Each r ∈ R is defined as r ∈ S in the case of absolute spatial relations, and as r ∈ S² for relative ones. Furthermore, each spatial relation is associated with a type type ∈ T, where type refers to one of the spatial relations extracted in Section 3.

Now we have to define the spatial prototypes, which represent the background knowledge. For each type ∈ T a spatial prototype p ∈ P exists. Each prototype is defined as a fuzzy relation on L, i.e. p : L^n → [0, 1]. Please note that a spatial prototype is defined in the same way as a constraint (cf. Section 4.1). The difference is that a constraint only exists within the Fuzzy Constraint Satisfaction Problem and is associated with a set of variables; the prototype basically defines the domain of the constraint.

A Fuzzy Constraint Satisfaction Problem is now created using the following algorithm (a minimal code sketch of this construction is given below):

1. For each region s_i ∈ S create a variable v_i on L with µ_i := θ_i.
2. For each region s_i ∈ S and each spatial relation r of type type defined on s_i and further segments s_1, ..., s_k, create a constraint c on v_i, v_1, ..., v_k with c := p, where p ∈ P is the spatial prototype of type type.
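A minimal sketch of this construction, under the assumption that hypothesis sets, spatial relations and prototypes are given as plain Python dictionaries and tuples, could look as follows:

def build_fcsp(regions, hypotheses, relations, prototypes):
    # regions: list of region identifiers s_i
    # hypotheses: dict region -> membership function theta_i (label -> [0, 1])
    # relations: list of (type, scope) pairs, where scope is a tuple of regions
    # prototypes: dict type -> fuzzy relation over label tuples (Section 5)
    variables = {s: hypotheses[s] for s in regions}        # step 1: mu_i := theta_i
    constraints = [(scope, prototypes[rel_type])           # step 2: one constraint
                   for rel_type, scope in relations]       #   per spatial relation
    return variables, constraints

The returned variables and constraints feed directly into the evaluation sketched in Section 4.1.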

5 Constraint acquisition

The spatial prototypes constitute the background knowledge in our approach. Manually defining these prototypes is a tedious task, especially when the number of supported concepts and spatial relations grows larger. We therefore derive spatial prototypes representing spatial constraints from example annotations, mining them with support and confidence as selection criteria. In addition, one may manually refine or delete spatial constraints.

In order to generate the prototypes we collect the concrete spatial relations found in the labelled examples. Each spatial relation relates a number of segments s_1, ..., s_n (cf. Section 4.2). Since our images are labelled, each segment s_i also has a related label l_i. Now we want to generate the spatial prototypes p ∈ P, which are relations on L^n. We can interpret such a relation as a set of tuples, and the purpose of the constraint acquisition is to define these tuples. Let R_type be a list (with possibly duplicate entries) of the concrete spatial relations of type type found in the labelled examples. Since each segment has a related label, we can generate a tuple (l_1, ..., l_n) for each spatial relation on segments (s_1, ..., s_n). However, using all tuples to generate the spatial prototype would also include a lot of noise in cases where the segmentation or the spatial relations extraction produced suboptimal results. We therefore use support and confidence for selecting the tuples that can be considered robust. Let |(l_1, ..., l_n)| be the number of times a specific tuple was found, and |(*, l_2, ..., l_n)| the number of times a tuple with an arbitrary l ∈ L in the first position was found. The support is then defined as

σ((l_1, ..., l_n)) = \frac{|(l_1, ..., l_n)|}{|R_{type}|},

and the confidence as

γ((l_1, ..., l_n)) = \frac{|(l_1, ..., l_n)|}{|(*, l_2, ..., l_n)|}.
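As an illustration, mining the tuples of one prototype from the labelled examples could be implemented along the following lines; assigning a degree of satisfaction of 1 to every selected tuple (and implicitly 0 to all others) is an assumption of this sketch, and the 30% confidence threshold anticipates the setting reported in Section 6:

from collections import Counter

def mine_prototype(examples, min_confidence=0.3):
    # examples: list of label tuples (l_1, ..., l_n), one per concrete spatial
    #   relation of the given type found in the labelled training images
    counts = Counter(examples)                  # |(l_1, ..., l_n)|
    rest = Counter(t[1:] for t in examples)     # |(*, l_2, ..., l_n)|
    total = len(examples)                       # |R_type|
    prototype = {}
    for t, n in counts.items():
        support = n / total                     # computed, but not used to filter
        confidence = n / rest[t[1:]]
        if confidence >= min_confidence:
            prototype[t] = 1.0                  # mined tuples count as fully satisfied
    return prototype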

6 Experimental results and discussion

We have carried out an evaluation on a set of 916 images depicting natural and urban scenes. Our ground truth was prepared based on the segmentation produced during hypothesis set generation. We support 10 concepts (person, boat, sand, building, road, mountain, water, sky, plant, snow), which cover a majority of the concepts found in the test data. If the segmentation produced regions depicting more than one object, we chose the label covering the largest part of the segment. Regions depicting objects that were not in the list of supported concepts, or for which we could not decide which object is depicted, were labelled as unknown. Unknown segments are ignored both during evaluation and during the mining of constraints. The ground truth was prepared using the tool and data format provided by Barnard et al. [1].

For acquiring the spatial prototypes we used a set of 151 images. As a threshold for the confidence we used 30%. Tests have shown that filtering based on support did not lead to improved prototypes. We interpret this as evidence that the total number of spatial arrangements is not important, but that there is a statistical dependence between the involved labels. The time needed for manual refinement was low compared to the ground truth preparation, although it is currently not supported by any tool.

We have summarised the results on the remaining 765 images in Table 1. The support vector machine results are based on the labels with the highest degree of confidence in the hypothesis set.

Concept     Precision        Recall           F-measure        Gain %
            SVM    FCSP      SVM    FCSP      SVM    FCSP
person      0.65   0.71      0.74   0.74      0.69   0.72        4
boat        0.59   0.70      0.34   0.31      0.43   0.43       -1
sand        0.66   0.73      0.68   0.72      0.67   0.73        9
building    0.85   0.84      0.46   0.49      0.60   0.62        4
road        0.48   0.40      0.09   0.22      0.15   0.28       89
mountain    0.62   0.54      0.33   0.47      0.43   0.50       17
water       0.60   0.65      0.77   0.76      0.68   0.70        3
sky         0.55   0.61      0.88   0.85      0.67   0.71        6
plant       0.67   0.63      0.59   0.63      0.63   0.63        1
snow        0.42   0.36      0.20   0.20      0.27   0.25       -4
avg         0.61   0.62      0.51   0.54      0.52   0.56        7

Table 1. Comparison of the best hypothesis (SVM) vs. the FCSP approach with manually refined constraints.

For nearly all concepts, the FCSP approach achieves an improvement compared to the pure SVM results. The overall gain is 7%, which is comparable with the results reported in [8], yet achieved with fewer training examples. Boat and snow were both better detected by the SVMs alone; for these concepts, spatial context is obviously not a discriminating feature. Road and mountain show the highest gain, so spatial context seems to be particularly important for their detection.

We achieved the best performance using only adjacent regions; neither combining adjacent and non-adjacent relations into a single relation nor treating them as distinct relations improved the results. Using eight directional relations instead of four did not lead to an improvement either. The absolute spatial relations did improve the results slightly, although they represent rather strong heuristics. Finally, we also evaluated the influence of the spatial relation extraction parameters. We observed that distributing the four directional relations over equally sized sectors results in worse accuracy. We achieved the best performance with an angle of 160° for above-of and below-of, respectively, and 40° for left-of and right-of, respectively. Further, we also merged left-of and right-of into a single relation representing the vertical alignment of two regions.

7 Related work

The problem of identifying and labelling regions in images has gained some attention recently. In [1] a methodology and tools for evaluating the segmentation and labelling of images were presented, including a set of over 1000 images with ground truth. A comprehensive study of using context for improving object recognition was carried out in [7], showing the importance of contextual information. In [8] an approach to image region labelling that exploits spatial features using graph models was presented. Their results are comparable to ours, but were achieved with a larger amount of training examples. In [4] another approach based on explicitly defined spatial constraints was introduced, which employed genetic algorithms to compute a final labelling. We have previously published preliminary results on using crisp constraints for labelling image regions in a smaller domain in [6], which already indicated the usefulness of spatial context. However, the introduction of uncertainty was required to apply the approach to larger domains.

8 Conclusions and future work

We have introduced an approach for integrating spatial context into image region labelling. A core feature of our approach is the inclusion of explicit knowledge that models the typical arrangement of objects in natural and urban scenes. An evaluation of the approach indicated that we achieve comparable performance with fewer training examples.

Our future work will concentrate on the efficient and interactive acquisition of spatial constraints. We hope to achieve a further reduction in the number of labelled examples required to generate a robust and well-performing set of spatial prototypes.

Acknowledgements. This research was partially supported by the European Commission under contract FP6-001765 aceMedia. The expressed content is the view of the authors but not necessarily the view of the aceMedia project as a whole.

References

[1] K. Barnard, Q. Fan, R. Swaminathan, A. Hoogs, R. Collins, P. Rondot, and J. Kaufhold. Evaluation of localized semantics: data, methodology, and experiments. Int. J. Computer Vision, 2007/2008, to appear.
[2] S. Dasiopoulou, J. Heinecke, C. Saathoff, and M. G. Strintzis. Multimedia reasoning with natural language support. In Proc. of ICSC 2007, pages 413–420, 2007.
[3] J. Fan, Y. Gao, and H. Luo. Multi-level annotation of natural scenes using dominant image components and semantic concepts. In Proc. of MULTIMEDIA '04, pages 540–547, New York, NY, USA, 2004. ACM.
[4] P. Panagi, S. Dasiopoulou, G. T. Papadopoulos, I. Kompatsiaris, and M. G. Strintzis. A genetic algorithm approach to ontology-driven semantic image analysis. In Proc. of VIE 2006, pages 132–137, 2006.
[5] Z. Ruttkay. Fuzzy constraint satisfaction. In Proc. of Fuzzy Systems 1994, volume 2, pages 1263–1268, 1994.
[6] C. Saathoff. Constraint reasoning for region-based image labelling. In Proc. of VIE 2006, pages 138–143, 2006.
[7] A. Torralba. Contextual priming for object detection. Int. J. Computer Vision, 53(2):169–191, July 2003.
[8] J. Yuan, J. Li, and B. Zhang. Exploiting spatial context constraints for automatic image region annotation. In Proc. of MULTIMEDIA '07, pages 595–604. ACM, 2007.
