Solving Diverse Image Understanding Problems Using the Image Understanding Environment¹

John Dolan, Charles Kohl, Richard Lerner
Amerinex Artificial Intelligence, Inc., Amherst, MA
[email protected]
http://www.aai.com

Terrance Boult
Electrical Engineering and Computer Science Dept., Lehigh University, Bethlehem, PA
[email protected]
http://www.eecs.lehigh.edu/boult

Joseph Mundy
GE Corporate Research and Development, Schenectady, NY
[email protected]

J. Ross Beveridge
Computer Science Department, Colorado State University, Fort Collins, CO
[email protected]
http://www.cs.colostate.edu/~ross
Abstract

This paper presents the Image Understanding Environment (IUE) in the context of a number of image understanding problem domains. The intent is to demonstrate, via concrete examples, how the IUE can assist the user in solving real problems. The particular problems are chosen to provide a representative, rather than exhaustive, sampling of current vision research, and are also intended to illustrate the range of IU constructs provided. The primary components of the IUE are: a rich and user-extensible hierarchy of predefined, pretested classes; libraries of established IU algorithms; and an infrastructure that supports interoperability with existing environments. Taken together, these allow the user to focus on the IU problem at hand, rather than perpetually reinventing the identical representations and functionality.
1 Introduction

Researchers in computer vision have long understood the critical role that representation plays in successfully posing and solving image understanding problems [Marr and Nishihara, 1977; Agin, 1972; Binford, 1971]. In fact, Marr [1978], among others, suggested that multiple representations are required to adequately characterize the various aspects and dimensionalities (2d, 2½d, 3d) of the vision task. This has led to the exploration
of specific representations for images, curves, surfaces, and solids, as well as for relational and computational structures like trees, networks, graphs, indexes, etc. What has emerged from this process is the notion that there exists a set of representations and functionalities whose elements are common across individual research areas. The IUE is an attempt to identify such commonalities and to provide those representations, behaviors, and capabilities that could benefit the Image Understanding (IU) community at large. More specifically, the IUE is a programming environment designed to promote research and productivity and to facilitate the exchange of research results within the IU community. To enable this, it provides a well-documented, modular, object-oriented C++ class hierarchy; implementations of established IU algorithms; and the ability to interoperate with existing environments. IUE Core, the first public release of the IUE, is currently available.

¹ This work was supported by ARPA under TEC contract DACA76-93-C-0015.
1.1 Class Hierarchy

The IUE class hierarchy provides broad coverage of IU constructs. Designed by a team of IU researchers, the IUE strives to satisfy the following goals:

- provide a broad range of classes to cover IU constructs.
- provide multiple representations for key constructs to offer flexibility to the user.
- embed common functionality to avoid duplication of effort.
- support extension of the hierarchy by users.

Because no class hierarchy can anticipate every future development, the IUE provides an extensible framework that can be specialized at individual sites to meet specific research goals, while maintaining full compatibility and interoperability with the rest of the IUE hierarchy. The major branches of the IUE class hierarchy are described below.
1.1.1 Base Classes

The IUE provides traditional data structures, such as arrays, matrices, lists, trees, graphs, and sets. Due to the importance of mathematical models in IU research, particularly set theory, many IUE classes are derived from set. Most IUE base classes are implemented in terms of the C++ Standard Template Library (STL).
1.1.2 Spatial Objects

Spatial objects represent objects in space. The hierarchy includes both geometric spatial objects and topological spatial objects, which represent the connectivity of an object or set of objects irrespective of metric properties. The geometric classes cover all geometric objects in one, two, and three dimensions, and a limited number of objects in n dimensions. Arbitrarily complex objects can be represented by composite spatial objects, such as topology networks and part instance networks.
1.1.3 Coordinate Systems and Transforms

IUE coordinate systems and transforms represent the geometric relationships among physical entities, between sensors and scenes, and between pixels and the world. Every geometric entity has an explicit coordinate system. A coordinate transform graph allows objects to be placed in a common coordinate system, and provides the information needed to transform points from one coordinate system to another.
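As a concrete illustration of the transform-graph idea, the sketch below composes transforms along a path (object to world to camera) to carry a point into a common coordinate system. It is self-contained and purely illustrative: the plain matrix struct and all names are our assumptions, not the IUE's actual coordinate-system classes.

#include <cstdio>

// 3x3 homogeneous matrix representing a 2d coordinate transform.
struct Xform { double m[3][3]; };

// c = a * b : apply b first, then a.
Xform compose(const Xform& a, const Xform& b)
{
    Xform c;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            c.m[i][j] = 0;
            for (int k = 0; k < 3; ++k)
                c.m[i][j] += a.m[i][k] * b.m[k][j];
        }
    return c;
}

// Transform the point (x, y) in place.
void apply(const Xform& t, double& x, double& y)
{
    double u = t.m[0][0]*x + t.m[0][1]*y + t.m[0][2];
    double v = t.m[1][0]*x + t.m[1][1]*y + t.m[1][2];
    double w = t.m[2][0]*x + t.m[2][1]*y + t.m[2][2];
    x = u / w;  y = v / w;
}

int main()
{
    // world <- object: translate by (5, 2).
    Xform world_from_object = {{{1,0,5},{0,1,2},{0,0,1}}};
    // camera <- world: translate by (-3, 0).
    Xform camera_from_world = {{{1,0,-3},{0,1,0},{0,0,1}}};

    // Chaining along a path in the transform graph gives camera <- object.
    Xform camera_from_object = compose(camera_from_world, world_from_object);

    double x = 1, y = 1;
    apply(camera_from_object, x, y);
    std::printf("(%g, %g)\n", x, y);   // prints (3, 3)
    return 0;
}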
1.1.4 Images

IUE image classes support efficient manipulation of many forms of image data, from intensity and color images to complex composites. Image accessor classes provide highly efficient pixel value access that, to the user, is independent of the underlying image representation.
1.1.5 Image Features

Image feature classes represent events that are detected in, or extracted from, the image (e.g., edgels, regions, perceptual groups). Image features contain geometric information, inherited from spatial object classes; connectivity information, inherited from topological classes; and references to the supporting image data.
1.1.6 Scenes and Sensors

These classes model the physical mechanisms and process of data sensing. The sensor hierarchy includes classes to represent filters, lenses, radiation, and the scene itself.
1.2 Dynamic Attributes

In designing classes for the IUE, there is a continuing tug-of-war between wanting objects that are as small and fast as possible and endowing objects with properties and information that, while only used occasionally or by a small group of users, are invaluable when used. Should objects carry around their drawing color? Should image regions maintain intensity distributions? How about distributions for red, green, or blue? Should a parametric plane store the implicit equation associated with it? Such choices between space, time, and flexibility are extremely difficult, especially without feedback about system usage. These choices impact not just a single class, but all classes that inherit from it. Adding a property high in the object hierarchy improves consistency and increases the chances for code reuse and polymorphism. Unfortunately, it also imparts a storage penalty on the class itself and all of its subclasses.

To ameliorate this condition, the IUE uses a technique called dynamic attributes, which provides unprecedented flexibility in C++ class design. Properties that are expected to be heavily used are declared as hard attributes, and are stored within the object as C++ data members. In addition, IUE classes can declare that an object will have a particular property while not requiring each instance to store it. These soft attributes have default values declared in the IUE specification. An object returns this default value until such time as the program modifies that attribute. If and when the object modifies that attribute, storage is allocated on the object's property list. It is important to note that space is allocated only for the instances that have a non-default value; all other objects still use the default and thus incur no additional space penalty. Furthermore, the attribute access interface is identical for both hard and soft attributes, allowing a small change in the class specification to produce the desired space/time trade-off. Thus, as the IUE design evolves, attributes which might be viewed as optional can be added without incurring significant storage overhead on the instances of a class.

Accessing soft attributes on the property list is slower than accessing hard attributes. This can be a problem when an attribute, which appears optional on one class and so is declared soft, proves ultimately to be essential in a subclass that will access it frequently. The solution is again the dynamic attributes mechanism, which allows such subclasses to "harden" any soft attribute, thereby gaining the speed (and space) of a dedicated data member. Significantly, all code of the parent classes, in which the attribute was soft, instantly gains most of this speed increase. Even precompiled library code sees this speed increase!

To facilitate rapid prototyping, run-time attributes can also be added to existing classes without the need to update the class definition or define a new subclass. In this sense, such attributes behave much like a property list in Lisp; however, there is a consistent access mechanism for both run-time and predeclared attributes. Thus the user may eventually define a new subclass adding the new attributes, with their existing code continuing to function, only faster.
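The following self-contained sketch illustrates the soft-attribute idea with an ordinary property list keyed by name. The class, names, and types are hypothetical; the IUE's actual mechanism (generated accessors, hardening in subclasses, run-time attributes) is considerably more elaborate.

#include <iostream>
#include <map>
#include <string>

// Illustrative sketch only -- not the IUE's implementation. A soft
// attribute lives on a per-instance property list; instances that
// never set it store nothing and report the class-wide default.
class Region {
public:
    Region() : area(0) {}

    // Hard attribute: always stored in the object.
    int area;

    // Soft attribute accessors: same interface as a hard attribute.
    double draw_color() const {
        std::map<std::string, double>::const_iterator it =
            properties_.find("draw_color");
        return it == properties_.end() ? default_draw_color_ : it->second;
    }
    void set_draw_color(double c) { properties_["draw_color"] = c; }

private:
    // Storage is allocated only for instances that override the default.
    std::map<std::string, double> properties_;
    static const double default_draw_color_;
};
const double Region::default_draw_color_ = 0.0;

int main() {
    Region a, b;
    b.set_draw_color(0.5);                    // only b pays for storage
    std::cout << a.draw_color() << " "        // 0 (class default)
              << b.draw_color() << std::endl; // 0.5
}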
1.3 Data Exchange
A prime objective of the IUE is to facilitate the exchange and reuse of IU research results. To this end, IUE Core defines a standard file format for exchanging IU constructs, and provides facilities for reading and writing these files, both from within the IUE and from foreign environments. The IUE Data Exchange (DEX) Specification (as documented in the IUE Data Exchange Manual) enumerates the IU entities that can be exchanged and the minimum set of components required to represent them. The DEX file format specification defines the physical representation used to encode the abstract object representations. The physical representation is capable of representing both the attribute values and the object relationships that define the state of an IU entity. Furthermore, the format supports the representation of arbitrary user-defined attributes that are attached to objects, in addition to their minimal representation. This capability allows users to extend the set of objects that can be transmitted between related environments, while ensuring that any DEX reader/object instantiator can read these objects. Since the IUE provides run-time support for adding arbitrary attributes to objects ("dynamic attributes"), these additional attributes are accessible within IUE programs.
1.4 IUE Libraries

The IUE libraries provide standard implementations of useful medium-to-coarse-grain IU operations. These libraries form the basis for distributing IU algorithms within the community. The basic libraries will include tasks that are frequently reimplemented in every lab, such as standard image operators, common feature extraction and segmentation algorithms, and procedures that support the display and user editing of IUE spatial objects and images. By providing these libraries in a common environment, the IUE enables researchers to apply more effort to advancing IU technology, and to more easily share and validate the results of others. The payoff grows even larger as the community adds its results to these libraries. Relieved of the need to recode new algorithms for each local environment, researchers can meaningfully compare these algorithms with other approaches, and can effectively use them to advance the efforts of others.
2 Development schedule for the IUE

Amerinex has completed the first two years of a staged three-year development effort to implement the IUE. During the first two years, Amerinex developed and released the IUE Data Exchange specification and libraries, and IUE Core, the first public release of the IUE. The current, third-year effort focuses on the implementation of a wide variety of IU algorithms (the IUE Libraries), to be distributed with the IUE. This effort will culminate in the release, at the end of 1996, of IUE V2.0.
3 Problem Domains

The following sections illustrate the applicability of the IUE to various IU problem domains. This is accomplished by identifying some of the key representational and algorithmic issues for each area. In each case, the solution of a representative problem indicates how the IUE can aid the vision researcher in addressing these issues. Although the examples given below are primarily couched in terms of two-dimensional images and image features, it is important to note that the IUE provides support for 3d and higher dimensions as well.
3.1 Visual Tasks

Visual tasks are frequently categorized as low-level, intermediate-level, and high-level vision, depending on where they occur in the visual pathway. Low-level vision is concerned with the early stages of visual processing, where simple primitive features like edges, regions, and textures are computed from the image. Processing is typically data-driven and bottom-up, and the computation is usually local and identical over all image positions. The result may be a new image or set of images, or it may be a set of image features, which are geometric, symbolic entities encapsulating local structural properties of the image. Intermediate-level vision occurs in the middle stages of visual processing, where primitive features are transformed into more complex, more coherent structures. In contrast to the features computed in low-level vision, features here are more likely to be properties of the objects in the scene than properties of the image itself. The computations themselves are usually a mixture of bottom-up and top-down strategies. High-level vision usually occurs in the later stages of visual processing. It typically involves expensive computations that incorporate world knowledge to finally recover objects from their computed features. In the following sections, we explore representative processing from a number of areas of low-, intermediate-, and high-level vision. These include image processing, edge detection, segmentation, perceptual organization, topology, and stereo.
3.1.1 Image Processing

Image processing involves mapping an image to an image. In computer vision, image processing techniques are typically used in preprocessing, where their purpose is either to enhance certain properties of the image or to correct certain deficiencies in it. Examples are smoothing, contrast enhancement, quantization, and image warping. The particular operation performed may be defined over a single pixel or a neighborhood of pixels, and it may be applied to the entire image or restricted to a region of interest. The IUE provides a set of diverse image classes (Figure 1). These support multiple data storage formats while providing a uniform view to the user. Thus, when dealing with an RGB image, users can access the data as RGB tuples without concern for whether the data is stored as three 2d
arrays or as a 2d array of triples.

Figure 1: Public image classes. [The figure shows the public image class hierarchy: generic-image; generic-image-collection (mosaic-image, pyramid-image, stereo-image); and simple-image, which branches into scalar-image, tuple-image, rgb-image, complex-image, and sequence-image, each with 2d and 3d variants and templated -of leaf classes.]

Methods on images are very general for the image classes at the root of the hierarchy, so that they may utilize "generic" representations of data whose exact form is not known. For instance, the very root of the hierarchy, the IUE generic image, provides methods that act upon pixels whose external data type (complex, RGB, scalar), dimensionality (2d, 3d), and internal data type (float, int, or char) are not known. Working down the hierarchy towards the leaf class IUE_scalar_image_2d_of<IUE_INT>, there are classes that provide an interface to any image at all, to any scalar image, to any 2d scalar image, and finally, to 2d scalar images of integers. While the image classes provide generic pixel data accessors, the IUE image accessor classes allow users to gain direct access to the most specialized (and efficient) routines for accessing data in a particular representation. Image accessors are templated on presentation type (the return type of the access methods) and the desired out-of-bounds behavior. For example, the following function uses an image accessor to apply a threshold
to an image:

void
thresholdImage ( const IUE_scalar_image_2d& inputIm,
                 IUE_scalar_image_2d& outputIm,
                 IUE_INT threshold)
{
  IUE_INT maxx = inputIm.x_size(),
          maxy = inputIm.y_size();

  // create accessors for the input and output images
  // with no out-of-bounds checking.
  IUE_image_point_accessor_2d < IUE_INT,
      IUE_image_accessor_boundnocheck < IUE_INT > >
    in_acc (inputIm), out_acc (outputIm);

  // iterate over all of the pixels
  for (IUE_INT y=0; y < maxy; y++)
    for (IUE_INT x=0; x < maxx; x++)
    {
      IUE_INT pix = in_acc.get(x,y);
      out_acc.set ( x, y, (pix < threshold) ? 1 : 0);
    }
}
This function creates accessors for the input and output images, and then uses these accessors to threshold the image. The order of the for loops is important, since the accessor cache assumes that rows of data will be accessed together. The IUE provides an iterator construct that ensures that the most efficient order is used. The image access interface is similar to that on the image classes. However, the image accessor provides internal caching to minimize accesses to the raw image data. Since the image-accessor methods are inlined, cache hits do not require any function calls. The IUE provides four types of image accessors: point, window, tuple-point, and tuple-window. These provide access to scalar pixels, a window of scalar pixels, a non-scalar pixel, and a window of non-scalar pixels, respectively. The get method for each of these is shown in the following code fragments, where PT is the presentation type the accessor template is instantiated for, x and y refer to a point in the image, window_dx and window_dy select a window element relative to the reference point, and band selects an element of a tuple:
PT point_imacc.get ( int x, int y);

PT window_imacc.get ( int x, int y, int window_dx, int window_dy);

PT tuple_point_imacc.get ( int x, int y, int band);

PT tuple_window_imacc.get ( int x, int y, int window_dx, int window_dy, int band);
The following code demonstrates the use of a tuple-window image accessor to smooth each band of an RGB image. In this example, we access the elements of each pixel in a 3×3 window around the reference pixel, compute their average, and write the result using a tuple-point accessor. When creating the tuple-window accessor, we specify that out-of-bounds accesses are to be reflected back into the image. Given a 512×512 image, the input image accessor reduces the number of accesses to the image data from 512 × 512 × 9 × 3 = 7,077,888 to 512 (one per row).
void
smoothRGBImage ( const IUE_rgb_image_2d& inputIm,
                 IUE_rgb_image_2d& outputIm)
{
  IUE_INT maxx = inputIm.x_size(),
          maxy = inputIm.y_size();

  // create accessor for the input image that reflects
  // out-of-bounds accesses back into the image
  IUE_image_tuple_window_accessor_2d < IUE_INT,
      IUE_image_accessor_boundreflect, 3 >
    in_acc (inputIm);
  // point accessor for output image
  IUE_image_tuple_point_accessor_2d < IUE_INT,
      IUE_image_accessor_boundnocheck >
    out_acc (outputIm);

  // iterate over all of the pixels
  for (IUE_INT y=0; y < maxy; y++)
    for (IUE_INT x=0; x < maxx; x++)
      for (IUE_INT band=0; band < 3; band++)
      {
        IUE_INT averaged_pixel =
          ( in_acc.get (x, y, -1, -1, band)
          + in_acc.get (x, y, -1,  0, band)
          + in_acc.get (x, y, -1,  1, band)
          + in_acc.get (x, y,  0, -1, band)
          + in_acc.get (x, y,  0,  0, band)
          + in_acc.get (x, y,  0,  1, band)
          + in_acc.get (x, y,  1, -1, band)
          + in_acc.get (x, y,  1,  0, band)
          + in_acc.get (x, y,  1,  1, band)
          ) / 9;
        out_acc.set (x, y, band, averaged_pixel);
      }
}
3.1.2 Edge Detection

Edge detection is a typical task of low-level vision, concerned with detecting discontinuities in the image intensity function. The underlying assumption is that such points frequently correspond to meaningful events in the scene, such as object boundaries. In addition to localizing these events, it is common to compute other properties such as edge direction, strength, contrast, and steepness. In detecting edges, one effectively models edge neighborhoods either explicitly or implicitly, and the IUE provides a number of constructs to do so (e.g., points, vectors, edgels, line segments, and surfaces). Such models may themselves form part of the detection computation, or they may be the end result of the detection process. One common procedure for edge detection is that outlined by Canny [1986], which uses local maxima of the first derivative of the image intensity function to localize edges. In the IUE, estimates of the derivatives of an image may be obtained by the usual procedure of discrete differencing, i.e., by convolving the image with appropriate difference masks. Using first-derivative masks, one can compute a discrete approximation of the gradient field of the image function. Edges are then localized as points of the resulting gradient magnitude function that are locally maximal in the direction of the gradient. A global noise estimate is also used to filter out weak edges. The IUE provides a number of constructs that simplify the edge detection task. Image accessors ensure efficient convolution and a clear user model
(cf. Section 3.1.1). Following the derivative computation, the resulting gradient magnitude function and gradient field may be represented, respectively, as an IUE scalar image with a superimposed collection of vectors, via the class IUE_point_edgel_2d. This edgel class encapsulates a number of important attributes associated with an edge, especially position, direction, and strength. It also provides a number of optional attributes, such as positional and directional covariances and left and right surface normals. The identification of edge points occurs in a non-maximal suppression step, which effectively discards any edgels that are not locally maximal and those that are below the noise threshold. This computation may be achieved by any one of a number of strategies supported by the IUE, including:

1. Neighborhood analysis of the gradient magnitude image (Figure 2(a)). At each point whose magnitude is above the noise threshold, a window accessor is used to extract a neighborhood of the gradient magnitude image. An example of a window accessor is given in Section 3.1.1. The local gradient vector indicates the direction along which to search that neighborhood (forward and backward from the current point). If, in searching along the line of the current gradient, the magnitude of any encountered gradient is greater than the magnitude of the current gradient, and the directions of the two are roughly similar, then the current point is not a local maximum. Otherwise, it is marked as an edge. (A sketch of this strategy appears at the end of this section.)

2. Profile analysis of gradient magnitude neighborhoods (Figure 2(b)). Although not part of the V1.0 release, V2.0 calls for the implementation of profile methods on images that will return the interpolated cross-section of the image along a user-specified curve. The obvious choice is to use, at each point of sufficient magnitude, the neighborhood-limited line segment along the current gradient. It is then straightforward to examine the resulting profile curve, represented by the class IUE_standard_sampled_curve, to determine whether or not the current point is locally maximal and therefore an edge.

3. Surface analysis of the gradient magnitude function (Figure 2(c)). V2.0 will see the addition of IUE_discrete_functional_surface to the class library. An instance of this class can be created from an image or image neighborhood. The advantage of doing so is that
one can directly query the resulting surface for principal curvatures at any point. An edge will thus show up as a point of the gradient magnitude surface with large curvature in the gradient direction and relatively small curvature in the tangent direction.

Figure 2: Three strategies for edge detection in the IUE. (a) Neighborhood analysis of the gradient magnitude image. (b) Profile analysis of a gradient magnitude neighborhood. (c) Surface analysis of the gradient magnitude function.

Although there are bound to be trade-offs between one approach and another, it is entirely within the spirit of the IUE to provide a rich set of alternatives and leave it to the user to decide which is best suited to the situation at hand. Therefore, while the last strategy may be less efficient than the others, it offers a conceptually simple, high-level abstraction for the user. The net result of the non-maximal suppression stage is a set of edgels marking the position and direction of edges. Following localization, neighboring edgels can be linked in the tangent direction into extended edge structures, which can be represented directly in the IUE by the class IUE_edgel_chain_2d. The linking process is treated further in Section 3.1.5. Additional properties like strength, contrast, and steepness can be measured over the pixel neighborhoods abutting these edge structures, yielding a rather elaborate model of the edge neighborhood.
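As a concrete illustration of strategy 1, the following self-contained sketch performs non-maximal suppression on plain row-major arrays. An IUE implementation would use a window accessor instead of raw indexing, and the direction-similarity test described above is omitted here for brevity; all names are ours.

#include <cmath>
#include <vector>

// gx, gy: gradient components; mag: gradient magnitude; all w x h, row-major.
// Returns a boolean mask marking pixels that survive suppression.
std::vector<bool> nonmax_suppress(const std::vector<double>& gx,
                                  const std::vector<double>& gy,
                                  const std::vector<double>& mag,
                                  int w, int h, double noise_threshold)
{
    std::vector<bool> edge(w * h, false);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            int i = y * w + x;
            if (mag[i] < noise_threshold) continue;   // global noise filter

            // Step one pixel forward and backward along the gradient.
            double norm = std::sqrt(gx[i]*gx[i] + gy[i]*gy[i]);
            if (norm == 0) continue;
            int dx = (int)std::floor(gx[i]/norm + 0.5);
            int dy = (int)std::floor(gy[i]/norm + 0.5);

            double fwd  = mag[(y + dy) * w + (x + dx)];
            double back = mag[(y - dy) * w + (x - dx)];

            // Keep the point only if it is a local maximum along the gradient.
            if (mag[i] >= fwd && mag[i] >= back)
                edge[i] = true;
        }
    return edge;
}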
3.1.3 Segmentation

Segmentation is another task common to low-level vision. It is concerned with partitioning the image pixels into patches of relative homogeneity in terms of their immediate or computed image properties. The assumption is that nearby points of an object will project to nearby points in the image, and that these will exhibit rough consistencies in terms of intensity, color, etc. Furthermore, nearby image points that originate from different objects will likely not exhibit such consistencies. To the extent that such assumptions hold, the regions resulting from a segmentation procedure are likely to represent surfaces (or portions of surfaces) of individual objects in the scene.

The segmentation process takes an image as input and produces as output a set of regions whose intersection is empty and whose union is the set of image pixel locations, i.e., a partition. Each region is represented in the IUE by an instance of the class IUE_image_region_2d. The segmentation itself is represented by an instance of class IUE_label_plane, which contains a 2d array corresponding to the image locations and an array of pointers which indexes the individual regions. Figure 3 shows a simple example of a label-plane and two of the regions it indexes. The rationale for the label-plane is twofold: it provides a computational expedient (adjusting the values in the 2d array is faster than growing and shrinking individual regions), and it organizes the individual regions into an aggregate whole.

Computing a segmentation typically involves three main sub-tasks: 1) assigning initial cluster labels to the image locations based upon the relative similarity of their image properties; 2) relabeling the clusters with region labels based upon their connectivity, as computed by a connected-components algorithm; and 3) generating the corresponding symbolic regions that are used in subsequent processing, e.g., by a grouping process. Of these sub-tasks, the first is the one that most
Figure 3: An example segmentation showing a simple label-plane with a pair of associated regions indexed by the label-plane.

Figure 4: (a) An example histogram partitioned about peaks. (b) The resulting image segmentation.
distinguishes individual segmentation algorithms. The connected-components task can be realized by any one of a number of generic image-processing or computational-geometry algorithms [Rosenfeld and Kak, 1982, Section 11.3]; a sketch appears at the end of this section. The region generation task depends in large part on the particular constructors of the IUE_image_region_2d class. At present, the most germane constructor is one that takes an extents box and a raw C array marking the locations in the region. Following the taxonomy of Ballard and Brown [1982, Chapter 5], there are a number of techniques to compute cluster labellings, including:

Local techniques. These examine only the local neighbors of a pixel to determine how similar it is to them. If it is sufficiently similar, then it is assigned the same label as theirs; otherwise, it receives a new label. A simple example is the blob coloring algorithm given in [Ballard and Brown, 1982]. More sophisticated variants maintain aggregate statistics on the current cluster, and the current pixel is compared with these to determine its admission to the cluster. Because of their simplicity, these algorithms do not require special support other than that offered by the label-plane.

Global techniques. These techniques examine the distributions of properties over large collections (perhaps the entire set) of pixels. Clusters in the histograms of these properties are used to assign cluster labels to the pixels. The problem is that large clusters can swamp small but important features, so more sophisticated variants sub-divide the image and compute local histograms on each subimage [Kohler, 1984], [Beveridge et al., 1989]. Apart from the label-plane itself, the primary constructs that the IUE provides to support these algorithms are the ability to easily extract sub-images from an image and the class IUE_histogram_1d. A histogram is a type of IUE_accumulation_array that represents the discrete sample distribution of a quantized feature. It also provides methods for cluster identification based on smoothing and sensitivity parameters. Figure 4 illustrates a partitioned histogram and the corresponding cluster labels.

Split and merge techniques may be used to subsequently refine an existing segmentation. In general, these involve checking whether adjacent regions are sufficiently similar under some measure of homogeneity. If so, merge them. If any region results that is not sufficiently homogeneous, then split the region into four subregions. The IUE provides support for these techniques with the discrete topology of IUE_label_plane, morphological and set operations on IUE_image_region_2d, and explicit graph representations via IUE_digraph.
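As an illustration of the connected-components sub-task, here is a self-contained sketch that relabels a cluster-label array into region labels by 4-connected flood fill. In the IUE, the resulting labels would populate an IUE_label_plane whose entries index IUE_image_region_2d instances; the function and variable names here are ours, not the IUE's.

#include <vector>

// clusters: initial cluster labels, w x h, row-major.
// Returns region labels: connected pixels with equal cluster
// labels receive the same region label.
std::vector<int> connected_components(const std::vector<int>& clusters,
                                      int w, int h)
{
    std::vector<int> regions(w * h, -1);
    int next_label = 0;
    std::vector<int> stack;

    for (int seed = 0; seed < w * h; ++seed) {
        if (regions[seed] != -1) continue;   // already assigned
        int label = next_label++;
        stack.push_back(seed);
        while (!stack.empty()) {
            int i = stack.back(); stack.pop_back();
            if (regions[i] != -1) continue;
            regions[i] = label;
            int x = i % w, y = i / w;
            // Visit 4-connected neighbors carrying the same cluster label.
            const int nx[4] = {x - 1, x + 1, x,     x    };
            const int ny[4] = {y,     y,     y - 1, y + 1};
            for (int k = 0; k < 4; ++k) {
                if (nx[k] < 0 || nx[k] >= w || ny[k] < 0 || ny[k] >= h)
                    continue;
                int j = ny[k] * w + nx[k];
                if (regions[j] == -1 && clusters[j] == clusters[i])
                    stack.push_back(j);
            }
        }
    }
    return regions;
}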
3.1.4 Perceptual Organization

An important task of intermediate-level vision is to make explicit the regularities in the sensory input (low-level features). This process is often referred to as perceptual organization, and these regularities, termed primitive structure by Witkin and Tenenbaum [1983], include perceived edges and regions, collinearity, parallelism, symmetry, etc. Their importance stems primarily from two factors: 1) they emerge from the perceptual process without recourse to knowledge of the objects present in the scene, and 2) they persist essentially unchanged through the later stages of the visual task. In practical terms, this means that structure is, or should be, computable directly from sensory primitives without relying on specific a priori knowledge, and that what is computed is useful for later, high-level interpretation tasks.
Figure 5: The first few levels of the image feature class hierarchy. [The figure shows sub-hierarchies of IUE_image_feature, including IUE_curve_feature (e.g., IUE_image_curve_2d, IUE_sampled_image_curve_2d, IUE_edgel_chain_2d, IUE_pixel_chain_2d, IUE_image_line_segment_2d), IUE_edgel (IUE_edgel_2d, IUE_point_edgel_2d, IUE_line_segment_edgel_2d), IUE_image_region (IUE_image_region_2d, IUE_image_region_3d), IUE_perceptual_group (IUE_perceptual_group_2d, IUE_segment_group_2d, IUE_point_cluster_2d, and junction classes such as IUE_segment_junction, IUE_l_junction, IUE_t_junction, IUE_y_junction, IUE_arrow_junction), IUE_point_feature (IUE_image_point_2d), and IUE_surface_feature (IUE_discrete_image_surface, IUE_parametric_image_surface), together with the topological classes IUE_image_vertex, IUE_image_edge, and IUE_image_face, and the container IUE_image_feature_collection.]

Perceptual organization, or grouping as it is also known, may be thought of as defining a structural compatibility relation on a set of symbolic, geometric entities such as points, edges, curves, regions, surfaces, etc. Such entities have been termed tokens by Stevens and Brookes [1987], among others. The relation is often higher-order, and the effect of the grouping operation is to map each set of sensory primitives satisfying the relation to a corresponding percept; cf. Dolan [1995]. The percept makes explicit the precise structural relation that holds for the set, and it may be either an aggregate object or a new simple object. For example, if a set of points is sufficiently compatible with respect to a collinearity measure, they may give rise to the perception of a line. On the other hand, a set of points lying close to one another may exhibit no clear, simple structure and so may result in an aggregate percept like a point cluster. In the IUE, there are a number of sub-hierarchies of image feature classes which can represent simple unified structures, including IUE_point_feature, IUE_edgel, IUE_curve_feature, IUE_image_region, and IUE_surface_feature. There are also sub-hierarchies
of aggregate classes, especially those deriving from IUE_perceptual_group, which is a type of part-instance network, and from IUE_topology_node. Some of these sub-hierarchies are shown in the class hierarchy diagram of Figure 5. The subclasses of perceptual group include certain predefined aggregations which are common and useful in IU, such as IUE_segment_pair_2d, IUE_parallel_segment_group_2d, and IUE_point_cluster_2d. Topology classes allow various objects to be stitched together by their shared boundaries. For example, a set of curves may meet in a point, or two abutting regions may share a common curve. The corresponding topological constructions are, respectively, a set of edges with a common vertex, and two faces with a common edge. Such representations can be particularly powerful when organizing heterogeneous sets of objects.

The grouping computation itself may be decomposed into a number of sub-tasks, including subset selection, predicate application, and grouping action. See, for example, the work of Boldt et al. [1989], Saund [1992], and Horaud et al. [1990] for variations on this theme. Subset selection is concerned with generating subsets that are potential groups under the given grouping relation. This task will be examined in some detail below. Once a candidate subset has been computed, the application of the structural predicate determines whether or not the particular grouping relation holds for that subset; equivalently, it determines whether the set constitutes a group. The grouping action is mainly concerned with generating a new feature (aggregate or simple) if the relation holds. This new feature embodies the resulting percept. The IUE provides a number of constructs to aid the user in the subset selection task. The primary classes of interest are IUE_image_feature_collection, a container with order and spatial semantics that holds the superset of tokens, and IUE_set, which holds the current candidate set. Not surprisingly, the IUE supports a number of strategies for performing subset selection. Among these are:

Enumeration. Enumerate all subsets up to some maximum cardinality of interest. Obviously, this is exponential with respect to the subset cardinalities, and so is impractical except for very small candidate sets. Nevertheless, the IUE provides iterators on sets and image-feature-collections which make this task quite easy. For example, to enumerate all subsets of cardinality 3, one has only to create three iterators a, b, c on the collection of tokens and step them progressively through the collection. The code fragment below shows how this might be done for a collection of points.

Locus query. Every image-feature-collection has a pointer to an optional spatial index. An IUE_spatial_index is a structure into which each element of the collection may be registered, according to its spatial layout, so that it may be efficiently retrieved by location. This is a very useful construct for perceptual organization, because proximity is often an important cue in grouping; i.e., tokens which form part of an integral structure are likely to lie close to one another. Thus, to retrieve all tokens within some distance of a point, one has only to invoke the shape-query method using a circular disk. Effectively, this method takes any spatial object with the same dimensionality as the index, intersects this shape with the index to identify which cells are hit, and returns the union of all tokens indexed by the hit cells. Spatial indices will be available in V2.0. (A self-contained sketch of this idea appears at the end of this section.)

Graph search. For a set of tokens to be compatible, they will in general exhibit pairwise compatibilities; these can be recorded in a graph whose nodes are tokens and whose arcs link compatible pairs, so that candidate subsets can be generated by searching for connected subgraphs.
IUE_image_feature_collection< IUE_point_2d > IFC;
IUE_set< IUE_point_2d > candidates;
IUE_image_feature_collection< IUE_point_2d >::iterator a, b, c;

for(a=IFC.begin(); a!=IFC.end(); ++a)
{
  candidates.insert(*a);
  for(b=a, ++b; b!=IFC.end(); ++b)
  {
    candidates.insert(*b);
    for(c=b, ++c; c!=IFC.end(); ++c)
    {
      candidates.insert(*c);
      // ... use candidate set ...
      candidates.remove(*c); // retract the innermost token before the next
    }
    candidates.remove(*b);
  }
  candidates.remove(*a);
}
There are conditions under which each strategy may be preferable, and it is up to the user to decide when to use one or another. The strength of the IUE is that it provides the user with options.
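To illustrate the locus-query strategy, the following self-contained sketch registers tokens in a uniform grid and answers circular-disk queries by scanning the cells the disk overlaps. The IUE spatial-index classes due in V2.0 generalize this to arbitrary query shapes; all names here are illustrative, not the IUE's.

#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Token { double x, y; int id; };

class GridIndex {
    typedef std::pair<long, long> Cell;
    typedef std::map<Cell, std::vector<Token> > CellMap;
public:
    explicit GridIndex(double cell_size) : cell_(cell_size) {}

    // Register a token in the cell covering its location.
    void insert(const Token& t) { cells_[cell_of(t.x, t.y)].push_back(t); }

    // Return all tokens within distance r of (cx, cy).
    std::vector<Token> disk_query(double cx, double cy, double r) const {
        std::vector<Token> hits;
        // Visit every cell overlapped by the disk's bounding box ...
        for (long gy = coord(cy - r); gy <= coord(cy + r); ++gy)
            for (long gx = coord(cx - r); gx <= coord(cx + r); ++gx) {
                CellMap::const_iterator it = cells_.find(Cell(gx, gy));
                if (it == cells_.end()) continue;
                // ... and keep the registered tokens actually inside the disk.
                for (std::size_t i = 0; i < it->second.size(); ++i) {
                    double dx = it->second[i].x - cx;
                    double dy = it->second[i].y - cy;
                    if (dx * dx + dy * dy <= r * r)
                        hits.push_back(it->second[i]);
                }
            }
        return hits;
    }

private:
    long coord(double v) const { return (long)std::floor(v / cell_); }
    Cell cell_of(double x, double y) const {
        return Cell(coord(x), coord(y));
    }
    double cell_;
    CellMap cells_;
};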
3.1.5 Topology
The use of topological concepts in object representation extends back to the earliest stages of computer vision research. For example, Waltz showed that a three-dimensional interpretation of a line drawing could be achieved by propagating topological and geometric constraints on the edges and vertices of the drawing [Waltz, 1975]. However, this extensive body of topological machinery has not had much impact on modern representations used in object recognition. The key reason is that standard boundary segmentation algorithms, such as the Canny edge detector, are not robust at recovering correct junction topologies from real images with complex illumination, dense surface textures, and significant object occlusion. Under these circumstances, the topology of the extracted image features bears little relation to the surface topology of the underlying objects. This fragmentary representation is to be expected, since the signal properties of boundary intensity discontinuities are not well represented by a step edge model near junctions and in regions of high texture density. However, extensions of the Canny algorithm and local topological repair mechanisms have produced much more reliable topological descriptions in recent years; for example, see the work of Rothwell et al. [1995]. Junction connections at projected object vertices and at occlusion "T" junctions are better preserved, because the strict Canny step edge model is relaxed near such image features. It is therefore now feasible to recover significant portions of a projected object boundary. These improved edgel topology construction algorithms have been implemented in a C++ topology representation which is nearly identical to the IUE topology specification.² These algorithms support the recognition of planar objects based on invariants. This invariant recognition system, called LEWIS³ [Rothwell, 1995], uses the topological connection between features to reduce the combinatorial cost of selecting groups of invariant features. For example, a group of five line segments in an image provides two projective invariants which can be used to index an object in a model library of objects. Relying on the edgel topology, these feature groups are extracted from object boundaries with cost proportional to the number of line segments, N, rather than the N^5 which would be the case for exhaustive combination of features.
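To make the combinatorics concrete: with N = 100 segments, exhaustive selection would examine on the order of 100^5 = 10^10 five-tuples, whereas topology-guided extraction along connected boundaries considers only on the order of the 100 segments themselves.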
Consider the extracted boundary of a bracket shown in Figure 6. The segmented outer boundary of the bracket is processed by fitting the edgel chains with straight line segments. Not all of the boundary will be well approximated by straight lines, due to background texture, shadows, and the visible thickness of the bracket. However, most of the missing
boundary will still be topologically connected by edgel chains. Line segments are merged if they are nearly collinear with adjacent line segments in the sequence. Thus, a path-following algorithm can recover the five-line sequences with low combinatorial cost.

² At GE-CRD, a C++ environment for image understanding research, called TargetJr, has been under development over the last five years. TargetJr is an evolution of GEOMETER, a Lisp environment developed at CRD and UMass in the mid-1980s. It is planned to migrate the code from TargetJr to the IUE during 1996.

³ Information on LEWIS may be obtained from the MORSE web site at http://www.inria.fr/robotvis/personnel/crothwel/morse/morse.html.

Figure 6: A segmentation producing a closed region from a sequence of edges, i.e., a one-chain. The edges shown here are either edgel chains or fitted line segments.

This application provides a good illustration of the use of topology classes in the IUE. The primary algorithm is the extraction of closed regions from a network of IUE_edge(s). The same algorithm can be used in a slightly modified form to find long, connected boundaries even when they are not cycles (closed chains). It can be the case that a boundary is not closed due to low image contrast across the boundary. In this case, the next best grouping principle is to use long one-chains. The region extraction algorithm we describe below uses IUE classes and methods, and uses Standard Template Library (STL) iterator classes to access sequences and sets. The three IUE classes involved in the algorithm are:
IUE_vertex. A 0-dimensional entity which represents the junction of curves. A vertex has a location in space but does not specify the dimension of the space in which it is embedded.

IUE_edge. A 1-dimensional entity which represents a portion of the boundary of a surface. In a consistent topological structure, two edges intersect only at a vertex.
IUE_one_chain. A 1-dimensional entity which is a connected chain of edges. A closed one-chain is called a one-cycle and represents the boundary of a surface region.

The definition of IUE_vertex includes the requirement that the edges incident on the vertex are sorted with respect to their orientation. That is, the edges are projected onto a plane passing through the vertex, and the angular orientation of the projected edges provides an ordering for sorting the edges, as shown in Figure 7. In our current example, the edges represent the topology of planar edgel chains, so the edges can be sorted directly in terms of their tangent direction at a vertex.

Figure 7: An edge with the sorted edges at each vertex.

The goal of the algorithm is to operate on a set of connected edgel chains, as shown in Figure 6, and extract all the closed regions, or one-cycles. These regions can then be efficiently filtered for the construction of invariant feature groups, but we will not discuss this later step. The region building algorithm requires the notion of edge direction, since an edge can be traversed in two directions in adjacent regions, as shown in Figure 8. To account for this direction state we construct a subclass of IUE_edge, called IUE_directed_edge, as follows:

class IUE_directed_edge : public IUE_edge
{
 public:
  // A constructor from IUE_edge
  IUE_directed_edge(const IUE_edge&);
  boolean plus_dir;   // true until the + direction has been traversed
  boolean minus_dir;  // true until the - direction has been traversed
};
If both of the boolean variables plus_dir and minus_dir are set to false, the directed edge is considered to be fully used.

Figure 8: An edge is traversed in two directions in a connected network of regions.

The central loop of the algorithm is as follows:

1. Select an unused edge.

2. For this current edge:
   (a) Trace through connected edges until the path returns to the current edge or terminates.
   (b) Mark each edge added to the path as used for the direction of traversal encountered on the path.

3. Construct an output region, i.e., a closed IUE_one_chain, from the completed path.

It is assumed that there is an initial set of edges from the segmentation, which are written into edges. The following code example will be made clearer with the definition of two auxiliary functions. The initialize_path method (defined in Figure 9) sets up the initial edge on the path to be followed for the potential region boundary. Note that a directed edge represents two possible directions along which a path can be traced. The method choose_edge_direction (defined in Figure 10) establishes the direction for the next edge chosen on the path; if the required direction of the edge has already been used, the function returns NULL. The trace_path method (defined in Figure 11) finds the next counter-clockwise edge that has not been used. The edge will be incident on to_vertex, and its other vertex is connected to to_vertex through a single edge. This function sets the from_vertex and to_vertex pointers for that next edge. The method get_ccw_edge recursively searches around to_vertex to find an unused edge.
boolean initialize_path(IUE_directed_edge* current_edge,
                        IUE_vertex*& start_vertex,
                        IUE_vertex*& from_vertex,
                        IUE_vertex*& to_vertex)
{
  if(current_edge->plus_dir)  // the positive direction is still unused
  {
    start_vertex = current_edge->v0;
    from_vertex = start_vertex;
    to_vertex = current_edge->v1;
    current_edge->plus_dir = false; // Mark positive edge direction as used
    return true;
  }
  if(current_edge->minus_dir) // the minus direction is still unused
  {
    start_vertex = current_edge->v1;
    from_vertex = start_vertex;
    to_vertex = current_edge->v0;
    current_edge->minus_dir = false; // Mark minus edge direction as used
    return true;
  }
  return false;
}
Figure 9: Definition of initialize_path.

IUE_vertex* choose_edge_direction(IUE_vertex* from_vertex,
                                  IUE_directed_edge* edge)
{
  IUE_vertex* to_vertex = NULL;
  // check for plus direction
  if(from_vertex == edge->v0)
  {
    if(edge->plus_dir)
    {
      to_vertex = edge->v1;
      edge->plus_dir = false;
      return to_vertex;
    }
    else
      return NULL;
  }
  // check for minus direction
  if(from_vertex == edge->v1)
  {
    if(edge->minus_dir)
    {
      to_vertex = edge->v0;
      edge->minus_dir = false;
      return to_vertex;
    }
    else
      return NULL;
  }
  return NULL;
}

Figure 10: Definition of choose_edge_direction.
IUE_directed_edge* trace_path(IUE_directed_edge* current_edge,
                              IUE_vertex*& from_vertex,
                              IUE_vertex*& to_vertex)
{
  IUE_directed_edge* next_directed_edge = current_edge;
  while((next_directed_edge =
             to_vertex->get_ccw_edge(next_directed_edge)))
  {
    // find the edge that goes from to_vertex to v0 or v1 of
    // next_directed_edge. v0 or v1 of next_directed_edge is returned,
    // and its plus_dir or minus_dir is marked correctly.
    IUE_vertex* v = choose_edge_direction(to_vertex, next_directed_edge);
    if(v)
    {
      from_vertex = to_vertex;
      to_vertex = v;
      return next_directed_edge;
    }
  }
  // this should not happen
  return NULL;
}
Figure 11: Definition of trace_path.

The main function for growing regions, create_regions, is defined in Figure 12. The edges are edgel chains with topological connections at chain junctions (vertices). The original edge set is copied into a set of IUE_directed_edge(s) to provide the marking state needed to follow the boundary paths. The algorithm as stated does not account for bridges. A bridge is an edge which is traversed twice in a region. The edge eb shown in Figure 13 is a bridge. Bridges can be easily detected as directed edges which appear in a region with both the plus and minus directions used. Such edges can be eliminated and new regions constructed. To illustrate the operation of create_regions, consider the set of edges in Figure 13. The graph shows a number of edges and vertices which are connected so that regions can be constructed. Suppose that the first edge in the set edges is e0. The first counter-clockwise edge from e0 is the bridge eb, which is explored once in each direction. Edge e1 is then selected as the next unused ccw edge leaving e0, and the trace yields the closed path {e0 → e1 → e2 → e3 → e4 → e5 → e6 → e7}.
3.1.6 Stereo

While stereo sensors are not included in the current IUE release, they provide an example which can help highlight some of the existing classes as well as the rationale for some
of the sensor hierarchy design. In the IUE there are three main classes which might be associated with stereo data: subclasses of IUE_image; a subclass of IUE_stereo_sensor, say IUE_edge_based_stereo_sensor; and its associated sensor model, IUE_edge_based_stereo_sensor_model. Supporting these would be various base classes: IUE_coordinate_systems, IUE_coordinate_transforms, IUE_filters, and IUE_image_features, plus image-feature extraction routines.⁴

Figure 13: An example segmentation to illustrate the construction of regions. [The figure shows a connected network of edges e0 through e7 forming an outer boundary, with a bridge edge eb.]

⁴ The sensor hierarchy and supporting objects are currently being revised to reflect the changes in other
IUE_set create_regions( IUE_set& edges)
{
  ref_inf_edges();
  // Vertices on the path
  IUE_vertex *start_vert = NULL, *from_vert = NULL, *to_vert = NULL;
  // If the initial edge is valid then begin tracing
  if(initialize_path(current_edge, start_vert, from_vert, to_vert) == false)
    edges.remove(current_edge); // this means both plus and minus are used
  else // current_edge is used as a seed edge
  {
    // Add the first edge to the region
    IUEi_insert(region_edges, current_edge);
    while (start_vert != to_vert)
    {
      // trace the next edge that will go from to_vertex to a vertex that
      // is connected to current_edge by a single edge. if successful,
      // from_vertex and to_vertex will be moved to the next edge in the
      // path, and that edge will be returned.
      current_edge = trace_path(current_edge, from_vert, to_vert);
      if(current_edge)
        IUEi_insert(region_edges, current_edge); // Insert subsequent edges
      else
      {
        cerr