Duality in Perceptual Organization: Grouping vs. Shape Decomposition

Jonas August, Yale University, Department of Electrical Engineering
Kaleem Siddiqi, Yale University, Department of Computer Science
Steven W. Zucker, Yale University, Department of Computer Science

February 1998
Abstract

Bounding contours of physical objects are often fragmented by other occluding objects. This creates the need for perceptual grouping, or the association of those fragments corresponding to the same object. Our guiding principle suggests that those fragments should be grouped whose fragmentation can be accounted for by a virtual occluder, and we define the gap skeleton as a representation of this occluder. A different perspective on the gap skeleton is then obtained in the domain of shape decomposition, where ligature emerges as the analog to gap skeleton. This leads to a new definition of a limb as a formal part, and suggests that both grouping and shape decomposition share a common basis.
1 Introduction

When we look to our visual world and try to recognize objects in it, we are immediately confronted with the basic difficulty that nearby objects obscure our view of distant ones. If a key component in our description of objects derives from their boundary, then this phenomenon of occlusion forces us to piece together these shattered contours. To define an appropriate method for associating contour fragments, in the absence of higher level knowledge about the objects themselves, is the problem of grouping.

Grouping is a long-standing and important problem in computer vision, in no small part because there is little consensus on its precise nature. For example, where grouping is seen as an intermediate process taking edges to object descriptions supporting recognition, the reconstruction of both disjoint and overlapping closed curves has been attempted [12, 14]. Others have viewed the generation of intermediate-level objects as a problem requiring a process which partitions an image based on regional homogeneity with respect to various image statistics such as colour or texture [25, 30]. In this case, processing is not based on contour at all.

This investigation into grouping has been influenced significantly by two bodies of work. The first is related to the skeleton, a frustratingly subtle but classical shape representation, which began with Blum [6, 7, 8, 9], and was formalized by Calabi and Hartnett [11]. The skeleton has provided a framework for the grouping geometry we have considered. The second body of work is on curve evolution [23], and has been significant in supporting our computations [29, 36].

To motivate our grouping process, we observe in an example due to Bregman [10] (see figure 1) that the grouping process which facilitates the recognition of the "B's" is coincident with the perception of an illusory occluder. Indeed, when the conditions of the figure are such that no illusory occluder is visible, the grouping does not occur.
Figure 1: (left) Nine scattered islands of figure fragments with no apparent structure. (center) The mere removal of certain curves releases the endpoints for interaction [21, 27], leading to the perception of three identical and familiar objects behind illusory white patches. (right) Three objects behind solid occluders. Note the similarity to the illusory patches. After Bregman [10].

In this paper we consider exploiting this necessary condition for the grouping in this example: evidence of occlusion at the endpoints and an empty region in which there could have been an occluder. The lesson from this example is that fragment grouping and occluder description are related. To understand why, we observe that long-distance perceptual grouping is a kind of inverse to physical occlusion:
Principle 1 (Perceptual Occlusion) (i) Forward: Given a collection of objects distributed
in space, occlusion will arise generically under projection onto images. The result is a composite bounding contour which, by transversality [19, 22], will contain discontinuities at the occlusion points. Consequently, the bounding contours of occluded objects may be fragmented in the image. (ii) Inverse: Suppose we are given a collection of contour fragments. Long-distance perceptual grouping is the association of fragments that could have been the result of the occlusion of a single object by a generic occluder.
In other words, we should seek a description of a simple occluder which is consistent with the given arrangement of fragments. To see this principle in action, consider figure 2 (bottom left). Here, the thick, vertical branch of the skeleton of the two fragments of the distant rectangle bears a resemblance to the skeleton of the occluding sausage. This substantial piece of skeleton will be called gap skeleton, for it relates endpoints through the gap left by the occluder which had caused the fragmentation. Two endpoints will be linked only if they have a corresponding piece of gap skeleton. Furthermore, the gap skeleton can be viewed as a simple, implicit description of that virtual occluder which caused the fragmentation. Thus the gap skeleton is an intermediate representation that supports the solution of the grouping problem by relating endpoints (of contours) and describing region (of occluder). In the second section we formally describe the gap skeleton and provide examples of its computation.

Figure 2: Grouping using the gap skeleton. (top left) Two objects with their skeletons. (top right) The nearby sausage occludes the distant rectangle and so T-junctions form. (bottom right) Because of the T-junctions, the fragments of the rectangle are "farther" than the contour of the sausage. (bottom left) The skeletons of the fragments are thin lines, while the gap skeleton is the thick vertical line. The similarity between the gap skeleton and the skeleton of the occluder suggests that the gap skeleton represents a "virtual" occluder.

Seemingly unrelated is another issue in visual perception, that a given object, as described by its silhouette, say, seems to have natural components or parts. We speak of the head of an animal, a building in a skyline, the trunk of an elephant or the tail of a horse. Indeed the description of an object in terms of parts need not depend on our knowing any actual function of such regions, nor any recognition of or even prior experience with the object: we can even describe an entirely foreign object in terms of parts (see figure 3). Somehow parts arise from the geometry of the object, and the problem of shape decomposition is to discover this geometry.

Figure 3: (left) A sketch of a horse by Leonardo da Vinci, after [32, p. 99] and its silhouette. Notice how the legs and tail are "geometrically" separate from the body. (right) An unfamiliar object with two apparent parts. On what basis do we make this decomposition?

Although fragment grouping and shape decomposition are apparently distinct problems, one problem can be seen as a kind of dual of the other. Grouping is, loosely speaking, the task of taking a "field" of contour fragments, or parts, and forming units as sets of fragments, or wholes. Shape decomposition is, loosely speaking, the task of taking an object, or whole, and breaking it into its constituent parts. One difference is that contour fragment grouping concerns curves while shape decomposition concerns regions. However, the fragment grouping problem has a regional aspect as well, for it is the region of some nearby object that causes the fragmentation of a distant contour.
It is interesting to consider whether this parts-to-whole/whole-to-parts duality might be more than superficial. Two principles will guide us in making this connection. For grouping, we have already mentioned the principle of perceptual occlusion, which mandates seeking the region of an implicit occluder to generate groups. For shape decomposition, we shall formulate a principle of perceptual growth, which mandates that a decomposition should be accompanied by a shared region of intersection between parts. In both problems we seek a missing region.

For shape decomposition, the presence of concave corners in a silhouette can be viewed as a cue for decomposition. This follows because concave corners arise with the interpenetration of one object into another, by transversality [19], or from the specialization of function to particular regions [37]. As a result of occlusion, endpoints arise and so can be used as a clue for inducing an implicit occluder. Clearly, discontinuities of various sorts [17], or keypoints [41, 18], play a significant role in perceptual organization.

This intriguing duality with grouping leads to the following conception of shape decomposition. Begin with the physical analogue to occlusion, the simple scenario in which two objects are brought into interpenetration (figure 4). Note the concave corners in the composite silhouette. Its skeleton has a component, coined ligature by Blum [8], which is related to the concave corners. Observe that the ligature separates parts of the entire skeleton that resemble the skeletons of the objects before interpenetration (see footnote 1). Each of the components of the silhouette can be regenerated using the maximal discs along these skeletal components, amounting to a limb and a body. What is important about this decomposition is that a natural notion of a join, an approximation of how the parts share region, arises as the intersection of the two parts.
It is this join between parts that plays the role in shape decomposition that the virtual occluder plays in grouping.

In summary we suggest that the solutions to the perceptual organization problems of grouping and decomposition are both mediated by the relationship between "missing" regions and visible discontinuities. In the grouping problem, we seek the occluder that caused a fragmentation, and so we desire to find the region relating endpoints. In shape decomposition, we seek an explanation of any proposed decomposition which accounts for the region of intersection between parts, and so we seek the region relating concave corners. A concluding observation expressed here is that the skeleton, a classical description of a region which relates to boundaries, plays the role of the desired mediator.

Footnote 1: Blum chose the word ligature well. According to Webster's dictionary, ligature means "Something, as a cord or wire, that ties or binds."

Figure 4: Shape decomposition via ligature. A horizontal sausage and a vertical rectangle with their skeletons (top left) are combined to form a single whole (top right). Observe the formation of the two concave corners (in grey). (bottom right) The skeleton of the silhouette has a component which is related to the concave corners and is called ligature (shown in thick black). Note that the ligature links the skeleton on the left, which resembles the skeleton of the unpenetrated rectangle, to the skeleton on the right, which resembles the skeleton of the unpenetrated sausage. By removing this ligature, the remaining skeleton has two components, each of which serves to approximate one of the original objects which were combined (bottom left).
2 Fragment grouping with gap skeleton

As our approach to grouping formally exploits the notion of gap skeleton, in this section we provide the requisite technical background. Although first presented elsewhere [4], the following is included to help make explicit the formal correspondence between grouping and shape decomposition. Recall the definition of the skeleton [8] of a closed set $A \subset \mathbb{R}^2$ in terms of maximal open discs, where an open disc $D$ is maximal if and only if there exists no other open disc contained in the complement of $A$ that properly contains $D$:
Definition 1 (Blum) The skeleton $S$ of $A$ is the set of centers of maximal open discs contained in the complement of $A$.
In their study of the skeleton, Calabi and Hartnett introduced the notion of the projection of a point to the nearest point(s) in $A$ [11]. The distance $\rho(x)$ from the point $x \in \mathbb{R}^2$ to the set $A$ is the minimum distance from $x$ to any point of $A$, or $\rho(x) \triangleq \min\{\|x - p\| : p \in A\}$. The projection $\pi(x)$ is the set of points in $A$ closest to $x$, or $\pi(x) \triangleq \{p \in A : \|x - p\| = \rho(x)\}$. The radius function $r$ is the restriction $r \triangleq \rho|_S$ of $\rho$ to the skeleton $S$, and so $r(q)$ is the radius of the maximal disc at $q$. The projection $\pi(q)$ to $A$ is the set of points at which the maximal disc at $q$ "touches" $A$; typically, the maximal disc will touch $A$ at two points.

Suppose that $A$ is the set of curve fragments to be grouped, considered as a disjoint union of traces of a finite number of simple, rectifiable, piecewise smooth curves. These can be either open or closed curves, and all open curves include their endpoints. We introduce the gap skeleton through the more general notion of pregap skeleton.
Definition 2 The pregap skeleton with respect to endpoints $a$ and $b$ is the set $\mathrm{pre}(a, b) \triangleq \{q \in S : \pi(q) = \{a, b\}\}$.
Figure 5: (left) The definition of pregap illustrated: only endpoints $a$ and $b$ are a distance $r(q)$ away from the skeletal point $q$, so $\pi(q) = \{a, b\}$. All such skeletal points form pre(a, b), and extend along the perpendicular bisector of the endpoints. Note that in this example the gap skeleton with respect to $a$ and $b$ is empty because the midpoint between $a$ and $b$ is not in pre(a, b). (right) Here there is nonempty gap skeleton with respect to $a$ and $b$, and the radius function $r$ obtains a local minimum at $m$, the midpoint between $a$ and $b$.

See figure 5. The following properties of pregap skeleton are proved in [3].
Proposition 1 pre(a, b) is contained in the perpendicular bisector of the line segment joining $a$ and $b$.

Proposition 2 pre(a, b) is connected.

Let $\theta_a$ be the signed angle from the tangent extending out of endpoint $a$ to the line segment $ab$. Define $\theta_b$ similarly (see figure 5).

Proposition 3 If pre(a, b) is nonempty and $|\theta_a + \theta_b|$ is not equal to $180^\circ$, then pre(a, b) has nonzero length.
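As a purely illustrative sketch, and not the implementation used in this paper, the definitions of distance, projection and pregap membership can be checked numerically when the set of curve fragments is approximated by a finite sample of points. The function names, the tolerance `tol` and the two-segment example are assumptions introduced here for illustration only.

```python
import math

def projection(x, A, tol=1e-9):
    """Distance from x to the finite point set A, and the set of points
    of A that attain that distance (up to the tolerance tol)."""
    dists = [math.dist(x, p) for p in A]
    rho = min(dists)
    return rho, {p for p, d in zip(A, dists) if d - rho <= tol}

def in_pregap(q, A, a, b, tol=1e-9):
    """True if the nearest points of A to q are exactly the endpoints a and b."""
    _, nearest = projection(q, A, tol)
    return nearest == {a, b}

# Hypothetical example: two collinear fragments on the x-axis, separated by
# a gap between the endpoints a = (-1, 0) and b = (1, 0).
left = [(-3.0 + 0.25 * i, 0.0) for i in range(9)]   # x runs from -3 to -1
right = [(1.0 + 0.25 * i, 0.0) for i in range(9)]   # x runs from 1 to 3
A = left + right
a, b = (-1.0, 0.0), (1.0, 0.0)

on_bisector = in_pregap((0.0, 0.5), A, a, b)     # equidistant from a and b
off_bisector = in_pregap((-2.0, 1.0), A, a, b)   # nearest point is (-2, 0) only
```

In this toy configuration the point on the perpendicular bisector of $ab$ passes the membership test while the point above the interior of a fragment does not, which is the behaviour Proposition 1 leads one to expect.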
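Definition 3 The gap skeleton with respect to endpoints $a$ and $b$ is the pregap skeleton with respect to $a$ and $b$ provided that $r$ achieves a local minimum in pre(a, b):

$$\mathrm{gap}(a, b) \triangleq \begin{cases} \mathrm{pre}(a, b), & \text{if } r \text{ has a local minimum in } \mathrm{pre}(a, b); \\ \emptyset, & \text{otherwise.} \end{cases}$$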
Conjecture 1 The midpoint between $a$ and $b$ lies in pre(a, b) if and only if $r$ has a local minimum in pre(a, b).
Thus gap skeleton with respect to two endpoints will appear when no curve enters into the maximal disc centered between them. In other words, gap skeleton obtains when there is enough empty space between these two endpoints for this disc to fit in. Intuitively, this maximal disc represents a possible simple occluder nestled between the endpoints and is a possible cause of the fragmentation there. Therefore, by the principle of perceptual occlusion the appearance of gap skeleton between two endpoints cues their linking together.

Others have used the related notion of Voronoi tessellation for grouping [1, 20, 15], but in these approaches no special status is given to endpoints. Rosin has done a triangulation of a set of fragments' endpoints, where the triangulation is constrained not to cross any fragment; however, his approach leads to groupings different from gap skeleton and its motivation is unclear [34]. Endpoints and the discontinuities in boundary orientation that give rise to them are critical for unit formation in human psychophysics [22].
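The local-minimum test of Definition 3 can be sketched on sampled data. Every point of the pregap skeleton is equidistant from the two endpoints and lies on their perpendicular bisector, so at signed offset $t$ from the midpoint the radius is $\sqrt{(\|a - b\|/2)^2 + t^2}$. The following minimal sketch assumes the pregap samples are ordered along the bisector and treats only a minimum away from the two end samples as a local minimum; the function names and the sampled offsets are hypothetical.

```python
import math

def has_interior_minimum(radii):
    """True if the sampled radius function has a local minimum away from
    the two ends of the ordered pregap samples."""
    return any(radii[i] <= radii[i - 1] and radii[i] < radii[i + 1]
               for i in range(1, len(radii) - 1))

def gap_skeleton(pregap_points, radii):
    """Return the pregap samples if r has an interior local minimum, else []."""
    return list(pregap_points) if has_interior_minimum(radii) else []

def radius_on_bisector(t, half_gap=1.0):
    """Radius of the maximal disc at offset t along the bisector, for
    endpoints a distance 2 * half_gap apart."""
    return math.hypot(half_gap, t)

# Pregap samples that straddle the midpoint (t = 0) versus samples that
# lie entirely on one side of it.
straddling = [(0.0, t) for t in (-1.5, -0.5, 0.5, 1.5)]
one_sided = [(0.0, t) for t in (0.5, 1.0, 1.5)]

gap_a = gap_skeleton(straddling, [radius_on_bisector(t) for _, t in straddling])
gap_b = gap_skeleton(one_sided, [radius_on_bisector(t) for _, t in one_sided])
```

Here `gap_a` keeps all four samples because the radius dips near the midpoint, whereas `gap_b` is empty because the radius is monotonic along the one-sided pregap, in the spirit of the two panels of figure 5.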
2.1 Computing gap skeleton

The gap skeleton is computed using the following algorithm, which is presented in greater detail in [4]:

1. Compute the skeleton of the set of curve fragments [36].
2. Compute the projection $\pi(q)$ for all skeletal points $q$.
3. Detect the pregap skeleton for all pairs of endpoints.
4. Search for gap skeleton, if any, within the pregap skeleton.

Our method for detecting pregap skeletal points accounts for the uncertainty in the detection of endpoints of a curve. Given an endpoint uncertainty $\epsilon$, the pregap with respect to endpoints $a$ and $b$ is the set of skeletal points whose projection falls within an $\epsilon$-ball around these endpoints (see footnote 2). Figure 6 depicts the process of gap detection (see footnote 3). Note that as predicted the gap skeleton lies on the perpendicular bisector of the line joining two endpoints. Figure 7 illustrates how the gap skeleton explains a visual illusion [24, p. 168, adapted from Kohler].

Figure 6: The process of gap detection. (left) The set of contour fragments is displayed in grey and the thin lines are an approximation to its skeleton. (center) The $\epsilon$-balls (circles) around the curve fragment endpoints and the projection $\pi(q)$ (dots) of each skeletal point $q$ are overlaid. (right) The detected gap skeleton (thickened lines) is shown connected due to proposition 2. The principle of perceptual occlusion is thus invoked to link each of the top and bottom pairs of endpoints.

Footnote 2: For all examples the size of the $\epsilon$-ball at the endpoints was the same.

Footnote 3: The skeleton is actually computed as a set of first-order shock branches obtained from outward evolution of the boundaries of the fragments [36]. Shocks of higher order are not shown, giving a broken appearance to the skeleton.
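Step 3 of the algorithm of Section 2.1, with the endpoint uncertainty taken into account, might be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure of [4]: steps 1 and 2 are taken as given, so each skeletal point arrives paired with the list of boundary points its maximal disc touches, and the function name `detect_pregap`, the data layout and the example coordinates are all hypothetical.

```python
import math
from itertools import combinations

def detect_pregap(skeleton, endpoints, eps):
    """Map each endpoint pair (i, j) to the skeletal points whose projection
    lies inside the eps-balls of endpoints i and j and meets both balls."""
    pregap = {}
    for q, proj in skeleton:
        for i, j in combinations(range(len(endpoints)), 2):
            near_i = [math.dist(p, endpoints[i]) <= eps for p in proj]
            near_j = [math.dist(p, endpoints[j]) <= eps for p in proj]
            covered = all(ni or nj for ni, nj in zip(near_i, near_j))
            if covered and any(near_i) and any(near_j):
                pregap.setdefault((i, j), []).append(q)
    return pregap

endpoints = [(-1.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
skeleton = [
    ((0.0, 0.5), [(-1.05, 0.0), (1.0, 0.0)]),   # touches near both endpoints
    ((-2.0, 1.0), [(-2.0, 0.0), (-2.0, 2.0)]),  # touches fragment interiors
]
pregap = detect_pregap(skeleton, endpoints, eps=0.1)
```

Only the first skeletal point is assigned to the endpoint pair (0, 1); the second is rejected because its projection falls outside every endpoint ball. Step 4 would then look for a local minimum of the radius function within each list.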
Figure 7: (left) Arrows or bow-ties: illusion by Kohler. (right) The detected gap skeleton (dark lines) "ties" each of the three bow-tie pairs; apparently observers do too.
Figure 8: Virtual occluders. Curve fragments are in grey. (far left) A disc of radius $r(m)$ (in black) placed where the radius function takes on a local minimum $m$ in the gap skeleton (thick line) is a "least commitment" occluder, or a lower bound on a virtual occluder. (center left) An upper bound on a virtual occluder (black region) is described by a disc at each gap skeletal point $q$ of radius $r(q)$. (center right) The virtual occluder created using a spline-like curve (white) which only obeys the upper bound (without constraint (d)). (far right) The virtual occluder created using a spline-like curve obeying both the upper and lower bounds (with dashed circles expressing constraint (d)).
2.2 Virtual occluders

With the notion of a virtual occluder we now return to our motivating principle of perceptual occlusion. One can exercise a great deal of freedom in the choice of a virtual occluder based on the gap skeleton. This section is meant to suggest some ways of thinking about such an object.

Since gap skeleton is a subset of the skeleton, there are maximal discs along it which can be used to characterize a virtual occluder. A "least commitment" virtual occluder is just the maximal disc at the point along the gap skeleton at which the radius function takes on its local minimum (see figure 8, left). This view in some sense provides the simplest shape for an occluder, for as Blum described [8], a disc is the result of an isotropic growth from an initial point. We call this single disc a lower bound on a virtual occluder.
Figure 9: (left) Three objects behind many occluders. (center) Corresponding endpoints are linked behind the true occluders by gap skeleton. (right) The (upper bound) virtual occluders approximately cover most of the true occluders.

There is also a sense in which the gap skeleton can constrain a virtual occluder. The maximal disc at a given gap skeleton point $q$ guarantees that there is no contour fragment closer to $q$ than $r(q)$, so the union of maximal discs along a piece of gap skeleton is a region of the plane inside of which no contour fragment lies. If we take the union of such regions for all pieces of gap skeleton, then we arrive at a whole region $U$ of the plane. We call this region the upper bound on virtual occluders, for the boundaries of the occluders that caused the fragmentations are those which crossed the curve fragments at the endpoints and plausibly lie somewhere in this region $U$, as $U$ does not contain any curve fragments and can be regarded as empty space. Figure 9 strikingly illustrates how the upper bound virtual occluder is present in a fragmented figure and approximately contains the true occluder.

We combine these conceptions of lower and upper bounds in the following construction for virtual occluders. Note that the result is consistent with the constraints in the image:

1. Divide $U$ into its connected components $U_1, U_2, \ldots, U_n$.
2. For each connected component $U_i$ generate a virtual occluder as the region inside a closed spline $\gamma_i$ which satisfies the following constraints:
   (a) $\gamma_i$ passes through all of the endpoints bordering on $U_i$,
   (b) $\gamma_i$ does not cross itself,
   (c) $\gamma_i$ is contained in the upper bound $U_i$, and (optionally)
   (d) $\gamma_i$ contains all of the lower bound maximal discs that are contained in $U_i$.

The advantage of including the constraint (d) is that the resultant virtual occluder will more clearly cross the endpoints bordering $U_i$, because $\gamma_i$ will necessarily be tangent to the lower bound maximal discs at the endpoints (figure 8, right). Without this optional constraint, $\gamma_i$ could even be tangent to a curve fragment at an endpoint, thus describing a non-transversal occlusion. Unfortunately the optional constraint adds complexity. It is an open question how the uniqueness of $\gamma_i$ depends on the choice of spline or the use of the optional constraint.

In summary, there is clearly some degree of flexibility in making implicit virtual occluders explicit. Sometimes they are perceptually unique, while at other times they may be neither unique nor even salient. The construction described in this section is sufficient but not perceptually necessary. Additional research on the perceptual viability of virtual occluders is required.
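The lower and upper bounds of Section 2.2 lend themselves to a short sketch for a single piece of gap skeleton. It is a minimal illustration only: the spline construction with constraints (a) to (d) is not attempted, the gap skeleton is assumed to be given as sample points with their radii, and the names `lower_bound_disc` and `in_upper_bound` are hypothetical.

```python
import math

def lower_bound_disc(gap_points, radii):
    """The 'least commitment' occluder: the maximal disc at the gap
    skeleton sample where the radius function is smallest."""
    i = min(range(len(radii)), key=lambda k: radii[k])
    return gap_points[i], radii[i]

def in_upper_bound(x, gap_points, radii):
    """True if x lies in the union of the open maximal discs centred at
    the gap skeleton samples (the region called U in the text)."""
    return any(math.dist(x, q) < r for q, r in zip(gap_points, radii))

# Hypothetical gap skeleton on the y-axis between endpoints (-1, 0) and (1, 0).
gap_points = [(0.0, -1.0), (0.0, 0.0), (0.0, 1.0)]
radii = [math.hypot(1.0, t) for _, t in gap_points]

center, radius = lower_bound_disc(gap_points, radii)
inside = in_upper_bound((0.0, 2.2), gap_points, radii)
outside = in_upper_bound((3.0, 0.0), gap_points, radii)
```

With these samples the lower bound is the unit disc at the midpoint between the endpoints, and the upper bound extends further along the bisector than that single disc does.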
3 Shape decomposition via ligature

The shape decomposition problem asks: given a silhouette, are there natural perceptual components of it? If so, what are those components and what is their mathematical description? Here we attempt a somewhat speculative and early study of how the skeleton can induce a natural way of finding the parts of a shape. Our intent is to introduce a view of decomposition by exploiting a formal analogy between gap skeleton and a piece of the skeleton of a shape called ligature. This view will provide a different perspective on the gap skeleton, suggesting that those computations may be more universal than would be realized from the smaller context. In addition, it also leads to a new definition for certain types of parts, namely limbs and tails, that have been especially problematic for other theories.
3.1 Background

Representing an object in terms of parts is useful computationally for recognition. Often we do not see an entire object because it is partially occluded or some of it is missing for various other reasons. Objects can also be articulated and so various regions can change in their relative position. In addition, localized regions can be damaged or altered. Storing all these possible manifestations of an object is immensely impractical without the use of parts, each of which can be described independently. When an articulated object moves, it is chiefly the configuration of the parts that changes, not their shape; occlusion and alterations affect only particular parts, not the entire object representation [19, 35]. Part representations can therefore support more robust recognition.

In computer vision, primitives or boundary-induced divisions are two ways of defining the parts of a shape. Some classes of shape primitives that have been considered are superquadrics [31] and geons [5], in which parts are seen as elementary units from a fixed and standard table of geometrical atoms. Boundary-induced divisions have been proposed at orientation discontinuities [2, 16] and negative curvature minima [19]. Region-based part definitions in terms of skeleton junctions [9], second-order shocks [23], neck and limb partlines [35], and degenerate waves [39] have also been proposed.
3.2 Parts from growth

Perhaps our motivation for parts needs a biological basis. Siddiqi et al. suggested an "ecological" characterization of the part problem with a notion of form-from-function: objects specialize function through their volumes, and this specialization is manifested physically. Sharp changes in material properties often coincide with "sharp surface changes", which project to high-curvature regions in a silhouette [37]. This form-from-function principle [35] was used to motivate the neck/limb proposal, which dictates a strict (i.e. disjoint) partitioning of a shape.

Here we argue against the notion that parts should be disjoint sets by considering how objects grow, an issue for shape also considered by Leyton [26] and Thompson [40]. In figure 10 [38, p. 171], the early development of the amphibious plant Proserpinaca palustris clearly shows an outward branching from an initial "body". Notice how negative curvature zones are present at the bases of the branches. A single view of the flower Impatiens balsamina reveals a similar correspondence of shoots and negative curvature zones (figure 11 [38, p. 191]). To understand the way these "protrusions" arise, consider the earliest stages of growth. In figure 12 [13, p. 603], an internal "growth seed" in a young willow suddenly develops as a branch root which pushes through the body and leaves a legacy of negative curvature zones. This source of growth makes the joining of the root to the body explicit, suggesting that an explanation of parts in terms of a shared region is more natural than that of carving a shape into disjoint sets which abut perfectly. Indeed we need not restrict ourselves to youth itself; we only require evidence of it: knots in a log of wood were once the "birthplaces" of tree branches.

Figure 10: The development of leaf forms in Proserpinaca palustris, after [38, p. 171] (from left to right). Observe that the outward branching is accompanied by concave corners.

In this examination of several examples of plant growth, a relationship between parts and growth has emerged, where part decomposition is a kind of inverse to physical growth:
Principle 2 (Perceptual Growth) (i) Forward: Suppose an object under growth is given.
The specialization of a region of the object [37] is typically accompanied by rapid and directed outward growth. The result is a silhouette whose boundary has negative curvature zones. (ii) Inverse: Suppose the silhouette of a mature object is given. Part decomposition entails the association of negative curvature zones with an internal region which could have been the source of an outward growth.
Figure 11: Longitudinal section of a floral bud of Impatiens balsamina, after [38, p. 191]. Note the presence of negative-curvature zones at the bottoms of the shoots.

Figure 12: The development of a branch root in a willow, after [13, p. 603]. (left) Within the core, observe the source of the root as a specialized dark region. (center) The root rapidly grows towards the top-left, displacing the epidermis. (right) The resulting silhouette with sharp, negative curvature zones indicated with arrows.
Observe how this principle describes a weak form of occlusion: the negative curvature zones in the boundary of a silhouette are just smoothed versions of concave corners, and these concave corners are really degenerate T-junctions, degenerate because the contrast between objects in an occlusion relation is lost. The objects which make up a composite whole as a silhouette are thus viewed as being at the same depth, unlike objects in occlusion. However, the principle of perceptual occlusion also generates an implicit region and thus the two principles have the same form.

The principle of perceptual growth leads to an equivalent account of the decomposition of interpenetrating objects in terms of a "shared region". For instance, consider the silhouette of an arrow in your chest: the region in your body taken up by the arrow is exactly the region which is the source of your suffering ("sharing" is not always equal). Figure 4 (top) shows such an interpenetration. Zhu and Yuille have also made use of a similar idea of overlap at joints of bones [42]; however, they use a distinct variant of the skeleton to articulate this notion and, like Blum and Nagel, they consider all skeletal-like branchings as indicative of new parts.
3.3 Blum's ligature

The above growth principle can be used to generate a skeleton-based definition of limbs and tails. To simplify the discussion, we will first consider the situation where all the negative curvature minima which could trigger a part are actually concave corners. Now for a given shape, consider those skeletal points whose maximal discs touch these concave corners; Blum called these skeletal points ligature. What follows is a rephrasing of Blum's definitions in set-theoretic terms [9]. Consider a shape whose skeleton is $S$. Those skeletal points which touch only one concave corner form semiligature.
Definition 4 Semiligature with respect to concave corner $a$ is:

$$\mathrm{SL}(a) \triangleq \{q \in S : \pi(q) = \{a, p\}, \text{ where } p \text{ is not a concave corner}\}.$$

Figure 13: Ligatures illustrated. (left) The grey piece of skeleton is the semiligature with respect to concave corner $a$. Observe that the maximal discs centred at the ends of SL(a) touch $a$ and the top boundary (non-corner). (right) The thick piece of skeleton is the full ligature FL(a, b) with respect to concave corners $a$ and $b$. The maximal discs centred at the left and right ends of FL(a, b) touch $a$ and $b$.

See figure 13. Note that the semiligature with respect to a given concave corner may not be a connected set. We will denote each of the connected components of semiligature with a superscripted number, so that, for example, the second connected component of SL(a), if it exists, will be called $\mathrm{SL}(a)^2$. Full ligature comprises those skeletal points which touch two concave corners.
Definition 5 Full ligature with respect to concave corners $a$ and $b$ is:

$$\mathrm{FL}(a, b) \triangleq \{q \in S : \pi(q) = \{a, b\}\}.$$

Note that FL(a, b) = FL(b, a). Compare the definitions of full ligature and pregap skeleton: by thinking of a "set of contour fragments" instead of a "shape" and substituting "endpoint" for "concave corner" into the above definition, we obtain the definition of pregap skeleton. This formal analogy suggests that propositions 1, 2, and 3, which characterize the shape, connectedness, and length of pregap skeleton, have equivalents for full ligature, as figure 13 suggests.
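Definitions 4 and 5 translate directly into a test on the projection of a skeletal point. The sketch below is illustrative only: it assumes the projection of each skeletal point and the set of concave corners are already known, and the function name and coordinates are hypothetical.

```python
def classify_skeletal_point(proj, concave_corners):
    """Classify a skeletal point from the boundary points its maximal disc
    touches: two concave corners give full ligature, one concave corner and
    one non-corner give semiligature, anything else is neither."""
    touched = set(proj)
    corners = touched & set(concave_corners)
    if len(touched) == 2 and len(corners) == 2:
        return "full ligature"
    if len(touched) == 2 and len(corners) == 1:
        return "semiligature"
    return "other"

# Hypothetical concave corners a and b where a limb meets a body.
a, b = (2.0, 1.0), (2.0, -1.0)
corners = [a, b]

full = classify_skeletal_point([a, b], corners)
semi = classify_skeletal_point([a, (0.0, 3.0)], corners)
plain = classify_skeletal_point([(0.0, 3.0), (0.0, -3.0)], corners)
```

Replacing the list of concave corners with a list of fragment endpoints turns the first branch into the pregap test of Definition 2, which is the formal analogy drawn above.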
3.4 Limbs from full ligature To see how the notion of ligature leads to a skeleton-based limb, consider gure 14. First 18
FL(a,b) SL(a) a SL(b)1
+
2 SL(b)
b
=
Figure 14: Limb from full ligature. An ellipse with a protruding limb (top left) is shown with its skeleton (top right). Full ligature is in bold, semiligature is in grey, and the remainder of the skeleton is depicted with thin lines. (bottom) The shape is expanded into its constituent parts. (bottom left) Here the two semiligature branches on the left, SL(a) and SL(b)1 , remain with the ellipse to give an \upper bound" to the original ellipse, while the semiligature branch SL(b)2 on the right is associated with the stick (bottom center). Thus in the formation of a whole object, information is lost about its constituent parts, leading to the join region (shown in grey, bottom right). 19
observe how both semiligature and full ligature make up a sizeable part of the skeleton of this shape: ligature makes explicit how the concave corners exert a significant influence over a shape through its interior. The full ligature FL(a, b) with respect to concave corners a and b is a horizontal line segment that represents the linking of the ellipse on the left to the stick on the right. Otherwise full ligature has no role in describing the shape; indeed, full ligature is constituted by skeletal points related to only a single pair of boundary points. On the other hand, semiligature does relate to a significant extent of boundary: witness how maximal discs along SL(a) touch not only corner a but also a strip of the left side of the ellipse near the center. Thus full ligature divides the skeleton into two separated parts, leading to skeletal descriptions of the parts themselves once the full ligature itself is discarded. Now we can separately regenerate the parts that those subskeletons represent using maximal discs whose radii are stored in the radius function. For the ellipse we observe that all of the boundary of the ellipse is recovered, except an arc between a and b which arises as a virtual "upper bound" on the true shape (cf. section 2.2). The recovered stick also has all of the contour of some "undamaged" stick except for that lost inside the ellipse, which is regenerated as a kind of knobby end. When we put these two regenerated forms back together we get the join as their intersection; this join can be viewed as a physically shared region where growth began, corresponding to the principle of perceptual growth. Note how the bulk of the join is due to displacement by the stick which grew out from the ellipse. Finally, since the radius function is increasing along the full ligature from the stick to the ellipse, the stick is thus slave, child, or indeed limb to the larger ellipse as master, parent, or body, expressing the idea of a hierarchy of shape [23].
If the radius function has a local minimum (second-order shock), then the parent-child relationship is lost and we are left with an "equal" decomposition at a neck.
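The limb-versus-neck distinction can be illustrated with a small sketch. It assumes the radius function has already been sampled at successive points along a full-ligature branch; the function name, the return convention, and the tolerance are hypothetical choices made for illustration, not part of the method described here.

```python
from typing import Optional, Sequence, Tuple


def classify_full_ligature(
    radii: Sequence[float], tol: float = 1e-9
) -> Tuple[str, Optional[str]]:
    """Classify the part relation across a full-ligature branch.

    `radii` holds radius-function samples ordered from the start of the
    branch to its end (an assumed input format).

    Returns (relation, parent_end):
      ("limb", "end")   radius rises along the branch; the parent lies at the end
      ("limb", "start") radius falls along the branch; the parent lies at the start
      ("neck", None)    an interior minimum; neither part is the parent
      ("undetermined", None) otherwise (e.g. constant radius)
    """
    if len(radii) < 2:
        raise ValueError("need at least two radius samples")
    diffs = [b - a for a, b in zip(radii, radii[1:])]
    rising = all(d >= -tol for d in diffs)
    falling = all(d <= tol for d in diffs)
    if rising and falling:
        return ("undetermined", None)  # flat radius: no hierarchy implied
    if rising:
        return ("limb", "end")
    if falling:
        return ("limb", "start")
    # Non-monotonic: look for a minimum strictly inside the branch.
    k = min(range(len(radii)), key=lambda i: radii[i])
    if 0 < k < len(radii) - 1 and radii[k] < radii[0] - tol and radii[k] < radii[-1] - tol:
        return ("neck", None)
    return ("undetermined", None)
```

For the ellipse-and-stick example, samples taken from the stick toward the ellipse would rise, so the sketch would report a limb whose parent lies at the far (ellipse) end.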
3.5 Tails from semiligature

Unfortunately the notion of a limb is too restrictive. For example, the tail of a kangaroo is perceptually triggered by a single concave corner [37]. This phenomenon requires a formal description. To construct an appropriate skeleton-based definition of a tail, one might consider letting a tail be induced by a branch of semiligature, as it relates skeletal points to a single concave corner. Unfortunately a tail is more subtle, for full ligature branches will typically arise with abutting semiligature components with respect to the same concave corners (figure 14). Nonetheless, the use of semiligature for defining a tail is promising (see figure 15).

Figure 15: A tail from semiligature. An object (top left) and its skeleton (thin lines) with semiligature SL(a) (grey) with respect to corner a (top right) are shown. The lone semiligature triggers the breaking of the skeleton into parts (bottom left), where the corresponding regions are generated using maximal discs. (bottom right) The join region (grey) is defined as the intersection of the regenerated regions.
3.6 Computing ligature

To compute ligature, we first need the notion of boundary-axis ratio. Imagine walking along a skeleton branch. Observe that the corresponding maximal discs touch the object's boundary to your left and to your right. Let us focus our attention on the part of the boundary to the left, say. For every step, a, that you take along the skeleton, there corresponds (via maximal discs which touch the boundary) a piece of boundary of length b. We can compute the ratio, b/a, of boundary length to the length of the corresponding piece of skeleton, or axis. The limiting ratio, as the steps along the skeleton become arbitrarily small, is known as the boundary-axis ratio, or bar, and was originally described by Blum and Nagel [9]. The bar is defined both to the left and to the right along a skeleton branch (where left and right are well-defined by "walking" along the branch). Note that the bar is nonnegative. Semiligature is the subset of skeletal points for which one of the left or right bar is zero; full ligature is the subset of skeletal points for which both bars are zero. For real computations, a threshold on bar is introduced. Note that ligature, because it describes only a small length of the boundary, does not contribute much to the "shape" of the object, but rather it links the parts of the shape together.

By having access to the boundary points which correspond to a given skeleton point (footnote 4), we can compute the left and right bar directly. To compute the left bar, for example, one can compute the sequence of distances between the boundary points on the left and the sequence of distances between skeleton points, and then take their ratio. Unfortunately, this can lead to wildly varying results because (1) boundary points can be repeated in the sequence and (2) distances between skeleton points can vary over several orders of magnitude. Uniform smoothing of these ratios, say by a straightforward moving average, fails to improve the situation much because (a) the ratios have already been taken and (b) again the spacing between skeleton points varies a great deal. Our approach was to separately smooth the sequences of distance increments of the boundary and the skeleton, and only then take their ratio. In addition, the scale of smoothing was chosen to increase monotonically (footnote 5) with the radius function. Dealing with the irregular spacing of the skeleton points required integrating the smoothing kernel (Gaussian) over each skeleton increment. This was done using a Chebyshev approximation to the complementary error function [33, p. 220].
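The smooth-then-divide idea can be sketched as follows. This is one possible reading of the procedure, not the implementation used for the figures: it assumes the skeleton points and their corresponding boundary points on one side are given as arc-length positions, it uses Python's `math.erf` in place of a Chebyshev approximation to the complementary error function, and it sets the smoothing scale to the square root of the radius function with a lower bound of five pixels. The function names and the `scale` parameter are hypothetical, and the kernel is simply truncated at the ends of the branch rather than extended with data from neighboring branches.

```python
import math
from typing import List, Sequence


def gaussian_mass(lo: float, hi: float, center: float, sigma: float) -> float:
    """Integral over [lo, hi] of a unit-area Gaussian with the given mean and std."""
    s = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf((hi - center) / s) - math.erf((lo - center) / s))


def smoothed_bar(
    t: Sequence[float],
    b: Sequence[float],
    r: Sequence[float],
    scale: float = 1.0,
    min_sigma: float = 5.0,
) -> List[float]:
    """Estimate the bar on one side of a skeleton branch.

    t: arc-length positions of skeleton points (strictly increasing)
    b: arc-length positions of the matching boundary points (non-decreasing)
    r: radius function at the skeleton points (nonnegative)
    Returns one bar estimate per skeleton interval, evaluated at its midpoint.
    """
    n = len(t)
    if n < 2 or len(b) != n or len(r) != n:
        raise ValueError("t, b and r must have equal length >= 2")
    ds = [t[i + 1] - t[i] for i in range(n - 1)]  # skeleton increments
    db = [b[i + 1] - b[i] for i in range(n - 1)]  # boundary increments
    if any(d <= 0 for d in ds):
        raise ValueError("skeleton positions must be strictly increasing")

    bars = []
    for i in range(n - 1):
        x = 0.5 * (t[i] + t[i + 1])
        radius = 0.5 * (r[i] + r[i + 1])
        # Assumed scale rule: proportional to sqrt(radius), bounded below.
        sigma = max(min_sigma, scale * math.sqrt(radius))
        num = 0.0  # smoothed boundary increments
        den = 0.0  # smoothed skeleton increments
        for j in range(n - 1):
            # Mean kernel value over interval j, which copes with irregular spacing.
            k = gaussian_mass(t[j], t[j + 1], x, sigma) / ds[j]
            num += k * db[j]
            den += k * ds[j]
        bars.append(num / den)  # ratio taken only after smoothing
    return bars
```

A repeated boundary point contributes a zero boundary increment, which lowers the smoothed numerator without producing a spurious zero ratio at a single sample.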
We were particularly interested in obtaining reasonable results near the endpoints of a skeleton branch (near junctions), and therefore we could not use zero-padding [33, p. 540 ff.] because it leads to distortion near the ends of a sequence to be smoothed. Instead we used neighboring skeletal branches to extend the information about boundary and skeleton increments to the size of the smoothing kernel.

To see how this computation performs, consider figure 16. Traversing the middle finger (branch 1) from the tip, the top-left graph shows that both the left and right bars are near one until the base of the finger, where they fall to zero and below threshold (footnote 6). The whole piece of branch 1 from the base of the middle finger to the upper half of the palm is full ligature. When walking from the tip of the thumb (branch 7), only the right bar falls to near zero, corresponding to the boundary (on the right) between the thumb and index finger. Thus the thumb is a "tail of" the palm. The other graphs can be read similarly. After removing ligature, the leftover skeleton reconstructs to the intuitively satisfying figure on the right, where the palm and fingers are preserved and only the "binding" which "glues" them together is missing.

Figure 17 shows how the computed ligature indeed corresponds to the junction area of the two "plusses". Using ligature, the junction of four branches ("plus") is rendered equivalent to the pair of junctions of three branches (distorted "plus"), and so the undesirable structural instability of the skeleton is mitigated. The reconstructions of the non-ligature skeletal points (right column) reveal how both objects are composed of four parts.

Footnote 4. The skeleton used for part computations is based on the Voronoi diagram of the points of the boundary and provides this information [28].

Footnote 5. We first experimented with setting the smoothing scale proportional to the radius function. Because the smoothing seemed to increase too much with radius function, we then made the smoothing scale proportional to the square root of the radius function. In addition, we set a lower bound on the smoothing scale, at five pixels.
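Once left and right bars are available, labelling and removing ligature is a thresholding step followed by regeneration from maximal discs. The sketch below is illustrative only: it assumes the threshold is 0.3 times the cosine of the half-angle between corresponding boundary tangents (with the angle in radians), it assigns each point its most specific label, and the names `label_ligature`, `strip_ligature`, and `in_reconstruction` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Disc = Tuple[float, float, float]  # (center x, center y, radius) of a maximal disc


def label_ligature(
    left_bar: Sequence[float],
    right_bar: Sequence[float],
    half_angle: Sequence[float],
    factor: float = 0.3,
) -> List[str]:
    """Label each skeleton point as "full", "semi" or "none".

    The threshold varies along the branch as factor * cos(half_angle).
    """
    labels = []
    for lb, rb, angle in zip(left_bar, right_bar, half_angle):
        threshold = factor * math.cos(angle)
        left_low = lb < threshold
        right_low = rb < threshold
        if left_low and right_low:
            labels.append("full")  # both sides describe almost no boundary
        elif left_low or right_low:
            labels.append("semi")  # exactly one side is below threshold
        else:
            labels.append("none")
    return labels


def strip_ligature(discs: Sequence[Disc], labels: Sequence[str]) -> List[Disc]:
    """Keep only the maximal discs of non-ligature skeleton points."""
    return [d for d, lab in zip(discs, labels) if lab == "none"]


def in_reconstruction(x: float, y: float, discs: Sequence[Disc]) -> bool:
    """True if (x, y) lies in the union of the given maximal discs."""
    return any((x - cx) ** 2 + (y - cy) ** 2 <= rad ** 2 for cx, cy, rad in discs)
```

Testing `in_reconstruction` over a pixel grid with the stripped disc list would give a reconstruction of the kind described above, in which the parts survive and only the region linking them is missing.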
Footnote 6. The threshold used was 0.3 cos θ, where θ is the half-angle between corresponding boundary tangents. This threshold varies as we travel along a branch, and guarantees that the corresponding boundary curvature is negative.

Figure 16: Boundary-axis ratio and ligature of a hand. top left, object with numbered skeleton branches 1 to 9 (thin lines) and ligature regions (bold). top right, reconstruction of object after removal of ligature. bottom, numbered graphs of the left (solid) and right (dotted) bars along each branch of the hand; each graph plots BAR against branch length.

Figure 17: "Plus" symbol (top row) and its distortion (bottom row). Skeletons with ligature are in the left column; reconstructions after removal of ligature are in the right column.

4 Conclusion

In this paper we have described two perceptual organization problems whose solutions are sought in the implicit regions corresponding to orientation discontinuities. We began with long-distance perceptual grouping, for which we defined the gap skeleton. This led to an explanation of a given fragmentation in terms of a virtual occluder. We then introduced a growth principle that suggests that at least certain aspects of shape decomposition require both explicit evidence (corners), and an implicit place (early growth region). In contrast to the disjoint parts defined in other theories, the limbs and tails introduced here have an explicit join region as the overlap of part regions, as the principle suggests. This is analogous to the principle of perceptual occlusion and its requirement of an implicit occluder. Thus the skeleton relates the "missing" regions to the visible discontinuities for both problems.
References

[1] Narendra Ahuja and Mihran Tuceryan. Extraction of early perceptual structure in dot patterns: integrating region, boundary, and component gestalt. Computer Vision, Graphics and Image Processing, 48:304–356, 1989.
[2] F. Attneave. Some informational aspects of visual perception. Psych. Review, 61:183–193, 1954.
[3] Jonas August. From contour fragment grouping to shape decomposition. Master's thesis, McGill University, 1996.
[4] Jonas August, Kaleem Siddiqi, and Steven W. Zucker. Fragment grouping via the principle of perceptual occlusion. In Proceedings, 13th International Conference on Pattern Recognition, volume 1, pages 3–8, 1996.
[5] Irving Biederman. Recognition-by-Components: A Theory of Human Image Understanding. Psychological Review, 94:115–147, 1987.
[6] Harry Blum. An associative machine for dealing with the visual field and some of its biological implications. In Eugene E. Bernard and Morley R. Kate, editors, Biological Prototypes and Synthetic Systems, volume 1, pages 244–260, New York, 1962. Plenum Press.
[7] Harry Blum. A transformation for extracting new descriptors of shape. In Weiant Wathen-Dunn, editor, Models for the Perception of Speech and Visual Form, pages 362–380, Cambridge, MA, 1967. MIT Press.
[8] Harry Blum. Biological shape and visual science. J. Theor. Biol., 38:205–287, 1973.
[9] Harry Blum and Roger N. Nagel. Shape description using weighted symmetric axis features. Pattern Recognition, 10:167–180, 1978.
[10] A. L. Bregman. Asking the "what for" question in auditory perception. In M. Kubovy and J. R. Pomerantz, editors, Perceptual Organization. Erlbaum, Hillsdale, NJ, 1981.
[11] L. Calabi and W. E. Hartnett. Shape recognition, prairie fires, convex deficiencies and skeletons. American Mathematical Monthly, 75:335–342, 1968.
[12] Ingemar J. Cox, James M. Rehg, and Sunita Hingorani. A Bayesian Multiple-Hypothesis Approach to Edge Grouping and Contour Segmentation. International Journal of Computer Vision, 11:5–24, 1993.
[13] Helena Curtis. Biology. Worth Publishers, New York, fourth edition, 1983.
[14] James H. Elder and Steven W. Zucker. Computing Contour Closure. In Proceedings, 4th European Conference on Computer Vision, pages 399–412, 1996.
[15] M. A. Fischler and P. Barrett. An iconic transform for sketch completion and shape abstraction. Computer Graphics and Image Processing, 13:334–360, 1980.
[16] Herbert Freeman. Shape description via the use of critical points. Pattern Recognition, 10:159–166, 1978.
[17] Adolfo Guzman. Decomposition of a visual scene into three-dimensional bodies. In A. Grasselli, editor, Automatic Interpretation and Classification of Images. Academic, New York, 1969.
[18] Friedrich Heitger and Rudiger von der Heydt. A computational model of neural contour processing: Figure-ground segregation and illusory contours. In Proceedings, International Conference on Computer Vision, pages 32–40, 1993.
[19] D. D. Hoffman and W. A. Richards. Parts of recognition. Cognition, 18:65–96, 1985.
[20] Markus Ilg. Knowledge-based understanding of road maps and other line images. In Proceedings, 10th International Conference on Pattern Recognition, volume 1, pages 282–284, 1990.
[21] Gaetano Kanizsa. Organization in Vision. Praeger, New York, 1979.
[22] Philip J. Kellman and Thomas F. Shipley. A theory of visual interpolation in object perception. Cognitive Psychology, 23:141–221, 1991.
[23] Benjamin B. Kimia, Allen R. Tannenbaum, and Steven W. Zucker. Shapes, Shocks, and Deformations I: The Components of Two-Dimensional Shape and the Reaction-Diffusion Space. International Journal of Computer Vision, 15:189–224, 1995.
[24] K. Koffka. Principles of Gestalt Psychology. Harcourt, Brace & World, New York, 1963.
[25] Martin D. Levine. Vision in Man and Machine. McGraw-Hill, New York, 1985.
[26] Michael Leyton. Symmetry, Causality, Mind. MIT Press, April 1992.
[27] K. Nakayama and S. Shimojo. Experiencing and perceiving visual surfaces. Science, 257:1357–1363, 1992.
[28] Robert L. Ogniewicz. Discrete Voronoi Skeletons. Hartung-Gorre, 1993.
[29] Stanley Osher and James Sethian. Fronts propagating with curvature dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics, 79:12–49, 1988.
[30] T. Pavlidis. Structural Pattern Recognition. Springer-Verlag, New York, 1977.
[31] Alex P. Pentland. Recognition by parts. In First International Conference on Computer Vision, Washington, D.C., 1987. IEEE Computer Society Press.
[32] A. E. Popham. The drawings of Leonardo da Vinci. Pimlico, London, 1994.
[33] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C. Cambridge University Press, 2nd edition, 1992.
[34] Paul L. Rosin. Grouping Curved Lines. In British Machine Vision Conference, 1994.
[35] K. Siddiqi and B. B. Kimia. Parts of visual form: computational aspects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(3):239–251, 1994.
[36] Kaleem Siddiqi and Benjamin B. Kimia. A Shock Grammar for Recognition. In Proceedings, Computer Vision and Pattern Recognition, pages 507–513, 1996.
[37] Kaleem Siddiqi, Kathryn J. Tresness, and Benjamin B. Kimia. Parts of visual form: psychophysical aspects. Perception, 25(4):399–424, 1996.
[38] Taylor A. Steeves and Ian M. Sussex. Patterns in Plant Development. Cambridge University Press, New York, 1989.
[39] Huseyin Tek, Perry A. Stoll, and Benjamin B. Kimia. Shocks from images: propagation of orientation elements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 839–845, 1997.
[40] D'Arcy Wentworth Thompson. On Growth and Form. Cambridge University Press, New York, 1971.
[41] Lance R. Williams and David W. Jacobs. Stochastic completion fields: A neural model of illusory contour shape and salience. In Proceedings, Fifth International Conference on Computer Vision, pages 408–415. IEEE Computer Society Press, 1995.
[42] Song Chun Zhu and Alan L. Yuille. Forms: A flexible object recognition and modelling system. Int'l J. Comp. Vision, 20(3):187–212, 1996.