EXPERIMENTS IN KNOWLEDGE-DRIVEN INTERPRETATION OF NATURAL SCENES*

Terry E. Weymouth
Computer and Information Science Department
University of Massachusetts
Amherst, Massachusetts 01003
*This research was supported in part by the National Science Foundation under Grant MCS79-18209.

ABSTRACT

Among the sources of information that can be used to guide the processing of visual sensory data are the constraints implied by the geometry of the objects being viewed. Experiments have been performed using representations of this type of knowledge to control image processing. They show how such geometric knowledge can be used to aid in the identification of object projections in images of natural outdoor scenes.

Introduction

The general goal of processes operating on visual data in the VISIONS system is the construction of a symbolic representation of the objects, and of the relations among those objects, from images of a given natural outdoor scene. Recent experiments have been designed to determine the effectiveness of world knowledge in the interpretation task and to explore representations of this knowledge [9]. A portion of the knowledge used in the system consists of descriptions of specific objects (e.g. house, tree, grass) and the representation of three-dimensional spatial relationships among those objects [3]. This paper discusses one way in which such geometric knowledge can be used to control image processing. Geometric knowledge has been used with some success to guide image processing by [1], [2], and [5].

If all the objects in a scene were of known size, shape, location, and orientation, and if the scene were viewed with a known camera from a known camera position and orientation, then the scene model could easily be projected to the image using the standard techniques of computer graphics: rigid object transformation, perspective projection, and hidden surface removal [8]. In general, however, the scene model must capture uncertainties in object location and permit projection methods that are tolerant of errors in the expected camera position and orientation. Consequently, several less constraining assumptions were made.

In the experiments described here, a three-dimensional scene model which represents the relative location and size of objects is projected to the two-dimensional image to form a plan. This plan can be used to restrict image analysis to certain portions of the image and to trigger the application of procedures for the recognition of specific objects during image interpretation. The results will be illustrated using the house scene shown in Figure 1.

Figure 1. House scene used in experiments.

A Spatial Plan from a Geometric Scene Model

Knowledge about approximate object location and size was used to investigate the effect of geometric constraints on a region-labeling process. The goal of the experiment was to partition a region segmentation into non-disjoint subsets of regions, each containing the projection of an object from the scene. The data used is an initial segmentation of the image into regions of similar color and texture [7]. An image region does not necessarily correspond to a single object, nor will a given object unfailingly project to a single region. Scene knowledge can be used to partially overcome these difficulties through the formation of a plan which predicts the location of the image projection of actual scene objects. The subimage produced by a matching process which uses this plan can then be further processed to better extract the image location of the object. One possibility is resegmentation, which can be performed with greater accuracy using the local image context [7]. Specific knowledge of the surface features of the object can also be used to constrain the subimage even further.
Scene and Object Model

The three-dimensional geometry of an object in a scene can be characterized by its location, orientation, scale (size), and shape. Of these, shape is obviously the most complex. Relaxing the constraint on shape by generalizing it to a sphere permits a rough description of location, location uncertainty, and size. The location of an object, expressed in the coordinate system of the scene model, is approximated by the center of a "location sphere." The radius of this sphere expresses the uncertainty in the location of the object center. The object size is also represented by a sphere, centered at the object location. The radius of this "size sphere" is related to the size of the object. Because these spheres are embedded in a scene coordinate system, the addition of a camera model permits the transformation of the scene model (by translation and rotation) to a specific viewpoint and the projection of that model to the image.

Projection of Scene Model to Image
There are two problems in the projection of this three-dimensional model to the image: the projection of the location sphere and the projection of the size sphere. The projection of the location sphere is straightforward. Using a perspective transformation (from a camera model), the center and radius of the sphere are projected to the image as a circle. Figure 2 illustrates the way in which this three-dimensional object model can be projected onto any image (using a camera model) to form an image-specific plan for that object.
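The two-sphere model and the projection of the location sphere can be sketched as follows. This is a minimal illustration under stated assumptions, not the VISIONS implementation: it assumes a pinhole camera with focal length f and a scene-to-camera transform X_cam = R X_scene + t, and the names SphereModel, Camera, CirclePlan, and project_model are hypothetical. Each projected radius is approximated as f·r/z at the depth of the sphere center, which treats the sphere's outline as a circle and ignores the elliptical distortion of off-axis spheres. The size radius is projected at the same depth, since only its radius is used in the plan.

```python
from dataclasses import dataclass

# Hypothetical names; the pinhole camera model is an assumption, not from the paper.
Vec3 = tuple[float, float, float]


@dataclass
class SphereModel:
    """Hypothetical two-sphere object model in scene coordinates."""
    name: str
    center: Vec3            # center of the location sphere (estimated object center)
    location_radius: float  # uncertainty in the position of the object center
    size_radius: float      # rough extent of the object


@dataclass
class Camera:
    """Assumed pinhole camera: X_cam = R X_scene + t, then (u, v) = f (x/z, y/z)."""
    rotation: tuple[Vec3, Vec3, Vec3]  # rows of a 3x3 rotation matrix R
    translation: Vec3                  # translation vector t
    focal_length: float                # f, in pixels

    def to_camera(self, p: Vec3) -> Vec3:
        # Rigid transformation of a scene point into camera coordinates.
        x, y, z = (
            sum(r * c for r, c in zip(row, p)) + t
            for row, t in zip(self.rotation, self.translation)
        )
        return (x, y, z)


@dataclass
class CirclePlan:
    """Image-specific plan for one object: a center and two projected radii."""
    center: tuple[float, float]
    location_radius: float
    size_radius: float


def project_model(model: SphereModel, cam: Camera) -> CirclePlan:
    x, y, z = cam.to_camera(model.center)
    if z <= 0:
        raise ValueError(f"{model.name}: center is not in front of the camera")
    f = cam.focal_length
    # Perspective projection of the center; both radii are scaled by f / z,
    # an approximation that treats each sphere's outline as a circle.
    return CirclePlan(
        center=(f * x / z, f * y / z),
        location_radius=f * model.location_radius / z,
        size_radius=f * model.size_radius / z,
    )


# Usage: identity camera at the scene origin, looking down +z.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
cam = Camera(identity, (0.0, 0.0, 0.0), focal_length=500.0)
plan = project_model(SphereModel("house-body", (2.0, 1.0, 20.0), 1.0, 5.0), cam)
print(plan)  # CirclePlan(center=(50.0, 25.0), location_radius=25.0, size_radius=125.0)
```

Scaling every radius by the same f/z keeps the sketch simple and matches the text's treatment of the plan as circles; an exact treatment would project the sphere's silhouette as a conic.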
The projection of the size sphere is handled in a slightly different manner. Because the exact location is uncertain, just the radius of the size model is projected to the image. The projected radius is added to the plan as a "size filter": no region larger than that size is expected as a projection of that object. In the experiments shown here, this size radius was added to the location radius to form a composite plan, shown as the outer circle in Figure 3. The other two circles shown are the projection of the location sphere (the smaller of the two inner circles) and the projection of the size sphere (the next larger circle), arbitrarily positioned at the center of the projection of the location sphere. Note that, although these illustrations show tightly constrained locations (small location spheres), it is possible to represent small objects and/or larger uncertainties in location.

Once the model has been projected to the image, the resulting plan can be used in a matching process. Regions in the image can be matched against the projection circles to form a subimage. All regions that are contained within the composite circle are taken to be candidates for inclusion in the subimage. Further, all regions for which the ratio of region area within the circle to total region area is above an object-model-specific threshold are accepted as candidates for inclusion. These candidates are then filtered by size, and any region which is too large is rejected. (Such rejected candidates could be split by resegmentation; this was not done here.) The union of the remaining candidates forms the subimage.

The two spherical representations whose projections are shown in Figure 3 are for "tree-crown" and "house-body." Figure 4 shows the regions selected when those projections are used. Figure 5 shows the way in which geometric matching can be used to limit other recognition procedures. A feature matching process for house-body, based on a color and texture prototype from the object model, was run on an image segmentation to select the set of regions shown in Figure 5a. The same segmentation, together with the plan from the projection of the house-body spherical model, produced the subimage shown in Figure 4c. Using the geometrically based subimage to restrict the feature matching improves the results. Compare the regions from feature matching alone (Figure 5a) with those selected when the feature matching was constrained by the geometric plan (Figure 5b). The matching using the plan is improved by the elimination of those regions which are inconsistent with size and location information.
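The region-selection rules described above (containment in the composite circle, an area-ratio threshold, a size filter, and the union of surviving candidates) can be sketched as follows. This is a hedged illustration, not the original code: regions are modeled as sets of pixel coordinates, the helper names are hypothetical, and because the text does not say how region "size" is measured, the size filter here assumes a region is too large when its pixel area exceeds the area of the projected size circle, pi·r_s². Restricting feature matching by the plan is modeled as a set intersection of region IDs.

```python
import math

# Hypothetical helpers; regions as pixel sets and the area-based size test are assumptions.
Pixel = tuple[int, int]  # (u, v) in the same image frame as the plan's center


def fraction_inside(pixels: set[Pixel], center: tuple[float, float], radius: float) -> float:
    """Fraction of a region's pixels that fall inside a circle."""
    inside = sum(
        1 for (u, v) in pixels
        if math.hypot(u - center[0], v - center[1]) <= radius
    )
    return inside / len(pixels)


def select_subimage(
    regions: dict[int, set[Pixel]],
    center: tuple[float, float],
    location_radius: float,
    size_radius: float,
    threshold: float,
) -> set[int]:
    """Return IDs of the regions that make up the subimage for one object plan."""
    composite_radius = location_radius + size_radius
    # Assumed size filter: reject regions whose area exceeds the projected size circle.
    max_area = math.pi * size_radius ** 2
    selected = set()
    for rid, pixels in regions.items():
        if not pixels:
            continue
        ratio = fraction_inside(pixels, center, composite_radius)
        # Candidate if fully contained, or if enough of its area lies inside.
        is_candidate = ratio == 1.0 or ratio > threshold
        if is_candidate and len(pixels) <= max_area:
            selected.add(rid)
    return selected


def subimage_pixels(regions: dict[int, set[Pixel]], ids: set[int]) -> set[Pixel]:
    """The subimage itself: the union of the selected regions."""
    return set().union(*(regions[rid] for rid in ids))


def constrain_feature_match(feature_ids: set[int], subimage_ids: set[int]) -> set[int]:
    """Keep only feature-matched regions that are consistent with the geometric plan."""
    return feature_ids & subimage_ids


# Usage: composite radius 12, size radius 2 (maximum area about 12.57 pixels).
regions = {
    1: {(0, 0), (1, 0), (0, 1), (1, 1)},                       # fully inside, small
    2: {(11, 0), (12, 0), (13, 0)},                            # 2/3 inside
    3: {(30, 30)},                                             # outside
    4: {(u, v) for u in range(-8, -3) for v in range(-2, 3)},  # inside, but 25 pixels: too large
    5: {(0, 12), (0, 13), (0, 14), (0, 15)},                   # only 1/4 inside
}
ids = select_subimage(regions, (0.0, 0.0), 10.0, 2.0, threshold=0.5)
print(sorted(ids))                                       # [1, 2]
print(sorted(constrain_feature_match({2, 3, 4}, ids)))  # [2]
```

Region 4 shows why the size filter matters: it lies entirely within the composite circle yet is rejected as too large, which is the case the text suggests could instead be handed to resegmentation.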
Future Work

In addition to controlling region grouping procedures, knowledge of the geometric constraints implied by object subpart relations can be used to guide the extraction of image features [10]. The three-dimensional object geometry, when combined with information about the location of object features in the image, can be used to build a three-dimensional model of the objects in the image [4]. This model, in turn, can be projected to the image to provide more specific information about the location of object projections in the image. Such specific information will eventually be used to control the invocation of both feature extraction procedures and procedures for the recognition of objects from two-dimensional image characteristics [6]. Work is currently underway to examine how such control can best be implemented.

References
[1] D.H. Ballard, C.M. Brown, and J.A. Feldman, "An Approach to Knowledge-Directed Image Analysis," in Computer Vision Systems, A. Hanson and E. Riseman, Eds., Academic Press, New York, 1978.
[2] Thomas O. Binford, Rodney A. Brooks, and David G. Lowe, "Image Understanding via Geometric Models," in Proceedings of the Fifth ICPR, 1980, pp. 364-369.
[3] A.R. Hanson and E.M. Riseman, "VISIONS: A Computer System for Interpreting Scenes," in Computer Vision Systems, A. Hanson and E. Riseman, Eds., Academic Press, New York, 1978.
[4] D.L. Lawton, "Constraint-Based Inference from Image Motion," in Proc. AAAI, Stanford University, August 1980, pp. 31-34.
[5] Alan K. Mackworth, "Vision Research Strategy: Black Magic, Metaphors, Miniworlds and Maps," in Computer Vision Systems, A. Hanson and E. Riseman, Eds., Academic Press, New York, 1978.
[6] C.H. McCormick, "Strategies as Knowledge-Based Control in Image Interpretation," Technical Report in preparation, Computer and Information Science Department, University of Massachusetts, Amherst, Massachusetts.
[7] Paul A. Nagin, "Studies in Image Segmentation Algorithms Based on Histogram Clustering and Relaxation," Dissertation, published as COINS Technical Report 79-15, September 1979.
[8] William M. Newman and Robert F. Sproull, Principles of Interactive Computer Graphics, second ed., McGraw-Hill, 1980.
[9] C.C. Parma, A.R. Hanson, and E.M. Riseman, "Experiments in Schema-Driven Interpretation of a Natural Scene," COINS Technical Report No. 80-10, University of Massachusetts, Amherst, Massachusetts.
[10] T.E. Weymouth, "Experiments in Knowledge-Driven Interpretation of Natural Scenes," Technical Report in preparation, Computer and Information Science Department, University of Massachusetts, Amherst, Massachusetts.

Acknowledgments

Edward M. Riseman, Allen R. Hanson, Clif McCormick, and Leonard P. Wesley helped by critiquing the general concepts and specific implementations discussed in this paper.