STRUCTURING DOMAIN KNOWLEDGE FOR VISUAL ... - CiteSeerX

5 downloads 198 Views 83KB Size Report
This paper is best read as a sequel to an earlier vision ... Now at Department of Computer Science, University of British ... Dept. of Computer Sciences. Univ. of ...
STRUCTURING DOMAIN KNOWLEDGE FOR VISUAL PERCEPTION

I.

A l a n K. Mackworth

W i l l i a m S. Havens*

Dept. of Computer Science U n i v . o f B r i t i s h Columbia Vancouver, B.C. Canada V6T 1W5

Dept. of Computer Sciences U n i v . o f Wisconsin a t Madison 1210 West Dayton S t r e e t Madison, Wisconsin 53706

4 . I t s r e c o g n i t i o n process i s e s s e n t i a l l y data-driven despite the g r a f t i n g on of the " c y c l e of perception" and interpretation-driven re-segmentation.

INTRODUCTION

Early v i s u a l processing is conjectured to be almost e n t i r e l y domain-independent and d a t a - d r i v e n ; however, h i g h - l e v e l v i s i o n must e x p l o i t domain and object specific constraints and methods, intermingling data-driven and model-driven recognition. We are concerned w i t h knowledge r e p r e s e n t a t i o n and i t s use f o r h i g h - l e v e l v i s i o n . T h i s communication i s a b r i e f r e p o r t o n t h e t h e o r y underlying Mapsee2, a second program for i n t e r p r e t i n g s k e t c h maps.

IT.

5. The network of models is c o m p l e t e l y c o n s t r u c t e d b e f o r e any model c o n s t r a i n s a n o t h e r . 6. The model language is i m p o v e r i s h e d . In particular, l o c a l c o n t r o l o f r e c o g n i t i o n i s not possible. 7 . A l t e r n a t i v e scene i n t e r p r e t a t i o n s are e x p l i c i t l y a v a i l a b l e t o guide t h e program.

The f i r s t t h r e e symptoms can be a s c r i b e d to d e s c r i p t i v e inadequacy w h i l e t h e l a s t f o u r are symptomatic o f p r o c e d u r a l inadequacy.

WHAT'S WRONG WITH MAPSEE1?

T h i s paper is best read as a sequel to an e a r l i e r v i s i o n paradigm embodied i n the program Mapseel [3] . That program i n t e r p r e t s s k e t c h maps r e p r e s e n t i n g scenes c o n t a i n i n g roads, rivers, b r i d g e s , towns, m o u n t a i n s , i s l a n d s , m a i n l a n d , l a k e s and oceans. The paradigm suggests e x t r a c t i n g cues from a lower l e v e l segmentation which are used to i n v o k e , ambiguously, s e t s o f o b j e c t models. These models are then made to s a t i s f y i n t e r n a l and external consistency constraints using a generalized network consistency constraint s a t i s f a c t i o n a l g o r i t h m . Here we s h a l l be concerned not w i t h p r a i s i n g Mapseel b u t w i t h b u r y i n g i t . In our v i e w , Mapseel died of representational inadequacy. I t e x h i b i t s seven symptoms o f t h a t disease: 1 . I t s cue/model s t r u c t u r e has level.

but

a

III.

At a theoretical level, Mapseel's contributions were t o demonstrate t h e use o f network c o n s i s t e n c y c o n s t r a i n t s a t i s f a c t i o n and the exploitation of data-driven, conservative segmentation t e c h n i q u e s . In Mapsee2, we have c o n c e n t r a t e d o n the i n t e r a c t i o n o f decomposition and specialization i n schema r e p r e s e n t a t i o n s , exploiting generalized network consistency techniques and the homogeneous use of d a t a - d r i v e n and m o d e l - d r i v e n i n t e r p r e t a t i o n s t r a t e g i e s .

single

Our scene domain can be s t r u c t u r e d almost h i e r a r c h i c a l l y , based o n the c o m p o s i t i o n o r p a r t - o f r e l a t i o n as shown in F i g u r e 1. A downward p o i n t i n q arc i n d i c a t e s t h a t a n i n s t a n c e o f t h e o b j e c t above is composed of zero or more i n s t a n c e s of the o b j e c t below. The decomposition i s not a s t r i c t t r e e i n t h a t t h e same Town ( i n s t a n c e ) , f o r example, may be p a r t of b o t h a Road-system and a Geo-system.

3. Many of the r e a l w o r l d scene domain c o n s t r a i n t s are r e p r e s e n t e d p o o r l y i f a t a l l .

by

NSF

THE KNOWLEDGE REPRESENTATION

In order to r e c t i f y these inadequacies we have a schema-based knowledge r e p r e s e n t a t i o n . To have a b a s e l i n e t o e v a l u a t e t h e success o r f a i l u r e o f the p r o j e c t and to b u i l d on t h e advances of Mapseel, we have used t h e same scene domain.

2. Each m o d e l ' s domain of i n t e r p r e t a t i o n is a f i n i t e l i s t of labels with no structure within or between t h e i n t e r p r e t a t i o n s .

T h i s work was supported in p a r t MCS-8004882 and NSERC Grant A9281.

not

Grant

* Now at Department of Computer S c i e n c e , U n i v e r s i t y o f B r i t i s h Columia.

625

IV.

schemata in the composition hierarchv. For example, when a Town instance has been found it in used as a cue f o r invoking the procedural methods of both the Road-System and Geo-System schemata above it in the hierarchy. In this way, appropriate h i g h - l e v e l schemata can be hypothesized by the discovery of appropriate h i g h - l e v e l cues, thus r e a l i z i n g a recursive cue/mode1 h i e r a r c h y .

SPECIALIZATION LABELLING

When an o b j e c t is hypothesized to e x i s t in the scene, an u n d i f f e r e n t i a t e d instance of i t s generic schema is created. As new components are recognized as p a r t of the instance, i t s d e s c r i p t i o n becomes progressively s p e c i a l i z e d . At the same time, i t s possible semantic r e l a t i o n s h i p s w i t h other schemata become f u r t h e r c o n s t r a i n e d .

The invoked higher schemata must f i n d or make instances of themselves t h a t can incorporate the lower instance or r e t u r n f a i l u r e if none can. S p a t i a l adjacency is the usual c o n s t r a i n t used to f i n d an e x i s t i n g instance. Mutual consistency is then enforced between the higher and lower instances. Model-consistency modifies the d e s c r i p t i o n of the higher instance 1 to conform to the lower. The m o d i f i c a t i o n includes possible f u r t h e r d i f f e r e n t i a t i o n of the schema instance down i t s specialization tree or failure. (Model consistency is a g e n e r a l i z a t i o n of Waltz-like pruning of the corner label lists.) Object-consistency modifies the lower instance to conform to the upper. (This is a g e n e r a l i z a t i o n of Mapseel-like pruning of the component object chain and region label l i s t s ) . This consistency process may then be propagated to other schema instances r e l a t e d to these under the c o n t r o l of the relevant schema methods.

I n i t i a l l y each incomplete schema instance must represent a l l possible f i n a l i n t e r p r e t a t i o n s for that object. The schema is ambiguously l a b e l l e d . In c o n t r a s t , Mapseel employed e x p l i c i t exhaustive label sets associated with each object. Unfortunately these l a b e l s must be low-level i n t e r p r e t a t i o n s of the p r i m i t i v e l i n e s and regions which appear in the scene. Attempting to encode h i g h - l e v e l abstract r e l a t i o n s h i p s i n these e x p l i c i t l a b e l sets leads to an unavoidable combinatorial explosion o f possible f i n a l i n t e r p r e t a t i o n s . In order to avoid t h i s problem, we have developed an i n t e n s i o n a l l a b e l l i n g method for schemata representations called Specialization Labelling. The set of possible final interpretations (labels) • for a schema are represented as a t r e e . Each node of the tree represents an implicit number of final i n t e r p r e t a t i o n s . Each descendant node represents a f u r t h e r s p e c i a l i z a t i o n o f the possible f i n a l l a b e l s implied by i t s parent. The root node of the tree is the u n d i f f e r e n t i a t e d schema instance. At any time in the recognition of the schema instance, the set o f a l l possible f i n a l i n t e r p r e t a t i o n s for the schema is represented by the union of a l l the labels implied by the current f r i n g e of the t r e e .

For example, suppose a Shoreline SI has been recognized as an i n t e r p r e t a t i o n of a chain closinq on i t s e l f . SI requires two Geo-system instances tc be part o f , G-Sl, the i n s i d e one, and G-S2, the outside one. Suppose G-Sl already e x i s t s and is already known to be a Landmass, because it contains a Road-system. Model consistency of G-Sl w i t h respect to SI requires that G-Sl be further specialized to I s l a n d . Object consistency of SI w . r . t . G-Sl s p e c i a l i z e s SI to be Island-shore (as opposed to Lakeshore). Suppose the outside Geo-system for S I , G-S2, has to be created as a new u n d i f f e r e n t i a t e d Geo-system. Model-consistency of G-S2 w . r . t . the Island-shore SI s p e c i a l i z e s G-S2 to be a Waterbody. Object consistency of SI w . r . t . G-S2 succeeds. All of these specialization c o n s t r a i n t s may propagate to other r e l a t e d schema instances which may in t u r n continue to propagate them.

Figure 2 shows the complete s p e c i a l i z a t i o n tree f o r the Geo-system schema. Initially a Geo-system i s f u l l y u n d i f f e r e n t i a t e d , i t s l a b e l l i n g represented only by the schema instance i t s e l f . The recognition of the Geo-system involves constructing t h i s tree as new components are added t o i t s d e s c r i p t i o n (generally increases ambiguity of l a b e l l i n g ) and pruning the tree as c o n s t r a i n t s are applied amonq related schemata (decreases ambiguity o f l a b e l l i n g ) . S p e c i a l i z a t i o n l a b e l l i n g avoids the problem of representing exponentially large numbers of l a b e l s for abstract h i g h - l e v e l o b j e c t s . As w e l l , it provides a representation for manipulating semantically r e l a t e d labels as a s i n g l e group.

V.

VI.

CONCLUSION

This model of perception is embodied in the program Mapsee 2. Mapsee2 is w r i t t e n in MAYA, a multiprocess LISP d i a l e c t supporting schemata data and c o n t r o l s t r u c t u r e s [ l , 2 ] . Our experience w i t h it so far has been l i m i t e d so t h i s communication should be viewed merely as a b r i e f progress report with tentative summary conclusions. In the meantime it should be clear that each of the inadequacies in Mapseel has been r e c t i f i e d in Mapsee 2. Our model is intended to provide a coherent framework for the desiqn and implementation of high-level, schema-based perceptual systems.

THE RECOGNITION PROCESS

The r e c o g n i t i o n process is a search of the composition hierarchy for a complete and mutually consistent set of schemata instances. The search is guided by the procedural methods embedded in each schema. As each schema instance s a t i s f i e s i t s own i n t e r n a l c r i t e r i a for completed r e c o g n i t i o n , i t is used as a cue for the hypothesis of higher

626

REFERENCES [l]

Havens, W.S., "A Procedural Model of Recognition for Machine Perception", TR-78-3, Ph.D. Thesis, Department of Computer Science, U n i v e r s i t y of B r i t i s h Columbia, Vancouver, Canada, (1978)

(21 Havens, W.S. and Mackworth, A.K., "Schemata-based Understanding of Hand-drawn Sketch Maps". In Proc. 3rd CSCSI Nat. Conf., V i c t o r i a , B.C., May 1980, pp. 172-178. [3] Mackworth, A.K., "On Reading Proc. IJCAI 5, MIT, Cambridge, 1977, pp. 598-606.

Fiqure

1.

Mapsee

Sketch Mass.,

Maos". August

2 Composition Hierarchy

627

Suggest Documents