Data Structures and Pattern Recognition - Springer

1. Center for Information Research, University of Florida. Authors. Allen Klinger (2). Author Affiliations. 2. University of California, Los Angeles, California, USA ...
Chapter 5

DATA STRUCTURES AND PATTERN RECOGNITION
Allen Klinger
University of California
Los Angeles, California

1. INTRODUCTION

The term "data structures"(1) is well established in computer science, conveying the idea of how tables and lists are stored, but it is a relatively new concept in pattern recognition. In pattern recognition, the concept of data structures is a continuation of the initial notion of "feature" from statistical pattern recognition(2) (that is, a simple clue helpful in decision-making) to include:

1. Nonstatistical or descriptive pattern recognition methodologies(3): linguistic, syntactic, and structural methods.

2. Sequential calculations: (a) Feature extraction depending on the decision to be made. (b) Multilevel or hierarchical classification processes: "understanding"(4, 5).

The result is more than a generalization, since all these are processes, not details, in a block-diagram system description. Taking a data structure approach generates an entirely new group of pattern recognition methodologies, for there is such a change in emphasis that the former breakdowns (in statistical approaches, into preprocessing, feature extraction, and classification; in descriptive methods, into primitives and rewriting rules) are no longer adequate to describe what is done. The tree concept is frequently used in descriptive pattern recognition. Image processing gives us linear list input data, as do other data analysis domains in pattern recognition such as waveform processing. Descriptive

J. T. Tou (ed.), Advances in Information Systems Science © Plenum Press, New York 1978


methods that yield trees are transforming the source data from one data structure to another: trees and linear lists are standard types of data structures. Although such methods precede feature extraction and classification, they take place in general-purpose digital computers and hence are not preprocessing. Probabilities and decision theory are basic in statistical pattern recognition. However, the possible probability functions are numerous in real problems. Repetition of computations on different, but partially overlapping, sets of data can occur. (5) This is a procedure that can be facilitated by the choice of data structure. Many of the data structures presented here were invented as methods to encode array data in sequential form. The most common source of array data in pattern recognition is from images. Hence, methods from image pattern recognition(6) predominate in this paper. Many of these are equally applicable to other array data, such as in management information systems. Since computer science has grown rapidly, boundary lines are fluid and the concepts invented for one case could apply to many fields. Examples from artificial intelligence(7) and speech recognition/understanding(5) involve array and other data structures described below. The purpose of this paper is to focus on representing input data for future processing to meet alternative classification requirements. Frequently a working solution method in pattern recognition problems involves creation of temporary hypotheses. This is facilitated when the "patterned" raw data is arranged to enable many types of future computations. Note that this approach differs strongly from either 1. retention of all available numbers relating to the original pattern, or 2. reduction of the large amount of data in the original pattern to a small set of features. The following sections describe in detail how many of the approaches to data structure creation have been developed. 
Several data structures described relate to pictorial pattern recognition; these bridge image processing and the artificial intelligence area called scene analysis. The paper begins with an exposition of the pattern recognition problem, followed by an introduction to the data structure concept, and then a discussion of pictorial pattern recognition and image processing. A detailed review of the principal data structures used in pattern recognition follows; later sections are on line drawings, histograms and integral projections, medial axis transforms, and generalized cones. Sections on syntactic methods, trees, webs, and structure learning conclude the paper.


2. THE PATTERN RECOGNITION PROBLEM

The problem of pattern recognition(2) is that of automatic decision-making regarding class membership or data type for an element of a group of data that can in some useful way be treated alike. Usually the underlying data are easily classified by human beings. Examples are alphabetic letters, numbers, electrocardiograms, and photographs, whether of faces, blocks on a table, or the earth from the air (rivers, bridges, crop fields). Early attempts to attack this problem began with the observation that humans approach classification tasks by isolating important elements of the pattern. These recognition aids or clues are usually called "features," and it is common to speak of "pattern vectors." These are lists of several different feature values evaluated for a certain pattern. Lines are often features in patterns, as in the capital letters E and F. A global view of the pattern classification process is based on evaluating features, then combining the noted values via a statistical decision rule. However, patterns are not initially in a form that permits computer processing to find features, and processes that convert patterns into such forms are called "preprocessing." Examples include digitizing (sampling at regular intervals) waveforms such as electrocardiograms and scanning photographs (electrooptical conversion to a magnetic tape representation for computer processing). Subsequent work involved pattern domains where feature values are less important and structural relationships among parts of a pattern dominate the decision-making task. The representation of the pattern parts can often be broken into elements called "primitives." In many cases there is a set of allowed interrelationships: these are called a "grammar" or set of "syntactic rules." Structural analysis is an important part of pattern recognition in bubble-chamber photographs of nuclear events, spoken words, and Chinese characters.
An extensive exposition of the theory and a discussion of these examples is supplied in reference 3, but, in brief, the essence of structural analysis is the investigation of relationships. If the allowed elementary relationships can be written as a small set of rules, iterative application of them can describe a much larger set of relations. Thus, the computer can test complex patterns for validity, while storing only a small rule set ("grammar"), not a large number of special cases. The two main approaches overlap, but the former approach to the pattern recognition problem is called "statistical" and the latter "descriptive, syntactic, or structural"; "syntactic" is more frequently used. In statistical pattern recognition, sometimes the preprocessing and feature extraction processes are ignored and a mathematical form of the classification process


is viewed as "the pattern recognition problem." This has given a narrow research target and led to a substantial theoretical literature, yet the significant practical problem (automating man-made classification of patterns) is still only partially solved. The purpose of this paper includes furthering this practical goal through the addition of data structure approaches to pattern recognition. Hence, for purposes of completeness only, we give the narrow "pattern recognition problem" as:

Assign the $n \times 1$ vector of features $x$ to one of $m$ pattern classes $C_1, \ldots, C_m$.

(Probability distributions may be given, samples $x_1, \ldots, x_k$ and their class memberships may be given, and classes $C_i$ may or may not overlap.) With this background, we turn now to a similar review of the computer science concept of "data structure." This idea is very different from that of "structural pattern recognition," since, as we see below, it starts from another problem viewpoint.
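As a concrete instance of the narrow problem stated above, the following minimal sketch assigns a feature vector to one of several classes using labeled samples. The nearest-class-mean rule is an assumption chosen for brevity (the text does not prescribe a decision rule), and the names `class_means` and `classify` and the sample values are hypothetical.

```python
from collections import defaultdict
from math import dist


def class_means(samples, labels):
    """Average the labeled feature vectors x_1..x_k separately for each class."""
    groups = defaultdict(list)
    for x, label in zip(samples, labels):
        groups[label].append(x)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in groups.items()
    }


def classify(x, means):
    """Assign x to the class whose mean is nearest (assumed rule, not from the text)."""
    return min(means, key=lambda label: dist(x, means[label]))


# Hypothetical two-feature samples for classes "C1" and "C2".
samples = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
labels = ["C1", "C1", "C2", "C2"]
means = class_means(samples, labels)
print(classify((0.8, 0.3), means))  # prints C1
```

Any other decision rule could replace `classify` without changing how the samples are stored, which is the separation between data organization and classification that the following section develops.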

3. THE DATA STRUCTURE CONCEPT

The role of a data structure in a general-purpose digital computer is that of a bridge between the software (the actual code in programs) and the hardware, principally computer memory and the central processor unit. The data structure serves as a base for algorithms since some computational methods are enabled (or facilitated, especially with regard to computing speed and memory space limitations of specific machines) by proper choice of organization of information. In addition to the conceptual types of structures, a variety of mundane details regarding the planning for future computation constitute the final data structure. These details include how many different items can be stored in a memory word, whether pointer variables are used, and whether there is one or several ways to access the structure. A data structure(1) is a method of organizing objects to be processed by one or more computer programs; several different types are common: trees, binary trees, forests, and "Lists." Many are generalizations of terms in mathematics and operations research (management science): e.g., vectors, matrices, files, and queues. Data structure terminology focuses on the structural rather than the mathematical property of each item, calling vectors linear lists, and calling matrices arrays. The simplest kind of linear list is a file. A familiar example is a file drawer in a desk: the sequence of folders stored there one after the other is


ordered linearly. If we call them by the letters of the alphabet, then "ab...z" is the linear list, both on this page (where it is itself an entity) and as a symbol of the real file folders. Files, and linear lists in general, may be sequentially organized as in "alphabetical order" or by a "nonsequential" approach called "linked." To continue the analogy, suppose that each file folder contains a record of where the next folder in the linear list is to be found. This record is called a "pointer" or "link." It may give the location as an "absolute address": "NEXT is at LOCATION 32"; or as a "relative address": "NEXT is 3 BEFORE this location." The properties of the linear list of interest are strongly affected by the choice whether to use sequential organization; the term "sequential allocation" is used in place of "sequential organization" if actual computer memory locations are to be assigned for storing the file. These properties include such modifications of the file as deletion and insertion of elements to the original order. Other properties affected are time to traverse all the file elements and amount of storage needed for the entire file. Special uses of pointers can facilitate execution of certain algorithms. One such use is definition of backward-direction links to create "doubly linked" data. The addition of end-of-file to beginning-of-file pointers called "circular" links is also useful. When both backward and circular pointers are used, the data structure is called a "ring." The "backward" and "circular" links aid in "insertion of an element immediately prior to a given one" and "traversal of the entire file from any starting point," respectively. Of course, each added pointer takes up some physical storage and may also slow down execution of some algorithms: these are often important costs. Certain linear lists are also used that are named in accord with some special property.
Some common examples involve insertion and deletion restrictions: stacks (insert and delete only at one place, "the top of the pile"), queues (insert at one place, delete at another: "enter at the back, exit at the front"), and "deques" (double-ended queues, with or without further insertion/deletion restrictions) are cases where the restriction is implied in the name. The fundamental data structure decision is whether to use sequential or linked memory organization and allocation. This choice, and the utilization of auxiliary pointers, can materially change the way algorithms can use the stored data. For example, special auxiliary pointers ("threads") can be used to build a queue for special processing while the data is being handled in a routine way. The same devices and techniques apply equally to the other data structure types. It is impossible to adequately describe the nature and uses of trees, lists, etc. in this brief section. Instead we now present only a short description of these "nonlinear" lists and refer the reader to


reference 1 for greater depth on these and any data structure topics in this section. "Arrays" are nonlinear data structures that actually are superimposed orthogonal lists. Two-dimensional arrays are particularly important, but arrays may be of any dimension. "Trees" are similar to arrays; both are nonlinear structures since more than one element may be adjacent to, or succeed, another. For example, in two-dimensional arrays there are either four or eight elements adjacent to a central point. The number of successors in a tree is arbitrary and may differ for elements in the same tree, ranging from none to any integer value. "Binary trees" are a completely different type of data structure with two "designated" successors for an element of the data structure, a "left" and a "right" successor. The possibilities are no successor, left-only, right-only, and both successors, for any tree element. "Lists" are similar to trees except that an element in a List may have a List as a successor, as well as an ordinary "atomic" element. This concludes this introductory review of the types of data structures.

The following shows how some data structure terms introduced above could be used: Circularly linked lists and threads can be used in pattern recognition to obtain class-discriminating information. For example, create a linked list of subparts of the same type, such as "arm" in a human-chromosome image, called PART. If the list is circularly linked, an algorithm "COUNT NUMBER OF __" can process at any stage. (The algorithm begins anywhere in the list; the circular link permits it to access all elements in PART.) The subpart type to be counted can be set when the algorithm is called: for pictures of chromosomes it might be set to "ARMS." The continually available "COUNT" can be used to drive other routines that classify using symmetry properties (if COUNT NUMBER OF __ does not output an even number, a "no symmetry" decision is immediate).
This pictorial pattern recognition example leads us to a discussion of that topic and the related one of image processing.
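The ring structure and the "COUNT NUMBER OF __" routine described in this section can be sketched as follows. This is a minimal illustration, not an implementation from the references: the class `Node`, the functions `insert_before` and `count_number_of`, and the contents of the PART list are all hypothetical.

```python
class Node:
    """One element of a ring: a doubly and circularly linked linear list."""

    def __init__(self, kind):
        self.kind = kind   # subpart type, e.g. "ARM"
        self.next = self   # forward link (points to itself while alone)
        self.prev = self   # backward link


def insert_before(node, kind):
    """Insert a new element immediately prior to `node` using the backward link."""
    new = Node(kind)
    new.prev, new.next = node.prev, node
    node.prev.next = new
    node.prev = new
    return new


def count_number_of(start, kind):
    """Traverse the entire ring from any starting element and count `kind`."""
    count, node = 0, start
    while True:
        if node.kind == kind:
            count += 1
        node = node.next
        if node is start:   # the circular link has brought us back
            return count


# Hypothetical PART list for one chromosome image.
part = Node("ARM")
for kind in ("ARM", "CENTROMERE", "ARM", "ARM"):
    insert_before(part, kind)

arms = count_number_of(part.next.next, "ARM")   # start anywhere in the ring
print(arms, "symmetry possible" if arms % 2 == 0 else "no symmetry")
```

Each node carries two pointers, which is the storage cost the text mentions; in exchange, insertion before a given element and traversal from any starting point need no search for the head of the list.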

4. PICTORIAL PATTERN RECOGNITION AND IMAGE PROCESSING

Many human pattern recognition tasks involve pictures. Here the special abilities of a person in visual perception are stressed, and these are so remarkable in some cases that automation seems a remote goal. Nevertheless


there are numerous practical things that computers can accomplish in dealing with pictorial data, and they have come, together with modeling human perception, to be called "pictorial pattern recognition"(8), i.e., decision-making given pictorial data. To understand that field we first examine the form that pictures take when they are available for computer processing. Digitized pictures are two-dimensional arrays of picture elements: "pixels," usually stored as long sequential records on magnetic tape. In this storage form the geometric relationships among pixels are not preserved since a "series of rows" representation is used. Each pixel is an average of local light intensity in the picture around a point. The "spread" or spot size and averaging characteristic may differ between scanning devices, as may the number of quantization levels for the average value stored. In addition, pixels will not correspond exactly to the same points or spots in a picture even if the same scanner is used twice on a given image. A single image is a large data set, often over one million data items: individual pixels and their light intensity values. Most images when digitized are of this order: 512 × 512 arrays (over a quarter of a million items) and 1024 × 1024 ones are common. Obtaining a digitized image (i.e., by electro-optical scanning) is "preprocessing." Computations on a digitized image to enhance certain spatial frequencies, improve visibility of some picture aspects, or enhance structural data (lines, corners) are called "image processing."(6) Lines are derived when edges of regions are isolated: an outline "drawing" or "line drawing" can be obtained by thresholding a spatial pseudogradient or other "edge operator"; these generally create a new picture element value from pixel light intensity and the intensities of the pixel's immediate eight neighbors. The preceding has introduced special terms in image pattern recognition.
We now review other ways that data structures are used in pattern recognition, beginning with some pictorial applications.
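The "series of rows" storage form and a thresholded edge operator, both described in the section above, can be sketched together. The particular operator used here (the largest absolute difference between a pixel and any of its eight neighbors) is an assumption made for illustration, as are the function names and the small test image; the text does not specify an operator.

```python
def pixel(record, width, row, col):
    """Look up pixel (row, col) in a "series of rows" sequential record."""
    return record[row * width + col]


def edge_map(record, width, height, threshold):
    """Threshold a simple pseudogradient computed from the eight neighbors.

    Border pixels lack a full neighborhood and are left at 0.
    """
    edges = [0] * (width * height)
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            center = pixel(record, width, r, c)
            strength = max(
                abs(center - pixel(record, width, r + dr, c + dc))
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            edges[r * width + c] = 1 if strength >= threshold else 0
    return edges


# Hypothetical 5 x 5 image: two dark columns on the left, three bright on the right.
width = height = 5
record = [0, 0, 9, 9, 9] * height
edges = edge_map(record, width, height, threshold=5)
for r in range(height):
    print(edges[r * width:(r + 1) * width])
```

In the sequential record, vertically adjacent pixels lie `width` items apart, so the geometric neighborhood has to be recovered by index arithmetic, as `pixel` does. For this image the three interior rows of the result are `[0, 1, 1, 0, 0]`, marking the boundary between the dark and bright regions.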

5. REVIEW OF APPLICATIONS TO PATTERN RECOGNITION

This section discusses how data structure methods are used in pattern recognition. Some types of data structures that have been used in image pattern recognition(6) are the medial axis transform,(8) chain codes,(9) vertex labels,(10,11) and generalized cones,(12) and most are described in detail in subsequent sections. Vertex-label structures enable computer-graphics


display and analysis programs(4) for three-dimensional block scene images and they are also discussed extensively in textbooks (reference 2, pp. 441-446; reference 7, pp. 193-212). An alternate view is that vertex labels are primitive elements that can describe block objects only when their relationships are stored by tree or List data structures; indeed, linear lists, trees, binary trees, and Lists all are used in analysis programs.(2-7, 10, 11) Hidden-line elimination* algorithms(13,14) are based on a special data structure. Although it is similar to a binary tree, a "quaternary tree" with four regionally defined successors ("northeast," a; "northwest," b; "southeast," c; and "southwest," d) differs in three ways: (1) It has been frequently used in pictorial pattern recognition and computer graphics, (2) it has been independently invented by several groups, and (3) the data structure notation is not standardized and its mathematical theory, unlike the binary tree case, is not detailed. (Our quaternary tree notation uses decimal points to denote "successor," so a.a is the northwest of the a area, itself a "northwest" quarter of a given array. The sixteen subarrays a.a, a.b, ..., d.d are elements in a three-level tree that has a, b, c, d at the second level. For some experiments using the tree on image arrays, see [34].) Elimination algorithms use quaternary trees by scanning the four successors of a node, and examining only those areas where more than one line appears. The remainder of this section emphasizes the idea that data structures are a way of enabling the writing of algorithms to calculate many different properties of a "pattern" (a set of data associated with an object we want a computing machine to be able to process in a well-defined way: "recognize"). These calculations are called "feature extraction" in pattern recognition terms, but problems where there are well-defined features to calculate are not common in practice.
Instead a subtle interaction of cost of further calculations and cost of misclassifying given the features already evaluated leads to stopping of computations of properties and computer classification. Basic to this is cost and probability (likelihood) evaluation at each stage, with comparison of the current likelihood given the number of features already "known" with two decision thresholds. When the value lies between the two decision thresholds, another feature is to be evaluated. The results and complete mathematical model, an extension of sequential decision theory in statistics, are thoroughly presented in reference 15, while the original research is by Y. T. Chien and K. S. Fu. (15a)
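The two-threshold stopping rule described above can be sketched as a loop that evaluates one feature at a time. This is a simplified illustration in the style of a sequential likelihood-ratio test, not the cost-based model of reference 15: the features are assumed independent, costs are omitted, and the function name, thresholds, and evidence values are hypothetical.

```python
def sequential_classify(log_ratios, lower, upper):
    """Evaluate features one at a time until a decision threshold is crossed.

    log_ratios: per-feature log-likelihood ratios,
                log p(feature | class 1) - log p(feature | class 2),
                treated as independent (an illustrative simplification).
    Returns (decision, number_of_features_used).
    """
    total, used = 0.0, 0
    for ratio in log_ratios:
        total += ratio
        used += 1
        if total >= upper:
            return "class 1", used
        if total <= lower:
            return "class 2", used
        # The value lies between the two thresholds: evaluate another feature.
    return "undecided", used


# Hypothetical evidence from four features, in order of evaluation.
print(sequential_classify([0.4, 0.9, 1.1, -0.2], lower=-2.0, upper=2.0))
# prints ('class 1', 3)
```

The fourth feature is never evaluated in this run, which is the saving that motivates sequential procedures when feature calculations are costly.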

* "Hidden-line elimination" concerns using the computer to generate graphics-console displays of architectural design concepts given as overlapping lines in computer memory; a solid-looking display, with occluded lines eliminated, is desired.

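The quaternary tree and its dotted successor notation (a.a, a.b, ..., d.d), introduced earlier in this section, can be sketched for a small binary array. The assignment of letters to quadrants below is an assumption of this sketch, as are the function names and the test image; subdivision stops at uniform blocks, so only areas that still contain more than one value are examined further.

```python
def build(image, row=0, col=0, size=None):
    """Return the common value of a uniform block, else a dict of four subtrees.

    Quadrant letters are an assumption of this sketch:
    a = upper left, b = upper right, c = lower left, d = lower right.
    `image` is a square list of lists whose side is a power of two.
    """
    if size is None:
        size = len(image)
    values = {image[r][c]
              for r in range(row, row + size)
              for c in range(col, col + size)}
    if len(values) == 1:
        return values.pop()   # leaf: uniform block, no further subdivision
    half = size // 2
    return {
        "a": build(image, row, col, half),
        "b": build(image, row, col + half, half),
        "c": build(image, row + half, col, half),
        "d": build(image, row + half, col + half, half),
    }


def lookup(tree, path):
    """Follow a dotted successor path such as "a.d" from the root."""
    node = tree
    for letter in path.split("."):
        node = node[letter]
    return node


image = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
tree = build(image)
print(tree)
print(lookup(tree, "a.d"))   # prints 1
```

Three of the four quadrants of this image are uniform and are stored as single leaves; only quadrant a is subdivided to the third level.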

However, the role of the data structure is to facilitate calculation of many properties, and this is also needed in practical applications. The purpose of this section is to discuss past use of data structures to facilitate property evaluation and potentially beneficial future application areas: this requires a close look at diverse, yet related, approaches, but it does not involve cost coefficients and losses in sequential decision models. The following discussion reviews recent computer science research relevant to feature evaluation. The main use of data structures in pattern recognition is to enable better calculation of properties for recognition purposes: this may mean more precise evaluation of certain aspects of an object in some contexts, ability to compute many different properties (more, if the costs involved in a sequential decision model warrant it), etc. We generally need to adapt the features we compute when decision-making tasks change. A set of features can be inadequate later when a different type of decision must be made, and the data structure can advantageously be planned to allow for calculation of the many different features that may be needed. This introduces far greater flexibility than can be achieved by inserting a prior description of the features to be computed in a machine or program.

Another important aspect of data structures in pattern recognition is that trees, binary trees, and Lists are ideal for hierarchical computations: they facilitate recursive calculations. In artificial intelligence computer programs, where pattern recognition is accomplished by using additional sources of knowledge regarding the data to be recognized (such as context), tree and List (reentrant tree) data structures are frequently used.(4,5,7) Such terms as "lexicon" (allowed words in a vocabulary), "syntax" (grammatical rules establishing valid strings), "semantic information" (meanings attached to symbol groups), "world model" (past contextual framework), and "problem domain" (objectives to be accomplished)(4,5,7) describe aspects of recognition that are included in hierarchical processing. Many examples in both statistical pattern recognition and descriptive (syntactic, structural) pattern recognition involve such hierarchical or multilevel processes; there, repetition of computation is necessary to enable classification. In text and speech understanding systems(4,5,7) the computer program makes a temporary hypothesis, usually at a low level, and attempts to produce classification decisions. The result is tested for consistency with other information in storage, and if the result is poor, the program revises its low-level hypotheses and recalculates properties. These allow it to make new hypotheses and repeat the classification/testing process. Note that data used in practical classification tasks is best available as a data structure since then it may be used by


many algorithms, each associated with different decision-making tasks. In speech or text examples, the literal meaning of "lexicon" (allowed vocabulary) and "syntax" (grammar) continues to "semantic" data (allowed meanings and interpretations), while the "world model" is what is described, and the "problem domain" is the specific goals and conditions. The relationship of "recognition" or classification tasks in artificial intelligence, e.g., text/speech "understanding," to those in pattern recognition is difficult to describe precisely. In one view the artificial intelligence applications are to more practical decision-making tasks: ones that involve some softer, more intuitive knowledge. However, the solutions proposed there often involve restrictive, problem-narrowing assumptions. Hence this paper takes the viewpoint of reference 16: Pattern recognition is an important part of artificial intelligence. Both are rapidly changing research areas and there is substantial overlap and interaction. For our purposes we treat them as identical when the basic goal is that of pattern recognition (machine classification of data) and use "complex tasks" to denote artificial intelligence applications involving that goal, such as speech understanding and scene analysis. Most complex tasks involve multiple uses of input data about a problem and its domain. This data, although originally obtained from a sensor such as an audio microphone or video camera, is first transformed into a digitized-signal stream for input to the computer. The data structure is built by the transformations placed on that input stream. Since complex tasks require very many different combinations of pieces of knowledge for classification, only some of this knowledge will come from the input data. Making all input data accessible to the hierarchical decision-making computer program at any time is the goal of the data structure.
Some of this data may be descriptive versions of input data concerning an object to be classified (e.g., a three-dimensional block: is it a cube?; a spoken sound: is it a "three"?). However, past inputs are also required and these may come from other input streams (lexicon: dictionary; syntax: grammar; semantic information: three = 3 = 011; world model: integer numbers; problem domain: five postal-zip-code or thirteen or fourteen credit-card digits). Hence the data structure is essentially a way of making diverse types of information accessible to the decision-making programs. Since data structures facilitate computation for complex decision-making by putting several kinds of information together, the concept was generalized to the idea of a "frame"(17): a data structure consisting of "all" the information related to a certain problem area. "All" is used here since some work may require very narrowly chosen frames. An example occurs in spoken Arabic numeral recognition


for postal zip codes and credit card numbers. There the homonym "too" meaning "also" may be omitted if spoken sounds must come exclusively from either of these "digits-only" problem domains. The following sections describe specific data structures used in pattern recognition. Many of these arose as methods to encode array data in sequential form. We begin with image pattern recognition array encoding techniques, then turn to data structures used in management information systems arrays. The concluding section describes how overall array structure can be learned if computer programs to do this take advantage of data structure methods.
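The narrowly chosen frame of the spoken-numeral example can be sketched as a small record that bundles lexicon, semantic information, and problem-domain constraints. Everything named here (`HOMONYMS`, `DIGITS`, `FRAMES`, `interpret`, and the sound spellings) is hypothetical and serves only to show how a digits-only frame removes the reading "too" from consideration.

```python
# Candidate words for each spoken sound (hypothetical spellings, for illustration).
HOMONYMS = {"tu": ["two", "too", "to"], "for": ["four", "for"], "wun": ["one", "won"]}

# Semantic information: the numeric meaning attached to each lexicon word.
DIGITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}

# One frame per problem domain: lexicon, semantics, and allowed string lengths.
FRAMES = {
    "zip code": {"lexicon": set(DIGITS), "semantics": DIGITS, "lengths": {5}},
    "credit card": {"lexicon": set(DIGITS), "semantics": DIGITS, "lengths": {13, 14}},
}


def interpret(sounds, frame):
    """Return the digit string for `sounds`, or None if the frame rejects it."""
    digits = []
    for sound in sounds:
        candidates = HOMONYMS.get(sound, [sound])
        allowed = [word for word in candidates if word in frame["lexicon"]]
        if len(allowed) != 1:   # no reading, or an ambiguous one, within the frame
            return None
        digits.append(str(frame["semantics"][allowed[0]]))
    if len(digits) not in frame["lengths"]:
        return None
    return "".join(digits)


sounds = ["nine", "tu", "wun", "tu", "for"]
print(interpret(sounds, FRAMES["zip code"]))      # prints 92124
print(interpret(sounds, FRAMES["credit card"]))   # prints None
```

Because the lexicon of each frame contains digit words only, the sound "tu" has exactly one admissible reading, and the same five sounds are rejected by the credit-card frame on length alone.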

6. LINE DRAWINGS AND CHAIN CODES

This section discusses a data structure that has arisen in a natural way as a means for conveying visual information. A typical natural line drawing is a map. Others are schematic diagrams, engineering and architectural drawings, charts, and graphs. For an extensive description of computer methods for processing line drawings and the chain code data structure that makes this possible, see the excellent survey paper in reference 9 and the references there. The following draws heavily on that work. Line drawings have the meaning "an image used to convey information about a two-dimensional (2D) line structure," while a "line structure is a geometrically defined concept, consisting of an assembly of points, line segments, and curve segments in Euclidean space."(9) Subsequent sections discuss several special line structures that supply useful techniques for representing information about shape and greyness in image data. Two of these, the "medial axis transform (skeleton)" and "image histogram," and the more familiar graphs and charts, are technical artifices used to convey information to sophisticated interpreters of images. In this section we emphasize conversion of line drawings with nontechnical origins to a data structure that can be the base for computer pattern recognition algorithms. Line structures arise as abstract geometric entities. These may be specified analytically, and when computers are used, structures of high complexity may be generated with relative ease. They also occur when outlines or other point path trajectories are the generating technique (reference 9 uses the term "tracings"), or when a natural image is modeled or approximated (for example, by contour maps or lines separating uniform color or texture regions).


Whatever the line structure source, a single data structure has dominated for computer representation: the "chain code." A chain code is a string of octal digits. Each element in the string ("link") corresponds to one directed line segment of a standard length associated with a uniform grid superimposed on the line drawing source image. If the grid interval is $T$ and the link symbol is $a_i$, then the length is $T(\sqrt{2})^{p}$, where $p$ is the modulo-2 value of $a_i$ and $a_i$ ranges over the integers $0, \ldots, 7$. The direction of link $a_i$ is $a_i(\pi/4)$, with the zero reference identified with the positive x axis in Cartesian coordinates. (Values of $a_i$ above zero indicate the number of counterclockwise increments of angle $\pi/4$.) A "chain" is the entire set of links representing a line structure. "Signal codes" are four consecutive links with 04 as the first two values. They allow the representation of isolated line segments [by the signal "0401 Invisible chain follows (if chain is being plotted, this code causes pen to be lifted off paper)"(9)], change of scale, and the control of repetitions of links or groups of links.

The virtue of the chain code is its compactness. It enables storage of complex pictorial data involving many regions or shapes in a condensed form that can be used to re-create the image when it is needed for display or analysis. At the same time, this re-creation process is a costly one in terms of computer time, from the view of a data structure developed for pattern recognition. That is, recognition algorithms can rarely function on small segments of chain codes; rather, they require re-creation of entire shapes, and this is time consuming in hierarchical classification processes. Thus, the very compactness of representation works against the efficiency of a chain code in recognition applications. The two possibilities are:

1. Restrict use of chain codes to display-oriented applications.

2. Add auxiliary pointers and labels to each chain element to shorten the time needed in recognition applications.

Since chain codes represent boundary data, they give a way to keep region shapes in serial computers. One complete boundary can be stored as a stack or a queue, both straight linear lists. Other data structures, such as trees, can group and order several regions in a scene so that all can be kept in chain codes, each stored possibly in a circularly linked queue ("ring" data structure). (See Fig. 1, parts I and II.) Region-shape encoding requirements can be fulfilled in a different way when a shaded image is available instead of a line drawing.
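The chain code definitions above (eight directions in counterclockwise steps of π/4, link length T for even links and T√2 for odd links) can be sketched as a decoder. The function names and the example chain are hypothetical, and signal codes are not handled in this sketch.

```python
from math import sqrt

# Unit grid steps for link values 0..7: counterclockwise multiples of pi/4,
# with 0 along the positive x axis.
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def decode(chain, start=(0, 0)):
    """Return the grid points visited by a chain of octal link values."""
    x, y = start
    points = [(x, y)]
    for link in chain:
        dx, dy = STEPS[link]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def length(chain, grid_interval=1.0):
    """Sum T * sqrt(2)**p over the links, where p is the link value modulo 2."""
    return sum(grid_interval * sqrt(2) ** (link % 2) for link in chain)


chain = [0, 0, 7, 5, 4, 4, 2, 2]   # hypothetical closed boundary
points = decode(chain)
print(points)
print(points[0] == points[-1])     # prints True: the boundary closes on itself
print(length(chain))               # six axis-parallel links plus two diagonal ones
```

Re-creating the full list of points is exactly the decoding cost the text describes: a recognition routine that needs the position of a single link must first accumulate all the steps that precede it.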

[Figure 1, parts I and II: diagrams not reproduced. Diagram note: "(A, B, queues above) Either tree describes II."]

Fig. 1: I, II. (I) Chain code and some linear list data structures storing it. (II) Several chain codes and their data structures.

[Figure 1, part III: diagram not reproduced. Legend: NEXT(x) points to an adjacent element on the same MAT branch; x-loc(x), y-loc(x): location of point x in the x or y direction.]

Fig. 1: III. Rectangle with medial axis transform and linear list and tree data structures.