Pattern Analysis & Applications (2000)3:153–168 2000 Springer-Verlag London Limited
A Line-Oriented Approach to Word Spotting in Handwritten Documents

A. Kołcz¹, J. Alspector¹, M. Augusteijn², R. Carlson³ and G. Viorel Popescu⁴

¹Electrical and Computer Engineering Department, ²Computer Science Department, ³Mathematics Department, University of Colorado at Colorado Springs, Colorado Springs, CO, USA; ⁴Rutgers University, CAIP, Piscataway, NJ, USA
Abstract: The problem of word spotting in handwritten archives is approached by matching global shape features. A set of visual templates is used to define the keyword class of interest and to initiate a search for words exhibiting high shape similarity to the model set. The major problems of segmenting cursive script into individual words are avoided by applying line-oriented processing to the document pages. The use of profile-oriented features facilitates the application of dynamic programming techniques to pattern matching, and allows us to achieve high levels of recognition performance. Results of experiments with old Spanish manuscripts show the high recognition rate of the proposed approach.

Keywords: Dynamic time warping; Handwriting recognition; Holistic features; Template matching; Word recognition; Word spotting
1. INTRODUCTION

Regions of interest within a text document can often be located by specifying a set of keywords that tend to be associated with the desired context. A number of variants of keyword-based search have become standard interfaces to document repositories, such as digital library archives or the Internet. However, a major difficulty arises when the documents in question are not available in a machine-readable text format, but only as digitised images, as is the case for archives containing handwritten material. In these cases, the concept of keyword search is still valid, but tends to be much more complex, and distinct approaches to its solution are possible. Our main problem is that of word spotting, that is, recognising instances of a specific input keyword [1] within a set of handwritten documents. The particular test application considered in this study is the task of word spotting in the manuscripts comprising the Archive of the Indies (Spanish: Archivo General de Indias). This repository represents the official communication between the Spanish Crown and its New World colonies, and spans approximately four centuries (the 15th to the 19th). The word-recognition
problem is particularly challenging here, due to the poor quality of the documents (e.g. ink bleed-through), as well as the multitude of stylistic variations encountered in these manuscripts. A recognition system has to cope with large variations in character size and style, as well as with the many flourishes characteristic of handwriting in those epochs. Figure 1 shows a sample document page from the archive.

Motivated by the difficulty of faithful character [2] or word segmentation [3] of cursive script [4], our approach uses global word shape [5] as the basis for recognition. We are interested in discriminating between only two classes of objects: instances of a given word, and all other textual objects in a document (where the latter can include both valid words and noisy artifacts). By casting the word-spotting problem as a two-class recognition task, it is possible to use techniques utilising global word features, which are generally deemed inefficient in large-vocabulary lexicon-based approaches to word recognition. Here, only one class of interest is present, so the word-spotting task may be compared to signal–noise separation, and the use of more detailed features of the signal class is justified. Many recognition systems divide words into smaller elements, since this facilitates training when models for a large lexicon must be built. In such cases, it is rather inefficient to create special models for each of the words separately. Instead, it is advantageous to share the training
for all lexicon elements by basing it on common word elements, such as characters. On the other hand, holistic [6] features that take the overall shape of a word into account are potentially more resistant to variations occurring in writing. Their use can aid recognition at the word level and reduce the number of examples necessary to train a small-lexicon system. Additionally, the use of global features is more readily extendable to cases where other types of visual information are involved. The word-spotting task is initiated by specifying an input keyword, either as a sequence of characters (i.e. a purely symbolic format) or by providing a set of examples (i.e. containing handwriting) of the word in question, and then performing the search to localise words in the document that exhibit high similarity to elements of the example set. The latter method is limited by the availability of a sufficient number of examples, so that enough variability within the target class of handwritten words can be captured. The set of keyword models samples the space of all valid handwritten instances of a given word but, due to the large space of possible stylistic variations, a small number of examples cannot be expected to provide sufficient information for faithful coverage of that set. This problem is simplified, however, if the search is carried out within a limited space of stylistic variations (e.g. focussing on a particular historical epoch, a certain style, or even a person). Additionally, this method can be seen as more general in the sense that it analyses shape similarity, rather than the lexical structure, and can therefore be applied (at least in principle) to objects other than handwritten words (e.g. line sketches [7,8]). The approach taken in this work concentrates on the problem of word spotting using a set of visual templates (i.e. a set of example images of the target word). The examples will be called models or templates interchangeably.
We assume that the number of examples is sufficient to capture the essential properties of the particular region of interest within the target space.

This paper is organised as follows. Section 2 discusses the principles of line-oriented keyword search in handwritten data, where the whole document is treated as a stream of data which can be analysed sequentially for the presence of the relevant items. The problem of feature extraction and matching is presented in Section 3, where the use of profile-based features is advocated within the line-oriented representation. Section 4 outlines the technique of dynamic time warping from the perspective of handwritten word matching. Application details pertaining to the word-shape comparison applied in this work are discussed in Section 5. Experimental results obtained with the word-spotting system proposed here are presented in Section 6, and the paper is concluded in Section 7.
Fig. 1. A sample page from the Archive of the Indies.

2. LINE-ORIENTED DOCUMENT SEARCH
Handwriting is usually created as a time-ordered sequence of lines of text where, within each line, the writing proceeds along a predefined direction (e.g. left to right). After digitisation, each line is represented by an image of finite resolution, and can be viewed as a sequence of pixel columns, ordered according to the direction of writing. Given an isolated line of handwritten text, word spotting can therefore proceed by scanning the line along the direction of writing, where the process of scanning is equivalent to taking each pixel column in turn and checking whether it represents a starting point of a valid word. The stream-like representation of handwritten documents allows treatment of the whole document (i.e. a sequence of pages) in a unified way and, ultimately, all the information in the archive can be treated as a (very long) sequence of pixel columns, as illustrated in Fig. 2. This representation is also very convenient for drawing analogies between the current problem and that of word spotting in the speech-recognition setting [9], which has received much more attention in the literature. The speech signal is naturally given by a stream of data, in chronological order. Since the time axis is clearly defined, dynamic-programming techniques [10] acknowledging the causality of the input signal, such as Dynamic Time Warping (DTW) and Hidden Markov Models (HMMs) [11], are natural candidates for modelling speech patterns at various levels of resolution (e.g. phonemes, words, sentences). Indeed, these methods have proven quite successful in word-spotting and word-recognition applications [12]. In word spotting, the training data are often limited to utterances of the keyword in question, and whole-word models are used to detect their positions in the data stream [13]. On the other hand, if more detailed data about the language are available (i.e. utterances of other words, data pertaining to different speakers, knowledge about the underlying language, etc.), more complex (and, usually, more accurate) models can be created, which tend to be based on more elementary features
Fig. 2. Outline of the line-oriented approach to word spotting. The set of documents to be searched is divided into a sequence of lines of handwriting, where the line-processing module scans each line image along the direction of writing to identify the positions of likely match candidates.
(e.g. phonemes) [12,14]. The approach presented here is largely motivated by the success of dynamic-programming based techniques for word spotting in speech applications, even when only very limited keyword models are available [13]. Although the stream representation of the pixel data has only semi-causal properties, many of the algorithms developed for causal speech signals can be adapted and extended to the current setting. In this section, we describe the issues involved in two essential tasks: the accurate segmentation of a page image into individual lines of handwriting, and the appropriate
choice of subimages within each line as candidate words for further analysis.

2.1. The Line-Segmentation Algorithm
To segment a digitised document page into individual lines, the horizontal ink-density histogram of the binary page image is first obtained by projecting the inked pixels of the image onto the vertical axis (Fig. 3 provides an example). At each pixel position along the image's vertical axis, the value of the histogram represents the density of inked pixels
in a horizontal line of pixels corresponding to that position. The peaks of the histogram will roughly correspond to the lines of handwriting, while the valleys indicate the positions of inter-line spacing. To obtain the actual line-breakpoint positions, an FFT spectrum of the histogram is first computed, and the fundamental line width w is obtained. The line centre-points are initially specified as equidistant (by w), such that the sum of squares of the corresponding histogram values is maximised. Centre-point positions for which the histogram value falls below the average are eliminated. A line breakpoint is identified by the histogram minimum between consecutive centre-points, or by the middle point of the corresponding interval if the associated histogram value is close to the minimum (in this way, the process is more resistant to noise). In some degenerate cases, a line may become too narrow (in which case it is simply ignored) or too wide (in which case it becomes subject to further division). Figure 3 illustrates the segmentation of an example document page. The ink-density histogram and its FFT spectrum can also be used to determine the overall skew of the page. Clearly, the segmentation process may introduce a certain amount of distortion: many strokes of handwriting will cross the line borders, and some essential information may be removed from the individual segmented lines while, at the same time, a certain amount of noise from bordering lines is injected. Figure 4 provides an example of three consecutive segmented lines, where the distortion caused by foreign elements from the lines above and below the central line is clearly visible. To minimise the effects of these distortions on the subsequent processing, a filtering stage is performed to remove some of the foreign objects inserted into the individual lines. Other approaches to line segmentation of handwritten documents have also been proposed in the literature [15,16].
Fig. 3. Line segmentation of a page. The FFT of the horizontal ink-density histogram identifies the basic line spacing. Line centre-points are subsequently aligned with the histogram peaks, which leads to the determination of the breakpoint positions (see text for details).
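The spacing-estimation step lends itself to a compact implementation. The following is a minimal sketch in Python/NumPy under our own assumptions about data layout (a binary page array with 1 denoting ink); it is an illustration of the idea, not the authors' code:

```python
import numpy as np

def fundamental_line_width(page):
    """Estimate the basic line spacing w of a binary page image.

    The horizontal ink-density histogram is the row-wise ink count;
    the dominant non-zero component of its FFT spectrum gives the
    fundamental line spacing.
    """
    hist = page.sum(axis=1).astype(float)   # ink density per pixel row
    hist -= hist.mean()                     # suppress the DC component
    spectrum = np.abs(np.fft.rfft(hist))
    k = 1 + int(np.argmax(spectrum[1:]))    # strongest non-zero frequency bin
    return len(hist) / k                    # period in pixel rows = line width w
```

The centre-point alignment and breakpoint selection then operate on the same histogram, as described above.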
2.2. Filtering the Line-Segmentation Noise
The line-processing approach makes it difficult to compensate for the loss of information which occurred as a result of stroke clipping at the line’s top and bottom boundaries. However, it is possible to deal with the intrusions from neighbouring lines. Such noisy elements tend to be disconnected from the words belonging to the current line, and their density is highest near the upper and lower boundary of the current line. Since the ascenders/descenders are usually connected to the word’s core (i.e. the region of highest ink density in the horizontal ink density histogram), disconnected ‘inked’ objects positioned outside the core region of a word are likely to represent foreign objects from the neighbouring lines. This suggests that component connectivity could be used to decide if a given inked object represents a part of a word in the current line, or if it represents noise and should be removed. Occasional misclassifications might occur due to imperfections in scanning and binarisation of the images considered here (e.g. the ascending/descending strokes might contain gaps disconnecting them from the central region), and a practical application of the filtering procedure should be expected to remove some of the relevant information too. Our experience shows, however, that the advantage of
removing numerous noisy elements will outweigh the occasional loss of useful data.

Fig. 4. Illustration of the distortions caused by line segmentation.

The implemented noise-filtering algorithm consists of the following stages:

1. Label all pixels in a line image according to their connected-components membership.
2. For the line region of interest (e.g. a word), compute the horizontal ink-density histogram.
3. Identify the region's core, responsible for a fraction (0.65 in our implementation) of the ink density according to the horizontal ink-density histogram.
4. Identify the connected components that cross the upper and lower boundaries of the identified core.
5. Within the line region of interest (or its appropriately defined portion), 'erase' all pixels outside the core, where the pixels to be removed belong to connected components that do not cross the core's boundaries (i.e. they reside completely outside it).

From this description, it is clear that this procedure depends upon the accuracy of determining the upper and lower boundaries of the core region of a word. The variant of this operation chosen for implementation takes a straightforward approach, whereby the median (vertical) position of the ink-density distribution of an image is found, and the region containing the prescribed fraction of the total ink density is then grown (by a greedy algorithm), starting from the median value. At each step, one pixel row is added to the core from above or below, according to the greater increase in density gained (a sketch of this greedy step is given after Fig. 5).

In principle, if all words within a line image were written at the same level and at approximately the same size, it would be possible to perform the noise filtering as a one-stage process, with the upper/lower boundaries of the core handwriting estimated by taking the complete line image into account. However, this assumption is rarely valid, as individual handwritten words usually exhibit significant variation in their relative vertical positioning, slant, skew and character size. Therefore, it appears necessary to divide the line image into a sequence of narrower regions of interest (i.e. segment images) to make the noise removal effective. We consider a sliding-window approach, where a window corresponding approximately to the length of a word (e.g. 100 pixels) traverses the line image from left to right. At each step, the image data within the window is used to determine the core region's boundaries and carry out the segmentation-noise filtering. The results are then applied to the part of the original image corresponding to the initial portion of the window, determined by a fixed offset value (e.g. equal to 20 pixels). In this way, discontinuities in the core-boundary estimation process have only a limited impact, and connected components that might be valid for words occurring later in the line image (with the reach determined by the sub-image length) are not removed. The window is then slid by the same fixed offset and the process is repeated. The results of applying this technique to a line-segmented image are given in Fig. 5.
Fig. 5. Results of segmentation-noise removal based on non-overlapping segment images. The original segmented page is at the top; the middle image shows the result after noise removal, whereas the bottom image shows the identified upper/lower core boundaries obtained in each case.
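To make stage 3 and the greedy growing step concrete, here is a small sketch (our own Python/NumPy rendering of the description above; the function name and array conventions are assumptions):

```python
import numpy as np

def core_boundaries(region, fraction=0.65):
    """Greedily grow the core band of a word image from the median row
    until it holds `fraction` of the total ink density.

    region: 2-D binary array (1 = ink). Returns (top, bottom) row indices.
    """
    hist = region.sum(axis=1).astype(float)   # horizontal ink-density histogram
    total = hist.sum()
    median = int(np.searchsorted(np.cumsum(hist), total / 2.0))
    top = bottom = median
    ink = hist[median]
    while ink < fraction * total:
        up = hist[top - 1] if top > 0 else -1.0
        down = hist[bottom + 1] if bottom + 1 < len(hist) else -1.0
        if up >= down:                        # add the denser neighbouring row
            top -= 1
            ink += up
        else:
            bottom += 1
            ink += down
    return top, bottom
```

Stages 1, 4 and 5 can then be realised with any standard connected-component labelling routine (e.g. scipy.ndimage.label), erasing the components that lie entirely outside the returned band; in the sliding-window variant, core_boundaries is simply re-evaluated at each window position.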
Note that the filtering described here is performed on the line image before the word-spotting is started, and certain parameters (e.g. the offset and segment lengths) are chosen heuristically. A more accurate, although computationally more expensive, approach would be to perform this kind of filtering on the image of each word candidate identified by the search process.

2.3. Candidate Selection Heuristics
For every line of text, each pixel column is examined as a potential starting point of a candidate word. Such a procedure implies a large number of processing stages. As an example, consider a typical page of handwriting from the Archive of the Indies with the resolution and format used in our experiments. Such a page has a horizontal resolution of 900 pixels, and typically contains approximately 10 lines of handwriting. Let us take every pixel position along each line of handwriting as a potential starting point of a candidate (assuming that boundary effects near the page margins are ignored), which has to be compared with a relatively small set of 10 models. The total number of candidate–model distance computations per page is then 10·900·10 = 90,000, which is comparable to matching one candidate to a very large lexicon. Additionally, let us assume that the document to be searched contains 10 pages (e.g. it is a letter). Then the total number of word comparisons approaches one million, which may be taxing even for a fast system. Of course, a practical word-spotting application is likely to involve documents hundreds of pages long (or even more), which justifies efforts to reduce the computational load.
Fortunately, given the set of model images, one can define a set of heuristic rules that eliminate the bulk of 'candidates' unlikely to produce a valid match for any of the model words. For example, one straightforward requirement is that the starting pixel column of a candidate word is not blank (i.e. it contains at least one inked pixel). In this way the empty areas near the page margins, as well as between words, can be eliminated.

Scale-2 Constraint. We assume that the examples have been extracted from documents similar to the one being searched, so it can reasonably be expected that instances of the target word will be similar in size to the elements of the model set. Since some size and style variation should always be expected, we set 'reasonable' limits on the length (or horizontal scale) difference between model and candidate words. Let length(m) denote the length of a model word m. A word c will be considered as a possible candidate for a match with m if its length, length(c), satisfies

length(c) ∈ [ (1/α)·length(m), α·length(m) ]    (1)
where the scale parameter α was chosen to have the value of 2. This horizontal scale limitation will be referred to as the scale-2 constraint, for simplicity.

Penalising Gap Content. The candidate selection process involves the extraction of a line subimage, starting at the current pixel position and extending to match the length of a model word. This extracted window is then examined for the presence of 'gaps', where a gap is defined as a sequence of blank pixel columns delimited by inked ones. It is assumed that a valid candidate should not contain any single gap that extends further than the (estimated) average length of a character, unless such a feature is also present in the model. Although the total empty space within the candidate image should roughly correspond to that of a model image, it appears reasonable to eliminate only the candidates whose gap content is higher than that dictated by the adopted scale constraints, and retain the remainder, for which the inequality

gaps(c)/length(c) ≤ 2·gaps(m)/length(m)    (2)

is satisfied, where gaps corresponds to the number of blank pixel columns within a word, whereas m and c indicate a model and a candidate, respectively. A problem with this approach might occur if some models contain very few gap regions. To deal with such cases, we insist that the number of gaps within a model word is at least G, in which case the above inequality is transformed into

gaps(c)/length(c) ≤ 2·( gaps(m)·[gaps(m) ≥ G] + G·[gaps(m) < G] )/length(m)    (3)

where the threshold value G is related to the length of the model word. Alternative gap metrics, as well as methods of explicit word segmentation based on gap statistics, have been considered in the literature [17–19].

The Treatment of Stroke Artifacts. Many documents contain certain inked non-word elements that were added after the document's creation. These include underlining/emphasis lines, cross-overs, long horizontal lines (e.g. occurring after scanning when the original document was written on ruled paper), etc. Some are also due to imperfections of the scanning and binarisation operations. Due to the properties of the distance measure described later, the similarity between a word and a long, horizontally oriented singular stroke can often be very high, and it is desirable that such 'words' are rejected early in the recognition process. The approach implemented here utilises the transition-count features of the model and candidate words. Namely, for singular strokes it can be expected that most of the inked pixel columns will contain exactly two ink-background transitions, whereas handwritten words usually exhibit a higher number of ink-background transitions per pixel column. Let M denote the average number of transitions per inked pixel column of a model word. If the average number of ink-background transitions per inked pixel column of a candidate, C, is such that the inequality C − 2 < 0.8·(M − 2) is satisfied, the candidate is rejected. The constant factor of 0.8 was chosen experimentally. Subtracting 2 from C and M removes the bias, since for every inked pixel column the number of ink-background transitions is at least 2.
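A compact sketch of the three rejection tests follows; the quantities mirror Eqs (1)–(3) and the transition-count rule, while the data layout, the default value of G and the function name are our assumptions:

```python
def is_plausible_candidate(cand, model, alpha=2.0, G=3, beta=0.8):
    """Heuristic pre-filters: scale-2 constraint (Eq. 1), gap content
    (Eq. 3) and the stroke-artifact test on transition counts.

    cand and model are dicts with 'length' (pixel columns), 'gaps'
    (blank pixel columns) and 'trans' (average ink-background
    transitions per inked pixel column).
    """
    # Scale-2 constraint: length(c) must lie in [length(m)/alpha, alpha*length(m)]
    if not (model['length'] / alpha <= cand['length'] <= alpha * model['length']):
        return False
    # Gap content, Eq. (3): the model gap count enters as max(gaps(m), G)
    if cand['gaps'] / cand['length'] > 2.0 * max(model['gaps'], G) / model['length']:
        return False
    # Stroke-artifact test: reject if C - 2 < 0.8 * (M - 2)
    if cand['trans'] - 2.0 < beta * (model['trans'] - 2.0):
        return False
    return True
```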
3. TRANSITIONAL FEATURES

3.1. An Alternative Representation of Pixel-Column Sequences
Many features of handwriting have been proposed in the literature and successfully used in practical systems [4]. The approach taken in this work is based on the observation that the image of a handwritten word can be interpreted as an ordered sequence of pixel columns. The ordering direction corresponds to the direction of writing, and can be loosely associated with the time axis. Although the pixel columns comprising the image were not inked strictly in the order in which they appear in the sequence, it is still possible to take advantage of the embedded timing information associated with that sequence. Since the pixel columns of a word are binary patterns, each can be fully encoded by recording the ink-background transitions. This provides a form of run-length encoding used in certain data-compression methods [20]. In this work, to ease the computational burden of the word-spotting task, we chose a set of simpler one-dimensional features derived from this complete representation. Other applications of the
transitional features were proposed by Mohamed and Gader [21] and Gader et al. [22].

3.2. Profile-Oriented Features
For each pixel column, we reduced the complete transitional representation to a set of three features: the top-most transition position, the bottom-most transition position, and the number of ink-background transitions. Figure 6 illustrates the process of feature extraction. When pixel columns contain no ink (i.e. they represent gaps), the profile values are set to the average value of the non-gap profile elements within the word. Several alternative gap assignments had only a marginal influence on the overall recognition performance. The use of upper and lower profiles (understood as external outlines or object boundaries) of words and characters has been suggested by several authors [23,24], and utilised as features in recognition systems.

Affine Matching of Profile Features. The upper and lower profiles capture the essential outline of a word, but can be adversely affected by line-segmentation artifacts. The transition-count feature, on the other hand, is more resistant to noise and can be interpreted as a crude measure of word-image complexity. The one-dimensional feature profiles represent patterns (or scalar sequences) that can be more easily compared, in a pair-wise fashion, given the model
and candidate words. In the following section, we describe the procedure of profile matching, as well as the similarity measure used.

A profile is essentially a one-dimensional discrete function defined over a finite sequence corresponding to the pixel columns of a word. Assuming that two such profiles have equal length, their distance can be computed by means of the Euclidean metric, or any other suitable distance measure. However, before the distance computation takes place, certain normalisation steps are necessary so that the comparison is meaningful. Initially, the profile of a candidate word is equal in length to the profile of a model, so length normalisation is not necessary. Subsequently, the profile of a candidate has its mean removed (such an operation is initially performed on the model profile, too), and is vertically scaled such that its variance is brought to the level of the variance of the model profile. Given such transformed profiles, it is possible to estimate their relative vertical shift and (usually small) rotation angle. The estimated values are then used to transform the candidate profile for better alignment with the model. Recall that the initial candidate was extracted using a window of length corresponding to that of the model. However, the actual candidate word may be shorter or longer than this value, reducing the validity of the comparison. To overcome this problem, the candidate extraction and feature normalisation are repeated once the candidate–model horizontal scale difference has been estimated. Application of the scale-2 constraint in this process may lead to candidate rejection. The above normalising steps represent a set of affine transformations (i.e. translation, rotation, and horizontal and vertical scaling) that are used to match a candidate profile to the model template. Once the alignment is performed, the inter-profile distance is calculated as

d(m,c) = (1/T) · Σ_{t=1}^{T} |m_t − c_t|    (4)
where m_t is the t-th element of the model profile, c_t is the t-th element of the candidate profile, and T is the common length of both sequences. The L1 distance was chosen because of its greater robustness (i.e. resistance to outliers) compared to the Euclidean distance. The process of candidate normalisation and distance calculation is performed for the upper and lower profiles only. The transition-count profiles are left out at this stage, as they are less suitable for the class of affine transformations considered here. The operations within the class of affine transformations are essentially cost-free, with the exception of the horizontal scaling, which may invalidate a candidate. However, experience shows that in some cases it is desirable to associate a cost (or penalty) with certain types of transformations.
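The profile extraction of Fig. 6 and the distance of Eq. (4) are straightforward to express in code. The sketch below follows the text (gap columns receive the mean of the non-gap values); the helper names and NumPy conventions are ours:

```python
import numpy as np

def word_profiles(word):
    """Extract upper profile, lower profile and transition counts from a
    binary word image (rows x columns, 1 = ink, columns ordered along
    the direction of writing). One value per pixel column."""
    cols = word.T.astype(int)
    inked = cols.any(axis=1)
    nrows = cols.shape[1]
    upper = np.where(inked, cols.argmax(axis=1), np.nan)
    lower = np.where(inked, nrows - 1 - cols[:, ::-1].argmax(axis=1), np.nan)
    padded = np.pad(cols, ((0, 0), (1, 1)))              # background border
    trans = np.abs(np.diff(padded, axis=1)).sum(axis=1)  # ink-background transitions (>= 2 if inked)
    for p in (upper, lower):
        p[np.isnan(p)] = np.nanmean(p)                   # gaps -> mean of non-gap values
    return upper, lower, trans

def profile_distance(m, c):
    """Eq. (4): mean absolute difference of two aligned, equal-length profiles."""
    return float(np.mean(np.abs(m - c)))
```

The affine normalisation steps (mean removal, variance equalisation, shift and rotation) would be applied to the upper and lower profiles before calling profile_distance.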
Fig. 6. Profile feature extraction for a sample word image. In the case of the upper and lower profiles, the areas corresponding to gap regions are set to the mean of non-gap profile values.
Penalising Flat Profiles. One weakness of the distance-based similarity measure is its sensitivity to profile misalignment, whereby even small relative misalignments of various parts of the profiles being compared can ultimately lead
to large profile distances, even for clearly similar words. Additionally, if a profile is relatively flat (e.g. corresponding to a long horizontal stroke) after the normalising steps are completed, its distance to any other profile will be equivalent to a norm of the latter, and thereby small. To eliminate spurious matches to flat regions that pass through the initial selection stages, a weighting scheme is introduced penalising cases of large variability (or 'energy') differences between the profiles corresponding to the model and the candidate. Namely, a higher penalty weight is assigned to those inter-profile distances where the difference between the energies of the matched profiles is large. Here, the energies are computed before the variance equalisation is performed. Thus, if e_m and e_c correspond to the energies of the model and the candidate profiles, respectively, the penalising weight takes the form of

w(e_m,e_c) = 1 + 2·|e_m − e_c|/(e_m + e_c)    (5)

where the difference between profile energies is compared to their average. The energy e of a profile m is given by the square of its L2 norm, i.e.

e(m) = (1/T) · Σ_{i=1}^{T} (m_i)²    (6)
Taking the flatness penalty into account, the profile distance (4) between the model and candidate profiles (m and c, respectively) is transformed into

d(m,c) = (w(m,c)/T) · Σ_{t=1}^{T} |m_t − c_t|    (7)

where w(m,c) = w(e_m,e_c). The penalty weight effectively puts a cost on the vertical scaling transformation, which equalises the variances of the model and candidate profiles. Certainly, other weighting schemes could be proposed.

3.3. The Similarity Measure

It is useful to develop a figure of merit expressing the confidence that a word has been spotted. In particular, we require that such a similarity transformation smoothly maps the distance function into the interval [0, 1]. Identical profiles should have a similarity of 1, while for drastically different profiles the similarity should approach 0. Many such transformations are possible. The particular choice used in this work is an exponential function of the form

p(m,c) = exp( −d(m,c)/norm )    (8)

where norm denotes a normalising factor chosen according to the average norm of all candidates extracted from a document page. One interpretation of p(m,c) is that of a probability-like function indicating the likelihood that profiles m and c correspond to the same word. This similarity is computed for each of the profiles considered. To obtain the overall similarity measure based on the combined profile-based information, we use a probabilistic analogy. Under the simplifying assumption of mutual profile independence, such a measure can be obtained in the form of

p(m,c) = 1 − (1 − p_u)(1 − p_d)    (9)

where p is the overall 'probability' of a match, and p_u (short for p_u(m,c)) and p_d (short for p_d(m,c)) correspond to match probabilities according to the upper and lower profiles, respectively. Once the transition-count match probability p_t(m,c) becomes available, the overall probability of a match takes the form of

p(m,c) = 1 − (1 − p_u)(1 − p_d)(1 − p_t)    (10)

The above formula can be easily extended to incorporate additional features. We can view the pattern similarity estimates due to the individual profile features as decisions made by a group of (not necessarily independent) experts, whose decisions should be combined to maximise the knowledge used in the overall similarity estimate. The approach taken here is to assume feature independence, which avoids the extensive learning needed to establish the probabilistic relationship between the experts' judgements. Our approach is 'training free' but, with a more elaborate methodology, one could apply mixture-of-experts [25] techniques, according to which the similarity estimates due to the individual features are combined via a probabilistic weighted average, with the weights as well as the feature-based estimators trained on some example data sets.
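Read together, Eqs (5)–(10) define a small pipeline from raw profile distances to a single match score. A hedged sketch follows (function names are ours; e_m and e_c are computed from the profiles before variance equalisation, as the text requires):

```python
import numpy as np

def energy(p):
    """Eq. (6): (1/T) times the sum of squared profile values."""
    return float(np.mean(p ** 2))

def flatness_weight(e_m, e_c):
    """Eq. (5): penalty growing with the energy mismatch of two profiles."""
    return 1.0 + 2.0 * abs(e_m - e_c) / (e_m + e_c)

def profile_similarity(m, c, e_m, e_c, norm):
    """Eqs (7) and (8): weighted L1 distance mapped smoothly into [0, 1]."""
    d = flatness_weight(e_m, e_c) * float(np.mean(np.abs(m - c)))
    return float(np.exp(-d / norm))

def match_probability(p_u, p_d, p_t=None):
    """Eqs (9) and (10): noisy-OR combination of the per-profile scores
    under the simplifying assumption of profile independence."""
    p = 1.0 - (1.0 - p_u) * (1.0 - p_d)
    if p_t is not None:
        p = 1.0 - (1.0 - p) * (1.0 - p_t)   # equivalent to Eq. (10)
    return p
```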
4. DYNAMIC TIME WARPING AS A PROFILE SIMILARITY MEASURE It is well known that handwriting exhibits high levels of variability, and that the rate of writing, although possessing many regular properties, changes over time, which results in a nonuniform width of characters and inter-character spacing. Therefore, linear alignment of word images or word profiles (outlined in the previous section), achieved through a uniform adjustment of horizontal scale, is likely to lead to poor character/stroke alignment of word elements, which results in high distances between words that visually appear very close. One way to overcome this problem is to use a form of nonlinear profile alignment where the horizontal scale difference is allowed to vary locally, thus accounting for the rate differences in handwriting generation. A well known dynamic-programming [10] method of nonlinear sequence alignment is Dynamic Time Warping (DTW) [11], which has been used extensively in speech recognition and other areas of sequence comparison (e.g. DNA sequence matching [26]). Many applications of this technique in the field of handwriting recognition have also been reported [27–30]. DTW represents a general method of finding optimal alignment between two sequences, such as those given by
word profile features. Unlike uniform scaling, DTW allows local expansion/contraction of the sequences being compared, thus accounting for the variable rate of writing. For close patterns, the resulting distance between the sequences is likely to reflect their degree of similarity, whereas when the patterns in question are drastically different, the constraints imposed on the DTW procedure tend to produce high distances, indicating general pattern dissimilarity.

4.1. Definition of the Alignment Procedure
Given two sequences, A = {a_i} (i = 1,…,M) and B = {b_j} (j = 1,…,N), of lengths M and N, respectively, the aim of the alignment (warping) procedure is to find two discrete index-transforming functions

x : {1,…,T} → {1,…,M}
y : {1,…,T} → {1,…,N}    (11)

where {1,…,T} indexes the common time axis (e.g. T = max(M,N)), such that the distance between the transformed sequences

D(A,B) = (1/norm) · Σ_{t=1}^{T} w_t·d(x(t), y(t))    (12)
is minimised over all possible valid index-transforming functions x and y. Here, d(i,j) denotes an appropriately defined distance metric between the i-th element a_i of sequence A and the j-th element b_j of sequence B, and w_t corresponds to a path-weighting term, which will be explained below. The factor norm is a normalising quantity depending on the lengths of the sequences being warped, as well as on the particular type of the alignment procedure; the normalisation allows us to compare results obtained for sequences of different lengths. The value of norm depends upon the particular variant of the DTW algorithm, and for the results used in this work normalisation was obtained by setting norm = N + M. Since the elements of both sequences are ordered along an implicit time axis, a common requirement constraining the index functions is that of causality, stating that folding back in the sequence mapping is not allowed (i.e. both index sequences must be monotonic). This is expressed by
φ(i) ≥ φ(j) for i ≥ j    (13)
which has to be satisfied by both x and y. Another common constraint requires that the starting and ending points of the two sequences are aligned with each other, i.e.

x(1) = 1, x(T) = M and y(1) = 1, y(T) = N    (14)
although this requirement may sometimes be relaxed, as will be shown later.

Dynamic-Programming Solution. Given a two-dimensional M by N array (henceforth called the association matrix) indexed by the original sequences A and B,
the time-alignment procedure is equivalent to finding an optimal path connecting the bottom-left corner (1,1) with the top-right corner (M,N) of the array. Since the overall optimality of a given path can be assessed only after the final point (M,N) is reached, the costs of all possible legal paths leading from (1,1) to (M,N) have to be computed, and the minimum-cost path is chosen as the solution offering the optimal alignment between sequences A and B. The rules of dynamic programming applied here determine that the overall path is optimal provided that each intermediate sub-path is optimal as well. Computationally, the time alignment is solved by a two-pass iterative procedure. During the forward pass, for each element (i,j) of the association matrix (starting with (i,j) = (1,1)), the (minimal) cost D(A_i,B_j) of aligning the sub-sequences A_i = {a_1,…,a_i} and B_j = {b_1,…,b_j} is estimated, with the procedure terminating at (i,j) = (M,N). Then, during the backward pass, the optimal decisions are traced back to point (1,1), thus generating the optimal path joining the elements (1,1) and (M,N) of the array. The DTW procedure can be evaluated in polynomial time O(MN), and produces both the best path through the association matrix and the resulting distance between the sequences. When applied to profile comparison, the DTW distances obtained for each profile (i.e. upper, lower and transition count) are then transformed (as in the case of linear profile alignment) into similarity measures (8), which are used to determine the overall likelihood of a correct match (10).

Local Continuity Constraints. The rules governing the transition between elements (i,j) and (i₁,j₁) of the association matrix are known as the local continuity constraints, different variants of which have been proposed in the literature [11,31,32]. Usually, only small differences (i₁ − i) and (j₁ − j) are allowed, so that no vital information of either sequence is lost. In particular, if

i₁ − i ≤ 1 and j₁ − j ≤ 1    (15)
no elements of the sequences are skipped, that is, each element of sequence A has to be associated with at least one element of sequence B, and vice versa. Association of several elements of one sequence with a single element of the other sequence represents a local contraction–expansion of the respective sequences. The local continuity rules are usually presented in a graph-like format, where the possible one-step branches leading to a given point are indicated. Figure 7 provides an example, using the local rules chosen in our experiments.

Fig. 7. Graphical illustration of the local continuity constraints chosen for the DTW implementation. An optimal sub-path terminating at (i, j) must be a direct (i.e. one-step) extension of an optimal sub-path terminating at one of the three predecessor points shown.

The choice of local transition rules automatically restricts the region of the association matrix through which the optimal alignment path is allowed to pass. Although such a choice affects the overall result of the DTW, there are no definite (or optimal) rules, and the choice is usually motivated by application-specific heuristics. In the least restrictive case (15), the path is free to follow any route (without back-tracking) from point (1,1) to (M,N). This may lead to undesired effects, where one of the sequences (or a large portion thereof) is mapped into a single point of the other sequence. Such behaviour is usually hard to justify and, as a result, the local transition rules are chosen in such a way that legal paths between (1,1) and (M,N) are restricted to a certain region of the association matrix, usually oriented about the main diagonal. Consequently, the overall global scale difference between the compared sequences is allowed to vary only within certain limits, which should be determined using a priori problem-specific knowledge. Throughout this work it has been assumed that it is reasonable to expect the scale difference between model and candidate words to reside within the [0.5,2] interval, and this assumption motivated the choice of local transition rules depicted in Fig. 7. One advantage of constraining the freedom of optimal path choice is the reduction of the computational cost of the DTW procedure, as only a portion of the association-matrix elements has to be evaluated (1/3 in this case). Apart from the choice of local continuity constraints, other methods of globally restricting the legal region of alignment paths are also possible (e.g. delimiting the legal-path region with a set of geometric constructs, such as straight lines); see Rabiner and Juang [11] for more details.

Dealing with Candidate-Position Uncertainty. As no explicit word segmentation is performed in the candidate selection process, it is often the case that the candidate window does not exactly correspond to a complete handwritten word. The window boundaries may extend over the blank inter-word area or even overlap the neighbouring
words. Consequently, the requirement of the DTW that the starting and ending points of the model and candidate words should always be associated with each other, as in Eq. (14), is too strict and should be relaxed. A common way to deal with this problem is to allow one of the sequences to skip a number of initial/final elements, thus introducing uncertainty margins. We chose an uncertainty margin of half the length of an average character (estimated) for the starting-point association, and the full length of an average character for the ending-point association. The larger value used in the latter case was dictated by the fact that the end-point of the candidate-selection window is determined by estimating the scale difference between the candidate and the model, with the process being affected by the length of a model word. By taking these relaxed end-point constraints into account, the impact of candidate-selection errors can be reduced and the legal region for optimal paths is broadened, as illustrated in Fig. 8.

Path Weighting. The local rules usually allow a small fixed configuration of allowable transitions relative to a given starting point. Each of the allowed moves corresponds to a differently oriented branch connecting the elements (i,j) and (i₁,j₁) of the association matrix, and there is usually a cost associated with it. The path cost has a multiplicative effect on the distance between the groups of sequence elements associated with each other (i.e. the w_t coefficient in Eq. (12)). There are many different schemes of path weighting [31], the simplest of which corresponds to assigning an equal cost to each path. Several choices of path weighting were evaluated in this work, all leading to similar results in word-spotting performance. The equal-weight scheme was chosen as least arbitrary.

4.2. Extensions of DTW
The dynamic-programming approach used in DTW is quite general, and its principles can also be applied when one of the sequences in question has a symbolic form, where a sequence corresponding to one of the feature profiles could be matched to a sequence of symbolic characters constituting a keyword (provided that such symbolic information is available). In such a case, the matching process would determine the segmentation of the profile into sub-sequences, each mapped onto one of the symbolic characters. This approach to sequence matching involves the creation of Hidden Markov Models (HMMs) [11], of which many successful applications have been reported in the literature [21,33]. We do not consider them here, since the HMM approach involves building a probabilistic model of each possible character (or a sub-character element), which requires large quantities of training data and computationally expensive learning procedures. Rather, our interest lies in developing a word-spotting system based on visual similarity only.
Fig. 8. Left: illustration of the global constraints determining the legal region of global alignment paths within the association matrix. The region’s boundaries depend on the local continuity constraints (see Fig. 7), as well as on the uncertainty margins chosen. Right: superposition of the optimal paths actually selected during one of the word-spotting experiments.
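As an illustration of Section 4.1, a compact DTW distance under three-branch local continuity rules is sketched below. We assume the classic branches terminating at (i,j) from (i−1,j−1), (i−1,j−2) and (i−2,j−1), which bound the local slope to the [0.5,2] range quoted in the text; equal path weights, normalisation by M+N, and strict end-point alignment (i.e. the uncertainty margins of Fig. 8 are omitted):

```python
import numpy as np

def dtw_distance(a, b):
    """Normalised DTW distance between two 1-D profiles a and b.

    Local continuity: an optimal sub-path ending at (i, j) extends one
    of the sub-paths ending at (i-1, j-1), (i-1, j-2) or (i-2, j-1);
    this is an assumed instance of the three-predecessor rule of Fig. 7.
    """
    M, N = len(a), len(b)
    D = np.full((M + 1, N + 1), np.inf)     # 1-based accumulated-cost array
    D[1, 1] = abs(a[0] - b[0])              # aligned starting points, Eq. (14)
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            if i == 1 and j == 1:
                continue
            best = D[i - 1, j - 1]
            if j > 2:
                best = min(best, D[i - 1, j - 2])
            if i > 2:
                best = min(best, D[i - 2, j - 1])
            D[i, j] = abs(a[i - 1] - b[j - 1]) + best
    return D[M, N] / (M + N)                # normalisation of Eq. (12)
```

Relaxed end-points could be obtained by also seeding the first few cells of the bottom row and left column, and taking the minimum over the corresponding final cells.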
5. APPLICATION OF DTW TO PROFILE SIMILARITY ESTIMATION

Application of the DTW procedure to calculate inter-profile distances between model and candidate words is computationally feasible only when the number of comparisons is moderate. In the case where many pixel columns along a line image are treated as potential starting points of matching candidate words, the process becomes computationally too expensive to be used in practice, at least without the help of special-purpose hardware. To achieve a compromise allowing an effective use of the DTW methodology, a two-step approach was adopted. In the first stage, a number of the most likely match candidates were chosen using linear profile alignment; the more accurate method of similarity calculation via DTW was performed in the second stage. The first stage can be viewed as a pre-filtering step, where the most unlikely candidates are eliminated, while the second stage offers more accurate discrimination within the set of candidates that are more 'similar' to the model. Although it is possible that the less accurate linear alignment used in the first stage may bar some legitimate candidates from entering the second stage, the experimental results show that the two-stage approach performs well in practice. Below, we outline the processing steps involved.

5.1. Initial Candidate Selection
The set of affine transformations (including linear horizontal alignment) is applied to the upper and lower profiles of each candidate along the line. The calculated inter-profile distances are combined using Eq. (9) to give the estimated similarity measure between the candidate word and the model. Thus, for each pixel position along the line, a similarity value is obtained, indicating if a word starting at
that position is a likely candidate for a match. Pixel positions rejected early in the process have a similarity of zero. Thereby, a plot of the similarity function is obtained (see Fig. 9), where peaks correspond to the most likely starting points of match candidates. We additionally require that the peak corresponding to a valid candidate be a local maximum within a window corresponding to half the length of the model word. In this way, spurious matches can be eliminated (within the assumed scaling constraints), since if a candidate is valid, it must be at least that long. Because the word-spotting procedure is carried out using a set of model words, the set of candidates for each document line is determined according to each model separately. Due to inter-model similarity, these candidates tend to be redundant, and they are subsequently combined by choosing the most likely candidate within each cluster. The clusters correspond to candidate positions distant from each other by no more than half of the average model length. In this way, only a few candidates are selected for each line of handwriting, and processing can be passed to the DTW module, which calculates the distance between the selected candidates and the models.

5.2. Similarity Calculation
For the pre-filtered set of candidates, the DTW procedure is applied to compute the inter-profile distances to the model words for each of the feature profiles considered (i.e. for the upper and lower word outlines, as well as the transition-count features). The obtained distances are then transformed into similarity values and combined using formulas (8) and (10), respectively. For each candidate, its similarity to the model set is chosen as the maximum similarity value between the candidate and any of the model words, i.e.
p(c,ℳ) = max_{m∈ℳ} p(c,m)    (16)

where c and m are the candidate and model words, respectively, and ℳ represents the model set.

Fig. 9. Plot (middle and bottom) of the similarity function corresponding to a word-spotting search within a three-line document (top). The target word gobernador is indicated by a grey area. The middle set of lines corresponds to matching according to the upper profiles only, whereas the bottom set represents the results of matching according to the lower profiles only. Sharp peaks of the similarity function at the target word position (approx. 500) can be seen for both of the profiles considered.

In a practical word-spotting application, a threshold value should then be chosen, such that candidates whose similarity to the model set is above the threshold are labelled as successful matches and passed on to the user. In this study, however, we are more interested in the recognition performance as a function of the threshold, which can be visualised as a variant of the Receiver Operating Characteristic (ROC) curve [34], where the success rate of the procedure (measured by the percentage of correctly identified matches) is plotted against the number of false alarms (i.e. the incorrectly identified match candidates).
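The threshold sweep that produces such a curve is straightforward once the candidate scores and ground-truth labels have been collected; a sketch under our own naming assumptions:

```python
import numpy as np

def roc_curve(scores, is_target):
    """Success rate vs. number of false alarms as the acceptance
    threshold sweeps over the sorted candidate similarities.

    scores: p(c, M) for every candidate; is_target: boolean array
    marking the candidates that truly are instances of the keyword.
    """
    order = np.argsort(scores)[::-1]                  # descending similarity
    hits = np.cumsum(np.asarray(is_target)[order])    # correct matches accepted
    false_alarms = np.arange(1, len(order) + 1) - hits
    success_rate = hits / max(1, int(np.sum(is_target)))
    return false_alarms, success_rate
```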
6. WORD-SPOTTING EXPERIMENTS

6.1. Document Pre-processing

Fig. 10. A set of models of the word gobernador.
All experiments utilised handwritten document pages belonging to the Archive of the Indies. The set of scanned
document pages used exhibits degradation typical of old manuscripts and, as an initial stage in processing, the basic operations of noise removal (via morphological operators) followed by binarisation were performed.

6.2. Results
To assess the recognition performance of this word-spotting procedure, a number of experiments were carried out using a system implemented as a combination of C++ and Matlab code on a Sun Ultra-1 system. In each experiment, a document consisting of several handwritten pages was searched for the presence of a particular keyword, using a set of examples as models. Since the accuracy of a model-based search procedure naturally depends on the availability of the models, the search keywords were selected on the basis of sufficient presence in the data set. In particular, we noticed that the phrase …gobernador dela provincia del peru… (governor of the province of Peru) was used quite often, since many of these documents represent letters written by the governor or on his behalf. Consequently, the words gobernador, provincia and peru were chosen as the search targets. The fourth target was given by the word pizarro, representing the name of one of the governors. For each target keyword, the document to be searched consisted of approximately 13 pages (most of which contained at least one instance of the target keyword), while the model set contained approximately 19 examples. The model words were manually selected from a set of pages separate from the 13-page test document; the details are shown in Table 1. To illustrate the scope of stylistic variations encountered in the archive, Fig. 10 shows some of the models used in the experiments with the word gobernador. Cases of identical model and target images were excluded from the matching process.

The ROC curves for the experiments are given in Fig. 11. In each case, the success rate vs. false-alarm rate was plotted for the word-spotting performance according to each of the profiles individually, as well as for the case of probabilistic combination of the profile-based results. As can be seen, the probabilistic combination of the profile-related similarity measures leads, in most cases, to a significant improvement of the recognition performance with respect to the individual performances of the contributing profiles. The results for words of similar length (i.e. gobernador, provincia and pizarro) are comparable, with the performance levels generally decreasing for shorter words (i.e. peru). Such behaviour is to be expected, as shorter profiles possess lower complexity and the chances of an erroneous match are higher. It seems that some instances of the target words are hard to recognise with the relatively small model sets used here, since the examples may not capture the essential stylistic variability necessary for reliable classification. With increasing user tolerance to false alarms, the recognition rate can be increased. However, from a practical point of view, a user may be satisfied even with moderate performance levels (e.g. approximately 40%) with no false alarms, as long as a significant percentage of the target words is identified.

Fig. 11. The ROC curves corresponding to the individual experiments. Left: the success-rate vs. false-alarm-rate plots obtained for matching according to the individual profile features. Right: results of combining the individual profile information. A logarithmic scale is used for better presentation clarity.
Table 1. The size of data and model sets used in the word-spotting experiments for the target words considered

Target        #pages/#targets   #models
gobernador    13/15             19
provincia     13/15             19
pizarro       13/13             19
peru          13/11             11
7. CONCLUSIONS

We have demonstrated the utility of an approach to word spotting in handwritten documents based on the principles of shape similarity. The approach taken in this work showed that satisfactory performance levels can be achieved with a simple profile-based feature set and relatively few model examples, which should facilitate the application of such a method in practical systems. The use of holistic features makes our method effective even if the lexical structure of the handwriting is poorly understood or ambiguous, as is the case with the old Spanish texts used in our experiments. We used a line-oriented search strategy, where each document page is treated as a sequence of lines of text, each of which, in turn, is represented by an ordered sequence of pixel columns. Under this interpretation, the whole document (or even the whole archive) can be seen as a stream of data, which can be processed in a unified way, taking advantage of methods and tools available for sequence and time-series processing. In particular, the embedded time axis associated with each line of text makes it justifiable to use methods, such as dynamic time warping, that find an optimal alignment between two time-ordered sequences. One limitation of our approach is the potential lack of a sufficient number of models, which may be encountered in practice. More research should be done on addressing this problem, possibly by extending the model space through a set of plausible deformations of the word models that are available.
Aleksander Kołcz received his PhD in electronics from the University of Manchester Institute of Science and Technology (UMIST) in 1996. Since 1996 he has been with the University of Colorado at Colorado Springs, where he works as a post-doctoral research associate in the Department of Electrical and Computer Engineering. His research interests are centered around intelligent and adaptive systems, as well as theoretical and application-based aspects of neural networks. He is a member of the IEEE.
Joshua Alspector received his PhD from Massachusetts Institute of Technology in 1971. In 1978 he joined Bell Laboratories/Bellcore, where he worked in the fields of integrated circuit technology and design, neural networks and information technology. There he invented the first neural network learning chip. He managed a research group in adaptive systems and was a principal investigator in the DARPA neural network program. In 1995, he became the El Pomar endowed chair in information technology in the electrical and computer engineering department at the University of Colorado at Colorado Springs. His current interests are in models and applications of neural networks and emerging applications of information technology. He is a fellow of the IEEE.
Marijke F. Augusteijn received her undergraduate degree in physics from the Technical University of Delft, Delft, The Netherlands, in 1970 and her PhD degree in physics from Ohio University, Athens, in 1980. She continued her study in computer science and received the MS degree from the University of Wisconsin-Madison in 1984. She is currently an Associate Professor in the Computer Science Department at the University of Colorado at Colorado Springs. Her research interests include artificial intelligence, neural networks, image processing, computer vision and pattern recognition.
Robert Carlson received his BS degree in mathematics in 1973 from MIT, and his PhD in mathematics in 1977 from UCLA. He was an instructor in the mathematics department of the University of Utah from 1977 until 1980. From 1981 to 1990 he worked for the Hughes Aircraft Company. Dr Carlson is currently a faculty member in the mathematics department of the University of Colorado at Colorado Springs, a position he has held since 1990. His main research interests are differential equations and computer vision.
George Viorel Popescu received his ME/BE from the ‘POLITEHNICA’ University of Bucharest. He is currently a PhD candidate in the Human-Machine Interface Laboratory at Rutgers University. Previously, he conducted research at the University of Colorado at Colorado Springs in the area of handwriting recognition. His research interests include human-computer interaction systems, image processing and virtual reality. His current work focuses on multimodal communication in collaborative multiuser environments and networked virtual reality systems.
Correspondence and offprint requests to: A. Kołcz, Electrical and Computer Engineering Department, University of Colorado at Colorado Springs, 1420 Austin Bluffs Pkwy, Colorado Springs, CO 80918, USA. Email: ark@eas.uccs.edu
SYMBOLS

    m        model (image or profile)
    c        candidate (image or profile)
    d(m,c)   distance measure
    e        energy of a profile
    w        weight/penalty
    p        probability of a match
    φ        index mapping function (in dynamic time warping)
    M, N, T  sequence lengths