Online Composite Sketchy Shape Recognition Using Dynamic Programming
ZhengXing Sun, Bo Yuan, and Jianfeng Yin
State Key Lab for Novel Software Technology, Nanjing University, P. R. China, 210093
[email protected]
Abstract. This paper presents a solution for online composite sketchy shape recognition. The kernel of the strategy treats both stroke segmentation and sketch recognition as an optimization problem of “fitting to a template”. A nested recursive optimization process is then designed by means of dynamic programming to perform stroke segmentation and symbol recognition cooperatively by minimizing the fitting errors between input patterns and templates. Experimental results demonstrate the effectiveness of the proposed method.
1 Introduction
People have used pen and paper to express visual ideas for centuries, so the most convenient, efficient and familiar way to input graphics is to draw sketches on a tablet with a digital pen, an approach known as the sketch-based user interface [1][2]. Owing to the fluent and lightweight nature of freehand drawing, sketch-based user interfaces are becoming increasingly significant for exploratory and creative activities in graphical computing [3]. However, the poor accuracy of online sketchy shape recognition engines is frustrating, especially for composite shapes and new users, because sketches are usually informal, inconsistent and ambiguous. Composite sketchy shape recognition necessarily includes two core phases [4][5][6]: stroke segmentation and symbol recognition. Almost all existing methods perform them sequentially. That is, the input strokes are first segmented into a set of geometric primitives in the stroke segmentation phase, and the configuration of primitives is then recognized as a symbol in the symbol recognition phase [4]. The recognition results depend on both phases, but effective stroke segmentation is a precondition for accurate symbol recognition. Existing research generally performs stroke segmentation using only local features of strokes such as curvature and pen speed [4][7], and places much more emphasis on exploring robust symbol recognition methods [4][5][6]. However, such methods cannot evaluate whether the segmentation results are acceptable for symbol recognition. On the one hand, almost all of them use empirical thresholds on local features to test the validity of an approximation of strokes.
W. Liu and J. Lladós (Eds.): GREC 2005, LNCS 3926, pp. 255–266, 2006. © Springer-Verlag Berlin Heidelberg 2006
In fact, strokes of freehand drawing vary between users, and even for the same user at different times. This ultimately leads to the problem of a
threshold being too tight for one user while too loose for another. On the other hand, all of them perform stroke segmentation independently, regardless of which symbol is eventually recognized. In practice, although the user draws one stroke after another, he/she has an intended symbol in mind when expressing an idea with a particular sketchy shape, based on both current observations and past experience. That is to say, stroke segmentation for any user is related to the intended symbol as well as to local features and temporal information. Ignoring this results in local features that suit some shapes but not others. Accordingly, this paper presents an improved solution for online composite sketchy shape recognition, described as follows. As the kernel idea of our solution, we regard both stroke segmentation and symbol recognition as an optimization problem of “fitting to a template”. The purpose of this strategy is twofold. On the one hand, templates can be used, together with local features, both as a guide to stroke segmentation and as a reference for evaluating an approximation of strokes. On the other hand, stroke segmentation and symbol recognition are unified in the single process of “fitting to a template”. To implement the proposed strategy, we define a nested recursive optimization process that performs stroke segmentation and symbol recognition cooperatively by minimizing the fitting errors between input patterns and the defined shape models of the symbols (templates). This combines stroke segmentation and symbol recognition. Dynamic programming, an efficient way of solving problems in which the best decisions must be found one after another [8], is adopted to implement this combined process and reduce the computational complexity.
The remainder of this paper is organized as follows: The related works in sketch recognition are summarized in Section 2. In Section 3, the main idea of our proposed strategy is outlined. In Section 4, details of our method will be discussed. Section 5 will present our experiments. Conclusions are given in the final Section.
2 Related Works
A variety of recognition techniques have been proposed for sketch recognition. They can be classified into three main categories: feature-based methods, graph-based methods and machine learning methods. Feature-based methods make use of some features of sketchy shapes [2][9][10]. Their benefit is that stroke segmentation is not necessary, while their drawback is that they can only recognize simple shapes, such as rectangles and circles. For example, Rubine [9] defined a gesture characterized by a set of eleven geometric attributes and two dynamic attributes. However, this method is only applicable to single-stroke sketches and is sensitive to drawing direction and orientation. Fonseca et al. [10] proposed a method of symbol recognition using fuzzy logic based on a number of rotation-invariant global features. Because their classification method relies on aggregate features of pen strokes, it may have difficulty differentiating between similar shapes. As one of the most prominent approaches to object representation and matching, graph-based methods have recently been applied to hand-drawn pattern recognition
problems [4][5][6]. In these methods, input patterns are first decomposed into basic geometric primitives and then assembled into a graph structure. Pattern detection is then formulated as a graph isomorphism problem. These methods assume that all input strokes have been correctly segmented. As a structural (or topological) representation of composite graphic objects, they are theoretically suitable for composite shapes of any complexity. Their practical limitation is computational complexity. Furthermore, their performance degrades drastically when applied to heavily sketchy drawings. Inspired by the success of machine learning methods in pattern recognition, researchers have recently applied them to freehand drawing recognition. Sezgin et al. [11] view the observed pattern as the result of a stochastic process governed by a hidden stochastic model, and identify the hidden stochastic model according to its probability of generating the output. In our previous research, we also attempted to solve the problem of user adaptation for online sketch recognition with machine learning methods such as the Support Vector Machine (SVM) [12] and the Hidden Markov Model (HMM) [13]. Nevertheless, these approaches employ statistical learning methods in which each shape is learned from a large corpus of training data. Because they need large training sets, they are not easily extensible to diverse applications and are normally useful only for the patterns on which they were originally trained. In early research on stroke segmentation, the local curvature of strokes was used to detect corner (segment) points, which help decompose the original stroke into basic primitives such as lines and arcs. However, curvature information alone is usually not a reliable way to determine such points.
Temporal information such as pen speed has recently been explored as a means to uncover users’ intentions during sketching [7]. Speed-based methods have proven much more reliable for determining the intended segment points. Sezgin [7] and later Calhoun [4] used both curvature and speed information in a stroke to locate segment points. Saund [14] used more perceptual context, including local features such as curvature and intersections, as well as global features such as closed paths. However, all of them use empirical thresholds to test the validity of an approximation. As an improvement, Heloise et al. [15] used dynamic programming to recursively approximate a digitized curve with a given number of line and arc segments based on templates. The main difference between their method and ours is that they address only stroke segmentation for simple shapes.
3 Strategy Overview
Fig. 1 compares the flowcharts of the traditional methods and our strategy for online composite sketchy shape recognition, both designed to recognize single isolated symbols. Both processes contain four sub-processes: ink pre-processing, stroke segmentation, symbol recognition and user mediation. As the user draws, ink pre-processing is first used to eliminate the noise that may come from limitations of the input conditions and from input habits, and to normalize the positional distribution of sample points so that it is uniform. The stroke segmentation process involves searching along the stroke for “segment points” that
divide the stroke into different geometric primitives. The candidate segment points of each stroke are generated using both the motion of the pen and the curvature of the resulting ink [16]. Once the segment points have been identified, geometric primitives are selected that closely match the original ink and provide compact descriptions of the strokes to facilitate sketch recognition. Subsequently, symbol recognition takes the geometric primitives composing the candidate symbols as input and returns the better-matching definitions, ranked by the geometric and topological similarity between the candidates and the templates. Knowledge about the particular domain of the symbols should be used to prune the list of candidate symbols where possible. User mediation allows users to evaluate the results of symbol recognition implicitly or explicitly, and to refine them if necessary [17].
[Figure 1. (a) Processing flowchart of traditional methods: Inputting Drawing, Digital Ink Pre-processing, Stroke Segmentation (guided by only local features), A Group of Primitives, Symbol Recognition (with Symbol Templates), Ranked Symbols (related to a group of primitives), User Mediation, Recognition Results. (b) Processing flowchart of our strategy: Inputting Drawing, Digital Ink Pre-processing, Stroke Segmentation (local features as one of the guides; Symbol Templates guide stroke segmentation), Candidates (Primitive Set 1, Primitive Set 2), Symbol Recognition, Ranked Symbols (related to many groups of primitives), User Mediation, Recognition Results.]
Fig. 1. Flowchart comparison between the traditional methods and our strategy
Most traditional composite sketch recognition methods perform stroke segmentation independently according to some local features of strokes, regardless of which symbols are eventually recognized, as shown in Fig. 1(a). In these approaches, stroke segmentation generally yields a single set of primitives, and the results of symbol recognition are highly sensitive to the segmentation results. As a result, a small mistake in stroke segmentation can cause a fatal error in symbol recognition. The main improvement of our strategy is that both stroke segmentation and symbol recognition are treated as a problem of fitting to a template, as shown in Fig. 1(b). Besides the local features of strokes, we use the primitives of the shapes in the templates to guide stroke segmentation: the “segment points” are detected according to the fitting errors between the primitives of the segmented strokes and the primitives of the symbols in the templates. Several groups of primitives of segmented strokes with small fitting errors are finally given as the candidate results. Symbol recognition then searches according to the fitting errors between the shapes in these candidates and the symbols in the templates, and returns those with small fitting errors as recognition outputs. Because the number of candidate results of stroke segmentation is exponential in the number of candidate segment points, an exhaustive search is clearly a poor strategy. Therefore, we design a nested recursive search process and adopt a dynamic
programming approach to optimize the search process, in which the inner process performs stroke segmentation and the outer process performs symbol recognition.
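To make the remark about exhaustive search concrete, the following minimal Python sketch counts candidate segmentations. It assumes, purely for illustration, that every candidate segment point can independently be kept or dropped, or alternatively that exactly k points are chosen; the function names are hypothetical and do not come from the paper.

```python
from math import comb


def subsets_of_candidates(num_candidates: int) -> int:
    """Count every subset of candidate segment points (each kept or dropped)."""
    return 2 ** num_candidates


def fixed_size_selections(num_candidates: int, k: int) -> int:
    """Count the ways to choose exactly k segment points from the candidates."""
    return comb(num_candidates, k)


if __name__ == "__main__":
    # The counts grow very quickly with the number of candidate points.
    for n in (10, 20, 40):
        print(n, subsets_of_candidates(n), fixed_size_selections(n, n // 2))
```

Even a few dozen candidate points already give far too many segmentations to enumerate, which is the motivation for the dynamic programming formulation.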
4 Sketchy Shape Recognition Based on Dynamic Programming
4.1 Sketchy Shape Representation and Fitting Error Calculation
During freehand drawing, what users emphasize is the spatial or topological relationships between the primitives of the sketch. Accordingly, a symbol can be described as S=(P,TP,AP,R,TR,AR), where P is a set of primitives, TP is the types of the primitives, AP is the attributes of the primitives, R is a set of topological relationships between primitives, TR is the types of the topological relationships, and AR is the attributes of the topological relationships. There are two main types of primitives in our research, line and ellipse segments, while an arc segment is treated as an instance of an ellipse; that is, TP=(TPl,TPe). The attributes of primitives are defined by AP=(C0,C1,C2): for a line segment, C0 and C1 are the start and end points of the segment respectively and C2 = 0; for an ellipse segment, C0 is the center point of the ellipse, and C1 and C2 are the long and short axes respectively; for an arc segment, C0 is the center point of the arc, and C1 and C2 are the start and end angles of the arc, measured clockwise, respectively. We consider six kinds of topological relationships in our research: adjacency, cross, half-cross, tangency, parallelism and separation, as shown in Table 1. These relationships and their attributes are represented as TR=(Ra,Rc,Rh,Rt,Rp,Rs) and AR=(ARa,ARc,ARh,ARt,ARp,ARs) respectively. The attributes ARa, ARc, ARh and ARt are defined by the clockwise angle between the two primitives at their common point, as shown in Fig. 2(a) and (b). If one of the two primitives is an arc segment, the angle can be calculated in terms of a tangent line or a local chord at the common point on the arc, as shown in Fig. 2(c). The attribute ARp is defined by the ratio of the length of the overlapping part of the two primitives to their average length, as shown in Fig. 2(d). The attribute ARs is set to null.
Table 1.
Six kinds of topologic relationships
Sign   Type          Description
Ra     Adjacency     Two segments connect at least one end point.
Rc     Cross         Two segments intersect each other.
Rh     Half-cross    A segment ends at another.
Rt     Tangency      A segment is tangent to another.
Rp     Parallelism   Two segments are parallel or concentric.
Rs     Separation    Two segments do not intersect each other.
[Figure 2: panels (a) to (d) showing two primitives ① and ②, with the angle α between them marked.]
Fig. 2. Attributes definition of topological relationships between two primitives
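The representation S=(P,TP,AP,R,TR,AR) described above can be pictured as a small data structure. The sketch below is only one possible encoding: the class and field names, the `first`/`second` indices on a relationship, and the example corner symbol are illustrative assumptions rather than the authors' implementation.

```python
import math
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple, Union

Point = Tuple[float, float]


class PrimitiveType(Enum):
    LINE = "TPl"
    ELLIPSE = "TPe"  # arcs are treated as instances of an ellipse


class RelationType(Enum):
    ADJACENCY = "Ra"
    CROSS = "Rc"
    HALF_CROSS = "Rh"
    TANGENCY = "Rt"
    PARALLELISM = "Rp"
    SEPARATION = "Rs"


@dataclass(frozen=True)
class Primitive:
    """One primitive with its attributes AP = (C0, C1, C2)."""
    kind: PrimitiveType
    c0: Point                 # start point (line) or center (ellipse, arc)
    c1: Union[Point, float]   # end point (line), long axis, or start angle
    c2: float = 0.0           # 0 for a line, short axis, or end angle


@dataclass(frozen=True)
class Relation:
    """A topological relationship between two primitives, given by index."""
    first: int
    second: int
    kind: RelationType
    attribute: Optional[float] = None  # angle or overlap ratio; None for Rs


@dataclass
class Symbol:
    primitives: List[Primitive] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)


def line(start: Point, end: Point) -> Primitive:
    """Build a line primitive; C2 is fixed to 0 as in the text."""
    return Primitive(PrimitiveType.LINE, start, end, 0.0)


# Hypothetical example: two lines meeting at a right angle.
corner = Symbol(
    primitives=[line((0.0, 0.0), (1.0, 0.0)), line((1.0, 0.0), (1.0, 1.0))],
    relations=[Relation(0, 1, RelationType.ADJACENCY, math.pi / 2)],
)
```

Keeping primitives and relationships in ordered lists also matches the temporally ordered chain representation introduced next in the text.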
A general representation of a symbol is an attributed relationship graph, such as the spatial relationship graph (SRG) [6]. However, graph matching is a well-known NP-complete problem, especially for composite sketchy shape recognition. In our research, we define ordered topological relationship chains (OTRC) to represent the composition of a freehand drawing. An ordered topological relationship chain is a temporally ordered list of nodes, each of which records the attributes of a primitive and its relationships with its neighboring primitives in the composition of the symbol. This representation greatly reduces the computational complexity of matching without limiting users’ drawing orders. Furthermore, the temporal sequence can be used in user modeling to record statistics of the drawing order of every type of shape for a particular user and thereby improve recognition efficiency [17], as our experiments have shown that there is generally a fixed temporal order of primitive shapes when the same user draws the same shape. In our strategy for composite sketchy shape recognition, the fitting error f(SM,T) between the input pattern (SM) and a template (T) must be calculated. There are two types of fitting errors: the topological matching error fr(RSM,RT) between the input patterns and templates, and the approximating error f(PSM,PT) incurred in approximating the segments of a stroke. We define the topological matching error fr(RSM,RT) by the following rules:
\[
f_r(R_1,R_2)=\begin{cases}
1, & \text{if } TR_1 \neq TR_2;\\
\dfrac{\alpha_1-\alpha_2}{2\pi}, & \text{if } TR_1 = TR_2 \equiv R_a;\\
\dfrac{\alpha_1-\alpha_2}{\pi/2}, & \text{if } TR_1 = TR_2 \equiv R_h, R_c, R_t;\\
\dfrac{L_1-L_2}{L_1+L_2}, & \text{if } TR_1 = TR_2 \equiv R_p;\\
0.5, & \text{if } TR_1 = TR_2 \equiv R_s.
\end{cases}
\tag{1}
\]
where α is the angle between two interrelated primitives and L is the length of the superposed segment between two parallel primitives, as shown in Fig. 2. The approximating error f(PSM,PT) contains two parts. The first is the primitive matching error fp(PSM,PT), which is defined as a similarity measure between the segments of a stroke and the primitives of the symbols in the templates. We define this error by the following rules:
\[
f_p(P_1,P_2)=\begin{cases}
0, & \text{if } TP_1 = TP_2 = TP_l;\\
\dfrac{d}{l+d}, & \text{if } TP_1 = TP_e,\ TP_2 = TP_l;\\
\dfrac{\alpha-\beta}{\alpha+\beta}, & \text{if } TP_1 = TP_2 = TP_e.
\end{cases}
\tag{2}
\]
where, d and l are the height and chord of the arc respectively, as shown in Fig. 3(a); α and β are the centric angles of the two arcs respectively, as shown in Fig. 3(b).
[Figure 3: (a) an arc with height d and chord l; (b) two arcs with centric angles α and β.]
Fig. 3. Calculation of the fitting error between two primitives
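The two matching errors of Eqs. (1) and (2) can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the function and parameter names are hypothetical, relationship and primitive types are passed as the sign strings used in the text, and absolute differences are taken as an assumption so that the errors stay non-negative.

```python
import math


def relation_error(type1: str, attr1, type2: str, attr2) -> float:
    """Topological matching error in the spirit of Eq. (1).

    attr1 and attr2 are angles (Ra, Rc, Rh, Rt) or overlap lengths (Rp).
    Assumption: absolute differences are used to keep the error non-negative.
    """
    if type1 != type2:
        return 1.0
    if type1 == "Ra":
        return abs(attr1 - attr2) / (2 * math.pi)
    if type1 in ("Rh", "Rc", "Rt"):
        return abs(attr1 - attr2) / (math.pi / 2)
    if type1 == "Rp":
        total = attr1 + attr2
        # Assumption: two zero-length overlaps are treated as a perfect match.
        return abs(attr1 - attr2) / total if total else 0.0
    if type1 == "Rs":
        return 0.5
    raise ValueError(f"unknown relationship type: {type1}")


def primitive_error(type1: str, type2: str, *, height: float = 0.0,
                    chord: float = 0.0, angle1: float = 0.0,
                    angle2: float = 0.0) -> float:
    """Primitive matching error in the spirit of Eq. (2).

    height and chord describe the arc (d and l in Fig. 3(a));
    angle1 and angle2 are the centric angles of two arcs (Fig. 3(b)).
    """
    if type1 == "TPl" and type2 == "TPl":
        return 0.0
    if type1 == "TPe" and type2 == "TPl":
        return height / (chord + height)
    if type1 == "TPe" and type2 == "TPe":
        return abs(angle1 - angle2) / (angle1 + angle2)
    raise ValueError("combination not covered by Eq. (2)")
```

For instance, an arc of height 1 over a chord of length 3 matched against a line template gives an error of 1/(3+1) = 0.25 under this reading.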
The second is the primitive approximating error fa(PSM), which is defined as a measure of how well segments of a stroke are approximated by basic primitives, and is treated as a distortion factor for the primitive matching error. The calculation of this error varies with the type of primitive used for the approximation: if TP=TPl, fa(PSM) is the error of total least squares fitting on the ink points; if TP=TPe, fa(PSM) is the error of elliptical fitting [18]. Both are finally normalized to the range between 0 and 1. Accordingly, the fitting error f(SM,T) between an input pattern and a template can be defined as follows:
\[
f(SM,T)=\sum_{i=1}^{N_p} f_a\bigl(P_{SM}(i)\bigr)\times f_p\bigl(P_{SM}(i),P_T(i)\bigr)+\sum_{i=1}^{N_p-1} f_r\bigl(R_{SM}(i),R_T(i)\bigr)
\tag{3}
\]
where Np is the number of primitives and Np−1 the number of relationships between them; PSM and PT are the primitives, and RSM and RT the spatial relationships between primitives, in an input pattern and in a template used for matching respectively.
4.2 Stroke Segmentation Using Templates
We regard stroke segmentation as a problem of “fitting to a template”, that is, an optimization problem of selecting a set of segment points from the candidates of the input strokes so as to fit the template with minimal fitting error. Given a freehand drawing SM and a template T, SM consists of a sequence of strokes {Si}, each of which contains a set of temporally ordered candidate segment points {Pij}; the template T is represented as a set of primitives {t(i)}. The number of segment points that need to be identified is k=NT-NS (in general, NS≤NT≤NB-1, where NT is the number of primitives of a symbol in the templates, NS is the number of strokes, NB is the total number of ordered candidate segment points including the start and end points of strokes, and NBi is the number of ordered candidate segment points of the ith stroke). The problem of stroke segmentation using templates can then be defined as selecting k segment points from the ordered candidate segment points to divide the drawing into k+1 segments such that the drawing represented by these segments fits several candidate symbols in the templates with minimal fitting errors. A brute-force approach would search exhaustively over all combinations of k segment points from the NB candidate segment points. However, the number of combinations is exponential in NB. We therefore simplify the search process by means of dynamic programming (DP) [8]. Dynamic programming breaks the original problem into sub-problems and chooses the best solutions of the sub-problems, beginning with the smallest.
The best solution of a bigger sub-problem is found from the best solutions of the smaller sub-problems through a recursive formula that connects the solutions [8]. Accordingly, we first define the optimal substructure for the segmentation problem. For a selected segment point, an optimal segmentation contains the optimal segmentation of the input stroke(s) up to this point. In other words, to find an optimal segmentation of a set of strokes with a template T composed of NT ordered primitives, one assumes that the optimal solution for segmenting everything up to the selected segment point with the template T{t(1), t(2),…, t(NT-1)} has been computed, and the piece from the selected segment point to the end is then fitted with T{t(NT)}. A recursive solution is then designed based on this optimal substructure. Let d(n,m,k,t) be the minimal fitting error of approximating every point up to the mth point in
the nth stroke with template t, and let f(Sn,i,m,t(j)) be the fitting error resulting from fitting the segment from the ith point up to the mth point in the nth stroke using t(j). The best segmentation for a set of NS strokes using K segment points and a template T is thus d(NS,NB,K,T). The recursive definition of d(n,m,k,t) is expressed as follows [15]:
\[
d(n,m,k,t)=\begin{cases}
\left[\sum_{i=1}^{n-1} f\bigl(S_i,1,NB_i,t(i)\bigr)\right]+f\bigl(S_n,1,m,t(n)\bigr), & \text{if } k=0;\\
\min\limits_{k<i<m}\bigl\{f\bigl(S_n,i,m,t(NT)\bigr)+d\bigl(n,i,k-1,t(j \mid j=1,\dots,NT-1)\bigr)\bigr\}, & \text{if } n=1,\ k>0;\\
\min\Bigl\{\dots,\ \min\limits_{k<i<m}\bigl\{f\bigl(S_n,i,m,t(NT)\bigr)+d\bigl(n,i,k-1,t(j \mid j=1,\dots,NT-1)\bigr)\bigr\}\Bigr\}, & \text{if } n>1,\ k>0.
\end{cases}
\tag{4}
\]
When k>0, the ith point can be the breakpoint only if i>k; otherwise, the number of segment points required would exceed the number of candidate segment points available. When n>1 and k>0, in addition to checking the best breakpoint to use in Sn, the previous stroke (Sn-1) must also be checked, because the best breakpoint may lie in any of the previous strokes. Owing to the optimal substructure, the optimal segmentation for the last point in the previous stroke Sn-1 is all that must be checked. For unordered templates, we design a nested recursive search process based on dynamic programming for stroke segmentation, in which the inner process handles all possible orders of the primitives in a template and the outer process considers all possible segmentations of a stroke. The algorithm has a run-time complexity of O(K×NB^2), where K is the number of segment points and NB is the total number of candidate segment points. The space requirement is O(K×NB) for keeping a table of solutions to the sub-problems. Moreover, because the total number of candidate segment points is far smaller than the number of ink points of a sketchy symbol, our algorithm is much faster than the algorithm in [15].
4.3 Sketchy Shape Recognition Using Dynamic Programming
Sketchy shape recognition is actually a problem of “fitting to a template”.
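The recursive solution for stroke segmentation in Section 4.2 can be illustrated with a simplified Python sketch. It covers only the single-stroke case (n = 1) with an ordered template, uses 0-based indices, and takes the segment fitting error as a caller-supplied function; the names `segment_stroke` and `fit_error` are hypothetical and the code is not the authors' implementation.

```python
from typing import Callable, List, Tuple


def segment_stroke(num_points: int, num_primitives: int,
                   fit_error: Callable[[int, int, int], float]
                   ) -> Tuple[float, List[int]]:
    """Choose breakpoints for one stroke by dynamic programming.

    Candidate points are indexed 0..num_points-1 (both stroke ends included).
    fit_error(i, m, j) is the error of fitting points i..m with the j-th
    template primitive. Returns (minimal total error, breakpoint indices).
    """
    if num_primitives < 1 or num_points < num_primitives + 1:
        raise ValueError("not enough candidate points for the template")
    k_max = num_primitives - 1          # number of breakpoints to select
    last = num_points - 1
    inf = float("inf")
    # d[k][m]: best error for covering points 0..m with primitives 0..k.
    d = [[inf] * num_points for _ in range(k_max + 1)]
    back = [[-1] * num_points for _ in range(k_max + 1)]
    for m in range(1, num_points):
        d[0][m] = fit_error(0, m, 0)
    for k in range(1, k_max + 1):
        for m in range(k + 1, num_points):
            # A breakpoint i needs k segments before it, hence i >= k.
            for i in range(k, m):
                cost = d[k - 1][i] + fit_error(i, m, k)
                if cost < d[k][m]:
                    d[k][m] = cost
                    back[k][m] = i
    # Walk the back-pointers from the stroke end to recover the breakpoints.
    breakpoints: List[int] = []
    m = last
    for k in range(k_max, 0, -1):
        m = back[k][m]
        breakpoints.append(m)
    breakpoints.reverse()
    return d[k_max][last], breakpoints


if __name__ == "__main__":
    # Toy error: a segment is penalized when it spans the true corner (index 2).
    def toy_error(i: int, m: int, j: int) -> float:
        return 1.0 if i < 2 < m else 0.0

    print(segment_stroke(5, 2, toy_error))
```

The three nested loops over k, m and i give the O(K×NB^2) running time and the table d gives the O(K×NB) space stated in Section 4.2.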
Similar to the recursive solution of stroke segmentation using templates, we design a nested recursive process by means of dynamic programming to perform sketch recognition and stroke segmentation cooperatively. That is, the inner loop searches all possible segment points by minimizing the approximating error as described in Section 4.2, while the outer loop searches all possible symbols by minimizing the fitting error. Accordingly, let d(n,m,k,t,j) be the minimal fitting error of approximating every segment up to the mth segment in the nth stroke with the template t, and let f(Sn,i,m,t(u),j) be the fitting error resulting from fitting the segment from the ith point
up to the mth point in the nth stroke using t(u), where j is the index of the primitive in the templates. The best result for a set of NS strokes using K segment points and a template T is thus d(NS,NB,K,T,0). The recursive definition of d(n,m,k,t,j) can then be expressed as follows:
\[
d(n,m,k,t,j)=\begin{cases}
f\bigl(S_1,1,NB_n,t[NT],j\bigr), & \text{if } NT=1,\ k=0;\\
\min\limits_{u}\bigl\{f\bigl(S_n,1,m,t[u],j\bigr)+d\bigl(n-1,NB_{n-1},k,t-t[u],u\bigr)\bigr\}, & \text{if } NT>1,\ k=0;\\
\dots
\end{cases}
\]