Online Composite Sketchy Shape Recognition Based on Bayesian Networks Zhengxing Sun, Lisha Zhang, and Bin Zhang State Key Lab for Novel Software Technology, Nanjing University, PR China, 210093
Abstract. This paper presents a novel approach to online recognition of multi-stroke composite sketchy shapes based on Bayesian networks. A classifier built on a double-level Bayesian network is designed to model the intrinsic temporal order among the strokes effectively: a sketchy shape is modeled with the relationships not only between a stroke and its neighbouring strokes, but also between a stroke and all of its subsequent strokes. A drawing-style tree is then adopted to capture the users' accustomed drawing styles and to simplify the training and recognition of the Bayesian network classifier. The experiments demonstrate both the effectiveness and the efficiency of the proposed method.
L. Jiao et al. (Eds.): ICNC 2006, Part II, LNCS 4222, pp. 506-515, 2006. © Springer-Verlag Berlin Heidelberg 2006
1 Introduction
Sketching helps us convey ideas and guides our thinking, both by aiding short-term memory and by making abstract problems more concrete in graphic computing [1], so numerous researchers have worked on online sketch recognition for many years. Some prototype systems have emerged that use online sketch recognition either as a natural input modality [2][3][4] or to recognize composite sketchy shapes [5][6]. However, sketching is usually informal, inconsistent and ambiguous. Most existing methods are either applicable only to single-stroke sketches (such as rectangles and circles) [4] or limited by their computational complexity and high sensitivity to the segmentation process [5][6]. More importantly, this leads to poor effectiveness and efficiency of sketch recognition engines, especially for multi-stroke sketchy shapes and newly added users. Adaptive sketchy shape recognition is therefore desirable: the recognition engine should be trainable and adaptable to a particular user's drawing styles, and should hold a priori probabilities of the sketchy shape classes for each user [7]. One solution for adaptive sketch recognition is to construct appropriate classifiers based on machine learning that learn users' drawing styles. As strokes are the natural conceptual elements of freehand drawings, modeling sketchy shapes with strokes is important for such classifiers. The conventional approaches convert a drawing input into local feature vectors, assume independence between them, and classify them with statistical or structural classifiers such as the Support Vector Machine (SVM) and the Hidden Markov Model (HMM). In our previous research, we developed a stroke classification method based on SVM [7][8], where a modified
turning function is used to model sketchy shapes, and virtual strokes are introduced to link consecutive strokes in order and convert a multi-stroke sketchy shape into a single stroke. Recently, inspired by its success in speech recognition and handwriting recognition, HMM has been used for online sketch recognition [8][10]: drawing patterns are treated as the result of a stochastic process governed by a hidden stochastic model, and are identified according to the model's probability of generating the output. However, the dimension of the feature vectors in an SVM classifier must be fixed for all shapes, and the number of states in an HMM classifier depends heavily on the stability of drawing styles. Moreover, relationships between strokes cannot be modeled adequately: stroke relationships are indispensable for discriminating between sketches with similar pen movements, yet only adjacent relationships between temporally ordered strokes are considered. These shortcomings limit the capability of both classifiers to capture users' drawing styles.
In this paper, we propose to explicitly model sketches with strokes and their relationships in a probabilistic framework. Bayesian networks, a well-known framework for modeling dependencies, are adopted to represent sketch models whose nodes correspond to strokes and whose arcs correspond to their dependencies. A decision tree is then defined to model and capture users' drawing styles for particular symbols or shapes. The rest of the paper is organized as follows. Section 2 introduces our Bayesian network framework for adaptive online sketch recognition in detail. Section 3 evaluates some experimental results, and conclusions are given in the final section.
2 Bayesian Networks Framework for Sketch Recognition
A Bayesian network [11] is a graph with probabilities for representing random variables and their dependencies. It efficiently encodes the joint probability distribution of a large set of variables. Its nodes represent random variables and its arcs represent dependencies between random variables, with conditional probabilities at the nodes. It is a Directed Acyclic Graph (DAG): all edges are directed and there is no cycle when edge directions are followed. Using a Bayesian network offers many advantages over traditional methods of determining causal relationships [11]. Independence among variables is easy to recognize and isolate, while conditional relationships are clearly delimited by a directed graph edge: two variables are independent if all the paths between them are blocked. Not all the joint probabilities need to be calculated to make a decision; extraneous branches and relationships can be ignored. By optimizing the graph, every node can be shown to have at most k parents. The algorithmic routines required can then be run in $O(2^k n)$ instead of $O(2^n)$ time. In essence, the algorithm can run in linear time (based on the number of edges) instead of exponential time (based on the number of parameters).
2.1 Bayesian Networks Structure for Sketch Recognition
Little research has used Bayesian networks for online graphics recognition. Cho and Kim [12] have proposed a Bayesian network framework for explicitly modeling the components of Korean Hangul characters and their relationships. A character is modeled with hierarchical components: a syllable model, grapheme models, stroke
models and point models. However, modeling drawing styles with a point model, represented by a 2-D Gaussian over point positions on the X-Y plane, is not sufficient for adaptive sketch recognition. Alvarado et al. [13] have described a model for dynamically constructing Bayesian networks that represent varying hypotheses for the task of sketch recognition and allow both stroke data and contextual data to influence the probability of an interpretation of the users' strokes. However, they put their emphasis on semantic description and inference for sketch understanding in order to assist sketch recognition, not on adaptive sketch recognition (their sketch recognition process is not actually brought into the Bayesian network framework). For adaptive online sketch recognition, we design the Bayesian network structure shown in Fig. 1.
[Figure: three-level network. Shape Class: node C. Strokes: nodes S1, S2, S3. Features: nodes F1, F2, F3.]
Fig. 1. Bayesian Networks Structure for Adaptive Sketch Recognition
We consider three types of random variables: two discrete variables, the shape class C and the stroke types S, and one continuous variable, the feature vectors F. Accordingly, the nodes in our Bayesian network structure are arranged in three levels: the single node in the top level is the shape class, the nodes in the middle level are the strokes that constitute a sketchy shape, and the nodes in the bottom level are the multi-dimensional feature vectors corresponding to each stroke. In Fig. 1, node C indicates a type of sketchy shape, nodes $S_1, S_2, S_3$ represent respectively the first, second and third stroke drawn by the user for a sketchy shape C, and nodes $F_1, F_2, F_3$ are the feature vectors corresponding to the first, second and third stroke. To simplify the Bayesian network structure, the directed arcs link the nodes according to three rules that reflect the nature of freehand drawing: (1) each stroke depends only on the sketchy shape and all previously input strokes; (2) the feature vector of each stroke depends only on the sketchy shape, that stroke and all previously input strokes; (3) there are no relationships between the feature vectors of different strokes. As shown in Fig. 1, there are directed arcs between node C and every other node because all of them compose the sketchy shape. According to the rules, there are directed arcs between each of the node pairs $S_1 \sim S_2$, $S_1 \sim S_3$, $S_2 \sim S_3$, $S_1 \sim F_1$, $S_1 \sim F_2$, $S_2 \sim F_2$, $S_1 \sim F_3$, $S_2 \sim F_3$, $S_3 \sim F_3$, and no arcs between nodes $F_1, F_2, F_3$.
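The arc rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_structure` and the dictionary-of-parents representation are hypothetical and do not come from the paper.

```python
def build_structure(n):
    """Return {node: [parent nodes]} for a shape drawn with n strokes.

    Hypothetical helper: it encodes the three arc rules of Section 2.1
    as a parent list per node.
    """
    parents = {"C": []}  # the shape class has no parents
    for i in range(1, n + 1):
        earlier = [f"S{k}" for k in range(1, i)]
        # Rule (1): a stroke depends on the shape and all earlier strokes.
        parents[f"S{i}"] = ["C"] + earlier
        # Rule (2): a feature vector depends on the shape, the earlier
        # strokes and its own stroke. Rule (3): no F node is ever a parent.
        parents[f"F{i}"] = ["C"] + earlier + [f"S{i}"]
    return parents


structure = build_structure(3)
for node, node_parents in structure.items():
    print(node, "<-", node_parents)
```

For n = 3 this yields the six arcs leaving C plus the nine stroke-to-stroke and stroke-to-feature arcs listed in the text.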
2.2 Maximum Posteriori Probability Estimation
According to our definition of the Bayesian network, adaptive sketch recognition can be cast as a maximum posteriori probability problem. Suppose there are m classes of sketchy shapes to be recognized and a sketchy shape is composed of n strokes. The posteriori probability is then given by Bayes' rule as follows:

$$P(C \mid S_1, S_2, \ldots, S_n, F_1, F_2, \ldots, F_n) = \frac{P(C, S_1, S_2, \ldots, S_n, F_1, F_2, \ldots, F_n)}{P(S_1, S_2, \ldots, S_n, F_1, F_2, \ldots, F_n)} \tag{1}$$

Because $P(S_1, S_2, \ldots, S_n, F_1, F_2, \ldots, F_n)$ does not depend on the shape class C, maximizing the posteriori probability is equivalent to maximizing the joint probability $P(C, S_1, S_2, \ldots, S_n, F_1, F_2, \ldots, F_n)$. The sketch recognition process can then be described as finding the class with the maximum joint probability:

$$C_R = \arg\max_{C_j} P(C_j, S_1, S_2, \ldots, S_n, F_1, F_2, \ldots, F_n), \quad j = 1, 2, \ldots, m. \tag{2}$$
In a Bayesian network, the joint probability of a set of random variables $\{x_1, x_2, \ldots, x_n\}$ is calculated as the product of the local conditional probabilities of all the nodes. Let a node $x_i$ denote the random variable $x_i$, and let $par(x_i)$ denote the parent nodes of $x_i$, from which dependency arcs come to the node $x_i$. Then the joint probability of $\{x_1, x_2, \ldots, x_n\}$ is given as follows:

$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid par(x_i)) \tag{3}$$
Therefore, we obtain:

$$\begin{aligned}
P(C_j, S_1, S_2, \ldots, S_n, F_1, F_2, \ldots, F_n) &= P(C_j) \prod_{i=1}^{n} P(S_i \mid par(S_i)) \prod_{i=1}^{n} P(F_i \mid par(F_i)) \\
&= P(C_j) \prod_{i=1}^{n} P(S_i \mid C_j, S_1, S_2, \ldots, S_{i-1}) \prod_{i=1}^{n} P(F_i \mid C_j, S_1, S_2, \ldots, S_{i-1}, S_i) \\
&= P(C_j) \prod_{i=1}^{n} \frac{P(C_j, S_1, S_2, \ldots, S_{i-1}, S_i)}{P(C_j, S_1, S_2, \ldots, S_{i-1})} \prod_{i=1}^{n} P(F_i \mid C_j, S_1, S_2, \ldots, S_{i-1}, S_i)
\end{aligned} \tag{4}$$
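As a worked instance of Eq. (4), added here for illustration, consider a shape drawn with n = 2 strokes. The three arc rules of Section 2.1 give $par(S_1) = \{C_j\}$, $par(S_2) = \{C_j, S_1\}$, $par(F_1) = \{C_j, S_1\}$ and $par(F_2) = \{C_j, S_1, S_2\}$, so the joint probability factorizes as:

$$P(C_j, S_1, S_2, F_1, F_2) = P(C_j)\, P(S_1 \mid C_j)\, P(S_2 \mid C_j, S_1)\, P(F_1 \mid C_j, S_1)\, P(F_2 \mid C_j, S_1, S_2)$$

Each factor is a local term of one node, in line with Eq. (3).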
In Eq. (4), $P(F_i \mid C_j, S_1, S_2, \ldots, S_{i-1}, S_i)$ is the conditional probability of a continuous random variable with high-order dependencies, and $P(S_i \mid C_j, S_1, S_2, \ldots, S_{i-1})$ is the conditional probability between discrete random variables. We suppose that the feature vectors of a drawing follow Gaussian distributions, that is, $P(F_i \mid C_j, S_1, S_2, \ldots, S_{i-1}, S_i) \sim N(\mu, \Sigma)$. Therefore, the conditional probability $P(F_i \mid C_j, S_1, S_2, \ldots, S_{i-1}, S_i)$ can be modeled by conditional Gaussian distributions using the following formula:

$$p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x - \mu)' \Sigma^{-1} (x - \mu)\right] \tag{5}$$
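As an illustration, the logarithm of Eq. (5) can be evaluated as in the sketch below, with x the feature vector, mu the mean vector and sigma the covariance matrix. The function names are hypothetical, and the use of a Cholesky factorization in pure Python is an implementation choice made here, not something the paper specifies. Working in the log domain avoids numerical underflow when many such terms are multiplied as in Eq. (4).

```python
import math


def cholesky(sigma):
    """Lower-triangular L with L * L^T == sigma.

    Assumes sigma is a symmetric positive-definite matrix (list of lists).
    """
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def gaussian_log_density(x, mu, sigma):
    """Natural log of the d-dimensional Gaussian density of Eq. (5)."""
    d = len(x)
    L = cholesky(sigma)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution: solve L y = (x - mu).
    y = []
    for i in range(d):
        s = sum(L[i][k] * y[k] for k in range(i))
        y.append((diff[i] - s) / L[i][i])
    quad = sum(v * v for v in y)  # equals (x - mu)' Sigma^-1 (x - mu)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))  # log |Sigma|
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + quad)


# Example with an arbitrary 2-D covariance matrix (illustrative values only).
print(gaussian_log_density([1.0, 1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]]))
```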
In Eq. (5), x is the feature vector, d is the dimension of the feature vector, $|\Sigma|$ is the determinant of the covariance matrix $\Sigma$ of the feature vectors, and $\Sigma^{-1}$ is the inverse of the covariance matrix $\Sigma$. The conditional probability $P(S_i \mid C_j, S_1, S_2, \ldots, S_{i-1})$ can be estimated by the frequency distribution in the sample space as follows:

$$P(S_i \mid C_j, S_1, S_2, \ldots, S_{i-1}) = \frac{P(C_j, S_1, S_2, \ldots, S_i)}{P(C_j, S_1, S_2, \ldots, S_{i-1})} = \frac{\mathrm{Count}(C_j, S_1, S_2, \ldots, S_i)}{\mathrm{Count}(C_j, S_1, S_2, \ldots, S_{i-1})} \tag{6}$$
In Eq. (6), $\mathrm{Count}(C_j, S_1, S_2, \ldots, S_{i-1})$ is the total number of samples that are composed of the series of strokes $S_1, S_2, \ldots, S_{i-1}$ and belong to the shape class $C_j$, and $\mathrm{Count}(C_j, S_1, S_2, \ldots, S_{i-1}, S_i)$ is the total number of samples that are composed of the series of strokes $S_1, S_2, \ldots, S_{i-1}, S_i$.
The priori probability $P(C_j)$ can be estimated simply by the frequency distribution in the sample space as follows:

$$P(C_j) = \frac{\mathrm{Count}(C_j)}{\mathrm{Count}(AllSamples)} \tag{7}$$

where $\mathrm{Count}(C_j)$ is the total number of samples belonging to the shape class $C_j$ and $\mathrm{Count}(AllSamples)$ is the total number of samples.
2.3 Drawing Style Modeling
Theoretically, the conditional probabilities are computable. However, they require high computational complexity and a large posteriori probability matrix; for example, $P(S_i \mid C_j, S_1, S_2, \ldots, S_{i-1})$ requires reasoning over $M^n$ cases and an $(N_1 \times N_2 \times \cdots \times N_n)$-dimensional matrix for M classes of shapes with n strokes. In fact, the posteriori probability matrix is sparse for a specific user, because his/her drawing style for a shape (including the number and type of strokes, the temporal order of primitives and so on) is relatively fixed, even though the possible drawing styles of a shape may be diverse [7]. We therefore define a particular data structure, named the drawing-style tree (DS-tree) and shown in Fig. 2(a), to model the drawing styles of a shape for a specific user, that is, the entries of the posteriori probability matrix whose probability value is nonzero. The DS-tree serves as the model for training and recognition with the Bayesian classifier. A drawing-style tree is hierarchical according to the drawing sequence of strokes and enumerates all accustomed drawing styles of a shape for a specific user: the root represents a class of shape; every child node indicates the addition of a stroke and holds the attributes of the drawing up to the current stroke; every directed arc represents the succession relationship between two strokes; each level lists all possible types of strokes in a drawing step; and each branch represents a possible drawing style, i.e. a sequence of strokes. Fig. 2(b) shows an example of a specific user's accustomed drawing styles of the electronic diode symbol in terms of a drawing-style tree.
[Figure: tree levels labeled Root, First Stroke, Second Stroke, Third Stroke.]
(a) Definition of drawing-style tree (b) A drawing-style tree of diode symbol
Fig. 2. Illustration of Drawing-Style Tree
The attributes of drawing in each node include the type of the current stroke and its corresponding feature vectors, the list of previous strokes and their corresponding feature vectors, the statistical access frequency of the drawing style along the branch from the root to the current node (that is, the value of $\mathrm{Count}(C_j, S_1, S_2, \ldots, S_{i-1}, S_i)$), as well as the mean vector $\mu$ and the covariance matrix $\Sigma$ of the feature vectors corresponding to the current drawing.
Based on the definition of the drawing-style tree, the task of training the Bayesian network classifier is to establish a drawing-style tree for each shape class by iteratively calculating and updating the value of $\mathrm{Count}(C_j, S_1, S_2, \ldots, S_i)$ and the parameters $\mu$ and $\Sigma$ of the corresponding node from the collected samples. For every stroke of a sample, if the corresponding level of the drawing-style tree contains a node whose current stroke type and list of previous strokes are the same as those of the sample, increment the access frequency of that node, i.e. the value of $\mathrm{Count}(C_j, S_1, S_2, \ldots, S_i)$, by one, add the feature vectors of the sample to the node and update the parameters $\mu$ and $\Sigma$ of the feature vectors in this node. If not, create a new node in the current level of the drawing-style tree, calculate its attributes from the stroke of the sample and set its access frequency to one.
Once the drawing-style trees are established, adaptive sketch recognition with the Bayesian network classifier for each user's online drawing can be described as follows. For each shape class, the following probabilities are calculated: (a) the priori probability $P(C_j)$ by means of Eq. (7) for the root node; (b) the conditional probability $P(F_i \mid C_j, S_1, S_2, \ldots, S_i)$ according to Eq. (5) and the conditional probability $P(S_i \mid C_j, S_1, S_2, \ldots, S_{i-1})$ according to Eq. (6) for each node; (c) the joint probability $P(C_j, S_1, S_2, \ldots, S_n, F_1, F_2, \ldots, F_n)$ according to Eq. (4) for each branch of the drawing-style tree; and (d) the maximal joint probability over all branches of the tree for each shape class. Across all shape classes, candidate drawing styles or candidate recognition results can then be sorted by the magnitude of the maximal joint probability of each class.
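The training and recognition steps above can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: the names `Node`, `train`, `best_branch` and `recognize` are hypothetical, the Gaussian of Eq. (5) is simplified to a diagonal covariance with a variance floor so that the block stays short, and the toy stroke types and feature values are invented for the example.

```python
import math
from collections import defaultdict


class Node:
    """One DS-tree node: the stroke type reached after a given stroke prefix."""

    def __init__(self):
        self.count = 0      # Count(C_j, S_1, ..., S_i) along this branch
        self.mean = None    # running mean of the feature vectors
        self.m2 = None      # running sum of squared deviations (Welford)
        self.children = {}  # stroke type -> Node

    def update(self, f):
        """Add one feature vector and update count, mean and variance."""
        self.count += 1
        if self.mean is None:
            self.mean, self.m2 = [0.0] * len(f), [0.0] * len(f)
        for k, v in enumerate(f):
            delta = v - self.mean[k]
            self.mean[k] += delta / self.count
            self.m2[k] += delta * (v - self.mean[k])

    def log_density(self, f, floor=1e-3):
        """Log of Eq. (5), ASSUMING a diagonal covariance with a variance floor."""
        total = 0.0
        for k, v in enumerate(f):
            var = max(self.m2[k] / self.count, floor)
            total -= 0.5 * (math.log(2 * math.pi * var) + (v - self.mean[k]) ** 2 / var)
        return total


def train(samples):
    """samples: list of (class_label, [(stroke_type, feature_vector), ...])."""
    roots = defaultdict(Node)
    for label, strokes in samples:
        node = roots[label]
        node.count += 1  # Count(C_j) at the root
        for stroke_type, f in strokes:
            node = node.children.setdefault(stroke_type, Node())
            node.update(f)
    return roots


def best_branch(node, features, depth=0):
    """Best log-score and stroke sequence over all branches below `node`."""
    if depth == len(features):
        return 0.0, []
    best = (-math.inf, [])
    for stroke_type, child in node.children.items():
        # Eq. (6) as a count ratio, plus the feature term of Eq. (5).
        step = math.log(child.count / node.count) + child.log_density(features[depth])
        rest, path = best_branch(child, features, depth + 1)
        if step + rest > best[0]:
            best = (step + rest, [stroke_type] + path)
    return best


def recognize(roots, features):
    """Rank classes by their maximal log joint probability, Eq. (4)."""
    total = sum(root.count for root in roots.values())
    ranked = []
    for label, root in roots.items():
        score, path = best_branch(root, features)
        ranked.append((math.log(root.count / total) + score, label, path))  # Eq. (7) prior
    return sorted(ranked, reverse=True)


# Invented toy data: two classes, 2-D features per stroke.
samples = [
    ("diode", [("triangle", [3.0, 1.0]), ("line", [1.0, 0.0])]),
    ("diode", [("triangle", [3.2, 1.1]), ("line", [1.1, 0.1])]),
    ("diode", [("line", [1.0, 0.1]), ("triangle", [3.1, 0.9])]),
    ("resistor", [("zigzag", [6.0, 2.0]), ("line", [1.0, 0.0])]),
    ("resistor", [("zigzag", [6.2, 2.1]), ("line", [0.9, 0.1])]),
]
roots = train(samples)
ranked = recognize(roots, [[3.1, 1.0], [1.0, 0.0]])
print(ranked[0][1], ranked[0][2])
```

Because each node stores only a count and running statistics, training is a single pass over the samples, and recognition visits only the branches a user has actually drawn, which is the sparsity argument made for the DS-tree.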
3 Experimental Results and Evaluation
In our experiments, we select 10 categories of typical symbols from the Chinese Electric Symbols Standard (GBT4728&GBT5465) that are most commonly used in circuit diagramming, as shown in Fig. 3(a). Some of them are simple in structure and others are relatively complicated. We design two experiments to validate our proposed method against our two previous methods: the SVM classifier [7][8] and the HMM classifier [10]. All experiments are run on Microsoft Windows XP with a 1.4 GHz Intel CPU and 512 MB of memory.
Fig. 3. Symbols and strokes used in our experiments
The goal of the first experiment is to compare the convergence of the three classifiers under different sample sets. We collect 1000 samples for each of the 10 graphical symbols, which are divided into a training set and a testing set in the proportion of 7 to 3. We run the experiment six times, with total sample sizes of 100, 200, 400, 600, 800 and 1000, respectively. Thus the sizes of the training sets are in turn 70, 140, 280, 420, 560 and 700. All samples are represented by the modified turning function [7] with 20-dimensional features. Fig. 4 shows the recognition precision of the three classifiers trained with different sizes of sample sets, and Fig. 5 shows their training time under the same sample sets.
Fig. 4. Recognition precision of three classifiers under different size of samples
(a) Bayesian Classifier (b) SVM Classifier (c) HMM Classifier
Fig. 5. Training time of three classifiers under different sizes of sample sets
From Fig. 4 we can see that the recognition precision of both the HMM and the Bayesian network classifiers converges to a high value with small sample sets (the HMM classifier reaches 97% at 400 samples and the Bayesian network classifier reaches 99% at 100 samples), while the recognition precision of the SVM classifier is poor. That is, both the HMM and the Bayesian network classifiers are more suitable for multi-stroke symbol recognition than the SVM classifier. From Fig. 5 we can see that the training time of the SVM and Bayesian network classifiers is acceptable, while that of the HMM classifier is very large.
The second experiment is designed to verify the multi-user adaptability of the three classifiers under different feature representations. We invite two users to draw 400 samples and two users to draw 500 samples for every graphical symbol. Three of the users draw in only the one manner they prefer, while one draws in several manners, so that his or her samples of a symbol contain more than one combination of strokes, using some of the stroke types shown in Fig. 3(b). The samples of each user are again divided into training samples and testing samples in the proportion of 7 to 3. All samples are represented by the modified turning function [7] and by the composite feature vectors [10], respectively. Table 1 lists the recognition precision of the three classifiers for the four users under the two types of feature representation. From Table 1, we can see that the recognition precision of both the HMM and the Bayesian network classifiers is high across users and insensitive to feature representation, while the recognition precision of SVM is poor and sensitive to feature representation.

Table 1. Average Recognition Precisions of three classifiers for multi-users (%)

              Turning Function Feature          Composite Feature
Classifier    User1   User2   User3   User4     User1   User2   User3   User4
SVM           70.27   68.99   63.17   68.25     43.30   56.08   30.30   30.44
HMM           92.53   84.20   92.83   99.00     90.07   94.73   83.42   95.75
Bayesian      99.27   96.82   97.16   99.42     99.53   97.82   99.67   100
As an experimental result, we can conclude that the Bayesian network classifier proposed in this paper is the most preferable of the three classifiers for online adaptive multi-stroke sketch recognition, in recognition performance, training performance and multi-user adaptability, at least in our defined domain. SVM is mainly designed for distinguishing different categories; its decision regions need to be convex and are sensitive to the definition of the feature space. This restriction limits the flexibility and accuracy of SVM classifiers, which are suitable only for simple sketchy shape recognition or stroke classification [7][8]. In addition, the training of an SVM classifier is an iterative process that is sensitive to feature definition [7][8]. We employ a first-order left-to-right chain for the HMM classifier [515], supposing that each stroke is strongly related to the previous and the next one. This is an appropriate HMM model for strictly serial drawing in sketching. On the other hand, our HMM classifier for sketch recognition is sensitive to stroke order. These factors may result in longer training time and unsatisfactory recognition, especially for complicated symbols. Our Bayesian network classifier models a sketch with the relationships not only between a stroke and its neighbouring strokes, but also between a stroke and all of its subsequent strokes. Therefore, it may be more suitable for describing multi-stroke symbols than SVM and HMM. As shown in Table 1, on user 4's samples, which are drawn in more than one manner, the Bayesian network classifier outperforms the HMM and SVM classifiers on both kinds of features. The most salient characteristic of the Bayesian network classifier is that it captures the interdependencies of the causal relationships among the component variables, which is also the foundation of the naive Bayes' rule. In online sketchy shape recognition, although the features extracted from the strokes can be made as independent as possible, there are at least temporal and serial orders among the strokes, and people always input certain strokes before others when drawing a specific symbol; these orders do not conform to the independence hypothesis and affect the recognition results.
4 Conclusion
In this paper, we develop a method for adaptive online sketchy shape recognition based on Bayesian networks. A sketch is modeled, by means of Bayesian networks, with the relationships not only between a stroke and its neighbouring strokes, but also between a stroke and all of its subsequent strokes. The drawing-style tree is then adopted to capture the users' accustomed drawing styles and to simplify the training and recognition of the Bayesian network classifier. The most salient characteristic of the Bayesian network classifier for adaptive online sketchy shape recognition is that it can effectively model the intrinsic temporal order among the strokes. The experiments demonstrate both the effectiveness and the efficiency of the proposed method.
Acknowledgement
The work described in this paper is supported by grants from "the National Natural Science Foundation of China" [Grants No. 69903006 and 60373065] and "the Program for New Century Excellent Talents in University of China" [Grant No. NCET-04-04605].
References
1. Zhengxing Sun, Jing Liu, Informal user interface for graphical computing, Lecture Notes in Computer Science, Volume 3784, 2005, pp. 675-682.
2. Landay J. A. and Myers B. A., Sketching Interfaces: toward more human interface design, IEEE Computer, Vol. 34, No. 3, 2001, pp. 56-64.
3. Newman M. W., James L., Hong J. I., et al., DENIM: An informal web site design tool inspired by observations of practice, HCI, Vol. 18, 2003, pp. 259-324.
4. Fonseca M. J., Pimentel C. and Jorge J. A., CALI - an online scribble recognizer for calligraphic interfaces, AAAI Spring Symposium on Sketch Understanding, AAAI Press, 2002, pp. 51-58.
5. Chris Calhoun, Thomas F S, Tolga Kurtoglu, et al., Recognizing multi-stroke symbols, AAAI Spring Symposium on Sketch Understanding, AAAI Press, 2002, pp. 15-23.
6. Xiaogang Xu, Zhengxing Sun, et al., An online composite graphics recognition approach based on matching of spatial relation graphs, IJDAR, Vol. 7, No. 1, 2004, pp. 44-55.
7. Zhengxing Sun, Wenyin Liu, Binbin Peng, et al., User adaptation for online sketchy shape recognition, Lecture Notes in Computer Science, Volume 3088, 2004, pp. 303-314.
8. Zhengxing Sun, Lisha Zhang and Enyi Tang, An incremental learning algorithm based on SVM for online sketchy shape recognition, Lecture Notes in Computer Science, Volume 3610, 2005, pp. 655-659.
9. Sezgin T. M. and Davis R., HMM-Based Efficient Sketch Recognition, Proceedings of the 10th international conference on IUI, Jan. 2005, San Diego, California, USA.
10. Zhengxing Sun, Wei Jiang and Jianyong Sun, Adaptive Online Multi-Stroke Sketch Recognition based on Hidden Markov Model, Lecture Notes in Artificial Intelligences, Volume 3784, 2005, pp. 948-957.
11. Friedman N., Geiger D. and Goldszmidt M., Bayesian network classifiers, Machine Learning, Kluwer Academic Publishers, Hingham, USA, Vol. 29, Issue 2-3, 1997, pp. 131-163.
12. Cho Sung-Jung and Kim Jin H., Bayesian network modeling of Hangul characters for online handwriting recognition, Proceedings of ICDAR 2003, pp. 207-211.
13. Alvarado C. and Davis R., Dynamically Constructed Bayesian Networks for Sketch Understanding, Proceedings of IJCAI-05, Edinburgh, Scotland.