In Proceedings of IJCNN 2013. Final version available from IEEE Xplore.
Classifier Comparison for MSER-Based Text Classification in Scene Images

Khalid Iqbal, Xu-Cheng Yin, Xuwang Yin, Hazrat Ali, and Hong-Wei Hao
Abstract— Text detection in images is an emerging area of interest with growing motivation for researchers. Various methodologies have been developed to localize text contained in scene images. One main application of localizing scene image text is to provide real-time support to visually impaired persons. To design such a real-time support platform, classification of textual information, i.e., character versus non-character information, can provide a baseline for further research. However, the challenge lies in choosing the optimal classifier for this purpose. In this work, first, we used Maximally Stable Extremal Regions (MSERs) to detect character candidates in a scene image; then, we trained several classifiers, i.e., AdaboostM1, Bayesian Logistic Regression, Naïve Bayes, and Bayes Net, to classify MSERs as characters and non-characters; and finally, we compared and analyzed the performances of these classifiers empirically. From the experiments, we conclude that Bayesian Logistic Regression provides better accuracy than the other three classifiers. This work argues that MSER-based character candidate extraction and Bayesian Logistic Regression-based text classification are two prominent and promising techniques in scene text detection.
Khalid Iqbal is with the Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China (email: [email protected]). Xu-Cheng Yin is with the Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China (corresponding author, phone: 8610-82371191; fax: 8610-62332873; email: [email protected]). Xuwang Yin is with the Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China (email: [email protected]). Hazrat Ali is with the Department of Communication Engineering, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China (email: [email protected]). Hong-Wei Hao is with the Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (email: [email protected]).

I. INTRODUCTION

In recent years, text detection in scene images [1]-[9] has been the focus of researchers working on decision-making algorithms. These algorithms can generally be classified into connected component-based and texture-based approaches. Connected component-based techniques extract candidate regions from an image and use geometric constraints to eliminate non-text regions. A leading method in [7] used an adaptive binarization technique to find connected components. Another recent approach [8] finds connected components with the help of the stroke width transform. In [9], connected components are extracted by using K-means in the Fourier-Laplacian domain; edge density and text straightness have been used to eliminate
false positives. Texture-based techniques, by contrast, treat text as a special texture differentiated from the background. Generally, features of target image regions are extracted to train a classifier whose goal is to differentiate between textual and non-textual information. In [3], text in a given image is detected using the discrete cosine transform, assuming horizontal and vertical text frequencies to extract features. Similarly, features are collected using wavelet coefficients to classify text with support vector machines (SVM) [4]. In [5], [6], a robust text classifier is trained by feeding weak classifiers to the Adaboost algorithm. Comparable classifiers can be trained using Maximally Stable Extremal Regions (MSERs) [17] with the aim of determining the best classifier. Recently, more and more researchers have become interested in MSER-based text detection methods [32], [33]. MSER [17] is a method of extracting a sufficient number of corresponding image elements to support wide-baseline matching; it has also led to improved matching and object recognition algorithms [6], [17], [23]. A key constraint of MSER is that it detects matching image elements indiscriminately rather than text alone, so the detected elements must be classified into characters and non-characters. The corresponding image elements detected by MSER for wide-baseline matching are referred to as regions. In the text detection task, classification is the midway step between feature extraction and text detection. In this perspective, a limited number of classifiers, such as SVM [4] and Adaboost, have been trained, classifying text using wavelet coefficients or by feeding weak classifiers to the Adaboost algorithm [5], [6]. However, classifiers such as AdaboostM1 [13], Bayesian Logistic Regression [12], [14], Naïve Bayes [12], [15] and Bayes Net [16] are well known for classifying data into two or more classes.
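As a toy illustration of one of these options, the sketch below trains a minimal Bernoulli Naïve Bayes classifier in plain Python. This is a stand-in, not the Weka implementation used in this paper, and the binary "region" vectors and labels are invented for illustration only:

```python
import math
from collections import defaultdict

def train_bernoulli_nb(rows, smoothing=1.0):
    """Train a Bernoulli Naive Bayes model on rows whose last entry is the
    class label and whose other entries are binary features."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[-1]].append(row[:-1])
    model = {}
    for label, vecs in by_class.items():
        log_prior = math.log(len(vecs) / len(rows))
        # Laplace-smoothed P(feature_j = 1 | class) for every feature j.
        probs = [(sum(v[j] for v in vecs) + smoothing) /
                 (len(vecs) + 2 * smoothing) for j in range(len(vecs[0]))]
        model[label] = (log_prior, probs)
    return model

def predict(model, vec):
    """Pick the class with the highest posterior log-probability."""
    def log_posterior(label):
        log_prior, probs = model[label]
        return log_prior + sum(math.log(p if x else 1.0 - p)
                               for x, p in zip(vec, probs))
    return max(model, key=log_posterior)

# Invented 4-pixel binary "regions": the first pixel loosely separates
# characters from non-characters in this toy data.
train = [
    [1, 1, 0, 0, "character"],
    [1, 0, 1, 0, "character"],
    [1, 0, 0, 1, "character"],
    [0, 1, 1, 1, "non-character"],
    [0, 1, 1, 0, "non-character"],
    [0, 0, 1, 1, "non-character"],
]
model = train_bernoulli_nb(train)
accuracy = sum(predict(model, r[:-1]) == r[-1] for r in train) / len(train)
```

On this cleanly separable toy data the model reproduces all six training labels; real MSER regions are, of course, far noisier, which is what motivates the comparison in this paper.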
First, AdaboostM1 is a boosting algorithm that takes as input a training set and the class labels associated with it, under the assumption that the possible labels are of predetermined cardinality. Second, Bayesian Logistic Regression uses a Gaussian prior to avoid overfitting and predicts labels for regions containing characters and non-characters; it is effective and competitive with AdaboostM1 for precise classification of characters in comparison with Naïve Bayes and Bayes Net. Third, a Naïve Bayes classifier assumes a training dataset of MSER image regions with corresponding labels. Lastly, a Bayes Net can be represented using graph-based visual structures and conditional probability distributions over features. These well-known classifiers are suitable, across different scene images, for classifying textual and non-textual information with an objective of improved text detection.

In this work, four classifiers are compared after training on MSER-detected features of scene images. The goal of training such classifiers is to reduce the intractable sample complexity. The classifiers are trained in two steps: (1) pre-processing of MSER-detected features, and (2) a Weka-based knowledge flow model. First, regions of an input image are extracted using the features detected by MSER. These extracted image regions are resized to n × n dimensions. An adaptive threshold value is computed from the extracted region pixels to remove noise and obtain a corresponding binary image region. Each binary image region is viewed and labeled as character or non-character using the ICDAR 2011 dataset [31]. Each binary image region, together with its class label, is considered an instance; the collection of labeled binary image regions forms the training dataset. Second, the knowledge flow model requires training data in attribute-relation file format (.arff); therefore, the training data is prepared according to the format of the Weka-based knowledge flow model, which evaluates all the classifiers for accuracy and efficiency.

The remainder of this paper is organized as follows. Section II explains Maximally Stable Extremal Regions. Section III presents the training procedure for the four well-known classifiers with brief descriptions. In Section IV, experimental results are compared in terms of accuracy, classification error and performance. Section V concludes the paper with possible directions for future research.

II. MAXIMALLY STABLE EXTREMAL REGIONS

MSER can be used to define image regions with their outer boundaries according to the intensity of a scene street image [17]. An informal explanation is as follows. Consider the entire set of possible thresholdings of a gray-level image I.
The pixels of the scene street image below the threshold are referred to as black and those at or above the threshold as white. Incrementing or decrementing the threshold in successive evolutions allows detecting white or dark regions by inverting the intensity of the image. The number of thresholds for which a region stays stable is known as the margin of the region. The output of MSER-detected regions is shown in Figure 1. The formal definition of the MSER concept and the basic supplementary definitions are given in Table I; readers are referred to [17] for further detail.

A. Extension and Application of MSER

MSER has been adapted, based on color gradients, into agglomerative clustering after replacing the intensity thresholds of color images [24]. Color objects and shapes can be tracked and described by detecting MSERs [24], [25]. MSER can be used not only to guide visually impaired persons by detecting characters in street scene images, but also in image retrieval, recognition, tracking and robot navigation. Therefore, the dynamic application of MSERs in a variety of current research fields permits training further classifiers, such as SVM [4] and Adaboost [5], [6]. The goal of training new
Fig. 1. MSER Detected Regions
TABLE I
BACKGROUND OF PRELIMINARY CONCEPTS OF MSER

Image: a mapping I : D ⊂ Z² → S. Extremal regions are well defined if: (1) S is totally ordered (a reflexive, antisymmetric and transitive binary relation ≤ exists), and (2) an adjacency relation A ⊂ D × D is defined.

Region: Q is a contiguous subset of D, i.e., for all p, q ∈ Q there is a sequence p, a1, a2, a3, ..., an, q with pAa1, ..., ai A ai+1, ..., an Aq.

Region Boundary: ∂Q = {q ∈ D \ Q : ∃p ∈ Q : qAp}; that is, the boundary ∂Q of Q is the set of pixels adjacent to at least one pixel of Q but not belonging to Q.

Extremal Region: Q ⊂ D is a region such that either ∀p ∈ Q, q ∈ ∂Q : I(p) > I(q) or ∀p ∈ Q, q ∈ ∂Q : I(p) < I(q).

MSER: Let Q1, Q2, ..., Qi−1, Qi, ... be a sequence of nested extremal regions, i.e., Qi ⊂ Qi+1. Extremal region Qi* is maximally stable iff q(i) = |Qi+Δ \ Qi−Δ| / |Qi| has a local minimum at i*; Δ ∈ S is a parameter of the method.
classifiers over street images is to achieve high precision in describing surroundings to support visually impaired persons [6].

III. TRAINING CLASSIFIERS FOR CHARACTERS

Classification in text detection is a midway step in localizing text in scene images. The localized text represents groups of characters that form words and sentences. In this perspective, several classifiers, namely Bayesian Logistic Regression, AdaboostM1, Naïve Bayes and Bayes Net, are trained over the ICDAR 2011 dataset [31] as a contribution to the variety of applied fields mentioned above.

A. Overview

Features of an input image are detected using MSER [17] for classification of text. MSER has been performing well for detecting text regions; the detected regions are referred to as Maximally Stable Extremal Regions (MSERs). The extracted features of each extremal region include the total number of regions, the region centroid, and the ranges of the region pixel list. These features are used to extract the pixels for each extremal region of the real street scene image, as shown in Figure 2 (a
& b). The MSERs can be classified into characters and non-characters. Non-characters can be regarded as noise, as they contribute an unwanted set of data. Once the detection of MSER feature-based regions is accomplished, a combined average for each region is found as a threshold by equation (1):

X̄_combined = (1 / rk) Σ_{i=1}^{r} Σ_{j=1}^{k} P_ij    (1)

Fig. 2. Extracted Maximally Stable Extremal Region

Equation (1) totals all pixels of each region to find the average value of each column. The pixel values are represented as P_ij, while rk gives the total number of pixels in an image region. The column averages are added to compute the combined mean as a threshold value for each region. The goal is to eliminate redundant pixels and obtain binary image regions. The dimensions of the extremal regions are variable; therefore, each MSER is resized to n × n dimensions, followed by manual labeling as character or non-character to prepare the training set for the classifiers under consideration. The classifiers analyzed in this work are briefly described in the subsequent subsections.

B. Bayesian Logistic Regression Classifier

Bayesian Logistic Regression is a discriminative probabilistic linear classifier that can predict MSER labels using its training set. The predictive statistical analysis is performed under Bayes' rule. Bayesian Logistic Regression can be applied to classify text with two or more class labels. Its most important advantage is its dominant classification accuracy while estimating a probabilistic relationship between the training set and MSERs labelled as characters or non-characters [13].

C. AdaboostM1 Classifier

The general meaning of "boosting" is to improve performance, with the aim of reducing the error, of any "weak" learning algorithm. The best-known boosting algorithm is Adaboost [13], [27]. AdaboostM1 [13] is one of the two major versions of the Adaboost algorithm for binary classification problems. The boosting algorithm takes as input a training set of labelled MSERs to classify image text using a simple rule described in [13].

D. Naïve Bayes Classifier

The intractable sample complexity can be reduced by using the Naïve Bayes classifier. By making a conditional independence assumption, the number of parameters to estimate is dramatically reduced from 2(2^n − 1) to 2n. Naïve Bayes assigns a probability to each MSER value in the labeled or target class. The resulting distribution is reduced to a single probability-based predicted value for the entire set of MSERs of an input image. However, Naïve Bayes is sensitive to inaccurate probability estimates for MSERs because of numeric prediction [12], [15], [28].

E. Bayes Net Classifier

A Bayesian network, or Bayes Net [29], is a classifier that encodes probabilistic relationships among a training set of MSERs and their labels. Bayes Net has several advantages for data analysis in conjunction with statistical techniques. First, it encodes dependencies between the training set and the target class, and handles missing data entries. Second, it learns causal relationships that can be used to gain knowledge by predicting the consequences of intervention. Third, it allows combining prior knowledge with the training set. Fourth, Bayes Net avoids overfitting of data in conjunction with Bayesian statistical methods. In this perspective, Bayes Net relates the MSER-based training set to its target class.

F. Knowledge Flow of Classifiers in the Weka Tool

Weka is an open source tool developed by [12] with a graphical interface that allows training a variety of classifiers; here, well-known classifiers are chosen for comparative accuracy. Before using the Weka classifiers, the tool requires pre-processing of the MSER-based training set and its labels or target class. The pre-processing of the MSER training set and labels from ICDAR 2011 [31] scene images includes preparing the dataset in .arff format. A sample of ICDAR 2011 scene images is shown in Figure 3. It is obvious from Figures 1 and 2 that the character and non-character information is fuzzy and lacks an accurate description for localizing text as characters, words and sentences. The following steps are used to prepare the input dataset based on the extracted extremal regions:
• Input an image
• Detect MSER features
• Find the pixels (P_ij) of image regions based on MSER features
• ThresholdValue = combined mean of the pixels of the image region, using equation (1)
• Find binary image regions using ThresholdValue
• TrainingDataset = [BinaryRegions, RegionLabels]
• WekaClassifiers(TrainingDataset) using the Knowledge Flow Model

After preparing the training dataset, the comparative evaluation of the classifiers under consideration has been performed. The goal is to extract the character regions of scene images to pinpoint the textual information. The significance of this extraction is to smooth a trail in recognizing
text for reading image text to visually impaired people, or for mobile location search in order to recognize products as well as landmarks in street scene images. The essential components for training a classifier in Weka are the training dataset (pre-processing steps), the Class Assigner, Cross Validation, the classifiers in use, the Classifier Performance Evaluator and the output. First, the training dataset is prepared using the aforementioned pre-processing steps in attribute-relation file format¹. This file format is an ASCII text file that expresses a list of instances sharing a set of attributes. Second, each instance represents either a character or a non-character as the label for each extremal image region.

IV. EXPERIMENTAL RESULTS

We performed several experiments on a number of ICDAR 2011 scene images to assess the effectiveness of the classifiers briefly discussed in the previous section. These classifiers are trained on labeled MSERs to verify their classification accuracy, classified characters, classification error and classification performance in the Weka-based implementation. The goal of classifying MSERs with these classifiers is to eliminate noisy or non-character regions for localizing text. The precise classification of MSERs can provide a basis to form words and sentences in the text localizing process.

A. Classifiers Accuracy on MSERs

The accuracy of the well-known classifiers on MSERs is summarized in Figure 4. All these classifiers are trained separately after extracting image regions using MSER [17]. From Figure 4, it is obvious that Naïve Bayes and Bayes Net provide competitive results for a variety of test images. The total numbers of image regions extracted from the gray scale images using MSER features are dissimilar. However, Bayes Net performed better than Naïve Bayes on 11 out of 25 images, with similar results on 7 images; Naïve Bayes is quite close, with a slight edge on the remaining 7 images. Consequently, Bayes Net outperformed Naïve Bayes. Similarly, AdaboostM1 and Bayesian Logistic Regression provide better accuracy on 4 and 18 images out of 25, respectively, and perform equally on 3 images. As a result, the order, from high to low, of classifier accuracy on classification of MSERs is Bayesian Logistic Regression, AdaboostM1, Bayes Net and Naïve Bayes. On the other hand, AdaboostM1 exhibits poor performance on only a very small number of images compared with Naïve Bayes and Bayes Net; for that reason, AdaboostM1 is a better classifier than Naïve Bayes and Bayes Net. Ultimately, Bayesian Logistic Regression is the best classifier for characters, which can be useful in localizing text from street scene images for visually impaired persons [6].

¹http://www.cs.waikato.ac.nz/ml/weka/arff.html

Fig. 3. ICDAR 2011 Sample Scene Images

Fig. 4. Classifiers Accuracy on MSERs
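For concreteness, the region pre-processing behind these experiments (Section III) can be sketched in plain Python. This is an illustrative stand-in, not the authors' code: the equation (1) combined-mean threshold and nearest-neighbour resize follow the paper's description, while the n value and the sample region arrays are invented:

```python
def combined_mean(region):
    """Equation (1): the mean over all r*k pixels of a rectangular region."""
    pixels = [p for row in region for p in row]
    return sum(pixels) / len(pixels)

def binarize(region, threshold):
    """1 for pixels at or above the threshold, 0 below (polarity is a choice)."""
    return [[1 if p >= threshold else 0 for p in row] for row in region]

def resize_nn(region, n):
    """Nearest-neighbour resize of a rectangular region to n x n."""
    h, w = len(region), len(region[0])
    return [[region[r * h // n][c * w // n] for c in range(n)]
            for r in range(n)]

def make_instance(region, label, n=8):
    """One training row: a flattened n x n binary region plus its class label."""
    thr = combined_mean(region)
    binary = binarize(resize_nn(region, n), thr)
    return [p for row in binary for p in row] + [label]

# Hypothetical extremal regions: a glyph-like patch with dark strokes on a
# bright background, and a flat noise patch with little contrast.
char_region = [[30, 30, 200], [30, 200, 200], [30, 30, 30]]
noise_region = [[120, 118, 122], [119, 121, 120], [120, 120, 119]]
dataset = [make_instance(char_region, "character"),
           make_instance(noise_region, "non-character")]
```

Each resulting row matches the instance format described above: n × n binary pixel attributes followed by a class label, ready to be serialized to .arff for the Weka knowledge flow model.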
B. Classification Error on MSERs

A classifier's error can be defined as the percentage complement of its accuracy. For a particular problem domain, the classifier guaranteeing the lowest classification error is declared the best classifier. The classification errors of the four classifiers are therefore compared in Figure 5. It is obvious from Figure 5 that Naïve Bayes and Bayes Net have higher classification error than AdaboostM1. Bayes Net has lower classification error than Naïve Bayes on several scene images, but generally higher classification error than AdaboostM1. Similarly, Bayesian Logistic Regression has lower classification error than AdaboostM1 on most of the scene images. Consequently, Bayesian Logistic Regression has the lowest classification error percentage and proves to be the best classifier among the four classifiers under consideration.

Fig. 5. Classifiers Error in Percentage

C. Classifiers Performance on MSERs

A classifier can be declared efficient only if it requires the lowest processing time. The performance of the given classifiers is summarized in Figure 6 using the knowledge flow model in Weka. We can conclude from Figure 6 that the Naïve Bayes classifier is the most efficient despite its higher classification error. Bayes Net has proven to be the second best in comparison with AdaboostM1. Bayesian Logistic Regression has proven to be the better classifier overall, with the lowest classification error and higher classification accuracy.

Fig. 6. Performance of Classifiers on MSERs

D. Precision and Recall

A structure known as a confusion matrix can be used to evaluate classifiers. The confusion matrix has four categories, as shown in Table II. Evaluating classifiers with the accuracy measure alone (as shown in Figure 4) can be misleading [30] in binary decision problems. Therefore, we examine the operational facts by utilizing the Weka-based confusion matrix [12] to compute statistical measures, namely precision and recall, two complementary measures with an inverse relationship. The precision and recall bars shown in Figure 7 expose the differences among the classifiers.

TABLE II
CONFUSION MATRIX FOR CLASSIFIERS EVALUATION

                 Non-Characters  Characters  Totals
Non-Characters   |TNC|           |FPC|       |TN|
Characters       |FNC|           |TPC|       |TC|
Totals           |NC|            |C|         N

Precision is the ratio between true positive characters (|TPC|) and the total number of regions classified as characters (|C|), where |C| is the sum of false positive characters (|FPC|) and true positive characters (|TPC|), as given in equation (2).

Precision = |TPC| / |C|    (2)

Recall is the ratio between true positive characters (|TPC|) and the total number of actual characters (|TC|), where |TC| is the sum of false negative characters (|FNC|) and true positive characters (|TPC|), as given in equation (3).

Recall = |TPC| / |TC|    (3)

The precision and recall of each classifier are presented in Figure 7 to expose their differences on several scene images. The variation in MSERs and their labelling is due to the dynamic behavior of street images. The Bayesian Logistic Regression classifier is outstanding at classifying variable-length MSERs into variable-length characters and non-characters. The preference given to Bayesian Logistic Regression is due to its consistent performance over different lengths of the entire set of MSERs from scene images containing variable numbers of characters. This consistent performance can be seen in Figure 7 (b) in comparison with Figures 7 (a), (c) and (d).

Fig. 7. Classifiers Precision and Recall: (a) AdaboostM1 Classifier, (b) Bayesian Logistic Regression Classifier, (c) Naive Bayes Classifier, (d) Bayes Net Classifier

V. CONCLUSION AND FUTURE WORK
In this work, four classifiers were trained on regions extracted from scene images using MSER, after pre-processing. The goal was to classify MSERs into characters and non-characters to provide support for localizing text in street scene images for visually impaired people. A subset of renowned classifiers, namely Bayesian Logistic Regression, AdaboostM1, Naïve Bayes and Bayes Net, was tested over a number of ICDAR 2011 scene images for accuracy, imprecision and performance. The major contribution is a comparative analysis of classifiers that can be used to localize text. Bayesian Logistic Regression has been recognized as the best classifier in terms of accuracy and
imprecision. The Naïve Bayes classifier proved more efficient than the other considered classifiers; however, Naïve Bayes trades accuracy for speed, with higher classification error, while Bayesian Logistic Regression proved inefficient in processing time. At present, the inefficiency of a classifier cannot be traded against precision. In future, the Bayesian Logistic Regression classifier can be optimized to meet real-time application challenges, especially in detecting text to support visually impaired persons.

ACKNOWLEDGEMENTS

The research was supported by the National Natural Science Foundation of China (61105018, 61175020).

REFERENCES

[1] J. Liang, D. Doermann, and H. P. Li, (2005), Camera-based analysis of text and documents: a survey. IJDAR, vol. 7, no. 2-3, pp. 84-104.
[2] K. Jung, K. I. Kim, and A. K. Jain, (2004), Text information extraction in images and video: a survey. Pattern Recognition, vol. 37, no. 5, pp. 977-997.
[3] Y. Zhong, H. Zhang, and A. K. Jain, (2000), Automatic caption localization in compressed video. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 4, pp. 385-392.
[4] Q. Ye, Q. Huang, W. Gao, and D. Zhao, (2005), Fast and robust text detection in images and video frames. Image Vision Comput., vol. 23, pp. 565-576.
[5] X. Chen and A. L. Yuille, (2004), Detecting and reading text in natural scenes. In Computer Vision and Pattern Recognition, Proceedings of the 2004 IEEE Computer Society, pp. II:366-373, Vol. 2.
[6] X. Chen and A. L. Yuille, (2005), A time-efficient cascade for real-time object detection: With applications for the visually impaired. In Computer Vision and Pattern Recognition - Workshops, pp. 20-26.
[7] S. M. Lucas, (2005), ICDAR 2005 text locating competition results. Eighth International Conference on Document Analysis and Recognition, Vol. 1, pp. 80-84.
[8] B. Epshtein, E. Ofek, and Y. Wexler, (2010), Detecting text in natural scenes with stroke width transform. In Computer Vision and Pattern Recognition, pp. 2963-2970.
[9] P. Shivakumara, T.
Q. Phan, and C. L. Tan, (2011), A Laplacian approach to multi-oriented text detection in video. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 2, pp. 412-419.
[10] D. Nister and H. Stewenius, (2006), Scalable recognition with a vocabulary tree. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 2161-2168.
[11] D. M. Chen, S. S. Tsai, V. Chandrasekhar, G. Takacs, R. Vedantham, R. Grzeszczuk, and B. Girod, (2010), Inverted Index Compression for Scalable Image Matching. In Proceedings of IEEE Data Compression Conference (DCC), Snowbird, Utah, March 2010, pp. 525-552.
[12] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten, (2009), The WEKA Data Mining Software: An Update. SIGKDD Explorations, Volume 11, Issue 1.
[13] Y. Freund and R. E. Schapire, (1996), Experiments with a New Boosting Algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, pp. 148-156. Morgan Kaufmann, San Francisco.
[14] A. Genkin, D. D. Lewis and D. Madigan, (2007), Large-Scale Bayesian Logistic Regression for Text Categorization. American Statistical Association and the American Society for Quality, Technometrics, Vol. 49, no. 3, pp. 291-304.
[15] M. Hall and E. Frank, (2008), Combining naive Bayes and decision tables. In Proceedings of the 21st Florida Artificial Intelligence Research Society Conference, Miami, Florida. AAAI Press, pp. 318-319.
[16] R. Bouckaert, (2005), Bayesian Network Classifiers in Weka. Technical Report, Department of Computer Science, Waikato University, Hamilton, NZ, 2005.
[17] J. Matas, O. Chum, M. Urban, and T. Pajdla, (2004), Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, no. 22, pp. 761-767.
[18] L. Neumann, J. Matas, (2012), Real-Time Scene Text Localization and Recognition. 25th IEEE Conference on Computer Vision and Pattern Recognition, pp. 3538-3545.
[19] S. S. Tsai, D. Chen, V. Chandrasekhar, G. Takacs, N. M. Cheung, R. Vedantham, R. Grzeszczuk, and B. Girod, (2010), Mobile product recognition. In Proceedings of ACM Multimedia, Florence, Italy, 2010.
[20] D. Chen, S. S. Tsai, C. H. Hsu, K. Kim, J. P. Singh, and B. Girod, Building book inventories using smartphones. In Proc. ACM Multimedia, 2010, pp. 651-654.
[21] G. Takacs, Y. Xiong, R. Grzeszczuk, V. Chandrasekhar, W. Chen, L. Pulli, N. Gelfand, T. Bismpigiannis, and B. Girod, Outdoors augmented reality on mobile phone using loxel-based visual feature organization. In Proceedings of ACM Multimedia Information Retrieval, 2008, pp. 427-434.
[22] G. Schroth, R. Huitl, D. Chen, M. Abu-Alqumsan, A. Al-Nuaimi, E. Steinbach, Mobile Visual Location Recognition. Signal Processing Magazine, IEEE, vol. 28, no. 4, pp. 77-89, July 2011.
[23] H. Chen, S. S. Tsai, G. Schroth, D. M. Chen, V. Chandrasekhar, G. Takacs, R. Vedantham, R. Grzeszczuk, and B. Girod, Robust text detection in natural images with edge-enhanced maximally stable extremal regions. 18th IEEE International Conference on Image Processing, 2011, pp. 2609-2612.
[24] P.-E. Forssen, Maximally Stable Colour Regions for Recognition and Matching. Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2007, pp. 1-8.
[25] M. Donoser, H. Bischof, Efficient Maximally Stable Extremal Region (MSER) Tracking. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 553-560, 2006.
[26] A. Jain, M. Murty, P. Flynn, Data clustering: a review. ACM Computing Surveys, vol. 31, pp. 264-323, 1999.
[27] Y. Freund, R. Schapire, (1997), A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, Vol. 55, pp. 119-139.
[28] E. Frank, L. Trigg, G. Holmes, I. H. Witten, (2000), Naive Bayes for regression. Machine Learning, Vol. 41, Issue 1, pp. 515.
[29] D. Heckerman, D. Geiger, D. M. Chickering, (1995).
Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, Vol. 20, pp. 197-243.
[30] F. Provost, T. Fawcett, R. Kohavi, (1998), The case against accuracy estimation for comparing induction algorithms. Proceedings of the 15th International Conference on Machine Learning, Morgan Kaufmann, San Francisco, CA, pp. 445-453.
[31] A. Shahab, F. Shafait, A. Dengel, ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images. International Conference on Document Analysis and Recognition (ICDAR), 2011, pp. 1491-1496.
[32] Xuwang Yin, Xu-Cheng Yin, Hong-Wei Hao, Khalid Iqbal, Effective Text Localization in Natural Scene Images with MSER, Geometry-Based Grouping and AdaBoost. International Conference on Pattern Recognition (ICPR), 2012, pp. 725-728.
[33] Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, Hong-Wei Hao, Robust Text Detection in Natural Scene Images. CoRR abs/1301.2628, 2013.