This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON CYBERNETICS
A Genetic-Algorithm-Based Explicit Description of Object Contour and Its Ability to Facilitate Recognition
Hui Wei and Xue-song Tang
Abstract—Shape representation is an extremely important and longstanding problem in the field of pattern recognition. The closed contour of a shape plays a crucial role in shape comparison. Because the contour is the most stable, distinguishable, and invariant feature of an object, it is useful to incorporate it into the recognition process. This paper proposes a method based on genetic algorithms that identifies the most common contour fragments, which can then be used to represent the contours of a shape category. These common fragments make explicit the structural logic embedded in the contours. This paper shows that such an explicit representation of the shape contour contributes significantly to shape representation and object recognition.
Index Terms—Genetic algorithm (GA), object recognition, shape representation.
I. INTRODUCTION
Object recognition can be based on object shapes. This approach is effective because shape has geometric features that are invariant to color, translation, rotation, and scaling. Classical shape representations are divided into two categories: 1) contour-based [1]–[4] and 2) region-based [5]–[7]. Both categories use shape descriptors to represent shape features. In a recent study, shape borders were represented by a vector set in polar coordinates; this representation has proven effective for assessing shape similarity [8]. In another study, shape features were analyzed using a small-world model [9]. In methods based on neural vision, shape edges are represented by a series of directed short-line segments [10]–[12]. These approaches may be merely static descriptions of shapes that do not capture the structural logic of the contours. The relationships among the contour fragments are weak, so the geometric meaning of the descriptor is arbitrary. For very complex objects, such representations can be quite complex and time-consuming to
Manuscript received April 18, 2014; revised July 31, 2014, September 27, 2014, and October 20, 2014; accepted October 21, 2014. This paper was recommended by Associate Editor Q. Zhao.
H. Wei is with the Laboratory of Cognitive Algorithm and Model, Shanghai Key Laboratory of Intelligent Information Processing, Department of Computer Science, Fudan University, Shanghai 201203, China (e-mail: [email protected]).
X.-S. Tang is with the Laboratory of Cognitive Algorithm and Model, Shanghai Key Laboratory of Data Science, Department of Computer Science, Fudan University, Shanghai 201203, China.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCYB.2014.2376939
produce. If a set of frequent contour segments (FCSs) can be obtained for a certain shape category, and if this set explicitly describes the structure of the contour, it will facilitate the conceptual definition of a shape category. The set of FCSs of a shape contour is not only an explicit description but also clearly points out the geometric constraints that need to be satisfied for shapes to be placed in a certain category. These fragments reflect the essence of the geometric structure of shapes, so the representation is stable and resistant to interference. These features of the FCSs are extremely valuable for recognition and concept-based top-down processing. Studies of neural vision have shown that humans undergo frequent eye movements, which take place as the organism searches for objects. Regional evidence can be collected and verified under the supervision of the existing contour structural logic. It has been proven that the primary visual cortex and the higher visual cortex are mainly responsible for the detection and representation of contour in neural vision. It is conceivable that contours can be structured based on the FCSs observed here. This paper focuses on FCSs for describing shape contours using frequently appearing patterns. A contour is treated as a set of line segments; representing shapes with line segments has been proven to aid the recognition process in human vision [10]. Fig. 1 shows a sample recognition process for a giraffe shape class. The left side is the workflow of learning FCSs using a genetic algorithm (GA). The set of training images is first represented by short lines. Then, the GA is used to search for fragments that are repeated and similar. Finally, the set of images is represented by a few frequently appearing patterns. The image preprocessing details can be found in [10]–[12], which discuss the process of representing images using short lines.
On the right side, the learned FCSs are used for object recognition. Edges that are similar to the FCSs are preserved while the others are removed. Afterward, an effective shape descriptor from a previous work is used to acquire the recognition results [13]. The basic elements used in this paper for shape representation and recognition are contour fragments, which have been widely used in past studies [14]–[18]. This paper defines a simple and explicit symbolic way of representing contour fragments; other works have also used symbolic representations [19]–[21]. One problem here is that the number of different ways of partitioning a contour can be enormous. If we assign a sequential number to each fragment in a partition, the number
2168-2267 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Fig. 1. Framework of the FCS-based shape representation and object recognition. Left: workflow of learning FCSs. Right: learned FCSs are utilized for object recognition.

TABLE I: Comparison of Our Method to the Others
of possible sequences can also be very high. Finding desirable solutions to the problem of representing a contour across such a large search space is a typical combinatorial optimization problem. Here, a GA-based approach was used to search for the most frequent patterns for shape representation. GAs are classical optimization algorithms and are suitable for finding good solutions in spaces that cannot be searched exhaustively. Furthermore, GAs have advantages over other methodologies in that they are fast, inherently parallel, and widely applicable [32]. Other combinatorial-optimization-oriented approaches, such as those based on backtracking and simulated annealing, were not used here because GAs can process large numbers of solutions simultaneously; offspring compete with each other for a better solution over a number of generations. Particle swarm optimization and ant colony optimization could also be used in this case, but their performance was inferior to that of GAs under our experimental conditions. In [33], GAs are used for extracting geometric features. Approaches based on GAs, including GA-hybrid approaches, have been used in other fields of computer vision, such as image indexing [34],
pose estimation [35], image filtering [36], structural pattern recognition [37], and assessments of image similarity [38]. Table I summarizes past works on object recognition and presents the main advantages of the proposed method. The main contributions of this paper are listed below.
1) The present method defines a new form of shape representation, which uses the FCSs discovered by a GA. This representation shows the essential structural logic of a contour.
2) GAs are redesigned for learning the conceptual shape representation and for achieving object recognition, respectively.
3) The present representation gives a symbolic description of the geometric features of a contour. The proposed method contributes to the formation of conceptual definitions based on geometric structure. It can also be used for shape retrieval and shape recognition.
This paper is organized as follows. Section II shows how the GAs are redesigned to fit the present model. The experimental configuration and results are given in Section III. This paper is concluded in Section IV.
Fig. 2. Framework of shape representation and object recognition based on GAs.
II. PROPOSED GAs AND THEIR DESIGNS

Fig. 2 shows an overview of the framework of the searching strategy, where the left side shows shape representation and the right side shows object recognition. The middle column shows the flow chart shared by the GAs. The explicit conceptual definition of the geometric features can be derived from the FCSs, as shown in the left part of the figure. Using the conceptual definition, shape recognition can be performed effectively in environments containing complex scenes, as shown in the right part of Fig. 2.

A. Chromosome Representation

In this paper, the chromosome representation was designed based on the short-line representation of the image, which had already been vectorized. The objective of the GA is to identify the desired frequent patterns of the contour of an object. However, the number of line segments needed to represent the contour of an image can reach hundreds, which means that the number of possible sequences is extremely high. To solve this problem, a partition strategy was used to construct the initial chromosomes. A set of short-line segments S = {s0, s1, . . . , sn−1} is initially clustered into a partition P = {p0, p1, . . . , pk−1}, where k < n. A gene gi is considered matched with the FCSs when its similarity S(gi) exceeds a threshold Ta

match(gi) = 1 if S(gi) > Ta, 0 else. (3)
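To make the partition idea concrete, the following is a minimal Python sketch (our own illustration, not the authors' C implementation; all function and variable names are ours): a chromosome is modeled as a random contiguous partition of the segment sequence into k genes, and a gene is "matched" under the threshold test of (3).

```python
import random

def initial_partition(segments, k):
    """Randomly cut a contour's short-line segment sequence into k
    contiguous genes, giving one initial chromosome."""
    n = len(segments)
    cuts = sorted(random.sample(range(1, n), k - 1))  # k-1 breaking points
    bounds = [0] + cuts + [n]
    return [segments[a:b] for a, b in zip(bounds, bounds[1:])]

def matches_fcs(similarity, t_a):
    """Indicator of (3): a gene counts as matched when its similarity
    to an FCS exceeds the threshold Ta."""
    return 1 if similarity > t_a else 0

random.seed(0)
chromosome = initial_partition(list(range(100)), k=6)
```

In this simplification, genes are contiguous runs along the contour, so a chromosome is fully determined by its breaking points; the actual similarity measure S(gi) is left abstract here.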
C. Crossover

In this paper, the traditional crossover operation is redesigned to realize crossover between two sequences of genes based on different partitions of the same contour. Fig. 5 shows a crossover operation. The first rows of the left and right columns are two different partitions of the same contour, and the number aligned with each gene is the corresponding sequential number. Note that the size of the genes can vary across chromosomes; this paper addresses the problem of chromosome representation with variable gene lengths. The red segments represent the region in which crossover takes place. The region is chosen at random, and its size is limited to between 1/10 and 1/5 of the length of the entire contour. The genes that include the chosen region undergo crossover, as shown in the second row. However, the short lines contained in the sets of genes from the two chromosomes can differ, which makes it possible for a line segment to be found in either two genes or none after the operation, as shown in the third row. To deal with this problem, the original genes not involved in crossover are given priority: line segments that belong to two genes are deleted from the gene that is not involved in crossover, and line segments that belong to no gene are added to the nearest gene involved in the operation. In short, only the breaking points inside the region are exchanged between the two chromosomes. As shown in the fourth row, the geometric representations of the offspring chromosomes are then determined. The exchanged genes carry the sequential information of their ancestors, which can conflict in the offspring.
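Because only the breaking points inside the chosen region are exchanged, the geometric part of the crossover can be sketched compactly. In the following hypothetical Python illustration (our simplification, not the paper's implementation), each chromosome is reduced to its set of breaking points on a contour of n segments; with contiguous genes, the duplicate/orphan-segment repair described above falls out automatically.

```python
def crossover(breaks_a, breaks_b, region):
    """Exchange the breaking points that fall inside `region` (a
    half-open index interval on the contour) between two chromosomes,
    leaving all breakpoints outside the region untouched."""
    lo, hi = region
    keep_a = [b for b in breaks_a if not (lo <= b < hi)]
    keep_b = [b for b in breaks_b if not (lo <= b < hi)]
    in_a = [b for b in breaks_a if lo <= b < hi]
    in_b = [b for b in breaks_b if lo <= b < hi]
    # each child keeps its own outside breakpoints and adopts the
    # other parent's breakpoints inside the region
    child_a = sorted(set(keep_a + in_b))
    child_b = sorted(set(keep_b + in_a))
    return child_a, child_b

a, b = crossover([10, 40, 70], [20, 50, 80], region=(30, 60))
```

Here the region (30, 60) swaps breakpoint 40 for 50, so the children are [10, 50, 70] and [20, 40, 80]; the paper additionally resolves the sequential numbers of the exchanged genes, which this sketch omits.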
Algorithm 2: Component-Oriented Mutation
Input: C = {g0, g1, . . . , gk−1}
Output: Cn
1: Get the short-segment sets P = {p0, p1, . . . , pk−1};
2: Choose a random si and pi, where si ∈ pi;
3: Find pj, the nearest neighbor of pi according to the distance from si;
4: Split pi into pi^1 and pi^2, where si ∈ pi^1;
5: pj^n ← pj ∪ pi^1;
6: pi^n ← pi^2;
7: Pn ← {p0, p1, . . . , pi^n, . . . , pj^n, . . . , pk−1};
8: Process Pn to get Cn;
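The effect of Algorithm 2 can be rendered in a few lines of Python (a hypothetical sketch with our own names; the choice of segment and neighbor gene is simplified to explicit indices, whereas the algorithm chooses them by distance):

```python
def component_mutation(partition, i, j, split_at):
    """Sketch of component-oriented mutation: the tail of gene i,
    from index split_at onward, is absorbed by its neighbor gene j.
    The sequential numbers (list positions) of the genes are kept."""
    parts = [list(p) for p in partition]
    moved = parts[i][split_at:]          # the sub-piece nearest gene j
    parts[i] = parts[i][:split_at]       # gene i keeps the rest
    parts[j] = parts[j] + moved          # gene j absorbs the piece
    return parts

p = component_mutation([[0, 1, 2, 3], [4, 5]], i=0, j=1, split_at=2)
```

Both genes change their segment content, but their order in the chromosome is untouched, matching the description in Section II-D.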
D. Mutation
Fig. 5. Demonstrative sample of the crossover operation.
To solve this problem, the sequential numbers of the genes in the offspring are reassigned. As shown in the fifth row of Fig. 5, the set of genes involved in crossover in the left chromosome is Cc1 = {g2, g5}. Its sequential numbers can be reassigned as Cc1^n = {g1^n, g2^n}. The set of remaining genes of the right chromosome, apart from those involved in crossover, is Cr2 = {g1, g3, g5, g7}. The two sets Cc1^n and Cr2 are combined, and all possible combinations of the sequential assignments are placed in a set V. Then, the one with the highest fitness, as determined using (4), is set as the new chromosome for the offspring. In the situation described above, V = {{g1, g1^n, g3, g2^n, g5, g7}, {g1, g3, g1^n, g5, g2^n, g7}, {g1, g1^n, g3, g5, g2^n, g7}} and

Cn = argmax C∈V fitness(C). (4)
In (4), Cn is the chromosome in the next generation and fitness is the function introduced in (1) in the previous section. In this way, the new sequential order of the offspring is obtained, as shown in the last two rows of Fig. 5. Note that crossover operates on the sets of short lines; the DCE algorithm can then be used to consolidate the short-line segments into a long-line set, after which the drawing-primitive approximation can be processed.
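One simple reading of (4) is an exhaustive search over order-preserving interleavings of the exchanged genes into the remaining sequence, keeping the candidate with the highest fitness. The Python sketch below is our illustration under that reading (the paper's candidate set V appears to be further constrained, since its example lists only three candidates); the toy fitness function is purely for demonstration.

```python
from itertools import combinations

def best_alignment(fixed, new_genes, fitness):
    """Sketch of (4): enumerate every order-preserving interleaving of
    `new_genes` into `fixed` and return the one maximizing `fitness`,
    which scores a full gene sequence."""
    best, best_score = None, float("-inf")
    k = len(fixed) + len(new_genes)
    for slots in combinations(range(k), len(new_genes)):
        cand, fi, ni = [], iter(fixed), iter(new_genes)
        for pos in range(k):
            cand.append(next(ni) if pos in slots else next(fi))
        score = fitness(cand)
        if score > best_score:
            best, best_score = cand, score
    return best

# toy fitness: penalize adjacent inversions, so ascending labels win
seq = best_alignment([1, 3, 5, 7], [2, 6],
                     fitness=lambda c: -sum(1 for x, y in zip(c, c[1:])
                                            if x > y))
```

With the toy fitness above, the exchanged genes 2 and 6 are slotted into the positions that keep the sequence ascending.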
In the initial population, the chromosomes are randomly generated. These randomly generated chromosomes must be improved both in the partition of the contour and in the reasonableness of the sequential numbers of their genes. To meet these two requirements, two types of mutation operations were designed: component-oriented mutation and sequence-oriented mutation. First, the final evolved individual should be represented by a reasonable combination of multiplex patterns. Component-oriented mutation is designed to change the geometric partition of the short-line set. In detail, a random short-line segment s is chosen; the gene containing s is here called g1. The algorithm then locates its nearest gene, g2, and splits g1 into two subsets; the subset near g2 is combined with g2 and removed from g1. Both genes are modified, but their sequential numbers remain the same. Component-oriented mutation thus changes the combination of short lines without changing the sequential order of the fragments. Sequence-oriented mutation is designed mainly for identifying a superior sequential logic for the contour sequence. A simple method is used to exchange the sequential numbers of two selected genes. This operation changes the sequential numbers of the genes without modifying the partition of the original contour. The two genes selected for the exchange should be geometrically different from each other: exchanging genes that are geometrically similar helps the optimization of the entire sequential logic of the contour only slightly, whereas exchanging two quite different genes can greatly contribute to producing a good chromosome. The geometric similarity between two genes is determined using the classical cyclic string matching algorithm, as in [4]. Here, the similarity is based on the long-line set obtained after the DCE.
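As a toy stand-in for the cyclic string matching of [4], the following Python sketch (our own simplification, not the algorithm of [4]) scores the similarity of two cyclic sequences by the best element-wise agreement over all rotations:

```python
def cyclic_similarity(a, b):
    """Toy cyclic matching score: best mean element-wise agreement
    between sequence `a` and any rotation of sequence `b` (unequal
    lengths are truncated). Returns a value in [0, 1]."""
    n = min(len(a), len(b))
    best = 0.0
    for r in range(len(b)):
        rot = b[r:] + b[:r]  # rotate b by r positions
        score = sum(1 for x, y in zip(a[:n], rot[:n]) if x == y) / n
        best = max(best, score)
    return best

s = cyclic_similarity("LCLA", "CLAL")
```

Since "CLAL" rotated by three positions equals "LCLA", the score here is 1.0; the real algorithm of [4] uses edit-distance-style matching on primitive strings rather than exact equality, so this is only a structural illustration.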
The detailed algorithms of the two operations are displayed in Algorithms 2 and 3, and Fig. 6 shows samples of the two mutations.

E. Split and Merge

In addition to the traditional operations of crossover and mutation, two additional operations were designed in this paper. The first operation is called split. The size of the short-
Algorithm 3: Sequence-Oriented Mutation
Input: C = {g0, g1, . . . , gk−1}
Output: Cn
1: Choose a random gi;
2: Choose gj according to a normalized probability Prob(gi, gj) = Sb(pj, pi) / Σp∈P Sb(p, pi), where Sb(pj, pi) is the cyclic string matching result for pi and pj;
3: Swap the sequential positions of the two genes: gi^n ← gj; gj^n ← gi;
4: Cn ← {g0, g1, . . . , gi^n, . . . , gj^n, . . . , gk−1};
Fig. 6. Left: component-oriented mutation. Right: sequence-oriented mutation.
line set of each gene in the chromosomes of the initial population is arbitrary due to the random partition, so it is reasonable to give genes with longer lengths higher probabilities of splitting. It is conceivable that partitions with average-sized genes are preferred for identifying frequent fragments from the contour. The sequential alignment of the two genes produced by the split operation follows the mechanism used in crossover: the alignment with the highest fitness is chosen as the offspring. Algorithm 4 shows the details of the operation. For genes with little similarity that are located near each other, a merge operation is designed to join two genes into one. Its purpose is similar to that of the split operation: under most conditions, very small genes are not desirable for the evolutionary process, so the merge operation joins them into their adjacent genes. The probability of the operation is related to the sum of the lengths of the two merging genes; the shorter the sum, the higher the probability. The problem of sequential alignment must again be addressed, and the alignment with the highest fitness is kept for the next generation, as described above. The details of the merge algorithm are shown in Algorithm 5. Fig. 7 shows demonstrations of the two operations.

F. Termination Criteria

The termination criteria for shape representation and object recognition are different. For the former, the GA ceases
Algorithm 4: Split
Input: C = {g0, g1, . . . , gk−1}
Output: Cn
1: Get P = {p0, p1, . . . , pk−1} from C;
2: for i = 0 to k − 1 do
3:   Calculate the total length of the short-line segments, TLi ← Σj l(sj), where l(sj) is the length of sj;
4: end
5: Choose pσ = {sσ0, sσ1, . . . , sσl−1} according to the normalized probability Prob(pi) = TLi / Σi TLi;
6: Choose a random sσj from pσ;
7: Split pσ into pσ^1 = {sσ0, . . . , sσj} and pσ^2 = {sσj+1, . . . , sσl−1};
8: Process pσ^1 and pσ^2 into gσ^1 and gσ^2;
9: Cn ← {g0, g1, . . . , gσ^1, gσ^2, . . . , gk};

Algorithm 5: Merge
Input: C = {g0, g1, . . . , gk−1}
Output: Cn
1: Get P = {p0, p1, . . . , pk−1} from C;
2: for i = 0 to k − 1 do
3:   Calculate the total length of the short-line segments, TLi ← Σj l(sj), where l(sj) is the normalized length of sj;
4: end
5: for i = 0 to k − 1 do
6:   Find the nearest neighbor pj of pi;
7:   Calculate TNLi = TLi + TLj;
8: end
9: Choose pα = {sα0, sα1, . . . , sαm−1} according to the normalized probability Prob(pi) = (1 − TNLi / Σi TNLi) / (k − 1);
10: Find pβ = {sβ0, sβ1, . . . , sβl−1}, the nearest neighbor of pα;
11: Merge pα and pβ into pγ = {sα0, . . . , sαm−1, sβ0, . . . , sβl−1};
12: Process pγ into gγ;
13: Cn ← {g0, g1, . . . , gγ, . . . , gk−2};
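The selection probabilities used by the two operations can be sketched directly from the formulas in Algorithms 4 and 5 (a Python illustration with our own names; gene lengths and neighbor indices are supplied as plain lists):

```python
def split_probs(lengths):
    """Algorithm 4's selection rule: longer genes are split with
    higher probability, Prob(pi) = TLi / sum(TL)."""
    total = sum(lengths)
    return [tl / total for tl in lengths]

def merge_probs(lengths, neighbor):
    """Algorithm 5's selection rule: genes whose combined length with
    their nearest neighbor is small are merged with higher probability,
    Prob(pi) = (1 - TNLi / sum(TNL)) / (k - 1)."""
    k = len(lengths)
    tnl = [lengths[i] + lengths[neighbor[i]] for i in range(k)]
    total = sum(tnl)
    return [(1 - t / total) / (k - 1) for t in tnl]

ps = split_probs([2.0, 3.0, 5.0])
pm = merge_probs([2.0, 3.0, 5.0], neighbor=[1, 0, 1])
```

Both rules produce proper probability distributions (each sums to 1), and the merge rule indeed inverts the preference: the gene pair with the smallest combined length receives the largest probability.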
when the maximum number of generations is reached, e.g., 100 generations; the average fitness normally reaches the best fitness after 65 generations. For the latter, the fitness value can fluctuate even after 100 generations. For computational reasons, the algorithm therefore stops after a certain proportion of genes has been matched with FCSs, as indicated in (3). The specific proportion used for shape-category datasets (such as MPEG-7) is 85%, while the one used for object recognition datasets (such as ETHZ and INRIA horses) is 40%, because those images contain complex backgrounds.

III. EXPERIMENTS AND CASE STUDIES

In the last section, the chromosome representation and genetic operations were introduced. Before the experimental
results are presented, the configuration of the GA parameters is introduced in Table II.

Fig. 7. Left: split. Right: merge.

TABLE II: Configuration of Parameters for GAs

The initial size of the partition is the initial estimate for the target shape; here, the number of parts in a reasonable shape representation is assumed to range from 3 to 10. Under this premise, the search space is significantly reduced. The probabilities of the split and merge operations are not given by fixed numbers, because genes that are either too large or too small are very common during the earlier stages of evolution; after several generations, the variation in gene sizes tends to stabilize. The split and merge operations cause dramatic modifications to the partitions, which in most cases produce offspring with low fitness. Consequently, probabilities that vary with the number of generations are designed for these two operations, as given in (5), where Ngeneration represents the current number of generations. The probabilities are initialized at 0.5 and decrease to 0.05 after 25 generations

Probsplit = Probmerge = 0.5 − min(Ngeneration/50, 0.45). (5)

In the following, experimental tests were developed to evaluate the performance of the designed GAs for shape representation and object recognition. The test platform was implemented in C on a PC using a dual-core Intel processor running at 2.4 GHz.

A. Conceptual Representation Extraction
This section uses the MPEG-7 Core Experiment CE-Shape-1 dataset as the test dataset. This dataset is widely used for performance evaluation in shape matching and shape retrieval. It consists of 1400 images: 70 shape categories with 20 images per category. By using the algorithm introduced in Section II, the modularized representation for each category can be acquired after evolution. It can be speculated that the drawing sequences of the shapes in one category share similar patterns. If pattern mining is performed on those sequences, the common regularities indicated by the sequences can be discovered and used to produce a set of FCSs. In this way, the sequences of drawing primitives of all the shapes in the category can be summarized by these regularities. Therefore, one or a few of the rewriting rules from which those sequences can be drawn can serve as explicit definitions of the given categories of shapes. The FCSs can be used for object recognition because they explicitly point out the structural conditions that need to be satisfied for a specific shape category. If the definitive principle of the category can be discovered, then the essence of conceptual extraction has been achieved. The most distinctive feature of a camel is its humps. Fig. 8 shows experimental results for the FCSs extracted from differently shaped contours of camels. The first column shows the original images. The second column shows the partition results of the short-line set in the GA, and the third column shows the long-line segments produced from the second column. Both partitions are aligned with sequential numbers. The fourth column shows the corresponding sequences of drawing primitives for the six samples. The last column shows the final expression that covers all six samples. This expression clearly defines the structures and positions of the two adjacent humps, and can be considered a symbolic definition of the category.
The convergence speed of the representation extraction for the camel shapes can be viewed in Fig. 9, and overall statistics of the computational efficiency on the MPEG-7 dataset are shown in Fig. 10. Forming a formal conceptual definition of a shape category contributes to object recognition. Objects are classified into one category because they share similar definitive components rather than similar overall shapes. For instance, the shapes of octopuses in the MPEG-7 dataset are distinctly different from each other, and methods such as shape context perform poorly when used to assess their similarity. However, based on the similarity of the drawing primitives of the shapes, similar contour structures can be discovered easily. After acquiring the FCS set for each category, the contour of each image can be represented by the FCSs for shape retrieval and object recognition. A retrieval is counted as correct if it is in the same class as the query. The number of correct retrievals in the top 40 ranks was counted, including the self-match. The retrieval rate for each method is reported as a percentage of the maximum possible number of correct retrievals. A simple shape descriptor, the same as in [13], is used to calculate the distances. The shape descriptor is very simple
Fig. 8. Conceptual extraction of camel shapes. The first column shows the original images from the MPEG-7 dataset. The second column shows the short-line representation. The third column shows the long-line representation; the contour fragments are indicated by different colors. The fourth column shows the drawing-primitive representations of the contours, where the notation is introduced in Section II. The last column shows the most frequent patterns of the camel shapes.
Fig. 9. Statistics of the convergence speed. The average fitness value is initially much lower than the highest fitness but approaches it after approximately 90 generations.
and intuitive for comparing sets of contour fragments; its details can be viewed in Fig. 11. The detailed correct retrievals for the 70 categories in the MPEG-7 dataset can be viewed in Fig. 12. The recognition test is performed using the standard leave-one-out procedure, and the comparison with reported results is shown in Table III.

B. Case Study for Object Recognition

This paper conducts experiments on the ETHZ shape classes dataset [46], which is a benchmark for the latest
Fig. 10. Statistics of computational time for the 1400 images in the MPEG-7 dataset. Most images are processed in under 2 s, and the average time is 0.728 s.
object recognition research. This dataset features five diverse classes (bottles, swans, mugs, giraffes, and apple-logos) and contains a total of 255 images collected from the web. It is highly challenging: the objects appear over a wide range of scales, there is considerable intraclass shape variation, and many images are severely cluttered, with objects comprising only a fraction of the total image area (Fig. 13). For each class, a set of FCSs is obtained from a random training set containing half of the available images (there are 40 images for apple-logos, 48 for bottles, 87 for giraffes, 48 for mugs, and 32 for swans).
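The half/half protocol above can be sketched as follows (a hypothetical helper in Python with our own names; the paper actually copies the fixed split of [29] rather than drawing a fresh random one):

```python
import random

def training_split(images, seed=0):
    """Shuffle a class's positive images and split them in half:
    the first half for learning FCSs, the second half for testing."""
    rng = random.Random(seed)  # seeded for reproducibility
    pool = list(images)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

train, test = training_split(range(32))  # e.g., the 32 swan images
```

For the swan class this yields 16 training and 16 test images; repeating the split with different seeds is what allows the stability evaluation mentioned below.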
Fig. 12. Correct retrievals per category for the MPEG-7 dataset. Good performance is achieved for complicated categories such as butterfly, cow, dog, and pocket, whose inner textures are removed during processing so that they can be matched with a set of FCSs. In some categories, such as Device6 and Device9, shapes are not correctly identified.
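The top-40 retrieval protocol described above (the standard bullseye score on MPEG-7) can be sketched in Python; this is our illustration of the scoring rule, not the paper's code, and `dist` is any precomputed pairwise distance matrix.

```python
def bullseye_score(dist, labels, top=40):
    """For each query, count same-class shapes (including the
    self-match) among its `top` nearest neighbours, and report the
    fraction of the maximum possible number of correct retrievals."""
    n = len(labels)
    per_class = {}
    for lab in labels:
        per_class[lab] = per_class.get(lab, 0) + 1
    hits = 0
    for q in range(n):
        order = sorted(range(n), key=lambda j: dist[q][j])[:top]
        hits += sum(1 for j in order if labels[j] == labels[q])
    max_hits = sum(per_class[labels[q]] for q in range(n))
    return hits / max_hits

# toy example: two classes of two shapes each, top-2 retrieval
d = [[0, 1, 5, 6], [1, 0, 5, 6], [5, 5, 0, 1], [6, 6, 1, 0]]
rate = bullseye_score(d, labels=[0, 0, 1, 1], top=2)
```

On MPEG-7, each class has 20 members, so with top = 40 the maximum per query is 20 and the denominator is 1400 × 20.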
TABLE III: Comparison of Retrieval and Recognition Results for Different Algorithms Tested on the MPEG-7 Dataset
Fig. 11. According to [13], the descriptor can be computed for any set of points X on the plane. Given a point A ∈ X, the shape descriptor of point A, denoted SX(A), is a histogram of all triangles spanned by A and all pairs of points B, C ∈ X, where points A, B, and C must be distinct. More specifically, SX(A) is a 3-D histogram of the angle BAC and the two distances AB and AC. The shape descriptor S(X) of the set X is the joint 3-D histogram over all points. The similarity between X and Y is then obtained by standard histogram intersection.
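A minimal Python sketch of the triangle histogram just described may help; this is our own rendering of the idea in [13], and the bin counts and distance range are our choices, not the paper's.

```python
import math
from itertools import combinations

def point_descriptor(points, a_idx, bins=4, d_max=2.0):
    """Histogram of (angle BAC, |AB|, |AC|) over all pairs B, C of the
    points other than A, following the description of [13] in Fig. 11.
    Returns a bins x bins x bins nested-list histogram."""
    A = points[a_idx]
    rest = [p for i, p in enumerate(points) if i != a_idx]
    hist = [[[0] * bins for _ in range(bins)] for _ in range(bins)]
    for B, C in combinations(rest, 2):
        ab, ac = math.dist(A, B), math.dist(A, C)
        cosang = ((B[0]-A[0])*(C[0]-A[0]) +
                  (B[1]-A[1])*(C[1]-A[1])) / (ab * ac)
        ang = math.acos(max(-1.0, min(1.0, cosang)))
        i = min(int(ang / math.pi * bins), bins - 1)   # angle bin
        j = min(int(ab / d_max * bins), bins - 1)      # |AB| bin
        k = min(int(ac / d_max * bins), bins - 1)      # |AC| bin
        hist[i][j][k] += 1
    return hist

def intersection(h1, h2):
    """Standard histogram intersection on flattened 3-D histograms."""
    f1 = [v for a in h1 for b in a for v in b]
    f2 = [v for a in h2 for b in a for v in b]
    return sum(min(x, y) for x, y in zip(f1, f2))
```

In practice the histograms would be normalized and accumulated over all points A to obtain S(X); that step is omitted here for brevity.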
Learning models from different training sets makes it possible to evaluate the stability of the proposed learning technique. Notice that the current method does not require negative training images, i.e., images not containing any instance of the class. The same training split was copied from [29]; it makes up approximately half of the positive set. The FCSs of each category can be determined using the previously discussed GA-based method for shape representation. The learned frequent patterns of the five categories are shown in Fig. 14. Fig. 13 shows positive detections made using the FCSs learned from the training images. The dataset contains large-scale variations and shows some intracategory shape variability (especially in swans and giraffes). Fig. 15 shows some examples of detections that are negative and some that fail to
detect all the targets. After the images are represented by the corresponding FCSs, the detection rates can be acquired for each category based on the distances calculated by the shape descriptor. The intersection over union (IoU) threshold in this paper is set to 0.4, the same as in the earlier studies. The results of the detection rate versus false positives per image (DR versus FPPI) can be viewed in Fig. 16. The results are compared to [13], [29], and [31], three of the most representative studies of recent years. The precision and coverage results are compared to [29]. Coverage is the percentage of ground-truth boundary points recovered by the method, and precision is the percentage of output points that lie on the ground-truth boundaries. The comparison can be viewed in Table IV. Although the coverage results of the current method are slightly lower than those of the previous study, the precision results are visibly higher than those of that report [29]. The results show that the current method can precisely localize object boundaries rather than just bounding boxes in the test images. More importantly, we compare the computational efficiency of our method with [31] on the ETHZ dataset using the same test platform. The results can be viewed in Fig. 17. The average execution time for the five categories was as
Fig. 13. Examples of positive detections using the FCSs learned from training images. The first three columns of both sides are the filter results on short lines, the original images, and the long lines, respectively. The last column of each side shows the detection results for the given models using the shape descriptor.

TABLE IV: Coverage/Precision Results
Fig. 14. FCSs results for the five categories in the ETHZ dataset. Left: given hand-drawn models from the dataset. Right: learned results of each category, which are the most frequent patterns from the training set.
follows: apple-logos: 4.32 s, bottles: 2.91 s, giraffes: 4.95 s, mugs: 2.43 s, and swans: 3.74 s. The proposed method is based on GAs, so it has considerable advantages in computational efficiency. The open-source implementation of the algorithm in [31] uses
Fig. 15. Examples of incorrect detection results. Top: examples of detections that miss some positive targets. Left: six mugs, of which only five are detected. Right: two giraffes out of three are detected. Bottom: examples of negative detections. The detected fragments are shown in red, and the corresponding part of the model is shown in the corner. The top-right image of the bottom row contains two detections, one positive and one negative.
a uniform test set of 127 images, so the charts show the comparison only for these 127 images. The code of the compared algorithm can be retrieved from [47].
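The evaluation criteria described above, bounding-box IoU and the coverage/precision of boundary points, can be illustrated with a short sketch. The function names and the point-distance tolerance below are our own illustrative assumptions, not part of the paper's implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def coverage_precision(gt_points, out_points, tol=2.0):
    """Coverage: fraction of ground-truth boundary points recovered by the output.
    Precision: fraction of output points lying on the ground-truth boundary.
    A point counts as recovered if some point of the other set lies within `tol` pixels."""
    def near(p, pts):
        return any((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= tol * tol for q in pts)
    cov = sum(near(p, out_points) for p in gt_points) / len(gt_points)
    prec = sum(near(p, gt_points) for p in out_points) / len(out_points)
    return cov, prec
```

Under the 0.4 threshold used in this paper, a detection is accepted when `iou(detected, ground_truth) >= 0.4`.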
Fig. 18. Detections in the INRIA horses dataset. Left: positive examples. Right: false positives. For each image, the dashed grid shows the long-line representation of the detected contour fragments, and the bold grid shows the corresponding FCSs. Note that all horses in the dataset face to the right, but some images are flipped horizontally so that the horses face to the left for better analysis, e.g., the two images on the left.
Fig. 16. Comparison of the detection performances of the current method (bold green), [29] (blue), [31] (red), and [13] (black) on the ETHZ dataset. Each plot shows the detection rate as a function of false positives per image (FPPI) under the PASCAL criterion (a detected bounding box is considered correct if its IoU with the ground-truth bounding box exceeds 60%). The current method outperforms the others in most cases. (a) Applelogos. (b) Bottles. (c) Giraffes. (d) Mugs. (e) Swans. (f) Methods.
The current method was also tested on the INRIA horses dataset [48]. This challenging dataset consists of 170 images containing one or more horses viewed from the side and 170 images without horses. The horses appear at different scales and poses against cluttered backgrounds. Unlike the ETHZ dataset, the INRIA horses dataset does not provide a model of the target horses. In this paper, a model was manually drawn from the given target boundary in one of the images. Past works normally used the positive images for model training. Here, the model is selected at random, so better performance might be achieved with an improved model produced from the training images. Examples of detections are shown in Fig. 18. The average recognition rate of the current
Fig. 17. Comparison of detection time with the method of [31] on the ETHZ dataset. The results show that the current method significantly outperforms the previous one on all images. (a) Applelogos. (b) Bottles. (c) Giraffes. (d) Mugs. (e) Swans. (f) Methods.
Fig. 19. Detection rates on the INRIA horses dataset using the current method and one from [48].
method on the INRIA horses dataset is 84.5%, compared with the 80.77% of [29]. The detailed statistics can be viewed in Fig. 19.
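The DR-versus-FPPI curves reported in Figs. 16 and 19 can be traced from a score-ranked list of detections; the sketch below illustrates the standard bookkeeping. The function name and input format are our own assumptions for illustration.

```python
def dr_fppi(detections, n_positives, n_images):
    """Trace a DR-versus-FPPI curve.

    detections: list of (score, is_true_positive) pairs over the whole test set.
    Returns a list of (fppi, detection_rate) points, one per confidence threshold,
    obtained by sweeping the threshold from the highest score downward."""
    detections = sorted(detections, key=lambda d: -d[0])  # highest confidence first
    tp = fp = 0
    curve = []
    for _score, is_tp in detections:
        if is_tp:
            tp += 1
        else:
            fp += 1
        curve.append((fp / n_images, tp / n_positives))
    return curve
```

A category's detection rate "at 0.4 FPPI" is then read off as the highest detection rate among points whose FPPI does not exceed 0.4.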
IV. CONCLUSION
In this paper, a GA was used to allow a system to learn conceptual shape representations. Shape is the most stable and distinct feature of an object because shape features are invariant to translation, rotation, and scale, and the contours of objects are normally distinctive. If a contour can be defined as a set of FCSs, a search strategy based on that set can be performed, and shape representation and object recognition can be achieved using such fragments. The proposed representation is more flexible and succinct than traditional representations based on chain codes. More importantly, the set of drawing primitives is itself an explicit representation that gives a declarative description of the geometric features of a contour. The proposed method can therefore provide a basis for acquiring symbolic rules describing the distinctive features of objects. Using the sequence of drawing primitives, the FCSs that capture the essence of the geometric features of a shape contour are easy to identify, and these FCSs are ubiquitous across the shapes of a single category.

This paper shows that evolutionary computation can contribute greatly to solving combinatorial optimization problems in the field of object recognition. In the design of the genetic framework, a differentiated approximation of the shape contour is used for gene coding: a chromosome is defined as a set of contour fragments expressed as short-line segments. The short lines are further approximated to form long lines and drawing primitives, and the genetic operations are redefined with respect to contour features. By using two different fitness function designs, both explicit shape representation and object recognition can be achieved; the latter demonstrates the effectiveness and adaptability of the former. Because the contour features are intuitive, the proposed method is broadly meaningful for pattern recognition.
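The genetic framework summarized above can be illustrated with a minimal, self-contained sketch. Everything below (the function name, operator choices such as tournament selection and single-point crossover, and the toy fitness) is our own illustrative assumption, not the paper's implementation; the only element taken from the text is the chromosome encoding as a set of contour-fragment indices.

```python
import random

def evolve(fragments, fitness, pop_size=20, chrom_len=4, gens=50, pm=0.1, seed=0):
    """Toy GA: a chromosome is a fixed-length list of contour-fragment indices.
    Selection is 2-way tournament, crossover splices fragment subsets, and
    mutation replaces one fragment with a randomly chosen one."""
    rng = random.Random(seed)
    n = len(fragments)
    pop = [rng.sample(range(n), chrom_len) for _ in range(pop_size)]
    for _ in range(gens):
        def pick():
            a, b = rng.sample(pop, 2)  # tournament selection of size 2
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, chrom_len)  # single-point crossover
            child = (p1[:cut] + [g for g in p2 if g not in p1[:cut]])[:chrom_len]
            while len(child) < chrom_len:      # pad in the rare case crossover shrank it
                child.append(rng.randrange(n))
            if rng.random() < pm:              # mutation: swap in a random fragment
                child[rng.randrange(chrom_len)] = rng.randrange(n)
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

In the paper's setting, the fitness would measure how frequently a chromosome's fragment pattern recurs across the training shapes (for representation) or how well it matches a test image (for recognition); the toy fitness in the usage below simply rewards overlap with a target fragment set.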
Future work may use the proposed approach to develop grammatical pattern recognition methods and to facilitate applications in object recognition through proactive top-down processing.
ACKNOWLEDGMENT
This work was supported in part by the 973 Program under Project 2010CB327900, in part by the National Science Foundation of China under Project 61375122 and Project 81373556, and in part by the National Twelfth Five-Year Plan for Science and Technology under Project 2012BAI37B06.
REFERENCES
[1] H. Ling and D. W. Jacobs, “Shape classification using the inner-distance,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 2, pp. 286–299, Feb. 2007.
[2] E. M. Arkin, L. P. Chew, D. P. Huttenlocher, K. Kedem, and J. S. Mitchell, “An efficiently computable metric for comparing polygonal shapes,” in Proc. 1st Annu. ACM-SIAM Symp. Discrete Algorithms, San Francisco, CA, USA, 1990, pp. 129–137.
[3] M. Maes, “Polygonal shape recognition using string-matching techniques,” Pattern Recognit., vol. 24, no. 5, pp. 433–440, 1991.
[4] L. J. Latecki and R. Lakamper, “Shape similarity measure based on correspondence of visual parts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1185–1190, Oct. 2000.
[5] K. Chakrabarti, M. Ortega-Binderberger, K. Porkaew, and S. Mehrotra, “Similar shape retrieval in MARS,” in Proc. IEEE Int. Conf. Multimedia Expo. (ICME), vol. 2. New York, NY, USA, 2000, pp. 709–712.
[6] G. Lu and A. Sajjanhar, “Region-based shape representation and similarity measure suitable for content-based image retrieval,” Multimedia Syst., vol. 7, no. 2, pp. 165–174, 1999.
[7] A. Taza and C. Y. Suen, “Discrimination of planar shapes using shape matrices,” IEEE Trans. Syst., Man, Cybern., vol. 19, no. 5, pp. 1281–1289, Sep./Oct. 1989.
[8] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 4, pp. 509–522, Apr. 2002.
[9] A. R. Backes, D. Casanova, and O. M. Bruno, “A complex network-based approach for boundary shape analysis,” Pattern Recognit., vol. 42, no. 1, pp. 54–67, 2009.
[10] H. Wei, Y. Ren, and Z. Wang, “A group-decision making model of orientation detection,” in Proc. IEEE Int. Joint Conf. Neural Netw. (IJCNN), Brisbane, QLD, Australia, 2012, pp. 1–8.
[11] H. Wei and Y. Ren, “An orientation detection model based on fitting from multiple local hypotheses,” in Neural Information Processing. Berlin, Germany: Springer, 2012, pp. 383–391.
[12] H. Wei and Y. Ren, “A mathematical model of retinal ganglion cells and its applications in image representation,” Neural Process. Lett., vol. 38, no. 2, pp. 205–226, 2013.
[13] C. Lu, L. J. Latecki, N. Adluru, X. Yang, and H. Ling, “Shape guided contour grouping with particle filters,” in Proc. IEEE Int. Conf. Comput. Vis., Kyoto, Japan, 2009, pp. 2288–2295.
[14] M. R. Daliri and V. Torre, “Classification of silhouettes using contour fragments,” Comput. Vis. Image Understand., vol. 113, no. 9, pp. 1017–1025, 2009.
[15] M. Pawan Kumar, P. Torr, and A. Zisserman, “Extending pictorial structures for object recognition,” in Proc. Brit. Mach. Vis. Conf., Kingston, ON, Canada, 2004, pp. 789–798.
[16] R. Fergus, P. Perona, and A. Zisserman, “A visual category filter for Google images,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Prague, Czech Republic, 2004, pp. 242–256.
[17] C. Xu and B. Kuipers, “Object detection using principal contour fragments,” in Proc. IEEE Can. Conf. Comput. Robot Vis. (CRV), St. John’s, NL, Canada, 2011, pp. 363–370.
[18] J. Shotton, A. Blake, and R. Cipolla, “Multiscale categorical object recognition using contour fragments,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 7, pp. 1270–1281, Jul. 2008.
[19] M. R. Daliri and V. Torre, “Robust symbolic representation for shape recognition and retrieval,” Pattern Recognit., vol. 41, no. 5, pp. 1782–1798, 2008.
[20] M. R. Daliri and V. Torre, “Shape recognition based on kernel-edit distance,” Comput. Vis. Image Understand., vol. 114, no. 10, pp. 1097–1103, 2010.
[21] D. Guru and H. Nagendraswamy, “Symbolic representation of two-dimensional shapes,” Pattern Recognit. Lett., vol. 28, no. 1, pp. 144–155, 2007.
[22] P. N. Suganthan, “Structural pattern recognition using genetic algorithms,” Pattern Recognit., vol. 35, no. 9, pp. 1883–1893, 2002.
[23] K.-Z. Chen, X.-W. Zhang, Z.-Y. Ou, and X.-A. Feng, “Recognition of digital curves scanned from paper drawings using genetic algorithms,” Pattern Recognit., vol. 36, no. 1, pp. 123–130, 2003.
[24] G. Garai and B. Chaudhuri, “A distributed hierarchical genetic algorithm for efficient optimization and pattern matching,” Pattern Recognit., vol. 40, no. 1, pp. 212–228, 2007.
[25] G. G. Yen and N. Nithianandan, “Facial feature extraction using genetic algorithm,” in Proc. IEEE Congr. Evol. Comput. (CEC), vol. 2. Honolulu, HI, USA, 2002, pp. 1895–1900.
[26] E. Ozcan and C. K. Mohan, “Shape recognition using genetic algorithms,” in Proc. IEEE Int. Conf. Evol. Comput., Nagoya, Japan, 1996, pp. 411–416.
[27] S. Abdel-Gaied, “Employing genetic algorithms for qualitative shapes detection,” ICGST Int. J. Graph. Vis. Image Process. (GVIP), vol. 8, no. 4, pp. 19–25, 2008.
[28] P. W. Tsang, “A genetic algorithm for affine invariant recognition of object shapes from broken boundaries,” Pattern Recognit. Lett., vol. 18, no. 7, pp. 631–639, 1997.
[29] V. Ferrari, F. Jurie, and C. Schmid, “From images to shape models for object detection,” Int. J. Comput. Vis., vol. 87, no. 3, pp. 284–303, 2010.
[30] Q. Zhu, L. Wang, Y. Wu, and J. Shi, “Contour context selection for object detection: A set-to-set contour matching approach,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Marseille, France, 2008, pp. 774–787.
[31] C. Gu, J. J. Lim, P. Arbeláez, and J. Malik, “Recognition using regions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Miami, FL, USA, 2009, pp. 1030–1037.
[32] U. Maulik and S. Bandyopadhyay, “Genetic algorithm-based clustering technique,” Pattern Recognit., vol. 33, no. 9, pp. 1455–1465, 2000.
[33] G. Roth and M. D. Levine, “Geometric primitive extraction using a genetic algorithm,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 9, pp. 901–905, Sep. 1994.
[34] M. Saadatmand-Tarzjan and H. A. Moghaddam, “A novel evolutionary approach for optimizing content-based image indexing algorithms,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 1, pp. 139–153, Feb. 2007.
[35] Y. K. Yu, K. H. Wong, and M. M.-Y. Chang, “Pose estimation for augmented reality applications using genetic algorithm,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 35, no. 6, pp. 1295–1301, Dec. 2005.
[36] C.-S. Lee, S.-M. Guo, and C.-Y. Hsu, “Genetic-based fuzzy image filter and its application to image processing,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 35, no. 4, pp. 694–711, Aug. 2005.
[37] K. Khoo and P. N. Suganthan, “Structural pattern recognition using genetic algorithms with specialized operators,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 33, no. 1, pp. 156–165, Feb. 2003.
[38] Z. Stejic, E. M. Iyoda, Y. Takama, and K. Hirota, “Image similarity computation using local similarity patterns generated by genetic algorithm,” in Proc. IEEE Congr. Evol. Comput. (CEC), vol. 1. Honolulu, HI, USA, 2002, pp. 771–776.
[39] K. Deb, “Multi-objective optimization,” in Multi-Objective Optimization Using Evolutionary Algorithms. Chichester, U.K.: Wiley, 2001, pp. 13–46.
[40] Y. Wang, Z. Cai, G. Guo, and Y. Zhou, “Multiobjective optimization and hybrid evolutionary algorithm to solve constrained optimization problems,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 3, pp. 560–575, Jun. 2007.
[41] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” ACM SIGMOD Rec., vol. 29, no. 2, pp. 1–12, 2000.
[42] B. J. Super, “Learning chance probability functions for shape retrieval or classification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshop (CVPRW), Washington, DC, USA, 2004, p. 93.
[43] B. J. Super, “Retrieval from shape databases using chance probability functions and fixed correspondence,” Int. J. Pattern Recognit. Artif. Intell., vol. 20, no. 8, pp. 1117–1137, 2006.
[44] M. R. Daliri and V. Torre, “Shape recognition and retrieval using string of symbols,” in Proc. 5th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Orlando, FL, USA, 2006, pp. 101–108.
[45] E. Attalla and P. Siy, “Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching,” Pattern Recognit., vol. 38, no. 12, pp. 2229–2241, 2005.
[46] V. Ferrari, T. Tuytelaars, and L. Van Gool, “Object detection by contour segment networks,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Graz, Austria, 2006, pp. 14–28.
[47] (May 10, 2014). UC Berkeley Computer Vision Group—Recognition. [Online]. Available: http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/glam_cvpr09_v2.zip
[48] V. Ferrari, F. Jurie, and C. Schmid, “Accurate object detection with deformable shape models learnt from images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Minneapolis, MN, USA, 2007, pp. 1–8.
Authors’ photographs and biographies not available at the time of publication.